Trends in French sentence length?


"Memoirs of a Woman of Long Sentences" (5/21/2022) reproduced a plot from my 5/20/2022 talk at SHEL 12:

In the talk's slides, I used that plot (without the outlier-marking arrow) to illustrate the obvious point that "Older texts in English tend to have longer sentences".

And in my final slide, I suggested that "French seems different". That (imprudent) suggestion was based on my subjective impression of a few 18th-century works, where it seemed to me that sentence (and especially paragraph) lengths were much shorter in French-language works than in English-language ones from the same period.

In the presentation, I made it clear that this idea was tentative at best, without real empirical support. So over the past few days, I've spent my recreational moments tracking down and cleaning up the texts of 36 French novels (and a couple of novel-ish memoirs) published between 1581 and 1942. I also re-hacked some antique sentence-division code, originally written for English, to deal with French punctuation, abbreviations, and so on.
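For readers who want to try something similar, here is a minimal Python sketch of that kind of sentence division and of the mean-sentence-length measure. It is not the code used for the plot. The abbreviation list, the boundary pattern, and the function names are all assumptions made for illustration, and "words" are simply whitespace-delimited tokens containing at least one letter or digit (so "l'ombre" counts as one word).

```python
import re

# Hypothetical, deliberately short list of French abbreviations (lowercase,
# without the trailing period). A real list would be much longer.
ABBREVIATIONS = {"m", "mm", "mme", "mlle", "dr", "st", "ste", "etc", "cf"}

# Candidate boundary: sentence-final punctuation, optionally followed by
# closing quotes or brackets (French puts a space before »), then whitespace.
BOUNDARY = re.compile(r'([.!?…]+)(?:\s?[»”")\]])*\s+')


def split_sentences(text):
    """Split text into sentences using a few simple heuristics."""
    text = " ".join(text.split())  # collapse line breaks and runs of spaces
    sentences, start = [], 0
    for m in BOUNDARY.finditer(text):
        end = m.end()
        # "Quoi ? dit-elle." : a lowercase continuation is not a new sentence.
        if text[end].islower():
            continue
        if m.group(1) == ".":
            prev = re.search(r"\w+$", text[start:m.start()])
            word = prev.group(0) if prev else ""
            # Skip known abbreviations ("M. Swann") and initials ("J. Dupont").
            if word.lower() in ABBREVIATIONS or (len(word) == 1 and word.isupper()):
                continue
        sentences.append(text[start:end].strip())
        start = end
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


def count_words(sentence):
    """Count whitespace-delimited tokens that contain a letter or digit."""
    return sum(1 for tok in sentence.split() if any(c.isalnum() for c in tok))


def mean_sentence_length(text):
    """Mean number of words per sentence, or 0.0 for empty text."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    return sum(count_words(s) for s in sentences) / len(sentences)


if __name__ == "__main__":
    sample = "M. Swann arriva tard. Il sourit."
    print(split_sentences(sample))
    print(mean_sentence_length(sample))
```

Heuristics like these inevitably miss cases (dialogue, ellipses, unusual abbreviations), so any real use would need spot-checking against the texts themselves.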

The result indicates (or at least hints) that I was both right and wrong.

It does seem that in the 17th and 18th centuries, sentences tended to be shorter in French texts than in English ones, though the amount of data compared so far is small. On the other hand, it's pretty clear that French as well as English shows an overall secular trend towards shorter sentences. Mostly, it's clear that we need more data.

Note: It probably won't surprise you that the blue x outlier (mean sentence length of 37.9 words, published in 1919) is Proust's À l'ombre des jeunes filles en fleurs.

7 Comments

  1. Chris Button said,

    May 26, 2022 @ 9:18 pm

    I’d like to see some comparisons with more modern French. My hunch is that contemporary French style favors much longer sentences than contemporary English style. I would have put French more like Japanese for example.

  2. AntC said,

    May 27, 2022 @ 5:23 am

    Thank you Mark. Are you running this analysis whilst at a conference, and getting interrupted by fire alarms?

    Again without real empirical support, I was under the impression French needed ~20% more words to express the same idea as an English text. (Something to do with French overall having a smaller vocabulary/each word has a wider range of meanings, so needing more words to get the specific meaning of an English text.) Is there any truth to that or is it just chauvinism?

    If true, does it get expressed as more words per sentence; or more sentences to express the same ideas? And do your plots suggest that whether the conjecture holds for translation, it doesn't hold for texts originating from French speakers(?) Perhaps they don't worry about being so precise? (Proust being the exception.)

    An hundred years should go to praise
    Thine eyes and on thy forehead gaze;

    [(myl) A good point — counting "words" across languages, whether types or tokens, is notoriously problematic. And to the extent that there are systematic differences in the number of (orthographically defined) "words" used to express the same ideas in French and English, the possible differences suggested by my little graph might be magnified or reduced, by a factor that depends on linguistic/orthographic differences rather than stylistic differences.

    For some general discussion of text length comparison across languages, see "Information content of text in English and Chinese", 10/9/2017, and the other posts linked there, especially "Comparing communication efficiency across languages", 4/4/2008, which includes this quote from Alex Baumans about word counts in French vs. Dutch:

    [A]ny attempt to compare languages will fail, if the word formation rules of the languages differ too much.

    I work as a journalist for a HVAC magazine in Belgium. As Belgium is bilingual, our publication exists in parallel Dutch and French versions. For obvious reasons, articles are supposed to be about as long in both languages. However, this provides endless problems, the Dutch text being on average 15-20% shorter, and the word count is way out.

    One of the reasons (besides French orthography insisting on writing lots of letters that are not pronounced) is that Dutch, like German, can form compounds on the spot. Usually these are written together. Especially in technical terms, this is useful. A wall hung gas fired boiler is simply gaswandketel, as opposed to chaudière murale à gaz. How many words is that?

    I'll check letter-count and word-count relationships in some English/French parallel text corpora, when I have a few minutes.

    Update 5/28/2022 — "Comparing phrase lengths in French and English".]

  3. Coby L said,

    May 27, 2022 @ 10:43 am

    It could also be that French has adopted the use of more compound nouns under the influence of German and English, at least in science and technology. For example, non-ferrous metals are nowadays called métaux non-ferreux, but in the 19th century they were called métaux autres que le fer.

  4. john burke said,

    May 27, 2022 @ 10:56 am

    From watching French movies and TV programs I have the impression–this is NOT a tested hypothesis–that a certain number of conventional usages that once made for long sentences have been truncated. Example: "Qu'est-ce que c'est la musique?" becomes "La musique, c'est quoi?" Still not as terse as English "What is music?" but shorter and (to my ear) more direct.

  5. Chris Button said,

    May 27, 2022 @ 12:19 pm

    Would a better measure be the number of subordinate clauses in a sentence rather than the number of “words”? That was what I was really thinking about.

  6. Mike D said,

    May 28, 2022 @ 3:25 pm

    For material, I see that Project Gutenberg claims it has 1834 French books.

  7. Laura Morland said,

    May 30, 2022 @ 4:19 pm

    @John Burke –

    I live in France six months a year, and you're absolutely correct. However, the corpus of written French wouldn't include "La musique, c'est quoi?" (or more commonly, "C'est quoi, la musique?") unless a conversation was being quoted.

    Here's my own hypothesis: the "décalage" between written and spoken French is greater than that between written and spoken English, at least in most milieux.
