The Latinometer
From David Frauenfelder:
Here’s an item from the land of language: the "Latinometer".
Have you seen it? You enter text into the query box, it analyzes how Latinate your English vocabulary is, and then tells you whether you sound “concrete,” educated, pretentious, or mendacious. The more Latin-derived terms in your text, the more likely you are to be a liar.
Your most recent Language Log post scored 53% on the Latinometer, pretentious, and dangerously close to the “You are probably lying” zone. I still don’t know if the author, a Latin professor, is trying to be ironic.
Somebody needs to do a LL post on this. I find it utterly ridiculous, and I’m a Latin teacher. Or maybe I find it ridiculous because I’m a Latin teacher. I wonder what a linguist would say?
I'll merely observe that the Preamble to the U.S. Constitution scores 68% on the Latinometer, well over the "probably lying" threshold. At 63%, the first two paragraphs of Federalist #1 are not far behind.
I should note in passing that the Latinometer owes a debt to the segment on "Pretentious Diction" in George Orwell's "Politics and the English Language":
Pretentious diction. Words like phenomenon, element, individual (as noun), objective, categorical, effective, virtual, basic, primary, promote, constitute, exhibit, exploit, utilize, eliminate, liquidate, are used to dress up a simple statement and give an air of scientific impartiality to biased judgements. Adjectives like epoch-making, epic, historic, unforgettable, triumphant, age-old, inevitable, inexorable, veritable, are used to dignify the sordid process of international politics, while writing that aims at glorifying war usually takes on an archaic color, its characteristic words being: realm, throne, chariot, mailed fist, trident, sword, shield, buckler, banner, jackboot, clarion. Foreign words and expressions such as cul de sac, ancien regime, deus ex machina, mutatis mutandis, status quo, gleichschaltung, weltanschauung, are used to give an air of culture and elegance. Except for the useful abbreviations i.e., e.g., and etc., there is no real need for any of the hundreds of foreign phrases now current in the English language. Bad writers, and especially scientific, political, and sociological writers, are nearly always haunted by the notion that Latin or Greek words are grander than Saxon ones, and unnecessary words like expedite, ameliorate, predict, extraneous, deracinated, clandestine, subaqueous, and hundreds of others constantly gain ground from their Anglo-Saxon numbers. The jargon peculiar to Marxist writing (hyena, hangman, cannibal, petty bourgeois, these gentry, lackey, flunkey, mad dog, White Guard, etc.) consists largely of words translated from Russian, German, or French; but the normal way of coining a new word is to use Latin or Greek root with the appropriate affix and, where necessary, the -ize formation. It is often easier to make up words of this kind (deregionalize, impermissible, extramarital, non-fragmentary and so forth) than to think up the English words that will cover one's meaning.
The result, in general, is an increase in slovenliness and vagueness.
And before that, we could explore a couple of centuries of Romantic linguistic xenophobia.
Orwell's lists of problem words are somewhat strange, at least to today's reader. "Basic" and "promote" are "used to dress up a simple statement and give an air of scientific impartiality to biased judgments"? "Predict" and "clandestine" are "unnecessary words" used by "bad writers"?
Elonkareon said,
August 15, 2014 @ 4:51 pm
Orwell lays the blame on foreign and Latinate words, and yet unless I am mistaken, several of his example words (especially the words used for archaic colour) are of at least partial Anglo-Saxon origin. Unforgettable and age-old, sword and shield, are each "pure English" but still susceptible to being used in "pretentious diction". And how, I wonder, does Orwell plan to express "pretentious diction" in Modern Anglo-Saxon?
The problem with preferring Anglo-Saxon words to Latinate ones is that there simply aren't enough left in common usage to carry on even the most basic of conversations. The ones that are left are often near synonyms (e.g. holt, wold, glen, wood, though I suppose for most people the first two are anything but common) and specific to either rural life or war.
I do personally prefer the old words, and make a point of using the few that aren't too far gone (and maybe a few that are), but merely because I like their sound and the sense of connection to a lost past, not because they are any less pretentious than the words that have replaced them.
J. W. Brewer said,
August 15, 2014 @ 5:07 pm
I guess no one had access to the Google Books n-gram viewer to fact-check Orwell's complaints when they were first published, but it turns out that "subaqueous," for example, had not been and has not been constantly gaining ground on "underwater" (my best guess as to the corresponding Anglo-Saxonism?); quite the contrary. Back in earlier times (starting in the 1820's) "subaqueous" was more common than "underwater." But the trendlines crossed a century ago at the outbreak of WWI (probably a coincidence?), and the Anglo-Saxonism proceeded to establish an increasingly commanding lead, to the point that by 1946 (when Orwell published) "underwater" was more than ten times as common, and by 2008 (the most recent year I can check via the n-gram viewer) it was on the order of thirty times as common.
Jan Freeman said,
August 15, 2014 @ 7:58 pm
"Orwell's lists of problem words are somewhat strange, at least to today's reader." I find that *anyone's* list of problem words is somewhat strange; whenever a writer asserts that these words are still OK, but these are past their sell-by date, I wonder what rationale could possibly govern the choice. No two people seem to have the same list of vogue words, cliches, pretentious words, peeves, etc. But instead of concluding that much word prejudice is a matter of individual experience and taste, the listmakers seem to conclude that nobody else is quite as sensitive to language as they are.
D.O. said,
August 15, 2014 @ 8:17 pm
No wonder the courts are largely ignoring it.
Jerry Friedman said,
August 15, 2014 @ 8:44 pm
Latin vocabulary frequency? Very strange measurement choice.
Ken said,
August 15, 2014 @ 8:56 pm
Has anyone read "Uncleftish Beholding" by Poul Anderson? It's an explanation of atomic theory and chemistry without any Latin or Greek roots. He had to coin a few terms…
Orwell's essay also reminds me of Nero Wolfe, who once burned a dictionary because it allowed "imply" for "infer", and who would not permit "contact" to be used as a verb in his house.
Jason said,
August 16, 2014 @ 12:10 am
Orwell is a much better amateur political philosopher than he is an amateur linguist.
As far as the connection between Latin roots and lying, I'm reminded of a quote from Iain (M) Banks:
Of that list (dissemble, evade … etc) we have 9 of 11 terms that are either Latinate, or French, which is of course simply Latin chewed up by the perverse and untrustworthy Frogs, with only two words, "willfully misunderstand", being from good Anglo-Saxon stock.
I'm struck by the extreme stability of sociolinguistic judgements about the perceived pretentiousness, formality, dishonesty, etc of various Latinate roots. The lot of a Latinate root is not entirely a happy one. Most foreign roots never quite assimilate, no matter how hard they try to be accepted. You might have been part of English since before the Norman conquest, but you're still not quite considered "native." Children vastly prefer the native roots over you — they say "doggy", not "canine". Adults will casually pass such sociolinguistic judgement on you — mendacious, pretentious, formal, abstract, technical. Latinate words never quite get the hearth and home-style homeliness of "hearth" and "home". "Focus" and "Domicile" just don't cut it.
Gregory Kusnick said,
August 16, 2014 @ 1:38 am
So I'm guessing then that Latinometer is meant to be pronounced with stress on the third syllable (as in galvanometer).
My initial reading of it, with stress on the second syllable, would have a rather different meaning.
David Morris said,
August 16, 2014 @ 3:57 am
I would go as far as to say this: all else being equal, given a valid choice in the context, use the shorter and/or Germanic/Anglo-Saxon word rather than the longer and/or Latinate word. Often there is a genuine reason to use the longer and/or Latinate word; go for it. If there's not, don't.
I once asked a colleague if 'indubitably' accomplished anything that 'undoubtedly' didn't. He looked straight in my eye and said 'Indubitably!'!
(The spell check doesn't like 'Latinate'.)
pj said,
August 16, 2014 @ 6:25 am
Hm. I wondered how it would deal with Greek-rooted words, and just pasted in
from the first (ἀ- prefix) section of Wikipedia's 'list of Greek words with English derivatives' (I removed 'amoral').
Its Latinate density is 125%, apparently.
pj said,
August 16, 2014 @ 6:41 am
It also doesn't have a very large vocabulary. Of
all are of 'unidentified' origin apart from anthropology, misanthrope and misanthropy.
Victor Mair said,
August 16, 2014 @ 6:52 am
@Gregory Kusnick
I put the stress on the first syllable.
leoboiko said,
August 16, 2014 @ 7:45 am
@Morris: sucks to be us Romance speakers, I guess (Latinate words are so much easier…)
Jonathan Mayhew said,
August 16, 2014 @ 8:38 am
Orwell cannot avoid Latinate (and Greek origin) words for even a short phrase or sentence. Look at his own vocab here:
pretentious, diction, statement, simple, impartiality, scientific, bias, judgement, adjective, process, dignify, sordid, international, politics, glorifying, usually, archaic, color, characteristic, expression, notion, peculiar, normal, language…
Indubitably and undoubtedly both come from Latin dubitare.
Aelfric said,
August 16, 2014 @ 9:59 am
Given my moniker, I find it odd that I am (sort of) defending Latinisms, but here goes: you say "all else being equal…", yet it is exceedingly rare, at least in my idiolect, for Latinate and Anglo-Saxon words (or, for that matter, words of any other origin) to have exactly the same semantic ranges. Let's take your example anyway, even though, as has been pointed out, both indubitably and undoubtedly have Latin roots. "Indubitably" to me is stronger, meaning something like "it cannot be doubted," while "undoubtedly" means something more akin to "I am aware of no doubts associated with this idea." Thus, to me, your colleague's reply makes perfect sense: he was (playfully, I suspect) arguing against you by saying that the difference cannot be doubted. Again, given my idiolect, "undoubtedly" would have been awkward as a reply, since you had just expressed doubt about the idea under discussion. While it is easy for me to think of words that have large, even overwhelming areas of overlap, it is very difficult for me to come up with words that are exact semantic clones.
Aelfric said,
August 16, 2014 @ 10:00 am
Argh. My last comment was obviously to Mr. David Morris. This is why I never comment on Saturday mornings.
Chips said,
August 16, 2014 @ 2:42 pm
Mmmm. I would reckon the Latinometer is pure crap (and I note the post is listed under "Humor"). Of the 167 words of text I submitted to the site (from a letter home), 98 were ruled "excluded". Of my 167 words, 10 were assigned "Latinate", even though 14 words were in Italian. The word "just", as in "just up the road" was described as Latinate: it's pretty distant. Curiously, not a single Italian word was billed as "Latinate". Maybe I am missing something. 10 more were "unidentified", along with the word "drownings", and six of the 14 Italian words.
As it is, my 167 words rated 28, the mid-range of "sounding educated". If the truly "Latinate" Italian words were included, I would be well in the pretentious range. Maybe subscribing to Language (Latinate) Log (Germanic) would qualify as pretentious.
I just submitted the above text. The words Italian, Latinate and Germanic are listed as "excluded"; the word "Latinometer" was listed as Germanic.
Mary Margolies DeForest said,
August 16, 2014 @ 3:11 pm
As the mother of the Latinometer, I want to thank you all for taking the time to make these comments.
On the website, I refer to Orwell's paragraph on the use of Latinate words to gloss over horror (http://latinometer.com/o_e3066bcd4fb27f67.html). I read his "Politics and the English Language" in college and that paragraph stayed with me. Later, when I began to look at Austen's English, I found that she, too, connected Latinate words with deception, though she used Latinate words to show admirable qualities, as well, like reason and self-control. I used an early version of the Latinometer for my paper on how she created convincing speeches by varying the density of Latinate words (https://ucdenver.academia.edu/MaryDeForest/Papers; summarized at http://latinometer.com/o_dc61078aaf2ab42e.html).
I don't suggest that the Founding Fathers deceived their readers by loading down their prose with Latinate words. The creation of a new country calls for lofty language. However, since most modern writers aren't building countries, I suggest using the Latinate level (42%) of Mary Bennet, the pedant of Austen's Pride and Prejudice, as a limit in most circumstances. The Latinometer is not about rules but about flexibility in choosing one's tone and voice for each occasion.
As PJ wrote, the Latinometer does not yet include every word in the English language. I made the Latinometer out of an online dictionary with 110,000 words with American spellings, and have been adding words to it every night. I am adding words spelled the British way as well. If you send the essay back the next day, the words will be there, a day late, but better belated than never. Names are excluded automatically, so it does not matter whether they are catalogued or not, but I am filling them in as well. I give Greek words a slightly higher value than Latin. I give words changed by the French (like mountain) a slightly higher value than straight Germanic. I tried different values for these, but finally settled on 1.25 for Greek and .25 for French because they ended up working for Jane Austen's characters.
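For readers curious how a weighted score like this might be computed, here is a minimal sketch, not the Latinometer's actual code. It takes the 1.25 (Greek) and .25 (French) weights from the comment above and assumes, since "slightly higher than Latin" and "slightly higher than straight Germanic" imply a 0-to-1 scale, that Latin counts 1.0 and Germanic 0.0. The tiny `LEXICON` is a hypothetical stand-in for the real 110,000-word dictionary, and words not found in it are assumed to be excluded from both numerator and denominator.

```python
import re

# Weights per etymological class. Greek (1.25) and French (0.25) come from
# the comment; Latin = 1.0 and Germanic = 0.0 are assumptions.
WEIGHTS = {"latin": 1.0, "greek": 1.25, "french": 0.25, "germanic": 0.0}

# Hypothetical toy lexicon (illustrative only) mapping words to an origin class.
LEXICON = {
    "we": "germanic",
    "climb": "germanic",
    "the": "germanic",
    "mountain": "french",
    "predict": "latin",
    "anthropology": "greek",
}


def latinate_density(text: str, lexicon: dict[str, str] = LEXICON) -> float:
    """Return a weighted Latinate percentage for `text`.

    Words missing from the lexicon are treated as excluded/unidentified
    and are left out of the average entirely (an assumption).
    """
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    weights = [WEIGHTS[lexicon[w]] for w in words if w in lexicon]
    if not weights:
        return 0.0
    return 100.0 * sum(weights) / len(weights)


if __name__ == "__main__":
    print(latinate_density("We climb the mountain"))  # (0 + 0 + 0 + 0.25) / 4 -> 6.25
    print(latinate_density("anthropology"))           # a lone Greek word -> 125.0
```

Because Greek words weigh more than 1, a text made entirely of recognized Greek-derived words scores above 100% under these assumptions, which is consistent with the 125% that pj reported above.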
I am finding it hard to fix definite edges. For instance, should a word like sunstuff from Poul Anderson's "Uncleftish Beholding" be included as a Germanic word meaning helium, even though it's not a real word? And what do I do with potassium: a word made up to sound Latin because potash did not sound sufficiently scientific? My address is crypto@ecentral.com, if anyone would like to help me.
Jason, I have never read Iain Banks. Thank you for the quotation.
John Walden said,
August 16, 2014 @ 3:45 pm
The Latinometer needs another name, one that is not so Latin.
Mary Margolies DeForest said,
August 16, 2014 @ 5:13 pm
Latin is the measure of all things.
Greek words came to the West through Latin, so they fit into a Latinometrical environment. Words that I call French, the hybrid words that modified Latin stems by adding a vowel here or cutting off a syllable there, derive mostly from Latin. The consonants maintain the outlines of the original Latin, but the edges are blurred.
Jonathan said,
August 16, 2014 @ 5:45 pm
Indubitably is, to me at any rate, rather more high-register (what you might call pretentious) than undoubtedly. The morphological roots of both words are derived ultimately from Latin dubitare, but the former word is directly borrowed from the Latin derived adjective indubitabilis, with only the addition of the Anglo-Saxon suffix -ly, while the root of the latter word is only indirectly related to the Latin, being borrowed from Old French dote; all the other morphemes in un-doubt-ed-ly are firmly Germanic and Anglo-Saxon. Even the doubt root has undergone common English sound changes since it was borrowed from French (though its spelling has acquired an unhistorical b owing to its relationship with its Latin cognate).
All in all, then, at least with respect to this pair of words, the association between more Latinate words and pretension is fairly robust. Moreover, if you are going to measure the Latinity of some English word, you need to be able to distinguish between later borrowings directly from the Latin, which are manifestly less integrated into the English lexicon and grammar, and earlier borrowings that have undergone internal English changes in the meantime, or borrowings from Romance languages that have undergone their own changes since becoming separate from Latin.
Jerry Friedman said,
August 16, 2014 @ 7:15 pm
Two Latin-derived words that falute lower (as John Lawler might say) than their closest Germanic-derived synonyms: very compared to quite, most, and so forth (I almost wrote "etc."); and pigeon compared to dove.
Jerry Friedman said,
August 16, 2014 @ 7:19 pm
Mary Margolies DeForest: I wouldn't add any words that are only in "Uncleftish Beholding". And potash couldn't have been used for potassium; they needed a word for the metal that was different from the word for its carbonate. I suppose Davy could have come up with something more English-sounding, though.
Jonathan said,
August 17, 2014 @ 1:52 am
Great examples, Jerry! We need a faluting index for English words, from which we could compute the relative falutedness of Latinate vocabulary. That could bring in grant money, couldn't it?
GH said,
August 17, 2014 @ 6:27 am
Use clear, plain, simple, familiar, easy, basic, normal, regular, common language! Avoid fancy Latin-derived terms!
chris said,
August 17, 2014 @ 9:51 am
then tells you whether you sound “concrete,” educated, pretentious, or mendacious.
Am I missing something, or is that scale composed *entirely* of Latin-derived words? Seems like a bad start for the idea that you don't need and should avoid Latin-derived words when you can't even describe the concept you are trying to explain without resorting to them.
Or to neologisms and seldom-used words, which Poul Anderson can pull off, but most writers probably can't. Apparently including Orwell.
Mary Margolies DeForest said,
August 17, 2014 @ 1:05 pm
The Latinometer can help reduce high densities (including on the homepage of my website, unfortunately). Baseball has the Mendoza line, eliminating batters whose average falls below .200. Why not have a Mary Bennet line for writers whose score rises above 42%?
Latinomater
Brett said,
August 17, 2014 @ 3:13 pm
The meaningful question is not, "Why not…?" but rather, "Why?"
Jerry Friedman said,
August 17, 2014 @ 3:45 pm
Jonathan: In hopes of being included on the grant, I'll mention that The Economist's style guide (which quotes Orwell's six rules) says, "Use the language of everyday speech, not that of spokesmen, lawyers or bureaucrats (so prefer let to permit, people to persons, buy to purchase, colleague to peer, way out to exit, present to gift, rich to wealthy, show to demonstrate, break to violate)."
This suggests that wealthy and gift falute higher than rich and present. However, I'm not sure that's true in American English for present and gift, at least if the Academic corpus is higher-class than the Spoken corpus. (I'm more sure we can hardly ever substitute colleague for peer.)
Christmas present(s): spoken 148, academic 22
Christmas gift(s): spoken 125, academic 14
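To make that comparison explicit, one can divide each phrase's academic count by its spoken count. Since both phrases are counted in the same two corpora, the difference in corpus size is a common factor and drops out when the two ratios are compared. A minimal back-of-the-envelope sketch using only the four counts quoted above (no other normalization or significance testing is attempted):

```python
# Raw counts as quoted in the comment above.
counts = {
    "Christmas present(s)": {"spoken": 148, "academic": 22},
    "Christmas gift(s)": {"spoken": 125, "academic": 14},
}


def academic_to_spoken(phrase: str) -> float:
    """Academic hits per spoken hit; a higher value means relatively more academic use."""
    c = counts[phrase]
    return c["academic"] / c["spoken"]


for phrase in counts:
    print(f"{phrase}: {academic_to_spoken(phrase):.3f}")
```

On these raw counts, present(s) has the higher academic-to-spoken ratio (about 0.149 versus 0.112), which is the pattern behind the doubt expressed above; with counts this small, the difference is suggestive rather than conclusive.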
Jerry Friedman said,
August 17, 2014 @ 3:48 pm
Sorry, I take back rich (which is from Old English) and my earlier mention of quite (which is ultimately from Latin and related to quit and quiet, they tell me). Next time I'll check etymologies before commenting.
Bloix said,
August 17, 2014 @ 10:13 pm
"In the nineteenth century the Dorset poet William Barnes, the author of The Speechcraft of the English Language, was one of the adherents of Saxonism. In his zeal for “English English” he advocated, with little success, the replacement of the Latin term "adjective" by his own creation "markword of suchness" and the word "omnibus" (also of Latin origin; later abbreviated to bus) by "folk-wain."
"William Morris was another prominent poet who did his best to promote the native element in the language. However, his artificial coinages like, for example, faith-heat (enthusiasm), fore-ween (anticipate), sundersome (divisible), and word-strain (accent) failed to uproot their Romance derived equivalents."
http://www.1066andallthat.com/english_contemporary/diglossia_10.asp
richardelguru said,
August 18, 2014 @ 6:30 am
I put in a random example of one of my essays (36%, borderline!) and Latin-O-Meter didn't recognize "cervidae" or "rangifer tarandus". Not sure what to make of that.
Mary Margolies DeForest said,
August 18, 2014 @ 9:27 am
The Latinometer does not measure the Latinity of Latin words, only of English words. My Latin dictionary does not have rangifer or tarandus.
Congratulations on scoring 36% (Mr. Darcy's density)!
Latinomater
Elonkareon said,
August 18, 2014 @ 3:30 pm
As long as you don't attach a recommendation to this count, I don't see a problem with it. It's an interesting statistic to see. But if someone is looking to alter the tone of their writing, etymology is not the first (nor second, but perhaps third) place to go looking.
J. W. Brewer said,
August 18, 2014 @ 4:04 pm
There is no doubt *some* positive correlation between a word being Latinate and a word being high-falutin', and as is often the case it is much easier to measure the crude and imperfect proxy than the thing one ought to actually be interested in. Using more (or for that matter less) high-falutin' lexical items than would be typical in the relevant speech community for the relevant sort of discourse is plausibly a sociolinguistic signal of *something*, but exactly what that something might turn out to be in a particular instance may be difficult to predict (at least if one assumes that, e.g., pretension and dishonesty are two different things, rather than merely different facets of the same Ur-flaw). But even beyond that, what the baseline is (Latinometrically), and how far away from it one needs to be before the variation should be considered a sociolinguistic signal rather than noise, are going to vary considerably based on contextual factors.
As has been pointed out above, trying to reduce one's percentage of Latinate vocabulary too far can itself seem an affectation and/or produce odd-sounding results (thus itself sending some sort of sociolinguistic signal). I am reminded of the new study saying something like "yes, too much salt is in fact associated with various medical risks, but an excessively low-salt (and certainly a no-salt) diet can also cause medical problems, so don't get carried away."
Bathrobe said,
August 19, 2014 @ 9:36 pm
In my youth I was something of a Saxonist, for a short while, at least. The problem with being a Saxonist is that using native Anglo-Saxon words is no guarantee of being direct and down to earth. There are many cases where the Anglo-Saxon term is highflown or poetic and the Latin or French term is in common use and thus down to earth. People who make up examples to prove that Saxonist prose is more direct or honest or homely (or whatever) are picking their examples to fit their case. If adherents wrote to thoroughly Saxonist principles, the actual effect would be quite different from the one desired. Consistently substituting 'kingly' for 'royal' or 'regal', for instance, would sound weird, not honest or direct.
The real opposition is that between colloquial English and written English (and there are many other subtle gradations involved). The way to make your language accessible is not to avoid Latinate words; it is to choose familiar words. This is not necessarily the same as choosing Anglo-Saxon words.
Zubon said,
August 20, 2014 @ 7:51 am
I'm still kind of stuck on "predict" as unnecessary and pretentious. What word would he recommend in its place? "Forecast"? "Augur"?
Bathrobe said,
August 20, 2014 @ 9:12 am
How about 'foretell'?
Colin said,
August 20, 2014 @ 9:33 am
Reminds me a bit of the 'U versus non-U' discussion about English sociolects in the mid 20th century. It was often the case that the U word was Anglo-Saxon, but the non-U sociolect used a word of Romance (usually French) origin. For instance 'napkin' was posher than 'serviette', 'graveyard' posher than 'cemetery' and so on.
@Zubon: 'Foresee' would be the most basic I think. 'Augur' is Latin. The 'cast' part of 'forecast' is a borrowing from Norse apparently, so not quite Anglo-Saxon.