Distribution of acronym lengths
Or maybe "initialism lengths"? Wiktionary defines initialism as "a term formed from the initial letters of several words or parts of words, which is itself pronounced letter by letter"; while some (fussy) people argue that the term acronym should be reserved for words like laser (= "Light Amplification by Stimulated Emission of Radiation") or NATO (= "North Atlantic Treaty Organization").
Acronyms/Initialisms are (mostly) words, under any reasonable definition. But this category has the special property that most items have multiple specific and distinct senses, generally known to small groups and/or used in very special circumstances.
For example, American linguists know that LSA stands for "The Linguistic Society of America" — but the LSA didn't act in time to lock up https://lsa.org, which belongs to the "Louisiana Sheriffs' Association". And Acronym Finder gives 123 interpretations for LSA, including the linguists but (curiously) not the sheriffs.
Mark Davies' NOW ("News on the Web") Corpus has 3,680 hits for the string LSA — quickly checking a few of them (literally) at random gives us references to the Liangmai Sports Association's Badminton team; the Law Students Association at McGill; a recipe's abbreviation for a mix of ground linseed, sunflower seeds and almonds; Lifesaving South Africa; the Law Society of Alberta; and so forth. In that corpus, the Linguistic Society of America gets 55 hits, and the Louisiana Sheriffs Association has 6.
Someday it would be fun to run an acronym-finding script over that dataset, or a similar one. But this morning, as a crude approximation to the (non-frequency-weighted) distribution of initialism length, I checked the entry counts for probes of Acronym Finder with random letter-string samples of different lengths, generated by this simple R script.
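The R script itself is linked rather than reproduced here. As a rough Python analogue of the sampling step, assuming each letter is drawn uniformly and independently from A-Z (the function name, the sample size of 10, and the seeds are illustrative assumptions, not details taken from the original script), something like this would generate comparable probes:

```python
import random
import string


def random_letter_strings(length, n=10, seed=None):
    """Return n strings of `length` uppercase letters.

    Letters are drawn uniformly, with replacement, from A-Z.
    This is an assumed sampling scheme, not the post's actual R code.
    """
    rng = random.Random(seed)  # a private generator keeps runs reproducible
    return [
        "".join(rng.choices(string.ascii_uppercase, k=length))
        for _ in range(n)
    ]


if __name__ == "__main__":
    # One sample of 10 probes for each length from 1 to 5.
    for length in range(1, 6):
        print(length, random_letter_strings(length, seed=length))
```

Because the draws are with replacement, the same string can turn up twice in one sample, as V does in the single-letter sample.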
A sample of 10 random single letters yielded a mean of 65.5 hits and a median of 64.5:
G 66
V 65
Y 31
E 77
L 64
W 60
H 64
V 65
X 48
D 115
A two-letter sample yielded a mean of 58.1 and a median of 25.5:
ZZ 13
BO 85
UO 26
ND 82
OY 10
WY 8
MM 248
JR 25
YI 6
SK 78
A three-letter sample has a mean of 47.7 and a median of 41:
KXS 2
WRK 4
DCL 63
KNU 6
NPN 37
IPE 60
PVP 45
CCB 154
BJH 4
MCM 102
A four-letter sample has a mean of 1.4 and a median of 0:
EKCK 0
EPRL 6
BLUE 6
WIXI 0
QLCS 1
DZCZ 0
YJGM 0
BTDW 1
CWJI 0
FVOE 0
(Though the AcronymFinder's "acronym attic" has one unverified entry for EKCK as "Embassy in Kuwait City Kuwait".)
And a five-letter sample has mean and median of 0 — though ARKEM has one "unvalidated" entry in the AcronymFinder's attic, listed as "alarm remote keyless entry module":
RDZCI 0
LPEYZ 0
TUWRX 0
WMHXQ 0
ARKEM 0
VCEGP 0
MZMKH 0
WTFAY 0
RDITH 0
DBRBY 0
If we believed the unreliable probability estimates derived from those mean values (each mean divided by 10), we'd estimate 6.55*26=170 single-letter entries, 5.81*26^2=3928 two-letter entries, 4.77*26^3=83838 three-letter entries, and 0.14*26^4=63977 four-letter entries. These estimates are implausible, but they still confirm my prejudice that three-letter initialisms are the most commonly used.
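For anyone who wants to check the arithmetic, here is a small Python sketch that recomputes the means, medians, and scaled estimates from the hit counts listed above. The `samples` dictionary and the `scaled_estimate` helper are names made up for this illustration, and the formula (mean divided by 10, times 26 to the power of the length) simply mirrors the back-of-envelope calculation in the text rather than any principled estimator:

```python
from statistics import mean, median

# Hit counts copied from the samples listed in the post (10 probes per length).
samples = {
    1: [66, 65, 31, 77, 64, 60, 64, 65, 48, 115],
    2: [13, 85, 26, 82, 10, 8, 248, 25, 6, 78],
    3: [2, 4, 63, 6, 37, 60, 45, 154, 4, 102],
    4: [0, 6, 6, 0, 1, 0, 0, 1, 0, 0],
    5: [0] * 10,
}


def scaled_estimate(hits, length):
    """The post's rough figure: (mean hits / 10) * 26**length."""
    return mean(hits) / 10 * 26 ** length


if __name__ == "__main__":
    for length, hits in samples.items():
        print(
            f"length {length}: mean={mean(hits)}, median={median(hits)}, "
            f"estimate={round(scaled_estimate(hits, length))}"
        )
```

The three-letter row gives the largest estimate, which is the only conclusion the text draws from these numbers.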
For sequence lengths of six and above, traditional initialisms or acronyms are increasingly unlikely, though "backronyms" like DREAM and PATRIOT buck the trend. And social-media and email names sometimes involve initialisms combined with abbreviations, like @FmrRepMTG.
The longest example I've ever seen is MMIWG2SLGBTQQIA+. For an explanation and motivation of all 16 characters in that one, see Lezard Dr, Percy, Noe Prefontaine, Dawn-Marie Cederwall, Corrina Sparrow, Sylvia Maracle, Albert Beck, and Albert McCleod. "2SLGBTQQIA+ Sub-Working Group MMIWG2SLGBTQQIA+ National Action Plan Final report." (2021).
cameron said,
April 12, 2026 @ 11:06 am
the Jargon File's entry for TLA is relevant here:
http://www.catb.org/jargon/html/T/TLA.html
Scott P. said,
April 12, 2026 @ 11:30 am
From Guinness:
The initials of the Syarikat Kerjasama Orang-orang Melayu Kerajaan Hilir Perak Kerana Jimat Cermat Dan Pinjam-meminjam Wang Berhad compose the longest abbreviation: S.K.O.M.K.H.P.K.J.C.D.P.W.B. This is the Malay name for The Cooperative Company of the Lower State of Perak Governments Malay People for Money Savings and Loans Ltd, in Teluk Anson, Perak, West Malaysia (formerly Malaya). The abbreviation for this abbreviation is Skomk.
Y said,
April 12, 2026 @ 1:35 pm
MMIWG2SLGBTQQIA+ is a bit unusual. It's the fusion of two acronyms: MMIWG (Missing and Murdered Indigenous Women and Girls), and 2SLGBTQQIA+ (Two Spirit, Lesbian, Gay, Bisexual, Trans, Queer, Questioning, Intersex, Asexual and others). In the linked article, the two parts appear separated by a space on one or two occasions (pp. 17, 52). The two parts appear by themselves in various places in the document as well.
Garrett Wollman said,
April 12, 2026 @ 2:06 pm
Worth comparing here to the (apparently randomly generated) "brand names" of Amazon drop-shippers of generic junk products you could get on AliExpress for a tenth the cost. I extracted the following list of "brands" just from last Thursday's list of US Consumer Product Safety Commission recall notices:
LRIGYEH, Wolfcode, NBIIUYIGE, Happiness Light, Maitys, VEEKTOMX, SNOOZ, Easymake, Wybotics
J.W. Brewer said,
April 12, 2026 @ 4:33 pm
Re the Skomk (etc.) referenced above, the maximum tolerable length of formal names of organizations may vary by genre, with some impact on the maximum potential length of initialisms. In North America, labor unions sometimes have names that run fairly long compared to those of other sorts of potentially initialism-generating entities, so you have e.g. IASMARTW (the International Association of Sheet Metal, Air, Rail and Transportation Workers) and IABSORIW (the International Association of Bridge, Structural, Ornamental, and Reinforcing Iron Workers). Unfortunately, the twenty-word United Association of Journeymen and Apprentices of the Plumbing, Pipefitting and Sprinkler Fitting Industry of the United States and Canada* is often known by the clipped form United Association and thus the minimalist initials UA.
*Formerly the United Association of Journeymen Plumbers, Gas Fitters, Steam Fitters, and Steam Fitters' Helpers of the United States and Canada.
Mai Kuha said,
April 12, 2026 @ 10:21 pm
Just thinking aloud without evidence: maybe acronyms/initialisms serve not only to reduce length to save time and breath, but to compress more information/concepts into a smaller chunk in order to reify a complex concept so that we can keep just e.g. "laser" (instead of "light", "amplification"…) in short-term memory while saying more things about lasers. If so, MMIWG2SLGBTQQIA+ seems all the more unwieldy, and that likely typical length of 3-4 letters makes all the more sense.
ajay said,
April 13, 2026 @ 3:42 am
"The abbreviation for this abbreviation is Skomk."
Which makes me think about what other abbreviations are commonly abbreviated. "The Y" is, or at least was, the informal way of talking about the YMCA. Any others?
And there are nested acronyms as well – an artillery observer, for example, might carry an LTD, a Laser Target Designator, or rather a Light Amplified by Stimulated Emission of Radiation Target Designator, which would have its own NSN, or NATO Stock Number.
Peter Cyrus said,
April 13, 2026 @ 5:00 am
One problem with using initialisms as abbreviations is that the letter names are different across languages (and even within languages, as in zed vs zee). Acronyms don't have that problem: laser is, AFAIK, always laser. Of course, the word order may also vary, as in NATO vs OTAN. Oddly, US becomes EU (or EEUU), but USA is always USA.
Is there a name for the type of abbreviation formed by extracting and concatenating the stressed or initial syllables of the phrase? Like Komsomol or Stasi, or like the US Navy's CINCLANT?
Philip Taylor said,
April 13, 2026 @ 5:09 am
Thinking back to a recent discussion (here) as to whether certain words are normally pronounced with an /s/ or a /z/, I thought it worth asking what is the predominant pronunciation of "laser" (and "maser", for that matter). For me, as a Briton, it is definitely a /z/ — ˈleɪz ə ǁ -ᵊr — but I wonder how others normally pronounce the word(s).
Bob Ladd said,
April 13, 2026 @ 7:51 am
@Peter Cyrus,
In Romanian, USA comes out as SUA.
Mai Kuha said,
April 13, 2026 @ 8:28 am
An old joke that resonates in the present:
¿Cuál es el país que primero te llama y luego te asusta?
¡EE! ¡UU!
ajay said,
April 13, 2026 @ 8:59 am
"One problem with using initialisms as abbreviations is that the letter names are different across languages (and even within languages, as in zed vs zee)."
Fans of Ealing comedies will remember the Lavender Hill Mob being foiled by exactly this problem, combined with non-rhotic pronunciation:
Stanley Holloway: "But I told you never to use a crate marked with an 'R'!"
French shop assistant: "But zat is not an 'A', monsieur! Zat is an 'R'!"
ajay said,
April 13, 2026 @ 9:04 am
One-letter acronyms seem somewhat pointless but they do exist. "Take the 'L'" means "take the loss". If you're a soldier and you're told to report to the Q, you go to see the quartermaster. "The U" is apparently the University of Miami.
ajay said,
April 13, 2026 @ 9:11 am
"Is there a name for the type of abbreviation formed by extracting and concatenating the stressed or initial syllables of the phrase? Like Komsomol or Stasi, or like the US Navy's CINCLANT?"
That is a "syllabic abbreviation", straightforwardly enough.
For some reason I associate it very strongly with the governments of the mid 20th century – COMINCH, Stasi, Gestapo, BatDiv, DesRon, SovNarKom, Komsomol, Minitrue, MinTech and so on – though it never really went away in Russia (Neftegaz, Gazprom etc) and has recently broken out again in the UK government's various Handmaid's-Tale-styled regulatory authorities such as Ofcom, Ofgas, and Ofwat.
Bob Ladd said,
April 13, 2026 @ 11:34 am
@Ajay: But the German ones are not really syllabic – they almost all use only the onset and nucleus of the syllables involved. This mechanism is fairly productive and has been for decades. Gestapo (GEheimSTAatsPOlizei) is now nearly a century old, Stasi (STAatsSIcherheit) is more recent, but the process underlies lots of newer coinages and doesn't necessarily involve names of organizations. A good example is Kita (KInderTAgesstätte) 'nursery, daycare', which is now just an ordinary noun.
cervantes said,
April 13, 2026 @ 12:15 pm
I am happy to be able to tell you that the American Symphony Orchestra League changed its name to the League of American Orchestras.
John McNaught said,
April 13, 2026 @ 5:53 pm
A case of an apparent acronym that is not one is ISO, which ISO itself notes is not an acronym but a short form for International Organization for Standardization. Another example is UTC for Coordinated Universal Time. Perhaps readers know of others. Such phenomena make acronym to full form automatic mapping somewhat harder.
ajay said,
April 14, 2026 @ 3:59 am
"But the German ones are not really syllabic – they almost all use only the onset and nucleus of the syllables involved. This mechanism is fairly productive and has been for decades. Gestapo (GEheimSTAatsPOlizei) is now nearly a century old, Stasi (STAatsSIcherheit) is more recent"
Yes, there seems to be a rule that you end each syllabic component with a vowel – probably to make it more pronounceable. Gestapo is easier to say than Gestaatspol. Staatspol, Kindtag the same.
Things that look like acronyms but aren't – I believe in some bits of America they're fond of middle initials that don't actually stand for anything. President Harry S Truman's middle name was apparently "S".
cervantes: the incredibly unpleasant Admiral Ernest King made at least one sound decision in his career: after becoming Commander In Chief, US Fleet, he changed his job abbreviation to COMINCH because he thought CINCUS sounded like "sink us" and that wasn't a good sound in late December 1941.
Tom Dawkes said,
April 14, 2026 @ 3:38 pm
Don't forget the Michael Caine film "The Ipcress file", based on Len Deighton's novel "The IPCRESS File": I P C Rtress
Tom Dawkes said,
April 14, 2026 @ 3:42 pm
Induction of Psycho-neuroses by Conditioned Reflex under Stress. I now know NOT to use angle brackets!
dainichi said,
April 14, 2026 @ 7:49 pm
> Gestaatspol. […] Kindtag
Were the system to follow German syllabification rules (at least if those coincide with hyphenation rules), I think that would be 'Gestaatspo' and 'Kinta'.
I think of the tendency to want to put intervocalic consonants in the preceding syllable as a very English phenomenon, presumably caused by analyses trying to make sure vowels that don't appear word-finally don't occur syllable-finally either.
ajay said,
April 15, 2026 @ 2:58 am
Serendipitously, this came across my Bluesky feed this morning: an entire thread of posts written in words of two letters. It starts:
"ok so my ex yc vp is vv go go on ai rn bc he is in an sv vc gc or we — my em is in on it w/ ai as an os to do ui qa in ci — so tl dr ig im tl of ai ui qa ?? rn ai ui qa v1 is cc in an hv vm on my pc on gh pr xd"
https://bsky.app/profile/roooooland.bsky.social/post/3mjhsbf5fck2e
The translation starts: OK, so my former venture capital vice president is very, very go-go on artificial intelligence right now because he is in a Silicon Valley venture capital group chat…