Where is *gaggig?

My preliminary experiments with dictionary searching suggest that English has absolutely no words with roots of forms like *bobbib, *papoop, *tettit, *doded, *keckick, *gaggig, *mimmom, *naneen, *faffiff, *sussis, etc. These are simple CVCVC shapes that do not seem to contain any un-English sequence. They aren't hard to say. In fact there is an example of a verb with the shape dVd that has a regular preterite tense: deed has the preterite deeded (as in The farmer deeded back his farm to the bank [WSJ w7_016]). But the pronounceability of deeded only makes the puzzle more acute: why are there no roots with the phonological form CiVCiVCi (where Ci is some specific consonant sound and the V positions are filled with vowel sounds)? Or is the generalization perhaps wrong? Have I missed some words of the shape in question?



87 Comments

  1. Faldone said,

    March 3, 2010 @ 12:04 pm

    No doodads?

  2. Sean Edison-Albright said,

    March 3, 2010 @ 12:05 pm

    This isn't a root, but susses fits the mold.

  3. Bruce Rusk said,

    March 3, 2010 @ 12:11 pm

    doodad? ninon?

  4. Stephen Chrisomalis said,

    March 3, 2010 @ 12:21 pm

    Baobab fits marginally – it's trisyllabic so may not fit the pattern you're looking at. Popup isn't a root word but works well otherwise.

  5. Rachael said,

    March 3, 2010 @ 12:21 pm

    Sean: not sure it does, because the final s is voiced. I thought of "sissies" at first, but had to reject it for the same reason.

  6. Sean Edison-Albright said,

    March 3, 2010 @ 12:21 pm

    Would baobab count? And rarer is another non-root example.

  7. JLR said,

    March 3, 2010 @ 12:22 pm

    Someone will probably correct me, but I don't think that Hebrew, where most of the roots are CVCVC, has any roots where all three consonants are the same. Not sure how germane that is.

  8. Ryan Denzer-King said,

    March 3, 2010 @ 12:25 pm

    Long-distance OCP effects? Perhaps it's similar to the Arabic prohibition on roots with identical consonants.

  9. Andrew Clegg said,

    March 3, 2010 @ 12:27 pm

    Hoohah?

  10. Sean Edison-Albright said,

    March 3, 2010 @ 12:27 pm

    Good catch, Rachael. And Stephen's popup inspired Mommom and Poppop.

  11. Andrew Clegg said,

    March 3, 2010 @ 12:29 pm

    Actually scratch that, the final h in hoohah is obviously silent. That's what you get for thinking about words visually.

  12. Sky Onosson said,

    March 3, 2010 @ 12:30 pm

    I thought of "doodad" right away, but that's about it. I think the "close calls" are interesting too, such as "bebop".

  13. Rachael said,

    March 3, 2010 @ 12:32 pm

    We came close to having a root word that fits the pattern: IIRC, when Dawkins coined the word "meme" at the end of "The Selfish Gene", he considered making it "mimeme", but decided "meme" was better.

  14. L. said,

    March 3, 2010 @ 12:34 pm

    not roots but "rarer" and "roarer"

  15. L. said,

    March 3, 2010 @ 12:37 pm

    A quick grep down /usr/dict/words gives
    baobab
    dadoed
    dauded
    deeded
    deedeed
    diiodid
    doodad
    duded
    lilial
    ninon
    nonion
    rarer
    rearer
    roarer
    seises
    sises
    souses
    tautit
    weewaw
    weewow

    [Yes. But you see what I mean: they are either clearly foreign, or not phonologically CVCVC, or inflected word forms of roots of CVC form, or (with the solitary exception of the placeholder word doodad "gadget or object of unknown name") words that nobody ever heard of in their life. Where are the ordinary words? —GKP]
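
    A search like the one in this comment can be approximated with a regular expression. The sketch below is only an illustration: the actual grep command was not given, the function name and the sample list are made up here, and the pattern works on spelling rather than sound, so it shares the weakness noted above (it accepts hoohah and baobab, which are not phonologically CiVCiVCi).

```python
import re

# Spelling-based approximation: one consonant letter, a run of vowel letters,
# the same consonant letter, another vowel run, then the same consonant again.
# Treating w and y as consonant letters is an assumption of this sketch.
PATTERN = re.compile(r"([b-df-hj-np-tv-z])[aeiou]+\1[aeiou]+\1")

def same_consonant_cvcvc(words):
    """Return the words whose entire spelling fits the C-V-C-V-C letter shape."""
    return [w for w in words if PATTERN.fullmatch(w.lower())]

# Hypothetical sample drawn from words mentioned in the thread.
sample = ["baobab", "doodad", "deeded", "rarer", "susses",
          "bebop", "hoohah", "teetotal", "ninon"]
print(same_consonant_cvcvc(sample))
# ['baobab', 'doodad', 'deeded', 'rarer', 'hoohah', 'ninon']
```

    Requiring a full match excludes longer words such as teetotal that merely contain the sequence, and the doubled s in susses keeps it out as well. Filtering out inflected forms and loanwords would still have to be done by hand or with a pronunciation dictionary.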

  16. Jarek Weckwerth said,

    March 3, 2010 @ 12:47 pm

    A quick look through a good pronunciation dictionary doesn't reveal any other examples (in addition to doodad) apart from proper names: Noonan, and, for non-rhotic accents, Kirkuk (which, of course, isn't even English…).

  17. D Sky Onosson said,

    March 3, 2010 @ 1:01 pm

    How about "teetotal" (longer than you're looking for, but contains the right shape).

  18. marie-lucie said,

    March 3, 2010 @ 1:04 pm

    English, like the other Germanic languages, has preserved the typical PIE root CVC (where the first C is sometimes a cluster). Roots with identical consonants, as in kick or bob, are rare and often have affective connotations. Practically all the examples given by the commenters are either borrowings, recent humorous creations, or inflected forms of monosyllabic words. Apart from obvious borrowings, a CVCVC form in English is not historically a "root" but comes from a CVC root with a prefix or suffix. Further repetition of the root consonants would be a form of reduplication, something which in English applies only to full roots (CVC or longer, as in zigzag or pitter patter; doodad should also be included here, because the two parts are full syllables, each with its own stress). The pattern CiVCiVCi is just not part of English word-formation.

    Among words with affective connotations there are a number of roots with identical consonants which are only found in derivations with an old verbal suffix, as in diddle, giggle or titter, and more recent forms such as doodle, but never with extra duplication of a root consonant.

  19. John said,

    March 3, 2010 @ 1:12 pm

    This strikes me as an interesting kind of question, and I'm wondering how much of this is due to the nature of word formation in English: how many words we've inherited, whence, and so on.

    Does Latin or Greek show this pattern, for example?

    And how many English roots (not quite sure what that means here, to be honest) are CVCVC where the choices are completely free?

  20. Michael Sappir said,

    March 3, 2010 @ 1:28 pm

    JLR, off the top of my head I can think of only one, infrequent, Modern Hebrew root where the radicals are all the same consonant (roots of the Classical Semitic kind [as opposed to some modern borrowings] never have any vowels; these are supplied by the patterns, "Binyanim".) However, this one root, /d.d/, seems to always be realized as just two instances of the consonant, e.g. infinitive "le-dado-t" (meaning to hop, skip, bobble), 3sg.m.pst "dide" etc. My knowledge of the language is almost entirely intuitive (I learned it when I was 3, and haven't yet read much of the literature on it) but I know the roots where the consonants' pattern is ABB, i.e. the first is different from the other two, are usually analyzed as actually being biradicals with the second radical spreading to a third position. This is particularly clear in cases like /ts.d/, which can be realized as the noun "tsad", meaning 'side', or as the verb "tsided ('im)", 'side (with)' in 3sg.m.pst. But these cases are probably the result of transitive patterns, not the composition of roots.

    So anyway, the point is that you're apparently right, Modern Hebrew basically lacks CiVCiVCi forms.

    p.s. it now occurs to me the "dide" verb may actually have the root /d.d.h/, at least in Classical Hebrew. But the /h/ is never realized as a consonant in modern speech.

  21. Richard Sabey said,

    March 3, 2010 @ 2:29 pm

    @marie-lucie: "Apart from obvious borrowings, a CVCVC form in English is not historically a "root" but comes from a CVC root with an prefix or suffix."

    Yes, "rarer" and many other English words are so, but many other English words are derived from compounds in other languages, so the rarity of certain patterns is a consequence of the nature of roots in those other languages, not English.

    All CVCVC forms in English, though, apart from obvious borrowings? How about, for example, "seven", "level", "human", "metal", "pilot", "canal"…?

  22. Maneki Nekko said,

    March 3, 2010 @ 2:46 pm

    It's a borrowing from Chamorro, but kakkak is "a small bittern (Ixobrychus sinensis) of Guam," according to Merriam-Webster's Unabridged.

    [Nice one. But it only underlines my point: where are the ordinary English words? An unassimilated (and excessively rare) loanword from Chamorro (the indigenous language of Saipan and Guam) is not exactly a knock-down case for this shape being well attested in the English lexicon, is it? —GKP]

  23. Aaron Toivo said,

    March 3, 2010 @ 2:53 pm

    marie-lucie is right that C1VC1 roots tend to have affective connotations, but not that they're all that rare. In addition to those already mentioned, we have bib, gig, gag, loll, mom, dad, tit, peep, pip, pop, roar, rear, and nun, among others. Of course many of these were borrowings or ex-nihilo creations, but I'm not sure why that should matter, as native roots maintained all the way from PIE are a small minority anyway.

  24. Karen said,

    March 3, 2010 @ 2:54 pm

    @Richard Sabey: "All CVCVC forms in English, though, apart from obvious borrowings? How about, for example, "seven", "level", "human", "metal", "pilot", "canal"…?"

    Except for "seven", those are all borrowed.

  25. JS Bangs said,

    March 3, 2010 @ 3:24 pm

    And "seven" was probably borrowed at the PIE level :).

  26. Yonatan Zunger said,

    March 3, 2010 @ 3:34 pm

    JLR, I can't think of any AAA roots in Hebrew either, but there are some interesting derived forms which are similar, e.g. "tetateh" (tṭ'ṭ') in Modern Hebrew. This doesn't quite fit the rule because the first 't' sound is aspirated while the other two aren't, but in standard Israeli MH pronunciation the difference vanishes.

    This does remind me of the orthographic rule in Hebrew about *ḵḵ always turning into ḵḥ. In fact, something I'm noticing from all of the examples in this thread is that apart from proper names, they seem to be either playful slang (e.g. doodad), contain vowel sequences which inject their own syllable boundaries (baobab), or end up pronounced in ways which give different sound values to the various repeated consonants. (roarer — in every accent I can think of the first 'r' is voiced differently than the others)

    I think that this goes beyond roots, as well. All of the non-root examples we've seen here, like "popup" and "fife-off", are compounds, and fairly open ones at that; I still see "pop-up" hyphenated as often as not. It seems that there may be a fairly strong bias against this sound pattern in English as a whole, with exceptions always sounding unusual and possibly a bit whimsical. (Which seems to be part of the point of the word "doodad")

    And this may be true in other languages, as well; the Hebrew example I gave above is actually very rarely used. It's the masc. 2nd-person singular future tense of "sweep," and MH normally uses this tense to indicate imperatives. The actual imperative mood is generally considered unusually brusque and slightly rude. This word is one of the few exceptions — in normal speech, people will actually use the imperative ṭ'ṭ' for this word. However, this might just be a shortening of the three-consonant version because that version is nearly unpronounceable in rapid speech. (And is in fact a favorite in tongue twisters)

  27. Geoffrey K. Pullum said,

    March 3, 2010 @ 3:50 pm

    It's no good citing words like "rarer": that's an inflected form, the comparative of the root rare. It is true that doodad has the right form, but unfortunately it isn't really a dictionary-recognized root, and it has no real meaning except "thingummy": not exactly a good example of a solid, simple, recognized, Standard English word. And several of you have just confused yourselves by looking at the spelling (I used ordinary spellings, but I made it clear that I was concerned with phonological shape).

    No, you have largely convinced me that I am right: no triconsonantal roots in English where the three consonants are identical. If Hebrew has the same limitation, that's really fascinating. It was just a thought that came to me as I sat in a boring meeting this afternoon, in a room that luckily had wireless so I could log in and talk to you!

  28. Ernie in Berkeley said,

    March 3, 2010 @ 4:00 pm

    Noted Mad magazine lexicographer Don Martin has documented many examples over the years

    http://madcoversite.com/dmd-alphabetical.html

    [Forgive me, but this is a very lazy comment. You didn't in fact cite a single actual case of a word of the right shape in the huge compendium of onomatopoeic comic-strip words, ranging from AAAAGH! EEEEEOOOW to ZZZZZZZZZZZ. I'm not actually sure there is one in there. —GKP]

  29. BrianM said,

    March 3, 2010 @ 4:36 pm

    amomum and latitat

  30. Richard Sabey said,

    March 3, 2010 @ 5:21 pm

    @Karen If even words derived (directly or indirectly) from or through Greek, Latin or French are to be excluded from consideration because they're borrowed, then you don't leave much; as Aaron Toivo said, "native roots maintained all the way from PIE are a small minority anyway".

  31. Mark F said,

    March 3, 2010 @ 5:21 pm

    What do you mean "it isn't a real dictionary-recognized root"? Merriam-Webster recognizes it here. And saying that it has "no real meaning except for thingummy" strikes me as special pleading. Maybe you can make a case that people will accept funny-sounding words in this semantic space — perhaps there's a reason GUI toolkits provide widget sets and not doodad sets. But I still think doodad is a legitimate counterexample.

    [If you're happy with that as the answer, fine. Be my guest. By calling it a counterexample, you're tacitly adopting the position that there is no phonotactic restriction at all; and when anyone asks you for a list of all the words of CiVCiVCi shape to demonstrate this, your answer is going to be doodad. (And perhaps also a smattering of other words in larger dictionaries, none of which I have ever heard of in my life.) Forgive me if I'm a little less than satisfied. I probably know 40,000 words. How come the only CiVCiVCi word I have ever heard is a whimsical word of which dictionaries say it is "used in a vague way to refer to something whose name one cannot recall", and there are essentially no others whatsoever? Just seems fishy to me. (Despite the word teetotal, which does contain a CiVCiVCi sequence, though not a CiVCiVCi root.) —GKP]

  32. Jen said,

    March 3, 2010 @ 5:36 pm

    English simply doesn't like having the same consonant repeat throughout the word. Makes the word harder to understand and increases the chance for speech errors. With more phonological diversity comes greater intelligibility. Linguistics 101, people!

    [And the evidence for this would be? —GKP]

  33. Paul said,

    March 3, 2010 @ 5:38 pm

    My best guess is "mimeme". Unfortunately I can't find its pronunciation in a dictionary, so I'm not sure if it fits the CVCVC pattern.

  34. Ran Ari-Gur said,

    March 3, 2010 @ 5:48 pm

    @Michael Sappir: The h in d-d-h isn't realized in Classical Hebrew, either; it's just written to indicate the vowel. (Classical Hebrew does have some roots that end with actual pronounced h's — g-b-h as in "gavoah", n-g-h as in "nogah", and so on — but not all that many. If d-d-h were one of them, then the verb would be /le.da'de.a/ ~ /di'de.a/ in Modern Hebrew, rather than /le.da'dot/ ~ /di'de/.) But there are actually a lot of Hebrew "roots", in the sense that Dr. Pullum means, of the form CiVCi: zaz (z-w-z), sus, gag, dod, dud, shesh, vav — just like in English. And there are plenty examples of the forms CiVCjVCj and CiVCjCiVCj; so if there aren't any of the form CiVCiVCi, then that's pretty interesting.

  35. Karen said,

    March 3, 2010 @ 5:54 pm

    @Richard Sabey: I've got no dog in this fight – all I was doing was pointing out that if you were objecting to Marie-Lucie's analysis by providing some words that weren't borrowed, you weren't actually doing that. Your and her definition of "obviously borrowed" may well differ; I'm not competent to speak to that!

  36. Yonatan Zunger said,

    March 3, 2010 @ 7:33 pm

    @Geoff: I think that there may be something interesting going on even with inflected forms. My off-the-cuff suspicion is that whenever a form CiVCiVCi appears in English (and maybe elsewhere), either

    (1) Pronunciation will naturally shift so that not all three of the Ci's will be pronounced identically (e.g. 'rarer'; very vividly so in non-rhotic accents, but I can hear a difference even in my own Western US, rhotic speech, with the initial 'r' being a far more pronounced sound), or

    (2) The word will sound rather odd to native speakers, and will be "usually considered whimsical." (e.g., "doodad;" if I had some more books to hand I would check to see if there's any evidence that this whimsicality was the deliberate purpose of the coinage, but I certainly wouldn't be surprised if it were)

    "Popup" is an interesting case to test this with, since it's a compound which is turning into a single word. I'm seeing a bit of the "rarer" effect in my own pronunciation of it, but it's hard to pronounce two "p's" very differently so the effect is smaller. OTOH, it's easier to pronounce multiple "p's" in a row than multiple "r's," so if this effect is entirely because of pronunciation difficulty then it may be consonant-dependent.

  37. Mike Hammond said,

    March 3, 2010 @ 7:46 pm

    Geoff

    This looks a lot like the Stuart Davis *skVk, *spVp effect. Berkley argued that this was statistically expected for English. Coetzee then showed the effect shows up in judgment tasks.

  38. Greg Morrow said,

    March 3, 2010 @ 8:05 pm

    Jen strikes me as having the right idea — repeated consonants make speech errors more damaging by removing information that could be used to repair the error.

    In addition, there's the general decay an unstressed vowel undergoes. As it degrades, the two like consonants bordering the vowel would have the urge to merge. I'd think that DODdid would reduce to DOD in fairly short order.

    But consonants that can serve as syllable nuclei would be more resistant — LAL'l, for example, with a final dark l like "candle", would seem to be relatively stable. Or nannen. So their absence is a little more mysterious than CVCVC with apical consonants or stops.

  39. Nathan Myers said,

    March 3, 2010 @ 8:48 pm

    The uniqueness of "doodad" seems equally as interesting as (indeed, identical to) the absence of any other examples.

    On its rootliness, I have certainly used "doodads" in speech in the past. What more confirmation does it need? "Doodadless"? "anti-doodad"? "doodad-like"? "doodadliness"? Sorry.

  40. arthur said,

    March 3, 2010 @ 9:29 pm

    tut-tut!

  41. Peter said,

    March 3, 2010 @ 11:46 pm

    I know it's a compound, but tête-à-tête springs to mind as a quad-consonantal example

  42. Mark F. said,

    March 4, 2010 @ 1:33 am

    How many CVCVC roots are there in English? Assuming a random distribution of consonants, what's the expected number of CiVCiVCi roots?

    What languages would you expect to have more of these? Does Hawaiian have very many?

  43. Bob Ladd said,

    March 4, 2010 @ 2:30 am

    I think several commenters are right that this is an OCP effect, and that Jen and others are right that there are functional explanations for this kind of prohibition. An early example of such an effect is Grassmann's Law. The other cases mentioned in the thread also fit. Semitic roots have been explicitly discussed in terms of a restriction on sequences of identical consonants by Stefan Frisch and others in one of the Laboratory Phonology books (can't find it right now). The English reluctance to write (and say) e.g. Socrates's or (even worse) Jesus's is probably also related. The English prohibition on *spVp, etc., mentioned by Mike Hammond is similar – and, as he says, it's a genuine constraint, not an accidental gap. I agree with several commenters that doodad verges on being an exception, though Marie-Lucie is right that it's relatively unusual for a root in having two full vowels.

  44. Aaron Toivo said,

    March 4, 2010 @ 2:52 am

    1/576 of all CVCVC roots should be CiVCiVCi, assuming 24 consonants and random distribution (24 × (1/24)³ = 1/24² = 1/576). But the distribution isn't uniform, so the expected proportion should be higher than that.
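
    The arithmetic behind this estimate can be checked with a short sketch. It assumes that the three consonant slots are filled independently from one shared frequency distribution, which is a simplification. The function name and the skewed weights are hypothetical and not measured English frequencies. With 24 equally likely consonants the chance that all three match is 24 × (1/24)³ = 1/576, and any unequal distribution gives a larger value.

```python
from fractions import Fraction

def all_same_probability(weights):
    """Chance that three independent draws from the weighted inventory coincide."""
    total = sum(weights)
    return sum(Fraction(w, total) ** 3 for w in weights)

# 24 equally likely consonants.
uniform = all_same_probability([1] * 24)
print(uniform)  # 1/576

# Hypothetical skew: 8 consonants three times as frequent as the other 16.
skewed = all_same_probability([3] * 8 + [1] * 16)
print(skewed, skewed > uniform)  # 29/8000 True
```

    Multiplying either figure by the number of attested CVCVC roots would give the expected count of CiVCiVCi roots under this chance model, which is the baseline that an observed count of roughly zero or one would be compared against.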

  45. Nicholas said,

    March 4, 2010 @ 2:54 am

    @arthur:

    "Tut-tut" is an orthographic representation of a dental click, I believe. It's also represented as "tsk-tsk". Both of these have given rise to spelling pronunciations, but calling either of them a root is more than a bit of a stretch.

  46. Brian said,

    March 4, 2010 @ 3:11 am

    Well, there's "nonane". But if "doodad" got rejected as too esoteric …

    [Not esoteric. But whimsical, what they call a placeholder word, like whatsit or whaddyacallit. Not exactly a normal lexical noun. —GKP]

  47. Tony said,

    March 4, 2010 @ 7:22 am

    Hungarian has "kakukk" 'cuckoo'. And its plural is "kakukkok" – five k's in eight letters.

  48. Lameen said,

    March 4, 2010 @ 9:09 am

    Apart from the familiar Semitic examples noted by Greenberg, I first heard about similar phenomena from Pozdniakov and Segerer, later published as Similar Place Avoidance: A Statistical Universal, Linguistic Typology, 12, 2, 2007 (http://pozdniakov.free.fr/PozdniakovSegerer2.pdf). Summary: "in each of the dozens of [African] Atlantic languages covered by our calculation, labials avoided combinations with labials, dentals with dentals, palatals with palatals, and velars with velars in CVC combinations…" and the same turned out to be true in every language family they checked, including Eurasian ones.

  49. Jorge said,

    March 4, 2010 @ 9:28 am

    cha-cha-cha?

  50. Brett R said,

    March 4, 2010 @ 9:29 am

    I'm curious to know what instigated this search.

  51. Ian MacKay said,

    March 4, 2010 @ 9:36 am

    In his book The Origin of Speech, Peter MacNeilage makes the point that far from there being a continuity from babbling to adult speech, there is in fact a strong disjunction between the two. Much of babbling consists of CV syllables repeated, e.g. babababa, but adult language actively avoids such sequences. He presents data to suggest that, as in this posting, there is underrepresentation of such sequences (I do not recall if this data is based only on English or not). When such sequences include minor variations (different vowels or clusters with different second elements), the motor planning function abruptly breaks down. This can be seen in tongue twisters such as “That bloke’s back brake-block broke.” There are obvious differences between this sequence and the original posting’s quest for a monomorphemic triliteral root with all three consonants the same. However, the more complex version (the tongue twister) strongly suggests that the limits of the motor planning function can quickly be reached if a series of syllables with similar onsets are concatenated. Thus the reason for the rarity of this sort of form may not be found in abstract phonology, but rather in the limits of the mental firmware that produces speech.

  52. John Atkinson said,

    March 4, 2010 @ 9:38 am

    @ Mark F.: What languages would you expect to have more of these? Does Hawaiian have very many?

    Hawaiian has zilch, of course, since no syllable in Hawaiian can end in a consonant.

    How about CVCVCV then? Very few, it seems:

    ninini, 'to pour'
    pa:papa, 'flat; beans, peas'
    pipipi 'small mollusc'
    popopo, 'rotten'
    pupupu, 'numerous, crowded'

    FWIW, all these have the same vowel repeated. An exception is:

    kikako, 'Chicago'

    — a borrowing, of course.

    CVCVCVCV is pretty common — but these tend to be reduplications of roots of form CVCV

  53. John Atkinson said,

    March 4, 2010 @ 9:49 am

    Thinking about it a bit more, the Hawaiian words of form CVCVCV seem to be reduplications too, though with only one syllable reduplicated — e.g., ninini apparently comes from nini 'ointment' etc.

  54. Acilius said,

    March 4, 2010 @ 9:52 am

    @Andrew Clegg: I'm not convinced the final H in "hoohah" is silent. I think the word may fit the pattern. Of course, "doodad" and "hoohah" aren't really counterexamples. They aren't supposed to sound like other words, but to be obviously silly little sounds, sort of anti-words. So a lexicon that begins and ends with the two of them simply reinforces the point that English avoids this phonological structure.

  55. Army1987 said,

    March 4, 2010 @ 10:33 am

    The Wikipedia article on English phonology points out that no /sCVC/ sequences (where the two C's are the same consonant, other than /t/) are found, and describes this as a phonotactic restriction.

  56. John Walden said,

    March 4, 2010 @ 10:39 am

    Who else has been thinking "chachech, chechach, chochich" and so on to themselves all day, and not getting on with other things?

    Shame there isn't "judgage". I thought I'd got one for a moment. I did come up with "saucisse" but it has the distinct drawback of being French.

  57. Army1987 said,

    March 4, 2010 @ 11:19 am

    Actually, right now I can't think of any single morpheme containing such a sequence as CVCVC (with three identical C's) in any language.

  58. Rolig said,

    March 4, 2010 @ 11:27 am

    Non-rhotically, "surcease" should get a consolation prize. One could even imagine a possessive form, "surcease's balm", for CiVCiVCiVCi.

    [Rolig is right. In a non-rhotic dialect like Southern British, or if the vowel of sur is treated as a unitary rhotacized vowel phoneme, surcease really does have the form sVsVs. A rather rare word, but nothing dubious or foreign about it, and it does fit the pattern. So there is at least one otherwise ordinary word of the specified form. But boy, are they rare! —GKP]

  59. Faldone said,

    March 4, 2010 @ 11:29 am

    Acilius: Of course, "doodad" and "hoohah" aren't really counterexamples. They aren't supposed to sound like other words, but to be obviously silly little sounds, sort of anti-words. So a lexicon that begins and ends with the two of them simply reinforces the point that English avoids this phonological structure.

    I would argue to the contrary that the fact that we can coin such words means that the formation is not forbidden even if there are no "legitimate" examples. Nobody coins words like smroog or nglarch.

  60. Ellen K. said,

    March 4, 2010 @ 11:38 am

    Doodad strikes me as more of a compound, just one that's made of two meaningless (in the context) roots.

  61. Lisa Davidson said,

    March 4, 2010 @ 12:12 pm

    Not that it contributes much to this conversation, but 'teetotal' isn't even a partial counterexample at least in American English since the last 't' is pronounced as a flap.

  62. speedwell said,

    March 4, 2010 @ 12:15 pm

    "Dated," the way I speak it and hear it spoken, has the T pronounced the same way as the two Ds.

  63. David said,

    March 4, 2010 @ 12:18 pm

    Swedish would seem to fail this test too, which is not surprising because it's a Germanic language (as Marie-Lucie said). It also fails the *spVp test, but not the *skVk test, as the words "skak" ("shake", as a noun) and "skock" ("gaggle", "herd", "group") attest. (There are other words spelled skVck like "skick" and "skäck", but before these front vowels [sk] turns into the infamous voiceless palatal-velar fricative [ɧ].)

    Though looking at the title of the post, "gaggig" is actually a word in Swedish, used as a term for old persons whose intellectual capacities aren't what they used to be. But this word is (1) a derivation (certainly the -ig ending is an adjective ending, like English -y), (2) used somewhat endearingly, and (3) is often pronounced without the final -g in fluent speech, like other adjectives. So it's not a good example.

    Somewhat like "popup" is "tittut" ("peekaboo", literally "look-out").

    So the conclusion is, not surprisingly, that Swedish doesn't display these kinds of roots either, at least not as far as I can see.

  64. Mark Liberman said,

    March 4, 2010 @ 1:25 pm

    A relevant paper on related topics is Stefan Frisch et al., "Similarity Avoidance and the OCP", NLLT 2004.

  65. Jerry Friedman said,

    March 4, 2010 @ 1:40 pm

    Speaking of surcease, success is a second interesting miss, less close than surcease.

  66. marie-lucie said,

    March 4, 2010 @ 2:17 pm

    My point was not that potential CVCVC "roots" with identical consonants are unpronounceable (and that the lack of them is due to phonotactic constraints), since all of GKP's examples were perfectly pronounceable in English (as names of new products, for instance), but that they are not part of a normal morphological pattern of English word-formation.

    As for the objection that English does not have too many roots going back to PIE, my reasoning is not about individual roots, but about a general, enduring pattern of word-formation. English does have (in common with other Germanic languages) other still-current pattern features going back to PIE, such as root ablaut in the inflection of "strong" verbs (a minority of verbs, but by no means exceptional or wholly archaic) and in some derivations (eg long/length, broad/breadth, strong/strength, wide/width and a few others, including occasional humorous nonce-forms). These features all apply to monosyllabic roots. That some CVCVC forms with identical consonants are created by inflectional suffixation (as in deeded or rarer) is irrelevant since it is just a coincidence in the phonological shape of the base word and the suffix (and not a form of reduplication). In any case, GKP was hypothesizing unanalyzable CVCVC "roots" with identical consonants, which do not occur as normal English words.

    ("Teetotal" also fails the test: it is obviously from "T-total", as in "D-day", plus the fact that "total" is a borrowing. And the name "Noonan" which has been mentioned before is Celtic, not Germanic.)

  67. Acilius said,

    March 4, 2010 @ 3:20 pm

    @Faldone: Clearly CVCVC is a legal phonological structure in English. The example "deeded" was sufficient to prove this point. However, if that structure appears only in two uninflected English forms, and both of those forms are words that gained currency because they sound funny, it would seem likely that English has, for whatever reason, a bias against forming uninflected words with that shape.

  68. Alex Fink said,

    March 4, 2010 @ 3:46 pm

    If CiVCi and CVC with homorganic Cs and so forth are cross-linguistically dispreferred root shapes, what happens to them? Roots must come to take such forms by sound change now and then. And I can't think of any regular sound change whose effect is place of articulation dissimilation in a CVC context, or with conditions blocking it from producing homorganic Cs in this context. (For phonation, we have e.g. Grassmann's law.)

    So, do such words tend to be subjected to irregular dissimilation? Do they tend to simply get lost? Do typical languages just have sufficiently many affective and other such new-coined words that all we're seeing is the strength of the *CiVCi dispreference on such words?

  69. Lou Hevly said,

    March 4, 2010 @ 4:18 pm

    As far as other languages go, the only common word fitting this pattern I could find in Catalan is cacic, and here the second 'c' is pronounced 's'.

  70. Acilius said,

    March 4, 2010 @ 4:45 pm

    @Marie-Lucie: You say it is obvious that "teetotal" originated as an alternate spelling of an expression "T-total" where the introductory "T" stands in for an emphatic repetition of "total" as in "D-Day." Yet over the years I'd heard many English speakers claim that "teetotal" came from an expression "tea-total," and I believed them until I read your comment and looked the word up in the dictionary to get the full story. If the word's actual origin were in any sense obvious, I doubt that the folk etymology would have circulated so widely or that a highly educated native speaker of English with a professional interest in etymology would have accepted it for so long.

    At any rate, thank you for sending me to the dictionaries where I found the interesting little story of Richard Turner and his temperance lectures. There's no telling where these comment threads will lead a person…

  71. Jason L. said,

    March 4, 2010 @ 4:59 pm

    These aren't "native" or roots, of course, but even better than "nonane" is "nonenone". If you want to get ridiculous, you could specify the isomerism as "n-nonenone" or "neononenone".

  72. marie-lucie said,

    March 4, 2010 @ 6:35 pm

    Acilius: I'd heard many English speakers claim that "teetotal" came from an expression "tea-total,"

    So had I, but it did not sound right to me. If it came from "tea-total", why not spell it that way, at least in some cases? Were the teetotallers actively pushing tea as a beverage, rather than campaigning for total prohibition of alcohol? No, they consumed several kinds of non-alcoholic beverages, without a preference for tea. Folk etymology always tries to relate at least part of an unusual word to similar words that speakers already know.

    (After looking up Wikipedia under "teetotaler")
    The section on "etymology" gives several explanations but basically agrees that "tee" here means "T".

  73. Kapitano said,

    March 4, 2010 @ 6:39 pm

    Sussurus?

    [Sorry, no sale. This has the form CiVCiVCjVCi, where i and j are distinct. —GKP]

  74. Stephen Jones said,

    March 4, 2010 @ 8:10 pm

    As far as other languages go, the only common word fitting this pattern I could find in Catalan is cacic, and here the second 'c' is pronounced 's'.

    From the Spanish 'cacique' imported from the Taino language used in the Bahamas and Antilles.

  75. Tom said,

    March 4, 2010 @ 11:18 pm

    The earlier comment about babbling has me thinking more about language development/toddler speech here. Certainly, duplicating consonant sounds (or at least features) is a feature of my toddler's speech: noodle->doodle, drink->nink, etc. That said, while I can think of lots of words in toddlerese that follow a CVC pattern and others that follow a CVCVCV pattern, I can't come up with any examples at the moment of CVCVC even there.

    I'm not sure how this would fit with the prohibition. On the one hand, it would seem that features or whole consonants migrating from final consonant to early consonant is a feature of toddler speech and might be a pattern that could happen in adult speech/word change as well. On the other, perhaps the frequency of these duplicative sequences in children's speech explains why they show up so frequently in words with strong affective connotations. All this, of course, seems to apply mostly to CVC, though, and not the CVCVC we're supposed to be searching for.

  76. Randy Alexander said,

    March 4, 2010 @ 11:59 pm

    Brett: I'm curious to know what instigated this search.

    This curious search was instigated by the conflux of a laptop with an internet connection, a boring meeting, and Geoff Pullum's exquisitely fertile mind. "It was just a thought that came to me as I sat in a boring meeting this afternoon, in a room that luckily had wireless so I could log in and talk to you!" More! More! More!

  77. Pekka K. said,

    March 5, 2010 @ 3:22 am

    Addition to the other languages file from a native Finnish speaker.

    I don't think Finnish has any genuine roots that fit this pattern, or a similar pattern ending in a vowel. All the examples I could come up with are either verbs in the dictionary form that include an extra t from the first infinitival ending, are causative constructions like totuttaa (to make something accustomed to something), which is derived from tottua (to become accustomed to something), or don't have all the repeated consonants in most of their inflected forms.

    An example of the last case is the word nainen (a woman; nominative singular), whose stem in most inflected forms is nai-.

    It's very easy to construct (and use) words in Finnish that have three repeated consonants with possible gemination. Kokkoko? Mummomme. Teetät! Sissisi etc.

    There is also the jocular word teetätytit (you had someone else do), which consists of multiple causative suffixes on the word tehdä (to do). As tempting as it would be to interpret it as meaning very high-level delegation, nobody really counts how many causative suffixes it is supposed to have :)

  78. Acilius said,

    March 5, 2010 @ 9:37 am

    @Marie-Lucie: I don't doubt that the etymology is as you've described, and I'm grateful to you for calling it to my attention. My disagreement is with your claim that it is obvious. I would say that it is far from obvious.

  79. marie-lucie said,

    March 5, 2010 @ 11:39 am

    Acilius: OK. It just became "obvious" to me in "a flash of inspiration", but I had seen the word for years without particularly thinking about an alternative to the "tea" interpretation.

  80. JR said,

    March 5, 2010 @ 5:49 pm

    "Dated", also, but it's inflected. One might coin a form "noonin'" by analogy from "nooner".

    I was thinking about "meemaw" as a grandmother in the Ohio River Valley; I wonder how common "Pop-pop" as a children's name for a grandfather is.

    Example from Japanese: "Nannin", "how many people?" Depending on the moraic nasal. Other Japanese examples are probably very rare.

  81. marie-lucie said,

    March 5, 2010 @ 7:00 pm

    Pop-pop would not qualify as it has the shape CVCCVC, not CVCVC(V).
    Meemaw sounds like a borrowing from an Eastern European language ("Mima").

  82. Ellen K. said,

    March 7, 2010 @ 9:46 am

    Marie-Lucie, we are talking spoken words, not written words. I'm pretty sure for most people, when spoken, pop-pop would have just one P, not two. Although, as a reduplication, I believe it does not qualify as a root.

  83. Aaron Toivo said,

    March 7, 2010 @ 4:28 pm

    Okay, two days later and after a full re-read of all the comments, I have some serious issues that I hope Mr. Pullum might address more fully.

    1. Exactly where is the evidence that C1VC1VC1 appears less often than statistically expectable? Even if you accept none of the proposed examples, is zero examples actually less than statistically expected? I'm skeptical.

    2. You are apparently accepting only roots of exactly the form CVCVC. This raises several problems. We have relatively few CVCVC roots in the first place, so this reduces the chances of finding any particular combination of Cs in roots of this shape. Further, the CVCVC roots we do have are mostly non-native, so we must accept loanword examples or else the whole question becomes moot. (Most native roots are monosyllabic.) And finally, if there really is a phonological constraint against C1VC1VC1, you would not expect such minor quibbles as an extra vowel, or extra material at one end or the other, to make any big difference. Thus 'baobab' and 'teetotal' are being thrown away on thin excuses.

    3. Even if there is far less C1VC1VC1 than expected statistically, which I doubt, this could merely be historical accident rather than the effect of any constraint in our present phonological rules. Obviously everyone can pronounce 'doodad' and 'teetotal' and 'nonane' and 'popup' just fine. Contrast this with how many native Anglophones normally succeed in pronouncing initial velar nasals: not very many. That's a real phonological constraint.
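
    One rough way to put a number on point 1 is sketched below. It is not from the thread: it assumes that the three consonant slots of a CVCVC word are filled independently, estimates each slot's consonant frequencies from the word list itself, and uses a Poisson approximation for the chance of seeing no C1VC1VC1 words at all. The function names and the input format (one tuple of three consonants per CVCVC word) are hypothetical.

```python
import math
from collections import Counter

def expected_identical(triples):
    """Expected number of words whose three consonants are all the same,
    assuming each consonant slot is filled independently of the others.
    `triples` holds one (c1, c2, c3) tuple per CVCVC word (assumed format)."""
    n = len(triples)
    if n == 0:
        return 0.0
    # Frequency of each consonant in the first, second, and third slot.
    slots = [Counter(t[i] for t in triples) for i in range(3)]
    consonants = set(slots[0]) | set(slots[1]) | set(slots[2])
    # P(all three slots hold the same consonant), summed over consonants.
    p_same = sum(
        (slots[0][c] / n) * (slots[1][c] / n) * (slots[2][c] / n)
        for c in consonants
    )
    return n * p_same

def prob_of_zero(expected):
    """Poisson approximation to the probability of observing zero such words."""
    return math.exp(-expected)
```

    If the expected count came out well below 1 for a given word list, finding no examples would be unsurprising; a count of several would make the gap look more like a real dispreference. The independence assumption is crude, so the result is only a baseline.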

  84. Mark Rosenfelder said,

    March 7, 2010 @ 8:32 pm

    Over here I happen to have a list of English words in phonemic spelling. I wrote a quick program to do some basic testing. (It's just a handy list of common lexemes; so things like plurals and past tenses won't occur, but I made no attempt to check for roots or foreign words.)

    Out of 5180 words, 510 (or 10%) matched the pattern CVCVC.

    There were 55 (or 1.1%) that had any two consonants the same. None had all three the same (the list doesn't happen to have doodad).

    I also counted the CVCVC words by final consonant; they prove to be highly unevenly distributed:
    r 136
    l 84
    n 80
    t 46
    s 37
    ng 33 (most of these are -ing words)
    d 18
    j 17
    k 13
    sh 12
    m 10
    v 9
    p 5
    z 3
    ch 2
    th 2
    f 2
    g 1
    b 0
    h 0
    w 0
    y 0

    It does seem possible that an absence of C1VC1VC1 is just to be expected given the relatively small frequency of CVCVC words. Uneven distribution of phonemes plays a role as well: you're just not going to find initial ng- or final -h, and while *gaggig doesn't appear, mere final -g is vanishingly rare. (The one example in the list I'm using is 'fatigue'.)
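
    For anyone who wants to repeat this kind of count on their own word list, a minimal sketch follows. It is not the program used above: it assumes each word is already a list of phoneme symbols, and the vowel inventory `VOWELS` is a hypothetical placeholder that would need to match the transcription actually used.

```python
from collections import Counter

# Hypothetical vowel symbols; replace with the inventory of your transcription.
VOWELS = {"a", "e", "i", "o", "u", "aa", "ee", "ii", "oo", "uu", "@"}

def is_cvcvc(phonemes):
    """True if the word is exactly consonant-vowel-consonant-vowel-consonant."""
    if len(phonemes) != 5:
        return False
    shape = ["V" if p in VOWELS else "C" for p in phonemes]
    return shape == ["C", "V", "C", "V", "C"]

def summarize(words):
    """Count CVCVC words, repeated-consonant cases, and final consonants."""
    cvcvc = [w for w in words if is_cvcvc(w)]
    # Words in which at least two of the three consonants are identical.
    at_least_two_same = [w for w in cvcvc if len({w[0], w[2], w[4]}) < 3]
    # Words in which all three consonants are identical (the *gaggig shape).
    all_three_same = [w for w in cvcvc if w[0] == w[2] == w[4]]
    return {
        "total": len(words),
        "cvcvc": len(cvcvc),
        "at_least_two_same": len(at_least_two_same),
        "all_three_same": len(all_three_same),
        "by_final": Counter(w[4] for w in cvcvc),
    }
```

    Calling `summarize(words)["by_final"].most_common()` gives a tally by final consonant in descending order, in the same layout as the list above.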

  85. Josh said,

    March 8, 2010 @ 3:08 am

    Coca Cola?

    The way I say it, (when I don't shorten it to Coke) it comes out as one word: Cocacola. Compare to Pepsicola.

  86. Joel Kalvesmaki said,

    March 8, 2010 @ 11:24 am

    Prompted by the query on whether this suits (ancient) Greek or not, I searched the Thesaurus Linguae Graecae and found that the -CVCVC- pattern occurs quite frequently. Many of these are inflected forms of verbs, others are proper nouns or made-up words that frequent magical texts. It is unclear to me how strictly to apply the no-inflected-forms rule from English to Greek. Nevertheless, the pattern appears frequently, even in ordinary words. Μίμημα is a perfect example. Or try on the inflected μεμίμημαι for size. At any rate, speakers and writers of ancient Greek would have had no aversion to words made of reduplicated consonants.

  87. Army1987 said,

    March 8, 2010 @ 11:52 am

    @Yonatan Zunger: different allophones for the same consonant depending on whether it's in the onset or in the rhyme of a syllable occur throughout the phonological system (Wells 1990) regardless of reduplication and even in absence of morpheme boundaries (titter and litter rhyme with fitter and sitter).
    @JR: Nannin is two morphemes, isn't it? And if we consider the zero onset of あ, い, う, え, お as a "consonant", there's aoi "blue"… that's two morphemes too, IIRC.
    Nonane reminds me of nonante which is Belgian (IIRC) French for 90 (for Standard French quatre-vingt-dix). Which in turn makes me think that the fact that Latin nonaginta became Italian novanta might show that the bias against CiVCiVCi is more real than one could imagine.
