Words, letters, and an unusual Scrabble turn


Last month, I taught a short course on "Corpus-based Linguistic Research" at the LSA Institute in Ann Arbor, in which the participants were asked to do individual projects. One of the undergraduates in the class, Alex R., undertook to examine the time-course of variability in English spelling, starting with the Paston Letters, which are "a collection of letters and papers consisting of the correspondence of members of the Paston family of Norfolk gentry, and others connected with them in England, between the years 1422 and 1509".

There's plenty of variation. Here's Alex's inventory of some of the ways that Wednesday is spelled in that collection:

For context, here's the start of one letter, from Agnes Paston to her son John, written in 1447:

Soon, I grete ȝow wel wyth Goddys blyssyng and myn; and I latte ȝow wette þat my cosyn Clere wrytted to me þat sche spake wyth Schrowpe aftyre þat he had byen wyth me at Norwyche, and tolde here what chere þat I had made hym; and he seyde to here he lyked wel by þe chere þat I made hym. He had swyche wordys to my cosyn Clere þat lesse þan þe made hym good chere and ȝaf hym wordys of conforth at London he wolde no more speke of þe matyre. My cosyn Clere thynkyth þat it were a foly to forsake hym lesse þan ȝe knew of on owdyre as good ore bettere, and I haue assayde ȝowre sustere and I fonde here neuer so wylly to noon as sche is to hym, ȝyf it be so þat his londe stande cleere. I sent ȝow a letter by Brawnton fore sylke and fore þis matyre be-fore my cosyn Clere wrote to me, þe qwyche was wrytten on þe Wednysday nexȝt aftyre Mydsomere Day. Ser Herry Ynglows is ryȝth besy a-bowt Schrowpe fore on of his doȝhteres.

The usual story about English spelling regularization is that it developed as a result of printing. Alex is interested in the hypothesis that spelling was already becoming more consistent in hand-written documents before printing would have had any effect, perhaps due to some of the same forces that tend to create linguistic consistency in a speech community.

One of the problems in this general area is that the available corpora are generally not lemmatized — that is, when a text says "Wendysday" there's no straightforward automatic way to determine that this represents the same word as other letter-strings like "Wendisdaye" or "Wednysdaye" or "Wednysday".

Although there are many programs that purport to "lemmatize" English text, none of them are adequate even for modern text in standard spellings, since there is no standard way to identify English words at the level of dictionary entries or major sub-entries; instead, there's a commonly-accepted fiction that the letter-string corresponding to the standard spelling of the stem ought to be good enough. And for older texts, or for modern texts with non-standard spelling, even that inadequate solution is not easily available.

I pointed Alex in the direction of some interesting recent work by Jacob Eisenstein ("What to do about bad language on the internet", NAACL-HLT 2013) on the analogous problems in "normalizing" modern social-media text. Alex's background is in the humanities, so the world of computer text hacking is new to him, but he's making good progress, as you'll see below.

As far as I know, no one has yet made a serious attempt (for instance) to learn a weighted transducer that would connect letter-strings in historical texts to the corresponding modern spellings — much less to do what we really need, which is to connect such letter-strings to stable lexical identifiers at the level of entries and major sub-entries in a work like the Oxford English Dictionary. At the recent OED Symposium, I proposed that the OED should work with others to build a large historical corpus annotated with such identifiers, and of course also to create taggers that would do this annotation automatically. This would imply licensing the identifiers for appropriate use by others — an alternative approach would be to try to extend the Wiktionary in directions that would make such a project possible.

Anyhow, what reminded me of these issues today was an email from Alex, which I reproduce below:

I hope you are well. I'm halfway through my time in Edinburgh, working at the Festival doing technical production. It's been hectic and stressful but I'm still finding time to work on my Python skills and the Paston Letters. I'm hoping to get in touch with Jacob soon and learn more about finite state transducers and his work.

I thought the following might amuse you – my first breakfast experiment! I wanted to practise processing the XML file I have to remove all the guff and play around with lists/dictionaries/functions/loops/frequencies/tokenisation/etc so I wrote some stuff in Python to do this. It took around 30 minutes.

"Suppose you are playing a non-standard variant of Scrabble. The board is large with each side being over a million tiles wide. It has no bonus letter/word score tiles at all.

It is the first turn and you get to place your tiles first. By a stroke of luck, you notice that the 841,995 tiles you are currently holding in your hand will allow you to place the entire text of the Paston Letters (without spaces, numbers (unless in Roman numeral notation) or punctuation marks) in a straight line along the middle of the board.

Before doing so, you decide to calculate the total score.

The standard letter scores are the same as in Present Day English with the following additions based on frequency profiles: ȝ (yogh) is worth the same as "Q" at 10 points, þ (thorn) is worth the same as "K" at 5 points. French "é", despite having the same frequency as Q is only worth 1 point, the same as "e", on account of England's friction with France during the period the Paston Letters were written.

The final score is 1,594,464 (plus an extra 50 for using all your tiles)."
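For readers who want to try the same breakfast experiment, here is a minimal Python sketch of the scoring rule described in the email. It is not Alex's code: the function name `scrabble_score` and the decision to silently skip any character without a listed value (spaces, digits, punctuation, and letters such as æ) are assumptions made for illustration. The letter values are the standard English Scrabble ones plus the three additions given above.

```python
import unicodedata

# Standard English Scrabble letter values, grouped by points.
STANDARD_SCORES = {
    1: "aeilnorstu",
    2: "dg",
    3: "bcmp",
    4: "fhvwy",
    5: "k",
    8: "jx",
    10: "qz",
}

LETTER_SCORES = {ch: pts for pts, letters in STANDARD_SCORES.items() for ch in letters}
# Additions described in the email: yogh scores like Q, thorn like K, é like e.
LETTER_SCORES.update({"ȝ": 10, "þ": 5, "é": 1})


def scrabble_score(text):
    """Return (number of tiles, total score) for a text.

    Assumption: any character without a listed value is simply skipped.
    """
    # NFC keeps accented letters such as é as single code points.
    normalized = unicodedata.normalize("NFC", text).lower()
    tiles = [ch for ch in normalized if ch in LETTER_SCORES]
    return len(tiles), sum(LETTER_SCORES[ch] for ch in tiles)


if __name__ == "__main__":
    tiles, score = scrabble_score("on þe Wednysday nexȝt aftyre Mydsomere Day")
    print(tiles, score)
```

Run over the whole collection, the second value would correspond to the total before the 50-point bonus for using all the tiles.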

In my experience, the impulse to have fun programming is an excellent predictor of the rate of skill development.

By the way, the OED gives these variants for Wednesday:

α. OE Wodnesdæg, OE Wodnesdoeg (Northumbrian), OE Wodnessdæg, lOE Wodenesdei, lOE Wodnes dægge (dative), lOE Wodnesdæig, lOE Wodnesdeg, lOE Wodnesdeig, lOE Wodnosdæg, lOE–eME Wodnesdei, eME Wodnesdæȝ, eME Wodnesdawes (plural), ME Wodeinsday, ME Wodenesday, ME Wodenisday, ME Wodenysday, ME Wodinsdai, ME Wodnesday, ME Wodnysday, ME–15 Wodensday, 15 Wodinsday; Sc. pre-17 Vodenisday, pre-17 Vodinsday, pre-17 Vodnisday, pre-17 Vodynnis day, pre-17 Voidinisday, pre-17 Woddinnesdaye, pre-17 Woddinnisday, pre-17 Woddinsday, pre-17 Woddnesday, pre-17 Woddynsday, pre-17 Wodenisday, pre-17 Wodinsday, pre-17 Wodnisday, 18–19 Wodensday; N.E.D. (1926) also records a form lME Wodinsday.  β. eME Wednesdei, eME Weodnesdei, ME Weddenesday, ME Weddensdaye, ME Weddynisday, ME Wedenesday, ME Wedenisdai, ME Wedenysday, ME Wednesdai, ME Wednesseday, ME Wednysdaye, ME Wedonesday, ME 16 Wedensday, ME–15 Wedinsday, ME–15 Wednysday, ME–15 Wedynsday, ME–16 Wednisday, ME– Wednesday, lME Weddysday, 15 Weddinsday, 15 Weddynsday, 15 Wedensdaye, 15 Wedenysdaye, 15 Wednesdaie, 15 Wednisdaye, 15 Wednsdaye, 15 Wedynsdaye, 15–16 Wednesdaye, 16 Weddensday, 17 Wedonsday; Sc. 
pre-17 Vadinsday, pre-17 Vadynisday, pre-17 Veddensday, pre-17 Veddnesday, pre-17 Veddnsday, pre-17 Veddyinsday, pre-17 Veddynisday, pre-17 Vedenysday, pre-17 Vedinnisday, pre-17 Vedinsday, pre-17 Vednesday, pre-17 Vednisday, pre-17 Vednysday, pre-17 Waddinsday, pre-17 Wadinesday, pre-17 Wadinsdaye, pre-17 Wadnysdaye, pre-17 Weddansday, pre-17 Weddensday, pre-17 Weddenseday, pre-17 Weddinisday, pre-17 Weddinissday, pre-17 Weddinsday, pre-17 Weddnesday, pre-17 Weddnysday, pre-17 Weddynisday, pre-17 Weddynnisday, pre-17 Wedenisdaye, pre-17 Wedinday, pre-17 Wedinsday, pre-17 Wednisday, pre-17 Wednysday, pre-17 Wedynnisda, pre-17 Wedynsday, pre-17 Wedynysday, pre-17 Widinsday, pre-17 17 Wadinsday, pre-17 17– Wednesday, pre-17 18– Wadnesday, 17 Wedensday, 17 Wednsday, 17 Wednsdy, 17– Wadensday, 18 Wadnsday, 18 Wedsinday, 19– Wadsday; N.E.D. (1926) also records forms ME Wedonesdai, lME Weddynsday.  γ. eME Wendesdei, ME Wendesdai, ME Wendesday, ME Wendesdaye, ME Wendisday, ME Wendisdaye, 19– Wensdeh (Eng. regional (Yorks.)); Sc. pre-17 Wandisday, pre-17 Wendinsday, pre-17 Wendisday, pre-17 17 Wendsday; N.E.D. (1926) also records a form lME Wyndenesse day.  δ. ME Vennysday, ME Wannysday, ME Wanysday, ME Wennessday, ME Wenstay, ME Wenysday, ME Wonnysday, ME Wonysday, ME–15 Wenesday, ME–15 Wennesday, ME–15 Wennysday, ME–16 Wensdaie, ME–16 Wensdaye, ME–17 Wensday, lME Whenys day, lME Wonesday, lME Wonesdaye, 15 Wensdye, 16 Weansday, 18 Wennesdei (Irish English (Wexford)); Sc. pre-17 17– Wensday, 17– Wansday; N.E.D. (1926) also records a form ME Wannesdai.

I suspect that the list is incomplete, and hereby offer a free lifetime LLOG subscription to the first reader who can find a historically-attested variant that's missing from the OED's list.

 



35 Comments

  1. Bloix said,

    August 8, 2013 @ 10:04 am

    The last entry on the graph – Wythsonday – might not be Wednesday at all – it might be Whitsunday.

    BTW- I listen to recorded books. Today the readers are actors, but some of the older ones, before audio-books became a serious commercial enterprise, were read by school-teacher types. One of these readers – British, although I couldn't tell you from where – distinctly said "WED-ins- day." The OED says that the D is "not infrequently heard" in the North of England, but that was a hundred years ago.

  2. Levantine said,

    August 8, 2013 @ 10:24 am

    Bloix, the pronunciation of 'Wednesday' with the first D sounded is still common (perhaps even usual) in Scotland. I don't know about Northern England, but I wouldn't be surprised if the same pronunciation occurs there as well.

  3. wally said,

    August 8, 2013 @ 10:36 am

    I was surprised by the claim that ȝ (yogh) has the same frequency as Q, when I noticed it twice in the first line in the excerpt from the letter from Agnes Paston, and then counted it 10 times in that excerpt. Well, at least as displayed on my computer.

    [(myl) That paragraph seems to have been atypical: the whole collection contains 900 instances of lower-case 'q', 46 of upper-case 'Q', and 1305 instances of (lower-case) yogh.]

  4. David Denison said,

    August 8, 2013 @ 10:42 am

    Hi Mark

    Do you know about VARD?

    http://www.comp.lancs.ac.uk/~barona/vard2/

    best
    David

    [(myl) I didn't — thanks for the pointer! But how about going beyond "modern letter string" as a substitute for "lemma ID"?]

  5. Sandy Nicholson said,

    August 8, 2013 @ 10:46 am

    I would have concurred with Levantine about the Scottish pronunciation, at least based on introspection of my own speech, though I reckon the ‘d’ is more likely to surface as /ʕ/ than as /d/ for me. However, I’ve just observed the speech of a couple of Edinburgh-dialect speakers (my wife and son), and they both produced /wɛnzde/ in the first instance. Probably outliers. :o)

  6. Eric P Smith said,

    August 8, 2013 @ 12:01 pm

    I'm a Scottish Standard English speaker, and I reckon the usual way to pronounce the /dn/ in 'Wednesday' in SSE is as follows. The /d/ is prepared in the usual way by stopping the mouth with the tongue on the alveolar ridge, but that stop is never released: instead, the back of the velum is lowered to make an /n/ with the oral stop still in place. I haven't a clue how to transcribe it. I do the same thing with words like 'student' and 'couldn't'. The pharynx and glottis are open throughout.
    Can anyone tell me how to transcribe it? I've often wondered.

  7. dw said,

    August 8, 2013 @ 12:27 pm

    @Eric P Smith:

    Your description sounds like a voiced alveolar stop with nasal release. The IPA is dⁿ

    It would be a common articulation in a word like "hidden", which could be transcribed, ultra-explicitly, as ['hɪdⁿn̩]

  8. Alon Lischinsky said,

    August 8, 2013 @ 1:47 pm

    Although there are many programs that purport to "lemmatize" English text, none of them are adequate even for modern text in standard spellings, since there is no standard way to identify English words at the level of dictionary entries or major sub-entries

    One important problem in this regard is that there is little consistency, even among lexicographers, as to the granularity of sub-entries. Take three different dictionaries and you will find the various meanings of, say, knight grouped in inconsistent manners. This is especially important when different spellings for the same underlying ‘entry’ eventually become different terms, as happened in English with curtsey and courtesy (both < MEn curtesie) or draught and draft (both < MEn draught).

    And the wisdom of the crowds does not seem to provide a solution. Experiments with semantic annotation using Mechanical Turk or similar non-expert annotators show pretty poor inter-rater reliability. Snow et al's famous 2008 paper was an exception in that it allowed only three, clearly-distinct word-senses; attempts to replicate their results in less constrained environments have been consistently negative.

  9. Eric Ringger said,

    August 8, 2013 @ 3:00 pm

    Alon, disagreement among lexicographers is indeed the bane of the word-sense disambiguation problem. That said, crowds can provide better aggregate wisdom about word senses than earlier Mechanical Turk studies (such as the well known but over-simplistic study you mention by Snow et al. (2008)) suggested. See the recent NAACL paper by UCLA's David Jurgens titled "Embracing Ambiguity: A Comparison of Annotation Methodologies for Crowdsourcing Word Sense Labels" (http://aclweb.org/anthology-new/N/N13/N13-1062.pdf ). Highlight from the abstract: "Our findings show that given the appropriate annotation task, untrained workers can obtain at least as high agreement as annotators in a controlled setting, and in aggregate generate equally as good of a sense labeling." I don't want to give you a reason not to go read the article, but the key idea is (again from the abstract) that "untrained annotators are allowed to use multiple labels and weight the senses".

  10. Eric Ringger said,

    August 8, 2013 @ 3:28 pm

    Mark, regarding your wish for "a serious attempt … to learn a weighted transducer that would connect letter-strings in historical texts to the corresponding modern spellings", I think you could be delighted by the ongoing work of Grzegorz Kondrak at the University of Alberta. Under Kondrak's advisement and with other collaborators, Sittichai Jiampojamarn introduced a method for learning transducers they call DirecTL+. Code is available at https://code.google.com/p/directl-p/ , and the primary paper can be found in the Proceedings of ACL 2010. Based on more recent work by Kondrak and his students, the model and method are holding up well and being put to good use (see NAACL 2013 for papers using DirecTL+). As for mining transliterations, their paper http://www.aclweb.org/anthology-new/W/W10/W10-2405.pdf may be useful to you and your student.

    [(myl) Thanks for the suggestion — I'll look forward to comparing it with some of the other transducer-learning approaches that I've encountered.]

    As for your wishes concerning linkage between the OED and a large historical corpus of English, we (collaborators, students, and I) are working on employing machine-assisted human annotation of a new historical corpus of classical Syriac with entries from a respected dictionary. With some prodding and the right collaborators, we could turn our sights on English. Happy to discuss further. Some background on our project can be found here: https://facwiki.cs.byu.edu/nlp/index.php/Machine-Assisted_Annotation

    [(myl) Thanks for this as well, and I will certainly read your pages and contact you for further discussion. Because Tim Buckwalter's Arabic morph analyzer has always produced more-or-less-stable lemma IDs (where "lemma" means roughly "Arabic word sense that corresponds to a given English gloss"), the Penn Arabic Treebank has always involved human-disambiguated lemmas (or "lemmata", if you prefer), as well as morphosyntactic features. (See e.g. "A New Approach to Lexical Disambiguation of Arabic Text" EMNLP-2010 for some discussion of automated tagging in this framework.) The core problem for English would be deciding on a set of lemma IDs to use, in my opinion.]

  11. Rubrick said,

    August 8, 2013 @ 3:55 pm

    My turn. I swap 3 tiles.

  12. mollymooly said,

    August 8, 2013 @ 5:38 pm

    Cardinal Wolsey used "Wennsday" in a letter to Henry VIII in 1527.

  13. Eric P Smith said,

    August 8, 2013 @ 5:50 pm

    @dw: Thankyou.

  14. Ralph Hickok said,

    August 8, 2013 @ 7:17 pm

    In genealogy, something called the Soundex code is used to find variant spellings of names. Beyond that, I don't know much about it, but I wonder if it might possibly help with variant spellings of other words.

    [(myl) There are more recent versions of such algorithms as well, e.g. Metaphone and Double Metaphone. I'm skeptical that these approaches will work well for vocabulary at large, where lexical neighborhood densities tend to be greater, and items that sound similar are much less likely to be related.

    A more reliable method, I think, would be to start with a large-enough collection of relevant historical training material; to learn a weighted string-to-string transducer; and then for each input string, to produce the N-best matches from the standard spellings in a word list. These transduction scores would be combined with scores from a simple local (e.g. bigram) language model, and perhaps a larger-scale topic model. There are some interesting issues involved in how to do all of this efficiently, but the basic parts and the way to combine them are well understood from speech recognition technology.

    Another advantage of this approach is that it could be combined with an attempt to do something about the dreadful state of OCR for older works…]
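As a rough illustration of the N-best step in the reply above, here is a minimal Python sketch. It is a deliberate simplification, not the method described: a weighted edit distance with hand-set costs stands in for a learned string-to-string transducer, and the language-model and topic-model rescoring is left out entirely. The cost table, the four-word lexicon, and the names `substitution_cost`, `weighted_distance`, and `n_best` are all hypothetical.

```python
import heapq

VOWELS = set("aeiouy")

# Hypothetical, hand-set costs; a real system would learn such weights
# from historical training material.
CHEAP_PAIRS = {
    frozenset("uv"): 0.25,
    frozenset("iy"): 0.25,
    frozenset("ȝg"): 0.25,
    frozenset("ȝy"): 0.25,
}


def substitution_cost(a, b):
    """Cost of rewriting letter a as letter b."""
    if a == b:
        return 0.0
    pair = frozenset((a, b))
    if pair in CHEAP_PAIRS:
        return CHEAP_PAIRS[pair]
    if a in VOWELS and b in VOWELS:
        return 0.5  # vowel-for-vowel swaps are treated as fairly cheap
    return 1.0


def weighted_distance(source, target, indel=1.0):
    """Edit distance with per-pair substitution costs (row-by-row DP)."""
    prev = [j * indel for j in range(len(target) + 1)]
    for i, s in enumerate(source, 1):
        curr = [i * indel]
        for j, t in enumerate(target, 1):
            curr.append(min(
                prev[j] + indel,                         # delete s
                curr[j - 1] + indel,                     # insert t
                prev[j - 1] + substitution_cost(s, t),   # substitute
            ))
        prev = curr
    return prev[-1]


def n_best(historical, lexicon, n=3):
    """Return the n lowest-cost (cost, modern spelling) pairs."""
    word = historical.lower()
    return heapq.nsmallest(n, ((weighted_distance(word, w), w) for w in lexicon))


if __name__ == "__main__":
    LEXICON = ["wednesday", "whitsunday", "wedding", "yesterday"]  # toy word list
    for form in ["Wednysday", "Wendisdaye"]:
        print(form, n_best(form, LEXICON, n=2))
```

In a fuller version, each candidate's cost would be combined with a score from a bigram language model over the surrounding words before choosing among the N-best matches.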

  15. Errorr said,

    August 8, 2013 @ 11:00 pm

    Wennisday

    @Ralph Hickock found his first, but from the same source John Lisle used Wennisday in 1546. I assume that is John Dudley (viscount Lisle) first Duke of Northumberland writing back to the privy council as Lord of the Admiralty on a diplomatic trip to Paris.

    He would become the de facto regent after Henry VIII's death and would be executed by Mary when he tried to put his daughter-in-law and cousin Jane Grey on the throne (he was a Grey through his mother). He was popularly known as the "wicked Duke" until revisionism in the 1970s.

    http://books.google.ie/books?id=BPsUAAAAQAAJ&dq=Wennisday&pg=PA251#v=onepage&q=Wennisday&f=false

  16. Jon M said,

    August 9, 2013 @ 8:29 am

    There might be an alternative way to look at this question.

    In a corpus with lots of variant spellings we should see a different clustering structure (based on word similarities) than one where words have just one variant.

    That is you should find that defining clusters based on a very close similarity greatly reduces the number of words in the corpus, while using the same reduction on a non-variable spelling corpus would not reduce the number much.

    Even just a measure of how many words in the corpus are at hamming distances 1:10 over time might be helpful.

  17. Jon M said,

    August 9, 2013 @ 8:33 am

    Just finishing off the previous comment. Plotting over distances 0:10 would be more interesting. Modern text would see a large spike at zero (i.e. for the average word there are many other words in the corpus that are exactly the same as it but not many that are just a single change different, whereas in a non-regular spelling corpus the distribution might be more uniform).

    Sounds like an interesting project anyway!
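The distance profile proposed in the two comments above could be sketched as follows. This is only one reading of the idea, with stated assumptions: Levenshtein distance is swapped in for Hamming distance so that words of different lengths can be compared, every pair of tokens is compared (quadratic time, so it suits a sample rather than a whole corpus), and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def levenshtein(a, b):
    """Plain edit distance: insertions, deletions, substitutions all cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = curr
    return prev[-1]


def distance_profile(tokens, max_distance=10):
    """Count token pairs at each edit distance from 0 to max_distance."""
    profile = Counter()
    for a, b in combinations(tokens, 2):
        d = levenshtein(a, b)
        if d <= max_distance:
            profile[d] += 1
    return profile


if __name__ == "__main__":
    sample = ["wednysday", "wednysday", "wendysday", "wednesday"]
    profile = distance_profile(sample)
    for d in sorted(profile):
        print(d, profile[d])
```

On the hypothesis in the comments, a corpus with regular spelling should pile up at distance 0, while a corpus with variable spelling should show relatively more pairs at distances 1 to 3.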

  18. David Morris said,

    August 9, 2013 @ 8:44 am

    ESL students tend to pronounce 'Wednesday' as the full three syllables; correspondingly 'January' and 'February'. I tell them to get lazy about their pronunciation in these cases.

  19. languagehat said,

    August 9, 2013 @ 9:33 am

    correspondingly 'January' and 'February'.

    I (a native English speaker from the US) pronounce both words exactly as they are written. I understand lots of people omit the /r/ in the latter word, but how else would you say "January"?

  20. Mike said,

    August 9, 2013 @ 9:34 am

    I was surprised by the v-initial spellings of Wednesday. Does this reflect a pronunciation with an actual /v/, or is that the use of v as a substitute for u?

  21. Breffni said,

    August 9, 2013 @ 11:40 am

    I didn't know until I read this thread that the three-syllables-with-/d/ pronunciation wasn't universal. I'm sure I often reduce it in normal speech, but definitely in my mental lexicon it's Wed-nz-day. Now I'm not even sure how other Irish people say it.

    Does "Wed" for Wednesday seem an anomalous abbreviation for most of you, maybe along the lines of "Mic" for microphone?

  22. Eric Ringger said,

    August 9, 2013 @ 1:21 pm

    Jon M., I also think clustering alternate spellings is a good idea. There's some work in topic modeling for noisy (e.g., OCRed) documents that incorporates essentially this idea. A good example is a CIKM 2013 paper by Penn State's Yang and Lee (http://pike.psu.edu/publications/cikm13.pdf ).

  23. Eric Ringger said,

    August 9, 2013 @ 1:46 pm

    Mark, I like the idea of using learned transduction for correcting OCRed historical works. The errors from OCR are unlike spelling variation, so discriminatively training a correction transduction can be difficult. Our (students, colleagues, and I) approach to improving OCR transcriptions of machine-printed historical works involves choosing among multiple hypotheses (from multiple OCR engines or multiple image binarizations) using a discriminatively trained model. The training data consists of OCR output on synthetic document images with known transcriptions. Perhaps the same training data could be used for the method you are advocating. This page summarizes our efforts to date (minus two papers that are in press): https://facwiki.cs.byu.edu/nlp/index.php/Historical_Document_Recognition

  24. mollymooly said,

    August 9, 2013 @ 2:45 pm

    @languagehat: In BrEng, unstressed -ary is normally elided to -ry, so January and February have 3 syllables rather than 4. But 4 syllables is hardly "wrong" for EFL purposes, at least until you get to advanced dialect-specific fine-tuning.

  25. Bloix said,

    August 9, 2013 @ 3:03 pm

    Breffni, what you're saying is just astonishing for me. It's surprising enough to learn of actual living people who themselves pronounce the D, but to hear someone say that they thought everyone pronounces the D is truly mind-boggling. It's like having my neighbor tell me off-handedly that his new coat is made of woolly mammoth wool.

    By utter coincidence, I will be in Ireland for the first time in my life in a couple of weeks. I am going to provoke as many people as I can into saying Wednesday.

  26. Eric P Smith said,

    August 9, 2013 @ 5:26 pm

    @languagehat: ‘January’ can be /ˈdʒanjʊəri/ or /ˈdʒanjʊri/. An ESL student might even say /ˈdʒanjʊari/. David Morris may speak for himself, but until he does I would surmise that he is encouraging his students to 'get lazy' by saying /ˈdʒanjʊri/.

  27. Breffni said,

    August 9, 2013 @ 5:45 pm

    Bloix: I'm amazed myself, though in the opposite direction. My parents confirm they pronounce it the way I do, but at least they were aware of the alternative (because they lived several years in London, maybe). A Dublin friend born in the 1980s thinks I'm odd for pronouncing the D. I'm not in Ireland at the moment, so that's all I've got.

    So I don't know what you'll find in Ireland – regional variation? Generational? You'll have to elicit careful pronunciations, because I think the difference all but vanishes in fluent speech, which partly accounts for why I've missed it all these years. Anyway, that's what I'm telling myself by way of consolation.

  28. Suburbanbanshee said,

    August 9, 2013 @ 11:03 pm

    What saddens me is that I can't immediately think of any Celtic song that includes "Wednesday," except the one about "I got up on an X night, as drunk as drunk can be," and I think that one was sung by Dubliners. So presumably it doesn't count.

  29. languagehat said,

    August 11, 2013 @ 8:46 am

    @languagehat: ‘January’ can be /ˈdʒanjʊəri/ or /ˈdʒanjʊri/. An ESL student might even say /ˈdʒanjʊari/. David Morris may speak for himself, but until he does I would surmise that he is encouraging his students to 'get lazy' by saying /ˈdʒanjʊri/.

    I learn something every day. Being a Yank, I've never used or heard anything but the four-syllable full-vowel /ˈdʒænjʊˌɛri/. (Well, presumably I've heard the UK version in movies and TV shows but never paid attention to it.)

  30. Garrett Wollman said,

    August 12, 2013 @ 7:56 am

    @languagehat: I believe /'fEbri/ (two syllables) is also well-attested for BrE-speakers. (Sorry, I have no input method for true IPA; above transcription is ASCII IPA.) OED's first BrE pronunciation (which I can cut and paste) is /ˈfɛbr(ər)i/.

  31. Catanea said,

    August 13, 2013 @ 2:22 pm

    How can you sing "Shine on Harvest Moon" without "Jan-you-air-ee, Feb-roo-air-ee, June or July"?

  32. Davis said,

    August 13, 2013 @ 3:06 pm

    I listen to lots of audio books, many read by British actors/readers, usually using RP (or whatever it's called nowadays). They usually pronounce the D in Wednesday. Seems I hear this on the BBC a lot too, among the announcers speaking in the upper class pronunciation that used to be nearly universal there.

  33. Colin Fine said,

    August 15, 2013 @ 4:31 am

    I'm surprised to see Breffni describing their pronunciation of "Wednesday" as three syllables. I'm very familiar with the pronunciation Eric Smith describes, with the unreleased 'd' (though I don't normally use it myself), but I would certainly describe it as two syllables, with /dnz/ as a cluster.

  34. steve piantadosi said,

    August 16, 2013 @ 4:18 pm

    One quick hack is to segment each syllable and then check all possible recombinations of them. I played around a little like this just guessing at gaps and checking them on google books. It's funny how easy they are to find! "Widdensday" and "Woddensday" both occur.

    By the way, there are many OCR errors in google books that give other spurious Wednesdays that show up in the google search, but are not in the actual text (e.g. "wednetday")
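The segment-and-recombine hack in the comment above might look something like this in Python. It is a guess at the approach, not the commenter's code: the three-way segmentation of each variant is done by hand and is hypothetical, every variant is assumed to have the same number of segments, and the function name `recombine` is invented. The output is only a list of candidate strings, which would still have to be checked against a corpus and against the OED's list.

```python
from itertools import product


def recombine(segmented_variants):
    """Cross every attested segment in each slot; return unseen combinations.

    Assumes every variant is split into the same number of segments.
    """
    slots = zip(*segmented_variants)
    choices = [sorted(set(slot)) for slot in slots]
    attested = {"".join(parts) for parts in segmented_variants}
    candidates = {"".join(parts) for parts in product(*choices)}
    return sorted(candidates - attested)


if __name__ == "__main__":
    # Hand segmentation of three forms from the OED list (hypothetical split points).
    variants = [
        ("Wod", "dins", "day"),   # Woddinsday
        ("Wed", "dens", "day"),   # Weddensday
        ("Wid", "ins", "day"),    # Widinsday
    ]
    for candidate in recombine(variants):
        print(candidate)
```

With these three inputs the candidates include "Widdensday" and "Woddensday", the two forms reported in the comment, alongside strings that the OED already lists.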

  35. Breffni said,

    August 17, 2013 @ 5:11 pm

    Colin: a cluster, in the sense I'm familiar with, is a sequence of consonants that may occur within a syllable, and /dnz/ can't – certainly not in my English, and not in any other variety that I can think of. Supposing (perhaps wrongly) that you pronounce 'deadens' with a nasally-released [d], is that a monosyllable for you?

    And where would you put the syllable boundary in the /d/-ful version of 'Wednesday'? Does your /dnz/ cluster belong in the coda of the first syllable, or the onset of the second (along with the other /d/, giving a four-consonant cluster), or does it split, and if so, where?
