{"id":3747,"date":"2012-02-05T01:30:59","date_gmt":"2012-02-05T06:30:59","guid":{"rendered":"http:\/\/languagelog.ldc.upenn.edu\/nll\/?p=3747"},"modified":"2012-02-05T10:27:16","modified_gmt":"2012-02-05T15:27:16","slug":"soundex-and-metaphone","status":"publish","type":"post","link":"https:\/\/languagelog.ldc.upenn.edu\/nll\/?p=3747","title":{"rendered":"Soundex and Metaphone"},"content":{"rendered":"<p>One of the earliest and best photographers in China was called <a href=\" http:\/\/boards.ancestry.com\/surnames.zumbrun\/23.1\/mb.ashx \">John Zumbrun<\/a>, but I have also seen his surname spelled various different ways, including Zumbrum.\u00a0 Some of his pictures <a href=\"http:\/\/hahn.zenfolio.com\/p421205104 \">may be seen here<\/a> (this site is run by Thomas H. Hahn, digital archivist of old photographs).<\/p>\n<p>As soon as I saw his surname, I suspected that it might be a variant of the Zumbrunnen among my own maternal relatives who were of Swiss German extraction.\u00a0 When I mentioned to my sister Heidi (who does intense genealogical research on our family) that I thought Zumbrun might be a variant of Zumbrunnen, she replied, \"Oh man, the variant spellings of Zumbrunnen are driving me batty.\u00a0 I have even seen Zum Pwunnen.\u00a0 Have you heard of the soundex?\u00a0 It is a way to index names &amp; deal with all of the variant spellings.\"<br \/>\n<!--more--><br \/>\nUpon looking up <a href=\" http:\/\/en.wikipedia.org\/wiki\/Soundex\">Soundex<\/a>, I found that it was developed around 1918 and was a method for indexing names in the 1880, 1900, 1910, and 1920 US Censuses.<\/p>\n<p>Soundex is still very much in use today and there is a neat <a href=\" http:\/\/resources.rootsweb.ancestry.com\/cgi-bin\/soundexconverter\">Soundex converter<\/a> that enables one to easily and quickly obtain the one letter + three digit alphanumeric code for any surname that one enters into the system.<\/p>\n<p>Essentially a phonetic algorithm for indexing names by 
sound, Soundex encodes homophonous names with the same alphanumeric representation so that they can be correlated despite differences in spelling.<\/p>\n<p><a href=\" http:\/\/en.wikipedia.org\/wiki\/Metaphone\">Metaphone<\/a> is an improved version of Soundex that was invented in 1990 and that takes into account irregularities in English spelling and pronunciation.\u00a0 The latest version, Metaphone 3, was brought out in 2009 and \"achieves an accuracy of approximately 99% for English words, non-English words familiar to Americans, and first names and family names commonly found in the United States, having been developed according to modern engineering standards against a test harness of prepared correct encodings.\" (Wikipedia)<\/p>\n<p>I thought that I'd give Soundex a try on a controlled body of material.\u00a0 I've long been aware that there are <a href=\" http:\/\/en.wikipedia.org\/wiki\/Spelling_of_Shakespeare%27s_name\">numerous different ways to spell the name Shakespeare<\/a>.\u00a0 In an article entitled <a href=\" http:\/\/shakespeareauthorship.com\/name1.html\">\"The Spelling and Pronunciation of Shakespeare's Name\"<\/a>, David Kathman brings together many of these variants.\u00a0 Here are the Soundex results I obtained when I entered the variants into the software:<\/p>\n<p>non-literary references<\/p>\n<p>Shakespeare\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 S221<br \/>\nShakespere\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 S221<br \/>\nShakespear\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 S221<br \/>\nShakspeare\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 S216<br \/>\nShackspeare\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 S216<br \/>\nShakspere\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 S216<br 
\/>\nShackespeare\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 S221<br \/>\nShackspere\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 S216<br \/>\nShackespere\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 S221<br \/>\nShaxspere\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 S216<br \/>\nShexpere\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 S216<br \/>\nShakspe~\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 S210<br \/>\nShaxpere\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 S216<br \/>\nShagspere\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 S216<br \/>\nShaksper\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 S216<br \/>\nShaxpeare\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 S216<br \/>\nShaxper\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 S216<br \/>\nShake-speare\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 S221<br \/>\nShakespe\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 S221<br \/>\nShakp\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 \u00a0\u00a0 \u00a0\u00a0 \u00a0\u00a0 \u00a0\u00a0 \u00a0\u00a0 S210<\/p>\n<p>literary references<\/p>\n<p>Shakespeare\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 S221<br \/>\nShake-speare\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 S221<br \/>\nShakspeare\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 S216<br \/>\nShaxberd\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 S216<br 
\/>\nShakespere\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 S221<br \/>\nShakespear\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 S221<br \/>\nShak-speare\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 S216<br \/>\nShakspear\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 S216<br \/>\nShakspere\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 S216<br \/>\nShaksper\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 S216<br \/>\nSchaksp.\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 S210<br \/>\nShakespheare\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 S221<br \/>\nShakespe\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0\u00a0 S221<br \/>\nShakspe\u00a0\u00a0\u00a0 \u00a0\u00a0 \u00a0\u00a0 \u00a0\u00a0 \u00a0\u00a0 \u00a0\u00a0 S210<\/p>\n<p>Since all of these spellings refer to the same name, ideally they should all have yielded the same alphanumeric code.\u00a0 It is encouraging, though, that most of the variants come out as S221 or S216, while there are only 4 occurrences of S210.\u00a0 I have not run all of the variants through Metaphone, though I presume that it would do an even better job than Soundex.<\/p>\n<p>Nevertheless, we should not be so naive as to believe that Soundex and Metaphone can do our genealogical research for us, since they are only meant to recognize patterns that we might otherwise overlook.\u00a0 For example, the alphanumeric Soundex code for \"Mair\" is M600, but the same code is also applied to the following long list of names:\u00a0\u00a0 MAHAR | MAHER | MAHR | MAIER | MAIR | MARIA | MARIE | MARR | MARROW | MARY | MAURY | MAYER | MAYOR | MEIER | MERRIHEW | MERRY | MEYER | MIR | MOHR | MOIR | MOOR | MOORE 
| MORA | MORE | MOREAU | MOREY | MORR | MORROW | MOWER | MOWERY | MOWRY | MOYER | MUIR | MURIE | MURR | MURRAH | MURRAY | MURRIE | MURROW | MURRY | MYER | MYHRE; I'm certain that these are not all variants of the same name.<\/p>\n<p>On the other hand, although \"Mair\" and \"Maier\" <strong>are<\/strong> variants of the same surname, that is not the end of the story either.\u00a0 Before I went to my ancestral village of Pfaffenhofen, Austria, in 1967, I had always assumed that \"Mair\" was an Anglicization of \"Maier\" or some other spelling of the German surname (e.g., Meyer, Meier, Mayer, Maier, Mier, Meir).\u00a0 Indeed, many people used to ask me if I were related to Lucy Mair, the British anthropologist, but I knew that could not be so because <a href=\"http:\/\/www.surnamedb.com\/Surname\/Mair\">her name<\/a> was <a href=\"http:\/\/www.houseofnames.com\/mair-family-crest\">of Scots<\/a> or <a href=\"http:\/\/genforum.genealogy.com\/mair\/messages\/170.html\">English origin<\/a>, while mine was of German derivation.\u00a0 It is interesting that I am listed in Wikipedia as being a person with the <a href=\"http:\/\/en.wikipedia.org\/wiki\/Mair\">surname Mair in a Scots context<\/a>, though I'm sure that, not long after this post goes up, the Wikipedia editors will shift me to the much smaller group of people named Mair in a German context.\u00a0 In any event, when I went to Pfaffenhofen, I discovered that there were many individuals whose surname in the church record books and on tombstones was given as \"Mair\", and in the Innsbruck phonebook there were scores of people surnamed \"Mair\".\u00a0 Even more surprising to me was that it was not uncommon for families to change their name from \"Maier\" (or some other spelling) to \"Mair\" and vice versa, depending upon fashion or personal preference.<\/p>\n<p>For those who might be curious, the German surname \"Mair\" derives from Middle High German <em>meiger<\/em>, meaning \"higher or superior\", often 
used for stewards of landholders or great farmers or leaseholders; today a Meier is generally a dairy farmer. Meier and Meyer are used more often in Northern Germany, while Maier and Mayer are found more frequently in Southern Germany.\u00a0 (This note is based upon <a href=\"http:\/\/genealogy.about.com\/library\/surnames\/m\/bl_name-MEYER.htm\">this entry<\/a> in genealogy.about.com.)<\/p>\n<p>The main purpose of this post, however, is not to engage in genealogical investigations of the surname \"Mair\", but to bring the Soundex and Metaphone algorithms to the attention of Language Log readers and to suggest that they might prove useful for linguistic research quite apart from genealogical investigations.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>One of the earliest and best photographers in China was called John Zumbrun, but I have also seen his surname spelled various different ways, including Zumbrum.\u00a0 Some of his pictures may be seen here (this site is run by Thomas H. Hahn, digital archivist of old photographs). 
As soon as I saw his surname, I [&hellip;]<\/p>\n","protected":false},"author":13,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_exactmetrics_skip_tracking":false,"_exactmetrics_sitenote_active":false,"_exactmetrics_sitenote_note":"","_exactmetrics_sitenote_category":0,"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[22,24,117,56],"tags":[],"class_list":["post-3747","post","type-post","status-publish","format-standard","hentry","category-orthography","category-phonetics-and-phonology","category-pronunciation","category-research-tools"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/posts\/3747","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/users\/13"}],"replies":[{"embeddable":true,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=3747"}],"version-history":[{"count":0,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=\/wp\/v2\/posts\/3747\/revisions"}],"wp:attachment":[{"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=3747"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=3747"},{"taxonomy":"post_tag","embeddable":true,"href":
"https:\/\/languagelog.ldc.upenn.edu\/nll\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=3747"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}