Annals of Artificial Stupidity


Katie Deighton, "What Can’t the Internet Handle in 2022? Apostrophes", WSJ 9/29/2022:

Sybren Stüvel is an Amsterdam-based software developer with a fairly uncommon name and a surprisingly common predicament.

As he completes the tasks of daily life, computers refuse to accept his name as valid or mangle it entirely. A credit card provider rejected his moniker, a Vancouver hotel hit bumps locating his reservation—as he stood there exhausted from a nine-hour plane trip—and an airline wouldn’t let him check into a flight. “You can imagine my stress level,” he said.

While buying insurance, he said, “They asked me to confirm that my last name is indeed Stüvel.”

Well into the internet’s fourth decade, most everything is done online. Yet some names still stump machines. Computers can defeat Garry Kasparov at chess, but some systems stumble over names containing apostrophes, numbers, hyphens or letters not commonly used in English texts. […]

“I retweet a complaint at least five times a week,” said Miroslav Šedivý, a software engineer based in Vienna and the administrator of the plainly named Twitter account “Your Name Is Invalid.”

He gravitated to the topic after companies in Germany, where he formerly lived, literally gave him a bad name: they tended to lose the “Š” and the “ý” of his last name of Šedivý. He once tracked down a piece of mail addressed to “Miroslav Ediv.” […]

“Hey Disney, let’s chat for a moment about names,” a user named Leah D’Andrea-Lee tweeted to Walt Disney Co. recently, saying the company repeatedly rejected her name when she tried to renew her Magic Key reservation pass for its Disneyland theme park. “What gives?”

Ms. D’Andrea-Lee and her husband Chris Lee are frequent Disney visitors and die-hard fans who won best in show for their cosplay at the “Mousequerade” contest at the official Disney fan club’s 2022 expo.

So it’s dispiriting, said the L.A.-based actor and costume maker, when she is addressed by one of the biggest media companies in the world as “LEAH DANDREALEE,” “Leah D&andrea” or, as her Magic Key came in the mail, the generic “Magic Key Holder.”  […]

Among the frustrated are some Irish customers of the national carrier Aer Lingus, which ironically doesn’t accept names such as “O’Neill” and “O’Brien” when taking bookings.

Aer Lingus’s booking system, called Astral, is nearly 60 years old and doesn’t cater to special characters, an airline spokeswoman said.

“We recognize the limitations of the system with respect to accepting special characters and apologize to customers for any inconvenience caused,” she said. “As part of future systems development we will consider implementing reasonable steps to address this issue.”

As a (very limited) defense of the cited crappy software systems, we can note that standards for encoding and displaying letters outside the (mutually incompatible) ASCII and EBCDIC character sets are relatively recent — and standards for entering them don't yet exist. And updating (multiple interacting) software systems involves substantial  (financial, social, and diplomatic) costs.

I call this defense "limited" because the problem is widespread and serious, and companies wouldn't (I hope) put up with similar problems in systems that they're forced to really care about, like tracking their bank balances. But there are all too many examples of Artificial Stupidity in new (and allegedly improved) bureaucratic computer systems, often out-sourced to companies that supposedly know what they're doing, which subject their users to serious difficulties that force weird, time-consuming, and unsanctioned (or even illegal) work-arounds. I'd list a few from my recent experience, but I'll spare you for now.

On the other side of the limited-name equation, I'll note what a pain it is for Unix command-line apps to deal with Apple Macintosh file names, which may have internal spaces, double quotes, single quotes, apostrophes, dollar signs, hash marks, etc. In that context, I'm with Aer Lingus.
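To make the command-line side of this concrete, here is a minimal Python sketch of the two usual defenses: passing arguments as a list so that no shell ever parses the name, and quoting with `shlex.quote` when a single shell string is unavoidable. The file name is invented for illustration.

```python
import shlex

# Hypothetical Mac-style file name: spaces, both kinds of quotes,
# a dollar sign and a hash mark.
filename = 'Leah\'s "final" draft $5 #2.txt'

# Safest: keep arguments in a list (e.g. for subprocess.run), so no shell
# ever tokenizes the name.
argv = ["ls", "-l", "--", filename]

# If a single shell string is unavoidable, quote the name first.
command = "ls -l -- " + shlex.quote(filename)

# Round trip: a POSIX-style tokenizer recovers the original name intact.
recovered = shlex.split(command)
print(command)
print(recovered[-1] == filename)
```

The list form sidesteps quoting entirely, which is why it is usually preferred; the quoted form is only a fallback.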



57 Comments

  1. Jarek Weckwerth said,

    September 30, 2022 @ 10:23 am

    Well, isn't this simply a manifestation of the cultural and economic hegemony of the USA? If the typewriter (and then computer) had been invented in the Czech Republic, or even Germany, it would surely include those diacritics. Not to mention China.

    But that's the way the world is. (The full form of) my first name includes ł, and I just never bother with it outside its motherland ;) (Even professional linguists often don't know how to deal with it though. That annoys me no end.)

  2. Jarek Weckwerth said,

    September 30, 2022 @ 10:25 am

    (Because ASCII stands for the American standard code for things doesn't it.)

  3. Y said,

    September 30, 2022 @ 10:32 am

    The Mac was the pioneer in using more, and more varied, characters in file names, back when DOS was limited to 8 alphanumeric characters. But to this day it excludes colons, which are reserved for separators. If you have a lot of PDF papers, especially in the humanities, and you like the file names to echo the paper titles, you still have to bear this annoyance, nearly 40 years after the first Mac.

  4. Terry K. said,

    September 30, 2022 @ 10:35 am

    Since computer systems, as noted, have trouble with names like O'Brien or hyphenated names, both of which we have here in the USA, no, it's not as simple as "the cultural and economic hegemony of the USA". Not saying there's not a good dose of that. But it's more complex.

  5. Oskar Sigvardsson said,

    September 30, 2022 @ 10:41 am

    > As a (very limited) defense of the cited crappy software systems, we can note that standards for encoding and displaying letters outside the (mutually incompatible) ASCII and EBCDIC character sets are relatively recent — and standards for entering them don't yet exist.

    As a software developer, I have to take issue with this: Unicode has existed for a long time now, and the standard encoding for it (UTF-8) is 30 years old. Text encoding is a notoriously hard thing to program, but at this point, there is no excuse for getting this stuff wrong, and yet it happens all the time. One thing I've noticed is that Unicode and text encoding is rarely taught to Computer Science students, so you mostly have to learn "on the job", which is not ideal.

    [(myl) Yes BUT… at what point did which programming languages, database systems, editors, web browsers, user-interface apps, fonts, etc., actually accept, store, and display Unicode, at all much less correctly? I can tell you from extensive personal experience that until quite recently, the answer was "very damn few". Detailed litany of diacritic-decorated horror stories available on request…]

    That said: many of the problems pointed to have (essentially) nothing to do with text encoding schemes (whether they be ASCII, EBCDIC, one of the Unicode formats, or something else); they have to do with improper quoting and escaping, especially of HTML. Take the example of

    Leah D'Andrea-Lee

    If you were to run that through a generic HTML escaper, the apostrophe is replaced and it becomes:

    Leah D'Andrea-Lee

    If you then unescape that, you get back to the real name: no information has been lost. However, if you escape that AGAIN (the string being passed between independent computer systems, such a mistake might creep in), the ampersand gets escaped and it becomes:

    Leah D'Andrea-Lee

    This has nothing to do with text encoding exactly: the apostrophe character is in ASCII. It has to do with how text is escaped and unescaped, which is a slightly different (but related) issue.

    The core problem behind both encoding and escaping issues is, of course, that you do not test for these kinds of cases. Another issue is lack of diversity among programmer teams: I've had the experience personally of a new member with non-ASCII characters in their name joining the team and our own software breaking. It is a hard issue to deal with, but for systems like this, it really is unacceptable to get it wrong.
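The escape, unescape and double-escape sequence described in this comment can be reproduced with Python's standard `html` module. This is only a sketch of the mechanism, not a reconstruction of any particular company's pipeline. Note that Python writes the apostrophe as the hexadecimal reference `&#x27;`, which is equivalent to the decimal form `&#39;`.

```python
import html

name = "Leah D'Andrea-Lee"

once = html.escape(name)    # the apostrophe becomes a character reference
twice = html.escape(once)   # the ampersand of that reference is escaped too

print(once)
print(twice)

# Unescaping exactly as many times as the text was escaped restores the name...
assert html.unescape(once) == name

# ...but unescaping a double-escaped string only once leaves the reference
# visible to the user.
print(html.unescape(twice))
```

The usual design rule that follows is to store the raw name and escape exactly once, at the point where the text is written into HTML.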

  6. Oskar Sigvardsson said,

    September 30, 2022 @ 10:44 am

    Oh, the irony: your commenting system unescaped my examples, so they now just appear "correct"! When I entered them, the apostrophe was replaced with "& # 3 9;" and the ampersand with "& a m p ;" (without the spaces)

    See, text processing is hard!

  7. J.W. Brewer said,

    September 30, 2022 @ 11:04 am

    Because Irish-origin surnames of the O'Suchandsuch variety have long been common in the U.S. and American typewriters typically had an apostrophe key, technical inability to handle those (as opposed to handling letters adorned with umlauts and carons and whatnot) seems a bit hard to attribute to Americo-centrism. Unless maybe there was a pre-existing Anglosphere practice of omitting the apostrophes in such surnames in telegraphese, such that O'Neill WOULD BE RENDERED AS ONEILL STOP. On the other hand, I think a lot of early computer databases couldn't deal with surnames written out as more than one word, so that e.g. the presidential surname van Buren would turn into VANBUREN even though earlier generations of typesetters etc. had been able to handle the two-word variant and even variation between "van" and "Van" in such names.

  8. CP said,

    September 30, 2022 @ 11:08 am

    A minor problem to be sure, but my first name is often truncated to 10 letters. I’m sure because of some database limitations.

    So I am regularly referred to as Christophe. An actual name, but not quite mine. Not sure if the issue applies to last names though.

  9. phanmo said,

    September 30, 2022 @ 11:14 am

    I run into this on a regular basis, as my daughter's first name has a diaeresis (trema) in it, as does my wife's last name.

  10. Sergey said,

    September 30, 2022 @ 11:52 am

    I remember the discussion here about Kazakhstan transitioning to Latin script, with their head of government insisting that no diacritics and no letters besides the basic 26 must be used. And people here being surprised at that. Now, this post explains the reasoning behind that decision very saliently.

  11. Guy Plunkett III said,

    September 30, 2022 @ 12:55 pm

    I can commiserate to some extent, since only about half the time does a system deal with suffixes, and some that claim they do end up addressing me as Mr Iii … Even today, a quick search of PubMed for Iii [au] yields 34 results, although none are mine because I once brought it to their attention and my papers are now listed as Plunkett G 3rd instead of Plunkett G Iii
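The comment does not say what actually happens inside any of these systems, but one plausible mechanism for "Iii" is naive title-casing of an all-capitals record. The Python sketch below shows that effect and one possible workaround. The suffix list and the `display_name` helper are assumptions made up for illustration, and a real system would also have to worry about given names such as "Vi".

```python
import re

# Assumed, non-exhaustive list of generational suffixes written as Roman numerals.
ROMAN_SUFFIX = re.compile(r"^(II|III|IV|V|VI)$", re.IGNORECASE)

def display_name(raw: str) -> str:
    """Title-case each word, but keep Roman-numeral suffixes in capitals."""
    words = []
    for word in raw.split():
        if ROMAN_SUFFIX.match(word):
            words.append(word.upper())
        else:
            words.append(word.title())
    return " ".join(words)

print("PLUNKETT III".title())        # naive conversion mangles the suffix
print(display_name("PLUNKETT III"))  # suffix-aware conversion
```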

  12. anhweol said,

    September 30, 2022 @ 12:57 pm

    The 2017 Kazakh Romanization proposal did indeed avoid diacritics, but it relied very heavily on apostrophes, so still might have defeated some of these systems.

  13. Tad Dockery said,

    September 30, 2022 @ 12:58 pm

    To build on Mark's limited defense, there are a million plinky things underlying this in computer systems. Several of those plinky things are these characters being commonly used as directives in computer languages, and issues with interpretation are going to be especially common in the old systems that make up the bulk of bureaucratic hardware and software. Those systems are old because of procurement practices meant to protect those institutions from exploitation in what are very niche markets; but as a result they're usually way behind the curve.

    (Source: I'm a software developer at the Wisconsin State Historic Preservation Office. I get to hear about how our archaeologists used to be required to type everything in ALL CAPS.)
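One concrete case of a name character doubling as a directive is the single quote, which delimits string literals in SQL. The following Python sketch uses the standard `sqlite3` module and a hypothetical `passengers` table to show a naively assembled statement failing on "O'Neill" while a parameterized query stores the name unchanged. It illustrates the general mechanism only and makes no claim about how any system mentioned here is built.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE passengers (surname TEXT)")  # hypothetical table

surname = "O'Neill"

# Naive string building: the apostrophe ends the SQL string literal early,
# so the statement no longer parses.
naive_sql = "INSERT INTO passengers (surname) VALUES ('" + surname + "')"
try:
    conn.execute(naive_sql)
    naive_ok = True
except sqlite3.Error:
    naive_ok = False

# Parameterized query: the value travels separately from the SQL text,
# so no character in the name is ever interpreted as syntax.
conn.execute("INSERT INTO passengers (surname) VALUES (?)", (surname,))
stored = [row[0] for row in conn.execute("SELECT surname FROM passengers")]

print(naive_ok, stored)
```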

  14. Bill said,

    September 30, 2022 @ 1:57 pm

    My (Italian surname) contains an apostrophe, and I've long since learned to drop it. Either the apostrophe would cause errors, or they would truncate the name (11 characters with, 10 without).

  15. Philip Taylor said,

    September 30, 2022 @ 2:02 pm

    Forgive me for being picky, but should not Irish names such as “O’Neill” [sic] and “O’Brien” [sic] actually be “Ó Neill” and “Ó Brien” ? The diacritic is a fada, not an apostrophe, I believe. Entered here using Alt Gr + Shift + O on an IBM 1391406 'clicky' keyboard talking to Windows 7.

  16. Terry K. said,

    September 30, 2022 @ 2:16 pm

    @Philip Taylor

    That's, I presume, how they are written in Irish, but we are not talking about writing names in Irish. We are talking about names belonging to English speaking and writing Irish and Irish Americans. And such names get written as O'Brien, etc. And it's not up to me or to you to tell those people they are writing their surname wrong.

  17. Breffni said,

    September 30, 2022 @ 3:08 pm

    I’d add that re-Irishing those names isn’t as simple as turning the apostrophe back into a fada. O’Neill is Ó Néill, O’Brien is Ó Briain, O’Shaughnessy is Ó Seachnasaigh, etc. And that’s without considering female forms (Ní Bhriain, Uí Bhriain…).

  18. Philip Taylor said,

    September 30, 2022 @ 3:24 pm

    Well, while I thank Breffni most sincerely for his very helpful correction of my spelling of two common Irish names, I would respectfully suggest to Terry K. that the reason that (most) "English speaking and writing Irish and Irish Americans" use O'Neill, O'Brien, etc., is the very issue being discussed here. Anything that can't be represented in ASCII is likely to be rejected (or mangled, or whatever) by a large number of computer systems in current use around the world, and to cope with this inability to accommodate O-fada (Ó), etc., (most) "English speaking and writing Irish and Irish Americans" have adopted the more universally acceptable Anglicised form. See here for informed comment from one who retains the fada in his name: Rossa Ó Snodaigh.

  19. Terry K. said,

    September 30, 2022 @ 3:41 pm

    Those forms are older than computers. And Google Ngrams confirms my impression on that. (I'm not older than computers, after all.)

    https://books.google.com/ngrams/graph?content=O%27Neill%2CO%27Brien&year_start=1800&year_end=2019&corpus=en-2019&smoothing=3#

    Hopefully that link will work. I tried embedding but it didn't like that.

    No one has argued that there aren't people who retain the fada. My point was simply that these names illustrated the issue is more complex than being "simply a manifestation of the cultural and economic hegemony of the USA".

  20. Philip Taylor said,

    September 30, 2022 @ 3:56 pm

    « My point was simply that these names illustrated the issue is more complex than being "simply a manifestation of the cultural and economic hegemony of the USA" » — I could not agree more. Indeed, I very much suspect that my fellow Britons have, ever since the Anglo-Norman invasion of Ireland in 1169, done their very damnedest to force the Irish to eschew their heathen spellings and adopt names and spellings that were more acceptable to their British would-be conquerors.

  21. Paul Garrett said,

    September 30, 2022 @ 4:33 pm

    About year 2000, I had a student in my cryptology class whose first/given name was "Ry4an". I asked him how the "4" was pronounced. He said it was silent.

  22. Tobias said,

    September 30, 2022 @ 5:22 pm

    To be fair, names are hard to get right in software:
    https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/

  23. Jarek Weckwerth said,

    September 30, 2022 @ 6:13 pm

    @Terry K: the issue is more complex than … "simply … the … hegemony of the USA".

    Yes, there is some fluff on the margins, granted. And the history goes further back to the dominance of Rome and later Western Europe more generally. However, I do still think that the fact that all of the most widespread operating systems are essentially in English, and that there are no alternatives originating from other languages, does have quite a bit to do with the difficulties those systems have representing non-English characters, which was the main thread in the OP.

    In other words, if Windows, macOS etc. originated from, let us say, Greece, and Greek was the lingua franca of the modern world, we would be discussing how characters outside the GreekSCI are commonly butchered by computer systems.

  24. Chas Belov said,

    September 30, 2022 @ 7:53 pm

    Not to mention the difference between a straight apostrophe and a smart apostrophe, which should be treated by computer programs doing name matching as identical.
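A minimal Python sketch of that idea follows: compare names through a normalized key and leave the stored spelling alone. The set of apostrophe look-alikes and the `match_key` helper are assumptions for illustration; real data may need a longer list.

```python
import unicodedata

# Assumed set of apostrophe look-alikes; real data may contain others.
APOSTROPHE_VARIANTS = {
    "\u2019",  # right single quotation mark (the "smart" apostrophe)
    "\u2018",  # left single quotation mark
    "\u02BC",  # modifier letter apostrophe
    "\u0060",  # grave accent
    "\u00B4",  # acute accent
}

def match_key(name: str) -> str:
    """Return a key for comparing names; the stored name itself stays untouched."""
    normalized = unicodedata.normalize("NFC", name)
    for variant in APOSTROPHE_VARIANTS:
        normalized = normalized.replace(variant, "'")
    return normalized.casefold()

print(match_key("D\u2019Andrea") == match_key("D'Andrea"))
```

Keeping the key separate from the displayed name means the customer still sees the name as entered.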

  25. Qwerty said,

    September 30, 2022 @ 8:10 pm

    Spare a thought for those of us with Western names in Japan. Especially long Western names

  26. AntC said,

    September 30, 2022 @ 8:39 pm

    @myl at what point did which programming languages, database systems, editors, web browsers, user-interface apps, fonts, etc., actually accept, store, and display Unicode, at all much less correctly?

    Indeed: I was programming an ICL ME29 in the early 1980's for an application in which (how shall I put it) the clientele had reasons to wish to be evasive and multiplicative as to their identity(s). There was an astonishing tendency for grandfathers/fathers/sons and mothers/daughters to have exactly the same names, but live at a variety of addresses and move frequently.

    The ME29's search algorithms were optimised for sixbit — that is, upper-case only and few graphics. Nevertheless, we coped with McMullins, MacMullins Jnr, O'Mullins III, … by dint of converting to a 'Soundex' (6-bit). A search would typically return less than a half-dozen possibles — to be resolved verbally with the client.
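For readers who have not met Soundex, here is a short Python sketch of the conventional four-character American Soundex. It is not the 6-bit variant used on the ME29, whose details the comment does not give; it only illustrates how spelling variants of a surname can collapse to one search key.

```python
# Letter-to-digit table for conventional American Soundex.
CODES = {}
for digit, letters in {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
                       "4": "L", "5": "MN", "6": "R"}.items():
    for letter in letters:
        CODES[letter] = digit

def soundex(name: str) -> str:
    """Return the four-character Soundex key, or "" if no A-Z letters remain."""
    # Anything outside A-Z (apostrophes, spaces, accented letters) is dropped.
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    previous = CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "HW":
            continue  # H and W do not separate two letters with the same code
        code = CODES.get(c, "")
        if code and code != previous:
            result += code
        previous = code  # a vowel resets this, so a repeated code counts again
    return (result + "000")[:4]

for surname in ["McMullin", "MacMullin", "O'Mullins"]:
    print(surname, soundex(surname))
```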

  27. Garrett Wollman said,

    September 30, 2022 @ 10:20 pm

    The world of travel, alas, is not so simple as just fixing a database schema and updating applications to use UTF-8 and the Unicode collation algorithm, the Unicode case-folding algorithm, etc. Airlines are required to submit passenger manifests to national authorities and immigration officials, and to check those names against a passenger's identity documents, typically a passport. There is a standard for how the "machine-readable area" of a passport is laid out, and it allows only the 26-letter upper-case Latin alphabet: no diacritics, no punctuation, no spaces. The MRA uses the '<' character as both a filler and a field separator. So my "passport name" is represented as "WOLLMAN<<GARRETT<A".

    It's entirely understandable that identity documents, which must be examined and verified by officials and travel businesses around the world would not allow the full range of Unicode characters, simply because verifiers cannot be expected to accurately read and compare *all of the world's scripts*. Since these standards are determined at a government-to-government level, it is unlikely that most developing countries would agree to a Eurocentric expansion of the permitted character set that still left their native scripts entirely out-of-repertoire. Much easier to agree on "just these 27 letterforms" and have a national standard for how local names are represented on passports, in Passenger Name Records, etc.

    (It's worth noting that, in pre-Unicode days, there were multiple International Standard character sets for data communication and storage, and they assigned different glyphs to the same codepoints, because 7 or 8 or 16 bits isn't enough to uniquely encode all the world's scripts. The International Standard version of ASCII is called ISO 646, which replaced the punctuation characters typically found on the right-hand side of a QWERTY keyboard, like []{}\|, with different national characters depending on which ones were most frequent in each country's languages. This was succeeded by a family of 8-bit character sets, ISO 8859, which had separate variants for Western, Central, Northern, and Southern European languages, plus separate encodings for Turkish, Greek, Arabic, and Thai, because even 8 bits wasn't enough to directly encode all combinations of letters and diacritics from French, Polish, Romanian, and Turkish in the same "Latin" character set.)
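The point about one codepoint carrying different glyphs can be seen directly with Python's built-in codecs. In this small sketch the byte value 0xF0 is an arbitrary choice for illustration; it is decoded under four members of the ISO 8859 family and then re-encoded as UTF-8.

```python
# One byte value, four members of the ISO 8859 family.
raw = bytes([0xF0])

codecs_to_try = {
    "iso8859_1": "Western European",
    "iso8859_2": "Central European",
    "iso8859_7": "Greek",
    "iso8859_9": "Turkish",
}

# The same byte decodes to a different character under each character set.
decoded = {codec: raw.decode(codec) for codec in codecs_to_try}
for codec, char in decoded.items():
    print(f"{codec} ({codecs_to_try[codec]}): {char!r}")

# UTF-8 gives each of those characters its own, unambiguous byte sequence.
utf8_forms = {char: char.encode("utf-8") for char in decoded.values()}
print(utf8_forms)
```

Without a reliable record of which character set a stored byte belongs to, there is no way to tell which of the four letters was meant.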

  28. Barbara Phillips Long said,

    October 1, 2022 @ 1:19 am

    When the Affordable Care Act had been passed, I had to buy health insurance. Dealing with insurance required an online identity. Dealing with the government site also required me to have an online account.

    One function of the government site was to supply a tax form showing I had been properly insured. The only problem was, one site had my surname as Phillips-Long and the other as Phillips Long and because of programming protocols, neither site could change to match the other. Therefore, all data exchanges between the sites sent my records to data purgatory because the names were not identical.

    I had to jump through several hoops to request and obtain the proper IRS form to file my taxes, which included waiting for snail mail delivery, because one site would not take surnames with spaces and the other site would not take surnames with hyphens.

    I guess I am lucky my name was not even more complicated by apostrophes or accents, but I did not feel grateful at the time.

  29. Philip Anderson said,

    October 1, 2022 @ 3:22 am

    @Garrett Wollman
    I don’t know how the MRA handles apostrophes, but a hyphen is indicated. My son is A B C-D, which appears as C<D<<A<B, i.e. << separates the family name from the personal names.

  30. Philip Taylor said,

    October 1, 2022 @ 6:28 am

    If your son were "A B C D", Philip, and gave his given names as "A B" and his surnames as "C D", would they not also appear as C<D<<A<B ? In other words, what in C<D<<A<B is representing the hyphen that could not equally well be representing a space ?

  31. Eldor Trombat said,

    October 1, 2022 @ 6:39 am

    This classic discussion of the issue of names by a programmer is enlightening and amusing

    https://dev.to/carlymho/whats-in-a-name-validation-4b41

  32. bks said,

    October 1, 2022 @ 8:55 am

    "Tate is to drop the accent on Paul Cézanne when it opens its anticipated blockbuster exhibition on 5 October (until 12 March 2023). The move follows a decision by the artist’s grandson Philippe, who argues that although the name normally carried an accent in Paris at the time, it did not in the artist’s native Provence. "

    https://www.theartnewspaper.com/2022/08/31/acute-move-tate-cuts-paul-cezannes-accent-ahead-of-blockbuster-show

  33. pdw said,

    October 1, 2022 @ 12:32 pm

    In the European Union, companies that use computer systems that can't represent diacritics have gotten in trouble with the GDPR. For example this case: https://gdprhub.eu/index.php?title=Court_of_Appeal_of_Brussels_-_2019/AR/1006

    "The Court of Appeal of Brussels held that, in accordance with Article 16 GDPR, the data subject has the right for their name to be correctly spelled when processed by the computer systems of the Bank. To claim in 2019 that adapting a computer system to correctly handle diacritics would cost several months of work and/or constitute additional costs for the Bank, does not allow the Bank to disregard the rights of the data subject. A correctly functioning banking institution may be expected to have computing systems that meet current standards, including the right to correct spelling of people's names."

  34. Jarek Weckwerth said,

    October 1, 2022 @ 12:59 pm

    @pdw: Thanks a lot! This is a super useful and interesting pointer!

  35. bulbul said,

    October 1, 2022 @ 1:58 pm

    *laughs in Slavic and Hungarian*
    My last name has a háček, an acute accent and an umlaut.

  36. Philip Taylor said,

    October 1, 2022 @ 2:14 pm

    An umlaut or a tréma, Bulbul ?

  37. Philip Anderson said,

    October 1, 2022 @ 2:24 pm

    @Philip Taylor
    Yes, that is how my niece’s name appears, and I suspect O’Brien might be treated the same; the < indicates _a_ non-alphanumeric element in the name, and << is the field separator.
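The layout discussed in this exchange can be sketched in a few lines of Python. This is a simplification under stated assumptions, not an implementation of the passport standard: diacritics are stripped rather than transliterated by the official table, apostrophes are simply omitted, and the helper names and the default field width of 39 are assumptions of the sketch.

```python
import unicodedata

def mrz_component(text: str) -> str:
    """Reduce one name field to A-Z, with '<' standing in for spaces and hyphens."""
    # Collapse runs of whitespace so they cannot imitate the '<<' separator.
    collapsed = " ".join(text.upper().split())
    # Simplification: strip combining marks instead of applying an official
    # transliteration table; letters with no decomposition (e.g. "ł") are dropped.
    decomposed = unicodedata.normalize("NFKD", collapsed)
    out = []
    for ch in decomposed:
        if "A" <= ch <= "Z":
            out.append(ch)
        elif ch in " -":
            out.append("<")
        # Apostrophes and everything else are omitted (an assumption of this sketch).
    return "".join(out)

def mrz_name(surname: str, given_names: str, width: int = 39) -> str:
    """Build SURNAME<<GIVEN<NAMES, padded with '<' to a fixed width (assumed 39)."""
    field = mrz_component(surname) + "<<" + mrz_component(given_names)
    return field.ljust(width, "<")[:width]

print(mrz_name("C-D", "A B"))
print(mrz_name("C D", "A B"))
```

Both calls print the same line, which is the ambiguity raised above: once the hyphen and the space have each become a single filler character, nothing in the field records which one the name originally had.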

  38. maidhc said,

    October 1, 2022 @ 6:23 pm

    When I was a child, I loved "Tales of the Greek Heroes" by Roger Lancelyn Green. I later learned that he was one of the Inklings, and wrote a number of other books.

    His surname is "Lancelyn Green". Yet library catalogs seem to invariably call him "Green, Roger Lancelyn".

    Admittedly, if you just looked at the cover of one of his books, there is no easy way to tell.

  39. Jonathan Badger said,

    October 1, 2022 @ 10:21 pm

    @CP
    A lot of Japanese RPGs (especially from the NES and Gameboy eras but sometimes even today) limited names of player characters to five or six characters. So I ended up having characters named "Jonat" or "Jonath" even with my not very long first name.

  40. Michael Watts said,

    October 2, 2022 @ 12:33 am

    About year 2000, I had a student in my cryptology class whose first/given name was "Ry4an". I asked him how the "4" was pronounced. He said it was silent.

    I would tend to assume that that was a Tom Lehrer reference, since one of the bits preserved in Tom Lehrer's albums is a joke about someone named Hen3ry with a silent 3.

  41. philip said,

    October 2, 2022 @ 1:11 am

    Never take anything that Rossa Ó Snodaigh says as Gospel …

    Even here in Ireland, with each new school that my children attend, I end up phoning the secretaries and offering to come in and teach them how to access the fada on the keyboards of whatever computer they are using. Takes a while, but, eventually, the school starts sending me letters with the child's name spelt correctly in them: Siún and Lorcán; the other son, Connla, got away without the burden of a fada in his name.

    As for Ó Snodaigh (fils), there are even some Irish speakers who do not use an Irish version of their name as they are perfectly happy with their Anglicised 'slave name'. So those O'Shaughnessys, O'Briens and de Courcys have to be catered for.

    And what about the Mcs of this world? How do computers deal with John McMenamin, for example?

  42. Peter Taylor said,

    October 2, 2022 @ 2:34 am

    @philip, I know that a British MP (Karl McCartney) complained about Hansard, the official record of Parliament, not using a superscript c in his surname. I don't know whether they finally made changes to the system.

  43. Philip Taylor said,

    October 2, 2022 @ 4:23 am

    Re McMenamin, MᶜMenamin and so on, it is my impression that almost within living memory such names were routinely printed with a reverse apostrophe (or opening single quotation mark) rather than as a 'c', raised or otherwise, as in (e.g.,) "M‘Donald". Do others share this belief, and if so, is it known when and why the custom died out ?

  44. J.W. Brewer said,

    October 2, 2022 @ 8:21 am

    Going back way before ASCII, one earlier global technology that needed a standardized character set was telegraphy. While the original Morse code was indeed devised by an Anglophone in the U.S., wikipedia advises me that what became the global standard outside the U.S. (as "International Morse Code") was a very substantial revision of Morse's original system done by a German fellow named Friedrich Gerke, which was first adopted by the Deutsch-Österreichischen Telegrafenverein in 1851 and then adopted more broadly by the Union Télégraphique Internationale at its founding in Paris in 1865. No standard provision was made for German umlauts or French accents (or Spanish tildes) etc etc., and this in a time when English was much less dominant internationally vis-a-vis other major European-origin languages.

  45. Breffni said,

    October 2, 2022 @ 12:46 pm

    Regarding philip’s comment above, I think it’s important to put on record that the idea that anglicised versions of Irish names can aptly be called “slave names” isn’t at all widespread in Ireland. In fact, I’ve never come across it before.

  46. Aurelia said,

    October 2, 2022 @ 2:02 pm

    I have trouble believing that after governments and taxpayers have spent billions and trillions on creating computer systems, that this has to exist at all.
    Anglo English names have had apostrophes for hundreds of years, French, Italian, Spanish names certainly do and millions of North Americans have had multiple middle and last names, characters, umlauts, accents, apostrophes etc. and somehow birth certificates and passports and everything else is supposed to match, criminal records, banks, credit report companies, all have to. If they don’t then people’s names don’t match…and more than one govt agency or business would be sued for discrimination on the basis of nationality, ethnicity, name…

    This is pathetic. Why hasn’t it ever been fixed and is anyone currently fixing it?

  47. Coby said,

    October 2, 2022 @ 4:16 pm

    The availability of diacritics that were absent from American typewriters has enabled the California cities of San Jose and La Canada to rebrand themselves as San José and La Cañada.

    I am dreading the possibility that my county seat, Martinez, might one day choose to become Martínez, which would be anachronistic because at the time of its founding, in the 19th century, there was a rule in Spanish that surnames ending in -ez did not require an acute accent over the preceding vowel.

  48. Terry Hunt said,

    October 2, 2022 @ 9:15 pm

    @ Philip Taylor — The "turned comma" (as it was known in printing contexts, since it was achieved by turning a moveable type comma upside down) was still alive enough in the early 1950s for the Scottish writer James Murdoch MacGregor to have the pseudonym he used for his science fiction work printed as J. T. M'Intosh on various magazine short stories and on his first novel (from US publisher Doubleday).

    As a bookseller in Scotland in the late 1970s it cropped up with both authors and customers often enough that I (a Sassenach) was well aware of it. Incidentally, our firm (and I assume many others in Scotland) routinely treated M', Mc, and Mac (superscript or not) as the 27th letter, preceding M, for alphabetization purposes. I wonder if the same is or was done in Ireland for O', or if Oliver precedes O'Neill?
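The filing rule described here, with M', Mc and Mac treated as one unit placed before the rest of M, amounts to a custom sort key. The Python sketch below is a guess at how one might encode it, not a record of any bookshop's practice. The `shelf_key` helper is hypothetical, and recognizing the prefix only when a capital letter follows is a heuristic that will miss spellings such as "Macdonald".

```python
import re

# Prefix forms treated as one unit: M', M‘ (turned comma), M’, Mc, Mac,
# recognized here only when followed by a capital letter (a heuristic).
MAC_PREFIX = re.compile("^(M['\u2018\u2019]|Mc|Mac)(?=[A-Z])")

def shelf_key(surname: str):
    """Sort key that files the 'Mac' names as a group just before the rest of M."""
    match = MAC_PREFIX.match(surname)
    if match:
        rest = surname[match.end():]
        # ("M", 0, ...) sorts ahead of ("M", 1, ...), used for ordinary M names.
        return ("M", 0, rest.casefold())
    return (surname[:1].upper(), 1, surname.casefold())

names = ["Mabbott", "MacGregor", "Oliver", "M'Intosh",
         "Lyle", "McDonald", "Mackie", "Munro"]
print(sorted(names, key=shelf_key))
```

Within the group the names are ordered by what follows the prefix, so McDonald, MacGregor and M'Intosh file together regardless of which spelling of the prefix they use.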

  49. Philip Taylor said,

    October 3, 2022 @ 4:15 am

    Good Lord, a "turned comma"! I haven't heard that phrase in a very long time. So long, in fact, that I have now forgotten where I did first encounter it, but your description of the reason for the name certainly accords with my own recollections, as does the fact that it was still in use within living memory …

  50. Carlos Crespo said,

    October 4, 2022 @ 3:44 am

    An additional problem is that some characters are used as field delimiters or have some special meaning.
    For example in SQL the apex (the straight single quote, ASCII code 39) is used to define the beginning and end of a text string.
    In Portuguese there are names like D'Eça – which as a rule (although wrongly) are written with the apex (the correct thing would be to use the apostrophe – Windows code 0146, Unicode U+2019, as in the Leah D'Andrea example – but as on keyboards there is no apostrophe, the apex is used). When migrating databases we often encounter problems, and we ended up using yet another character ´ (acute accent – extended ascii code 180) as a substitute.
    The use of the apostrophe has yet another effect: when converting the name between systems Leah D'Andrea can transform into Leah DÆAndrea (byte 146 as read in the old DOS code pages)…
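The last transformation is easy to reproduce. In this Python sketch, which assumes Windows-1252 on the writing side and DOS code page 437 on the reading side, the typographic apostrophe is stored as the single byte 146 and comes back as "Æ", while the plain ASCII apostrophe (code 39) survives. The two code pages are an assumption chosen to match the symptom; the comment does not name the systems involved.

```python
curly = "\u2019"   # right single quotation mark, the typographic apostrophe
name = "Leah D" + curly + "Andrea"

# In Windows-1252 the curly apostrophe is the single byte 0x92 (decimal 146).
stored = name.encode("cp1252")

# A system that assumes the old DOS code page 437 reads byte 146 as "Æ".
misread = stored.decode("cp437")

# The typewriter apostrophe (code 39) is plain ASCII and survives either way.
plain = "Leah D'Andrea".encode("ascii").decode("cp437")

print(stored, misread, plain)
```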

  51. Brian Crane said,

    October 4, 2022 @ 10:13 am

    Between this issue and people in our global workforce having surprisingly similar names, our company abandoned alphabetic userids for a more universal format a few years ago. It's harder to determine who is who in error messages (you actually have to look them up), but it's made some internal processes much, much easier.

    Emails still display the correct names (Microsoft seems to have figured out a solution to this issue several years ago), so there is no pushback of people feeling like they are just a number instead of a name.

  52. Philip Taylor said,

    October 4, 2022 @ 1:53 pm

    For as long as I have been associated with a British university, Brian, user-IDs have always been alpha-numeric, where the alpha part indicated College, department and status and the numeric part simply a sequence number. When I joined Westfield College in 1972, I became "Uaaa006" (which would have been entirely upper-case in those days), then when I transferred to Bedford College I became Uhaa006, and then Chaa006 when I became a system administrator. Had there been an undergrad with whom I shared all other attributes, he (or she) would have been Zaaa006/Zhaa006.

  53. Philip Taylor said,

    October 4, 2022 @ 2:37 pm

    Sorry, too long ago — Westfield was UN…, Bedford UA… and Royal Holloway UH… I think.

  54. James Wimberley said,

    October 5, 2022 @ 11:52 am

    One of the online forms you have to fill in to get a visa for Australia insists that you enter your name exactly as it's written in your passport. But the software refuses to accept diacritics.

  55. Philip Taylor said,

    October 5, 2022 @ 1:58 pm

    And while it provides a conversion table for characters bearing diacritics, that table excludes a number of Vietnamese letter+diacritic combinations (e.g., Ệ, Ơ, Ư, …) and probably many others as well.

  56. Philip Taylor said,

    October 6, 2022 @ 2:41 am

    … although I suppose one could argue that while Ơ and Ư are undeniably Vietnamese letters, Ệ is not, since the dot-under applies to the word in which the letter Ê occurs, not to the letter itself. Nonetheless, the dot-under diacritic does appear in Vietnamese names as they appear in Vietnamese passports, for example the passport of Nguyễn Thị Châu Hà.

  57. Yerushalmi said,

    October 18, 2022 @ 2:23 am

    “As part of future systems development we will consider implementing reasonable steps to address this issue.”

    That is a horrifying sentence from the airline spokeswoman. In the *future* (1) we will *consider* (2) taking *reasonable* (3) *steps* (4) to *address* (5)?

    This is the classic joke about "we'll form a committee to discuss the makeup of the committee that will issue the report on whether to.." only in real life. My God.
