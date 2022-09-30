Annals of Artificial Stupidity
Katie Deighton, "What Can’t the Internet Handle in 2022? Apostrophes", WSJ 9/29/2022:
Sybren Stüvel is an Amsterdam-based software developer with a fairly uncommon name and a surprisingly common predicament.
As he completes the tasks of daily life, computers refuse to accept his name as valid or mangle it entirely. A credit card provider rejected his moniker, a Vancouver hotel hit bumps locating his reservation—as he stood there exhausted from a nine-hour plane trip—and an airline wouldn’t let him check into a flight. “You can imagine my stress level,” he said.
While buying insurance, he said, “They asked me to confirm that my last name is indeed Stüvel.”
Well into the internet’s fourth decade, most everything is done online. Yet some names still stump machines. Computers can defeat Garry Kasparov at chess, but some systems stumble over names containing apostrophes, numbers, hyphens or letters not commonly used in English texts. […]
“I retweet a complaint at least five times a week,” said Miroslav Šedivý, a software engineer based in Vienna and the administrator of the plainly named Twitter account “Your Name Is Invalid.”
He gravitated to the topic after companies in Germany, where he formerly lived, literally gave him a bad name: they tended to lose the “Š” and the “ý” of his last name of Šedivý. He once tracked down a piece of mail addressed to “Miroslav Ediv.” […]
“Hey Disney, let’s chat for a moment about names,” a user named Leah D’Andrea-Lee tweeted to Walt Disney Co. recently, saying the company repeatedly rejected her name when she tried to renew her Magic Key reservation pass for its Disneyland theme park. “What gives?”
Ms. D’Andrea-Lee and her husband Chris Lee are frequent Disney visitors and die-hard fans who won best in show for their cosplay at the “Mousequerade” contest at the official Disney fan club’s 2022 expo.
So it’s dispiriting, said the L.A.-based actor and costume maker, when she is addressed by one of the biggest media companies in the world as “LEAH DANDREALEE,” “Leah D&#38;andrea” or, as her Magic Key came in the mail, the generic “Magic Key Holder.” […]
Among the frustrated are some Irish customers of the national carrier Aer Lingus, which ironically doesn’t accept names such as “O’Neill” and “O’Brien” when taking bookings.
Aer Lingus’s booking system, called Astral, is nearly 60 years old and doesn’t cater to special characters, an airline spokeswoman said.
“We recognize the limitations of the system with respect to accepting special characters and apologize to customers for any inconvenience caused,” she said. “As part of future systems development we will consider implementing reasonable steps to address this issue.”
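To make the validation failure concrete, here is a minimal Python sketch that contrasts a strict letters-only check with a permissive one. The strict rule is a hypothetical stand-in; we don't know what Aer Lingus's Astral or the other systems actually do. The names `strict_is_valid` and `permissive_is_valid`, and the 200-character cap, are illustrative assumptions.

```python
import re
import unicodedata

# Hypothetical strict rule of the kind the article describes:
# unaccented ASCII letters only, words separated by single spaces.
STRICT_NAME = re.compile(r"[A-Za-z]+( [A-Za-z]+)*")


def strict_is_valid(name: str) -> bool:
    return STRICT_NAME.fullmatch(name) is not None


def permissive_is_valid(name: str, max_len: int = 200) -> bool:
    """Accept any non-blank name that contains no control characters.

    Assumes the system stores text as Unicode and escapes it only when
    rendering output, so apostrophes, hyphens and accents need no filtering.
    The 200-character cap is an arbitrary illustrative choice.
    """
    stripped = name.strip()
    if not stripped or len(stripped) > max_len:
        return False
    # Unicode category "Cc" covers control characters such as newlines and tabs.
    return not any(unicodedata.category(ch) == "Cc" for ch in stripped)


# Surnames mentioned in the article (with straight and curly apostrophes).
NAMES = ["Stüvel", "Šedivý", "D’Andrea-Lee", "O'Neill", "O'Brien"]

for n in NAMES:
    print(f"{n!r}: strict={strict_is_valid(n)}, permissive={permissive_is_valid(n)}")
```

Every name in the list fails the strict check and passes the permissive one.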
As a (very limited) defense of the cited crappy software systems, we can note that standards for encoding and displaying letters outside the (mutually incompatible) ASCII and EBCDIC character sets are relatively recent — and standards for entering them don't yet exist. And updating (multiple interacting) software systems involves substantial (financial, social, and diplomatic) costs.
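The following short Python sketch shows why the encoding layer matters. UTF-8 round-trips any name. By contrast, a system that forces text into ASCII and silently discards whatever doesn't fit produces exactly the kind of "Ediv" quoted above. This is one plausible mechanism, not a claim about what the German systems actually did. `cp037` is Python's codec for one common EBCDIC code page.

```python
name = "Šedivý"

# UTF-8 can represent every Unicode character, so this round trip is lossless.
utf8_bytes = name.encode("utf-8")
assert utf8_bytes.decode("utf-8") == name

# Forcing the name into 7-bit ASCII and ignoring what doesn't fit
# drops both the "Š" and the "ý".
ascii_lossy = name.encode("ascii", errors="ignore").decode("ascii")
print(ascii_lossy)          # ediv
print(ascii_lossy.title())  # Ediv

# ASCII and EBCDIC disagree even on plain letters...
print("A".encode("ascii"), "A".encode("cp037"))  # b'A' b'\xc1'

# ...and EBCDIC code page 037 (a Latin-1 repertoire) has no "Š" at all.
try:
    name.encode("cp037")
except UnicodeEncodeError as exc:
    print("cp037 cannot encode", repr(name[exc.start]))
```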
I call this defense "limited" because the problem is widespread and serious, and companies wouldn't (I hope) put up with similar problems in systems that they're forced to really care about, like tracking their bank balances. But there are all too many examples of Artificial Stupidity in new (and allegedly improved) bureaucratic computer systems, often out-sourced to companies that supposedly know what they're doing, which subject their users to serious difficulties that force weird, time-consuming, and unsanctioned (or even illegal) work-arounds. I'd list a few from my recent experience, but I'll spare you for now.
On the other side of the limited-name equation, I'll note what a pain it is for Unix command-line apps to deal with Apple Macintosh file names, which may have internal spaces, double quotes, single quotes, apostrophes, dollar signs, hash marks, etc. In that context, I'm with Aer Lingus.
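Here is a minimal Python sketch of the command-line side, assuming a POSIX shell. `shlex.quote` makes an awkward file name safe to paste into a shell command. Passing arguments as a list to `subprocess.run` avoids shell parsing entirely. The file name is a made-up example.

```python
import shlex
import subprocess
import sys

# Hypothetical Mac-style file name: space, double quotes, apostrophe, $ and #.
filename = 'Leah\'s "draft" $5 #2.txt'

# Pasted raw into a command line, the apostrophe opens a quote that never closes.
try:
    shlex.split("ls -l " + filename)
except ValueError as exc:
    print("unquoted command is malformed:", exc)

# Option 1: quote the name for a POSIX shell.
cmd = "ls -l " + shlex.quote(filename)
print(cmd)  # ls -l 'Leah'"'"'s "draft" $5 #2.txt'

# Option 2 (usually safer): no shell at all; each list item is one argument.
# Here a child Python process simply echoes back the argument it received.
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", filename],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout, end="")
```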
Jarek Weckwerth said,
September 30, 2022 @ 10:23 am
Well, isn't this simply a manifestation of the cultural and economic hegemony of the USA? If the typewriter (and then computer) had been invented in the Czech Republic, or even Germany, it would surely include those diacritics. Not to mention China.
But that's the way the world is. (The full form of) my first name includes ł, and I just never bother with it outside its motherland ;) (Even professional linguists often don't know how to deal with it though. That annoys me no end.)
Jarek Weckwerth said,
September 30, 2022 @ 10:25 am
(Because ASCII stands for the American Standard Code for things, doesn't it?)
Y said,
September 30, 2022 @ 10:32 am
The Mac pioneered longer and more varied file names, back when DOS file names were limited to eight characters (plus a three-character extension). But to this day it excludes colons, which are reserved as path separators. If you have a lot of PDFs of papers, especially in the humanities, and you like the file names to echo the paper titles, you still have to bear this annoyance, nearly 40 years after the first Mac.
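The workaround can be automated. The Python sketch below turns a title into a file name that avoids the colon the commenter mentions, and also avoids the slash, which is the Unix path separator. `title_to_filename`, its " - " replacement and the example title are hypothetical choices.

```python
import re


def title_to_filename(title: str, ext: str = ".pdf") -> str:
    """Hypothetical helper: turn a paper title into a Mac-friendly file name.

    Colons (reserved on the Mac) and slashes (the Unix path separator)
    become " - "; runs of whitespace collapse to one space.
    """
    name = re.sub(r"\s*[:/]\s*", " - ", title)
    name = re.sub(r"\s+", " ", name).strip()
    return name + ext


print(title_to_filename("Names and Machines: A Case Study"))
# Names and Machines - A Case Study.pdf
```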
Terry K. said,
September 30, 2022 @ 10:35 am
Since computer systems, as noted, have trouble with names like O'Brien or hyphenated names, both of which we have here in the USA, no, it's not as simple as "the cultural and economic hegemony of the USA". I'm not saying there isn't a good dose of that. But it's more complex.
Oskar Sigvardsson said,
September 30, 2022 @ 10:41 am
> As a (very limited) defense of the cited crappy software systems, we can note that standards for encoding and displaying letters outside the (mutually incompatible) ASCII and EBCDIC character sets are relatively recent — and standards for entering them don't yet exist.
As a software developer, I have to take issue with this: Unicode has existed for a long time now, and the standard encoding for it (UTF-8) is 30 years old. Text encoding is notoriously hard to program correctly, but at this point there is no excuse for getting this stuff wrong, and yet it happens all the time. One thing I've noticed is that Unicode and text encoding are rarely taught to Computer Science students, so you mostly have to learn "on the job", which is not ideal.
That said, many of the problems pointed to have (essentially) nothing to do with text encoding schemes (whether ASCII, EBCDIC, one of the Unicode formats, or something else); they have to do with improper quoting and escaping, especially of HTML. Take the example of
Leah D'Andrea-Lee
If you were to run that through a generic HTML escaper, the apostrophe is replaced and it becomes:
Leah D&#39;Andrea-Lee
If you then unescape that, you get back the real name: no information has been lost. However, if you escape it AGAIN (as the string is passed between independent computer systems, such a mistake can easily creep in), the ampersand gets escaped and it becomes:
Leah D&amp;#39;Andrea-Lee
This has nothing to do with text encoding exactly: the apostrophe character is in ASCII. It has to do with how text is escaped and unescaped, which is a slightly different (but related) issue.
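The escape-twice failure can be reproduced with Python's standard `html` module. Note that `html.escape` writes the apostrophe as `&#x27;` rather than `&#39;`; the two entities are equivalent. The last line shows one hypothetical way to arrive at a form like "LEAH DANDREALEE": drop every non-letter and upper-case the rest.

```python
import html

name = "Leah D'Andrea-Lee"

once = html.escape(name)   # apostrophe becomes an entity
twice = html.escape(once)  # the entity's "&" gets escaped as well

print(once)   # Leah D&#x27;Andrea-Lee
print(twice)  # Leah D&amp;#x27;Andrea-Lee

# One escape followed by one unescape loses nothing...
assert html.unescape(once) == name
# ...but a double-escaped string needs two unescapes; one leaves debris.
print(html.unescape(twice))  # Leah D&#x27;Andrea-Lee

# A cruder "fix": keep only letters and spaces, then upper-case.
print("".join(c for c in name if c.isalpha() or c == " ").upper())  # LEAH DANDREALEE
```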
The core problem behind both encoding and escaping bugs is, of course, that people don't test for these kinds of cases. Another issue is lack of diversity on programming teams: I've personally had the experience of a new member with non-ASCII characters in their name joining the team and our own software breaking. It is a hard issue to deal with, but for systems like this, it really is unacceptable to get it wrong.
Oskar Sigvardsson said,
September 30, 2022 @ 10:44 am
Oh, the irony: your commenting system unescaped my examples, so they now just appear "correct"! When I entered them, the apostrophe was replaced with "& # 3 9;" and the ampersand with "& a m p ;" (without the spaces).
See, text processing is hard!
J.W. Brewer said,
September 30, 2022 @ 11:04 am
Because Irish-origin surnames of the O'Suchandsuch variety have long been common in the U.S. and American typewriters typically had an apostrophe key, technical inability to handle those (as opposed to handling letters adorned with umlauts and carons and whatnot) seems a bit hard to attribute to Americo-centrism. Unless maybe there was a pre-existing Anglosphere practice of omitting the apostrophes in such surnames in telegraphese, such that O'Neill WOULD BE RENDERED AS ONEILL STOP. On the other hand, I think a lot of early computer databases couldn't deal with surnames written out as more than one word, so that e.g. the presidential surname van Buren would turn into VANBUREN even though earlier generations of typesetters etc. had been able to handle the two-word variant and even variation between "van" and "Van" in such names.
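The following Python sketch shows one way around both the O'Neill and the van Buren problems, assuming the system can store Unicode. Keep the name exactly as given, and use a separate normalized key only for matching. A booking entered as "ONEILL" can then still be found from "O'Neill". `match_key` and its set of ignored characters are illustrative choices, not any particular system's rule.

```python
import unicodedata

# Characters ignored for matching: space, straight and curly apostrophes,
# hyphen and period. (An illustrative choice.)
IGNORED = set(" '\u2019-.")


def match_key(name: str) -> str:
    """Case-insensitive comparison key; the stored name is left untouched."""
    folded = unicodedata.normalize("NFC", name).casefold()
    return "".join(ch for ch in folded if ch not in IGNORED)


print(match_key("van Buren"), match_key("VANBUREN"))  # vanburen vanburen
print(match_key("O’Neill"), match_key("ONEILL"))      # oneill oneill
```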
CP said,
September 30, 2022 @ 11:08 am
A minor problem, to be sure, but my first name is often truncated to 10 letters, I’m sure because of some database limitation.
So I am regularly referred to as Christophe. An actual name, but not quite mine. Not sure if the issue applies to last names though.
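The truncation is easy to reproduce in Python, assuming a field limited to 10 characters, as the commenter guesses. The second function sketches a related trap: a field limited by bytes rather than characters. Cutting UTF-8 in the middle of a multi-byte letter leaves an invalid fragment. Both function names are hypothetical.

```python
def truncate_chars(s: str, limit: int) -> str:
    # A field that simply keeps the first `limit` characters.
    return s[:limit]


def truncate_utf8(s: str, max_bytes: int) -> str:
    # A byte-limited field: slice the UTF-8 bytes, then drop any partial
    # multi-byte character left at the end instead of raising an error.
    return s.encode("utf-8")[:max_bytes].decode("utf-8", errors="ignore")


print(truncate_chars("Christopher", 10))  # Christophe
print(truncate_utf8("Stüvel", 3))         # St  ("ü" needs 2 bytes; only 1 fit)
print(truncate_utf8("Stüvel", 4))         # Stü
```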
phanmo said,
September 30, 2022 @ 11:14 am
I run into this on a regular basis, as my daughter's first name has a diaeresis (trema) in it, as does my wife's last name.
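Letters with a diaeresis add a further wrinkle. Unicode allows the same visible name to be stored either precomposed or as a base letter plus a combining mark. The two forms compare unequal unless they are normalized. Here is a small Python illustration using a made-up name, "Zoë":

```python
import unicodedata

composed = "Zo\u00eb"     # "Zoë" using precomposed U+00EB
decomposed = "Zoe\u0308"  # "Zoë" as "e" + U+0308 COMBINING DIAERESIS

print(composed == decomposed)          # False, although both render as "Zoë"
print(len(composed), len(decomposed))  # 3 4

# Normalizing both sides (here to NFC) makes them compare equal.
print(unicodedata.normalize("NFC", decomposed) == composed)  # True

# The same lossy ASCII conversion now mangles them differently.
print(composed.encode("ascii", errors="ignore"))    # b'Zo'
print(decomposed.encode("ascii", errors="ignore"))  # b'Zoe'
```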
Sergey said,
September 30, 2022 @ 11:52 am
I remember the discussion here about Kazakhstan transitioning to Latin script, with their head of government insisting that no diacritics and no letters beyond the basic 26 be used, and people here being surprised at that. Now, this post explains the reasoning behind that decision very clearly.