Fake account spotting on Facebook
One language-related story in the British press over the weekend was that Gavin McGowan was threatened by Facebook with having his account shut down… because they said his name was fake.
About ten years ago Gavin learned some Scottish Gaelic and started using the Gaelic spelling of his name: Gabhan Mac A Ghobhainn. Facebook is apparently running software designed to spot bogus accounts on the basis of the letter-strings used to name them. Gabhan's name evidently failed the test.
One more application for computational linguistics (and not a new one): language identification. Algorithms for detecting the language of a piece of text can do quite well on even very small samples, based mainly on bigrams and trigrams of letters. (If sz is reasonably frequent in uncapitalized words in a text that also has occurrences of gy followed by a vowel, it's very probably Hungarian. That sort of thing.) But Facebook's technology apparently cannot yet recognize even common personal names like Gavin or family names like McGowan if they are spelled Gaelic-style. It's a surprising failure.
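To make the bigram-and-trigram idea concrete, here is a minimal sketch of a letter n-gram language guesser. It is a toy built on assumptions, not a description of what Facebook or any real language-identification system does: the two training samples, the function names (char_ngrams, build_profile, score, identify) and the simple summed-frequency scoring are all invented for illustration. Real systems train on far larger corpora and use more careful statistics.

```python
from collections import Counter

def char_ngrams(text, n_values=(2, 3)):
    """Yield letter bigrams and trigrams, padding each word with spaces
    so that word-initial and word-final sequences are counted too."""
    for word in text.lower().split():
        padded = f" {word} "
        for n in n_values:
            for i in range(len(padded) - n + 1):
                yield padded[i:i + n]

def build_profile(sample):
    """Map each n-gram in the sample to its relative frequency."""
    counts = Counter(char_ngrams(sample))
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()}

def score(text, profile):
    """Sum the profile frequencies of every n-gram found in the text."""
    return sum(profile.get(gram, 0.0) for gram in char_ngrams(text))

def identify(text, profiles):
    """Return the language whose profile gives the text the highest score."""
    return max(profiles, key=lambda lang: score(text, profiles[lang]))

# Hypothetical, deliberately tiny training samples (an assumption for
# illustration only; a usable profile needs much more text).
SAMPLES = {
    "english": ("the quick brown fox jumps over the lazy dog and then "
                "the cat sat on the mat with the other cats"),
    "hungarian": ("az idő pénz és egy szép nagy ház gyakran szükséges "
                  "hogy az ember gyorsan szaladjon"),
}
PROFILES = {lang: build_profile(text) for lang, text in SAMPLES.items()}

if __name__ == "__main__":
    for phrase in ("szép gyerek", "the cat and the dog"):
        print(phrase, "->", identify(phrase, PROFILES))
```

Even with profiles this small, sequences such as sz and gy pull a short Hungarian phrase toward the Hungarian profile. A name-checking filter built the same way would only be as good as the languages it had profiles for, so a language missing from the training data would look like noise.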
Of course, Gaelic is a very small minority language; but almost all of its speakers live in the big, Internet-savvy, English-speaking countries: Australia, Canada, New Zealand, the UK, and the USA. People were bound to complain.
And they did: Facebook received a petition urging clemency for Gabhan's account, with 2,000 signatures, which amounts to 3.5% of the total number of native speakers of Gaelic in the world. "This was a mistake and we apologise," said Facebook in the reply Gabhan received.
Leslie Katz said,
March 1, 2015 @ 2:11 pm
Just for fun: there was a time when the Australian immigration authorities could administer to a person wanting to enter the country a language test in any "European language", failure to pass which would lead to refusal of entry.
In the 1930s, a language test was administered to a notorious Czech Communist, which test he failed. The language on which he was tested was Scottish Gaelic.
The ultimate appellate court in Australia afterwards held that Scottish Gaelic wasn't a European language within the meaning of the immigration law.
Rebecca said,
March 1, 2015 @ 2:44 pm
Native Americans have also been caught in the Facebook fake name sting, not for letter sequence issues but for syntactic crimes.
http://colorlines.com/archives/2015/02/native_americans_say_facebook_is_accusing_them_of_using_fake_names.html
Q. Pheevr said,
March 1, 2015 @ 3:36 pm
Facebook's preoccupation with real names is yet another thing that tells me their view of the Internet is inimical to mine. That's okay, though; they can go their way, and I'll go my way, and neither of us need ever have anything to do with the other. But it does seem completely indefensible for any international Web site to have such a narrow-minded, parochial, pig-ignorant view of what a "real name" has to look like. If they genuinely care about real names, why can't they be bothered to learn anything about them?
Jon said,
March 1, 2015 @ 4:47 pm
It reminds me of the time when a Londoner with some Irish ancestry and the boring English name of John Stevens decided to join the IRA. He became Sean MacSteaofoin, if memory serves.
Micah J. Transparent-Pseudonym said,
March 1, 2015 @ 5:26 pm
I've used Facebook under this name for about seven years now. Whenever I read a story like this, it makes me wonder exactly what algorithm they could possibly be using, which catches all these false positives and yet also has false negatives which are that… transparent.
Mimi said,
March 1, 2015 @ 5:41 pm
Facebook also apologized for trying to force members of the LGBTQ community to use their legal names:
https://www.facebook.com/chris.cox/posts/10101301777354543
Matt said,
March 1, 2015 @ 7:27 pm
>But Facebook's technology apparently cannot yet recognize even common personal names like Gavin or family names like McGowan if they are spelled Gaelic-style. It's a surprising failure.
The stories talk about "capital letters and apostrophes," but McGowan's name as reported doesn't seem to use either of those in a way that would surprise an English speaker. I wonder if it wasn't the standalone "A" that set off Facebook's filter — the part of the logic that was designed to catch fake names like "Love A Duck" and so on.
S Frankel said,
March 1, 2015 @ 8:40 pm
What is that disembodied A, anyway? The amount of Gaelic I know would fit comfortably onto the eyelash of a very small gnat, but I thought that Mac (son) is usually followed by the family name, in the form of a patronymic in the genitive. That "A" looks like a vocative particle (followed by the right case).
Jonathon Owen said,
March 1, 2015 @ 9:47 pm
Mac is usually followed by a family name, but not always. Mac a Ghobhainn means "son of the smith", Mac an t-Saoir (McIntyre) means "son of the carpenter", and I'm sure there are others. The disembodied a/an is the definite article.
S Frankel said,
March 1, 2015 @ 10:34 pm
Thanks, Jonathon Owen. So the "A" shouldn't be capitalized. That's probably not why Facebook choked, though.
[It does look strange to see it capitalized; but I kept the capitalization because Gavin himself spells his name that way, as can now be seen from the Facebook page he has been allowed to maintain.—GKP]
And is the name "Taggart" from "Mac an t-Sagairt" (son of the priest)?
mollymooly said,
March 2, 2015 @ 5:07 am
John Stephenson at least had an Irish mammy, whereas Alfred Michael Wilmore was English of the English; his reinvention as Micheál Mac Liammóir was his greatest performance.
I wonder how much harder it is for a machine to recognise a language if it also has to allow for mistakes common among L1 (let alone L2) speakers.
maidhc said,
March 2, 2015 @ 5:38 am
S Frankel: Yes, it is. As the proverb has it (Scottish Gaelic version):
Is e 'leanabh fhéin a's luaithe 'bhaisteas an sagart.
The priest baptises his own child first.
The Celtic church, like the Orthodox church today, did not require priests to be celibate. Some people say that celibacy became mandatory for Latin Church priests only in the eleventh century.
The proverb and the name could well go back earlier than that. Some Scottish surnames can be traced back to the period when the Vikings were settling in.
I had a friend whose father was very Scottish nationalist, so his birth certificate and thus his passport gave his name in the Gaelic version. However all his other ID had the English version of his name. Since he was a US resident, it caused him a lot of problems crossing borders.
There's a bit more backstory to this. In the days when Ireland was ruled by the English, the Post Office would not deliver mail that had an address in Irish. Also, there was a law that tradesmen had to display their business name on their cart. However, Irish names were not accepted and the tradesman would be fined. Similar petty annoyances occurred all the time back then.
A quaint story of times past, perhaps, but notice that when the Olympic torch was to begin its progress towards London from Cornwall, the British government stepped in to paint over all the Cornish language signs that might appear on television along the route.
pj said,
March 2, 2015 @ 7:36 am
@maidhc
This was news to me, so I searched for more information. I can only find reference to the lettering on the arch at Land's End being changed shortly before the torch relay. Do you know of other changes?
J.W. Brewer said,
March 2, 2015 @ 9:55 am
Re Stephenson et al, I would think it's one thing e.g. to use the spelling MacDhòmhnaill if your birth certificate says McDonald, but rather another if your birth certificate says Donaldson (or O'Donnell or FitzDonald or Donaldovich or what have you). One reflects different orthographic/pronunciation approaches to the same underlying lexeme; the other involves calquing/translation.
The odd thing about this is that you would think that if people were setting up facebook accounts under fake names with bad intent (e.g. so they could post abusive or fraudulent content w/o attribution) they would often and perhaps usually pick *plausible* fake names, i.e. letter strings that might well be an actual not-too-exotic personal name in the relevant-in-context language/culture, just not one easily traceable to the individual using it.
S Frankel said,
March 2, 2015 @ 10:04 am
Thanks to maidhc for the thoughtful answer. Repression of Celtic languages in the UK isn't all old news. My first Welsh teacher remembered being beaten at school for speaking Welsh to his schoolmates.
About Facebook: The "real name" policy is there because "real" names are more valuable to advertisers. That's one problem (they want to monetize your identity). The other problem is that they don't want to spend any more money than they have to to make the advertisers feel like they're getting a good deal, so all the verification is done automatically. They don't hire any actual human beings to take care of problems. This is why there are stories of people sending in copies of their birth certificate or driver's license, only to be met with stony silence. It's not worth Facebook's time or money to set up a system that will fix things.
We're not Facebook's customers. The advertisers are the customers. We're Facebook's product.
Terry Collmann said,
March 2, 2015 @ 11:34 am
mollymooly – it always amuses me that "Liammóir" can be translated as "Big Willy".
wally said,
March 2, 2015 @ 11:40 am
Sometimes I amuse myself when using Google Translate by using the Detect Language feature to see how quickly it can detect the intended language. Often it only takes a word or two for the program to figure it out. Since Hungarian was mentioned I tried typing in the simple proverb "Az ido penz" (the time [is] money). I only got as far as the first two characters before it knew it was Hungarian. I have probably looked at Hungarian translations in the past so I don't know if the system was primed to notice Hungarian for me.
S Frankel said,
March 2, 2015 @ 11:50 am
Google Translate just detected Hungarian when I typed "az" so it wasn't just you. It detected Czech for "bez" (I was thinking Polish, but presumably that could get adjusted down the line).
But there are some weirdos. Cz detects Czech (I don't think that digraph is used in the language). Following with a, e, o, or u detects Polish (fine), but following with i detects Indonesian (un-possible).
Ll detects English, Lla detects Finnish (!), and filling it out to Llama detects Spanish.
Word-fragment weirdos seem the rule, not the exception.
"Bravo" is French.
Marek said,
March 2, 2015 @ 12:03 pm
>Cz detects Czech (I don't think that digraph is used in the language).
This might be because it treats "cz" on its own as a single token, which would occur in Czech corpora in reference to the country code or internet domain. On the other hand, the Polish digraph "cz" only occurs within other words. The algorithm probably looks both at word frequencies and ngram frequencies.
S Frankel said,
March 2, 2015 @ 12:42 pm
@ Marek – that sounds reasonable, although it doesn't know what to do with the "Cz" in the context of a full sentence. It just leaves it untranslated.
Similarly, the "czi" for Indonesian just remains as an untranslated lump in the middle of an Indonesian sentence
J. W. Brewer said,
March 2, 2015 @ 2:02 pm
Personal names are so various (especially in the modern US and to lesser extent UK because of multiplicity of different ethnic backgrounds and naming traditions present in the societies) that it makes my head hurt to think about how one would design an algorithm that would semi-reliably say "oh that one's unlikely to be real." Even for weird-looking letter sequences, you have problems like needing to formulate a rule like "essentially no English words begin with a doubled f, except for certain surnames (e.g. Fforde, which has a half-dozen bearers of sufficient notability to have their own wikipedia articles)" and I would expect that that's not the only except-for-proper-names exception to such letter-string patterns, to say nothing of the fact that it's not uncommon for Americans to have surnames with letter strings ("szcz," for example) that never ever occur in "domestic" English-origin names (and maybe the ff-initial names are ultimately of Welsh origin or something, but long assimilated into the rest of Britain?). I would like to see a list of names actual (if pseudonymous) people used to set up actual accounts that were in fact fake and were in fact flagged by whatever algorithm they're using — and perhaps compare it to a list of false positives wrongly flagged by the same algorithm.
It's one thing to say that names that are in some sense orthographically odd or non-standard for the relevant culture (specifically in terms of spacing, diacritical marks, capitalization etc) may be reduced into some sort of dumbed-down standardized format by the record-keeping protocols of the government or other large institutions (no accents, no umlauts, no tildes, possibly ALLCAPS so your own idiosyncratic views as to what should or shouldn't be lowercase are overridden completely) so you won't get a birth certificate or driver's license or passport that perfectly reflects your own subjective preferences in that regard. But I don't think that's what's going on here.
S Frankel said,
March 2, 2015 @ 2:24 pm
Thus Fowler (who, although not always right, is nevertheless a good read):
"In old manuscripts the capital F was sometimes written ff. This is the origin of the curious spelling of some English surnames: ffolliot, fforde, ffoulkes, ffrench, and others. The distinction of possessing such a name is naturally prized; readers of *Cranford* will remember Mrs. Forrester's cousin Mr. ffoulkes who always looked down on capital letters and said they belonged to lately invented families; and it was feared he would die a bachelor until he met a Mrs. ffaringdon and married her, 'and it was all owing to her two little ffs'."
Obviously those surnames are not of Welsh origin, although the Celtic linguist (and one of my teachers) Robert Fowkes was, but his name was probably from English "folks" (according to him).
B Slade said,
March 2, 2015 @ 4:35 pm
The "unusual" bigrams/trigrams of letters here would seem to significantly overlap with Indo-Aryan languages like Hindi, so I'm surprised, as I don't think Facebook generally tags Indian accounts as "fake".
Boursin said,
March 2, 2015 @ 4:40 pm
-lla ~ -llä is the suffix indicating the Finnish adessive case – which corresponds, depending on context, to several different prepositions such as "at", "on" or "with". It's never used by itself, but there are probably enough pages about Finnish case endings on the Web to make this a non-mystifying identification.
S Frankel said,
March 2, 2015 @ 4:47 pm
That's it! I tried some other Finnish case endings (tta, ssa, lla), and they're all detected as Finnish.
Jim said,
March 2, 2015 @ 6:47 pm
Noticing the list of "Internet-savvy, English-speaking countries"… wouldn't, um, Ireland also have a significant chunk of the Gaelic speakers in the world?
Ian M said,
March 2, 2015 @ 6:50 pm
Rebecca, thanks for that link about the Native Americans locked out of Facebook. What I thought was ironic was the customer service message at the end from a mere "Harvey" – no surname. So Facebook can control your use of your own name but not supply you with its customer service people's names.
Suburbanbanshee said,
March 2, 2015 @ 7:02 pm
Irish Gaelic (usually Gay-lik) and Scottish Gaelic (usually Gal-lik) have significantly different spelling, grammar, and pronunciation. They are different languages.
(And then there's Manx and such.)
jon livesey said,
March 2, 2015 @ 8:21 pm
Not to spoil a good story, but if you try to track down the story about the "British Government" painting over Cornish names, you end up at this story in The Cornishman.
http://www.cornishman.co.uk/Cornish-removed-sign/story-16172106-detail/story.html
They suggest that exactly one sign was painted over in the course of general renovations, and apparently the motive was to replace an English/Cornish sign with a sign with more European languages.
Now I need to get off and beat more children for speaking Welsh. Which will be odd, given that my own nephews and nieces learned Welsh in Chester, which is actually in England, but close to the Welsh border.
S Frankel said,
March 2, 2015 @ 8:52 pm
@jon livesey – If you don't know about the history of repression of the Celtic languages, there are many places to start. Here's a quickie, for Welsh: http://en.wikipedia.org/wiki/Welsh_Not
It was the official policy of the British government from 1848 (following a report in 1847 http://en.wikipedia.org/wiki/Treachery_of_the_Blue_Books) to stamp the language out. They had a great deal of success with their program.
My first Welsh teacher went to school in south Wales in the 1930s when speaking Welsh was a punishable offense.
Piyush said,
March 3, 2015 @ 12:38 am
@B Slade
The reason Facebook does not flag names in Indian languages as "fake" has probably got to do with the fact that India is their second largest market in terms of number of users, outnumbering Brazil by a factor of more than two.
Aside from the fact that they do not want to unintentionally antagonize a large fraction of their users, this also means that they have a much larger database of Hindi or Tamil or Bengali names than they have of Gaelic names or names in one of the Native American languages. Another factor might be that Facebook almost certainly has more engineers on its staff who can speak Hindi or Tamil or Bengali than those who understand Gaelic.
J. W. Brewer said,
March 3, 2015 @ 1:03 pm
Regardless of the earlier history, for the last several decades in the UK there has been money flowing (I believe some of it from Brussels not London) for projects related to preserving/promoting Welsh and Scottish Gaelic (Cornish and Manx may lack the same formal status), as well as providing various symbolic benefits to those languages. Indeed current UK naturalization law says that foreigners desiring to become citizens don't have to demonstrate fluency in English if they can do so in Welsh or Scottish Gaelic (although I expect that very few applicants take advantage of that option). Similarly, if you have the sort of corporate business that would typically be named XYZ Ltd. you have the option (if in Wales) of naming it XYZ Cyf. (for "cyfyngedig"). So although obviously facebook is a private-sector actor that's part of what makes this particular incident especially peculiar.
Natalie Solent said,
March 3, 2015 @ 2:11 pm
Like several other commenters, I had difficulty believing that in these days of political correctness the story as described by maidhc, namely that "when the Olympic torch was to begin its progress towards London from Cornwall, the British government stepped in to paint over all the Cornish language signs that might appear on television along the route", was accurate.
In general the "British Government" these days is earnestly anxious to preserve minority languages. In my opinion it is a good deal too fond of doing this by compulsion rather than persuasion, but in itself it is a worthy objective. Furthermore, one of Cornwall's main industries is tourism and local government is very aware that signs in Cornish add to the interest of Cornwall as a tourist destination. As an example of the positive official attitude to the language, here is a link to the Cornwall Council website in which a declaration about the Cornish gaining official recognition as a minority is translated into Cornish. Someone has been to a fair bit of effort to do that, given that only a few hundred people have Cornish as their main language and all of them speak English as well.
Here is the link (scroll down for the translation into Cornish):
http://www.cornwall.gov.uk/community-and-living/equality-and-diversity/cornish-minority-status/
Speakers of British indigenous languages, like their Irish-speaking counterparts, may well be patronized these days but they are not oppressed.
Jonathon Owen said,
March 5, 2015 @ 6:22 pm
@Piyush: Wrong kind of Indian. The controversy is about Native Americans not being allowed to use their real names because Facebook thinks they're fake. Here's a link.
Piyush said,
March 6, 2015 @ 4:26 pm
@Jonathon Owen
I know. I was responding to the comment by B Slade who wondered why Facebook does not similarly flag (as you say) the "wrong" kind of Indians.