Language Log

Just add "intelligent" and "informed" …

October 26, 2010 @ 4:11 pm · Filed by Mark Liberman under This blogging life

Mouseover title: "And what about all the people who won't be able to join the community because they're terrible at making helpful and constructive co– … oh."

17 Comments

Scott said,

October 26, 2010 @ 4:30 pm

It will be a crime against irony if this post doesn't attract trolls…
Don Sample said,

October 26, 2010 @ 4:50 pm

The spammers will just train their bots to rate other spam-bot posts as constructive, and the whole thing will collapse.
Oskar said,

October 26, 2010 @ 4:55 pm

Being a software engineer, I think this is vastly more difficult to implement than xkcd makes it seem. First of all, most commenters would be loath to spend ten minutes rating other comments when they just want to type something and move on.

Also, there's a rather easy way to get around this. Just have bots rate the comments of other bots. It would be trivial to modify a spambot to simply do this in addition to making comments.

There are also more subtle issues: what if a rater interprets his task of evaluating a comment for "constructiveness" as evaluating the comment's other qualities? What if he simply disagrees with the comment?

That said, this is essentially the system that Slashdot pioneered, and Digg and Reddit modified for their own purposes. But those sites spent years tweaking the system to get it right, and it's still not perfect. It's a bit much to ask of an owner of a blog.

(and in case you're wondering, I have years of experience pounding jokes into submission, totally depriving them of any comedy by absurd over-analysis. Isn't that what we do here on Language Log?)
whoever said,

October 26, 2010 @ 5:05 pm

Heh, #1 it's a joke, but #2, it wouldn't collapse so easily. Captcha itself already runs in this evolutionary/self-regulating status, and doesn't suffer from false positives. I.e. Captcha expands its pictures and word combos continuously by soliciting user feedback about what words are present in which pictures.

There isn't a false positive collusion strategy going on between spammers for various reasons, i.e. lack of collusion between spammers, real people outnumbering fakes, and interventions… without being too specific which reason(s) stop user policing from breaking the system, it's sufficient to say it doesn't actually do so… presumably this system would work the same, for the same reasons.

It's easier for AI and third world laborers to manufacture true positives than to overwhelm the self policing with false ones.

When talking about approving a potential user's comments, it implies a trusted user system (like an entrenched forum or commentary community), which makes it unlikely any spammers would ever be given a chance to upvote other spammers… since currently active communities are already manually policed, and unlikely to have trusted spam accounts. These could be rooted out anyway, and a preponderance of spam accounts makes spam commentary useless anyway.
Jonathan Badger said,

October 26, 2010 @ 5:42 pm

Also, there's the problem that the "popular answer" may not be the correct one — if you've ever seen "Slashdot" (which as previously mentioned, has a community moderation system similar to what is being proposed), you'll know that things that are dead wrong get modded up to +5 Informative, and quite often true things get modded as "Flamebait" if they go against the grain of whatever the crowd has decided.
D.O. said,

October 26, 2010 @ 6:39 pm

But how do we train original posters to be, you know, … ?
fs said,

October 26, 2010 @ 7:15 pm

whoever: Perhaps you are referring to reCAPTCHA. CAPTCHA is just a generic term for such systems – it's a rather forced acronym of "Completely Automated Public Turing test to tell Computers and Humans Apart".
Rodger C said,

October 26, 2010 @ 10:08 pm

Am I the only person here who's gotten captchas in Greek lately? The kind that seem to be lifted from pages, of course. Had to reload.
Lori said,

October 27, 2010 @ 4:56 am

I'm getting reCaptchas in a lot of scripts. But I haven't been able to figure out whether it accepts the real thing (Unicode), a transliteration, the same character without diacritics, etc. Or if I need to use the superscript 2 character for a superscript 2. Or if I can leave out the punctuation marks. That depends on what most other people enter when they see weird characters, which turns captchas into a test for crowdguessing. My guess is, US-ASCII only and drop everything non-alphanumeric. I'm surely not representative in memorizing ALT-PLUS-hex key combos for special characters.
SeanH said,

October 27, 2010 @ 6:40 am

First of all, most commenters would be loath to spend ten minutes rating other comments when they just want to type something and move on.

Surely this is a feature, not a bug? Drive-by commenting should be discouraged.
Dan Lufkin said,

October 27, 2010 @ 5:45 pm

This sounds like a job for Mark V. Shaney
Disfraz said,

October 27, 2010 @ 6:03 pm

@Lori: IIRC reCAPTCHA is built on Google Books, and you actually only need to get one word right. One of the two is a word that Google's OCR identified, and the other is one that it had trouble with. The foreign scripts turn up in the second category, but as long as you get the OCR-positive word right then you only need a vague approximation of the other one.
(Or, just hit refresh and stop sweating the little things.)
Jerry Friedman said,

October 27, 2010 @ 6:48 pm

Speaking of overanalyzing jokes, CAPTCHA appears to be about ten years old. That's not old enough to be "traditional" in my book.
Andrew said,

October 28, 2010 @ 1:15 am

@Rodger C: would that be phi beta captcha?
W. Kiernan said,

October 28, 2010 @ 2:29 pm

The mouse-over, which I missed when I first read the comic, is about the Voigt-Kampff test, the hook of "Do Androids Dream of Electric Sheep?"
Sili said,

October 29, 2010 @ 6:04 pm

Being a software engineer, I think this is vastly more difficult to implement than xkcd makes it seem.

Exactly.

Once those bots have been made, we've created proper AI.

At least that's what I took to be the joke.

With reCaptcha for instance, any successful attempt at gaming the system will be a net good, since it will have improved our OCR technology. For now reCaptcha deals with words that have not could* be read automatically. Any success with automating the process can be harnessed for good.

* I know this is not grammatical, but I find it impossible to catch the meaning I want otherwise.
D. Sky Onosson said,

November 1, 2010 @ 2:37 pm

@ Sili:

Maybe have not been able to be read automatically?
