Phrase Detectives


Massimo Poesio writes:

Phrase Detectives is a game-with-a-purpose designed to gather data about anaphora. We put online about 1.2 million words – half Wikipedia, half fiction from Project Gutenberg (the plan is to make all the data freely available through LDC and the Anaphoric Bank), and ask our players to tell us what an anaphoric expression refers to, or to check what other ‘detectives’ have done. The game collects 8 judgments for every anaphoric expression, and each interpretation is validated by 5 other players, so that the data can also be used to study disagreements in anaphoric interpretation. We have collected over 700,000 anaphoric judgments in this first year and around 300,000 validations, and we’d like to complete the annotation of the first 1 million words before moving on to release 2 of the game (as you’ll see if you play, there are several limitations), so we started a competition – $500 to whoever gets the most points in January – to double the number of players (we have around 1500, it would be nice to get to at least 3000).

As Massimo suggests, the goal is to create a large text corpus annotated for anaphora and coreference.  Annotations of this kind are used by linguists to determine the norms of language structure and use, by computational linguists to train and test their programs, and by psychologists to develop and test hypotheses about the mechanisms of human language processing.
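
As a rough illustration of the scheme Massimo describes (several judgments per anaphoric expression, each interpretation then checked by other players), here is a minimal Python sketch of how such data might be tallied. The record layout, the example data, and the scoring rule (one point per judgment or agreeing validation, minus one per disagreeing validation) are all my own assumptions for illustration, not the project's actual method.

```python
from collections import Counter, defaultdict

# Hypothetical judgments: (markable, proposed antecedent), one per player.
judgments = [
    ("she", "Mary"),
    ("she", "Mary"),
    ("she", "Alice"),
    ("she", "Mary"),
]

# Hypothetical validations: (markable, proposed antecedent, validator agrees?).
validations = [
    ("she", "Mary", True),
    ("she", "Mary", True),
    ("she", "Alice", False),
]


def score_interpretations(judgments, validations):
    """Return {markable: Counter({antecedent: score})}.

    Assumed rule: each judgment and each agreeing validation adds 1,
    each disagreeing validation subtracts 1.
    """
    scores = defaultdict(Counter)
    for markable, antecedent in judgments:
        scores[markable][antecedent] += 1
    for markable, antecedent, agrees in validations:
        scores[markable][antecedent] += 1 if agrees else -1
    return scores


scores = score_interpretations(judgments, validations)
for markable, counter in scores.items():
    # Every interpretation is kept, so disagreements stay visible in the data.
    print(markable, counter.most_common())
```

Keeping every interpretation with its score, rather than only the winner, is what would let the same data serve the study of disagreements in anaphoric interpretation.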

In a well-ordered universe, such corpora would also be of interest to those who develop usage advice. A couple of years ago, I discussed some earlier work of Massimo’s in that connection (“A test kitchen for stylistic recipes”, 6/1/2008), though I don’t think that release 1 of Phrase Detectives deals with discourse deixis.

Anyhow, I urge everyone to participate in crowd-sourced linguistic annotation of this kind.

And shouldn’t there be some way to make things like this part of the educational curriculum, so that students could learn about grammar while simultaneously contributing to new research?



19 Comments

  1. Tom Saylor said,

    January 16, 2010 @ 2:48 pm

    I have to say I found the game’s questions so oddly phrased that I couldn’t be sure what they were asking. They seem to confuse use and mention, asking whether a highlighted phrase (like “the boy”) was previously *mentioned* when what they mean to ask, it seems, is whether the highlighted phrase is used to refer to something previously mentioned.

  2. Rosie Redfield said,

    January 16, 2010 @ 3:16 pm

    I agree with Tom. I kept getting answers wrong because I couldn’t figure out what they were asking about, or because none of the choices seemed correct.

    The button labels are confusing too, and the exclamation marks should be ditched. (Could they honestly believe that using exclamation marks makes people feel enthusiastic?)

  3. Jerry Friedman said,

    January 16, 2010 @ 3:45 pm

    The instructions could be a lot better, but I eventually figured out what to do in most cases. However, in situations where the correct answer is “hidden” inside another phrase, I still don’t know what to do, so I’ve been skipping those.

  4. John Cowan said,

    January 16, 2010 @ 4:33 pm

    Yes, the hidden-phrases dialog box pops up, but selecting a specific phrase does nothing.

  5. micah said,

    January 16, 2010 @ 4:42 pm

    At least on my browser, it only looks like it does nothing; if I click on it and mouse over the overarching phrase again, I can see that it was actually selected.

  6. MJ said,

    January 16, 2010 @ 5:37 pm

    I really can’t imagine that the strange way they have of phrasing these questions is more intuitive to the untutored than not conflating use and mention. Phrases aren’t properties of other phrases, and phrases don’t refer to phrases, and no one so far as I know talks as though they are or they do. Why go out of your way to phrase things incorrectly and counterintuitively?

  7. Massimo Poesio said,

    January 16, 2010 @ 6:13 pm

    Thanks much Mark for the advertising – we need all the players we can get! To the many people who have left comments complaining about our use of the term ‘phrase’ – I sympathize!! I myself shudder every time I read the instructions … But please don’t let that put you off. The problem is finding a way of phrasing the instructions that non-linguists would understand. When we started testing our interface at Essex, none of the subjects (ok, Essex students) could understand the ‘entity’ (let alone, ‘discourse entity’) / mention terminology that we were using, as in

    Mark the previous mention of the same entity

    so we tried to find another way of phrasing the instructions and what we have now seemed to be the most clear – which is not to say that it’s perfect! We’d be delighted if some of you could find a clearer way of phrasing it! If you do like to play, just think about finding the nearest mention of the same entity. (And yes, there are lots of other problems, as with discourse deixis, nested mentions being difficult to catch … we hope to improve this, we do want to eventually annotate 100M words!)

    Again thanks to all

  8. Sili said,

    January 16, 2010 @ 6:17 pm

    Now I feel silly for not having done my research better.

    But I’m thrilled to know that public boredom is being harvested in the name of linguistics as well as astronomy.

    I have to say, though, that this is the first I hear of this project, while Galaxy Zoo was on several of the space/science related blogs I follow. Perhaps they could ask Chris Lintott & al for advice about PR.

    But their orange mascot is cute!

  9. Jerry Friedman said,

    January 16, 2010 @ 8:08 pm

    How would you phrase it correctly and intuitively? Something like, “There may be several phrases that refer to the same thing as the highlighted phrase, in which case you should select the one closest to the highlighted phrase.”?

    Anyway, people do often talk about phrases referring to other phrases, as here:

    Often a pronoun takes the place of a particular noun. This noun is known as the antecedent. A pronoun “refers to,” or directs your thoughts toward, its antecedent.

    A reflexive pronoun refers back to the subject of a sentence….

         I learned a lot about myself at summer camp. (Myself refers back to I.)

    I think this sort of thing is common.

  10. fs said,

    January 17, 2010 @ 5:15 am

    Is it just me or is cataphora not accounted for? I’m trying to link “they” to “the poor girl” (and something earlier in the text) in “The next morning, as they were going through the dark gate, the poor girl looked up at Falada’s head, and cried:”, but it seems I’m not allowed to click “the poor girl” in this situation.

  11. tutu said,

    January 17, 2010 @ 7:50 am

    Yes, only previous mentions can be selected, but that isn’t explained. I still don’t get what a “property” is supposed to be. And when a phrase refers to an entire sentence there’s no way to mark that.

  12. Tom Saylor said,

    January 17, 2010 @ 9:24 am

    @Jerry Friedman

    I think there are two senses of “refer to”: one in which a linguistic entity is said to refer to a nonlinguistic entity (as in “Barack Obama” refers to the President of the U.S.) and another in which a pronoun is said to refer to its antecedent. It’s used in the latter sense in the instance that you cite, and I don’t think anyone has a problem with that. But there isn’t any established use of “refer to” in which (1) an ordinary noun phrase is said to refer to another noun phrase (as in “Barack Obama” refers to “the President of the U.S.”) or (2) a pronoun’s antecedent is said to refer to the pronoun (as in “the boy” refers to “him”). I think it’s the game’s use of (1) and (2) that people are objecting to.

  13. Massimo Poesio said,

    January 17, 2010 @ 1:38 pm

    @fs – yes, you can only mark anaphora. The way we suggest doing it is by marking cataphoric reference as anaphora and then we’ll try to identify cataphora in the post-processing phase. E.g., in

    if she will still be around in one year, Mary …

    (with she being the first mention of discourse entity Mary), just mark ‘she’ ‘never mentioned before’ and ‘Mary’ as having ‘she’ as the previous mention.

    @tutu, re Properties: sigh! What we were trying to do was to distinguish between noun phrases that introduce or refer to discourse entities, and noun phrases that have to be interpreted predicatively. E.g., in

    My father is a policeman

    ‘My father’ updates the discourse model by introducing a new discourse entity, whereas ‘a policeman’ just expresses a property of ‘My father’. Our suggestion: ask yourselves if that noun phrase can be viewed as introducing a new entity or referring back to an already introduced entity.

  14. Massimo Poesio said,

    January 17, 2010 @ 1:40 pm

    @Tom Saylor

    yes, phrases do not refer to other phrases, so people who object to the terminology we use in the game are absolutely right – I wish we had found a better way of explaining it but this seemed to be what our subjects found clearer!

  15. Nathan Myers said,

    January 17, 2010 @ 8:32 pm

    If even professional linguists cannot find ways to express themselves clearly, are we all doomed? I wonder if many of the people who become linguists do because they find it so hard to communicate.

  16. Grep Agni said,

    January 18, 2010 @ 10:49 am

    I recently changed computers and didn’t port this particular bookmark. When I last played, what really bothered me was the bizarre ‘phrases’ I was supposed to mark up. I was once asked about an opening parenthesis! More usually the automatic parser simply identifies things that aren’t really constitutive phrases. For example, an article on Wikipedia about thumb twiddling* contains the sentence “Most people tend to twiddle their thumbs in the direction where the thumb currently at the top goes towards the fingers.” I believe I was asked to mark the phrase “the top goes towards the fingers”.

    There is a way to mark these errors, but it’s clumsy and worth very few points, especially considering how much time and effort it takes.

    *Not the most boring thing I’ve ever read, but close.

  17. Massimo Poesio said,

    January 18, 2010 @ 5:34 pm

    @ Grep Agni

    ah, yes! Sorry about the bad markables, but unfortunately identifying the markables by hand is not really feasible, so we have to use an automatic parser, which mostly does OK but occasionally gets confused. We ourselves spend most of our time ‘behind the scenes’ trying to correct these problems but it’s slow work of course. Our suggestion is to leave a comment when someone finds those markables, skip them, and then we’ll go fix them – but alas this is all done on a voluntary basis as there are no points for that (we really need to find a way for scoring those suggestions!)

    Massimo

  18. svan said,

    January 18, 2010 @ 11:35 pm

    I agree with many of the above comments, but I also have to say I just honestly didn’t find this game remotely *fun* (and I assume that was the intent). And I am one to normally enjoy language-y games.

  19. Sarl said,

    July 14, 2011 @ 2:05 pm

    @Massimo Poesio,

    Properties: sigh! What we were trying to do was to distinguish between noun phrases that introduce or refer to discourse entities, and noun phrases that have to be interpreted predicatively.

    I’ve been playing this game for a while now and I wish I’d read this before. In the “detectives conference” questions, which I understand to be “correct the other guy’s annotation” mode, I notice that other annotators have widely ranging opinions on what a property is supposed to be. I’ve wrongly annotated them quite a few times as well, based on my varying idea of what they’re supposed to be.

    I urge you to rethink your explanation of properties and maybe change their name. I wish you and your project all the best (more linguistic data is good!) but I fear the worst for the quality of your property explanation.

    (PS.: Is the concept of a property explained anywhere in your papers?)
