Coherence Quiz answers

As promised, the results of yesterday's little experiment on "Coherence of sentence sequences" are here.

A tabular summary:

Question  Correct     Wrong
1         166 (98%)   4 (2%)
2         135 (80%)   33 (20%)
3         167 (99%)   2 (1%)
4         158 (93%)   12 (7%)
5         113 (67%)   56 (33%)
6         152 (90%)   17 (10%)
7         165 (97%)   5 (3%)
8         115 (68%)   55 (32%)
9         169 (99%)   1 (1%)
10        167 (98%)   3 (2%)
11        163 (96%)   7 (4%)
12        137 (81%)   32 (19%)

So the survey respondents (as a whole) guessed the original order of all twelve sentence-pairs correctly — though the margins varied from 2-to-1 to 99-to-1. The overall percent correct was 89%, though of course that percentage will depend on the particular mix of examples.

(The counts don't all sum to the same row-wise value because a couple of participants left some answers blank — there's probably a way to get Qualtrics to prevent that, but I didn't figure it out in time…)
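For anyone who wants to check the arithmetic, here is a minimal sketch that recomputes the summary figures from the raw counts in the table. The function and variable names are illustrative assumptions, not part of the original analysis.

```python
# Counts copied from the table above: question -> (correct, wrong).
counts = {
    1: (166, 4), 2: (135, 33), 3: (167, 2), 4: (158, 12),
    5: (113, 56), 6: (152, 17), 7: (165, 5), 8: (115, 55),
    9: (169, 1), 10: (167, 3), 11: (163, 7), 12: (137, 32),
}

def summarize(counts):
    """Return overall fraction correct, per-question odds, and row totals."""
    total_correct = sum(c for c, _ in counts.values())
    total_answers = sum(c + w for c, w in counts.values())
    # Odds of a correct answer, computed from raw counts (not rounded percentages).
    odds = {q: c / w for q, (c, w) in counts.items()}
    row_totals = {q: c + w for q, (c, w) in counts.items()}
    return total_correct / total_answers, odds, row_totals

overall, odds, row_totals = summarize(counts)
print(f"overall correct: {overall:.1%}")
print(f"narrowest margin: {min(odds.values()):.1f}-to-1")
print(f"widest margin: {max(odds.values()):.0f}-to-1")
print(f"distinct row totals: {sorted(set(row_totals.values()))}")
```

Computed this way, the overall figure rounds to 89%, and the row totals come out as 168, 169, or 170 because of the blank answers. From the raw counts the widest margin is 169-to-1 (Question 9); the 99-to-1 figure corresponds to the rounded percentages.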

One of the cases that people had trouble with, Pair 5, comes from a NYT story from 7/8/1998, where the second sentence is the first of three specific instances of the generalization stated in the first one:

Pay no attention to the soaring stock price of Rambus Inc., which has shot up 54 percent in the past two weeks.

Geoff Tate, the president and chief executive of Mountain View-based Rambus, shrugs off the rise as Wall Street's odd response to "old news."

"We try to ignore the stock price when it goes up or down," said Tate, 43. "When our stock price goes up, I am quick to remind everybody here, `This is great, but don't think this means we got smarter today. We're the same people we were before."'

Rambus is an anomaly in many ways. In a chip industry that's hitting a rocky spell, Rambus is on a roll. Rambus doesn't even make anything; it licenses its technology to chipmakers, and encourages computer makers to use chips made with Rambus technology.

In a stock market that rewards companies for short-term earnings, Rambus keeps its eye on the horizon. Yet unlike other companies whose stocks levitate on promises of future profits — firms like Yahoo and spring to mind — Rambus operates in the black, and has ever since its initial public offering last year.

The other troublesome case, Pair 8, comes from a passage in chapter 12 of Wuthering Heights, where Mrs. Linton is enacting insanity:

She could not bear the notion which I had put into her head of Mr. Linton’s philosophical resignation.  Tossing about, she increased her feverish bewilderment to madness, and tore the pillow with her teeth; then raising herself up all burning, desired that I would open the window.  We were in the middle of winter, the wind blew strong from the north-east, and I objected.  Both the expressions flitting over her face, and the changes of her moods, began to alarm me terribly; and brought to my recollection her former illness, and the doctor’s injunction that she should not be crossed.  A minute previously she was violent; now, supported on one arm, and not noticing my refusal to obey her, she seemed to find childish diversion in pulling the feathers from the rents she had just made, and ranging them on the sheet according to their different species: her mind had strayed to other associations.

‘That’s a turkey’s,’ she murmured to herself; ‘and this is a wild duck’s; and this is a pigeon’s.  Ah, they put pigeons’ feathers in the pillows—no wonder I couldn’t die!  Let me take care to throw it on the floor when I lie down.  And here is a moor-cock’s; and this—I should know it among a thousand—it’s a lapwing’s.  Bonny bird; wheeling over our heads in the middle of the moor.  It wanted to get to its nest, for the clouds had touched the swells, and it felt rain coming.  This feather was picked up from the heath, the bird was not shot: we saw its nest in the winter, full of little skeletons.  Heathcliff set a trap over it, and the old ones dared not come.  I made him promise he’d never shoot a lapwing after that, and he didn’t.  Yes, here are more!  Did he shoot my lapwings, Nelly?  Are they red, any of them?  Let me look.’

‘Give over with that baby-work!’ I interrupted, dragging the pillow away, and turning the holes towards the mattress, for she was removing its contents by handfuls.  ‘Lie down and shut your eyes: you’re wandering.  There’s a mess!  The down is flying about like snow.’

I went here and there collecting it. 

‘I see in you, Nelly,’ she continued dreamily, ‘an aged woman: you have grey hair and bent shoulders.  This bed is the fairy cave under Penistone crags, and you are gathering elf-bolts to hurt our heifers; pretending, while I am near, that they are only locks of wool.  That’s what you’ll come to fifty years hence: I know you are not so now.  I’m not wandering: you’re mistaken, or else I should believe you really were that withered hag, and I should think I was under Penistone Crags; and I’m conscious it’s night, and there are two candles on the table making the black press shine like jet.’

‘The black press? where is that?’ I asked.  ‘You are talking in your sleep!’

‘It’s against the wall, as it always is,’ she replied.  ‘It does appear odd—I see a face in it!’

‘There’s no press in the room, and never was,’ said I, resuming my seat, and looping up the curtain that I might watch her.

‘Don’t you see that face?’ she inquired, gazing earnestly at the mirror.

And say what I could, I was incapable of making her comprehend it to be her own; so I rose and covered it with a shawl.

‘It’s behind there still!’ she pursued, anxiously.  ‘And it stirred.  Who is it?  I hope it will not come out when you are gone!  Oh!  Nelly, the room is haunted!  I’m afraid of being alone!’

I took her hand in mine, and bid her be composed; for a succession of shudders convulsed her frame, and she would keep straining her gaze towards the glass.

‘There’s nobody here!’ I insisted.  ‘It was yourself, Mrs. Linton: you knew it a while since.’

‘Myself!’ she gasped, ‘and the clock is striking twelve!  It’s true, then! that’s dreadful!’

Her fingers clutched the clothes, and gathered them over her eyes.  I attempted to steal to the door with an intention of calling her husband; but I was summoned back by a piercing shriek—the shawl had dropped from the frame.

Overall, I find it surprising and interesting that random out-of-context sentence pairs are so often more coherently interpretable in the original order than in reversed order — even if we eliminate examples with obvious reference-chain issues.

As one of the commenters noted, this task seems likely to be a difficult one for automatic analysis — and that's what we've found in trying some of the techniques recommended in the literature, as well as various improvements.

I suspect that there are differences among authors and genres in how easy or difficult it is to infer the original order of random sequences from their works. And there are probably individual differences in how good people are at making such guesses.



  1. Belial Issimo said,

    April 18, 2019 @ 11:29 am

    The row values all sum to 170 or 1 or 2 less, except for Q11 which sums to 180. Is there a typo in the Q11 values?

    [(myl) Yes, thanks: it was 173 instead of 163 in the first column. Fixed now.]

  2. J.W. Brewer said,

    April 18, 2019 @ 11:50 am

    Seeing the larger passages creates a (non-random, non-blind) opportunity to see how easy or hard it might be to pick two-sentence sequences that resist this sort of decoding when read in isolation. I can from a quick skim identify two places in the Bronte where it seems to me that two adjacent sentences would flow equally well if put in the other order: a) at the end of the first paragraph, where the switch would put "A minute previously …" before "Both the expressions …"; and b) at the very end, where the switch would put "I attempted to steal …" before "Her fingers clutched …" But those are ones where I think the flow of the discourse seems approximately equally natural in both orders, not ones where the order actually used by the author seems more awkward or less natural than the alternative created by the switch.

  3. Yuval Pinter said,

    April 18, 2019 @ 3:37 pm

    This task, i.e. predicting if the given sentence-2 follows sentence-1, is gaining increasing attention in the NLP literature, notably the upcoming NAACL's Best Paper BERT (mentioned in this Log post) which uses this exact signal for multi-taskily training its contextual embeddings.

    [(myl) It's BERT from which we've gotten the best results on this task — indeed the only results better than chance — but BERT is still much worse than people are…]

  4. Brett Reynolds said,

    April 18, 2019 @ 5:54 pm

    This is an ESL activity that I use regularly. Take a given paragraph, could be from something a student has written or a text that the class is studying or going to study, put the sentences in random order, and ask the students to put them back in order. Finally, ask students to justify their order.

  5. Michael Watts said,

    April 18, 2019 @ 7:30 pm

    the survey respondents (as a whole) guessed the original order of all twelve sentence-pairs correctly — though the margins varied from 2-to-1 to 99-to-1.

    If you calculate directly from the data rather than from the rounded-off percentages, the margins vary from 2-to-1 to 169-to-1, a much much wider range.

    This is an ESL activity that I use regularly. Take a given paragraph, could be from something a student has written or a text that the class is studying or going to study, put the sentences in random order, and ask the students to put them back in order.

    This task gets a lot easier as you add more text. I view the two-sentences version as sort of inherently conceptually iffy because it is almost always the case that there _could be_ a perfectly coherent text featuring either order. Examining, say, a page of text (in the order it was written), and then determining whether that text is coherent or incoherent, is easy (for a human). With just two sentences, there's not enough context to answer definitively — both orders are coherent and neither order is incoherent, except in rare cases. It's just that one order is more likely.

    I recorded my answers along with their thought processes yesterday; I'll offer them here (I refer to the sentences as #1 and #2 according to the order of the answer I picked):

    Q1. Red – #2 elaborates on #1

    Q2. Black – #1's "according to PHLS researchers" is typical of the first sentence in an article; #2 explains why #1 is true

    Q3. Black – #2 explicitly draws a contrast ("simply") with the context set by #1

    Q4. Red – #1 is typical of the beginning of a story. However, both orders are extremely plausible.

    Q5. Red – #2 describes something ("Rambus") introduced by #1.

    Q6. Red – #2 explicitly draws a contrast with #1. ("Before… . Now… .") This could theoretically happen in either order, but #2 bears conversational focus and should therefore occur second.

    Q7. Red – #2 describes a natural consequence of #1.

    Q8. Red – #2 makes sense as a response to #1. But #1 makes no sense as a response to #2.

    Q9. Black – #1 explicitly introduces a concept ("curious contrivances called druggs"); #2 defines it.

    Q10. Red – #2 explains #1. Both orders are plausible, but "rang" in #1 gives me a feeling that it should come first. If #1 was "The phone was ringing, but I let it ring.", this would be near-totally ambiguous.

    Q11. Black – #2 responds directly to #1. The red order is nonsense.

    Q12. Red – this looks like a pretty typical quote-with-"said"-tag-inserted-in-the-middle. Both orders are very plausible.

    I see that I got Q5 wrong.

  6. Andrew Usher said,

    April 18, 2019 @ 10:00 pm

    I don't think 2 to 169 is 'much wider' than 2 to 99, especially given the statistical significance. But it is mathematical sloppiness to go through the percentages the way he did; I wouldn't hesitate to criticise that.

    I got both of these right, and presumably, all the others as well. 8 was somewhat a guess (as I noted in the other thread), but for 5 what tipped me off was the repetition of 'Rambus'. The other order would surely have dictated the use of a pronoun at the start of the sentence. For each one, all you do really is mentally read them to yourself and see whether a human would have actually written that. Perfect? No, but close as one gets.

    I'm not sure that adding more sentences would improve results, except for a pure narrative, and even then the time required would increase exponentially.

    k_over_hbarc at yahoo dot com

  7. Michael Watts said,

    April 18, 2019 @ 10:20 pm

    I'm not sure that adding more sentences would improve results, except for a pure narrative, and even then the time required would increase exponentially.

    We need to specify what "adding more sentences" means. If the task is "here are the sentences, individually; arrange them into the order in which they were originally written", two sentences is not enough to perform the task reliably. 50 sentences gives you so many options that again you won't be able to perform the task reliably. But 7 sentences is significantly easier than either of those extremes.

    If the task is "here are the sentences in an order chosen by the experimenter; read them in this order and judge (whether or not they form a coherent text) or (whether a normal human could have written them in this order)" or something of that kind, then more sentences is better, and accuracy on the task should basically just increase with the amount of text provided.
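A rough sketch of why the reordering version of the task gets harder with more sentences: the number of candidate orders grows factorially, so a uniform guess is right half the time with two sentences but almost never with fifty. The function names below are illustrative assumptions, and this counts orderings only; it says nothing about how much extra context helps a human reader.

```python
import math

def n_orderings(n_sentences):
    """Number of distinct orders in which n sentences can be arranged."""
    return math.factorial(n_sentences)

def chance_accuracy(n_sentences):
    """Probability of recovering the original order by guessing uniformly."""
    return 1 / n_orderings(n_sentences)

# The three sizes mentioned in the discussion: 2, 7, and 50 sentences.
for n in (2, 7, 50):
    print(n, n_orderings(n), chance_accuracy(n))
```

So the search space grows faster than exponentially, which is separate from the question of whether a mid-sized passage supplies enough cues to offset it.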

  8. eub said,

    April 19, 2019 @ 10:07 pm

    Were these sampled uniformly over sentences, or over text position, or something else? This would affect sentence length in the examples.

    [(myl) The texts were divided into sentences via this script:

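    use Lingua::EN::Sentence qw( get_sentences add_acronyms );
    # Assumed outer loop (not shown in the original): read the text from files or STDIN.
    while (<>) {
        my $sentences = get_sentences($_);   # returns an array reference
        foreach my $sentence (@$sentences) {
            print "$sentence\n";             # one sentence per output line
        }
    }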

    And random adjacent pairs were selected via this one:

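    NLINES=`wc "$1" | gawk '{print $1}'`
    # Assumed (this step is missing above): pick a random start line from 1 to NLINES-1.
    WHICHLINE=$(shuf -i 1-$((NLINES-1)) -n 1)
    sed -n "$WHICHLINE,$((WHICHLINE+1))p" "$1"
    ]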
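The selection step can also be sketched in Python. This is a minimal illustration, assuming the start line is drawn uniformly from all lines that have a successor; under that assumption every sentence position is equally likely to start a pair, so sampling is uniform over sentences rather than over text position. The function names and the `rng` parameter are hypothetical, and `shuffled_pair` is an added convenience for building quiz items, not something the scripts above do.

```python
import random

def random_adjacent_pair(sentences, rng=random):
    """Pick a start index uniformly and return that sentence and the next one."""
    if len(sentences) < 2:
        raise ValueError("need at least two sentences")
    start = rng.randrange(len(sentences) - 1)  # 0 .. len-2, so start+1 is valid
    return sentences[start], sentences[start + 1]

def shuffled_pair(sentences, rng=random):
    """Return a pair in random presentation order, plus True if the order was kept."""
    first, second = random_adjacent_pair(sentences, rng)
    if rng.random() < 0.5:
        return (first, second), True
    return (second, first), False
```

Passing a seeded `random.Random` instance as `rng` makes the selection reproducible.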


  9. Andrew Usher said,

    April 20, 2019 @ 11:36 am

    Michael Watts:

    The first was my intent. And, while I understand your reasoning, I am still not sure that 7 sentences would be easier (faster or more accurate) in general. We don't have a way to easily test it, though the code just given might be modifiable in that way.

