The long get longer


Al Filreis's Modern and Contemporary American Poetry is one of the most successful MOOCs. In particular, participants' involvement is sustained over time to an unusual extent — here's the daily volume of forum posts and comments for the first two months of ModPo2, which is currently underway:

Overall forum participation is high, but the distribution of thread lengths is highly skewed: some posts get 60-70 comments, while most get far fewer:

About 75% get none at all (not shown on the graph above).  Here's the empirical probability of continuing a discussion thread as a function of the thread length (where length=1 is the original post with no comments yet):
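A curve like this can be computed directly from a list of final thread lengths. What follows is a minimal sketch that assumes one integer per post (1 = the original post with no comments). The function name is illustrative, and this is not the script behind the plot:

```python
from collections import Counter


def continuation_probability(lengths):
    """Empirical P(thread continues | it has reached length k), for each k.

    `lengths` holds the final length of each thread (non-empty list), where
    1 means the original post with no comments, 2 means one comment, etc.
    """
    counts = Counter(lengths)   # how many threads ended at each length
    at_risk = len(lengths)      # threads that reached length k (all of them at k = 1)
    probs = {}
    for k in range(1, max(counts) + 1):
        stopped_here = counts.get(k, 0)           # threads whose final length is exactly k
        probs[k] = (at_risk - stopped_here) / at_risk
        at_risk -= stopped_here                   # survivors move on to length k + 1
    return probs


print(continuation_probability([1, 1, 1, 2, 3]))
```

On the toy input this prints {1: 0.4, 2: 0.5, 3: 0.0}. Two of the five threads get a first comment, one of those two gets a second, and neither gets a third. Each value is one minus what survival analysts call the discrete hazard at length k.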

In other words, the longer a thread is (up to length 20 or so, anyhow), the more likely it is to be continued.  Although we haven't tried yet, it seems likely that a "rich get richer" process of an appropriate sort can approximate this pattern fairly well. A small amount of poking around in the literature has left me uncertain whether this is a more-or-less universal feature of discussion forums.
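One candidate "rich get richer" process can be sketched as follows. It is an assumption on our part, not something fitted to the ModPo data. Each new contribution either starts a new thread with probability `p_new` or adds a comment to an existing thread chosen with probability proportional to its current length, as in Herbert Simon's classic skew-distribution model. The names `simon_threads`, `continuation_at` and the value `p_new=0.3` are hypothetical:

```python
import random


def simon_threads(n_events, p_new=0.3, seed=0):
    """Simulate thread lengths under a Simon-style "rich get richer" process.

    Each event either starts a new thread (probability p_new) or adds one
    comment to an existing thread picked with probability proportional to
    its current length. Returns every thread's length (1 = no comments).
    """
    rng = random.Random(seed)
    lengths = [1]   # begin with a single uncommented post
    units = [0]     # one entry per unit of length, so a uniform pick is length-weighted
    for _ in range(n_events):
        if rng.random() < p_new:
            lengths.append(1)                 # a brand-new post
            units.append(len(lengths) - 1)
        else:
            thread = rng.choice(units)        # longer threads are proportionally likelier
            lengths[thread] += 1
            units.append(thread)
    return lengths


def continuation_at(lengths, k):
    """Fraction of threads that reached length k and went on past it."""
    reached = [n for n in lengths if n >= k]
    return sum(1 for n in reached if n > k) / len(reached)


lengths = simon_threads(20_000)
for k in (1, 2, 5, 10):
    print(k, round(continuation_at(lengths, k), 2))
```

In this model the continuation probability rises steadily with k, which is the qualitative shape of the ModPo curve. Nothing in it produces the leveling-off around length 20, and a snapshot also counts young threads that are still growing. Any real comparison would have to address both.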

A more interesting question is the nature of the substantive factors that contribute to continuing or stopping at any given stage.

[Joint work with Ritika Khandeparkar]

Update — As I suspected, this is clearly NOT a universal property of new-media threads. Here's the same plot for comments over the past few years of LLOG (since we changed to a release of WordPress that filters spam comments, and began leaving comments open on about 87% of our posts):

(I've counted only posts where the possibility of commenting existed.)

It's interesting to compare the patterns of continuation-probability in zebra-finch songs, Joseph Conrad's paragraphs, conversational breath groups, and Walt Whitman's lines. Except for Joseph Conrad, all of these patterns are conspicuously non-Markovian …


  1. Lane said,

    December 4, 2013 @ 9:01 am

    Does this distinguish the distinct possibilities that

    a) some posts are just "hits" and will get lots of comments, and that the distribution of comment-attractiveness of posts just follows this kind of line, or

    b) the more comments there are already, the more likely people are to comment (because there are comments, not because the original posting is interesting)


    [(myl) No — and there are other possible stories as well. For example, maybe there's a distribution of communities of various sizes, and the members of a given community are highly likely to comment on a post by another community member. Several of these stories might simultaneously be true.

    There's a more general — and important — point here. As is now well known, social networks generally have statistical properties that are well approximated by a random Dirichlet process. You could use this fact to argue for the idea that social influence should not be explained in terms of the behaviors of social actors, since the network of connections mediating that influence might be the result of a random process in which such behavior plays no role. But that would obviously be a mistake.

    Similarly, there are many different kinds of hypothetical processes that all generate power-law distributions — and it would be a serious mistake to suppose that any given phenomenon exhibiting such a distribution should be explained in terms of a particular process-type from this set.]
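To illustrate that a rising curve does not by itself pick out a process, here is a sketch of Lane's option (a) with no social influence at all. Each post gets its own fixed chance q of receiving another comment, and that chance never changes as the thread grows. The Beta(2, 2) choice for q and the function names are assumptions made purely for illustration:

```python
import random


def mixture_threads(n_posts, a=2.0, b=2.0, seed=0, max_len=10_000):
    """Thread lengths when each post has its own fixed continuation chance q.

    q ~ Beta(a, b) varies across posts ("some posts are just hits"), but
    within a thread the chance of another comment stays constant, so no
    single thread has rich-get-richer dynamics. max_len is a safety cap.
    """
    rng = random.Random(seed)
    lengths = []
    for _ in range(n_posts):
        q = rng.betavariate(a, b)
        n = 1
        while n < max_len and rng.random() < q:
            n += 1
        lengths.append(n)
    return lengths


def continuation_at(lengths, k):
    """Fraction of threads that reached length k and went on past it."""
    reached = [n for n in lengths if n >= k]
    return sum(1 for n in reached if n > k) / len(reached)


pooled = mixture_threads(20_000)                       # q varies a lot between posts
flat = mixture_threads(20_000, a=1000.0, b=1000.0)     # q is close to 0.5 for every post
for k in (1, 2, 5):
    print(k, round(continuation_at(pooled, k), 2), round(continuation_at(flat, k), 2))
```

With near-identical q the curve stays flat near 0.5, as a Markov process predicts. Pooling posts with different q makes it climb, because threads that survive longer are disproportionately the high-q ones. For Beta(2, 2) the pooled value works out to (k+1)/(k+3) in expectation.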

  2. rpsms said,

    December 4, 2013 @ 12:23 pm

    My take is that people are less likely to comment on the "source material" but seem to relish the idea of both announcing that they are in complete agreement with a comment regarding the source material and calling people with a differing opinion a poopy head.

  3. Jerry Friedman said,

    December 4, 2013 @ 12:28 pm

    Surely it's also true of LL comment threads that the longer they are, the likelier they are to be continued, up to some number (which I'll bet is greater than 20).

    To judge from the ModPo FAQ, apparently people don't frequently ask what the difference between "modern" and "contemporary" is. From other things I saw at the site, I'm guessing it's the difference between "modern" and "postmodern".

  4. X said,

    December 4, 2013 @ 1:11 pm

    There was an interesting study on pop music that tried to tease out the difference between "popular because it's good" and "popular because it's popular", and their main finding was that it's primarily the latter. There's some influence due to quality, but it's mainly down to random luck making a particular song a front-runner and then snowballing to self-sustaining popularity. I imagine that forum posts follow the same patterns (and perhaps would be amenable to the same kind of study?).

    Cite: Salganik, Dodds, Watts – Experimental Study of Inequality and Unpredictability in an Artificial Cultural Market (2006)

  5. Mike C said,

    December 4, 2013 @ 1:51 pm

    I wonder what the differences are between the comment thread of an LL post (or any blog post) and a thread on the ModPo forum, where the "Statement of Accomplishment" requirements include posting to a thread at least once a week.

  6. J. W. Brewer said,

    December 4, 2013 @ 4:30 pm

    The Salganik et al paper is interesting but does not seem like a particularly plausible model of how song popularity is determined in the real world, at least historically.

  7. Mark said,

    December 4, 2013 @ 5:11 pm

    If the forum software tends to "bump" the most recently posted-to threads to the top of the list then that would help drive a lot of that behavior, right?

  8. AEM said,

    December 4, 2013 @ 9:02 pm

    I wrote a python script to scrape data for Language Log comment threads a few months ago. (Available here:

    Here are some of the resulting graphs:

    (I emailed myl about this at the time, but I never got a response.)

    [(myl) Sorry for being such a lousy correspondent — your email must have been an innocent victim of some semi-competent spam trap. Anyhow, thanks!]

  9. Jerry Friedman said,

    December 5, 2013 @ 12:43 pm

    Okay, maybe it's not true of LL comment threads.

  10. Nathan said,

    December 5, 2013 @ 3:35 pm

    I assume these measurements would be very different for blog comments vs. forum comments. There are many differences in the two environments, especially threading.

  11. richard said,

    December 5, 2013 @ 3:53 pm

    The pop song study reminds me of another one I read long ago looking at Beta vs. VHS (two rival formats for video cassettes, for you younguns born after the dawn of the DVD), which modeled the potential market share as a negatively curved (saddle-shaped) space, the sides of which represented market domination by one format or the other. As I remember, they were able to model the win of VHS over Beta repeatedly using a random walk beginning with data from the early period of the rivalry, and concluded that nothing succeeds like success.
    Hmmm. I'll have to look for that paper….

  12. Xmun said,

    December 6, 2013 @ 12:54 am

    What does MOOC stand for?

  13. Dan Lufkin said,

    December 6, 2013 @ 9:57 am

    @Xmun — That's Massive Open Online Course. Google around on MOOC; there are hundreds of good ones out there.

