## Macroscopic bosons among us

In the spring of 1995, I was serving on an academic "Planning and Priorities" committee, and some of my fellow committee-members became concerned that there were too many graduate courses, and that this was a symptom of inadequate focus on undergraduate education. I agreed on both counts, though I also felt that an excessive number of grad courses was — and is — generally a bad thing for graduate programs as well.

Anyhow, I became curious about what the distribution of course registrations was actually like. The following note, unearthed after 17 years and recycled as a Language Log post, was the result. I fished it out of the midden-heap of old email because of its marginal relevance to the July 4 announcement from CERN. It turns out that graduate students, like the Higgs particle, are bosons, or at least their course-registration choices obey Bose-Einstein statistics.

As background, we had been given some historical data that included a disturbing table showing the distribution of student enrollments over graduate courses. Expressed as a percentage of all graduate courses offered during the time-period in question, the numbers were:

| Number of students | 1-3 | 4-6 | 7-10 | 11-15 | 16-20 | 21-30 | 30+ |
|---|---|---|---|---|---|---|---|
| Percent of courses | 26.8 | 18.9 | 20.7 | 19.8 | 6.8 | 4.5 | 2.5 |

Dear Colleagues,

I was curious to see whether the distribution of SAS graduate CUs ["course units"] among SAS graduate courses is well approximated as a random process. A crude attempt at modeling suggests that a specific, simple model works almost eerily well. This suggests that there might be some limits to what sorts of distributions of course sizes we can expect to see, if the boundary conditions are left as they are.

The only parameters that I used were the number of courses taught and the number of CUs to be distributed among them. We consider the number of ways to distribute r things (the number of CUs) into n bins (the number of courses), and we assume (implausibly enough) that all ways of doing it are equally likely. This seems like the first null hypothesis to entertain.

There are at least two different plausible meanings of "all ways of doing it" in this case: so-called Maxwell-Boltzmann statistics, in which we assume that the things being parceled out are individually distinguishable, so that there are obviously n^r ways of assigning things to bins; and so-called Bose-Einstein statistics, in which we assume that the things being parceled out are indistinguishable, so that a "way of doing it" is just an assignment of counts to bins, and the number of such assignments turns out to be the number of ways of choosing n-1 things from n+r-1 things.
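A brute-force check of the two counting rules on small cases may make the difference concrete. This is a minimal Python sketch (the function names are hypothetical, chosen for illustration): it enumerates every assignment of r labelled things to n bins, then collapses assignments that produce the same vector of bin counts. With n = 2 bins and r = 2 things it finds 4 Maxwell-Boltzmann ways but only 3 Bose-Einstein ways.

```python
from itertools import product
from math import comb


def count_maxwell_boltzmann(n, r):
    """Distinguishable things: each of the r things independently picks one of n bins."""
    return sum(1 for _ in product(range(n), repeat=r))


def count_bose_einstein(n, r):
    """Indistinguishable things: only the vector of bin counts matters."""
    occupancy_vectors = {
        tuple(assignment.count(b) for b in range(n))
        for assignment in product(range(n), repeat=r)
    }
    return len(occupancy_vectors)


for n, r in [(2, 2), (3, 4), (4, 3)]:
    mb = count_maxwell_boltzmann(n, r)
    be = count_bose_einstein(n, r)
    # Compare the enumerations with the closed forms n^r and C(n+r-1, n-1)
    print(n, r, mb, n ** r, be, comb(n + r - 1, n - 1))
```

Enumeration is only feasible for tiny n and r; for 3800 CUs and several hundred courses one has to rely on the closed forms.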

(I assume that Fermi-Dirac statistics, in which no two things can occupy the same bin, is inappropriate here — the insertion of an appropriate joke at this point is left as an exercise for the reader.)

To my surprise, the Bose-Einstein model seems to fit the facts rather well. It may be a commonplace of social science that this tends to happen in situations of free choice; but I don't think it is always true, since (for instance) the distribution of population over cities follows a Zipf's-law type of distribution, which is substantially different.

I have also tried a bit of prediction, by looking at what the model says would happen if the number of courses were cut in half.

Anyway, I submit this for what little it is worth — at least a minute's amusement. My memory of the approximation formulae for occupancy problems is a bit rusty, and I apologize if I have got them wrong.

The first parameter of the model, the number of graduate courses taught by SAS standing faculty in 1993-94, was given in Frank's table as 440. The total number of CUs was not given: I approximated it by adding up the number of courses in each category multiplied by the number of CUs in the middle of the category:

118*2 + 83*5 + 91*8 + 87*12 + 30*18 + 20*25 + 11*31 = 3804

Thus I took 3800 as the second parameter. Modest changes in this parameter will not affect the results very much.
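The estimate is easy to reproduce. This sketch hard-codes the course counts and the representative enrollment the sum above assigns to each band (the true CU total was not available, which is why this is only an estimate), and it also recovers the "Percent of courses" row of the first table.

```python
# Courses per enrollment band, as listed in the post
band_labels = ["1-3", "4-6", "7-10", "11-15", "16-20", "21-30", "30+"]
course_counts = [118, 83, 91, 87, 30, 20, 11]
# Representative enrollment used for each band in the post's sum
representative_size = [2, 5, 8, 12, 18, 25, 31]

total_courses = sum(course_counts)
total_cus = sum(c * s for c, s in zip(course_counts, representative_size))
observed_pct = [round(100 * c / total_courses, 1) for c in course_counts]

print(total_courses, total_cus)  # 440 3804
for label, pct in zip(band_labels, observed_pct):
    print(f"{label:>6}: {pct}%")
```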

With either way of counting, a certain number of bins will be expected to wind up with a count of zero. In order to make things work out, I increased the hypothesized number of courses so as to make the expected number of bins with non-zero counts come out to 440.

Thus for the Bose-Einstein way of counting, we have to assume that there were 497 courses of which 57 had no enrollment. For the Maxwell-Boltzmann way of counting, we have to assume that there were 444 courses of which 4 had no enrollment. In the tables below, the "Observed %" are just the course counts given in Frank's table, divided by 440; the "Corrected %" are the course counts from Frank's table divided by 497 or 444, as appropriate; and the "Model %" are the percentages predicted by the random-process model.
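For the Bose-Einstein count the adjustment has a closed form. Fixing one bin at zero leaves C(n+r-2, n-2) allocations for the other n-1 bins, so a given bin is empty with probability (n-1)/(n+r-1), and the expected number of non-empty bins is nr/(n+r-1). The sketch below (helper name hypothetical) solves that equation for n with r = 3800 and 440 non-empty bins. It covers only the Bose-Einstein case.

```python
def be_expected_nonempty(n, r):
    """Expected number of non-empty bins when r indistinguishable things are
    spread over n bins and every occupancy vector is equally likely."""
    p_empty = (n - 1) / (n + r - 1)  # P(a given bin is empty)
    return n * (1 - p_empty)         # simplifies to n*r/(n+r-1)


r = 3800      # estimated CUs
target = 440  # courses actually taught
# Solve n*r/(n+r-1) = target for n:  n = target*(r-1)/(r-target)
n_exact = target * (r - 1) / (r - target)
print(round(n_exact, 2))  # 497.49

n = 497
print(round(n - be_expected_nonempty(n, r), 1))  # 57.4 expected empty bins
```

Rounding n down to 497 leaves about 57 expected empty courses, in line with the figures in the text.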

The Bose-Einstein model fits remarkably well — it seems that SAS graduate students are rather like photons. A (probably inappropriate use of the) Chi-square test suggests that the difference is well within the range of expected chance fluctuations. The Maxwell-Boltzmann model is not "clumpy" enough, in this case as for most real-world phenomena.

If the model fit is not a complete fluke, then we can conclude that the most straightforward way to change the distribution is to change the parameters of the situation. Only two parameters were used, the number of courses and the number of CUs, so we have to either reduce the number of courses offered or increase the number of CUs taken in order to change things very much.

In this connection, there is one sobering fact about the Bose-Einstein distribution. If Qk is the expected number of bins with k things in them, then in most cases, I believe it is true that Q0 > Q1 > Q2 > … > Qr, where r is the number of things distributed. Thus as the number of courses decreases, or the number of students increases, the average number of students per course obviously goes up; but it always remains true that there are more smaller courses than larger ones. As a small empirical indication, I have appended the predictions of the Bose-Einstein model for the case in which the number of courses is cut in half, while the number of students stays the same.
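One way to see why the claim holds, sketched under the same equally-likely-allocations assumption: fixing one bin at exactly k things leaves r - k things to be spread over the other n - 1 bins, so

$$
Q_k = n\,\frac{\binom{r-k+n-2}{n-2}}{\binom{r+n-1}{n-1}},
\qquad
\frac{Q_{k+1}}{Q_k} = \frac{\binom{r-k+n-3}{n-2}}{\binom{r-k+n-2}{n-2}} = \frac{r-k}{r-k+n-2}.
$$

For k < r the ratio is below 1 whenever n > 2, so Q0 > Q1 > … > Qr holds for any number of courses above two, whatever the number of CUs; with n = 2 the ratio is exactly 1 and every split is equally likely.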

Bose-Einstein model: 3800 indistinguishable things distributed among 497 bins, on the assumption that all distinguishable allocations are equally likely to occur.

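| Students per course | 0 | 1-3 | 4-6 | 7-10 | 11-15 | 16-20 | 21-30 | 30+ |
|---|---|---|---|---|---|---|---|---|
| Observed % | ? | 26.8 | 18.9 | 20.7 | 19.8 | 6.8 | 4.5 | 2.5 |
| Corrected % | 11.4 | 23.7 | 16.7 | 18.3 | 17.5 | 6.0 | 4.0 | 2.2 |
| Model % | 11.6 | 27.3 | 18.9 | 16.4 | 11.9 | 6.4 | 5.4 | 2.2 |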

As an exercise in predicting the consequence of cutting the number of courses in half, here are the model predictions for 3800 indistinguishable things distributed among 248 bins, on the assumption that all distinguishable allocations are equally likely to occur:

| Students per course | 0 | 1-3 | 4-6 | 7-10 | 11-15 | 16-20 | 21-30 | 30+ |
|---|---|---|---|---|---|---|---|---|
| Model % | 6.1 | 16.2 | 13.4 | 14.4 | 13.5 | 9.9 | 12.4 | 8.6 |
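The Bose-Einstein rows can be recomputed directly. This is a minimal sketch, assuming that "30+" means 31 or more (the CU estimate above uses 31 as that band's representative size). It uses the exact single-bin distribution rather than whatever approximation formulae the original note relied on: P(0) = (n-1)/(n+r-1), and each step from k to k+1 multiplies the probability by (r-k)/(r-k+n-2). The function names are hypothetical. For 497 bins the output agrees with the Model % row to within about 0.1 percentage point. For 248 bins the first seven bands agree just as closely, but the 30+ band comes out near 14% rather than 8.6%. A value near 14% is also what the other seven entries in that row imply, since together they sum to 85.9%.

```python
# Enrollment bands used in the tables; "30+" is read here as 31 and up (assumption)
BANDS = [(0, 0), (1, 3), (4, 6), (7, 10), (11, 15), (16, 20), (21, 30), (31, None)]


def be_occupancy_pmf(n, r):
    """P(a given bin holds exactly k things), k = 0..r, when r indistinguishable
    things go into n >= 2 bins and every occupancy vector is equally likely."""
    pmf = [(n - 1) / (n + r - 1)]  # k = 0
    for k in range(r):
        # ratio P(k+1)/P(k) = (r-k)/(r-k+n-2); tiny tail values may underflow to 0.0
        pmf.append(pmf[-1] * (r - k) / (r - k + n - 2))
    return pmf


def be_band_percentages(n, r):
    """Percentage of bins whose occupancy falls in each band of BANDS."""
    pmf = be_occupancy_pmf(n, r)
    result = []
    for lo, hi in BANDS:
        hi = r if hi is None else hi
        result.append(100 * sum(pmf[lo:hi + 1]))
    return result


for n in (497, 248):
    print(n, [round(p, 1) for p in be_band_percentages(n, 3800)])
```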

Maxwell-Boltzmann model: 3800 distinguishable things distributed among 444 bins, on the assumption that all distinguishable allocations are equally likely to occur. This one does not fit!

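| Students per course | 0 | 1-3 | 4-6 | 7-10 | 11-15 | 16-20 | 21-30 | 30+ |
|---|---|---|---|---|---|---|---|---|
| Observed % | ? | 26.8 | 18.9 | 20.7 | 19.8 | 6.8 | 4.5 | 2.5 |
| Corrected % | 0.9 | 26.6 | 18.7 | 20.5 | 19.6 | 6.8 | 4.5 | 2.5 |
| Model % | 0.9 | 29.6 | 49.5 | 19.0 | 0.9 | 0 | 0 | 0 |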

Best wishes,

Mark Liberman

This may be a well-known result in sociology, but if so, I don't know how to look it up.

The history of "Bose-Einstein statistics" is an interesting one. Wikipedia explains:

While presenting a lecture at the University of Dhaka on the theory of radiation and the ultraviolet catastrophe, Satyendra Nath Bose intended to show his students that the contemporary theory was inadequate, because it predicted results not in accordance with experimental results. During this lecture, Bose committed an error in applying the theory, which unexpectedly gave a prediction that agreed with the experiment (he later adapted this lecture into a short article called Planck's Law and the Hypothesis of Light Quanta). The error was a simple mistake—similar to arguing that flipping two fair coins will produce two heads one-third of the time—that would appear obviously wrong to anyone with a basic understanding of statistics. However, the results it predicted agreed with experiment, and Bose realized it might not be a mistake at all. He for the first time took the position that the Maxwell–Boltzmann distribution would not be true for microscopic particles where fluctuations due to Heisenberg's uncertainty principle will be significant. […]

Physics journals refused to publish Bose's paper. Various editors ignored his findings, contending that he had presented them with a simple mistake. Discouraged, he wrote to Albert Einstein, who immediately agreed with him. His theory finally achieved respect when Einstein sent his own paper in support of Bose's to Zeitschrift für Physik, asking that they be published together. This was done in 1924.

The reason Bose produced accurate results was that since photons are indistinguishable from each other, one cannot treat any two photons having equal energy as being two distinct identifiable photons. By analogy, if in an alternate universe coins were to behave like photons and other bosons, the probability of producing two heads would indeed be one-third (tail-head = head-tail). Bose's "error" is now called Bose–Einstein statistics.

Apparently we cannot treat any two graduate students having equal course requirements as being two distinct identifiable people…

Anyhow, this (Bose's discovery, not the indistinguishability of graduate enrollments) is one of my favorite examples of fruitful ignorance.

1. ### D.O. said,

July 5, 2012 @ 6:17 pm

But why do you think that students are indistinguishable? It's CUs that are. And given that students on average take several courses each semester and for them these CUs are indistinguishable you might have a mixed explanation: Fermi-Dirac for the students, Bose-Einstein for CUs conditioned on a student. I know, I know, I am trying to ruin a joke, you will never invite me to a party.

2. ### D.O. said,

July 5, 2012 @ 6:32 pm

Errrr, fast typing sloppy editing. It should be Bose-Einstein or Boltzmann for the students, Fermi-Dirac for CUs conditioned on the student.

3. ### Jason Eisner said,

July 6, 2012 @ 12:53 am

D.O. asks: "But why do you think students are indistinguishable?"

Well, apparently the Bose-Einstein model fits these data. I think you're asking: why might one sensibly expect it to fit?

A random partition of indistinguishable objects, imposed upon the students from above, would indeed be an odd model of course selection, but it is not the only way to obtain Bose-Einstein statistics. Here is a more plausible construction in which the students instead act in series:

Suppose that when a student is choosing where to fulfill a CU, she chooses course c with probability proportional to n(c)+1, where n(c) is the number of students previously enrolled in c. This so-called Pólya urn model is an oversimplified model but not too crazy. It too yields the observed Bose-Einstein statistics, if I'm not mistaken …

http://www.ncbi.nlm.nih.gov/pmc/articles/PMC432601/

[(myl) Cool! If only Google Scholar had existed in 1995, I might have learned (Wen-Chen Chen, "On the Weak Form of Zipf's Law", Journal of Applied Probability, 17(3) 1980) that when

… we consider more general Dirichlet-multinomial urn models which include the Bose-Einstein models above as special cases and the Maxwell-Boltzmann models as limiting cases […] the weak form of Zipf's law still holds […] Parallel results concerning the stronger form of Zipf's law will be reported somewhere else.

But I recall trying to fit a Zipf's-law model to the student registration data, and finding that it fit rather badly. Perhaps I made a mistake — I don't now recall how I did it. Anyhow I didn't know about the relationship between Bose-Einstein statistics and Zipf's Law, and clearly I should have.

Apparently words, citizens, dollars, books, and species are also all bosons? And is this something that sociologists all know about?]
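Jason's Pólya urn claim can be checked exactly for small cases. This sketch (function name hypothetical) enumerates every order in which r students could pick among n courses, weights each pick by the course's current enrollment plus one, and adds up the probability of each final enrollment vector. If the urn reproduces Bose-Einstein statistics, every vector should come out with the same probability, 1/C(n+r-1, n-1).

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product
from math import comb


def polya_occupancy_probs(n, r):
    """Exact probability of each final enrollment vector when r students arrive
    one at a time and each picks course c with probability proportional to
    (number already enrolled in c) + 1."""
    probs = defaultdict(Fraction)
    for choices in product(range(n), repeat=r):  # every possible order of picks
        counts = [0] * n
        p = Fraction(1)
        for step, c in enumerate(choices):
            # `step` students are already enrolled, so the weights sum to step + n
            p *= Fraction(counts[c] + 1, step + n)
            counts[c] += 1
        probs[tuple(counts)] += p
    return dict(probs)


n, r = 3, 4
probs = polya_occupancy_probs(n, r)
print(len(probs), comb(n + r - 1, n - 1))  # 15 15
print(set(probs.values()))                 # {Fraction(1, 15)}
```

The uniformity is no accident: any particular order that ends with counts (k1, …, kn) has probability k1!⋯kn! / (n(n+1)⋯(n+r-1)), and there are r!/(k1!⋯kn!) such orders, so their total, r!/(n(n+1)⋯(n+r-1)), does not depend on the counts. With two courses and two students this gives each of the three outcomes probability one-third, the same as Bose's "coins".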

4. ### D.O. said,

July 6, 2012 @ 1:14 am

@Jason Eisner: my kvetch was with the CU = student equation.

5. ### Daniel Johnson said,

July 6, 2012 @ 8:40 am

There is a natural explanation. Economist Arthur De Vany made the same observation of movie attendance in his book "Hollywood Economics: How Extreme Uncertainty Shapes the Film Industry", in chapter 2, "Bose-Einstein dynamics and adaptive contracting in the motion picture industry". He shows there that a Bose-Einstein distribution can be expected if a moviegoer chooses a movie based on their own individual preference, without consideration for how many others might choose that movie also.

[(myl) This doesn't seem consistent with Jason's Dirichlet Urn Model, where the more people choose to view a given movie, the more likely it is that the next viewer will choose that one as well.]

6. ### Jerry Friedman said,

July 6, 2012 @ 9:35 am

Are tenured professors fermions, since in the short term there's a fixed number of jobs and only one professor can occupy each?

Of course, we have to leave out the spin-statistics connection, since MYL doesn't share his chair with a spin-down occupant.

7. ### KevinM said,

July 6, 2012 @ 12:32 pm

8. ### Gene Callahan said,

July 6, 2012 @ 1:38 pm

"since (for instance) the distribution of population over cities follows a Zipf's-law type of distribution"

Check out England in 1700: just wildly off from Zipf's law. (The second-largest city was about 10 times smaller than it ought to be.) I recall finding a number of cases like this, but the others escape me at the moment.

9. ### Jerry Friedman said,

July 6, 2012 @ 2:06 pm

@Daniel Johnson: Actually, De Vany says, "In the dynamics leading to the Bose–Einstein distribution movie customers sequentially select movies and the probability that a given customer selects a particular movie is proportional to the fraction of all the previous moviegoers who selected that movie."

10. ### Daniel Johnson said,

July 6, 2012 @ 2:12 pm

You are correct, I misread De Vany. The Bose-Einstein distribution is what results from repeated Bayesian trials in which a moviegoer bases their judgement of a movie's quality on the current attendance figures. Thus according to De Vany, the distribution of seats sold among the movies currently in release will follow Bose-Einstein as long as there is no real a-priori rhyme or reason as to which movies will succeed. Some consolation to those professors whose classes end up getting dropped for lack of students (thus bringing the topic back around).

[(myl) This sort of argument is always very tricky. It shows that the observed pattern of numbers of students in classes, or viewers in movies, or residents in cities, or whatever, has the form expected if the only thing going on were a Dirichlet Urn Model, in which the rich get randomly richer, with (as you say) "no real a-priori rhyme or reason".

But that's not the same as showing that in fact no such rhyme or reason exists — you might instead have a model in which the a-priori attractiveness of movies or courses is assigned by the same sort of random process, and attendance then stochastically follows attractiveness. Or you could have some sort of mixed model, in which there is one process for distributing intrinsic attractiveness, and then a second one for distributing customers, with the second one dependent both on the product's attractiveness and its popularity.

Such models are generally more plausible in those cases where we have some independent knowledge of the situation — thus the growth of cities is clearly determined in part by geography and politics and so on; enrollment in courses is partly determined by requirements and teaching quality and grading difficulty; and so on.

There's a similar argument to be made about Duncan Watts' "Accidental Influentials" theory: it's true that social networks have the statistics you'd expect if they grew by random rich-get-richer accretion, but this doesn't mean that the behavior of the nodes in the network has no effect on the process. See Alexy Khrabrov, "Mind Economy: Dynamic Graph Analysis of Communications", Penn PhD thesis 2011, for a demonstration that traditional ideas of "social capital" do apply to the dynamics of social networks.]

11. ### Daniel Johnson said,

July 6, 2012 @ 3:48 pm

In De Vany's case, he does not depend on that argument, but rather spends much of the rest of the book directly showing that the traditional factors, e.g. budget, actors, directors, critics, are at best very weak predictors of movie success.

Certainly in the case of graduate school, I found that course overlap with upcoming qualifying exams appeared to be a rather good predictor of course attendance.

12. ### Victor Mair said,

July 6, 2012 @ 8:08 pm

By coincidence, I just now received this note from a friend in Iowa, which makes reference to "boson" in a very different, but highly topical, context:

====

I’m sure you have seen references to the discovery of the Higgs boson particle by physicists working at the Large Hadron Collider in Europe. Today, I read an interesting thing about the super sensitivity of the energies they are trying to work with. The paths of the proton particle streams they send around the 17 miles of collider tunnel to collide with one another must be altered to account for the rising of a full moon. That is the power behind our concept of lunacy, after all.

http://www.msnbc.msn.com/id/47880473#.T_eJrnAmx5Q (see the box on the right side about Stephen Hawking being a bad gambler [topic of another, still more recent, LL post: http://languagelog.ldc.upenn.edu/nll/?p=4059])

[(myl) The effect is not due to the energies that they're working with, as I understand it, but to the size of the accelerator ring. Crustal deformation due to lunar tides changes the effective diameter of the ring. See here for graphs and equations.]

13. ### D.O. said,

July 6, 2012 @ 8:18 pm

What if we turn the tables (clearly, a phrase from gaming!)? Suppose students are conscientious beings with fixed preferences about classes, but the faculty are just jockeying around with their courses, created by pulling together random topics without regard to students' needs or their colleagues' offerings. We can end up with the same distribution. It is symmetric.

14. ### Sili said,

July 8, 2012 @ 12:45 pm

How do students know how many people are actually enrolled in a given course? Is it enough that they only know the sub-population of their friends and their choices?

15. ### the other Mark P said,

July 11, 2012 @ 5:00 pm

to account for the rising of a full moon

Ugh! I sure hope the people at CERN have a better grip on astronomy than this sentence.

A "full" moon is no closer than a "new" moon. Only our angle of viewing it makes more light reflect off it. The moon does vary in distance, but not directly related to its fullness.

Sorry to go off-topic, but some things are so wrong they shouldn't be allowed to stand.