Language Log

One law to rule them all?

June 2, 2019 @ 6:50 am · Filed by Mark Liberman under Computational linguistics

Power-law distributions seem to be everywhere, and not just in word-counts and whale whistles. Most people know that Vilfredo Pareto found them in the distribution of wealth, two or three decades before Udny Yule showed that stochastic processes like those in evolution lead to such distributions, and George Kingsley Zipf found his eponymous law in word frequencies. Since then, power-law distributions have been found all over the place — Wikipedia lists

… the sizes of craters on the moon and of solar flares, the foraging pattern of various species, the sizes of activity patterns of neuronal populations, the frequencies of words in most languages, frequencies of family names, the species richness in clades of organisms, the sizes of power outages, criminal charges per convict, volcanic eruptions, human judgements of stimulus intensity …

My personal favorite is the noise that something makes when you crumple it up, as discussed by Eric Kramer and Alexander Lobkovsky ("Universal Power Law in the Noise from a Crumpled Elastic Sheet", 1995), referenced in "Zipf and the general theory of wrinkling", 11/15/2003.

Contradicting the Central Limit Theorem's implications for what is "normal", power law distributions seem to be everywhere you look.

Or maybe not?

Many of the alleged "power-law" examples are actually log-normal, or some other heavy-tailed distribution, according to a paper by Aaron Clauset, Cosma Rohilla Shalizi, and M. E. J. Newman, "Power-law distributions in empirical data" (SIAM Review 2009). As an alternative to the paper, you can read Cosma's blog post "So You Think You Have a Power Law — Well Isn't That Special?", 6/15/2007; or this summary of the results in "Cozy Catastrophes", 2/15/2012:

In our paper, we looked at 24 quantities which people claimed showed power law distributions. Of these, there were seven cases where we could flat-out reject a power law, without even having to consider an alternative, because the departures of the actual distribution from even the best-fitting power law was much too large to be explained away as fluctuations. (One of the wonderful thing about a stochastic model is that it tells you how big its own errors should be.) In contrast, there was only one data set where we could rule out the log-normal distribution. […]

We found exactly one case where the statistical evidence for the power-law was "good", meaning that "the power law is a good fit and that none of the alternatives considered is plausible", which was Zipf's law of word frequency distributions. We were of course aware that when people claim there are power laws, they usually only mean that the tail follows a power law. This is why all these comparisons were about how well the different distributions fit the tail, excluding the body of the data. We even selected where "the tail" begins to maximize the fit to a power law for each case. Even so, there was just this one case where the data compelling support a power law tail.
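For readers who want to see what "fitting the tail" involves, here is a minimal Python sketch of the continuous maximum-likelihood estimate of the exponent discussed in Clauset et al., together with the Kolmogorov-Smirnov distance between the data and the fitted tail. It is an illustration under stated assumptions, not their code: the function names are mine, the data are synthetic draws from a pure power law, and the tail cutoff `xmin` is simply handed in rather than selected from the data as the quoted passage describes. The bootstrap goodness-of-fit test and the comparisons against log-normal and other alternatives are left out.

```python
import math
import random


def fit_power_law_tail(data, xmin):
    """Continuous power-law MLE for the tail x >= xmin.

    Returns (alpha_hat, standard_error, n_tail).
    """
    tail = [x for x in data if x >= xmin]
    n = len(tail)
    log_sum = sum(math.log(x / xmin) for x in tail)
    alpha = 1.0 + n / log_sum
    return alpha, (alpha - 1.0) / math.sqrt(n), n


def ks_distance(data, xmin, alpha):
    """Largest gap between the empirical tail CDF and the fitted power-law CDF."""
    tail = sorted(x for x in data if x >= xmin)
    n = len(tail)
    worst = 0.0
    for i, x in enumerate(tail):
        model = 1.0 - (x / xmin) ** (1.0 - alpha)
        # Compare against the empirical CDF just before and just after x.
        worst = max(worst, abs(model - i / n), abs(model - (i + 1) / n))
    return worst


def sample_power_law(n, xmin, alpha, rng):
    """Inverse-transform sampling from a pure power law (synthetic data only)."""
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


rng = random.Random(0)
synthetic = sample_power_law(20000, xmin=1.0, alpha=2.5, rng=rng)
alpha_hat, stderr, n_tail = fit_power_law_tail(synthetic, xmin=1.0)
distance = ks_distance(synthetic, 1.0, alpha_hat)
print(f"alpha = {alpha_hat:.3f} +/- {stderr:.3f}, KS distance = {distance:.4f}")
```

On data that really are power-law distributed, the estimate should land close to the true exponent and the distance should be small. The interesting cases are the ones where the distance is too large to be explained by sampling fluctuations, which is what the rejections described above amount to.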

Links to Cosma's other posts on the topic can be found in "Power laws and other heavy-tailed distributions", and a recent discussion can be found in "Power laws", 5/30/2018.

It's worth noting that there are many random processes that can be shown mathematically to produce power-law distributions, at least in the tails — as Cosma puts it,

there turn out to be nine and sixty ways of constructing power laws, and every single one of them is right, in that it does indeed produce a power law. Power laws turn out to result from a kind of central limit theorem for multiplicative growth processes, […].

So in a way it's surprising that so many of the power-law claims turn out to be bogus or at least doubtful.

And what about the claims of power-law distributions in the vocalizations of whales, dolphins, and other animals? I'm not sure. But the fact that the key review paper doesn't list Clauset et al. among its hundreds of references, and that none of the relevant papers seems to apply the tests described in Clauset et al., or to offer links to their underlying data, makes me suspicious. (If any readers know of papers that apply the needed tests, or offer datasets suitable for checking the claims, please let me know.)

And as a footnote: The nature of processes that generate power-law distributions was the topic of what might be the most unpleasant debate in the history of mathematical modeling. This battle took place between 1955 and 1961, and the combatants were Herbert Simon and Benoit Mandelbrot. See "The long tail of religious studies" for links and details.

Update — see also Heathcote, Brown, and Mewhort, "The power law repealed: The case for an exponential law of practice", Psychonomic Bulletin & Review 2000:

The power function is treated as the law relating response time to practice trials. However, the evidence for a power law is flawed, because it is based on averaged data. We report a survey that assessed the form of the practice function for individual learners and learning conditions in paradigms that have shaped theories of skill acquisition. We fit power and exponential functions to 40 sets of data representing 7,910 learning series from 475 subjects in 24 experiments. The exponential function fit better than the power function in all the unaveraged data sets. Averaging produced a bias in favor of the power function.
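A small Python sketch may help show how averaging can manufacture a power function. It rests on an assumption of mine, not on anything in Heathcote et al.: each learner follows an exponential curve exp(-rate * t), and the rates are gamma-distributed across learners. Under that assumption the expected average is exactly (1 + scale * t) ** -shape, a shifted power function, even though no individual curve is a power function. The parameter values are hypothetical.

```python
import math
import random


def averaged_curve(trials, n_learners, shape, scale, seed=0):
    """Average exp(-rate * t) over learners with gamma-distributed rates."""
    rng = random.Random(seed)
    rates = [rng.gammavariate(shape, scale) for _ in range(n_learners)]
    return [sum(math.exp(-r * t) for r in rates) / n_learners for t in trials]


def shifted_power(trials, shape, scale):
    """Closed-form expectation of exp(-rate * t): (1 + scale * t) ** -shape."""
    return [(1.0 + scale * t) ** (-shape) for t in trials]


trials = list(range(0, 50))
shape, scale = 2.0, 0.1  # hypothetical values chosen for illustration
average = averaged_curve(trials, n_learners=20000, shape=shape, scale=scale)
power = shifted_power(trials, shape, scale)
max_gap = max(abs(a - p) for a, p in zip(average, power))
print(f"largest gap between averaged exponentials and power curve: {max_gap:.4f}")
```

The averaged curve tracks the power curve closely, so a power-law fit to averaged data says little about the shape of any single learner's curve.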

And also Michael Ramscar, "Source codes in human communication", preprint 3/22/2019.

Update #2 — see also Steven Piantadosi, "Zipf’s word frequency law in natural language: A critical review and future directions", Psychon. Bull. Rev. 2014.

Update #3 — And if you're still with us, try "The long tail: In which Gauss is not mocked, but TWiTs (and dictionaries) are", 12/2/2005.


4 Comments

Gregory Kusnick said,

June 2, 2019 @ 10:40 am

power-law distributions have been found all over the place — Wikipedia lists

For a moment you had me convinced that Wikipedia's "List of X" pages obey a power-law size distribution.
Tim Rowe said,

June 2, 2019 @ 3:20 pm

The prevalence of power-law distributions (or distributions that might be mistaken for them) in no way contradicts the central limit theorem. The central limit theorem is a mathematically proven fact (we spent a whole term at university deriving that proof from first principles), but it relates specifically to the combination of *independent* distributions. In the case of the supposed power-law distributions, the contributing distributions are not independent.

[(myl) This is not transparently true of many of the "nine and sixty" processes that generate results with power-law distributions, for example the rank-to-frequency relation of "words" generated from sequences of randomly-chosen characters, one of which is a word-delimiter.

It's true that there are ways to exempt such processes from the preconditions for the Central Limit Theorem, which of course remains true when its preconditions are met. But these cases are just one of many problems with the ways in which the Central Limit Theorem, and the assumption that Gaussian distributions are "normal", are over-applied — see Tukey on Exploratory Data Analysis.

But I agree — "Gauss is not mocked"…]
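
The random-typing process mentioned in the reply above is easy to simulate. The Python sketch below is only an illustration, with choices that are mine and not from the post: a four-letter alphabet plus a space as the word delimiter, uniform character probabilities, and an ordinary least-squares slope on log rank versus log frequency, restricted to word types seen at least ten times. That regression is a rough eyeball measure and not one of the tests recommended by Clauset et al.

```python
import math
import random
from collections import Counter


def random_typing_counts(n_chars, letters="abcd", seed=0):
    """Type n_chars symbols uniformly from letters plus a space; count the words."""
    rng = random.Random(seed)
    alphabet = letters + " "
    text = "".join(rng.choice(alphabet) for _ in range(n_chars))
    return Counter(text.split())  # split() drops empty "words" between spaces


def loglog_slope(counts, min_count=10):
    """Least-squares slope of log(frequency) against log(rank)."""
    freqs = sorted((c for c in counts.values() if c >= min_count), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


counts = random_typing_counts(200_000)
slope = loglog_slope(counts)
print(f"{len(counts)} word types, log-log slope = {slope:.2f}")
```

With equal letter probabilities the rank-frequency curve is really a staircase, one step per word length, and for this alphabet the steps fall along a line of slope -log(5)/log(4), about -1.16. The fitted slope is negative and of that general size, which is the Zipf-like pattern at issue.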
Mark Dominus said,

June 3, 2019 @ 2:16 am

I enjoyed this paper on the history and mathematics
of confusions between power law and log-normal distributions:

Michael Mitzenmacher /
A Brief History of Generative Models for Power Law and Lognormal Distributions /
Internet Mathematics, vol 1, No. 2, pp. 226-251, 2004.

http://www.eecs.harvard.edu/~michaelm/postscripts/im2004a.pdf

"The argument over whether a lognormal or power law distribution is a better fit for some empirically observed distribution has been repeated across many fields over many years. For example, the question of whether income distribution follows a lognormal or power law distribution also dates back to at least the 1950s."
Andrew Usher said,

June 5, 2019 @ 7:29 am

Given the CLT, we should _always_ assume normal or log-normal distributions until proved otherwise. (Can something in between the two be defined mathematically?)

The power-law distributions are only ever approximations, and as such are only useful if they _extrapolate_ better than the log-normal. As the article found, that is hardly ever the case – and, as one might expect, word frequencies don't obey the CLT condition of being a composite of independent factors.

k_over_hbarc at yahoo.com
