Collaborative post by John Kingston and Chris Potts
Newspaper stories about the financial markets often contain quantitative information that is interpretable only by experts. The headline screams "Dow Up 200!", but what does that mean? In some contexts (say, apartment rentals), 200 is a lot. In others (e.g., house prices), it is hardly anything at all. Similarly, how big is a 3% change? Sometimes we're asked to shrug off 3% differences as irrelevant (think of polling data). For the markets, though, most of us have the sense that 3% is a big deal.
The headlines do contain some information that all of us have intuitions about: the verbs and other predicates that describe the change. We know that rise says that the change was upwards, and we can intuitively juxtapose it with soar, which suggests really dramatic upward change. Conversely, fall and plummet describe motion in the downward direction, with the second implying much worse news than the first.
So much for our linguistic intuitions. Do they square with the way newspaper headline writers use these predicates in describing financial markets? This is much less clear. As part of our Data Rich Humanities project, sponsored by UMass Amherst CHFA, we have been exploring this question using the collection of 23,327 NY Times financial headlines described in this earlier post.
As an initial experiment, we pulled all the headlines that match the pattern Dow PRED N (point(s)/percent/%), where PRED is a verb/predicate and N is a numerical value. Where the unit is not mentioned, points seems always to be understood. Headline writers have a strong propensity for reporting changes in terms of points, which are interpretable only if one knows the size of the market at the time of reporting. Percentage changes avoid that problem, but they can involve numbers small enough to raise other issues. So we can assume that, whatever the numerical value, most readers look to the words in the headline to figure out what happened. This makes those words vitally important to accurate reporting.
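To make the pattern concrete, here is a minimal sketch of how such headlines could be matched with a regular expression. It is not the code used for the study: the names `HEADLINE_RE` and `parse_headline` are hypothetical, and the sketch assumes the predicate is a single word and does no lemmatization (so "Soars" comes back as "soars", not "soar").

```python
import re

# Hypothetical pattern for "Dow PRED N (point(s)/percent/%)".
# Assumption: PRED is a single alphabetic word directly after "Dow".
HEADLINE_RE = re.compile(
    r"\bDow\s+(?P<pred>[A-Za-z]+)\s+"
    r"(?P<num>\d[\d,]*(?:\.\d+)?)\s*"
    r"(?P<unit>points?|percent|%)?",
    re.IGNORECASE,
)


def parse_headline(headline):
    """Return (predicate, value, unit) or None if the pattern does not match.

    The unit is normalized to "points" or "percent". A missing unit is
    treated as points, following the observation in the text.
    """
    m = HEADLINE_RE.search(headline)
    if m is None:
        return None
    value = float(m.group("num").replace(",", ""))
    raw_unit = (m.group("unit") or "points").lower()
    unit = "percent" if raw_unit in ("percent", "%") else "points"
    return m.group("pred").lower(), value, unit
```

A real pipeline would also need to map inflected forms such as "soars" and "soared" to a common predicate, which this sketch leaves out.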
We converted all point values to percentage values, using the opening price of the market on the day the headline was written (or as close as we could come, for days when we didn't have the data). This put all the data on a uniform scale, and it removed the time-dependence of point values. (200 points was a much bigger deal twenty years ago than it is today. Percentage changes have a more constant impact.)
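The conversion itself is simple arithmetic: a point change divided by the opening price, times 100. The sketch below illustrates it under stated assumptions. The function names and the dictionary of opening prices keyed by date are hypothetical, and picking the closest available date is only one reading of "as close as we could come".

```python
from datetime import date


def nearest_open(opens, day):
    """Look up the opening price for `day`, or for the closest date we have.

    `opens` is a hypothetical dict mapping datetime.date -> opening price.
    """
    closest = min(opens, key=lambda d: abs((d - day).days))
    return opens[closest]


def to_percent(value, unit, opening_price):
    """Express a reported change as a percentage of the opening price."""
    if unit == "percent":
        return value  # already on the target scale
    return 100.0 * value / opening_price
```

For instance, a 200-point move against an opening price of 10,000 comes out as 2%, while the same 200 points against an opening price of 2,500 would be 8%, which is why point values alone are hard to compare across years.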
The results are generally encouraging; it looks as though readers of the Times can basically trust their intuitions about the relevant predicates in forming an opinion about how serious the financial news is. Figure 1 provides histograms for four predicates: rise, fall, soar, and plunge.
Figure 1: Histograms for rise, fall, soar, and plunge.
Intuitively, rise and fall are moderate change verbs. They are (generally) used when the market change was below 1%. After that, their superlative counterparts (usually) take over. In the terms of linguistic pragmatics, rise conversationally implicates not soar: soaring counts as rising, but it is misleading to use rise for dramatic changes. Some misleading instances aside, headline writers seem to be adhering to this pragmatics. The same can be said for the fall-plunge pair.
The histograms are suggestive, but we think they are not the ideal tool, especially since our data have some notable outliers and gaps that probably trace to data-sparseness. The boxplots in Figure 2 seem superior:
Figure 2: Boxplots for rise, fall, soar, and plunge.
This is an information-rich visualization. Several of its features are worth pointing out, since they all support the idea that headline writers use these predicates reliably:
- The most important data-point is the dark horizontal line in each plot. This is the median value. The values for rise and fall are close: 0.53 for rise and 0.63 for fall. In contrast, the median values for soar and plunge are higher: 1.5 and 1.9, respectively.
- We can also compare the boxed areas. These delimit the values between the first and third quartiles, the core of the data. For each of the rise/soar and fall/plunge pairs, these areas do not overlap, a strong indication that their median values are genuinely different.
- The downward whisker of soar (which extends to the minimum value or to 1.5 times the interquartile range, the 25%-75% span, below the box, whichever is nearer) does not overlap with the upper quartile (75%) of rise. The downward whisker for plunge just barely overlaps with the upper quartile of fall.
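For readers who want to check these boxplot quantities on their own data, here is a minimal sketch using only the Python standard library. The function names are hypothetical, and the quartiles come from `statistics.quantiles` with the "inclusive" method, which is an assumption: plotting packages differ slightly in how they compute quartiles, so the numbers need not match a given plot exactly.

```python
import statistics


def box_stats(values):
    """Compute the quantities a standard boxplot draws for one predicate."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range: the height of the box
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers stop at the most extreme data point inside the fences.
        "lower_whisker": min(v for v in values if v >= low_fence),
        "upper_whisker": max(v for v in values if v <= high_fence),
    }


def boxes_overlap(a, b):
    """True if the interquartile boxes of two box_stats results overlap."""
    return a["q1"] <= b["q3"] and b["q1"] <= a["q3"]
```

Calling `box_stats` on the percentage changes for each predicate and then `boxes_overlap` on a pair such as rise and soar reproduces the kind of comparison made in the second bullet above.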
All these distributional features point to the conclusion that headline writers, like the rest of us, generally use rise and fall for modest changes, reserving their more picturesque counterparts for extreme changes in the market.
Of course, we do not mean to say that the language is never misleading:
(Our thanks to Chris Davis for useful suggestions about how to present these numbers.)