Automatic measurement of media bias


Mediate Metrics ("Objectively Measuring Media Bias") explains that

Based in Wheaton, IL, Mediate Metrics LLC is a privately held start-up founded by technology veteran and entrepreneur Barry Hardek. Our goal is to cultivate knowledgeable consumers of political news by objectively measuring media “slant” — news which contains either embedded statements of bias (opinion) or elements of editorial influence (factual content that reflects positively or negatively on U.S. political parties).

Mediate Metrics’ core technology is based on a custom machine classifier designed specifically for this application, and developed based on social science best practices with recognized leaders in the field of text analysis. Today, text mining systems are primarily used as general purpose marketing tools for extracting insights from platforms such as Twitter and Facebook, or from other large electronic databases. In contrast, the Mediate Metrics classifier was specifically devised to identify statements of bias (opinions) and influence (facts that reflects positively or negatively) on U.S. political parties from news program transcripts.

(The links to Wikipedia articles on "social science" and "text mining" are original to their page.)

Based on 25 years of experience with the DARPA and NIST programs that have developed and tested the text analysis techniques that they reference, I can testify that "best practices" in this area would require three things:

(1) A detailed definition of what is meant by "statements of bias (opinions) and influence (facts that reflects positively or negatively)", including a clear definition of how media streams should be coded for these statements of bias or influence.

(2) Documentation of the inter-annotator agreement resulting from independent coding of media streams (text, audio, video) by several coders trained in applying the definitions in (1);

(3) Documentation of the performance of the automated techniques trained or designed to imitate human annotation of the coding defined in (1), typically in terms of precision/recall metrics or DET curves relative to human annotation. (A minimal sketch of such an evaluation follows this list.)
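To make (3) concrete, here's a minimal sketch (in Python, with invented labels and sentence counts) of the kind of scoring I have in mind; it has nothing to do with Mediate Metrics' actual code or coding scheme:

    # Hypothetical scoring of an automatic "slant" tagger against adjudicated
    # human coding. Labels: 1 = sentence coded as slanted, 0 = neutral.
    human  = [1, 0, 0, 1, 1, 0, 0, 0, 1, 0]   # human annotation (reference)
    system = [1, 1, 0, 0, 1, 0, 0, 1, 1, 0]   # classifier output on the same sentences

    tp = sum(1 for h, s in zip(human, system) if h == 1 and s == 1)  # hits
    fp = sum(1 for h, s in zip(human, system) if h == 0 and s == 1)  # false alarms
    fn = sum(1 for h, s in zip(human, system) if h == 1 and s == 0)  # misses
    tn = sum(1 for h, s in zip(human, system) if h == 0 and s == 0)  # correct rejections

    precision = tp / (tp + fp)          # how often a system "slant" flag is right
    recall    = tp / (tp + fn)          # how much human-coded "slant" is found
    miss_rate        = fn / (fn + tp)   # one axis of a DET curve
    false_alarm_rate = fp / (fp + tn)   # the other axis of a DET curve

    print(f"precision={precision:.2f} recall={recall:.2f} "
          f"miss={miss_rate:.2f} false_alarm={false_alarm_rate:.2f}")

A DET curve is just the trace of (miss rate, false-alarm rate) pairs as the classifier's decision threshold is swept; the point of asking for it is that a single accuracy number hides that trade-off entirely.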

In this particular case, it would also be important for (at least an adequate sample of) the human annotation to be done "blind", i.e. without knowledge of the source, to check that annotators' prior opinions of various sources are not biasing their coding of indices of bias.

Ideally, steps (2) and (3) would be validated by a trusted third party, to ensure that the system designers are not testing on the training material or cheating in various other ways. (This is not mainly because of concern for overt dishonesty, though overt dishonesty is not unknown; it's because, to paraphrase Richard Feynman, the easiest person to fool is yourself.)

The rest of Mediate Metrics' "about" page leaves it unclear how much of this they've done:

Advancements in computer science and natural language processing (NLP) have ushered in an era of machine-assisted analysis. Large scale text analytics techniques have been redefined by synthesizing human judgments to systematically extract insights from unstructured data sets. Experts in the field of text analytics have also applied academic and industry breakthroughs to develop friendlier, collaborative NLP software systems. Researchers now have the power to incorporate structured human decision-making into custom software classifiers, increasing the accuracy, efficiency, and quality of content analysis. Employing these techniques allows Mediate Metrics to quickly analyze a wide range of programming, and to continually refine and adapt our political slant measurement platform for the issues of the day.

And the posts on their weblog in the "How it works" category are not any more explicit.  So pending more information, I'm going to assume that they've done something like the following:

(A) Devised a coding scheme for bias based on their own armchair intuitions, perhaps with some armchair intuitions from a political scientist or two;

(B) Assigned a few coders to create some training material based on (A), without checking inter-annotator agreement;

(C) Trained a classifier based on the results of (B), looked at a sample of the results to determine that it's doing something approximately sensible, and gone forward with product launch. (A sketch of what such a naive pipeline looks like follows below.)
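By way of illustration, a minimal sketch of the naive (A)-(C) pipeline might look something like this, using invented sentences and labels and a stock bag-of-words classifier from scikit-learn as a stand-in; it is emphatically not Mediate Metrics' actual system:

    # A deliberately naive version of (A)-(C): train a bag-of-words classifier on a
    # handful of singly-coded sentences and spot-check the output. Invented data.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression

    sentences = [
        "The senator's reckless plan will bankrupt the country.",    # coded "slant"
        "The bill passed the House by a vote of 256 to 171.",        # coded "neutral"
        "Once again the administration has betrayed its promises.",  # coded "slant"
        "The committee hearing is scheduled for Tuesday morning.",   # coded "neutral"
    ]
    labels = [1, 0, 1, 0]   # one coder, one pass, no agreement check (step B)

    vectorizer = CountVectorizer()
    clf = LogisticRegression().fit(vectorizer.fit_transform(sentences), labels)

    # Step (C)'s "validation": eyeball a few predictions and call them sensible.
    sample = ["The governor's shameful budget gimmick fooled no one."]
    print(clf.predict(vectorizer.transform(sample)))

The missing pieces are exactly the ones listed earlier: no inter-annotator agreement behind the labels, and no measured error rates for the classifier.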

This guess is based not only on general skepticism and cynicism, but also on observation of other projects that have done similar things. This approach is a natural (though naive) response to the valid (but over-simplified) picture that they paint of recent advances in NLP techniques.

The biggest trouble with this approach is that the initial degree of inter-annotator agreement, depending on how you define it and measure it, is likely to be spectacularly low, say around 30%. That's not mainly because the judgments are inherently so individual that no inter-subjective agreement is possible (though individual differences are likely to be a serious problem in this case). It's mainly because even seemingly clear and well-defined text-analytic categories (like names of organizations or references to proteins) need a tedious iterative process of definition, comparison of independent annotation, discussion, extension or modification of the definitions, etc., in order to achieve a decent level of inter-annotator agreement.
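For concreteness, here's a minimal sketch of the measurement at issue, comparing raw agreement with chance-corrected agreement (Cohen's kappa) for two hypothetical coders; the labels are invented:

    # Raw agreement vs. Cohen's kappa for two hypothetical coders labeling the
    # same 12 sentences as slanted (1) or neutral (0). Invented data.
    from collections import Counter

    coder_a = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 0]
    coder_b = [1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0]
    n = len(coder_a)

    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n

    # Chance agreement: the probability that two coders who label independently,
    # with these marginal frequencies, happen to agree anyway.
    pa, pb = Counter(coder_a), Counter(coder_b)
    expected = sum((pa[k] / n) * (pb[k] / n) for k in (0, 1))

    kappa = (observed - expected) / (1 - expected)
    print(f"raw agreement={observed:.2f}  chance={expected:.2f}  kappa={kappa:.2f}")

In this toy example the two coders agree on two sentences out of three, which sounds respectable, but the chance-corrected kappa is only about 0.31. Which of these numbers you report is itself part of what a coding standard has to pin down.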

If Mediate Metrics had been through this process, which can take several months of hard work, I'm betting that they'd be telling us about it. They should even be boasting about it, because surely it would be a big step forward to demonstrate that you can train human coders to agree on a precisely-defined and unbiased definition of bias.

A second trouble with the approach outlined in (A)-(C) has to do with the performance of the automatic algorithm trained to imitate the human annotations. The problem is not simply its accuracy, though of course we do care about how well it approximates the distribution of human judgments. Since we want to compare the "bias" estimates for different media sources, we need to know how many of what kind of errors it makes — misses and false alarms — and how those different error rates co-vary with other factors of interest. For instance, in this case we're interested in "bias" estimates for various media outlets, or for various individual columnists or commentators. But perhaps their program has a high false-alarm rate (that is, it often signals "bias" when human coders generally would not) when applied to Tom Friedman, because he happens to over-use certain words or turns of phrase that tended to be associated with bias judgments in its training data. And maybe it has a high miss rate (that is, it often fails to signal "bias" when most human coders would do so) when applied to Maureen Dowd, because her style was only sparsely represented in the training material. Or maybe the algorithm has a higher false-alarm rate on TV news transcripts than on newswire text.
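Here's a minimal sketch of the kind of per-source error breakdown this argues for; the records are invented, and the attributions to Friedman, Dowd, and TV news simply echo the hypothetical cases above:

    # Break miss and false-alarm rates down by source instead of reporting one
    # global accuracy figure. Records are (source, human_label, system_label),
    # with 1 = "bias" flagged. All values invented for illustration.
    from collections import defaultdict

    records = [
        ("Friedman", 0, 1), ("Friedman", 0, 1), ("Friedman", 1, 1), ("Friedman", 0, 0),
        ("Dowd",     1, 0), ("Dowd",     1, 0), ("Dowd",     1, 1), ("Dowd",     0, 0),
        ("TV news",  0, 1), ("TV news",  0, 0), ("TV news",  1, 1), ("TV news",  0, 1),
    ]

    tallies = defaultdict(lambda: {"fn": 0, "pos": 0, "fp": 0, "neg": 0})
    for source, human, system in records:
        t = tallies[source]
        if human == 1:
            t["pos"] += 1
            t["fn"] += (system == 0)   # miss: human codes bias, system doesn't
        else:
            t["neg"] += 1
            t["fp"] += (system == 1)   # false alarm: system flags bias, human doesn't

    for source, t in tallies.items():
        miss = t["fn"] / t["pos"] if t["pos"] else float("nan")
        fa   = t["fp"] / t["neg"] if t["neg"] else float("nan")
        print(f"{source:8s}  miss={miss:.2f}  false_alarm={fa:.2f}")

If the false-alarm rate is twice as high for one source as for another, the resulting "slant" scores are not comparable, however good the overall accuracy looks.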

So for us (well, for me) to believe in the results of their program, I'd want to see their coding manual, some documentation of the inter-annotator agreement of their coders, some description of the algorithms they use, and some documentation of their algorithms' error rates, both in general and across a sample of media outlets and individual writers.

I think that what Mediate Metrics is doing, or is claiming to do, is a good idea in principle. And I think that it's certainly possible to do it well in practice. But what they've told us about what they've done doesn't leave me very confident that their implementation of the idea is a trustworthy one.

Media (or meta-media) uptake is slim so far: "Measuring Media Bias…Is it Possible?", Inside Cable News 2/10/12.

[Tip of the hat to Linda Seebach]

Update — From Phil Resnik's highly-relevant post "#CompuPolitics", 1/18/2012, a quote from Noah Smith:

"The results are noisy, as are the results of polls. Opinion pollsters have learned to compensate for these distortions, while we're still trying to identify and understand the noise in our data."

Opinion pollsters have been refining their methodology for 70 or 80 years, with thousands of published studies about the effects of sampling and polling methods, wording and ordering of questions, etc., all validated against alternative designs and also against real-world voting patterns or sales figures or whatever.

In the case of estimates of media bias, the sources of real-world validation are going to be more subtle, and concerns about (statistical) bias and noise are at least as great. So I'm skeptical that metrics based entirely on trade-secret methodology are going to get much traction.



14 Comments

  1. Barry Hardek said,

    February 11, 2012 @ 9:35 am

    FYI, inter-coder reliability reached a peak of over 80% before this was released, and was regularly well above your estimated 30%. In fact, I don't think it has been that low since the early days of development.

    This HAS been in development for many months. The code book is substantial, and there were more adjudication sessions along the way than I care to remember. With that said, I come from the competitive world of business, and that makes me a little paranoid. I am not yet inclined to openly publish my cook book for all to see. Furthermore, there are many different outlets for this service, and the majority of them are not staffed by subject matter experts in the field of text analytics. To that point, the writer from Inside Cable News said that I was going "over his head" after a few short paragraphs of background information. Realizing this, and considering my own time constraints, I did not post (or "boast") about the gory details of our development program just yet.

    I am always interested in improving the system, and am still refining what I call my "quality control" processes. Although I have relied heavily on the guidance of experts, more are always welcomed.

    [(myl) I'm glad to hear that there's more behind this effort than is explained on the company's web site, and that you've been through the process of iterative refinement needed to get reliable coding. As for the issue of keeping the "cook book" secret, it seems to me that you face a dilemma: You can't really expect people to trust your results if your methodology is secret; but you do need to worry about being ripped off if your methodology is published. In the opinion polling and market research business, my understanding is that this dilemma is (generally) solved by making methodologies public, and competing on implementation, reputation, and specific targeting.

    As your company's web page indicates, the general techniques for "text analytics" are not a mystery. If your annotation specifications became the standard in this area, you would reap significant PR value. And training annotators to do an accurate and efficient job of coding such complex concepts and relations is not a trivial thing to do, so there's a competitive advantage to be gained there as well.]

  2. Barry Hardek said,

    February 11, 2012 @ 11:04 am

    All valid comments, and food for thought.

    I've considered the items you mentioned to varying degrees, and knew that fuller disclosure would eventually be required. At this early stage, however, my objectives are to: a.) Determine what the general market interest is; b.) Obtain feedback from curious generalists, and; c.) Gather critical assessments from experts in the field.

    To that last point, thank you very much for your input.

  3. D said,

    February 11, 2012 @ 11:06 am

    What is bias anyway? If 70% of a journalist's articles reflect negatively on party X, and the other 30% on party Y, is that biased? What if party X actually are more dishonest?

    And this line: "…and influence (facts that reflects positively or negatively) on U.S. political parties…". Aren't these facts relevant?

The recent Politifact Lie of the Year controversy, and the "should the New York Times correct false statements by politicians in news articles" discussion, are good examples of how hard it can be to classify bias objectively.

  4. D.O. said,

    February 11, 2012 @ 11:57 am

It has nothing to do with language, but I am a bit curious where the business value is in this project. Say we have a system that shows that NYT's opinions are more aligned with the Democratic party agenda than WSJ's, or that Paul Krugman is more liberal than Charles Krauthammer. What is the selling pitch?

  5. Christian Waugh said,

    February 11, 2012 @ 12:37 pm

    I think the author might be coming at this a little too "academically." There is no dilemma for the company. Their results will speak for themselves. Customers who don't like them will not return, just as in polling. I think that as a scientific matter there is a better way of framing the enquiry. Who is best able to minimize bias and, lol, who is best able to maximize it while still being considered a source of news?

  6. HP said,

    February 11, 2012 @ 1:18 pm

Doesn't the whole premise rely on the assumption that there's such a thing as an unbiased text? Is that even theoretically possible? I'd really like to see one of those.

  7. John said,

    February 11, 2012 @ 2:49 pm

    What about sarcasm and innuendo? Seems hard to do. What about 'he said, she said' journalism? That might appear unbiased but by giving equal time can be quite biased. This is a hard problem…

  8. linda seebach said,

    February 11, 2012 @ 10:36 pm

    I should add that I attributed the link to Romenesko.

    Also, as an editorial writer for a number of years, I wonder how one evaluates "bias" in something that's supposed to be opinionated? People do complain about biased editorials, but they're missing the point.

  9. Kenny said,

    February 12, 2012 @ 1:10 am

    Although I think everyone or at least most people share a common understanding of bias as inappropriately favoring someone or something (same with slant), it seems that the company might be trying to recast what it does as identifying evidence of preferences for one side or another.

    If "bias" is taken that way, finding it doesn't have to mean something inherently bad about those who produced it. It can reveal whether a source is accomplishing its own goals, whether those goals are just to "present the facts" and avoid characterizing any of the actors or events, to provide commentary after a "balanced" presentation of fact, or just to provide opinion. I'm not sure if the algorithms could do this, but maybe it could even point to something that wasn't realized to be "biased" (in addition, course, to being wrong about some material). Though I suppose the technologies could also be used to verify whether you are flying your bias under the radar.

    In any case, if you know what was found, you can work out the theoretical questions of whether it was good or bad for yourself.

  10. Jerry Friedman said,

    February 12, 2012 @ 12:26 pm

    @linda seebach: If I complained about a "biased editorial", I'd mean that it ignored facts and valid arguments in order to favor some position. The point of opinion pieces isn't to be dishonest—at least in my opinion.

  11. Barry Hardek said,

    February 12, 2012 @ 5:08 pm

    Responding to some of the comments made here:

    • Objectively measuring political news bias (or “slant” as we prefer to call it, of which bias is a subset) is indeed a difficult endeavor. Still, as is often the case in the realm of text mining, there is a wealth of content to analyze, and one can find indicative needles-in-the-haystack with disciplined processes and strict guidelines.

• From my perspective, Op-Ed news content is absolutely valid, as long as viewers are aware that the content they are watching is indeed that. Frankly, I think that the boundary between opinion pieces and straight news is often blurry for the general public. News wonks know the difference intuitively, but I cannot tell you how many times I have had conversations with uninitiated viewers in which they proudly state that, “The only news program I watch is {INSERT YOUR CHOICE OF >>> The O’Reilly Factor/The Rachel Maddow Show/etc.}.” Furthermore, straight news programming often contains a subtle-but-consistent political tilt, despite claims to the contrary.

The fact is that TV news programs, regardless of type, often frame the core political messaging that resonates throughout society, translating into voting behavior and government policies that dramatically affect our daily life. That being the case, don’t you think an objective entity should endeavor to “watch the watchers” in order to serve the greater good?

    I know that may sound pretentious. I just don’t know how else to say it.

    • As far as business value, don’t you think that:
    * Certain news outlets and programs would find benefit from having an objective party validate that they are, in fact, fair and balanced?
    * Watchdog groups would gain from insight reports on the explicit and implicit political views of prominent news anchors, correspondents, and contributors?
    * Networks, news analysts, and interest groups would value secondary news slant studies on specific topics such as health care, labor/union issues, military spending, right-to-life, tax reform, regulatory measures, etc.?
    I do.

  12. Ginger Yellow said,

    February 13, 2012 @ 11:53 am

    As far as business value, don’t you think that:
    * Certain news outlets and programs would find benefit from having an objective party validate that they are, in fact, fair and balanced?

    Not really. The people who think they are biased aren't going to be convinced by data, especially data produced by a proprietary methodology. Plus, from a personal perspective as a British journalist, I'm not so into the whole impartial journalism ideal. My ideal is fealty to the truth, not to balance.


    * Watchdog groups would gain from insight reports on the explicit and implicit political views of prominent news anchors, correspondents, and contributors?

    Can't they work this out themselves? They are, after all, supposed to be watchdogs. Everyone knows the biases of prominent writers. You don't need an algorithm to tell you that, say, Matt Taibbi doesn't like Goldman Sachs or banks in general.
    Detecting editorial bias isn't particularly hard for educated humans to do, though proving it in some quantifiable way is (but I'm not sure why you'd want to). Paying for an automated process that tries to simulate human bias recognition seems a bit odd. I could potentially see some value in covering large numbers of non-prominent writers/pundits, but not a whole lot. In fact, it would seem to be most valuable outside the realm of politics, in the commercial sphere – companies identifying if they or their products are being badmouthed in the local press. But I'm pretty sure there are services that already do that.


    * Networks, news analysts, and interest groups would value secondary news slant studies on specific topics such as health care, labor/union issues, military spending, right-to-life, tax reform, regulatory measures, etc.?
    I do.

    Do you? Why? So you can shout "Bias!" at a news source? What does that achieve?

    Personally, I find the discourse about journalism in the US is unhealthily obsessed with the question of bias and doesn't pay enough attention to accuracy and informing the reader.

  13. Jon Weinberg said,

    February 13, 2012 @ 3:52 pm

    I'm completely mystified by the entire enterprise. If there's anything that we learned from the U.S. Federal Communications Commission's decades-long experience with the now-repealed "fairness doctrine," it's what HP said: An evaluator can code a statement as biased only with reference to a pre-existing neutral position. Since there is no a priori neutral position, there's no way to identify "bias" that transcends the substantive positions of the firm doing the evaluating.

  14. Jason said,

    February 14, 2012 @ 7:26 pm

I don't think that there's anything intrinsically wrong with, say, mining texts for the use of key words and phrases associated with various world-views — indeed, that's just bog-standard content analysis. If someone uses the phrase "Israel Firsters", for example, they are clearly of the left, and if someone mutters darkly about "New York Elites", they are clearly of the right — and, ironically, they intend to refer to the same people in both cases. What I have a problem with is calling this metric "objective" — it clearly requires the subjective intervention of coders who understand the partisan environment such code words exist in — and "bias", for which no non-subjective definition plausibly exists. "Bias" often means simply "this outlet refuses to adopt my preferred frame", whatever that frame might be.
