The robot army


Randall Stross, "When the Software Is the Sportswriter", NYT 11/27/2010:

ONLY human writers can distill a heap of sports statistics into a compelling story. Or so we human writers like to think.

StatSheet, a Durham, N.C., company that serves up sports statistics in monster-size portions, thinks otherwise. The company, with nine employees, is working to endow software with the ability to turn game statistics into articles about college basketball games.

Established in 2007, StatSheet.com provides statistical analysis of college football and basketball, Nascar and other sports. It dices data in more ways than any fan could possibly absorb. But charts, graphs and rankings alone cannot replace words that tell a story. We humans love stories; a craving for narrative seems part of our nature.

This month, StatSheet unveiled StatSheet Network, made up of separate Web sites for each of the 345 N.C.A.A. Division I men’s basketball teams. Beyond statistics galore, each site has what the company calls “automated content,” stories written entirely by software, including write-ups of the team’s games, past and future. With a joking wink, StatSheet’s founder, Robbie Allen, refers to these sites as the “Robot Army.”

The idea is an old one (e.g. Stephan Kerpedjiev, "Automatic Generation of Multimodal Weather Reports from Datasets", ANLP 1992) but this iteration seems to be quite well done. You can sample the goods at statsheet.com. The page for Penn's basketball team, quakerball.com, is a quick and easy source of information that I may visit from time to time when I've missed a game in person or via other media. I'll be really impressed when they can generate a convincing real-time play-by-play report from the unfolding box score, as radio baseball announcers apparently did in the 1920s.



18 Comments

  1. Evan Harper said,

    November 28, 2010 @ 12:40 pm

    I'll be really impressed when they can generate a convincing real-time play-by-play report from the unfolding box score, as radio baseball announcers apparently did in the 1920s.

    Ronald Reagan was one of the announcers who did this. Once the wire stopped transmitting reports, and he improvised a series of foul balls and game delays until it picked up again, whereupon he gradually caught himself up to real-time.

    [(myl) I've heard this story as well. However, when I went to look for sources, while writing this post, I couldn't find any that went beyond a brief assertion, either un-referenced or referred to another recent and brief assertion. Reagan's own account describes an audition in Davenport in which he was asked to pretend to broadcast a made-up game; but nothing about broadcasting from box scores in his (later) job in Des Moines. And his work there was in the mid-1930s, by which time I gather that live sports broadcasting had become the norm. Can you cite a primary (or otherwise reliable) source that validates my former impression?

    Given President Reagan's ability in later years to (as Mark Twain put it) "remember anything, whether it happened or not", I'm not sure that we can rely on stories that he told in the 1970s or 1980s about what happened in the 1930s.]

  2. Evan Harper said,

    November 28, 2010 @ 12:58 pm

    myl: I would definitely go with your findings above my hazy recollections cited to random webpages.

  3. Twitter Trackbacks for Language Log » The robot army [upenn.edu] on Topsy.com said,

    November 28, 2010 @ 1:10 pm

    […] Language Log » The robot army languagelog.ldc.upenn.edu/nll/?p=2809 – view page – cached November 28, 2010 @ 12:16 pm · Filed by Mark Liberman under Computational Tweets about this link […]

  4. fev said,

    November 28, 2010 @ 1:24 pm

    If you mean re-creating the game in the studio from a batter-by-batter wire feed, I can remember it from the late 1960s. AAA Braves road games on WRNL were preceded by a brief mention that they were wire re-creations (don't remember the exact wording).

    There's a brief touch of it in "Bull Durham" as well.

    [(myl) I was prepared to consign "Bull Durham" to the same historical status as some of the stories about Ronald Reagan. But your personal recollections are another matter — I'll shift the Reagan-story status from "probably false" to "probably true" in my mental truth-maintenance system, on the grounds that he did broadcast Cubs games for ~5 years in an era when (on your account) radio broadcasts may routinely have been re-created in the studio. Still, it would be nice to have some better documentation, both in general and for his specific case.]

  5. Spectre-7 said,

    November 28, 2010 @ 1:56 pm

    @myl

    Your link in this passage—

    I'll be really impressed when they can generate a convincing real-time play-by-play report from the unfolding box score, as radio baseball announcers apparently did in the 1920s.

    —is currently pointing to statsheet.com, which I suspect is an error. I was rather looking forward to reading about radio baseball announcers in the '20s, so color me a little disappointed.

    [(myl) Sorry — fixed now. But you may still be disappointed, as the link just goes to a section of a Wikipedia article about "Major League Baseball on the radio".]

  6. Faldone said,

    November 28, 2010 @ 2:13 pm

    It might be worthwhile seeing if Reagan ever played a sports announcer in a movie. Seems like a lot of his memories were from films he had done.

  7. Randy Hudson said,

    November 28, 2010 @ 3:35 pm

    "A huge issue for Pennsylvania was their 1 steals, coming in way below their 4.4 season average." — Those robots have to study a bit harder for their Turing test.

  8. fev said,

    November 28, 2010 @ 5:23 pm

    Reagan played a radio reporter in his first movie, "Love is in the Air" (1937) — busted down to running a children's show because his reporting annoyed the gangster-friendly station owner, the IMDB plot summary suggests.

    That would have been after his radio career. By one 1980 account, he was in California for spring training ("he annually went to the Cubs' training camp in Catalina Island off the California coast to study mannerisms, appearances, batting stances, etc., for use in his re-creations") when he got his tryout with Warner.

  9. David Barry said,

    November 28, 2010 @ 6:38 pm

    Fascinating to learn about the old baseball commentary based on the telegraph wire. In cricket these were called 'synthetic' broadcasts. The most famous of these came in 1938, where the main caller in Sydney would hit his desk with his pen to create the sound of bat on ball in England.

  10. pm said,

    November 28, 2010 @ 6:59 pm

    You are correct that the idea of automated production of news story content is an old one. I saw a demonstration of the system which IBM created for the online service provider Prodigy back in 1991, a system which generated news stories and related graphics from Reuters news feeds and stock price feeds.

  11. Faldone said,

    November 28, 2010 @ 7:07 pm

    My wife worked in commodities reporting in the '80s and she would typically have the article almost completely written before the final numbers came in. She only needed to fill in the blank spaces before filing the article.

  12. fev said,

    November 28, 2010 @ 8:08 pm

    Back when extreme reportorial high tech was a Trash-80 with rubber cups for the telephone receiver, one of my evil rimrat friends wrote a program for composing the basic house fire story. Needless to say, it had a prompt for entering the dog's name.

  13. Stephen Nicholson said,

    November 28, 2010 @ 11:28 pm

    On the subject of Reagan making up games, there is a mini-series called Cronkite Remembers where, IIRC, Walter Cronkite admits he did the same thing for football games. Cronkite said that he told the story often, but seemed skeptical of Reagan's story.

  14. Breen Mullins said,

    November 29, 2010 @ 10:20 am

    I heard a recreation in 1962. The Giants finished their last game of the season in an afternoon game. Russ Hodges and Lon Simmons told us that they'd be back in the evening to call the Dodgers game, which would determine the winner. It wasn't until years later that I learned how they'd done it. At 8 years old I'd imagined a very quick flight to L.A.

  15. Kenny Easwaran said,

    November 29, 2010 @ 7:19 pm

    I've often seen newspaper articles that look like they were produced in this way. I'd much rather see the single chart or graph than read someone's "narrative" of what the numbers tell, especially given how common it is for the narrator to stress aspects of the numbers that I think are much less significant or interesting than some others.

  16. Rob Young said,

    November 29, 2010 @ 9:00 pm

    In 1934 broadcasters in Australia created a 'live' cricket match, broadcast from cables received from England. This included providing sound effects as needed. And lots of talking to cover the gaps between cables.

    A part of the broadcast is still available here:

    http://www.abc.net.au/science/slab/2bl/cricket.htm

  17. Chad Nilep said,

    November 30, 2010 @ 12:51 am

    Sorry to drift this thread from talk about Ronald Reagan toward statsheet.com and automatic text generation, but I've just read their Blue Devil Daily, and I must say that that text is incredibly impressive. I was expecting clunky, Chinese-menu type text generation, but the actual text seems so smooth and so appropriate to the statistical context that I wouldn't have pegged it as automated. Undergraduate writing, perhaps, but not computer-generated.

  18. Joshua said,

    November 30, 2010 @ 2:36 am

    According to this Sports Illustrated article, KSSK radio in Honolulu was still airing re-created games of the local minor league baseball team as late as 1981.
