- Website: http://umiacs.umd.edu/~resnik/
- Philip Resnik is a professor at the University of Maryland, holding joint appointments in the Department of Linguistics and at the Institute for Advanced Computer Studies. He received his bachelor's degree in Computer Science from Harvard in 1987 and his Ph.D. in Computer and Information Science from the University of Pennsylvania in 1993, and he has worked in industry R&D at Bolt Beranek and Newman, IBM T.J. Watson Research Center, and Sun Microsystems Laboratories. Dr. Resnik's research focuses on combining knowledge-based and statistical methods for natural language processing, with applications in machine translation, translation crowdsourcing, and computational social science. His current work is supported by NSF, DARPA, IARPA, ARL, and a Google Research Award. Outside academia, he serves as strategic technology advisor for CodeRyte Inc., the nation's fastest-growing provider of NLP solutions in healthcare, and as lead scientist for Converseon, a leading social media consultancy.
Posts by Philip Resnik:
- There is a difference between "terror" and "terrorism" (in the diplomatic/international relations world) and Obama did not say "terrorism".
- Obama used the words "acts of terror" but he was referring to the 9/11 attacks or acts of terror in general, not the Benghazi attack.
It struck me that there might be an interesting linguistic angle on one of the highlights (or lowlights, depending on your view) of the second presidential debate last Monday night: Candy Crowley fact-checking Mitt Romney on the fly and telling him "He [Obama] did in fact, sir," refer to the embassy attack in Libya as an act of "terror" in a Rose Garden speech the day after the event. A brief, non-partisan description of the exchange, and the controversy about it, can be found in this video from comparative journalism site Newsy.com. Conservatives were furious, liberals delighted.
Looking around at the lively blogosphere discussion, I've found two potentially interesting linguistic aspects here:
On the "terror" vs. "terrorism" distinction, see this 2004 piece by Geoff Nunberg for a fascinating discussion of the shift from the latter to the former in the language of the Bush administration. The Newsy story above also mentions legal implications of the word "terrorism".
The second angle, and the crux of the matter, has to do with what Obama might or might not have been referring to when he used the phrase "acts of terror", and this seems like something about which linguists might have something useful to say. So here's a stab at it. Read the rest of this entry »
Just a quick pointer to this fun post by Toma Tasovac, which discusses the removal of a term from WordNet, the best known and most widely used lexical database for English. Apparently DuPont, the huge chemical company, expressed displeasure about the entry for Teflon (oops, I mean Teflon™), which did not indicate its status as a registered trademark.
Christiane Fellbaum's mail to the WN-USERS mailing list indicates that, although DuPont had not yet actually requested removing the term, the WordNet folks "settled" by offering to do so as "the simplest solution". Tasovac suggests to DuPont that they follow up this clear success by following his generously contributed outline for setting up a Division for Lexicography, Trademark Enforcement and World Domination. He concludes, "I have three more killer tips for how to rule the world by means of lexicographic black magic, but they are patented and trademarked. I am willing to discuss business propositions with DuPont representatives in strictest confidence."
A couple of months ago, I pointed out that entertainment industry folks are tracking Justin Bieber's popularity using automated sentiment analysis, and I used that as a jumping-off point for some comments about language technology and social media. Here I am again, but suddenly it's not just Justin's bank account we're talking about, it's the future of the country.
As the Republican primary season marches along, a novel use of technology in politics is evolving even more rapidly, and arguably in a more interesting way, than the race itself: the analysis of social media to take the pulse of public opinion about candidates. In addition to simply tracking mentions of political candidates, people are starting to suggest that volume and sentiment analysis on tweets (and other social media, but Twitter is the poster child here) might produce useful information about people's viewpoints, or even predict the success of political campaigns. Indeed, it's been suggested that numbers derived from Twitter traffic might be better than polls, or at least better than pundits. (Is that much of a bar to set? Never mind.)
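For readers curious what "volume and sentiment analysis" amounts to in its crudest form, here's a minimal sketch in Python. The lexicon, candidate name, and tweets are all invented for illustration; real systems use far larger lexicons and have to grapple with negation, sarcasm, retweets, and spam.

```python
# Toy lexicon-based sentiment scoring over tweets (illustrative only).
POSITIVE = {"great", "win", "strong", "love"}
NEGATIVE = {"weak", "lose", "terrible", "fail"}

def tweet_sentiment(text):
    """Return (# positive matches) - (# negative matches) for one tweet."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def candidate_pulse(tweets, candidate):
    """Volume and net sentiment over tweets mentioning a candidate."""
    mentions = [t for t in tweets if candidate.lower() in t.lower()]
    return len(mentions), sum(tweet_sentiment(t) for t in mentions)

tweets = [
    "Great debate performance by Smith tonight",
    "Smith looked weak on foreign policy",
    "I love how Smith handled that question",
]
volume, net = candidate_pulse(tweets, "Smith")
```

Crude as it is, counting matches against positive and negative word lists is recognizably the ancestor of what the commercial "pulse-taking" tools do at scale.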
One of the random things I happened to notice yesterday, in a list of people who passed away in 2011, was the name of Leonard Stern, co-creator of Mad Libs. (Back in 2008, Arnold Zwicky marked the game's 50th anniversary here on Language Log.) For those who've never seen it, Mad Libs is a word game in which one player prompts a second player for a list of words — give me a noun; ok, now an adjective; ok, now another noun, etc. — where the kinds of words needed are determined by labeled blanks that are situated in a little story that only the first player can see. In the second step of the game, the two players read the story together with the words inserted in their proper positions. The very first Mad Libs gave the following as an example:
"_____________! he said ________ as he jumped into his convertible exclamation adverb ______ and drove off with his __________ wife." noun adjective
(Footnote: I've borrowed the example from the game's Wikipedia entry.)
Thinking about Mad Libs last night after a bedtime conversation with my six-year-old, I've concluded that someone really needs to design a linguistics course entirely around Mad Libs.
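For the computationally inclined, the two-step game is trivial to mimic in code. A minimal sketch, using the template above (the scripted "second player" and their answers are my own invention):

```python
import re

# A Mad Libs template: each blank is labeled with the kind of word required.
template = ("{exclamation}! he said {adverb} as he jumped into his "
            "convertible {noun} and drove off with his {adjective} wife.")

def play_madlibs(template, ask):
    """Step 1: prompt for each labeled blank; step 2: return the filled story."""
    labels = re.findall(r"\{(\w+)\}", template)
    words = {label: ask(label) for label in labels}
    return template.format(**words)

# In the real game `ask` would be input(); here the second player is scripted.
answers = {"exclamation": "Ouch", "adverb": "stupidly",
           "noun": "boat", "adjective": "brave"}
story = play_madlibs(template, answers.get)
```

The labels, of course, are exactly where the linguistics comes in: "give me a noun" presupposes that the player has tacit knowledge of syntactic categories, which is part of why the game would make such good course material.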
On November 7, publishers Reed Elsevier announced the passing of Pierre Vinken, former Reed Elsevier CEO and Chairman, at age 83. But to those of us in natural language processing, Mr. Vinken is 61 years old, now and forever.
Though I expect it was unknown to him, Mr. Vinken has been the most familiar of names in natural language processing circles for years, because he is the subject (in both senses, not to mention the inaugural bigram) of the very first sentence of the Wall Street Journal (WSJ) corpus:
Pierre Vinken, 61 years old, will join the board as a nonexecutive director Nov. 29.
But there's a fascinating little twist that most NLPers are probably not aware of. I certainly wasn't.
Forget Watson. Forget Siri. Forget even Twitterology in the New York Times (though make sure to read Ben Zimmer's article first). You know natural language processing has really hit the big time when it's featured in a story in Entertainment Weekly.
At the Revolutions blog, David Smith posts a nice little discussion about growth in jobs where people are making sense of data; he used job search site indeed.com to look at trends in job postings. Apparently postings involving "statistician" are not seeing a lot of growth, but "data scientists" have really started to catch on during the last year or so. (Hat tip to Joe Reisinger for tweeting this. He comments that data scientist is a "truly terrible name, but it's undeniably a different skill set: way too many statisticians can't code".) Read the rest of this entry »
Looking at Geoff's post on machine-translated phishing scam messages, the message certainly does come across as very similar to the English output we in the biz frequently see coming out of statistical machine translation of Chinese. This includes Chinese-specific issues like recovering correct determiners from a language that does not express them overtly (I hope that the [not this] letter meets you in good spirits), as well as the ubiquitous phenomenon of sentences that are locally coherent — thanks to phrase-level translations and good statistical language models for English — but globally nonsensical. I don't claim to know what makes a text poetic, but it seems to me that this combination of local coherence and larger-scale disconnectedness must be at least partly responsible for what Geoff describes as the "strange poetry" of machine translationese.
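That combination is easy to reproduce in miniature: a bigram language model (the kind that, scaled up enormously, helps make SMT output locally fluent) strings together plausible word pairs with no memory of the larger discourse. A toy sketch, trained on an invented scrap of text:

```python
import random
from collections import defaultdict

# Train a toy bigram "language model": each word maps to the words
# observed to follow it in the (invented) training text.
text = ("the letter meets you in good spirits . the spirits meet the "
        "letter in the garden . you meet the letter in good time .").split()
follows = defaultdict(list)
for w1, w2 in zip(text, text[1:]):
    follows[w1].append(w2)

def babble(start, n, seed=0):
    """Each step picks a word that plausibly follows the previous one:
    locally coherent, with no plan beyond the current word pair."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(n - 1):
        out.append(rng.choice(follows[out[-1]]))
    return " ".join(out)

sentence = babble("the", 10)
```

Every adjacent word pair in the output was seen in training, so each local transition reads fine; the sentence as a whole wanders wherever the pairs lead it, which is a fair caricature of the "strange poetry" effect.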
I've stolen the title of this post from the subject line of a message from Hal Daumé, who has invited folks at University of Maryland to a huge Jeopardy-watching party he's organizing tonight. Today is February 14, so for at least some of the audience, Jeopardy might indeed jeopardize Valentine's Day, substituting geeky fun (I use the term fondly) for candle-lit dinners.
In case you hadn't heard, the reason for the excitement, pizza parties, and so forth is that tonight's episode will, for the first time, feature a computer competing against human players — and not just any human players, but the two best-known Jeopardy champions. This is stirring up a new round of popular discussion about artificial intelligence, as Mark noted a few days ago. Many in the media — not to mention IBM, whose computer is doing the playing — are happy to play up the "smartest machine on earth", dawn-of-a-new-age angle. Though, to be fair, David Ferrucci, the IBMer who came up with the idea of building a Jeopardy-playing computer and led the project, does point out quite responsibly that this is only one step on the way to true natural language understanding by machine (e.g. at one point in this promotional video).
Regardless of how the game turns out, it's true that tonight will be a great achievement for language technology. Though I would also argue that the achievement is as much in the choice of problem as in the technology itself.
This started out to be a short report on some cool, socially relevant crowdsourcing for Egyptian Arabic. Somehow it morphed into a set of musings about the
A statistical revolution in natural language processing (henceforth NLP) took place from the late 1980s through the mid-1990s or so. Knowledge-based methods of the previous several decades were overtaken by data-driven statistical techniques, thanks to increases in computing power, better availability of data, and, perhaps most of all, the (largely DARPA-imposed) re-introduction of the natural language processing community to its colleagues doing speech recognition and machine learning.
There was another revolution that took place around the same time, though. When I started out in NLP, the big dream for language technology was centered on human-computer interaction: we'd be able to speak to our machines, in order to ask them questions and tell them what we wanted them to do. (My first job out of college involved a project where the goal was to take natural language queries, turn them into SQL, and pull the answers out of databases.) This idea has retained its appeal for some people, e.g., Bill Gates, but in the mid 1990s something truly changed the landscape, pushing that particular dream into the background: the Web made text important again. If the statistical revolution was about the methods, the Internet revolution was about the needs. All of a sudden there was a world of information out there, and we needed ways to locate relevant Web pages, to summarize, to translate, to ask questions and pinpoint the answers.
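Database front ends of that era typically worked by recognizing a small inventory of question patterns and mapping them onto query templates. A drastically simplified sketch of the idea (the patterns and table names here are invented; real systems used grammars and semantic interpretation, not bare regexes):

```python
import re

# Map a couple of question patterns onto SQL templates (toy example only).
PATTERNS = [
    (re.compile(r"how many (\w+) are there", re.I),
     "SELECT COUNT(*) FROM {0}"),
    (re.compile(r"list all (\w+) in (\w+)", re.I),
     "SELECT * FROM {0} WHERE region = '{1}'"),
]

def nl_to_sql(question):
    """Translate a natural-language question into SQL, if a pattern matches."""
    for pattern, template in PATTERNS:
        m = pattern.search(question)
        if m:
            return template.format(*m.groups())
    return None  # no pattern matched; a real system would ask for rephrasing

query = nl_to_sql("How many employees are there?")
```

The brittleness is the point: anything outside the anticipated patterns falls through, which is one reason the dream of talking to databases kept receding even as the Web made other text-processing needs urgent.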
Fifteen years or so later, the next revolution is already well underway. Read the rest of this entry »
Yesterday, on our way to school, my four-year-old commented, "When you love somebody, it can't be unloved. That's 'irreversible change'." I'm not sure which I appreciate more, the sweet sentiment (don't we all wish this were 100% true?), the generalization of a concept he learned on Sid the Science Kid, or the example of unloved in this unconventional usage.
Why do I find this so compelling? On reflection, perhaps it's because instead of the adjectival un- prefix (unhappy, unclear), which is about states, what we have here seems from context to be the verbal un-, which is about reversing actions (unlock, untie). Love as an action, something that effects a change of state, not just a state.
Or maybe I'm just in a sappy mood. :-)