Language Log

English Verb-Particle Constructions

July 26, 2017 @ 9:59 am · Filed by Spencer Caplan under Computational linguistics, Syntax

Lately I've been thinking about "optionality" as it relates to syntactic alternations. (In)famous cases include complementizer deletion ("I know that he is here" vs. "I know he is here") and embedded V2 in Scandinavian. For now, let's consider the English verb-particle construction. The relative order of the particle and the object is "optional" in cases such as the following:

1a) "John picked up the book"
1b) "John picked the book up"

Either order is usually acceptable, except when the object is a pronoun (although even pronoun objects become acceptable under a focus reading):

1c) "John put it back"
1d) *"John put back it"
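The ordering pattern in examples 1a to 1d can be stated as a small rule. This is a simplified sketch, not the analysis developed in the post: it assumes a hypothetical, hand-picked set of weak pronouns, ignores the focus reading, and lets every other object appear on either side of the particle.

```python
# HYPOTHETICAL, simplified set of weak (unstressed) object pronouns.
WEAK_PRONOUNS = {"it", "him", "her", "them", "me", "us", "you"}

def particle_orders(subject, verb, particle, obj):
    """Return the verb-particle word orders this toy rule licenses.

    A full noun-phrase object may precede or follow the particle; a weak
    pronoun object must precede it. The focus reading is ignored here."""
    joined = f"{subject} {verb} {particle} {obj}"   # e.g. "John picked up the book"
    split = f"{subject} {verb} {obj} {particle}"    # e.g. "John picked the book up"
    if obj.lower() in WEAK_PRONOUNS:
        return [split]
    return [joined, split]

print(particle_orders("John", "picked", "up", "the book"))
# ['John picked up the book', 'John picked the book up']
print(particle_orders("John", "put", "back", "it"))
# ['John put it back']
```

A rule this crude says nothing about *why* speakers choose one order over the other when both are licensed, which is exactly where the "optionality" question starts.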


Gender, conversation, and significance

July 26, 2017 @ 5:11 am · Filed by Mark Liberman under Computational linguistics

As I mentioned last month ("My summer", 6/22/2017), I'm spending six weeks in Pittsburgh at the 2017 Jelinek Summer Workshop on Speech and Language Technology (JSALT), as part of a group whose theme is "Enhancement and Analysis of Conversational Speech".

One of the things that I've been exploring is simple models of who talks when: a sort of Biggish Data reprise of Sacks, Schegloff & Jefferson, "A simplest systematics for the organization of turn-taking for conversation", Language 1974. A simple place to start is just the distribution of speech segment durations. And my first explorations of this issue turned up a case that's relevant to yesterday's discussion of "significance".
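As a rough illustration of that starting point, here is a minimal sketch that turns speaker-labeled speech segments into durations and bins them into a histogram. The `(speaker, start, end)` tuple format, the function names, the sample segments, and the 0.5-second bin width are all assumptions made for illustration; they are not the workshop group's actual data format or tools.

```python
from collections import Counter

def segment_durations(segments):
    """Durations in seconds for (speaker, start_sec, end_sec) tuples.

    Zero-length or reversed segments are dropped."""
    return [end - start for _speaker, start, end in segments if end > start]

def duration_histogram(durations, bin_width=0.5):
    """Count durations per bin; bin k covers [k * bin_width, (k + 1) * bin_width)."""
    counts = Counter(int(d // bin_width) for d in durations)
    return dict(sorted(counts.items()))

# HYPOTHETICAL segments from a two-speaker conversation.
segments = [("A", 0.0, 1.2), ("B", 1.3, 1.6), ("A", 1.7, 4.0)]
durs = segment_durations(segments)
print(duration_histogram(durs))  # {0: 1, 2: 1, 4: 1}
```

With real conversational data the interesting part is the shape of this distribution (and how it differs across speakers or groups), not the binning itself.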


Helpful Google

July 17, 2017 @ 5:44 pm · Filed by Mark Liberman under Computational linguistics, Humor

The marvels of modern natural language processing:

Michael Glazer, who sent in the example, wonders whether Google Translate has overdosed on old Boris and Natasha segments from Rocky and Bullwinkle:


Elephant semifics

July 11, 2017 @ 6:24 am · Filed by Mark Liberman under Computational linguistics, Elephant semifics

Smut Clyde at Riddled continues to generate Google Translate poetry:


Do STT systems have "intriguing properties"?

July 11, 2017 @ 4:55 am · Filed by Mark Liberman under Computational linguistics

In "Intriguing properties of neural networks" (2013), Christian Szegedy et al. point out that

… deep neural networks learn input-output mappings that are fairly discontinuous to a significant extent. We can cause the network to misclassify an image by applying a certain imperceptible perturbation…
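The quoted claim is about deep networks, but the underlying arithmetic shows up even in a toy linear classifier: nudging each of many input coordinates by a tiny amount, each in the direction that lowers the score, adds up to a large change in the output. The sketch below uses a simple sign-of-the-gradient perturbation, which is a swapped-in technique and not the optimization procedure Szegedy et al. used; the weights, inputs, bias, and `eps` are made-up values.

```python
def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def linear_score(w, x, b):
    """Decision score of a linear classifier: positive means class 1."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def sign_perturbation(w, x, b, eps):
    """Move every coordinate of x by eps toward the opposite class.

    For a linear model the gradient of the score with respect to x is w,
    so stepping against sign(w) lowers a positive score (and vice versa)."""
    s = sign(linear_score(w, x, b))
    return [xi - eps * s * sign(wi) for wi, xi in zip(w, x)]

# Made-up 1000-dimensional input sitting just on the positive side of the boundary.
n = 1000
w = [0.1] * n
x = [0.5] * n
b = -49.9
x_adv = sign_perturbation(w, x, b, eps=0.002)

print(linear_score(w, x, b) > 0)      # True: original input is class 1
print(linear_score(w, x_adv, b) > 0)  # False: a 0.4% change per coordinate flips it
```

The score moves by `eps` times the sum of the absolute weights, here 0.002 * 100 = 0.2, so many individually negligible changes are enough to cross a boundary that was only 0.1 away.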

For example:


"The eye of the needle … is being tried to be threaded…"

July 10, 2017 @ 10:23 am · Filed by Mark Liberman under Computational linguistics, Language and politics

Adam Cancryn, "Why a GOP senator from Trump country opposes the Senate health bill", Politico 7/9/2017:

“Collaborating with Democrats on the other side, to me, is not an exercise in futility,” Capito said, noting that she has spoken with Manchin and other Democrats about tackling health care together. “That may be where we end up, and so be it.”

Speculating further than that, she added, is premature. Senate Republicans could quickly strike a deal, pass a bill and follow through on their seven-year repeal pledge before the month is out.

“I think that remains to be seen,” Capito said. “That’s the eye of the needle, and I think it’s being tried to be threaded. But I’m not sure.”


Amazon Echo Silver

July 9, 2017 @ 11:24 am · Filed by Mark Liberman under Computational linguistics, Humor


"… misdemeanor of the 115th Congress."

June 26, 2017 @ 1:36 pm · Filed by Mark Liberman under Computational linguistics, Humor

David Crisp, "Gianforte: Congress’ newest misdemeanor", Last Best News 6/25/2017:

In case you were wondering whether Greg Gianforte will ever live down his body slam of a reporter for the Guardian, here’s a clue.

The Associated Press reported last week that Gianforte drew boos from the Republican side of the aisle during his brief speech following his swearing in as Montana’s representative in the U.S. House. The murmurs apparently had nothing to do with misdemeanor assault but came in response to Gianforte’s call to “drain the swamp” and for a bill denying pay to members of Congress if they fail to balance the budget.

But what’s really interesting is the C-SPAN transcript of Gianforte’s swearing in. The transcripts, according to a FAQ at the C-SPAN website, are drawn from the closed captioning that scrolls on the screen during sessions of Congress. The transcripts are included on the website to help visitors find the video they want, not to provide an accurate record of the actual speeches.

But they can nevertheless be revealing. On the tape, House Speaker Paul Ryan swears in Gianforte, then says, “Congratulations, you are now a member of the 115th Congress.” On the transcript, Ryan says, “Congratulations, you are now misdemeanor of the 115th Congress.”


"The Real Threat of AI"

June 26, 2017 @ 7:39 am · Filed by Mark Liberman under Computational linguistics

Kai-Fu Lee has an interesting opinion piece in yesterday's NYT, "The Real Threat of Artificial Intelligence":

What worries you about the coming world of artificial intelligence?

Too often the answer to this question resembles the plot of a sci-fi thriller. People worry that developments in A.I. will bring about the “singularity” — that point in history when A.I. surpasses human intelligence, leading to an unimaginable revolution in human affairs. Or they wonder whether instead of our controlling artificial intelligence, it will control us, turning us, in effect, into cyborgs.

These are interesting issues to contemplate, but they are not pressing. They concern situations that may not arise for hundreds of years, if ever. […]

This doesn’t mean we have nothing to worry about. On the contrary, the A.I. products that now exist are improving faster than most people realize and promise to radically transform our world, not always for the better. They are only tools, not a competing form of intelligence. But they will reshape what work means and how wealth is created, leading to unprecedented economic inequalities and even altering the global balance of power.


"One big Donald Trump AIDS"

June 25, 2017 @ 9:34 am · Filed by Mark Liberman under Computational linguistics

As I've observed several times over the years, automatic speech recognition is getting better and better, to the point where some experts can plausibly advance claims of "achieving human parity". It's not hard to create material where humans still win, but in a lot of ordinary-life recordings, the machines do an excellent job.

Just like human listeners, computer ASR algorithms combine "bottom-up" information about the audio with "top-down" information about the context — both the local word-sequence context and various layers of broader context. In general, the machines are more dependent than humans are on the top-down information, in the sense that their performance on (even carefully-pronounced) jabberwocky or word salad is generally rather poor.
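One common way to formalize this combination is log-linear: each candidate transcription gets a bottom-up acoustic log score plus a weighted top-down language-model log score, and the decoder picks the highest total. The sketch below is a generic illustration of that idea, not a description of how any particular commercial system works; the hypotheses, their scores, and the `lm_weight` parameter are invented for the example.

```python
def rescore(hypotheses, lm_weight):
    """Pick the hypothesis maximizing acoustic_logp + lm_weight * lm_logp.

    hypotheses: list of (text, acoustic_logp, lm_logp) tuples."""
    best = max(hypotheses, key=lambda h: h[1] + lm_weight * h[2])
    return best[0]

# Invented log-probabilities: the audio slightly favors the first string,
# while the language model strongly favors the second.
hyps = [
    ("wreck a nice beach", -10.0, -9.0),
    ("recognize speech", -10.5, -4.0),
]

print(rescore(hyps, lm_weight=0.0))  # wreck a nice beach  (acoustics only)
print(rescore(hyps, lm_weight=1.0))  # recognize speech    (context wins)
```

Raising `lm_weight` makes the choice lean harder on word-sequence context, which is one way to picture a recognizer that depends more on top-down information than a human listener does.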

But recently I've been noting some cases where an ASR system unexpectedly fails to take account of what seem like obvious local word-sequence likelihoods. To check my impression that such events are fairly common, I picked a random video from YouTube's welcome page (Bill Maher's 6/23/2017 monologue) and fetched the "auto-generated" closed captions.
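One rough way to check a caption for violations of local word-sequence likelihood is to score each adjacent word pair with a bigram model and flag the improbable ones. This is a minimal sketch under stated assumptions: the training sentences, the add-one smoothing, the 0.1 threshold, and the sample transcript are hypothetical choices for illustration, and a meaningful check would need a far larger training corpus.

```python
from collections import Counter

def train_bigrams(sentences):
    """Return (context counts, bigram counts, vocabulary size) for add-one smoothing."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        toks = ["<s>"] + s.lower().split() + ["</s>"]
        unigrams.update(toks[:-1])              # tokens that serve as left context
        bigrams.update(zip(toks, toks[1:]))
    vocab = {t for s in sentences for t in s.lower().split()} | {"</s>"}
    return unigrams, bigrams, len(vocab)

def bigram_prob(w1, w2, unigrams, bigrams, v):
    """Add-one (Laplace) smoothed estimate of P(w2 | w1)."""
    return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + v)

def flag_unlikely(transcript, model, threshold):
    """List adjacent word pairs whose smoothed bigram probability falls below threshold."""
    unigrams, bigrams, v = model
    toks = ["<s>"] + transcript.lower().split() + ["</s>"]
    return [(a, b) for a, b in zip(toks, toks[1:])
            if bigram_prob(a, b, unigrams, bigrams, v) < threshold]

# HYPOTHETICAL toy training data and caption.
model = train_bigrams(["one big beautiful bill", "one big problem", "a big deal"])
print(flag_unlikely("one big aids", model, threshold=0.1))  # [('big', 'aids')]
```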


"balls have zero to me to me to me to me to me to me to me to me to"

June 20, 2017 @ 4:29 pm · Filed by Mark Liberman under Computational linguistics, Elephant semifics, Humor

Adrienne LaFrance, "What an AI's Non-Human Language Actually Looks Like", The Atlantic 6/20/2017:

Something unexpected happened recently at the Facebook Artificial Intelligence Research lab. Researchers who had been training bots to negotiate with one another realized that the bots, left to their own devices, started communicating in a non-human language. […]


Computational linguistics in three acts

June 12, 2017 @ 9:30 am · Filed by Mark Liberman under Computational linguistics

Towards the end of April, I gave a short presentation at the Penn Science Café in a session on "The past, present, and future of AI". I mentioned this in a comment on an xkcd cartoon in "Machine Learning", 5/17/2017, where I also reproduced my opening Science Café slide:

Over the weekend, Fernando Pereira posted a wonderful account of these three eras, with some thoughts about the nature of the underlying problems and possible directions for the future: "A (computational) linguistic farce in three acts", Earning My Turns 6/10/2017.


Sentiment analysis disappointment

June 12, 2017 @ 8:19 am · Filed by Mark Liberman under Computational linguistics

A Quinnipiac Poll released on May 10 asked respondents "What is the first word that comes to mind when you think of Donald Trump?" 46 words were used by 5 or more respondents. The full list, with the number of responses for each word, is here — the top 15 words were:

idiot         39
incompetent   31
liar          30
leader        25
unqualified   25
president     22
strong        21
businessman   18
ignorant      16
egotistical   15
asshole       13
stupid        13
arrogant      12
trying        12
bully         11

For other reasons, I've recently been gathering word-linked information about features like frequency, concreteness, positive vs. negative valence, etc. So I thought it would be interesting to look at the (obviously bimodal) distribution of positivity found in this list, and perhaps the distributions of some more subtle properties as well.
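A minimal sketch of that kind of look-up: weight each word's valence rating by its number of responses, then tally the responses into negative, neutral, and positive bands. The counts below are the top 15 from the poll list above; the valence ratings, the assumed 1 to 9 scale, and the band cutoffs are hypothetical placeholders, not values from any published norms, so the printed numbers say nothing about the real distribution.

```python
# Top-15 response counts from the Quinnipiac list quoted above.
COUNTS = {
    "idiot": 39, "incompetent": 31, "liar": 30, "leader": 25,
    "unqualified": 25, "president": 22, "strong": 21, "businessman": 18,
    "ignorant": 16, "egotistical": 15, "asshole": 13, "stupid": 13,
    "arrogant": 12, "trying": 12, "bully": 11,
}

def valence_summary(counts, lexicon, low=4.0, high=6.0):
    """Tally responses into valence bands and return the response-weighted mean.

    Words missing from the lexicon are skipped."""
    bands = {"negative": 0, "neutral": 0, "positive": 0}
    total = weighted = 0.0
    for word, n in counts.items():
        if word not in lexicon:
            continue
        v = lexicon[word]
        band = "negative" if v < low else "positive" if v > high else "neutral"
        bands[band] += n
        total += n
        weighted += n * v
    mean = weighted / total if total else float("nan")
    return bands, mean

# HYPOTHETICAL ratings on an assumed 1-9 scale, for illustration only.
toy_lexicon = {"idiot": 2.0, "leader": 7.0, "trying": 5.0}
bands, mean = valence_summary(COUNTS, toy_lexicon)
print(bands)           # {'negative': 39, 'neutral': 12, 'positive': 25}
print(round(mean, 2))  # 4.12
```

Counting responses rather than word types matters here: a bimodal shape shows up in how many people chose negative vs. positive words, not just in how many distinct words fall on each side.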

