News program presenter meets robot avatar


Yesterday BBC Radio 4's "Today" program, the cultural counterpart of NPR's "Morning Edition", invited into the studio a robot from the University of Sheffield, the Mishalbot, which had been trained to conduct interviews by exposure to the on-air speech of co-presenter Mishal Husain. They let it talk for three minutes with the real Mishal. (Video clip here, at least for UK readers; it may not be available in the US.) Once again I was appalled at the credulity of journalists when confronted with AI. Despite all the evidence that the robot was just parroting Mishalesque phrases, Ms Husain continued with the absurd charade, politely pretending that her robotic alter ego was really conversing. Afterward there was half-serious on-air discussion of the possibility that some day the jobs of the Today program's presenters and interviewers might be taken over by robots.

The main thing differentiating the Sheffield robot from Joseph Weizenbaum's ELIZA program of 1966 (apart from a babyish plastic face and movable fingers and eyes, which didn't work well on radio) was that the Mishalbot is voice-driven (with ELIZA you had to type on a terminal). So the main technological development has been in speech recognition engineering. On interaction, the Mishalbot seemed to me to be at sub-ELIZA level. "What do you mean? Can you give an example?" it said repeatedly, at various inappropriate points.
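For readers who have never seen what "script-based" means in practice, here is a minimal sketch in Python of the kind of keyword-matching loop that drives an ELIZA-style system. The rules and canned replies are illustrative inventions of mine, not Weizenbaum's actual 1966 script and certainly not the Sheffield team's code. Note the fallback list: when no pattern matches, out comes a stock phrase like the Mishalbot's "What do you mean? Can you give an example?", regardless of what the input meant.

    import random
    import re

    # A minimal ELIZA-style script (illustrative rules only). Each rule
    # pairs a regular expression over the input with a list of canned
    # reply templates. Nothing here models meaning; replies are picked
    # on surface pattern alone.
    RULES = [
        (re.compile(r"\bmy (\w+)", re.I),
         ["Tell me more about your {0}.",
          "Why do you mention your {0}?"]),
        (re.compile(r"\bi am (.+)", re.I),
         ["How long have you been {0}?",
          "Do you enjoy being {0}?"]),
    ]

    # Fallbacks fire whenever no pattern matches: the source of the
    # Mishalbot's endlessly repeated stock question.
    FALLBACKS = ["What do you mean? Can you give an example?",
                 "Please go on."]

    def reply(utterance):
        for pattern, templates in RULES:
            match = pattern.search(utterance)
            if match:
                return random.choice(templates).format(*match.groups())
        return random.choice(FALLBACKS)

    print(reply("I am worried about my job."))    # matches a rule
    print(reply("Quantify the fiscal outlook."))  # no match, fallback

Hook a speech recognizer up to the front of a loop like that and you have, in essence, the advance the Mishalbot represents over a 1966 teletype session.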

Today's Dilbert strip also features a voice-activated interactive device that is supposed to converse like a person. Dilbert has invented it, and reckons it's a huge breakthrough in the simulation of truly humanlike interaction. He gets his boss to test it by letting him try conversing with it on any arbitrary topic. (It may be relevant here that Scott Adams is a notorious cynic, and also that the profoundly unpopular cartoon boss has a wife whom we never meet.)

Boss: What do you want for dinner?
Device: I don't care. What do you want?
Boss: I was thinking maybe Chinese food.
Device: I'm not in the mood for that.
Boss: (angrily) Then why did you say you don't care?
Device: Now I'm not even hungry.
Boss: Why? What's wrong?
Device: Nothing is wrong.
Boss: (aside, to Dilbert) You nailed it.

Again we are talking about a back-and-forth of clichés. But this exchange is considerably more complex: notice that the boss's third utterance is a question based on the meaning of (and probing the motivation for) the device's first utterance, three conversational turns back. With a bot that could genuinely perform like this (annoyingly quarrelsome though it might be), I might be more impressed.
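To see what such a bot would have to keep track of, here is a toy sketch (invented data structures of my own, not a description of any real system) in which the hearer records what the other party has committed to at each turn and checks later utterances against those commitments. Even this crude version needs a memory of the whole conversation and some model of which commitments are incompatible, which is precisely what a pattern-matching script lacks.

    from dataclasses import dataclass, field

    @dataclass
    class DialogueState:
        # (turn number, commitment) pairs made by the other party.
        commitments: list = field(default_factory=list)

        def record(self, turn, commitment):
            self.commitments.append((turn, commitment))

        def find_conflict(self, claim):
            # Crude stand-in for real inference: a single hand-coded
            # incompatibility between commitment types.
            incompatible = {"no preference about dinner":
                            "rejects a dinner option"}
            for past_turn, past in self.commitments:
                if incompatible.get(past) == claim:
                    return past_turn, past
            return None

    state = DialogueState()
    # Turn 2: "I don't care. What do you want?"
    state.record(turn=2, commitment="no preference about dinner")
    # Turn 4: "I'm not in the mood for that."
    conflict = state.find_conflict("rejects a dinner option")
    if conflict:
        past_turn, past = conflict
        print(f"Turn 4 contradicts turn {past_turn} ({past!r}), "
              "licensing 'Then why did you say you don't care?'")

The hand-coded incompatibility table is of course a cheat; filling it in for arbitrary topics is the part nobody knows how to do.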

I often get the feeling that journalists think we might be on the threshold of a future where robots (and other computing machines) can actually engage with us conversationally. It's the only explanation for the credulous reporting of things like the Facebook chatbot secret language story ("Researchers who had been training bots to negotiate with one another realized that the bots, left to their own devices, started communicating in a non-human language," etc.).

I was asked on a BBC radio program a couple of days ago about the prospects for seriously usable voice-to-voice instant translation software that would obviate any need to learn foreign languages, and the presenter was surprised when I said "Don't hold your breath" and put it at least 20 years in the future. But the simultaneous interpretation task is far easier than the task of getting a machine to interact on the basis of its own knowledge and opinions, because of the determinate input-output pairing: there's a given source-language utterance, and we broadly know what sort of target-language utterance it should be converted into.

It's New Year's Eve, and everyone in the media is asking experts for their predictions for the coming year. One safe bet for a linguistic scientist would be that 2018 will end without us seeing any intelligent language use by a robot — other than the kind of easily broken ELIZA-style script-based systems that hand out random clichés in response to input utterances, based on not even the foggiest idea about what the inputs might mean.

Though I do see the problem with deciding, a year from now, whether the prediction held up: you need a certain amount of education in linguistics and computer science in order to grasp the distinction I just drew. After all, people with no understanding of the technicalities were famously taken in by ELIZA more than fifty years ago.


