Farhad Manjoo, "Apple Doesn’t Need To Make the TV of the Future: The revolution is already here—and it’s called the Xbox", Slate 3/27/2012.
If the rumors are true, Apple will release a television set later this year that it will tout as the most amazing boob tube ever invented.
The biggest selling point will be Apple’s promise to make navigating our viewing choices easier. Say you want to watch Tower Heist on a Saturday night. You’d first check Netflix, because if it’s there, it’ll be streamed free for members. If it’s not, and if you subscribe to Amazon’s Prime service, you ought to check there, because you might get a discount. If that fails, you’ll look for the movie on iTunes, Hulu Plus, or Comcast in whatever order is most convenient for you. The whole process is a frustrating mess, one that Apple will likely try to solve by building a cross-platform search engine into its TV. Instead of going to every service separately, you’ll just say, “Hey TV, I’d like to watch Tower Heist!” and the screen will show you where the flick is playing, and for how much. You’ll just have to choose one and press Play.
When CEO Tim Cook shows off Apple’s TV set this fall, I bet he’ll call voice-activated universal search a revolutionary way to interact with your television. What Cook probably won’t mention is that it already exists. Indeed, much of what Apple is likely to build into its TV is available today on a gadget whose interface is just as easy to use as anything Apple will cook up. The device is called the Xbox 360.
Over the last few months, Microsoft has turned its video-game console into your TV’s best friend.
Rich Jaroslovsky, "Apple TV offers hints at Jobs’s vision for our living rooms", Washington Post 3/29/2012:
All but lost amid the hoopla of its latest iPad, Apple also released a new version of Apple TV, the $99 streaming-video set-top box that the late Steve Jobs used to call “a hobby.” […]
My main beef with the interface, one that will have to be solved in any fully integrated Apple set, was the painful search process when using the included three-button remote control.
It required me to laboriously enter my search term by scrolling through a grid of letters, choosing one at a time, until what I was looking for showed up on an ever-changing list of possible matches.
The process is much easier if you have an iPhone, iPad or iPod touch and download Apple’s free Remote app, which lets you simply type your query on your device. But not every viewer of an Apple-branded set might have an Apple mobile device — or have it handy.
What if you could simply tell your TV what you wanted to watch, and it understood and fetched it? The technology already exists in Siri, the iPhone 4S virtual assistant. Implementing it in a future Apple-branded television would eliminate the current complications without the need for a full-scale keyboard, …
Wilson Rothman, "Xbox pre-emptively strikes at Apple iTV with Comcast, HBO, MLB", MSNBC:
Tuesday, Microsoft's pre-emptive strike against Apple surged with the promised addition of streamed Comcast, HBO and Major League Baseball content.
OK, so why am I so breathless over this? Because not only does the Xbox+Kinect media interface, out since last December, establish a technological precedent for usable voice and gesture TV control, but its search function sniffs through all of your high-value content — from Netflix to Comcast — and lists all options at once. I search for "30 Rock" and see every instance of where and when I can watch it, on any of my compatible services.
Natasha Singer, "The Human Voice, as Game Changer", NYT 3/31/2012:
VLAD SEJNOHA is talking to the TV again.
O.K., maybe you’ve done that, too. But here’s the weird thing: His TV is listening.
“Dragon TV,” Mr. Sejnoha says to the screen, “find movies with Meryl Streep.” Up pops a list of films like “Out of Africa” and “It’s Complicated.”
“Dragon TV, change to CNN,” he says. Presto — the channel flips to CNN.
Mr. Sejnoha is sitting in what looks like a living room but is, in fact, a sort of laboratory inside Nuance Communications, the leading force in voice technology, and the speech-recognition engine behind Siri, the virtual personal assistant on the Apple iPhone 4S.
Here, Mr. Sejnoha, the company’s chief technology officer, and other executives are plotting a voice-enabled future where human speech brings responses from not only smartphones and televisions, cars and computers, but also coffee makers, refrigerators, thermostats, alarm systems and other smart devices and appliances.
It is a wildly disruptive idea. But such systems are already beginning to change the way we interact with the world and, for better and worse, how we think about technology. Until now, after all, we’ve talked only to one another. What if we begin talking to all sorts of machines, too — and, like Siri, those machines respond as if they were human?
"Spansion Announces Partnership with Nuance to Accelerate Voice Recognition Innovation for the Embedded Market", Press Release 3/21/2012:
Spansion Inc. today announced a partnership with Nuance Communications Inc. to accelerate voice recognition innovation for embedded technologies. As leading innovators of semiconductor products and voice recognition respectively, Spansion and Nuance are working together on enhancing the responsiveness and quality of voice recognition for embedded solutions addressing the automotive, gaming and consumer electronics applications.
I could go on, but I think you get the point. Something is happening here.
The technological part of this has already happened. It's partly that gradual improvements in speech and language technology have passed a threshold — though this happened some time ago. It's partly that Moore's law has made processors and memory small enough and cheap enough that you can easily outfit a mobile phone with the power of a 2000-era high-end PC — and you can do the same with a coffee-maker or a refrigerator, if you can think of anything for them to compute. And it's partly that everything is networked, so that information and computation can be shared with back-end servers at will.
(Steerable microphone arrays and source-separation algorithms are part of the picture as well, but again, this is old technology, made increasingly accessible by Moore's law.)
The social part of all this mostly hasn't happened yet. And as Yogi Berra told us, it's tough to make predictions, especially about the future.
Talking to the TV seems like a no-brainer. It remains to be seen whether and when the public at large will start using this routinely, but an increasing proportion will have the chance to try. It should help that current entertainment-center control systems seem to have emerged from a tenuous cease-fire imposed on warring tribes of push-button salesmen.
Hands-free, eyes-free device control in automobiles seems like another no-brainer. People were developing potential products of this kind at Bell Labs when I worked there in the 1980s — the automobile environment offers plenty of electricity, plenty of space, and a reasonable tolerance for costs, so it ought to have been an early success for these technologies. What's different now? Improved speech technology, more powerful embedded systems, and especially ubiquitous wireless networking. However, it still remains to be seen whether the current efforts at Ford and GM and the rest will make talking to your car a routine function for most of us.
There's still some resistance to overcome, and it's not all cultural conservatism. Heesun Wee, "Searching for speech technology's holy grail", CNBC 3/30/2012:
Telephone your credit-card company, health insurer or just about any big consumer-facing company, then speak into the receiver: for new accounts, say "new"; or billing, say "billing." Forget it. You shout and stumble through the phone maze and often land at the directory's start.
Recognize this experience? Somehow, voice recognition — despite some of technology's most awesome achievements (tablets! the remote control!) — remains an anathema. We still can't talk to computers like Captain Picard on Star Trek: The Next Generation.
The same article quotes Bill Meisel (who is the editor of Speech Strategy News, and thus hardly disinterested) to the effect that "voice technology will revolutionize the tech ecosystem the way graphical user interface forever changed personal computing". I hope so — but Bill Gates was saying similar things 15 years ago. A cynic might adapt various wits' observations about fusion power and about Brazil, and suggest that voice technology will always be the interface of the future.
Let me make it clear that I think the relevant technologies are now good enough to support Star Trek Communicator-level interactions with home entertainment systems. When this possibility will become reality — or indeed whether Microsoft has already achieved it with the Xbox + Kinect system — is less clear to me.
The designers and developers and marketers working in this area have my best wishes for success. But what interests me most, these days, is the opportunity that the development of these technologies offers for approaching the sciences of speech and language in new ways. As I wrote in a 2010 obituary for Fred Jelinek in the journal Computational Linguistics,
Independent of their value in practical applications, the algorithms developed by the process that Fred Jelinek pioneered offer marvelous new tools for scientists. Applying these tools to the vast stores of digital speech and text now becoming available, we can observe linguistic patterns in space, time, and cultural context, on a scale many orders of magnitude greater than in the past, and simultaneously in much greater detail. Rather than evoking the impact of particle accelerators, as the ALPAC report did, it may be more appropriate to compare these tools to the invention of the microscope and telescope in the 17th century: Everywhere we look, there are interesting patterns previously unseen.