- Bill Stasior, who previously led Siri’s development at Apple, says today’s virtual assistants don’t deliver on the promise of being able to understand speech as naturally as other people can.
- This is one of the biggest areas virtual assistants are likely to improve in the next three to five years.
- It’s an industry-wide challenge that other companies working on the technology have acknowledged in the past.
If you’ve been paying any attention to the announcements Apple and Google made during their recent developers conferences, you probably already know that big improvements are coming to Siri and the Google Assistant.
During its Worldwide Developers Conference earlier this month, Apple announced that Siri will be getting a more natural voice that’s generated entirely by software when its iOS 13 update for iPhones launches later this year. Google said in May that its digital assistant will be able to understand and process requests up to 10 times faster, and Amazon recently launched a new version of its Alexa-powered Echo Show.
Although digital assistants like Apple’s Siri, the Google Assistant, and Amazon’s Alexa have advanced quickly in recent years, they still have a long way to go, says Bill Stasior, who recently left Apple to join Avellino Labs, a genetic diagnostics and therapy development firm, as a member of its executive advisory committee. Stasior joined Apple in 2012 and led the team responsible for Siri’s development until he left the company in May.
Today’s virtual helpers can do everything from starting up the coffee machine after your morning alarm goes off to making reservations at your favorite restaurant. But the biggest way they’re likely to improve over the next three to five years is simple: they’ll get better at understanding the way we speak.
“In my opinion, none of the virtual assistants really deliver on the promise of eventually being able to understand people as naturally as other people can understand them,” Stasior said in a recent interview with Business Insider. “I think everyone learns what commands work with the assistants and what commands don’t work with the assistants. And while that’s improving very rapidly right now, I think there’s still a long way to go.”
Part of the reason voice-enabled helpers struggle to understand natural human speech is that most machines are designed to handle specific tasks, says Stasior. They don’t have a general understanding of the world the way humans do.
“When you want to talk to an assistant, you’re opening the door for almost any task or any question,” he said. “There’s just an incredibly broad variety of language and ways of expressing ourselves. And having that general capability, we’re still a ways away from it.”
The industry is well aware that the AI-based features in today’s products lack general intelligence, and companies such as Google and Facebook have acknowledged as much. Yann LeCun, Facebook’s chief AI scientist, recently discussed how the industry is working to advance self-supervised learning, a technique that could help machines draw broader observations about the world from data rather than being trained for specific tasks.
Google, for its part, has conceded that the Google Assistant’s ability to truly understand users is still fairly limited. “The holy grail to me is that we can really understand human language to a point where almost anything I can say will be understood,” Ryan Germick, a principal designer at Google who leads the development of the Assistant’s personality, told TIME in 2017. “Even if there’s an emotional subtext or some sort of idiom.”
Amazon is said to be working on just that. The online retail giant is developing technology that could recognize a user’s emotional state from the sound of their voice, Bloomberg reported last month. That could prove difficult, considering that even humans sometimes struggle to draw certain conclusions from the sound of a person’s voice.
As an example, Stasior mentioned how people sometimes mistake one family member for another when talking over the phone. “I think that’s interesting because we sometimes expect machines to be able to tell one person from the other just from the sound of their voice,” he said. “It’s possible that they might be able to do that better than people. But for a starter, we know that [since] it’s hard enough to do as people, that it’s a challenging problem.”
Digital assistants are already getting better at this, too. Both Amazon and Google rolled out the ability for their assistants to identify specific people by the sound of their voice in 2017. Siri will also be capable of learning the voices of specific users on the HomePod when iOS 13 launches later this year.
Although identifying a user’s emotional state might be challenging for a digital assistant to learn, that doesn’t mean it’s impossible. After all, as humans, we’re capable of knowing whether a person is excited or upset by the sound of his or her voice.
“My presumption is that [this] means the data is there, that there [are] some patterns that humans are able to understand about people’s emotional state just from listening,” Stasior said. “So my presumption is that machines will increasingly be able to do that too. Just like you can understand my words and 40 years ago machines could barely do that.”