Leaked documents from tech giant Amazon reveal limited engagement with its Alexa voice assistant on smart speakers, according to a report from Bloomberg this week. This low engagement reflects the difficulty of using today’s voice interfaces and a lack of investment by businesses in new and useful applications — but it doesn’t mean voice UX is dead just yet.
Since their launch in 2014, Amazon’s Echo smart speaker devices have been a runaway success: Today, a quarter of US households own at least one. Global smart speaker shipments grew from 6.5 million units in 2016 to 166.2 million in 2020, according to figures from market researcher Kagan, and Amazon commands a 22% share.
But this growth may be dwindling. According to a report by Bloomberg, internal documents leaked from Amazon assert that the smart speaker market has “passed its growth phase,” and that the company is predicting growth of just 1.2% in the coming years. (An Amazon spokesperson told Bloomberg that “the assertion that Alexa growth is slowing is not accurate”.)
The documents also reveal limited engagement with Alexa, the voice assistant used to interact with Echo devices, Bloomberg reports. Most device owners only use three voice-controlled functions: playing music, setting a timer, and turning on lights. One document reveals that most users discover half the voice features they will ever use within three hours of activating their device. And customers that own devices with screens are more likely to use them at least once a week.
This could be a stumbling block for Amazon, which has positioned voice as central to its future user experience. “When you experience great voice apps, it makes tapping on an app so circa 2005,” Amazon CEO Andy Jassy told CNet in an interview in September. The limited engagement also raises doubts about the significance of voice as a channel through which to reach customers.
Why aren’t more customers talking to Alexa?
Reports of limited engagement with Alexa come as “no surprise whatsoever” to Ben Sauer, an independent design consultant and former head of conversation design at Babylon Health. “The problems have been well understood in industry for years.”
“The core problem,” Sauer says, “relates to our evolution as a species.” While our interaction with screen-based interfaces has evolved over many decades, our expectations for voice interfaces are set by conversations with human beings. “Voice interfaces tend to disappoint us very quickly,” he says. “When someone first starts using a smart speaker, they realise quickly that the technology isn’t even close to matching a human conversation, so their use becomes rather conservative.”
Unlike screens, Sauer adds, voice interfaces do not display what functions are possible. “You have to remember what it can and cannot do,” he explains. “Until the technology is much more capable, intelligent, and flexible, most of us will stick to the basics (music, cooking timers, etc.) because we’re not capable of remembering its ability.”
These shortcomings are exacerbated by the limited functionality of conversational AI, adds Carolina Milanesi, principal analyst at consumer technology consulting firm Creative Strategies. “Conversational AI is still difficult, meaning that we are still having to make an effort to learn how to speak to these assistants,” she says.
Voice assistants have also run up against the difficulty of distinguishing multiple voices in a domestic setting, as well as privacy concerns among users, Milanesi explains. “The reality of this is that even with voice tagging and user identification, handling a family dynamic is much harder than targeting an individual, especially when privacy concerns lead people not to associate their voice to their identity.”
Voice UX as a customer channel
Nevertheless, some companies have developed apps for Amazon’s Echo devices (known as Skills) and for Google’s Nest product line. Mostly, these have been content publishers whose products are suitable for audio, says John Campbell, founder of voice experience agency Rabbit & Pork. These include audiobooks, especially cookbooks, and meditation apps.
There have been some applications beyond publishing, Campbell says. Rabbit & Pork has worked with insurance provider LV, for example, allowing customers to ask questions about their insurance policies. Other potential use cases include branding, customer service and e-commerce.
Mostly, however, businesses have yet to enable even basic functions. “There’s nothing at the moment in the UK where I could go ‘Alexa, what’s my bank balance?’ or ‘How much did I spend last week?’,” Campbell explains. One reason for this is that such apps would require the requisite data to be available via an API. But, Campbell says, “UK companies haven’t done those integrations.”
The quality of voice apps has also suffered from a lack of investment, Milanesi says. “Judging from the Skills you find on Echo devices, it does not seem there was a huge investment, to be honest,” she says. “Yes, there are a ton of Skills but the quality of many is questionable, in my opinion.”
Ultimately, says Sauer, there hasn’t been a business need for most organisations to engage customers through smart speakers. “Brands have been waiting to see if this channel starts to pay off as a way to connect with customers, and for many, it hasn’t, except in specific cases, like automating customer service,” he says. This week’s news from Amazon is unlikely to change this, he adds.
The future of voice UX
The fact that many Alexa owners aren’t chatting to their devices does not spell the end of voice as a channel for reaching customers, however.
Smart speakers are often described as ‘training wheels’ for voice UX, says Campbell, helping users get comfortable with talking to a machine. Now, voice interfaces are being built into other devices, most notably cars and TVs, he explains. Amazon, Google and Apple are all courting carmakers, hoping they will incorporate their respective voice assistants into their vehicles. Amazon’s new TVs, meanwhile, incorporate Alexa.
Milanesi believes that experiences that combine voice and screen are more likely to engage users. “Voice and visual is the way to go,” she says. “The combination of using voice to make a request and having a screen help with the content delivery offers many more opportunities for brands to create a richer experience.”
Amazon is also touting Alexa as a tool for use in business settings. Its Alexa for Business solution, which has yet to be launched in the UK, proposes that employees use smart speaker devices to book meetings, check stock levels, and perform other business functions. Milanesi is sceptical of the potential of voice in a work setting, however, “because of the many identities an assistant would have to deal with.”
Sauer concludes that voice is likely to expand beyond smart speakers. “There’s plenty of evidence that the prevalence and slowly increasing reliability of voice interfaces is making it more acceptable for use in some new settings,” he says.
But cultural factors may limit its spread, Sauer adds. “Voice, as a channel, remains more constrained than screens in social situations,” he explains. “Whilst it’s okay now to ask Alexa to play music in front of your family, most people (in the West perhaps) still aren’t comfortable messaging their friends around other people using voice. So some domains, like the office, may only see little or no progress on this front.”
“I would not disregard this channel,” concludes Milanesi. “Just be cognisant it will take time.”
Pete Swabey is editor-in-chief of Tech Monitor.