The days of typing search terms into a browser or clicking
through drop-down menus may be fading. Rising in their place is a much more
natural interface: voice.
Market research firm Gartner predicts consumer demand for
voice devices such as Amazon Echo and Google Home will generate $3.5 billion by
2021.
Statistics regarding consumer adoption of these products are
staggering.
In January, Google announced it sold “more than one Google
Home device every second since Google Home Mini started shipping in October.”
Google Assistant - which powers all of the Google Home products (original, Mini
and Max) and also works on Android phones and tablets, iPhones, TVs and watches
- is now available on more than 400 million devices.
But Amazon still dominates the smart-speaker market, with
third-party research estimating that Alexa products account for 76% of the total
user base. In December, Amazon said “tens of millions of Alexa-enabled devices
sold worldwide” during the holiday season, and the Echo Dot was the top-selling
product from any manufacturer in any category across all of Amazon.
As consumers become more comfortable conversing with a
device for information and shopping, brands are investing more resources -
human and financial - to develop voice-enabled solutions.
In part two of our series on voice, we take a look at the
characteristics of an effective voice interface.
Voice first
“You’ve got people like Google and Amazon and Apple spending
billions and billions of dollars pushing people towards voice assistants. With
those companies creating the market demand for people expecting to be able to
use these to make their lives better, then travel providers need to believe
that. There’s going to be a fast groundswell of consumer expectation.”
That’s a prediction from Charlie Cadbury, co-founder of
Dazzle, a conversational platform for the travel industry.
In the next month, Dazzle will launch a white-label solution
that brands can adopt to converse with their customers via voice-activated
speakers such as Amazon’s Echo products and Google Home, as well as through SMS
messaging and chatbots.
“Each one of these natural language channels is just another
way to get to the core assistant, which is a way for brands to understand a
particular question being asked and serve back the right response,” Cadbury
says.
This platform-agnostic architecture - which allows the conversation to start on
one device and continue on another - is one of the characteristics of an
effective natural language processing solution.
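As a rough illustration of that idea, the sketch below keeps one conversation per user inside a core assistant, so a question asked on a smart speaker can be picked up again over SMS. It is a minimal sketch under stated assumptions, not Dazzle's implementation: the names `CoreAssistant`, `Conversation` and `handle`, the channel labels and the placeholder replies are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Conversation:
    """Transcript for one user, shared by every channel."""
    turns: list = field(default_factory=list)  # (channel, speaker, text)


class CoreAssistant:
    """Hypothetical core assistant; each channel is a thin adapter around it."""

    def __init__(self):
        self._conversations = {}  # user_id -> Conversation

    def handle(self, user_id, channel, text):
        # The conversation is keyed by user, not by device or channel.
        convo = self._conversations.setdefault(user_id, Conversation())
        convo.turns.append((channel, "user", text))
        reply = self._answer(convo)
        convo.turns.append((channel, "assistant", reply))
        return reply

    def _answer(self, convo):
        # Placeholder logic (an assumption): a real system would call a
        # natural language processing service here.
        earlier = [t for (_, who, t) in convo.turns[:-1] if who == "user"]
        if earlier:
            return f"Continuing from your earlier question: {earlier[-1]!r}"
        return "How can I help with your trip?"

    def history(self, user_id):
        return list(self._conversations[user_id].turns)


assistant = CoreAssistant()
assistant.handle("traveler-1", "smart_speaker", "Is my train on time?")
print(assistant.handle("traveler-1", "sms", "What about tomorrow?"))
```

Because the transcript lives with the user rather than the device, adding a new channel in this sketch would only mean writing another adapter that calls `handle`.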
“What’s happening is called a voice-first revolution, but
it’s not a voice-only revolution,” says Sina Kahen, co-founder and chief
consultant at Vaice, a pro bono consulting team of 15 MBA students from Imperial College in
London who want to bridge the gap between brands and
voice technology developers.
“The initial interaction may be instigated via voice, but the
journey needs to include a multi-modality of experiences - your phone, your TV
screen, your car - everything’s connected, so the conversation can continue.
Voice won’t necessarily replace the interfaces we have but will supplement them.”
The “what”
Brands looking to adopt voice technology need to consider
two fundamental things – what information they will provide and how they will
provide it. To tackle the “what,” Cadbury and Kahen both advise companies to
start with their existing catalog of FAQs.

“The best practice of user-centered design principles is to
get very close to the customers and not look at the service as the solution but
look at the pain points,” Cadbury says.
“In terms of UK rail, it’s ‘Is my train on time this morning?’
That’s a real basic one, but if you get to the station and your train is going
to be half an hour late, that train company has taken half an hour away from
your life. Whereas if you get that information when you are still at home, you
get half an hour more at home.”
After building an initial directory of typical questions and
answers, brands can bolster their voice solution by addressing less common
issues and by adding information from new incoming queries – making the system “smarter” over time.
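One way to picture this feedback loop is a directory that counts the questions it could not answer, so the most frequent gaps can be filled first. The sketch below is only an illustration: the `FaqDirectory` class, its method names and the sample questions and answers are hypothetical, and exact string matching stands in for real language understanding.

```python
from collections import Counter


class FaqDirectory:
    """Hypothetical FAQ store that records which questions it could not answer."""

    def __init__(self, answers):
        self.answers = {self._key(q): a for q, a in answers.items()}
        self.unanswered = Counter()

    @staticmethod
    def _key(question):
        return question.strip().lower()

    def ask(self, question):
        key = self._key(question)
        if key in self.answers:
            return self.answers[key]
        self.unanswered[key] += 1  # remember the gap for later review
        return None

    def top_gaps(self, n=3):
        # Most frequently asked questions that still have no answer.
        return [q for q, _ in self.unanswered.most_common(n)]

    def add_answer(self, question, answer):
        key = self._key(question)
        self.answers[key] = answer
        self.unanswered.pop(key, None)


faq = FaqDirectory({"Is my train on time this morning?": "Sample answer text."})
faq.ask("Can I bring my bike?")
faq.ask("Can I bring my bike?")
print(faq.top_gaps())
```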
Brands may also want to enable what Cadbury calls a “human
in the loop.”
“With Dazzle, if it can’t answer, the question gets sent to
a human agent who can look back at the whole back and forth of the conversation
to that point in time and then jump in and respond with their knowledge,” he
says.
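The handoff Cadbury describes could be sketched roughly as follows: if no stored answer matches, the full transcript so far is queued for a human agent. This is a simplified sketch, not Dazzle's actual code; `FAQ`, `respond`, `agent_queue` and the answer strings are assumptions made for illustration.

```python
# Hypothetical FAQ; the answer text is placeholder data, not a real response.
FAQ = {
    "is my train on time this morning": "Your train is currently shown as on time.",
}


def normalize(text):
    """Lowercase, drop punctuation and collapse whitespace."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def respond(transcript, question, agent_queue):
    """Answer from the FAQ, or escalate with the whole conversation so far."""
    transcript.append(("user", question))
    answer = FAQ.get(normalize(question))
    if answer is None:
        # Human in the loop: the agent receives a copy of the full transcript.
        agent_queue.append(list(transcript))
        answer = "Let me pass you to a colleague who can help."
    transcript.append(("assistant", answer))
    return answer


transcript, agent_queue = [], []
respond(transcript, "Is my train on time this morning?", agent_queue)
respond(transcript, "Can I bring my bike on board?", agent_queue)
print(len(agent_queue))  # one conversation waiting for a human agent
```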
The “how”
The more complex aspect of voice interfaces is determining “how”
questions are answered.
“Our prediction is if brands really want to understand the
best way of utilizing voice they need to hire storytellers, they need to hire
philosophers, they need to hire people who understand the power of communication
and how to hold meaningful and deep conversations,” Kahen says.
That’s a very different hiring strategy from the one used to create communication platforms of the past.
“Historically when you are building websites and mobile
phone apps you are visually- or software-focused,” Cadbury says.
“Whereas this is all about how to make a bouncy, engaging
delightful conversation. So it’s about how we stop thinking like software
engineers and start thinking like linguists.”
The most effective voice interfaces understand a broad range
of utterances, which are the phrases that can be used to ask a question. Someone
wanting to know the time of their upcoming flight may ask, “What time does
my flight leave?” or “When does my flight leave?” or “At what time does my
flight depart?” – and ideally the system understands those are three ways to seek
the same information.
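In code, the simplest version of this is a table that maps several utterances to a single intent. The sketch below uses the three flight questions above; the intent name `flight_departure_time` and the helper names are hypothetical, and exact matching after normalization is a deliberate simplification of what production voice platforms do.

```python
import re

# Hypothetical intent table: several utterances resolve to one intent.
INTENT_UTTERANCES = {
    "flight_departure_time": [
        "What time does my flight leave?",
        "When does my flight leave?",
        "At what time does my flight depart?",
    ],
}


def normalize(utterance):
    """Lowercase the utterance and strip punctuation and extra spaces."""
    return " ".join(re.findall(r"[a-z0-9']+", utterance.lower()))


# Reverse lookup from normalized utterance to intent name.
LOOKUP = {
    normalize(utterance): intent
    for intent, utterances in INTENT_UTTERANCES.items()
    for utterance in utterances
}


def match_intent(utterance):
    """Return the intent for a known utterance, or None if unrecognized."""
    return LOOKUP.get(normalize(utterance))


for phrase in INTENT_UTTERANCES["flight_departure_time"]:
    print(phrase, "->", match_intent(phrase))
```

A lookup like this only recognizes phrasings it has already seen, which is one reason the list of utterances for each intent tends to grow as new queries come in.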
Kahen recommends brands design their voice
interfaces based on common principles of first-date communications: “No awkward
silences, no forced conversations, no lack of personality, responsiveness.”
“Voice is something innate within us to
communicate. We want to make sure brands are able to mimic that human
development as opposed to creating a robotic interaction where it’s unnatural,”
he says.
Relationship building
Cadbury says Dazzle sees brand adoption of voice as a three-part
process. It starts with getting consumers to trust the interface, and that trust
builds each time they ask a question and easily get the right answer.
As trust is established, consumers will use the voice
assistant more often, which generates valuable data that brands can use to
personalize the conversation.
And the final phase comes when the communication channel can be
used to drive revenue.
“The more that you feel like you are being spoken to as an
individual, rather than just a customer of that airline, for example, the
happier you will be and the more you will use the service because you feel it
is a really trusted advisor. And the minute you have that trusted relationship
which is personalized to you, then there is the opportunity for the travel
operator to sell you ancillary services that are really going to deliver a
benefit to you as a traveler,” he says.
As brands look to build that trusted relationship, they
should also consider the persona that will represent their interactions with
consumers.
“Voice will really dictate how brands are perceived by
consumers like never before,” Kahen says. “Most brands haven’t thought about
what does our voice sound like? Are they male or female? Are they deep-voiced
or high-pitched? Young or a bit older?”