The 6 Principles For Designing Voice Interfaces


Digital Arts


Learn how to design voice interfaces for Amazon's Alexa, Google Home or Apple's Siri that feel like talking to a human.

In the future, using your voice to interact with a machine won’t be remarkable.

Just as the mouse, graphic interface, and touchscreen were incorporated into our daily lives, voice will simply become normal. Of course, we’re not there yet. Right now, the biggest players in tech are investing in weaning us off our screens and toward these new ways of interacting, but the standards for seamless and intuitive user experiences haven’t been set. The opportunity this presents is too big to ignore, even given every challenge that comes with an exciting emerging technology.

Consider the complexity of human speech. It’s something we take for granted thousands of times a day, but even our simplest requests are packed with meaning and instructions, and their interpretation can hinge on small details and shifts in context.

Take ordering a taxi for instance, a request that, in an analogue world, can be made in myriad ways – waving, standing under a taxi sign, simply shouting "Taxi!". But for Amazon’s Echo, only particular iterations do the trick. "Alexa, ask Uber for a ride" is no problem, but "Alexa, order me an Uber" doesn’t hit the mark.

These small variations in phonetics and structure can hinge on language proficiency, regional dialect, adherence to idioms, social cues and more besides. It’s a breadth of alternatives that proves tricky for fellow humans, let alone machines.
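The gap between phrasings that work and phrasings that don’t often comes down to which utterances a service has registered. Real Alexa skills declare sample utterances in an interaction-model file; the Python sketch below uses a made-up utterance list purely to illustrate why an unlisted phrasing fails to match:

```python
import re

# Hypothetical sample utterances a ride-hailing skill might register.
# Real skills declare these in an interaction-model JSON; this sketch
# only illustrates why unlisted phrasings fall through.
SAMPLE_UTTERANCES = {
    "ask uber for a ride",
    "ask uber to get me a car",
}

def matches_intent(spoken: str) -> bool:
    """Return True only if the normalised utterance was registered."""
    normalised = re.sub(r"[^a-z ]", "", spoken.lower()).strip()
    return normalised in SAMPLE_UTTERANCES

print(matches_intent("Ask Uber for a ride"))  # a registered phrasing
print(matches_intent("Order me an Uber"))     # unregistered, so it fails
```

In practice, the fix is either to register more phrasings or to let the language model generalise over them; either way, the designer has to anticipate the variation.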

All signs point to significant improvements in language-processing accuracy in the years ahead, but technical proficiency never guarantees great products. Getting the user experience right is a question of design.

The search for that design is making this a dynamic time for voice interaction. Apple’s Siri has been making its way slowly into users’ lives, and the new HomePod (below) is accelerating the prominence of the voice assistant. Amazon is introducing features into its Echo developer kits that enable developers to build more human-like qualities into Alexa.

However, more than any one feature or product, we believe that great interfaces are built on principles. We believe it so much that we built a guide for the designers of today with six key principles that will underpin the great voice interfaces of the future, even as they grow and develop.

1. Craft a conversational interface

Our expectations of voice are deeply ingrained, so much so that we often don’t even recognise them explicitly. Chief among them is that a voice interaction is not a one-sided process. We need a conversation.

Some conversations are meandering, some are quick and to-the-point, some start and stop a few times. Different conversations serve different purposes, so the first question to ask yourself is: where do I want this conversation to go?

Your answer shapes the way you design the subsequent user journey, ensuring that all actions and responses are relevant to the user’s need and cut out unnecessary steps or confusion along the way.
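One way to make that answer concrete is to sketch the conversation as a small state machine, where each state knows its prompt and where the conversation goes next. The states and coffee-ordering prompts below are illustrative assumptions, not any assistant’s API:

```python
# A minimal sketch of a goal-directed conversation flow as a state machine.
# Each state carries the prompt spoken to the user and the state that follows.
FLOW = {
    "start":   {"prompt": "What would you like to drink?", "next": "size"},
    "size":    {"prompt": "What size?",                    "next": "confirm"},
    "confirm": {"prompt": "Shall I place the order?",      "next": "done"},
}

def run_turn(state: str, flow: dict = FLOW) -> tuple:
    """Return the prompt for this state and the state that follows it."""
    step = flow[state]
    return step["prompt"], step["next"]

prompt, next_state = run_turn("start")
```

Laying the flow out like this makes it easy to spot turns that don’t move the user toward their goal, and to cut them.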

2. Use the path of least resistance

We often rely on users to look for the easiest way to complete a task, but context and situation can radically change the definition of ‘easy’. Put simply, voice isn’t a one-size-fits-all interface. 

For a user who’s occupied with driving, or needs to take a single action like pausing music, voice is certainly safer and more efficient than multiple taps and swipes. However, reading and replying to a day’s worth of emails could take that same user much longer with a voice assistant than a traditional screen and cursor.

Knowing exactly when and how to introduce an element of voice interaction so that it feels ‘easy’ requires more than just a knowledge of your users. It also requires a sense of empathy, the ability to put yourself in the user’s shoes and truly consider their context, their competing tasks, their desired outcomes, and what they want to do next.


The Amazon Echo Dot, providing access to the Alexa voice assistant.

3. Follow familiar sequences and structure

We have high expectations for things that communicate in the same way we do. Voice assistants exist in the realm of communication we are most familiar with, but they’re still limited: they don’t have the flexibility we’re used to our voices providing.

The same is true when using visual interfaces, but those limitations are built-in and self-evident. When looking at an on-screen menu, users can easily locate themselves in the flow of interaction and see their options for progression at a glance.

Voice interfaces don’t have that luxury. A lack of sign-posting or direction can lead to a tiring user experience full of guesswork and circular interactions as exasperated users work to narrow down their options.

This makes establishing a familiar flow for similar interactions crucial, but that can’t happen in isolation: reliable and predictable user experiences are built by designers using reliable and predictable methods. Using sequence diagrams to represent patterns of interactions, designers embed the right process into the user’s experience, making sure those users are always heading towards their desired goal. 

4. Consider context and security

Unless you have regular access to a Jumbotron, using your voice is a more public action than using most visual interfaces. Designers choosing when and how spoken information is requested need to stay aware that this, along with ‘voice traffic’ and other factors, can interfere with spoken instructions and derail interactions.

In practice, this means designing interactions that conspicuously break ‘seamlessness’ in the name of security or propriety, and allow multiple channels of interaction to work in concert. In banking, for instance, a spoken interaction can pause to use biometric verification instead of asking a user to speak their password aloud. 
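A hypothetical routing rule for that kind of hand-off might look like the sketch below; the channel names and the list of ‘sensitive’ fields are assumptions made for illustration, not any bank’s implementation:

```python
# Sketch: pause a voice flow and hand off to a non-spoken channel when the
# next piece of input is sensitive. Channel names and the sensitive-field
# list are illustrative assumptions.
SENSITIVE_FIELDS = {"password", "pin", "account_number"}

def choose_channel(field: str) -> str:
    """Route sensitive fields away from speech to a private channel."""
    if field in SENSITIVE_FIELDS:
        return "biometric_prompt"  # e.g. fingerprint or face check on a phone
    return "voice"

print(choose_channel("pin"))     # sensitive, so the voice flow pauses
print(choose_channel("amount"))  # safe to speak aloud
```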

When users are on a packed bus or in the office, a voice interaction is going to be less preferable than when they’re at home or in their car. As long as the distinctions are clear to the user, the shifting modes of an app aren’t seen as a temperamental bug; instead, they read as a human-centric feature.
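That shift can be sketched as a simple rule that picks an output mode from the user’s context. The contexts and mode names below are invented for illustration; a real app would infer context from richer signals:

```python
# Sketch: pick an output mode from context. The location categories and
# mode names are illustrative assumptions, not a real API.
PUBLIC_PLACES = {"bus", "office"}

def response_mode(location: str, has_screen: bool) -> str:
    """Prefer quiet output modes in public, spoken replies in private."""
    if location in PUBLIC_PLACES:
        return "on_screen" if has_screen else "earpiece_only"
    return "spoken"

print(response_mode("bus", has_screen=True))    # quiet mode in public
print(response_mode("home", has_screen=False))  # speak freely at home
```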


Google's voice assistant lives inside its Google Home speaker, and in every modern Android phone.

5. Build empathy with the machine

Social animals that we are, we instinctively navigate and negotiate relationships with speech. We want more from a conversation than a transactional exchange of information, so voice interfaces that only ever stick to the facts will feel lacking, even if users can’t quite describe why. Distinct personality traits are as much about establishing a social dynamic between human and machine (or user and service) as about subtly expressing a brand.

Building empathy is a fundamental element of creating an effective voice interface – not only for how easily information is shared, but for how easily users acclimatise to a voice interface at all. The two are connected: using emotive expressions to recognise and address the user helps build trust in both the interface and the service, and thereby lays the foundation for usability.

That being said, context is never absent. The specific service you’re designing for, and the moments it creates, inform the intonation, phrasing and sentence structure of the interface. Sarcasm is welcome when you’re bantering with a home assistant, but a wisecracking interface in the classroom might not be appreciated quite as much.

6. Access for All

The optimal outcome for any interface is ease of use and value gained from each interaction. This means creating fair, democratic, and accessible tools for all users – voice is no different. 

Designers have to carefully consider potential barriers to entry for users: an accent, a speech impediment, a bilingual family or office – all natural and common circumstances that could shut down a voice interface if it’s not inclusively designed. Careful research with frequent testing will uncover many of these barriers, but maintaining diverse design and build teams means there will be fewer blind spots from the beginning.

An empathetic interface will only be built by empathetic designers. Beyond facilitating the exchange of information, empathy helps to set expectations, allows the user to forgive mistakes, and builds a deeper relationship between human and machine.

In conclusion

Voice interaction certainly poses more of a challenge for designers than purely visual systems, but more and more aspects of our lives are incorporating it, and the shift is inevitable. The technical process will become easier as back-end sophistication increases – but the importance of staying human-centred will never go away.

If you’re willing to embrace these principles, you can pave the way for design to be at the heart of the next phase of human-machine interaction.
