A ‘how-to guide’ from a user experience designer's point of view
How does a user move through a Voice UI (VUI)? What do I need to consider to make it as frictionless as possible? What should I be aware of before even starting? And what exactly does it actually mean to design for voice?
The use of voice UIs is on the rise. The number of voice assistant users, especially on smartphones and smart speakers, is growing across all age groups.
Most of us have seen a video of someone toying around with Alexa or Google Home and failing miserably in their mission. Probably everyone has at least once tried a voice search, or given a command to call a friend only to have the phone pick a seemingly random contact from the phone book. ‘Why?’ and ‘No!’ are consequently very common reactions to those results. Voice UIs are supposed to make us more efficient, but they still have plenty of limitations, and as designers we should help communicate them.
We are used to rich, high-fidelity communication with other humans and expect a Voice UI to interact with us in a similar way. Natural language carries a lot of context we assume the listener already shares, on top of evolving vocabulary, changing accents and so on. When the voice interface doesn’t understand, the user gets extremely frustrated. But how do we teach all of this to a Voice UI?
Let’s take a look at how all of this affects the work of a user experience designer: where to start and what to consider. In general, I’d say designing for voice isn’t that different from planning any digital service. You benchmark, ideate, conduct user interviews, collect user insights, prepare user journeys and so on. But how do you make sure the user gets through the service without friction and has a pleasant enough experience to come back? With voice, we are designing from scratch and still figuring out best practices.
I obviously don’t have it figured out yet either, but here are a few steps that might help with designing your Voice UI.
Create conversation flows
Start your process by writing down different flows: how a conversation could go and all the other directions it could take. Imagine setting an alarm for the next morning. On a phone or watch, you open an app, enter the time, select the day and save it. With a voice command, however, there are no such constraints on the words we use and the ways we express ourselves.
“Set an alarm for tomorrow, 7 in the morning.”
“I want to wake up tomorrow, Monday, at 7am.”
“Set alarm. Tomorrow. 7am.”
“Wake me up at seven tomorrow morning.”
Those are just a few examples of how different people might set an alarm, and the options are endless. Prepare for different scenarios.
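Downstream, all of those phrasings need to collapse into one and the same intent. As a rough illustration (a minimal sketch in Python; the function, the regex and the slot names are hypothetical, not any assistant platform's real API), a parser might normalise them like this:

```python
import re

# Minimal sketch: map varied "set alarm" phrasings to one intent with a
# time slot. Real assistants use trained NLU models, not a single regex.
WORD_TO_DIGIT = {"seven": "7"}

def parse_alarm_utterance(utterance):
    text = utterance.lower()
    for word, digit in WORD_TO_DIGIT.items():
        text = text.replace(word, digit)
    # Look for an hour, optionally followed by minutes and am/pm.
    match = re.search(r"\b(\d{1,2})(?::(\d{2}))?\s*(am|pm)?\b", text)
    if not match:
        return None  # hand off to an error-recovery prompt
    hour = int(match.group(1))
    minute = int(match.group(2) or 0)
    if match.group(3) == "pm" and hour < 12:
        hour += 12
    return {"intent": "set_alarm", "hour": hour, "minute": minute}

# Four very different phrasings resolve to the same intent and time:
for u in ["Set an alarm for tomorrow, 7 in the morning.",
          "I want to wake up tomorrow, Monday, at 7am.",
          "Set alarm. Tomorrow. 7am.",
          "Wake me up at seven tomorrow morning."]:
    print(parse_alarm_utterance(u))
```

The point of the sketch is the shape of the problem, not the solution: every new phrasing you collect in your flows becomes either a pattern the parser handles or a gap it falls through.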
Act out the conversations and create prototypes
With the flows created, draw up prototypes to convey the message. They don’t need to be anything fancy; a minimal collection of stills that tells the story is enough. Then act out the conversations you wrote down earlier with other people. During this step you’ll spot the flaws your conversation flow still has, and the places where it already feels natural and clear.
Use natural language
Design natural language: use pauses between questions and answers, and time them accurately. Help the user return to the flow when errors occur, and never let them end up in a dead end. Use varied ways to recover rather than repeating ‘I don’t understand’. Create language that is helpful, and make functions discoverable just as in a graphical interface. This is still a real challenge, because the underlying technology is far more complicated than, say, exploring the options with a mouse or keyboard. Also remember that languages evolve, along with their accents and slang, and we still need to find ways to teach that to our technology.
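To make ‘use different ways to recover’ concrete, here is a minimal sketch (the prompt texts are made up for illustration) of escalating re-prompts: first re-ask, then offer an example, then hand control back to the user instead of looping forever.

```python
# Minimal sketch of escalating recovery prompts (hypothetical copy):
# never repeat the same "I don't understand" and never trap the user.
REPROMPTS = [
    "Sorry, what time should the alarm ring?",
    "You can say something like 'seven thirty in the morning'.",
    "I'm still not catching that. Opening the alarm screen so you can set it by hand.",
]

def next_reprompt(failed_attempts):
    """Return a recovery prompt for the given number of failed attempts."""
    # Clamp to the last prompt, which is the graceful exit from the flow.
    index = min(failed_attempts, len(REPROMPTS)) - 1
    return REPROMPTS[max(index, 0)]
```

The exact wording belongs to the designer; the structure, re-ask, teach, escape hatch, is the pattern worth keeping.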
Train custom vocabulary and patterns
And last but not least, if the use case requires it, consider training the underlying speech-to-text model with custom vocabulary and common speech patterns. This may not be a designer’s job directly, but it is good to be aware of and prepare for. As a team, test out different speech-to-text models and select the one that achieves the highest accuracy.
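A common way to compare speech-to-text candidates is word error rate (WER): the word-level edit distance between a model’s transcript and a reference transcript, divided by the length of the reference. A minimal sketch (the model names and transcripts below are invented for illustration):

```python
# Minimal sketch: rank speech-to-text candidates by word error rate (WER)
# on a shared test sentence. Lower WER means higher accuracy.
def word_error_rate(reference, hypothesis):
    """Levenshtein distance over words, divided by reference length."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # dp[i][j] = edits to turn the first i ref words into the first j hyp words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[-1][-1] / len(ref)

reference = "set an alarm for seven tomorrow morning"
transcripts = {  # hypothetical outputs from two candidate models
    "model_a": "set an alarm for seven tomorrow morning",
    "model_b": "set a alarm for eleven tomorrow",
}
for name, hyp in transcripts.items():
    print(name, word_error_rate(reference, hyp))
```

In practice you would run each candidate over a test set of real recordings from your target users, since accents and domain vocabulary are exactly where the models diverge.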
The internet is full of excellent recommendations and case studies, and Google Home and Alexa offer extensive development kits, but we still have a long way to go before the user and the interface can hold a proper conversation. Adoption is on the rise, but many issues remain to be solved before we have a flawless and pleasing user experience.