Voice user interfaces (VUIs) are becoming increasingly popular, be it Google, Cortana, Hound, Siri or Alexa, so it is worth designing them with care. This article covers some basic design steps, along with do’s and don’ts, drawing on psychological and linguistic principles so that voice-based interactions are seamless and provide a great user experience.
Sample Dialogs:
Sample dialogs represent the interaction between a user and the VUI. Pick a few common use cases of the VUI and write “blue-sky” (best-path) sample dialogs for them. Also write dialogs for likely error situations.
Flow:
After writing sample dialogs, design flows, i.e. visual representations of all the paths that can be taken through the VUI system.
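Before drawing a flow, it can help to write it down as data and list the paths through it. The following is a minimal sketch, assuming a hypothetical pizza-ordering flow; the state names and the FLOW dictionary are invented for illustration and are not part of any VUI toolkit.

```python
# Hypothetical flow: each state maps to the states that can follow it.
FLOW = {
    "ask_pizza": ["confirm_order", "reprompt"],
    "reprompt": ["confirm_order"],
    "confirm_order": ["place_order", "ask_pizza"],  # "no" loops back
    "place_order": [],
}


def all_paths(flow, start, end, path=None):
    """Return every loop-free path from start to end."""
    path = (path or []) + [start]
    if start == end:
        return [path]
    paths = []
    for nxt in flow.get(start, []):
        if nxt not in path:  # skip states already visited to avoid cycles
            paths.extend(all_paths(flow, nxt, end, path))
    return paths


for p in all_paths(FLOW, "ask_pizza", "place_order"):
    print(" -> ".join(p))
```

Each printed line is one path that deserves its own sample dialog, including the error path through the reprompt state.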
Confirmations:
After designing the basic flow, make sure the system confirms that it has understood the user. Confirmations can be implicit, explicit, non-speech, visual or generic. For example, an explicit confirmation:
APP: What kind of pizza would you like to have?
USER: Margherita small.
APP: One small Margherita pizza. Is that correct?
USER: Yes.
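One common way to choose between an implicit and an explicit confirmation is the recognizer's confidence score. The article does not prescribe this, so treat the sketch below as an assumption: the thresholds (0.8 and 0.5) and the function names are hypothetical and would need tuning for a real system.

```python
def choose_confirmation(confidence, high=0.8, low=0.5):
    """Pick a confirmation strategy from a 0..1 confidence score.

    The thresholds are illustrative assumptions, not recommended values.
    """
    if confidence >= high:
        return "implicit"
    if confidence >= low:
        return "explicit"
    return "reprompt"


def confirmation_prompt(size, pizza, confidence):
    strategy = choose_confirmation(confidence)
    if strategy == "implicit":
        # Confirm by restating the order without asking a question.
        return f"OK, one {size} {pizza} pizza coming up."
    if strategy == "explicit":
        # Ask the user to confirm before continuing.
        return f"One {size} {pizza} pizza. Is that correct?"
    # Too uncertain to confirm anything: ask again.
    return "Sorry, what kind of pizza would you like to have?"


print(confirmation_prompt("small", "Margherita", 0.6))
```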
Command & Control or Conversational:
It is important to decide whether the VUI will be command-and-control, where the user must do something explicit, such as pushing a button, to indicate the beginning of a conversation, or conversational, where no explicit indication is required. Conversational systems rely on natural turn-taking techniques such as:
a) Asking a question: The user will respond when the VUI asks something.
b) Using eye contact: This applies if the VUI has an avatar.
c) Pauses: A pause by one speaker is used as an occasion for the other speaker to speak.
d) Explicit direction, e.g.
VUI: Can you tell me the name of the movie you want to see?
USER: Inception
VUI: OK, playing Inception.
Conversational Markers:
Conversational markers are an important way to let people know where they are in a conversation. They include:
a) Timelines (First, Halfway there, Finally)
b) Acknowledgements (Thanks, Alright, Sorry about that)
c) Positive feedback (Good job, Nice to hear that)
Error Handling:
The following are some of the ways a VUI can run into errors:
a) No speech detected: There are two ways to handle this error:
i) Call out explicitly (e.g. VUI: I’m sorry, I didn’t hear anything. What movie do you want to play?)
ii) Do nothing
Explicit call-outs suit cases where the system is voice-only or the process can’t proceed without user input.
Where the system can proceed without user input, or the user can move forward another way, such as pressing a button, it makes more sense to do nothing.
b) Speech detected but nothing recognized:
The strategies for handling such cases are the same as above, i.e.
i) Call out explicitly.
ii) Do nothing.
c) Recognized but not handled, or recognized incorrectly:
The system might respond incorrectly, or not respond at all, to something the user said. This might happen when the system is programmed incorrectly, e.g.
MEDICAL ASSISTANT: How are you feeling?
USER: My arm sort of hurts.
MEDICAL ASSISTANT: I’m sorry, I did not understand. How are you feeling?
Strategies to handle such cases include better anticipation and more exhaustive training data collection.
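The first two error cases can be sketched as a single decision function. This is a minimal illustration, assuming the recognizer reports whether any audio was detected and returns a transcript (or nothing); the function name, its parameters and the prompts are hypothetical.

```python
from typing import Optional

NO_SPEECH_PROMPT = "I'm sorry, I didn't hear anything. What movie do you want to play?"
NO_MATCH_PROMPT = "I'm sorry, I didn't understand. What movie do you want to play?"


def handle_turn(audio_detected: bool, transcript: Optional[str],
                voice_only: bool) -> Optional[str]:
    """Return what the VUI should say, or None to stay silent."""
    if not audio_detected:
        # Case a) no speech detected.
        return NO_SPEECH_PROMPT if voice_only else None
    if not transcript:
        # Case b) speech detected but nothing recognized.
        return NO_MATCH_PROMPT if voice_only else None
    return f"OK, playing {transcript}."


# Voice-only device: call out explicitly.
print(handle_turn(False, None, voice_only=True))
# Device with a screen and buttons: do nothing and let the user carry on.
print(handle_turn(False, None, voice_only=False))
```

The voice_only flag stands in for the broader question of whether the process can continue without the user's answer.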
Best Practices:
Escalating Error:
Detailed prompts can help reduce errors by escalating: each retry gives the user more specific guidance. For example:
VUI: What’s the city and the state you live in?
USER: Chennai
VUI: I’m sorry, I didn’t get that. Which city and which state?
USER: Chennai, Tamil Nadu
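The dialog above can be sketched as a list of prompts indexed by the number of errors so far. This is only an illustration: the third prompt, the comma-based parser and the function names are assumptions added for the example.

```python
PROMPTS = [
    "What's the city and the state you live in?",
    "I'm sorry, I didn't get that. Which city and which state?",
    # Hypothetical third level: the most detailed prompt, with an example.
    "Please say the city, then the state, for example 'Chennai, Tamil Nadu'.",
]


def prompt_for(error_count):
    """Return a more detailed prompt as errors accumulate."""
    return PROMPTS[min(error_count, len(PROMPTS) - 1)]


def parse_city_state(utterance):
    """Toy parser: expects 'City, State'. Returns a tuple or None."""
    parts = [p.strip() for p in utterance.split(",")]
    if len(parts) == 2 and all(parts):
        return parts[0], parts[1]
    return None


def collect_city_state(utterances):
    """Replay user utterances; return the parsed answer and prompts used."""
    errors, spoken = 0, []
    for utterance in utterances:
        spoken.append(prompt_for(errors))
        answer = parse_city_state(utterance)
        if answer:
            return answer, spoken
        errors += 1
    return None, spoken


answer, spoken = collect_city_state(["Chennai", "Chennai, Tamil Nadu"])
print(answer)
```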
Context Awareness:
A system needs to be aware of the context and store the conversation history, e.g.
USER: Hey Google, who directed the movie Joker?
GOOGLE: The movie Joker was directed by Todd Phillips.
USER: Where was he born?
GOOGLE: He was born in Brooklyn, New York.
USER: Can you repeat that?
GOOGLE: Sure. He was born in Brooklyn, New York.
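A toy version of this behaviour needs only two pieces of stored state: the last person mentioned (to resolve "he") and the last answer (to repeat it). The sketch below is an assumption-laden illustration, not how Google Assistant works; the Context class, the FACTS table and the keyword matching are all hypothetical.

```python
class Context:
    """Minimal conversation state."""
    def __init__(self):
        self.last_person = None    # used to resolve "he"
        self.last_response = None  # used for "repeat that"


# Tiny stand-in knowledge base taken from the dialog above.
FACTS = {
    ("Joker", "director"): "Todd Phillips",
    ("Todd Phillips", "birthplace"): "Brooklyn, New York",
}


def respond(utterance, ctx):
    text = utterance.lower()
    if "repeat" in text:
        if ctx.last_response is None:
            return "There is nothing to repeat yet."
        return "Sure. " + ctx.last_response
    if "who directed" in text and "joker" in text:
        ctx.last_person = FACTS[("Joker", "director")]
        answer = f"The movie Joker was directed by {ctx.last_person}."
    elif "where was he born" in text and ctx.last_person:
        place = FACTS[(ctx.last_person, "birthplace")]
        answer = f"He was born in {place}."
    else:
        return "Sorry, I can't help with that yet."
    ctx.last_response = answer  # remember the answer for a later "repeat"
    return answer


ctx = Context()
for line in ["Hey Google, who directed the movie Joker?",
             "Where was he born?",
             "Can you repeat that?"]:
    print(respond(line, ctx))
```

Without the stored context, the second question has no referent for "he" and the third has nothing to repeat.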
Help and Other Universals:
A system should be able to tell the user what it can do, e.g. Cortana provides a visual set of examples of what it can do, such as call, message, calendar, reminder, note and alarm.
Latency:
Latency is a delay in the response caused by poor connectivity, system processing or database access. You can handle it either by telling the user about the delay (e.g. “One moment please, while I look up the record”) or by using visual and non-verbal cues.
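One way to decide when to mention the delay is to start the lookup in the background and speak a filler only if it has not finished within a short window. This is a minimal sketch using Python's standard concurrent.futures module; the 0.2 second window, the function names and the simulated lookup are assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

FILLER = "One moment please, while I look up the record."


def answer_with_filler(lookup, say, filler_after_s=0.2):
    """Run lookup() in the background; speak a filler if it is slow."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(lookup)
        try:
            result = future.result(timeout=filler_after_s)
        except FutureTimeout:
            say(FILLER)               # acknowledge the delay
            result = future.result()  # then wait for the real answer
    say(result)


def slow_lookup():
    time.sleep(0.5)  # simulated database or network delay
    return "Here is the record."


answer_with_filler(slow_lookup, print)
```

A fast lookup skips the filler entirely, so the user only hears it when there really is a wait.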
Disambiguation:
Disambiguation is needed when the user doesn’t provide all of the information, e.g.
USER: Call Arvind
SYSTEM: (displays all matching contacts: Arvind A, Arvind G and Arvind B)
USER: Call Arvind A
SYSTEM: Home phone or mobile?
USER: Mobile
SYSTEM: Calling
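The same exchange can be sketched as a function that asks a follow-up question whenever the request still matches more than one contact or more than one number. The contact list, the placeholder phone numbers and the function names below are hypothetical.

```python
# Hypothetical address book with placeholder numbers.
CONTACTS = {
    "Arvind A": {"home": "555-0100", "mobile": "555-0101"},
    "Arvind B": {"mobile": "555-0102"},
    "Arvind G": {"mobile": "555-0103"},
}


def find_matches(name, contacts):
    """Contacts whose name starts with what the user said."""
    return sorted(c for c in contacts if c.lower().startswith(name.lower()))


def next_step(name, contacts, phone_type=None):
    """Return the system's next line: a question or the final action."""
    matches = find_matches(name, contacts)
    if not matches:
        return f"I couldn't find {name} in your contacts."
    if len(matches) > 1:
        return "Which one: " + ", ".join(matches) + "?"
    contact = matches[0]
    numbers = contacts[contact]
    if phone_type is None:
        if len(numbers) > 1:
            return "Which number: " + " or ".join(sorted(numbers)) + "?"
        phone_type = next(iter(numbers))  # only one number, no need to ask
    if phone_type not in numbers:
        return f"{contact} has no {phone_type} number."
    return f"Calling {contact} on {phone_type}."


print(next_step("Arvind", CONTACTS))
print(next_step("Arvind A", CONTACTS))
print(next_step("Arvind A", CONTACTS, "mobile"))
```

Note that a contact with a single number is called straight away, so the system only asks when there is a real ambiguity.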
Barge-In:
Barge-in means the user can interrupt the system while it is speaking or playing. This is important when the system produces long or detailed output and the user wants to stop it. One way of doing this is with hotwords such as “stop”. For example, if Alexa is playing a song and the user says “Alexa, stop”, the system stops the song.
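A very simplified model of barge-in is to check what the microphone heard before each chunk of output and stop as soon as the hotword appears. The sketch below simulates this with lists of strings; it is not how Alexa implements it, and the function names and the chunked playback model are assumptions.

```python
import re


def normalize(text):
    """Lower-case and drop punctuation so 'Alexa, stop!' matches."""
    return " ".join(re.findall(r"[a-z]+", text.lower()))


def play_with_barge_in(chunks, heard, hotword="alexa stop"):
    """Play chunks in order, stopping as soon as the hotword is heard.

    chunks: pieces of output (strings stand in for audio here).
    heard:  what the microphone picked up as each chunk was due to play
            (None when the user said nothing); same length as chunks.
    """
    played = []
    for chunk, utterance in zip(chunks, heard):
        if utterance and hotword in normalize(utterance):
            break  # user barged in: stop immediately
        played.append(chunk)
    return played


song = ["verse 1", "chorus", "verse 2"]
print(play_with_barge_in(song, [None, "Alexa, stop", None]))
```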
Timeouts:
It is important to detect when the user has stopped speaking. Some VUIs do so with a default or configurable end-of-speech timeout: the length of pause after which the system decides the user is done speaking. Similarly, there are a “no speech” timeout and a “too much speech” timeout.
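The three timeouts can be illustrated over a stream of voice-activity frames, where each frame records whether speech was detected. This is a minimal sketch; the frame length and the default timeout values are assumptions chosen for the example, not recommended settings.

```python
def classify_turn(frames, frame_ms=100, end_of_speech_ms=1500,
                  no_speech_ms=5000, too_much_speech_ms=10000):
    """Classify a turn from per-frame voice activity (True = speech).

    All durations are illustrative assumptions.
    """
    speech_started = False
    silence_ms = 0  # length of the current pause
    turn_ms = 0     # time elapsed since the user started speaking
    for is_speech in frames:
        if is_speech:
            speech_started = True
            silence_ms = 0
        else:
            silence_ms += frame_ms
        if speech_started:
            turn_ms += frame_ms
            if silence_ms >= end_of_speech_ms:
                return "end_of_speech"
            if turn_ms >= too_much_speech_ms:
                return "too_much_speech"
        elif silence_ms >= no_speech_ms:
            return "no_speech"
    return "listening"  # no timeout reached yet


# Half a second of speech followed by a 1.5 second pause.
print(classify_turn([True] * 5 + [False] * 15))
```

A shorter end-of-speech timeout makes the system feel more responsive but risks cutting off users who pause mid-sentence.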
Emotion and Sentiment Analysis:
This means using natural language processing to extract information about how a user is feeling. For example, you can classify a user’s response to a question as positive or negative based on their choice of words.
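The simplest form of word-choice classification is a keyword count. The sketch below is deliberately naive: the two word lists are tiny, hypothetical examples, and a "neutral" category is added for responses that match neither list. Real systems typically use trained models instead.

```python
import re

# Hypothetical, deliberately tiny word lists.
POSITIVE = {"good", "great", "fine", "better", "happy"}
NEGATIVE = {"bad", "hurts", "pain", "sad", "terrible", "worse"}


def classify_sentiment(utterance):
    """Label an utterance by counting positive and negative words."""
    words = re.findall(r"[a-z']+", utterance.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(classify_sentiment("My arm sort of hurts."))
```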
While the concept of VUI was popularized more than 50 years ago by series such as Star Trek, it is only in the last few years that it has moved out of the lab and into our pockets and living rooms. This has gone hand in hand with advancements in computing and machine learning. As the basic capabilities of VUIs become more robust, service providers are beginning to focus on context awareness, personality and natural conversation. VUI design is a challenging user experience problem that draws on allied fields such as psychology and linguistics to make interaction easier. It is conceivable that in the near future, VUI technology will be sophisticated and mature enough to replace point-and-click technology as the preferred method of human-computer interaction.
Designing for Voice User Interfaces was originally published in Designerrs on Medium, where people are continuing the conversation by highlighting and responding to this story.