Communication Repair Strategies for Voice Assistants

How voice assistants could be better conversational partners

Does this conversation with Alexa, Amazon’s voice assistant, sound familiar?

Mother: Alexa! (pause) Myanmar is South or South East Asian?

Alexa: Sorry, I’m not sure.

Son: Myanmar is South or South East Asian?

(Alexa beeps)

(Family conversation pauses)

Mother: Try again.

Son (louder): Is Myanmar part of South East Asia?

Alexa: I’m not quite sure how to help you with that.

Son: Hmmm…

Mother: Alexa. Who is South East Asia? What country is East Asia?

Alexa: There are no UN recognized countries in Eastern Europe.

Son: This is getting annoying.

Father (raised voice): Alexa. Define South East Asian countries.

Alexa: This might answer your question. There are eleven sovereign states in South East Asia: Brunei, Cambodia, East Timor, Indonesia, Laos, Malaysia, Myanmar, Philippines, Singapore, Thailand, Vietnam. Did that answer your question?

Mother: Yes.

Alexa: Thanks for your feedback.

The example highlights how a family and a voice assistant perform communication repair: the act of restoring common understanding after conversational partners misunderstand each other.

When addressing the voice assistant, family members add pauses and slow their speaking rate, something they wouldn’t normally do when talking to a person. When the voice assistant misses the first request, they repeat the same phrase. When communication still fails, they increase their volume, exaggerate their pronunciation, and vary their sentence structure. Finally, the father rephrases the request as “Define South East Asian countries,” which the voice assistant can understand.

Even among people of the same cultural background, communication breakdowns, or failures to exchange information, are common. Perceptual and language differences, noise, inattention, and information overload are just some of the factors that contribute to them.

Communication breakdowns are normal; we expect them to happen between users and voice assistants. The challenge lies in communication repair.

There are three response types that signal communication breakdowns with voice assistants:

Acting on misunderstanding

Mother: Alexa. Who is South East Asia? What country is East Asia?

Alexa: There are no UN recognized countries in Eastern Europe.

Acting on a misunderstanding suggests that the voice assistant picked up some keywords but failed to match the user’s intent correctly. Even though it was a mismatch, the failed user input could be kept in the voice assistant’s short-term memory in anticipation of the user’s next attempt. The voice assistant could also try to diagnose the problem. For example:

Mother: Alexa. Who is South East Asia? What country is South East Asia?

Alexa: There are no UN recognized countries in Eastern Europe. Did I answer your question?

Mother: No.

Alexa: Try asking me again. One question at a time please.

Mother: What country is in South East Asia?
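The dialogue above can be sketched in code: keep the failed utterance in session state for the next attempt, and check whether the turn contained more than one question. A minimal Python sketch, where the session dictionary, function names, and the question-counting heuristic are all illustrative assumptions rather than any real framework’s API:

```python
def count_questions(utterance: str) -> int:
    # Crude heuristic: each question mark marks one question.
    return utterance.count("?")

def handle_failed_match(utterance: str, session: dict) -> str:
    # Keep the failed request in short-term memory so it can be
    # compared against the user's next attempt.
    session["last_failed_utterance"] = utterance
    if count_questions(utterance) > 1:
        # Diagnose the likely problem: two questions in one turn.
        return "Try asking me again. One question at a time please."
    return "Sorry, I'm not sure."

session = {}
reply = handle_failed_match(
    "Who is South East Asia? What country is South East Asia?", session)
print(reply)  # Try asking me again. One question at a time please.
```

A production system would use a real dialogue-state store and a better question detector, but the shape of the repair is the same: remember the failure, then name the likely cause.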

Neutral clarification responses

Mother: Alexa! (pause) Myanmar is South or South East Asian?

Alexa: Sorry, I’m not sure.

A neutral clarification response could mean either that the user’s entire request was unintelligible or that it did not match any predefined user intent. When there is no match, the voice assistant could use the keywords it did understand to help the user come up with a better request. For example:

Mother: Alexa! (pause) Myanmar is South or South East Asian?

Alexa: Sorry, I’m not sure. Try asking me facts about South East Asia.

Mother: Tell me facts about South East Asia.
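One way to realize this is to fall back on whichever keywords did come through. A small, hypothetical sketch, where the topic list and the suggestion template are assumptions made up for illustration:

```python
# Topics the assistant knows it can answer facts about (illustrative).
KNOWN_TOPICS = {"south east asia", "east asia", "myanmar"}

def neutral_clarification(recognized_keywords: list) -> str:
    # Reuse any intelligible keyword to steer the user's next attempt.
    for kw in recognized_keywords:
        if kw.lower() in KNOWN_TOPICS:
            return f"Sorry, I'm not sure. Try asking me facts about {kw}."
    # No usable keyword: fall back to a plain neutral response.
    return "Sorry, I'm not sure."

print(neutral_clarification(["South East Asia"]))
# Sorry, I'm not sure. Try asking me facts about South East Asia.
```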

Specific clarification responses

Mother: Set an alarm for 6:30.

Alexa: Is that 6:30 in the morning or evening?

Specific clarification responses are the most helpful because the voice assistant is making an effort to repair communication. It recognized the user’s intent and knows exactly which piece of information is missing (e.g. morning or evening).

In the recommendations above, the voice assistant actively repairs communication by teaching the user how to phrase their requests better. In other words, the voice assistant is performing discourse scaffolding.

Discourse scaffolding is a method of teaching communication strategies with the goal of transferring skills and responsibility to the learner. An example of discourse scaffolding is when a child says “milk” and a parent responds with “you wanna drink milk” to demonstrate an expanded request.

In an ethnographic study, Beneteau and colleagues observed ten families interact with the Amazon Echo Dot for a month. They observed that families often cooperate to create successful requests to the voice assistant. One way is by teaching each other how to phrase their requests. Beneteau and colleagues identified six types of discourse scaffolding strategies that families use. Voice assistants could leverage these strategies when attempting to repair communication and when giving advice to the user.

Direct instruction

Direct instruction is when a user tells another what they should say or why something has happened. For example, when the voice assistant fails to detect any keywords, it can say:

Sorry, I wasn’t able to hear what you said. Can you say it a bit louder please?

Modeling

Modeling is when a user says an utterance to demonstrate a desired response. For example, voice assistants can say the following to suggest “facts” as a keyword:

Try asking: Facts about South East Asia.

Redirection

Redirection is when a user refocuses the conversation on a desired topic. If the voice assistant detects two questions, it can either inform the user to ask one question at a time, or return a specific clarification response to make them choose between two topics.

I’m not quite sure how to help you with that. Please ask one question at a time.

Expansion

Expansion is when a user adds on to something said by another user. If the voice assistant cannot understand the user’s intent because the request is too short, it could encourage the user to expand it. For example:

Sorry, I’m not sure. Can you say it again with a full sentence?

Contraction

Contraction is when a user shortens or summarizes an utterance. If the voice assistant cannot understand the user’s intent because the request is too long, it could ask the user to contract it. For example:

Sorry, I’m not sure. Can you try asking me again using only the keywords?

Consulting

Consulting is when a user asks another for assistance. In home settings, voice assistants are usually invoked in front of other people, so family members might be encouraged to help if the voice assistant asked:

I’m not quite sure what you mean. Is there anyone else in the room who can help clarify?
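Taken together, the six strategies can be keyed off a diagnosis of the breakdown: each failure mode gets the scaffolding prompt from the examples above. A hypothetical sketch, where the diagnosis labels are invented for illustration:

```python
# Map a diagnosed breakdown to one of the six scaffolding strategies.
SCAFFOLDING_PROMPTS = {
    # Direct instruction: explain what happened and what to do.
    "no_speech_detected":
        "Sorry, I wasn't able to hear what you said. "
        "Can you say it a bit louder please?",
    # Modeling: demonstrate a request that would work.
    "keywords_without_intent":
        "Try asking: Facts about South East Asia.",
    # Redirection: refocus the conversation on one topic.
    "multiple_questions":
        "I'm not quite sure how to help you with that. "
        "Please ask one question at a time.",
    # Expansion: ask the user to say more.
    "utterance_too_short":
        "Sorry, I'm not sure. Can you say it again with a full sentence?",
    # Contraction: ask the user to say less.
    "utterance_too_long":
        "Sorry, I'm not sure. Can you try asking me again "
        "using only the keywords?",
    # Consulting: recruit other people in the room.
    "unresolvable":
        "I'm not quite sure what you mean. "
        "Is there anyone else in the room who can help clarify?",
}

def scaffold(diagnosis: str) -> str:
    # Fall back to a neutral clarification for unknown diagnoses.
    return SCAFFOLDING_PROMPTS.get(diagnosis, "Sorry, I'm not sure.")
```

The design choice here is that the scaffolding move follows from the diagnosis, not from a generic retry counter, which is what keeps the repair instructive rather than repetitive.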

One of the general design guidelines for voice user interfaces is to learn from interpersonal communication so that interactions feel as intuitive as possible. By observing how people interact with voice assistants and applying insights from speech-language pathology, researchers arrived at specific design recommendations for building better voice assistants.

Most developers and designers work with software development frameworks like Alexa Skills and Dialogflow. Aside from the transcript of the user’s original request, current frameworks provide limited feedback on why matching failed. That is, when the natural language understanding engine fails to identify the user’s request, there is no meta-information about the cause of the error.

At the very least, voice assistant frameworks should tell developers and designers whether the error was due to no words detected, too few keywords recognized, or too many keywords detected. Such meta-information is not difficult to generate, so we can expect it to be rolled out in the near future.
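As an illustration of what such meta-information might look like, here is a hypothetical failure classifier a framework could expose to developers alongside the transcript. The enum names and the keyword threshold are invented for this sketch and are not part of the Alexa Skills or Dialogflow APIs:

```python
from enum import Enum
from typing import List, Optional

class MatchFailure(Enum):
    """The missing 'why' meta-information for a failed intent match."""
    NO_WORDS_DETECTED = "no_words_detected"
    TOO_FEW_KEYWORDS = "too_few_keywords"
    TOO_MANY_KEYWORDS = "too_many_keywords"

def diagnose(transcript: str, keywords: List[str]) -> Optional[MatchFailure]:
    # Classify why the natural language understanding step failed.
    if not transcript.strip():
        return MatchFailure.NO_WORDS_DETECTED
    if len(keywords) == 0:
        return MatchFailure.TOO_FEW_KEYWORDS
    if len(keywords) > 5:  # illustrative threshold, not a real limit
        return MatchFailure.TOO_MANY_KEYWORDS
    return None  # enough signal reached the engine; failure lies elsewhere
```

With a label like this in hand, a skill could pick the matching scaffolding response instead of replying with the same generic apology every time.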
