Overview
This section gives an overview of the research conducted in this project. It first identifies current user behaviour with voice devices. It then explores the biggest problems users face when using voice devices, specifically when purchasing via voice. Finally, it discusses a potential solution to these problems based on conversational UX design practices.
Market research
1. User Adoption of Voice Technology
Source: Voicebot.ai
With over 250 million monthly active users in the United States alone, voice technology, whether through smartphone assistants such as Siri or smart home devices such as Amazon Alexa, is experiencing significant growth. Users value the time this technology saves them. Tasks such as checking the day's weather or getting directions to the next meeting used to require undivided attention while the user typed specific information into a device. Now, with technologies such as machine learning and natural language processing, users can get this information simply by asking their voice assistant, without interrupting their current activity.
Source: Voicebot.ai
The tasks users perform are simple and straightforward. Research from Voicebot.ai, a leading research institute in artificial intelligence and voice technology, suggests that tasks such as asking questions and playing music are the most common. More complex tasks, such as purchasing with voice devices, are among the least common use cases. The research suggests that users rarely purchase via a voice device for two reasons: the complexity of the subtasks, which leads to user errors, and privacy concerns.
2. Purchasing with Voice
Source: Narvar.
The purchases users make with voice devices tend to be of low value (under $100) and from businesses they already have a purchase history with. Overall, however, purchasing via voice is quite uncommon. The problems users face can be reduced to two fundamental issues. Firstly, users feel that privacy is a concern when purchasing via voice, especially from companies they have no previous history with. Secondly, the overall purchasing experience via voice is frustrating. The available technology cannot provide the breadth of information users need to make a purchase, so users become frustrated, abort the task on the voice device, and complete the purchase on their mobile or desktop devices instead.
The research identifies two key issues that future conversational commerce products must solve to be successful. Firstly, users need to feel secure when purchasing via voice: they need an existing, trustworthy relationship with a business before committing to a purchase through this new technology. Secondly, the purchasing experience itself needs improving. Users are not getting the information they require to commit to a purchase. Instead, they receive an impersonal experience filled with errors, such as the agent giving incorrect information in response to a question or misinterpreting what the user asked. Designers who aim to create meaningful and effective voice experiences need to understand the fundamentals of Conversational UX Design, which are discussed next.
Conversational UX Design
Studies in Conversational UX Design. Source: Springer
Designing for Conversation
To create amazing products and experiences with voice, designers must understand the structure of a conversation, and the content and information that both a user and an artificial agent need in order to navigate each part of it successfully. IBM is one of the leaders in the field of conversational UX design and has created a set of principles and guidelines that VUX designers should follow when designing new voice applications. This set of principles is called the Natural Conversation Framework (NCF). The NCF has four components: (1) Interaction Model, (2) Common Activity Modules, (3) Navigation Method and (4) Sequence Metrics. The first three are discussed in this section, and the fourth is elaborated on in the ‘test’ section of the project.
1. Interaction Model
All conversations are built upon sequences, which are essentially the utterances people exchange in a natural back-and-forth conversation. What makes conversations complex, however, is that any sequence can be expanded. Sequence expansions are tools people use to get clarification on a certain topic within a conversation. There are five types of sequence expansion: screening, eliciting, repeating, paraphrasing, and closing. A voice agent should be designed to both initiate and accommodate all five types. Below is a screenshot of a user conversation with IBM’s ‘Alma’, an artificial conversational agent.
Sequence Expansion Types. Source: Conversational UX Design: A Practitioner’s Guide to the Natural Conversation Framework
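The idea that any sequence can be expanded can be sketched as a small data model. This is only an illustration in Python: the class names (`Turn`, `Expansion`, `Sequence`), the `initiated_by` helper, and the sample dialogue are hypothetical and are not taken from the Natural Conversation Framework or from Alma.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ExpansionType(Enum):
    """The five sequence expansion types named in the text."""
    SCREENING = "screening"
    ELICITING = "eliciting"
    REPEATING = "repeating"
    PARAPHRASING = "paraphrasing"
    CLOSING = "closing"


@dataclass
class Turn:
    speaker: str  # "user" or "agent" (illustrative labels)
    text: str


@dataclass
class Expansion:
    kind: ExpansionType
    turns: List[Turn]


@dataclass
class Sequence:
    """A base exchange plus any expansions attached to it."""
    base: List[Turn]
    expansions: List[Expansion] = field(default_factory=list)

    def expand(self, kind: ExpansionType, *turns: Turn) -> None:
        # Attach a clarifying sub-exchange to this sequence.
        self.expansions.append(Expansion(kind, list(turns)))

    def initiated_by(self, speaker: str) -> List[ExpansionType]:
        # An expansion is initiated by whoever speaks first in it.
        return [e.kind for e in self.expansions
                if e.turns and e.turns[0].speaker == speaker]


# Hypothetical dialogue: one base sequence with two user-initiated expansions.
answer = "Yes, Sunday delivery is available for an extra charge."
sequence = Sequence(base=[Turn("user", "Do you deliver on Sundays?"),
                          Turn("agent", answer)])
sequence.expand(ExpansionType.REPEATING,
                Turn("user", "What did you say?"),
                Turn("agent", answer))
sequence.expand(ExpansionType.PARAPHRASING,
                Turn("user", "What do you mean?"),
                Turn("agent", "We can deliver on Sunday, but it costs more."))
```

Keeping expansions attached to the sequence they clarify, rather than treating them as new topics, is one way to let either the user or the agent initiate them without losing the thread of the base exchange.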
2. Common Activity Modules
The common activity modules are a set of conversational patterns that a conversational product should be able to execute in order to be conversationally competent. There are three classes of patterns that a conversational agent should adhere to when delivering the main content of a voice application. See the screenshot of the pattern classes and related sub-patterns below.
Common Activity Patterns. Source: Conversational UX Design: A Practitioner’s Guide to the Natural Conversation Framework
3. Navigation Method
A user should be able to navigate through a conversation using sequence expansions. These navigation patterns are the conversational equivalent of graphical interface controls, such as dragging and dropping files. There are six types of sequence that allow a user and an artificial agent to navigate a conversation:
1. Capability Check: Unlike with a graphical interface, a user cannot visually explore what a conversational product can do. A voice product should therefore always be able to describe the system's functionality, to narrow and guide the user's choice of conversation topics.
2. Repeats: Repeats in a conversational interface serve the same purpose as ‘going back’ in a graphical interface. This method allows the user to get clarification on an utterance they did not understand.
3. Paraphrase: A paraphrase expansion allows a user to get help on a particular subject or topic from a conversational agent. The agent restates a previous, complex utterance as something simpler and easier to understand. Paraphrasing is similar to a ‘tooltip’ in a graphical interface.
4. Close Sequence: A user should always have the ability to close a sequence when they have received the relevant information to progress further in the conversation.
5. Abort Sequence: This sequence can be seen as the ‘escape’ button of the conversational interaction model. If a user does not want any further information about a particular sequence, they should have the ability to close it and move to another topic.
6. Close Conversation: When a user feels that the conversation is complete, it should be very easy to end their engagement with the conversational agent. This mirrors the ‘close sequence’ method of navigation but operates at a macro level, as it closes the conversation completely.
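As a minimal sketch of how the six navigation sequences might be handled in code, the Python below maps user utterances to navigation actions. The trigger phrases, the `NavigationAction` enum, the `ConversationState` class and the canned replies are all hypothetical and chosen for illustration; they are not part of the Natural Conversation Framework, and a real agent would rely on a natural language understanding service rather than exact phrase matching.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class NavigationAction(Enum):
    CAPABILITY_CHECK = "capability_check"
    REPEAT = "repeat"
    PARAPHRASE = "paraphrase"
    CLOSE_SEQUENCE = "close_sequence"
    ABORT_SEQUENCE = "abort_sequence"
    CLOSE_CONVERSATION = "close_conversation"


# Hypothetical trigger phrases, one per navigation type.
TRIGGERS = {
    "what can you do": NavigationAction.CAPABILITY_CHECK,
    "what did you say": NavigationAction.REPEAT,
    "what do you mean": NavigationAction.PARAPHRASE,
    "ok thanks": NavigationAction.CLOSE_SEQUENCE,
    "never mind": NavigationAction.ABORT_SEQUENCE,
    "goodbye": NavigationAction.CLOSE_CONVERSATION,
}


def normalise(utterance: str) -> str:
    """Lowercase, drop punctuation and collapse whitespace."""
    kept = (ch for ch in utterance.lower() if ch.isalnum() or ch.isspace())
    return " ".join("".join(kept).split())


def detect_action(utterance: str) -> Optional[NavigationAction]:
    """Return the navigation action for an utterance, or None if it is not one."""
    return TRIGGERS.get(normalise(utterance))


@dataclass
class ConversationState:
    capabilities: str          # what the agent says it can do
    last_response: str = ""    # kept so it can be repeated
    last_paraphrase: str = ""  # simpler wording of the last response
    is_open: bool = True

    def say(self, response: str, paraphrase: str) -> str:
        # Remember both wordings so repeat and paraphrase requests can be served.
        self.last_response = response
        self.last_paraphrase = paraphrase
        return response

    def navigate(self, utterance: str) -> Optional[str]:
        action = detect_action(utterance)
        if action is None:
            return None  # not a navigation request; handle as normal content
        if action is NavigationAction.CAPABILITY_CHECK:
            return self.capabilities
        if action is NavigationAction.REPEAT:
            return self.last_response
        if action is NavigationAction.PARAPHRASE:
            return self.last_paraphrase
        if action is NavigationAction.CLOSE_SEQUENCE:
            return "You're welcome. Anything else?"
        if action is NavigationAction.ABORT_SEQUENCE:
            return "OK. What would you like to do instead?"
        self.is_open = False  # CLOSE_CONVERSATION ends the whole interaction
        return "Goodbye."
```

For example, after `state.say("Your total is 42 dollars including delivery.", "You will pay 42 dollars in all.")`, calling `state.navigate("What did you say?")` returns the original wording, while `state.navigate("What do you mean?")` returns the simpler one. Storing a paraphrase alongside each response is one design choice among several; it assumes the designer writes both versions up front.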
Research conclusions
Voice technology, and conversational commerce in particular, are bleeding-edge fields in the UX community. With advances in technologies such as machine learning and neural networks, and with rapid user adoption, voice devices have the potential to become pivotal in personal ubiquitous computing.
A clear understanding of the biggest problems and obstacles users currently face clarifies which design issues must be resolved. With those design issues apparent, one can begin to create a solution by applying the learnings from the Natural Conversation Framework.