Bixby is designed to be a reliable, trustworthy, and friendly assistant. Whether users talk, tap, or press to interact with Bixby, the experience should feel unified. Bixby's dialog should follow these principles:
Speak naturally
By using everyday language and an appropriate tone of voice (via SSML), Bixby sounds like a companion. And because Bixby encourages conversation, people interact with it by communicating naturally.
Be helpful
Bixby decreases the time it takes for a person to reach their goal. Through quick answers, guided questions, and predictive abilities, Bixby puts the user's needs first.
Adapt to the context
Bixby communicates seamlessly between devices, whether or not a device has a screen and regardless of where a person is using it. This ability to work across devices means that people can move in and out of hands-free mode during a single conversation.
Keep it simple
Bixby gives users the information they want in as few words as possible. By reducing cognitive load and keeping navigation simple, Bixby empowers people to interact with it.
These principles guide designers and developers in building voice-first Bixby experiences. While there are many ways a conversation can occur across various devices, Bixby streamlines these experiences by using dialog patterns. Since users might not see an interface while interacting with Bixby, these principles and patterns are meant to deliver an enhanced voice experience to users.
Just as design systems are used to maintain visual interfaces, this design guide aims to apply the same framework to voice interfaces. It demonstrates common interactions between users and Bixby so that designers and developers can reference these patterns when building their own experiences.
The voice-first design patterns cover Bixby's experiences along two axes: helping users accomplish tasks, and keeping the conversation itself on track.
Since conversations are a two-way interaction, many patterns link to others in order to continue the back-and-forth exchange Bixby needs to achieve the user's goal.
Users talk to Bixby when they need help doing something. As an assistant, Bixby is there to complete the goals a user sets out to accomplish, whether that means creating something new, like setting an alarm, or finding the latest weather forecast.
For straightforward tasks that users need immediately, Simple Creation is best. When a user takes an action that has only a couple of choices (on or off, call or don't call), the Simple Action framework works best. Instant Reaction works similarly in cases where dialog is unnecessary.
When users need to find information, Look-Up gets them to their results. If there are many results, though, Bixby can either recommend one as a Suggestion or use List Reading to inform the user of several options. There are many ways to read out a list, but when there are different categories of information, using Summary and Fluid Reading could be best.
The user creates an object such as an alarm or a timer on a device.
User: "Start a timer for 5 minutes"
Bixby: I started a timer for 5 minutes.
User: "Remind me to work out tomorrow morning at 9"
Bixby: I'll remind you to work out at 9 AM tomorrow.
User: "Set up lunch with Mary at 2 next Thursday"
Bixby: I added Lunch with Mary at 2 PM next Thursday, March 17.
One of two actions occurs (yes/no, on/off), typically by interacting with another device that provides real-world feedback, such as lights.
User: "Turn on the lights"
Bixby: Turning on kitchen lights.
User: "Call Tim"
Bixby: Calling Tim Smith.
Similar to Simple Action, but used in cases where no dialog is necessary, either because it would interrupt an experience or because the completed action is clearly observable.
User: "Increase volume"
Bixby: (audible tone or visual indicator that the volume increased)
User: "Turn on the TV"
Bixby: (TV turns on)
User: "Turn on the light"
Bixby: (Light turns on)
User: "Open YouTube"
Bixby: (opens the YouTube app)
Similar to a query result where more than one possible option or follow-up action can occur.
User: "What's the weather like today?"
Bixby: It's partly cloudy and 67 degrees right now.
User: "When is Christmas?"
Bixby: Christmas day is Tuesday, December 25.
Bixby provides a suggestion using personalization and learning to narrow down choices after a search.
User: "Find gas stations near me"
Bixby: The closest gas station is Shell San Jose Downtown. It's 0.5 miles away and has a 5-star rating. Do you want this one?
User: "Find chicken recipes"
Bixby: This oven-roasted chicken takes 30 minutes to prepare and serves 4. Do you want to choose this?
User: "Play music"
Bixby: The most popular music app is Spotify. Listen here?
User: "Sure"
Bixby provides a list without asking the user to take action.
(not reading) User: "Show shopping list"
Bixby: [shows view]
(reading) User: "Check reminders"
Bixby: Go to DMV tomorrow at 9 AM and file tax returns Friday at 9 AM.
User goal: Get relevant pieces of information about the topic asked.
Requirements:
Voice: People don't like hearing long lists, so this is where SSML can be used to increase the reading speed. Lists can be read in a variety of ways depending on how many items there are, similar to disambiguation. The differences in list reading depend on the data types being read and how detailed a response the user needs.
Bixby should read up to 3 items within the same category before giving the option to hear more. This reduces the cognitive load on the user.
Bixby: At 9 AM, interview. At 11 AM, call Rachel. At 1 PM, pick up lunch. Want to hear more?
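As an illustration, here is a minimal TypeScript sketch of the groups-of-three rule, with an SSML prosody wrapper for the faster readout mentioned above. The names (CalendarItem, buildListPrompt) are hypothetical, not Bixby APIs.

```typescript
// Hypothetical item shape; not a Bixby API.
interface CalendarItem {
  time: string;  // e.g. "9 AM"
  title: string; // e.g. "interview"
}

const CHUNK_SIZE = 3; // read at most three items before offering more

function buildListPrompt(items: CalendarItem[], offset = 0): string {
  const chunk = items.slice(offset, offset + CHUNK_SIZE);
  const readout = chunk.map((i) => `At ${i.time}, ${i.title}.`).join(" ");
  const remaining = items.length - (offset + chunk.length);
  return remaining > 0 ? `${readout} Want to hear more?` : readout;
}

// Standard SSML prosody lets Bixby read long lists slightly faster.
function withFasterRate(text: string, rate = "110%"): string {
  return `<speak><prosody rate="${rate}">${text}</prosody></speak>`;
}
```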
If there are many items in a list, it could help to give a high-level summary before introducing more options:
User: "Check notifications" Bixby: 10 notifications from Slack, Twitter, CNN, and two others. Do you want to hear Slack notifications?
See more on Summary and Fluid Reading.
Screen: Provide full details on screen, but not everything needs to be read aloud. Highlight items as they are read so that users can follow along.
Situational:
Bixby reads a summary of relevant information, then prompts to provide organized information at the user's pace.
User: "What is my schedule today?"
Bixby: You have 16 events from 9 AM to 7 PM. At 9 AM, interview. At 11 AM, call Rachel. Want to hear more?
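One possible sketch of this summary-then-detail flow, again in TypeScript with illustrative names; it assumes the events are non-empty and already sorted by time.

```typescript
// Hypothetical event shape; not a Bixby API.
interface EventItem {
  time: string;  // "9 AM"
  title: string; // "interview"
}

// Summarize the whole list first, then read the first couple of items
// and pause so the user can control the pace. Assumes a sorted, non-empty list.
function buildSummaryPrompt(items: EventItem[]): string {
  const summary = `You have ${items.length} events from ${items[0].time} to ${
    items[items.length - 1].time
  }.`;
  const firstTwo = items
    .slice(0, 2)
    .map((i) => `At ${i.time}, ${i.title}.`)
    .join(" ");
  return `${summary} ${firstTwo} Want to hear more?`;
}
```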
Bixby acts as an assistant during a conversation. This means that Bixby seeks to understand what the user is asking for and then acts on it in a way that is helpful, contextual, and natural.
If the user is asking to do something difficult to reverse, like sending a text or payment, a Confirmation is the way to go. Sometimes to complete something, Bixby will also need to ask for Missing Information or prompt the user to give more details.
Bixby can use Disambiguation to help users choose between similar options. When Bixby hears something that doesn't quite make sense, a General Reprompt can get the conversation back on track. And in cases where users aren't sure how to answer, Bixby can use No Response dialog to continue the conversation.
Bixby prompts the user to make sure they are certain about a sensitive request, such as sending a message or making a purchase.
User: "Send 'hey what's up?' to Tim"
Bixby: "Hey what's up?". Ready to send?
User: "Order a small iced tea from Flower Cafe"
Bixby: That will be $3. Pay with your usual credit card?
User: "Sure"
Bixby: Great. Your order will be ready in 10 minutes.
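One way to model this gating in TypeScript; the PendingAction shape and confirm callback are illustrative assumptions, not Bixby APIs.

```typescript
// Hypothetical shapes; not Bixby APIs.
interface PendingAction {
  sensitive: boolean;         // hard to reverse once executed?
  confirmationPrompt: string; // e.g. "That will be $3. Pay with your usual credit card?"
  execute: () => void;
}

// Sensitive actions (messages, payments) require an explicit "yes" first.
async function handleAction(
  action: PendingAction,
  confirm: (prompt: string) => Promise<boolean>
): Promise<void> {
  if (action.sensitive && !(await confirm(action.confirmationPrompt))) {
    return; // user declined; drop the action
  }
  action.execute();
}
```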
The user makes a request or answers one of Bixby's questions but leaves out information that Bixby requires to move to the next step.
User: "Start timer"
Bixby: How long?
User: "Add an event for Thursday"
Bixby: What time?
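A minimal slot-filling sketch in TypeScript; the field names and prompts are illustrative, not Bixby APIs.

```typescript
// Hypothetical request shape; not a Bixby API.
interface EventRequest {
  day?: string;  // "Thursday"
  time?: string; // "2 PM"
}

const REQUIRED_PROMPTS: [keyof EventRequest, string][] = [
  ["day", "What day?"],
  ["time", "What time?"],
];

// Ask only for the first required value that is still missing.
function nextPrompt(req: EventRequest): string | null {
  for (const [field, prompt] of REQUIRED_PROMPTS) {
    if (!req[field]) return prompt;
  }
  return null; // everything collected; move to the next step
}

// nextPrompt({ day: "Thursday" }) -> "What time?"
```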
If Bixby doesn't understand a user's intent or utterance, Bixby will rephrase the question to continue the conversation.
User: "Play music"
Bixby: Do you want to play Spotify?
User: [something Bixby doesn't successfully parse]
Bixby: I didn't catch that. Where do you want to listen?
Bixby provides options that the user can choose from, especially when the items are named with similar-sounding or identical words, or must be chosen from a set of closely related items such as household lights or coffee shop locations.
User: "Call Tim"
Bixby: Do you want to call Tim Smith?
User: "Call Tim"
Bixby: Do you want to call Tim Smith or Tim Andrews?
User: "Call Tim"
Bixby: Do you want to call Tim Smith, Tim Andrews, Tim Wong, or hear more?
User: "When is my lunch meeting?"
Bixby: Do you mean the lunch meeting on Thursday or Friday?
User goal: Complete an action after ensuring that Bixby is choosing the right option.
Requirements:
Voice: Disambiguation is how Bixby helps users choose the correct option while staying conversational. The designer and developer can disambiguate in many ways, depending on how many options are available.
If there are two options, it is usually best to use a "this or that" approach:
Do you want to [action] [object 1] or [object 2]?
If the objective is clear, Bixby could just present the two options:
[object 1] or [object 2]?
When there are three or more, Bixby presents items as a list, typically grouping them by threes before moving on to the next set:
Do you want to [action] [object 1], [object 2], [object 3] or hear more?
If Bixby has enough information to make a guess (similar to a Suggestion) as to which option the user wants, perhaps based on previous behavior, then it should do so:
Do you want to [action] [prediction 1]?
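The branching above can be captured in a small TypeScript sketch; the function name and parameters are illustrative, not Bixby APIs.

```typescript
// Phrase a disambiguation prompt based on how many options there are.
function buildDisambiguationPrompt(
  action: string,
  options: string[],
  prediction?: string // a confident guess, if one is available
): string {
  if (prediction) {
    return `Do you want to ${action} ${prediction}?`;
  }
  if (options.length === 2) {
    return `Do you want to ${action} ${options[0]} or ${options[1]}?`;
  }
  // Group by threes; offer "hear more" only if options remain.
  const firstThree = options.slice(0, 3).join(", ");
  const more = options.length > 3 ? " or hear more" : "";
  return `Do you want to ${action} ${firstThree}${more}?`;
}

// buildDisambiguationPrompt("call", ["Tim Smith", "Tim Andrews"])
//   -> "Do you want to call Tim Smith or Tim Andrews?"
```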
Situational: If a user says "no" to any of the disambiguation options, return to the original prompt. If Bixby doesn't understand what the user is saying, see No Response, below. It is up to the developer to choose how many times Bixby reprompts before ending the conversation.
Provides options after a General Reprompt if the user says something that Bixby can't match to an intent.
User: "Play music"
Bixby: Do you want to use Spotify, TuneIn Radio, or iHeartRadio?
User: [something Bixby doesn't successfully parse]
Bixby: There's Spotify, TuneIn Radio, or iHeartRadio. Which do you want?
User: [something else Bixby doesn't successfully parse]
Bixby: You can say "first one" for Spotify, "second one" for TuneIn Radio, and so on. What would you like?
User goal: Move along with Bixby in the intended direction or topic.
Requirements:
Voice: There are different ways to help a user achieve their goal when Bixby cannot match utterances to intents.
There might be different responses in scenarios where a user stays silent after a question rather than saying something Bixby doesn't understand. The system default is to have Bixby respond with the same question as it would in General Reprompt, but you can supply a variation for cases when a user is silent using the no-response-message key in the input view.
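As a rough TypeScript sketch of that branching (standing in for the input-view configuration), where the optional variant plays the role of no-response-message:

```typescript
// Two failure modes after a question: silence, or an unmatched utterance.
type TurnResult = { kind: "silence" } | { kind: "no-match" };

function repromptMessage(
  result: TurnResult,
  question: string,
  noResponseVariant?: string // hypothetical stand-in for no-response-message
): string {
  if (result.kind === "silence") {
    return noResponseVariant ?? question; // default: ask the same question again
  }
  return `I didn't catch that. ${question}`;
}
```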
If Bixby recognizes a phrase at a certain confidence level (for instance, it is 75% sure what the user said), then Bixby can use partial matching.
Bixby: Do you want to use Spotify, TuneIn Radio, or iHeartRadio?
User: "[something] radio"
Bixby: Did you mean iHeartRadio?
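A sketch of that confidence check in TypeScript; the Recognition shape and the 0.75 threshold are illustrative assumptions, not Bixby APIs.

```typescript
// Hypothetical recognition result; not a Bixby API.
interface Recognition {
  bestMatch: string;  // e.g. "iHeartRadio"
  confidence: number; // 0 to 1
}

const PARTIAL_MATCH_THRESHOLD = 0.75;

function repromptForMatch(result: Recognition, options: string[]): string {
  if (
    result.confidence >= PARTIAL_MATCH_THRESHOLD &&
    options.includes(result.bestMatch)
  ) {
    return `Did you mean ${result.bestMatch}?`; // confirm the likely match
  }
  return `I found ${options.join(", ")}. Which do you want?`; // fall back to the list
}
```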
If the confidence level is not available, then Bixby can provide follow-up questions similar to general reprompt, but with more targeted questioning. One way is by repeating the initial prompt with options:
Bixby: I found [option 1], [option 2], [option 3]. Which do you want?
Another way to give reprompts is having Bixby say sample utterances to give the user a clearer direction:
Bixby: You can say "first" for [option 1], "second" for [option 2], and so on. What would you like?
Use the currentRepromptCount() Expression Language function with conditionals to provide more explicit guidance on successive reprompts.
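The same escalation can be sketched in TypeScript; the count parameter plays the role of currentRepromptCount(), and the wording mirrors the examples above.

```typescript
// Join options conversationally: "Spotify, TuneIn Radio, or iHeartRadio".
function orList(opts: string[]): string {
  if (opts.length === 1) return opts[0];
  if (opts.length === 2) return `${opts[0]} or ${opts[1]}`;
  return `${opts.slice(0, -1).join(", ")}, or ${opts[opts.length - 1]}`;
}

// count: how many reprompts have already been given on this question.
function buildReprompt(count: number, options: string[]): string | null {
  if (count >= 3) return null; // one general reprompt plus two more, then end
  if (count === 0) {
    return `There's ${orList(options)}. Which do you want?`;
  }
  // Later attempts teach the user an utterance that will definitely match.
  const ordinals = ["first one", "second one"];
  const examples = options
    .slice(0, 2)
    .map((opt, i) => `"${ordinals[i]}" for ${opt}`)
    .join(", ");
  return `You can say ${examples}, and so on. What would you like?`;
}
```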
Screen: Keep the same options available on the screen, along with a way for the user to exit. If Bixby verbally states other ways of phrasing things, ensure that the options are written the same way on the screen to avoid confusing the user.
Situational: Do not prompt more than three times total (one general reprompt and two more after it). Once that limit is reached, it is acceptable to end the experience. It is up to the developer and designer to phrase reprompts in a way that does not blame the user, provides examples of what users can say, and gives users opportunities to correct themselves.