Using SSML

Bixby's dialog can include a subset of tags from Speech Synthesis Markup Language (SSML), a W3C standard for enriching text-to-speech.

To use SSML, you must observe the following rules:

  • SSML is only valid inside the speech key in dialog templates.
  • Speech must start with the <speak> tag and end with the </speak> closing tag. If these tags are not present, the speech will not be recognized as containing SSML.
  • The speech string must be enclosed in double quotation marks, and any double quotation marks inside the string must be escaped with a backslash (\).

For example:

template ("The French word for cat is 'chat.'") {
  speech ("<speak>The French word for cat is <lang xml:lang=\"fr-FR\">chat</lang>.</speak>")
}
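
If speech strings are generated by a script rather than typed by hand, the wrapping and escaping rules above can be applied mechanically. The following is a minimal Python sketch, not part of Bixby's tooling; the function name speech_line is hypothetical, and it assumes that double quotation marks are the only characters that need escaping.

```python
def speech_line(ssml_body: str) -> str:
    """Wrap SSML content in <speak> tags and format it as a speech key.

    Hypothetical helper: assumes only double quotes need escaping.
    """
    ssml = f"<speak>{ssml_body}</speak>"
    # Escape each double quote with a backslash so the SSML can sit
    # inside the quoted speech value.
    escaped = ssml.replace('"', '\\"')
    return f'speech ("{escaped}")'


body = 'The French word for cat is <lang xml:lang="fr-FR">chat</lang>.'
print(speech_line(body))
```

Run as written, this prints the same speech line that appears in the template above.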

Currently, Bixby supports the following SSML tags:

  • <lang>: specify the natural language of the enclosed content
  • <audio>: embed audio clips via URL

Bixby Voices

Bixby supports a nonstandard voice attribute on the <lang> tag. Its value is the name or server profile of the Bixby voice that reads the enclosed text. This attribute takes the place of the standard SSML <voice name=""> tag.

Specifying a voice is optional, but it can improve Bixby's pronunciation of the enclosed text.

speech ("<speak>The French word for cat is <lang xml:lang=\"fr-FR\" voice=\"M01\">chat</lang>.</speak>")

Use a name from the Voice column of the following table to select a voice appropriate to the language and locale. Alternatively, use the server profile from the Profile column. In either case, you must also specify the locale in the xml:lang attribute.

Voice             Locale   Profile
윤정              ko-KR    F01
우호              ko-KR    M01
유리              ko-KR    F04
두리              ko-KR    F05
Stephanie         en-US    F03
John              en-US    M02
Lisa              en-US    F05
Julia             en-US    F04
张喆(Zangzhe)     zh-CN    F02
王聪(Wangcong)    zh-CN    M02
Amy               en-GB    F02
Chris             en-GB    M02
Marie             de-DE    F01
Jan               de-DE    M01
Sandra            es-ES    F01
David             es-ES    M01
Louise            fr-FR    F01
Valentin          fr-FR    M01
Angela            it-IT    F01
Andrea            it-IT    M01
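
Because the same profile code (for example, F01) appears under several locales, it is easy to pair a voice with the wrong locale. The sketch below shows one way to build a <lang> tag and check the voice against the table first. It is illustrative only: the function name lang_tag and the decision to reject a mismatch are assumptions, and the voice names are copied verbatim from the table, including the parenthesized romanizations.

```python
from xml.sax.saxutils import escape, quoteattr

# (voice name, locale, server profile) rows copied from the table above.
VOICES = [
    ("윤정", "ko-KR", "F01"), ("우호", "ko-KR", "M01"),
    ("유리", "ko-KR", "F04"), ("두리", "ko-KR", "F05"),
    ("Stephanie", "en-US", "F03"), ("John", "en-US", "M02"),
    ("Lisa", "en-US", "F05"), ("Julia", "en-US", "F04"),
    ("张喆(Zangzhe)", "zh-CN", "F02"), ("王聪(Wangcong)", "zh-CN", "M02"),
    ("Amy", "en-GB", "F02"), ("Chris", "en-GB", "M02"),
    ("Marie", "de-DE", "F01"), ("Jan", "de-DE", "M01"),
    ("Sandra", "es-ES", "F01"), ("David", "es-ES", "M01"),
    ("Louise", "fr-FR", "F01"), ("Valentin", "fr-FR", "M01"),
    ("Angela", "it-IT", "F01"), ("Andrea", "it-IT", "M01"),
]


def lang_tag(text, locale, voice=None):
    """Build a <lang> tag; voice may be a name or a profile (hypothetical helper)."""
    attrs = f"xml:lang={quoteattr(locale)}"
    if voice is not None:
        # Collect every name and profile listed for this locale.
        allowed = set()
        for name, row_locale, profile in VOICES:
            if row_locale == locale:
                allowed.update((name, profile))
        if voice not in allowed:
            raise ValueError(f"voice {voice!r} is not listed for locale {locale!r}")
        attrs += f" voice={quoteattr(voice)}"
    # escape() protects &, < and > in the spoken text.
    return f"<lang {attrs}>{escape(text)}</lang>"


print(lang_tag("chat", "fr-FR", "M01"))
print(lang_tag("chat", "fr-FR", "Valentin"))
```

The result still needs to be wrapped in <speak> tags and have its quotation marks escaped before it goes into a speech key.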

Audio Clips

The <audio> SSML tag lets you include an audio clip that Bixby plays as part of the dialog. The clip plays in sequence with the surrounding speech: it plays at the point where the <audio> tag occurs, not simultaneously as background audio.

  • The audio clips must match the following specifications:
    • WAV format
    • Mono PCM encoding
    • 16-bit (little endian)
    • 24 kHz sample rate
  • Clips must be specified with an HTTPS URL and hosted on an internet-accessible server with a valid SSL certificate.
  • Clips must be smaller than 5 MB and shorter than 120 seconds.
  • A single response can include at most 5 clips.

If Bixby cannot fetch the audio clip from the specified URL, it does not play the rest of the dialog.
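
Since a clip that cannot be used cuts the dialog short, it can be worth checking a file against the specifications above before uploading it. The following sketch uses Python's standard wave module to do that. The function name check_clip is hypothetical, and "5 MB" is read here as 5,000,000 bytes, which is an assumption (the stricter of the two common interpretations).

```python
import os
import wave

MAX_BYTES = 5 * 1000 * 1000  # assumption: "5 MB" means 5,000,000 bytes
MAX_SECONDS = 120
SAMPLE_RATE = 24000


def check_clip(path):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if os.path.getsize(path) >= MAX_BYTES:
        problems.append("file is not smaller than 5 MB")
    try:
        # wave reads RIFF (little-endian) WAV files containing PCM audio
        # and raises wave.Error for anything else.
        with wave.open(path, "rb") as clip:
            channels = clip.getnchannels()
            sample_width = clip.getsampwidth()  # bytes per sample
            rate = clip.getframerate()
            frames = clip.getnframes()
    except (wave.Error, EOFError) as exc:
        return problems + [f"not a readable PCM WAV file: {exc}"]
    if channels != 1:
        problems.append(f"expected mono, found {channels} channels")
    if sample_width != 2:
        problems.append(f"expected 16-bit samples, found {sample_width * 8}-bit")
    if rate != SAMPLE_RATE:
        problems.append(f"expected {SAMPLE_RATE} Hz, found {rate} Hz")
    if rate and frames / rate >= MAX_SECONDS:
        problems.append("clip is not shorter than 120 seconds")
    return problems
```

Calling check_clip("animal.wav") on a local file (the file name is only an example) returns the list of failed checks. The function does not verify that the clip is reachable over HTTPS.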

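You can convert an existing audio file to the required format using FFmpeg with the following options:

ffmpeg -i input_file -ac 1 -ar 24000 -c:a pcm_s16le destFile.wav

The output options must follow the -i input_file argument so that they apply to the output file: -ac 1 selects mono, -ar 24000 sets the 24 kHz sample rate, and -c:a pcm_s16le selects 16-bit little-endian PCM.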

Example

template ("Now Bixby can play animal sounds! Listen to this one.") {
  speech ("<speak>Now Bixby can play animal sounds! <audio src=\"https://example.com/animal.wav\"></audio></speak>")
}

The <audio> tag only supports the src attribute.
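
The limits on <audio> tags (HTTPS URLs, at most 5 clips per response, and no attribute other than src) can also be checked before a dialog is deployed. The following is a rough sketch using Python's standard XML parser. The function name check_audio_tags is hypothetical, and it expects the SSML string with the backslash escapes already removed.

```python
import xml.etree.ElementTree as ET

MAX_CLIPS = 5


def check_audio_tags(ssml):
    """Return a list of problems found in an unescaped SSML string.

    Raises xml.etree.ElementTree.ParseError if the SSML is not well-formed.
    """
    problems = []
    root = ET.fromstring(ssml)
    if root.tag != "speak":
        problems.append("speech must be wrapped in <speak>...</speak>")
    clips = list(root.iter("audio"))
    if len(clips) > MAX_CLIPS:
        problems.append(f"{len(clips)} clips found; the maximum is {MAX_CLIPS}")
    for clip in clips:
        src = clip.get("src", "")
        if not src.startswith("https://"):
            problems.append(f"clip URL is not HTTPS: {src!r}")
        # Anything other than src is unsupported on <audio>.
        extra = sorted(set(clip.attrib) - {"src"})
        if extra:
            problems.append(f"unsupported <audio> attributes: {extra}")
    return problems


ssml = ('<speak>Now Bixby can play animal sounds! '
        '<audio src="https://example.com/animal.wav"></audio></speak>')
print(check_audio_tags(ssml))
```

This checks only the markup. Whether the server presents a valid certificate and the clip itself meets the format requirements has to be verified separately.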