Speech Synthesis Markup Language (SSML) tags in a voice text response help create an engaging conversation for the end user interacting with the skill. You enter these tags in the relevant Response components while designing a flow, so that the responses convey the intended emotion. In other words, using SSML tags in an output speech response lets you control how Alexa generates the speech. SSML tags are most often used to add pauses and other speech effects.
This guide outlines the function, purpose, and relevance of SSML tags within an interaction flow. To help you get started, it walks you through the purpose SSML tags serve and the most common scenarios for using them, in the prescribed format, within a voice response in VOGO Voice’s interaction builder platform.
Prerequisites
- VOGO Voice account: https://www.vogovoice.com/
- Access rights to the Interaction builder platform
- Working knowledge of Speech Synthesis Markup Language (SSML)
The Core Purpose of SSML Tags
“SSML is a markup language that provides a standard way to mark up text for the generation of synthetic speech. The Alexa Skills Kit supports a subset of the tags defined in the SSML specification. The specific tags supported are listed in Supported SSML Tags.”
When a skill returns a response to a user’s request, Alexa converts the text in the Response component to speech. Alexa automatically handles normal punctuation, such as pausing after a period or speaking a sentence that ends in a question mark as a question.
In some cases, however, the skill designer may want additional control over how Alexa generates the speech from the text in the Response component. For example, the designer may want a longer pause, or a string of digits read back as a standard telephone number. To provide this type of control, the Alexa Skills Kit supports Speech Synthesis Markup Language (SSML).
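As a rough illustration of where the SSML ends up, the sketch below builds the kind of JSON response an Alexa skill returns, with the SSML wrapped in a speak element. The function name build_ssml_response is hypothetical, and the interaction builder may well do this wrapping for you behind the Response component, so treat this as a sketch under that assumption rather than a description of the platform's internals.

```python
import json


def build_ssml_response(ssml_body: str, end_session: bool = False) -> dict:
    """Wrap an SSML fragment in <speak> and place it in an Alexa-style response.

    Hypothetical helper for illustration; not part of any VOGO Voice API.
    """
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {
                # "SSML" tells Alexa to interpret the tags instead of reading them aloud.
                "type": "SSML",
                "ssml": f"<speak>{ssml_body}</speak>",
            },
            "shouldEndSession": end_session,
        },
    }


response = build_ssml_response(
    'Welcome to the demo skill. <break time="3s"/> Are you ready to continue?'
)
print(json.dumps(response, indent=2))
```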
SSML Syntax Use Cases
Below are five select use cases that illustrate some of the most common scenarios in which SSML tags can be used to deliver the intended effect through a tailored speech response.
Sample Use Case 1: Apply a whispering effect to the speech, for example to offer a hint or cue in a subtle yet striking way when the user hesitates to respond.
Say: What is 12 plus 10?
<amazon:effect name="whispered">Hey, I can help you if you don't know the answer. Would you like that?</amazon:effect>
Sample Use Case 2: Give a speech response an excited tone with a fitting SSML tag, here congratulating the user on completing a game level.
Say: <amazon:emotion name="excited" intensity="medium">
Congratulations! You have successfully completed this level.
</amazon:emotion>
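Paired tags such as amazon:effect and amazon:emotion need a matching closing tag, and a missing one is an easy mistake to make when typing a response by hand. The sketch below is one possible local check using Python's standard XML parser. The helper name is_well_formed_ssml and the placeholder namespace URI are assumptions made for this example only, and the check covers well-formedness, not whether Alexa supports a given tag or attribute.

```python
import xml.etree.ElementTree as ET


def is_well_formed_ssml(fragment: str) -> bool:
    """Return True if the SSML fragment parses as well-formed XML.

    Hypothetical helper: it only checks that tags are balanced and quoted
    correctly, not that Alexa recognizes them.
    """
    # The amazon: prefix must be declared for the parser to accept it.
    # The URI here is a placeholder used only for this local check.
    wrapped = f'<speak xmlns:amazon="urn:example:amazon">{fragment}</speak>'
    try:
        ET.fromstring(wrapped)
        return True
    except ET.ParseError:
        return False


closed = '<amazon:effect name="whispered">Would you like a hint?</amazon:effect>'
unclosed = '<amazon:effect name="whispered">Would you like a hint?'

print(is_well_formed_ssml(closed))
print(is_well_formed_ssml(unclosed))
```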
Sample Use Case 3: Add background effects with short audio clips using SSML tags, here the sound of a dog barking integrated into the speech.
Say: This is the audio of a dog barking.
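Audio clips are embedded with the SSML audio tag, which takes the clip location in its src attribute. The sketch below shows the general shape of that tag. The helper name audio_tag and the URL are hypothetical placeholders, not the clip used in this guide; Alexa expects either an HTTPS URL to a hosted file or a soundbank:// reference to its sound library.

```python
from xml.sax.saxutils import quoteattr


def audio_tag(src: str) -> str:
    """Build an SSML <audio> tag for the given clip location (hypothetical helper)."""
    # Alexa accepts HTTPS-hosted clips or soundbank:// sound library references.
    if not src.startswith(("https://", "soundbank://")):
        raise ValueError("src must start with https:// or soundbank://")
    # quoteattr escapes special characters and adds the surrounding quotes.
    return f"<audio src={quoteattr(src)}/>"


# Placeholder URL for illustration only.
clip = audio_tag("https://example.com/dog-bark.mp3")
print(f"This is the audio of a dog barking. {clip}")
```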
Sample Use Case 4: Use an appropriate SSML tag to add a pause (break) in the speech, here a pause of 3 seconds.
Say: Welcome to the demo skill. <break time="3s"/> Are you ready to continue?
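If pauses of different lengths are needed across many responses, a small helper can generate the tag and guard against out-of-range values. The sketch below is a minimal example; the name break_tag is hypothetical, and the 10-second ceiling reflects the maximum pause length given in the Alexa SSML reference.

```python
def break_tag(seconds: int) -> str:
    """Build an SSML <break> tag for a pause of whole seconds (hypothetical helper)."""
    # The Alexa SSML reference caps a single break at 10 seconds.
    if not isinstance(seconds, int) or not 0 <= seconds <= 10:
        raise ValueError("seconds must be a whole number between 0 and 10")
    return f'<break time="{seconds}s"/>'


text = f"Welcome to the demo skill. {break_tag(3)} Are you ready to continue?"
print(text)
```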
Sample Use Case 5: Apply SSML syntax to spell out a word phonemically within a voice text response so that it is pronounced correctly.
Say: Check back again for up to the <phoneme alphabet="ipa" ph='mɪnɪt'>minute</phoneme> news.
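Phonetic strings often contain characters that are awkward to type and easy to misquote. The sketch below builds the phoneme tag programmatically and escapes both the attribute value and the visible text. The helper name phoneme_tag is hypothetical, and the example assumes the IPA alphabet as in the use case above.

```python
from xml.sax.saxutils import escape, quoteattr


def phoneme_tag(text: str, ipa: str) -> str:
    """Build an SSML <phoneme> tag with an IPA pronunciation (hypothetical helper)."""
    # quoteattr quotes and escapes the pronunciation; escape protects the spoken text.
    return f'<phoneme alphabet="ipa" ph={quoteattr(ipa)}>{escape(text)}</phoneme>'


sentence = f"Check back again for up to the {phoneme_tag('minute', 'mɪnɪt')} news."
print(sentence)
```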
For more information, further examples, and an in-depth understanding of SSML syntax, refer to https://developer.amazon.com/en-US/docs/alexa/custom-skills/speech-synthesis-markup-language-ssml-reference.html