VOGO Voice – Knowledge Base

Response Component


The voice, display, and media capabilities (the voice, computer, and camera icons) are key elements built into the interaction builder platform. Together they help you create a memorable voice experience for the users of a custom skill.

This guide starts with the anatomy of the Response component and then details the sections of the voice, visual, and media responses. It also outlines how to use the template engine within a text response of the flow, and the final section gives an overview of how to reference slot input values within an interaction.

Anatomy of the component

Voice Icon – Enables the voice capability, which speaks the intended message to the user.
Computer Icon – Enables the display capability, which communicates visually with the user through display messages, images, etc.
Camera Icon – Enables the multimedia capability for streaming audio and video, with built-in features to play audio, video, Spotify, etc.
Anchor (top left) – Connects the component to the other components and holds each card in its place within the interaction flow.

Details on Voice Response

Say: The first response/voice action spoken to the user.
🖍 Note: Make sure there is a space after each punctuation mark in the voice text; otherwise the punctuation mark may be read aloud along with the statement.
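If you prepare Say text outside the builder, you can check the spacing rule before pasting it in. The sketch below is a hypothetical helper, not part of the VOGO platform: it assumes the Say text is a plain string and inserts one space wherever a punctuation mark is directly followed by a letter. Digits are deliberately not matched, so values such as 3.5 stay intact.

```python
import re

# Hypothetical helper (not a VOGO API): add one space after , . ! ? ; :
# when the mark is immediately followed by a letter, e.g. "Hi,there" -> "Hi, there".
# Digits are not matched, so decimals like "3.5" are left unchanged.
_MISSING_SPACE = re.compile(r"([,.!?;:])(?=[A-Za-z])")


def space_after_punctuation(text: str) -> str:
    """Return the Say text with a space after each punctuation mark that lacks one."""
    return _MISSING_SPACE.sub(r"\1 ", text)


if __name__ == "__main__":
    print(space_after_punctuation("Welcome back.Would you like to continue?Say yes,or no."))
    # prints: Welcome back. Would you like to continue? Say yes, or no.
```

Abbreviations written without spaces (for example "e.g.") would also be split, so review the result rather than applying it blindly.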

Expect Response

Prompt and Reprompt are the two types of expect responses in the voice feature. They guide the user on how to proceed after the first response (Say).

The Prompt and the Reprompt are the first and second replies spoken to the user when the user gives no voice input after the first response (Say). If no Reprompt message is defined, the Prompt message automatically appears in the Reprompt field.
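The Reprompt fallback can be pictured as a small data model. This is a minimal sketch with hypothetical names (`ExpectResponse`, `effective_reprompt`); it is not how the builder stores the fields. Treating a blank Reprompt the same as an undefined one is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ExpectResponse:
    """Hypothetical model of the Expect Response fields (names are illustrative)."""
    prompt: str
    reprompt: Optional[str] = None

    def effective_reprompt(self) -> str:
        # An undefined Reprompt falls back to the Prompt message.
        # Assumption: a blank Reprompt is treated as undefined too.
        if self.reprompt is None or not self.reprompt.strip():
            return self.prompt
        return self.reprompt

    def follow_ups(self) -> List[str]:
        # First and second replies spoken when the user stays silent after Say.
        return [self.prompt, self.effective_reprompt()]
```

For example, `ExpectResponse("Would you like to hear more?").follow_ups()` yields the Prompt twice, because no Reprompt was defined.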

Expectations: To help the user continue the interaction, the voice capability uses predefined expectations such as Yes/No. Besides this preset pair of alternatives (Yes/No), you can also choose custom intents and predefined intents from their respective drop-down lists. What the user says is matched against these preset, custom, or predefined intents so that the user can keep interacting with the skill.

Strikes: When the user does not give an expected response, a First Strike and then a Second Strike are initiated. Strikes are tied to the expectations: each one speaks a message when the user’s reply does not match any expectation. The message spoken for a strike can be the Prompt or a custom message that begins with ‘Sorry, I didn’t get that…’. If the user fails to respond after the first strike, the second strike is initiated immediately to help the user continue the interaction. If the user again fails to respond after the second strike, the interaction is discontinued.
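The expectation and strike flow described above can be sketched as a loop over the user's replies. This is a hypothetical illustration, not VOGO's implementation. Real intent matching is done by the voice platform; here a reply "matches" only when its trimmed, lower-cased text equals one of the expectation phrases, and `None` stands for silence.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical sketch of the strike flow; not VOGO's implementation.
# Assumption: a reply matches when its trimmed, lower-cased text equals
# one of the expectation phrases. None represents no voice input.


def run_expectation(
    replies: Iterable[Optional[str]],
    expectations: Iterable[str],
    first_strike: str,
    second_strike: str,
) -> Tuple[Optional[str], List[str]]:
    """Return (matched expectation or None, strike messages spoken so far).

    Each unmatched reply triggers the next strike. An unmatched reply after
    the second strike discontinues the interaction (result None).
    """
    expected = {e.strip().lower() for e in expectations}
    strikes = [first_strike, second_strike]
    spoken: List[str] = []
    for reply in replies:
        if reply is not None and reply.strip().lower() in expected:
            return reply.strip().lower(), spoken
        if len(spoken) == len(strikes):
            break  # failed again after the second strike: discontinue
        spoken.append(strikes[len(spoken)])
    return None, spoken
```

For instance, with the expectations Yes/No, the replies silence, "maybe", "yes" produce both strike messages and then the match "yes". Passing the Prompt as `first_strike` reproduces the option of reusing the Prompt as the strike message.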

Details on Visual Response (computer icon)

There are eight display layouts (backgrounds) to choose from when building the custom skill. The text, data, and images shown on the device are defined by the values you enter in the fields of the selected layout.

Defining the field values of a display layout (computer icon)

Title: Provide the main heading of the displayed message.
Subtitle: Give a caption to be displayed as a subtitle.
Display message: Enter the text content of the message to be displayed.
Image URL (small): Provide the URL of the small image representing the skill. The recommended size is SMALL (720 x 480 pixels, width x height).
Image URL (large): Provide the URL of the large image selected for the skill. The recommended size is X_LARGE (1920 x 1280 pixels, width x height).
🖍 Note: If you provide only one of the two images (small or large), the system uses that image for both sizes.
Accessibility text: Provide alternative text for the image (small/large).
Background Image URL: Supply the URL of the background image that fills the screen.
Accessibility text: Provide alternative text for the background image.
🖍 Note: Depending on the selected display layout, some of the fields are inactive.
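The image note above amounts to a simple fallback rule. The sketch below illustrates it with a hypothetical helper, `resolve_images`, and records the recommended sizes from this guide; it is not the platform's code. Treating an empty string as "not provided" is an assumption.

```python
from typing import Optional, Tuple

# Recommended sizes stated in this guide (width, height in pixels).
RECOMMENDED_SIZES = {"SMALL": (720, 480), "X_LARGE": (1920, 1280)}


def resolve_images(
    small_url: Optional[str], large_url: Optional[str]
) -> Tuple[Optional[str], Optional[str]]:
    """Hypothetical helper: if only one image URL is given, use it for both sizes.

    None or an empty string counts as "not provided" (an assumption).
    """
    small = small_url or large_url
    large = large_url or small_url
    return small, large
```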

Defining the additional field values for the List display layout 
When you select one of the List display layouts, the callout presents two extra fields: Items and Subtext.

Items: The list or data pertaining to the skill. Also choose the data type in the adjacent field.
🖍 Note: Choose the type from the drop-down list to match the data pulled into the Items field.
Subtext: Give a supporting description of the displayed list or data.

Details on Media Response (camera icon)

Supports: Audio and Video Streaming 

Type: Play Audio

URL: Provide the URL of the audio file.

👍 Note: The audio file must be hosted at an Internet-accessible HTTPS endpoint on port 443. Supported formats for the audio file include AAC/MP4, MP3, PLS, M3U/M3U8, and HLS. Supported bitrates range from 16 kbps to 384 kbps.
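Before entering an audio URL, you can pre-check it against the requirements in the note. This is a sketch using only the standard library; `check_audio_source` is a hypothetical helper, not a VOGO function. Matching formats by file extension is an assumption and only a heuristic: the actual format is determined by the stream content (HLS playlists typically end in .m3u8). The bitrate must be supplied by you, since the URL alone does not reveal it.

```python
from typing import List
from urllib.parse import urlparse

# Typical extensions for the formats listed in this guide (heuristic assumption).
AUDIO_EXTENSIONS = (".aac", ".mp4", ".mp3", ".pls", ".m3u", ".m3u8")
MIN_KBPS, MAX_KBPS = 16, 384


def check_audio_source(url: str, bitrate_kbps: int) -> List[str]:
    """Return a list of problems; an empty list means the source looks acceptable."""
    problems: List[str] = []
    parsed = urlparse(url)
    if parsed.scheme != "https":
        problems.append("URL must use HTTPS")
    # parsed.port is None when no port is written; HTTPS then defaults to 443.
    if parsed.port not in (None, 443):
        problems.append("endpoint must be on port 443")
    if not parsed.path.lower().endswith(AUDIO_EXTENSIONS):
        problems.append("file extension does not match a listed format")
    if not MIN_KBPS <= bitrate_kbps <= MAX_KBPS:
        problems.append(f"bitrate must be {MIN_KBPS}-{MAX_KBPS} kbps")
    return problems
```

A check like this cannot confirm that the endpoint is actually reachable from the Internet; that still has to be tested separately.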

Title: Provide the title of the audio file. This is shown if the device has a display screen.
Subtitle: Provide a subtitle for the audio file. This is shown if the device has a display screen.
Image URL: Supply the URL of the image to show on the screen.
Background Image URL: Give the URL of the watermark background image.
Pause Response: The voice response spoken when the user pauses playback during the interaction with the skill.
Resume Response: The voice response spoken when the user resumes playback within the skill.
TTL After Pause: The time to live (TTL) after a pause can be set from 1 minute up to a maximum of 1 day. It is the time limit for keeping the paused media saved in the database so that it can be replayed to the user if they resume it within that limit.
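The TTL rule can be expressed as a range check plus a time comparison. This is a minimal sketch with hypothetical names (`validate_ttl`, `can_resume`), assuming both bounds are inclusive; it only models the rule described above and is not the platform's caching logic.

```python
from datetime import datetime, timedelta

# Bounds stated in this guide: TTL after pause is 1 minute to 1 day (assumed inclusive).
MIN_TTL = timedelta(minutes=1)
MAX_TTL = timedelta(days=1)


def validate_ttl(ttl: timedelta) -> timedelta:
    """Raise ValueError if the TTL is outside the allowed range."""
    if not MIN_TTL <= ttl <= MAX_TTL:
        raise ValueError(f"TTL after pause must be between {MIN_TTL} and {MAX_TTL}")
    return ttl


def can_resume(paused_at: datetime, resumed_at: datetime, ttl: timedelta) -> bool:
    """Hypothetical check: the saved media is replayed only if resumed within the TTL."""
    return resumed_at - paused_at <= validate_ttl(ttl)
```

With a 1-hour TTL, a user who resumes 30 minutes after pausing picks up where they left off, while a resume after 2 hours falls outside the limit.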

👍 Note: For audio responses, the total response message must be shorter than 90 seconds. If you set up a video response card and the device does not support video (TV), the system checks this automatically and falls back to audio and display. If the device supports neither audio nor video files, the response automatically falls back to a normal voice and display response.
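The fallback order in this note can be summarized as a small decision function. The sketch below is hypothetical: `DeviceCapabilities` and the returned labels are illustrative names, and the platform performs this check itself. It also encodes the 90-second limit for audio responses.

```python
from dataclasses import dataclass

MAX_AUDIO_RESPONSE_SECONDS = 90  # limit stated in this guide


@dataclass
class DeviceCapabilities:
    """Hypothetical capability flags for the requesting device."""
    video: bool
    audio: bool


def choose_response_type(requested: str, device: DeviceCapabilities) -> str:
    """Pick the rendered response following the note's fallback order:
    video, then audio and display, then voice and display."""
    if requested == "video" and device.video:
        return "video"
    if requested in ("video", "audio") and device.audio:
        return "audio and display"
    return "voice and display"


def audio_length_ok(seconds: float) -> bool:
    # The total audio response message must be shorter than 90 seconds.
    return seconds < MAX_AUDIO_RESPONSE_SECONDS
```

The display part is assumed to appear only on devices with a screen, in line with the Title and Subtitle fields above.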

Type: Play Video
URL: Provide an HTTPS URL to video content.

👍 Note: Supported video formats include HLS, MPEG-TS, Smooth Streaming (SS), MP4, and M4A. For the audio that accompanies a video, AAC, Dolby, and Dolby Digital Plus are supported. The WAV file format is not supported.
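A quick pre-check for a video URL can follow the same approach as the audio check. `is_plausible_video_url` is a hypothetical helper, and the extension mapping is an assumption: HLS playlists usually end in .m3u8, MPEG-TS segments in .ts, and Smooth Streaming manifests in ".ism/Manifest". Because formats are really determined by the content, treat a passing result as a hint, not a guarantee.

```python
from urllib.parse import urlparse

# Typical endings for the video formats listed in this guide (heuristic assumption).
VIDEO_ENDINGS = (".m3u8", ".ts", ".mp4", ".m4a", "/manifest")
UNSUPPORTED_ENDINGS = (".wav",)


def is_plausible_video_url(url: str) -> bool:
    """Hypothetical pre-check: HTTPS URL whose path ends in a listed format, and not WAV."""
    parsed = urlparse(url)
    path = parsed.path.lower()
    if parsed.scheme != "https" or path.endswith(UNSUPPORTED_ENDINGS):
        return False
    return path.endswith(VIDEO_ENDINGS)
```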

Title: Title of the video file to be displayed on the display screen.
Subtitle: Subtitle of the video file that will show up on the display screen.
