Non-UTF-8 characters needed for some languages

From DocWiki


Problem Summary When using TTS for some languages, such as French or Spanish, the text needs characters that are outside the ASCII character set, for example ç, é, or ñ. If these characters are not handled correctly, the TTS server generates an error and the prompt is not heard.
Error Message None.
Possible Cause By default, most TTS engines expect to receive UTF-8 encoded text, and plain ASCII text is always valid UTF-8. Characters such as ç, é, or ñ are not ASCII, so their byte representation depends on the encoding in use. If they are sent in a single-byte encoding such as ISO-8859-1 without that encoding being declared, the engine may not be able to interpret them. When such characters are required, the script writer must specify the appropriate encoding explicitly. For most languages, use ISO-8859-1 unless the vendor specifies otherwise. TTS vendors document the ISO encodings that their TTS engines require for each language.
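To make the encoding issue concrete, the following is a minimal Python sketch (not part of any TTS product) that compares how the same Spanish word is represented in ISO-8859-1 and in UTF-8. The helper names `is_ascii_only` and `decodes_as_utf8` are hypothetical and exist only for this illustration.

```python
# Illustration only: how "ñ" is represented in two encodings.
text = "español"

latin1_bytes = text.encode("iso-8859-1")  # ñ becomes the single byte 0xF1
utf8_bytes = text.encode("utf-8")         # ñ becomes the two bytes 0xC3 0xB1


def is_ascii_only(s: str) -> bool:
    """Return True if every character in s is in the 7-bit ASCII range."""
    return all(ord(ch) < 128 for ch in s)


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if the byte string is valid UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    print(is_ascii_only(text))            # the word contains a non-ASCII character
    print(latin1_bytes, utf8_bytes)       # the two byte sequences differ
    print(decodes_as_utf8(latin1_bytes))  # ISO-8859-1 bytes are not valid UTF-8 here
```

The ISO-8859-1 bytes for this word cannot be read as UTF-8, which is why a receiver has to be told which encoding the text uses.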
Recommended Action

To specify the encoding, you must use SSML markup; plain text cannot carry an encoding declaration. The following example shows how the character encoding can be specified:

<?xml version="1.0" encoding="ISO-8859-1"?>
<speak>
Buenas tardes. Le estoy hablando en español.
</speak>
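When the SSML is provided in a file, the bytes on disk need to match the encoding named in the XML declaration. The Python sketch below shows one way to generate such a file outside the Workflow Editor. The function names `build_ssml` and `write_ssml` are hypothetical, and whether a particular TTS engine accepts the result depends on the vendor's documentation.

```python
from pathlib import Path
from xml.sax.saxutils import escape


def build_ssml(text: str, encoding: str = "ISO-8859-1") -> str:
    """Wrap text in a minimal SSML document that declares its encoding."""
    # escape() protects &, < and > so the prompt text cannot break the XML.
    return (
        f'<?xml version="1.0" encoding="{encoding}"?>\n'
        "<speak>\n"
        f"{escape(text)}\n"
        "</speak>\n"
    )


def write_ssml(path: str, text: str, encoding: str = "ISO-8859-1") -> None:
    """Write the SSML using the same encoding that the declaration names."""
    ssml = build_ssml(text, encoding)
    # encode() raises UnicodeEncodeError if a character does not exist
    # in the chosen encoding, instead of silently writing wrong bytes.
    Path(path).write_bytes(ssml.encode(encoding))


if __name__ == "__main__":
    write_ssml("saludo.ssml", "Buenas tardes. Le estoy hablando en español.")
```

Encoding the text with the same name that appears in the declaration keeps the two from drifting apart, which is the mismatch that the declaration is meant to prevent.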

This can be provided in a file or specified in a TTS text expression in a Workflow step. When using the Expression Editor in the Workflow Editor, certain characters must be escaped so that the expression is evaluated properly. When the text is specified directly in a text expression for TTS, enter it as follows:

u"<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>
<speak>
Buenas tardes. Le estoy hablando en español.
</speak>"

Note the use of the u"<text string>" syntax, which allows the backslash (\) escape character to be used within the string to escape the quote (") characters. You do not need to escape non-ASCII characters such as the ñ.
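If many prompts need to be converted, the quoting rule described above can be applied mechanically. The following Python sketch is a hypothetical helper, `to_tts_expression`, that wraps SSML text in the u"..." form and escapes the quote characters. Escaping literal backslashes as well is an assumption based on the backslash being the escape character; it is not stated in this article.

```python
def to_tts_expression(ssml: str) -> str:
    """Return ssml wrapped as a u"..." text expression with quotes escaped."""
    # Assumption: a literal backslash must itself be escaped, because the
    # backslash is the escape character inside u"..." strings.
    escaped = ssml.replace("\\", "\\\\")
    # Escape each double quote so it does not end the string early.
    escaped = escaped.replace('"', '\\"')
    # Non-ASCII characters such as ñ are passed through unchanged.
    return 'u"' + escaped + '"'


if __name__ == "__main__":
    ssml = (
        '<?xml version="1.0" encoding="ISO-8859-1"?>\n'
        "<speak>\n"
        "Buenas tardes. Le estoy hablando en español.\n"
        "</speak>"
    )
    print(to_tts_expression(ssml))
```

The output can then be pasted into the Expression Editor; it is worth checking one converted prompt by hand before relying on the helper.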

Release Release 7.0(1)
Associated CDETS # None.
