ServiceGrid Article - Character Encoding

From DocWiki

Overview

Cisco ServiceGrid uses UTF-8 as its internal character encoding. UTF-8 supports nearly all international character sets. For uploading and downloading data, the encoding can be set as an additional option.

Database

The Cisco ServiceGrid database uses UTF-8 and supports all international character sets. When uploading or downloading data, the encoding can be set as an option.

[Image: Char encoding.jpg]


Encoding options for data exchange with Cisco ServiceGrid

To ensure that the characters are encoded correctly when data is uploaded to the Cisco ServiceGrid database or downloaded into files, it is important to choose the encoding format that is supported by the data source or data target.
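
The effect of a mismatched encoding can be illustrated with a minimal Python sketch. The sample value and variable names are hypothetical and are not taken from Cisco ServiceGrid; the sketch only shows what happens when bytes written in one encoding are read back in another.

```python
# Hypothetical sample value; any text with non-ASCII characters behaves similarly.
original = "Zürich"

# Bytes as a tool that exports UTF-8 would write them.
utf8_bytes = original.encode("utf-8")

# Reading with the matching encoding restores the text.
matched = utf8_bytes.decode("utf-8")

# Reading with a mismatched single-byte encoding raises no error,
# but the two bytes of the umlaut turn into two wrong characters.
mismatched = utf8_bytes.decode("iso-8859-1")

print(matched)
print(mismatched)
```

Because the mismatched read does not fail, this kind of error usually goes unnoticed until someone looks at the data, which is why the encoding option should be checked before the transfer.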

Choose the encoding as per the following rules:

  • UTF-8: Use when the tools or formats that generate the upload data, or that read the downloaded data, support UTF-8 encoding. UTF-8 supports nearly all international character sets.
  • ISO 8859-1: For Western European languages, including English, German, Italian, Spanish, Portuguese, and French.
  • ISO 8859-2: For Central and Eastern European languages, including Polish, Czech, Slovak, and Serbian (Latin script).
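
The rules above can be sketched as a small Python helper. The function names `encodable_in` and `suggest_encoding` and the flag `target_supports_utf8` are hypothetical, introduced only for illustration; they are not part of any Cisco ServiceGrid interface.

```python
from typing import Optional


def encodable_in(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def suggest_encoding(text: str, target_supports_utf8: bool) -> Optional[str]:
    """Suggest an encoding for sample data, following the rules listed above."""
    # Rule 1: use UTF-8 whenever the data source or target supports it.
    if target_supports_utf8:
        return "utf-8"
    # Rules 2 and 3: otherwise use the first single-byte encoding that fits.
    for candidate in ("iso-8859-1", "iso-8859-2"):
        if encodable_in(text, candidate):
            return candidate
    # None of the three options can represent the text.
    return None


print(suggest_encoding("Grüße", target_supports_utf8=False))
print(suggest_encoding("Łódź", target_supports_utf8=False))
```

Testing a representative sample of the real data in this way is only a heuristic. A sample that fits ISO 8859-1 does not guarantee that later records will.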


ISO 8859-1

ISO 8859-1, more formally cited as ISO/IEC 8859-1 or less formally as Latin-1, is part 1 of ISO/IEC 8859, a standard character encoding of the Latin alphabet. It was originally developed by the ISO and was later maintained jointly by the ISO and the IEC. The standard, when supplemented with additional character assignments (in the C1 range between hexadecimal codes 0x80 and 0x9F), is the basis of two widely used character maps known as ISO-8859-1 (note the extra hyphen) and Windows-1252.

ISO 8859-1 encodes what it refers to as "Latin alphabet no. 1," consisting of 191 characters from the Latin script. This character encoding is used throughout the Americas, Western Europe, Oceania, and much of Africa. It is also commonly used in most standard romanizations of East Asian languages.

Each character is encoded as a single eight-bit code value. These code values can be used in almost any data interchange system to communicate in many European languages:

Afrikaans, Galician, Occitan, Albanian, German, Portuguese, Basque, Icelandic, Rhaeto-Romanic, Breton, Irish (new orthography), Scottish Gaelic, Catalan, Italian, Spanish, Danish, Latin, Swahili, English, Luxembourgish, Swedish, Faroese, Norwegian, Walloon.
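
The single-byte property can be checked with a short Python sketch. The sample characters are arbitrary choices for illustration. For ISO 8859-1, the byte value of each character equals its Unicode code point, and characters outside the set, such as the euro sign, cannot be encoded at all.

```python
# Arbitrary sample characters that are all part of ISO 8859-1.
samples = ["A", "é", "ñ", "ÿ"]

# Map each character to its encoded bytes.
latin1_bytes = {ch: ch.encode("iso-8859-1") for ch in samples}

for ch, encoded in latin1_bytes.items():
    # Each character occupies exactly one byte, equal to its code point.
    print(ch, len(encoded), hex(encoded[0]), hex(ord(ch)))

# The euro sign is not part of ISO 8859-1, so encoding it fails.
try:
    "€".encode("iso-8859-1")
    euro_encodable = True
except UnicodeEncodeError:
    euro_encodable = False

print(euro_encodable)
```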

More about ISO 8859-1 can be found in Wikipedia:

http://en.wikipedia.org/wiki/ISO-8859-1

ISO 8859-2

ISO 8859-2, more formally cited as ISO/IEC 8859-2 or less formally as Latin-2, is part 2 of ISO/IEC 8859, a standard character encoding defined by ISO. ISO 8859-2 encodes what it refers to as Latin alphabet no. 2, consisting of 191 characters from the Latin script, each encoded as a single 8-bit code value.

ISO_8859-2:1987 is the IANA charset name for this standard when it is used together with the control codes from ISO/IEC 6429 for the C0 (0x00-0x1F) and C1 (0x80-0x9F) parts. It is more commonly known by its preferred MIME name, ISO-8859-2 (note the extra hyphen). Escape sequences (from ISO/IEC 6429 or ISO/IEC 2022) are not to be interpreted. This character set also has the aliases ISO_8859-2, latin2, l2, and csISOLatin2.
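
Many tools accept several of these names for the same encoding. As an illustration, the sketch below uses Python's standard `codecs` registry, which is an assumption about the reader's tooling and not something Cisco ServiceGrid prescribes, to check that a few of the names resolve to the same codec.

```python
import codecs

# A few of the names under which the encoding is commonly known.
names = ["ISO-8859-2", "latin2", "l2", "csISOLatin2"]

# Resolve each name to the canonical codec name used by Python.
resolved = {name: codecs.lookup(name).name for name in names}

# All names should map to one and the same codec.
same_codec = len(set(resolved.values())) == 1

print(resolved)
print(same_codec)
```

Other tools may recognize a different set of aliases, so the preferred MIME name is usually the safest choice when exchanging data.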

This encoding shares a lot of assignments with Windows-1250 but is not a strict subset of it (unlike the case with Windows-1252 and ISO 8859-1).
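
The difference from Windows-1250 can be seen with a single character. The sketch below uses Python's built-in codecs and an arbitrary sample letter. The same letter is stored as a different byte in each encoding, so data written in one and read in the other is silently altered.

```python
# Arbitrary sample: LATIN CAPITAL LETTER S WITH CARON (U+0160).
sample = "Š"

# The same character is assigned a different byte in each encoding.
iso_bytes = sample.encode("iso-8859-2")
win_bytes = sample.encode("cp1250")  # Python's name for Windows-1250

# Reading ISO 8859-2 bytes as Windows-1250 yields a different character.
misread = iso_bytes.decode("cp1250")

print(iso_bytes.hex(), win_bytes.hex(), misread)
```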

These code values can be used in almost any data interchange system to communicate in the following European languages:

Bosnian, Polish, Slovak, Croatian, Romanian, Slovenian, Czech, Serbian, Upper Sorbian, Hungarian, Serbo-Croatian, Lower Sorbian.

More about ISO 8859-2 can be found in Wikipedia:

http://en.wikipedia.org/wiki/ISO-8859-2

UTF-8

Unicode is an industry standard allowing computers to consistently represent and manipulate text expressed in any of the world's writing systems. Developed in tandem with the Universal Character Set standard and published in book form as The Unicode Standard, Unicode consists of a repertoire of more than 100,000 characters, a set of code charts for visual reference, an encoding methodology and a set of standard character encodings, an enumeration of character properties such as upper and lower case, a set of reference data computer files, and rules for normalization, decomposition, collation, and rendering.

UTF-8 (8-bit UCS/Unicode Transformation Format) is a variable-length character encoding for Unicode. It is able to represent any character in the Unicode standard, yet the initial encoding of byte codes and character assignments for UTF-8 is consistent with ASCII. For these reasons, it is steadily becoming the preferred encoding for e-mail, web pages, and other places where characters are stored or streamed.

For encoding one character, UTF-8 uses one to four bytes (strictly, octets).

One byte is needed to encode the 128 US-ASCII characters (Unicode range U+0000 to U+007F). Two bytes are needed for Latin letters with diacritics and for characters from the Greek, Cyrillic, Armenian, Hebrew, Arabic, Syriac, and Thaana alphabets (Unicode range U+0080 to U+07FF). Three bytes are needed for the rest of the Basic Multilingual Plane (which contains virtually all characters in common use). Four bytes are needed for characters in the other planes of Unicode, which are rarely used in practice.
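
These byte lengths can be checked with a short Python sketch. The sample characters are arbitrary picks, one from each range, and the helper name `utf8_length` is hypothetical.

```python
def utf8_length(ch: str) -> int:
    """Return the number of bytes UTF-8 needs for a single character."""
    return len(ch.encode("utf-8"))


# One arbitrary sample character from each of the four ranges.
samples = {
    "A": "US-ASCII (U+0041)",
    "é": "Latin letter with diacritic (U+00E9)",
    "€": "rest of the Basic Multilingual Plane (U+20AC)",
    "😀": "outside the Basic Multilingual Plane (U+1F600)",
}

for ch, description in samples.items():
    print(ch, utf8_length(ch), description)

# ASCII text is byte-for-byte identical in UTF-8 and US-ASCII.
ascii_compatible = "ServiceGrid".encode("utf-8") == "ServiceGrid".encode("ascii")
print(ascii_compatible)
```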

For more information about UTF-8, see:

http://en.wikipedia.org/wiki/UTF-8


For a complete list of Cisco ServiceGrid Articles, go to the List of Articles page.
