Unicode and Character Conversion

Symbian OS was designed from the beginning to support worldwide locales. Psion had been unable to address considerable interest from the Far East for its SIBO models, because characters were represented within SIBO by every conceivable means - C strings, arrays, integers, assembler symbols - all of which could be used to contain other data. It was not commercially feasible to modify SIBO to support 16-bit characters. Symbian OS would not repeat the same mistake, so, although Symbian OS up to and including v5 used narrow 8-bit characters, its architects planned for 16-bit characters from the beginning.

With 16-bit characters, character values in the range 0 to 65,535 are allowed. These are mapped onto the Unicode code page, whose 256 lowest characters match those of the ISO Latin 1 code page, which in turn is (with a few exceptions) the same as Windows Latin 1. Unicode is big enough to provide code points for most of the characters used by practically all the world's living languages - and many of its dead languages too. In addition to providing code points for character glyphs in languages such as Chinese, Japanese, Korean, Thai, Hebrew, and Arabic, Unicode provides standards to support these languages' typesetting conventions, including cues for left-to-right and right-to-left changeovers.

The strategy for the technical foundation of Symbian OS was to define classes to represent text and to use them everywhere text was required, and nowhere else. By simply changing a compiler flag, it would then be possible to rebuild Symbian OS with 16-bit or 8-bit characters. Compiled code would be incompatible and so would application data files, but there would be virtually no source code changes.

So symbols were assigned for all the critical text classes such as TText and TDesC. In the narrow build, these were equated to TText8 and TDesC8. In the wide build, they were equated to TText16 and TDesC16. The setting of the _UNICODE macro would be used to control which was which.

Other classes and macros included in this scheme are as follows:

Symbol   Narrow    Wide       Meaning
TText    TText8    TText16    Character
_L       _L8       _L16       Literal (old-style)
_LIT     _LIT8     _LIT16     Literal (new-style)
TDesC    TDesC8    TDesC16    Nonmodifiable descriptor
TDes     TDes8     TDes16     Modifiable descriptor
TPtrC    TPtrC8    TPtrC16    Nonmodifiable pointer descriptor
TPtr     TPtr8     TPtr16     Modifiable pointer descriptor
TBufC    TBufC8    TBufC16    Nonmodifiable buffer descriptor
TBuf     TBuf8     TBuf16     Modifiable buffer descriptor
HBufC    HBufC8    HBufC16    Heap descriptor
TLex     TLex8     TLex16     Lexer

You can find these definitions throughout e32def.h, e32std.h, e32des8.h, and e32des16.h. The narrow classes are present even on wide builds.
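The build-flag scheme described above can be sketched in standard C++. This is only an illustration: the character typedefs and the TDesCStandIn template are simplified stand-ins invented for this example, not the definitions found in the Symbian OS headers, and BytesNeeded is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Simplified stand-ins for the width-specific character types.
typedef std::uint8_t  TText8;   // narrow character: one byte
typedef std::uint16_t TText16;  // wide character: two bytes

// Hypothetical stand-in for a nonmodifiable descriptor: a pointer plus a
// length. The real descriptor classes are considerably richer.
template <typename TChar>
struct TDesCStandIn {
    const TChar* iPtr;
    std::size_t  iLength;  // length in characters, not bytes
    std::size_t Size() const { return iLength * sizeof(TChar); }  // bytes
};
typedef TDesCStandIn<TText8>  TDesC8;
typedef TDesCStandIn<TText16> TDesC16;

// A single build flag decides what the neutral names mean.
#if defined(_UNICODE)
typedef TText16 TText;
typedef TDesC16 TDesC;
#else
typedef TText8  TText;
typedef TDesC8  TDesC;
#endif

// Code written against the neutral names compiles unchanged in either build;
// only the result (and the binary layout) differs.
std::size_t BytesNeeded(const TDesC& aText) {
    assert(aText.iPtr != nullptr || aText.iLength == 0);
    return aText.Size();
}
```

Because both width-specific typedefs are always defined, code that needs a particular width can still name TDesC8 or TDesC16 directly, which mirrors the observation that the narrow classes remain available on wide builds.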

Unless you are writing for the Psion PDAs that run Symbian OS v5, you don't have to worry about narrow builds any more. But the idioms that were created to help source compatibility between narrow and wide builds are still worth following, if only to make your code more readable:

■ Code all your general-purpose text objects to use the neutral variants in the first column of the table above, for example, TText or TDesC.

■ Where you are using descriptors to refer not to text but to binary data, code specifically 8-bit classes, for example, TDesC8 - and use TInt8 or TUint8 rather than TText8, for an individual byte.

There are a few tricky cases. Some types of data are awkward, especially data in communications protocols that looks like a string, but is actually binary data that should always use 8-bit characters. Examples include the HELO used to log on to SMTP or the AT commands used for modems.
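As a rough illustration of the protocol case, the following standard C++ sketch builds an SMTP HELO line as 8-bit data even though the host name arrives as 16-bit text. BuildHelo is a hypothetical helper written for this example, and std::u16string and std::vector stand in for the wide and 8-bit descriptor classes; rejecting anything outside 7-bit ASCII is an assumption of the sketch, not a rule taken from the text.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Build the bytes of "HELO <host>\r\n" from a host name held as 16-bit text.
// Returns false (and leaves aOut empty) if the host name contains any
// character outside 7-bit ASCII, since it cannot be sent as a single byte.
bool BuildHelo(const std::u16string& aHost, std::vector<std::uint8_t>& aOut) {
    static const char kPrefix[] = "HELO ";
    const std::size_t prefixLength = sizeof(kPrefix) - 1;  // drop the NUL
    aOut.assign(kPrefix, kPrefix + prefixLength);
    for (char16_t unit : aHost) {
        if (unit > 0x7F) {
            aOut.clear();
            return false;
        }
        aOut.push_back(static_cast<std::uint8_t>(unit));
    }
    aOut.push_back('\r');
    aOut.push_back('\n');
    // The wire format is one byte per character, whatever the build width.
    assert(aOut.size() == prefixLength + aHost.size() + 2);
    return true;
}
```

The output type is a byte container rather than a text type, so the command keeps the same wire format whether general-purpose text in the application is narrow or wide.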

Another big issue with communications is that much of the text sent between a Symbian OS phone and the outside world, for example, in e-mails, will be encoded in non-Unicode character sets. For this situation, a dedicated library (charconv.h, charconv.lib) is provided to enable conversion, in both directions, between Unicode and other character sets. It also provides functionality for converting, again in both directions, between ordinary 2-byte Unicode and its two transformation formats UTF-7 and UTF-8 (these are ASCII-compatible encodings of Unicode that use sequences of multiple bytes to encode non-ASCII characters). The library can be extended with plug-ins to support whatever character sets are appropriate to the device.
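To make the relationship between 2-byte Unicode and UTF-8 concrete, here is a minimal standard C++ sketch of the conversion in both directions. It does not use or reproduce the charconv library's API; Ucs2ToUtf8 and Utf8ToUcs2 are hypothetical names, and the sketch assumes that every character fits in a single 16-bit unit, replacing surrogates and malformed input with U+FFFD and not rejecting overlong encodings.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Encode 2-byte Unicode as UTF-8: ASCII stays one byte, everything else
// becomes a two- or three-byte sequence.
std::string Ucs2ToUtf8(const std::u16string& aText) {
    std::string out;
    for (char16_t unit : aText) {
        std::uint32_t c = unit;
        if (c >= 0xD800 && c <= 0xDFFF) c = 0xFFFD;  // surrogates unsupported
        assert(c <= 0xFFFF);
        if (c < 0x80) {
            out.push_back(static_cast<char>(c));
        } else if (c < 0x800) {
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        } else {
            out.push_back(static_cast<char>(0xE0 | (c >> 12)));
            out.push_back(static_cast<char>(0x80 | ((c >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}

// Decode UTF-8 sequences of up to three bytes back to 2-byte Unicode.
std::u16string Utf8ToUcs2(const std::string& aBytes) {
    std::u16string out;
    std::size_t i = 0;
    while (i < aBytes.size()) {
        const std::uint8_t lead = static_cast<std::uint8_t>(aBytes[i]);
        std::size_t extra = 0;
        std::uint32_t c = 0;
        if (lead < 0x80)                { extra = 0; c = lead; }
        else if ((lead & 0xE0) == 0xC0) { extra = 1; c = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { extra = 2; c = lead & 0x0F; }
        else { out.push_back(0xFFFD); ++i; continue; }  // stray or 4-byte lead
        if (i + extra >= aBytes.size()) { out.push_back(0xFFFD); break; }
        bool ok = true;
        for (std::size_t k = 1; k <= extra; ++k) {
            const std::uint8_t b = static_cast<std::uint8_t>(aBytes[i + k]);
            if ((b & 0xC0) != 0x80) { ok = false; break; }
            c = (c << 6) | (b & 0x3F);
        }
        if (!ok) { out.push_back(0xFFFD); ++i; continue; }
        out.push_back(static_cast<char16_t>(c));
        i += extra + 1;
    }
    return out;
}
```

The ASCII compatibility mentioned above shows up in the first branch of each function: values below 0x80 pass through as single bytes, so plain ASCII text is byte-for-byte identical in UTF-8.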
