Unicode routines

Allegro can manipulate and display text using any character values from 0 right up to 2^32-1 (although the current implementation of the grabber can only create fonts using characters up to 2^16-1). You can choose between a number of different text encoding formats, which controls how strings are stored and how Allegro interprets strings that you pass to it. This setting affects all aspects of the system: whenever you see a function that returns a char * type, or that takes a char * as an argument, that text will be in whatever format you have told Allegro to use.

By default, Allegro uses UTF-8 encoded text (U_UTF8). This is a variable-width format, where characters can occupy anywhere from one to four bytes. The nice thing about it is that characters ranging from 0-127 are encoded directly as themselves, so UTF-8 is upwardly compatible with 7-bit ASCII ("Hello, World!" means the same thing regardless of whether you interpret it as ASCII or UTF-8 data). Any character values above 128, such as accented vowels, the UK currency symbol, and Arabic or Chinese characters, will be encoded as a sequence of two or more bytes, each in the range 128-255. This means you will never get what looks like a 7-bit ASCII character as part of the encoding of a different character value, which makes it very easy to manipulate UTF-8 strings.

There are a few editing programs that understand UTF-8 format text files. Alternatively, you can write your strings in plain ASCII or 16-bit Unicode formats, and then use the Allegro textconv program to convert them into UTF-8.

If you prefer to use some other text format, you can set Allegro to work with normal 8-bit ASCII (U_ASCII), or 16-bit Unicode (U_UNICODE) instead, or you can provide some handler functions to make it support whatever other text encoding you like (for example it would be easy to add support for 32 bit UCS-4 characters, or the Chinese GB-code format).

There is some limited support for alternative 8-bit codepages, via the U_ASCII_CP mode. This is very slow, so you shouldn't use it for serious work, but it can be handy as an easy way to convert text between different codepages. By default the U_ASCII_CP mode is set up to reduce text to a clean 7-bit ASCII format, trying to replace any accented vowels with their simpler equivalents (this is used by the allegro_message() function when it needs to print an error report onto a text mode DOS screen). If you want to work with other codepages, you can do this by passing a character mapping table to the set_ucodepage() function.

Note that you can use the Unicode routines before you call install_allegro() or allegro_init(). If you want to work in a text mode other than UTF-8, it is best to set it with set_uformat() just before you call these.