Unicode routines

Allegro can manipulate and display text using any character values from 0 right up to 2^32-1 (although the current implementation of the grabber can only create fonts using characters up to 2^16-1). You can choose between a number of different text encoding formats; the choice controls how strings are stored and how Allegro interprets strings that you pass to it. This setting affects all aspects of the system: whenever you see a function that returns a char * type, or that takes a char * as an argument, that text will be in whatever format you have told Allegro to use.

By default, Allegro uses UTF-8 encoded text (U_UTF8). This is a variable-width format, where characters can occupy anywhere from one to four bytes. The nice thing about it is that characters ranging from 0-127 are encoded directly as themselves, so UTF-8 is upwardly compatible with 7-bit ASCII ("Hello, World!" means the same thing regardless of whether you interpret it as ASCII or UTF-8 data). Any character value of 128 or above, such as accented vowels, the UK currency symbol, and Arabic or Chinese characters, will be encoded as a sequence of two or more bytes, each in the range 128-255. This means you will never get what looks like a 7-bit ASCII character as part of the encoding of a different character value, which makes it very easy to manipulate UTF-8 strings.
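To make the byte layout described above concrete, here is a minimal sketch of a UTF-8 encoder in plain C. It is not Allegro's own code: the function name utf8_encode_sketch is hypothetical, and the sketch assumes the standard one to four byte forms only (values up to 0x10FFFF), without checking for surrogate values.

```c
#include <stddef.h>

/* Hypothetical helper, not part of Allegro. Encodes the character value c
 * as UTF-8 into out (which must have room for 4 bytes) and returns the
 * number of bytes written, or 0 if c is above 0x10FFFF.
 */
size_t utf8_encode_sketch(unsigned long c, unsigned char *out)
{
   if (c < 0x80) {
      /* values 0-127 are stored directly as themselves */
      out[0] = (unsigned char)c;
      return 1;
   }

   if (c < 0x800) {
      /* two bytes: 110xxxxx 10xxxxxx */
      out[0] = (unsigned char)(0xC0 | (c >> 6));
      out[1] = (unsigned char)(0x80 | (c & 0x3F));
      return 2;
   }

   if (c < 0x10000) {
      /* three bytes: 1110xxxx 10xxxxxx 10xxxxxx */
      out[0] = (unsigned char)(0xE0 | (c >> 12));
      out[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
      out[2] = (unsigned char)(0x80 | (c & 0x3F));
      return 3;
   }

   if (c < 0x110000) {
      /* four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
      out[0] = (unsigned char)(0xF0 | (c >> 18));
      out[1] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
      out[2] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
      out[3] = (unsigned char)(0x80 | (c & 0x3F));
      return 4;
   }

   return 0;
}
```

With this sketch, the letter 'A' (0x41) is written as the single byte 0x41, while the UK currency symbol (0xA3) is written as the two bytes 0xC2 0xA3. Every byte of a multi-byte sequence has its top bit set, which is why such a sequence can never be mistaken for a 7-bit ASCII character.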

There are a few editing programs that understand UTF-8 format text files. Alternatively, you can write your strings in plain ASCII or 16-bit Unicode formats, and then use the Allegro textconv program to convert them into UTF-8.

If you prefer to use some other text format, you can set Allegro to work with normal 8-bit ASCII (U_ASCII) or 16-bit Unicode (U_UNICODE) instead, or you can provide some handler functions to make it support whatever other text encoding you like (for example, it would be easy to add support for 32-bit UCS-4 characters, or the Chinese GB-code format).

There is some limited support for alternative 8-bit codepages, via the U_ASCII_CP mode. This is very slow, so you shouldn't use it for serious work, but it can be handy as an easy way to convert text between different codepages. By default the U_ASCII_CP mode is set up to reduce text to a clean 7-bit ASCII format, trying to replace any accented vowels with their simpler equivalents (this is used by the allegro_message() function when it needs to print an error report onto a text mode DOS screen). If you want to work with other codepages, you can do this by passing a character mapping table to the set_ucodepage() function.
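The idea of reducing text to clean 7-bit ASCII can be illustrated with a small lookup table. The following is only a sketch of the technique, not Allegro's actual table or code: the names reduce_table and reduce_to_ascii_sketch are hypothetical, the table covers just a handful of lowercase Latin-1 accented vowels, and the use of '?' for unmapped characters is an assumption of this example.

```c
#include <stddef.h>

/* Hypothetical reduction table, not Allegro's own data: each entry pairs
 * a Latin-1 character value with a plain ASCII replacement.
 */
static const struct {
   unsigned int from;
   char to;
} reduce_table[] = {
   { 0xE0, 'a' }, { 0xE1, 'a' },   /* a with grave, a with acute */
   { 0xE8, 'e' }, { 0xE9, 'e' },   /* e with grave, e with acute */
   { 0xEC, 'i' }, { 0xED, 'i' },   /* i with grave, i with acute */
   { 0xF2, 'o' }, { 0xF3, 'o' },   /* o with grave, o with acute */
   { 0xF9, 'u' }, { 0xFA, 'u' }    /* u with grave, u with acute */
};

/* Returns a 7-bit ASCII stand-in for the character value c. */
char reduce_to_ascii_sketch(unsigned int c)
{
   size_t i;

   /* plain ASCII passes through unchanged */
   if (c < 0x80)
      return (char)c;

   /* linear search of the table: simple, but slow for large tables */
   for (i = 0; i < sizeof(reduce_table) / sizeof(reduce_table[0]); i++) {
      if (reduce_table[i].from == c)
         return reduce_table[i].to;
   }

   /* assumption of this sketch: anything unmapped becomes '?' */
   return '?';
}
```

A per-character search like this also hints at why a table-driven codepage mode tends to be slower than a fixed encoding.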

Note that you can use the Unicode routines before you call install_allegro() or allegro_init(). If you want to work in a text mode other than UTF-8, it is best to set it with set_uformat() just before you call these.

set_uformat - Set the global current text encoding format.
get_uformat - Finds out what text encoding format is currently selected.
register_uformat - Installs handler functions for a new text encoding format.
set_ucodepage - Sets 8-bit to Unicode conversion tables.
need_uconvert - Tells if a string requires encoding conversion.
uconvert_size - Number of bytes needed to store a string after conversion.
do_uconvert - Converts a string to another encoding format.
uconvert - High level string encoding conversion wrapper.
uconvert_ascii - Converts string from ASCII into the current format.
uconvert_toascii - Converts strings from the current format into ASCII.
empty_string - Universal string NULL terminator.
ugetc - Low level helper function for reading Unicode text data.
ugetx - Low level helper function for reading Unicode text data.
ugetxc - Low level helper function for reading Unicode text data.
usetc - Low level helper function for writing Unicode text data.
uwidth - Low level helper function for testing Unicode text data.
ucwidth - Low level helper function for testing Unicode text data.
uisok - Low level helper function for testing Unicode text data.
uoffset - Finds the offset of a character in a string.
ugetat - Finds out the value of a character in a string.
usetat - Replaces a character in a string.
uinsert - Inserts a character in a string.
uremove - Removes a character from a string.
ustrsize - Size of the string in bytes without null terminator.
ustrsizez - Size of the string in bytes including null terminator.
uwidth_max - Number of bytes a character can occupy.
utolower - Converts a letter to lower case.
utoupper - Converts a letter to upper case.
uisspace - Tells if a character is whitespace.
uisdigit - Tells if a character is a digit.
ustrdup - Duplicates a string.
_ustrdup - Duplicates a string with a custom memory allocator.
ustrcpy - Copies a string into another one.
ustrzcpy - Copies a string into another one, specifying size.
ustrcat - Concatenates a string to another one.
ustrzcat - Concatenates a string to another one, specifying size.
ustrlen - Tells the number of characters in a string.
ustrcmp - Compares two strings.
ustrncpy - Copies a string into another one, specifying size.
ustrzncpy - Copies a string into another one, specifying size.
ustrncat - Concatenates a string to another one, specifying size.
ustrzncat - Concatenates a string to another one, specifying size.
ustrncmp - Compares up to n letters of two strings.
ustricmp - Compares two strings ignoring case.
ustrnicmp - Compares up to n letters of two strings ignoring case.
ustrlwr - Replaces all letters with lower case.
ustrupr - Replaces all letters with upper case.
ustrchr - Finds the first occurrence of a character in a string.
ustrrchr - Finds the last occurrence of a character in a string.
ustrstr - Finds the first occurrence of a string in another one.
ustrpbrk - Finds the first character that matches any in a set.
ustrtok - Retrieves tokens from a string.
ustrtok_r - Reentrant function to retrieve tokens from a string.
uatof - Converts a string into a double.
ustrtol - Converts a string into an integer.
ustrtod - Converts a string into a floating point number.
ustrerror - Returns a string describing errno.
usprintf - Writes formatted data into a buffer.
uszprintf - Writes formatted data into a buffer, specifying size.
uvsprintf - Writes formatted data into a buffer, using variable arguments.
uvszprintf - Writes formatted data into a buffer, using size and variable arguments.
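
Several entries in the list above distinguish between the number of characters in a string (ustrlen) and its size in bytes (ustrsize). In a variable-width format such as UTF-8 these two values differ as soon as the text contains anything outside 7-bit ASCII. The sketch below shows the difference in plain C, assuming well-formed UTF-8 input; the function names are hypothetical and this is not how Allegro itself implements these routines.

```c
#include <stddef.h>

/* Hypothetical helper: size of a UTF-8 string in bytes, not counting the
 * null terminator.
 */
size_t utf8_size_sketch(const char *s)
{
   size_t n = 0;

   while (s[n])
      n++;

   return n;
}

/* Hypothetical helper: number of characters in a well-formed UTF-8
 * string. Continuation bytes have the bit pattern 10xxxxxx, so counting
 * every byte that is not a continuation byte counts the characters.
 */
size_t utf8_length_sketch(const char *s)
{
   size_t n = 0;

   for (; *s; s++) {
      if (((unsigned char)*s & 0xC0) != 0x80)
         n++;
   }

   return n;
}
```

For the string "h\xC3\xA9llo" (the word "hello" with an acute accent on the e), the size is six bytes but the length is five characters, so buffer sizes should be computed from byte sizes rather than character counts.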
