Unicode routines
Allegro can manipulate and display text using any character values from 0
right up to 2^32-1 (although the current implementation of the grabber can
only create fonts using characters up to 2^16-1). You can choose between a
number of different text encoding formats; the one you select controls how
strings are stored and how Allegro interprets strings that you pass to it. This setting
affects all aspects of the system: whenever you see a function that returns
a char * type, or that takes a char * as an argument, that text will be in
whatever format you have told Allegro to use.
By default, Allegro uses UTF-8 encoded text (U_UTF8). This is a
variable-width format, where characters can occupy anywhere from one to four
bytes. The nice thing about it is that characters ranging from 0-127 are
encoded directly as themselves, so UTF-8 is upwardly compatible with 7-bit
ASCII ("Hello, World!" means the same thing regardless of whether you
interpret it as ASCII or UTF-8 data). Any character values of 128 or above, such
as accented vowels, the UK currency symbol, and Arabic or Chinese
characters, will be encoded as a sequence of two or more bytes, each in the
range 128-255. This means you will never get what looks like a 7-bit ASCII
character as part of the encoding of a different character value, which
makes it very easy to manipulate UTF-8 strings.
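To make the byte layout concrete, here is a standalone C sketch of a UTF-8 encoder. It does not use Allegro at all (in a real program you would let usetc() do this in U_UTF8 mode); the name utf8_encode is hypothetical, and the sketch follows the modern UTF-8 rules (RFC 3629), which stop at 0x10FFFF, so it may differ from Allegro's own handler at the edges.

```c
/* Hypothetical standalone encoder, not part of Allegro.
 * Encodes code point c into buf and returns the number of bytes written
 * (1 to 4), or 0 if c is above 0x10FFFF. Surrogate values
 * (0xD800-0xDFFF) are not rejected, to keep the sketch short. */
static int utf8_encode(unsigned int c, unsigned char *buf)
{
   if (c < 0x80) {                /* 0-127: stored as itself, like ASCII */
      buf[0] = (unsigned char)c;
      return 1;
   }
   if (c < 0x800) {               /* 110xxxxx 10xxxxxx */
      buf[0] = (unsigned char)(0xC0 | (c >> 6));
      buf[1] = (unsigned char)(0x80 | (c & 0x3F));
      return 2;
   }
   if (c < 0x10000) {             /* 1110xxxx 10xxxxxx 10xxxxxx */
      buf[0] = (unsigned char)(0xE0 | (c >> 12));
      buf[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
      buf[2] = (unsigned char)(0x80 | (c & 0x3F));
      return 3;
   }
   if (c <= 0x10FFFF) {           /* 11110xxx followed by three 10xxxxxx */
      buf[0] = (unsigned char)(0xF0 | (c >> 18));
      buf[1] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
      buf[2] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
      buf[3] = (unsigned char)(0x80 | (c & 0x3F));
      return 4;
   }
   return 0;                      /* outside the range this sketch supports */
}
```

For instance, the UK pound sign (U+00A3) becomes the two bytes 0xC2 0xA3. Every byte of a multi-byte sequence has its top bit set, which is why none of them can be mistaken for a 7-bit ASCII character.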
There are a few editing programs that understand UTF-8 format text files.
Alternatively, you can write your strings in plain ASCII or 16-bit Unicode
formats, and then use the Allegro textconv program to convert them into
UTF-8.
If you prefer to use some other text format, you can set Allegro to work
with normal 8-bit ASCII (U_ASCII), or 16-bit Unicode (U_UNICODE) instead, or
you can provide some handler functions to make it support whatever other
text encoding you like (for example, it would be easy to add support for
32-bit UCS-4 characters, or the Chinese GB-code format).
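As an illustration of how small such a handler can be, here is a sketch of the per-character callbacks a 32-bit UCS-4 format might need. The ucs4_* names, the native byte order, and the assumption that int is 32 bits wide are all choices made for this example. The signatures are intended to match the callbacks taken by register_uformat(), but check them against your Allegro headers before relying on them; the registration call is shown only in a comment, because it needs Allegro and the format ID used there is made up.

```c
#include <string.h>

/* Hypothetical UCS-4 handlers (not part of Allegro). Assumes
 * sizeof(int) == 4 and stores characters in native byte order.
 * memcpy is used so that unaligned strings are handled safely. */

/* Return the character at s without advancing. */
static int ucs4_getc(const char *s)
{
   int c;
   memcpy(&c, s, sizeof c);
   return c;
}

/* Return the character at *s and advance *s past it. */
static int ucs4_getx(char **s)
{
   int c = ucs4_getc(*s);
   *s += 4;
   return c;
}

/* Store c at s and return the number of bytes written. */
static int ucs4_setc(char *s, int c)
{
   memcpy(s, &c, sizeof c);
   return 4;
}

/* Every character occupies exactly four bytes. */
static int ucs4_width(const char *s) { (void)s; return 4; }
static int ucs4_cwidth(int c)        { (void)c; return 4; }

/* Every value fits, so every character is acceptable. */
static int ucs4_isok(int c)          { (void)c; return 1; }

/* Registration sketch (requires Allegro; the 'UCS4' ID is hypothetical):
 *
 *    register_uformat(AL_ID('U','C','S','4'),
 *                     ucs4_getc, ucs4_getx, ucs4_setc,
 *                     ucs4_width, ucs4_cwidth, ucs4_isok, 4);
 *    set_uformat(AL_ID('U','C','S','4'));
 */
```

Because every character has the same width, the getx and width callbacks are trivial here; a variable-width format such as GB-code would need real decoding logic in them.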
There is some limited support for alternative 8-bit codepages, via the
U_ASCII_CP mode. This is very slow, so you shouldn't use it for serious
work, but it can be handy as an easy way to convert text between different
codepages. By default the U_ASCII_CP mode is set up to reduce text to a
clean 7-bit ASCII format, trying to replace any accented vowels with their
simpler equivalents (this is used by the allegro_message() function when it
needs to print an error report onto a text mode DOS screen). If you want to
work with other codepages, you can do this by passing a character mapping
table to the set_ucodepage() function.
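The reduction to a narrower codepage can be pictured with the standalone sketch below. It is not Allegro's implementation: reduce_to_codepage() is a hypothetical helper modeled on the two arrays that set_ucodepage() accepts, namely a 256-entry table giving the Unicode value of each codepage byte, and an optional zero-terminated list of (Unicode value, codepage byte) pairs used for many-to-one reductions such as accented vowels to plain ones. Returning '?' for unmappable characters is an assumption made for the example.

```c
/* Hypothetical helper, not part of Allegro: map Unicode value c to a
 * byte in the codepage described by 'table' (256 entries) and 'extras'
 * (pairs of Unicode value and codepage byte, ended by a 0 value, or a
 * null pointer if there are none). */
static int reduce_to_codepage(int c, const unsigned short *table,
                              const unsigned short *extras)
{
   int i;

   /* Exact match: the codepage contains this character directly. */
   for (i = 0; i < 256; i++)
      if (table[i] == c)
         return i;

   /* Many-to-one fallback, e.g. an accented vowel to a plain vowel. */
   if (extras)
      for (i = 0; extras[i]; i += 2)
         if (extras[i] == c)
            return extras[i + 1];

   return '?';   /* unmappable: placeholder choice for this sketch */
}
```

With a table that maps bytes 0-127 to themselves and an extras list containing the pair (0x00E9, 'e'), an e-acute is reduced to a plain 'e', much like the clean 7-bit ASCII output that allegro_message() relies on.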
Note that you can use the Unicode routines before you call install_allegro()
or allegro_init(). If you want to work in a text encoding format other than
UTF-8, it is best to set it with set_uformat() just before you call these.
- set_uformat - Set the global current text encoding format.
- get_uformat - Finds out what text encoding format is currently selected.
- register_uformat - Installs handler functions for a new text encoding format.
- set_ucodepage - Sets 8-bit to Unicode conversion tables.
- need_uconvert - Tells if a string requires encoding conversion.
- uconvert_size - Number of bytes needed to store a string after conversion.
- do_uconvert - Converts a string to another encoding format.
- uconvert - High level string encoding conversion wrapper.
- uconvert_ascii - Converts string from ASCII into the current format.
- uconvert_toascii - Converts strings from the current format into ASCII.
- empty_string - Universal string NULL terminator.
- ugetc - Low level helper function for reading Unicode text data.
- ugetx - Low level helper function for reading Unicode text data.
- ugetxc - Low level helper function for reading Unicode text data.
- usetc - Low level helper function for writing Unicode text data.
- uwidth - Low level helper function for testing Unicode text data.
- ucwidth - Low level helper function for testing Unicode text data.
- uisok - Low level helper function for testing Unicode text data.
- uoffset - Finds the offset of a character in a string.
- ugetat - Finds out the value of a character in a string.
- usetat - Replaces a character in a string.
- uinsert - Inserts a character in a string.
- uremove - Removes a character from a string.
- ustrsize - Size of the string in bytes without null terminator.
- ustrsizez - Size of the string in bytes including null terminator.
- uwidth_max - Maximum number of bytes a character can occupy.
- utolower - Converts a letter to lower case.
- utoupper - Converts a letter to upper case.
- uisspace - Tells if a character is whitespace.
- uisdigit - Tells if a character is a digit.
- ustrdup - Duplicates a string.
- _ustrdup - Duplicates a string with a custom memory allocator.
- ustrcpy - Copies a string into another one.
- ustrzcpy - Copies a string into another one, specifying size.
- ustrcat - Concatenates a string to another one.
- ustrzcat - Concatenates a string to another one, specifying size.
- ustrlen - Tells the number of characters in a string.
- ustrcmp - Compares two strings.
- ustrncpy - Copies a string into another one, specifying size.
- ustrzncpy - Copies a string into another one, specifying size.
- ustrncat - Concatenates a string to another one, specifying size.
- ustrzncat - Concatenates a string to another one, specifying size.
- ustrncmp - Compares up to n letters of two strings.
- ustricmp - Compares two strings ignoring case.
- ustrnicmp - Compares up to n letters of two strings ignoring case.
- ustrlwr - Replaces all letters with lower case.
- ustrupr - Replaces all letters with upper case.
- ustrchr - Finds the first occurrence of a character in a string.
- ustrrchr - Finds the last occurrence of a character in a string.
- ustrstr - Finds the first occurrence of a string in another one.
- ustrpbrk - Finds the first character that matches any in a set.
- ustrtok - Retrieves tokens from a string.
- ustrtok_r - Reentrant function to retrieve tokens from a string.
- uatof - Converts a string into a double.
- ustrtol - Converts a string into an integer.
- ustrtod - Converts a string into a floating point number.
- ustrerror - Returns a string describing errno.
- usprintf - Writes formatted data into a buffer.
- uszprintf - Writes formatted data into a buffer, specifying size.
- uvsprintf - Writes formatted data into a buffer, using variable arguments.
- uvszprintf - Writes formatted data into a buffer, using size and variable arguments.
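Several of these routines distinguish between characters and bytes: ustrlen() counts characters, ustrsize() counts bytes, and uoffset() turns a character index into a byte offset (Allegro's documentation describes negative indices as counting back from the end of the string). The standalone sketch below reproduces that distinction for UTF-8 only; utf8_length() and utf8_offset() are hypothetical names, not Allegro functions, and they assume well-formed input.

```c
#include <string.h>   /* strlen() gives the byte size, like ustrsize() */

/* Count characters, not bytes: continuation bytes have the form
 * 10xxxxxx and never start a new character. */
static int utf8_length(const char *s)
{
   int n = 0;
   for (; *s; s++)
      if (((unsigned char)*s & 0xC0) != 0x80)
         n++;
   return n;
}

/* Byte offset of the character at 'index'. Negative indices count back
 * from the end; an index before the start gives 0, and one past the
 * end gives the offset of the terminating null. */
static int utf8_offset(const char *s, int index)
{
   int pos = 0;
   if (index < 0)
      index += utf8_length(s);
   while (index-- > 0 && s[pos]) {
      pos++;                                        /* skip lead byte */
      while (((unsigned char)s[pos] & 0xC0) == 0x80)
         pos++;                                     /* skip continuation bytes */
   }
   return pos;
}
```

With the string "x\xC2\xA3y" (an x, a pound sign, and a y), strlen() reports 4 bytes, utf8_length() reports 3 characters, and both utf8_offset(s, 2) and utf8_offset(s, -1) return 3, the byte position of the y.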