Of course, and that is possible since the standard is Unicode.
That has nothing to do with Unicode. And there are many standards, which is why text programming is such a mess.
To me, Unicode is a standard that is also used by the Windows OS. That is the whole point of Unicode.
For example: if I'm in Russia and I'm writing a text, my software (along with the OS) is interpreting this text using, say, UTF-8, so the program reads Unicode code points in real time, stores them in RAM as UTF-8, and shows me that text correctly.
The same as it's doing right now while I'm writing this text.
When I save the text, it's automatically saved as UTF-8 (editors don't ask you how to encode the text when you save the file; since you were using UTF-8, the software assumes it must be saved as UTF-8), so if someone else wants to read that text, they must set their text editor to read UTF-8.
But if someone wants to transform UTF-8 to UTF-16, for example, they must have Unicode, otherwise there is no way to do the job. For that reason I can't understand you when you say: "Besides, even if everybody was using Unicode, that doesn't tell you what the encoding is." If you weren't using Unicode then you couldn't be using UTF-8 or any other encoding... right?
The whole point of Unicode is to define a single mapping standard that supports basically every language known to humans. That doesn't mean that everybody necessarily uses it. Many people still don't use it yet and most people are still oblivious that it even exists.
Unicode is just a standard mapping of characters to numbers. These numbers are referred to as code points. The entire mapping is referred to as a character set. For example:
Character   Decimal   Hexadecimal   Code Point
A           65        41            U+0041
B           66        42            U+0042
C           67        43            U+0043
Effectively, that is all Unicode defines. It doesn't specify how to store those numbers in memory. Sure, these particular numbers are small (65-67), so they fit in a single byte, but Unicode includes hundreds of thousands of characters*, and some code points are far too large to fit in one or even two bytes. If we simply assume that every character takes 4 bytes, then some text (like English text) wastes 4x as much memory as it needs, because most English characters fit in a single byte and 3 of the 4 bytes would always be zero.
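To make the 4x figure concrete, here's a rough byte-counting sketch (plain C, nothing to do with Allegro; the string is just an example):

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void)
{
    const char *english = "Hello, world";   /* every character fits in 1 byte */
    size_t chars = strlen(english);

    /* With a fixed 4-bytes-per-character layout (like UTF-32), the same text
       takes 4x the space, and for these characters 3 of the 4 bytes are zero. */
    printf("characters:             %zu\n", chars);
    printf("1 byte per character:   %zu bytes\n", chars);
    printf("4 bytes per character:  %zu bytes\n", chars * sizeof(uint32_t));
    return 0;
}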
To standardize how Unicode code points are represented in memory, character encodings were created. A Unicode encoding describes how to store Unicode code points as bytes. UTF-8 is one such encoding: the leading bits of the first byte indicate how many bytes a particular character occupies, and to recover the code point you strip those marker bits from each byte and string the remaining bits together.
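Here's a minimal sketch of that bit extraction (the function name is mine, and a real decoder would also reject overlong forms, surrogates, and truncated input):

#include <stdint.h>
#include <stdio.h>

/* Decode one UTF-8 sequence starting at s into a code point.
   Returns the number of bytes consumed (0 on an invalid leading byte).
   Error handling is deliberately minimal. */
static int utf8_decode(const unsigned char *s, uint32_t *cp)
{
    if (s[0] < 0x80) {                      /* 0xxxxxxx: 1 byte (ASCII) */
        *cp = s[0];
        return 1;
    } else if ((s[0] & 0xE0) == 0xC0) {     /* 110xxxxx 10xxxxxx: 2 bytes */
        *cp = ((uint32_t)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        return 2;
    } else if ((s[0] & 0xF0) == 0xE0) {     /* 1110xxxx 10xxxxxx 10xxxxxx: 3 bytes */
        *cp = ((uint32_t)(s[0] & 0x0F) << 12) |
              ((uint32_t)(s[1] & 0x3F) << 6) | (s[2] & 0x3F);
        return 3;
    } else if ((s[0] & 0xF8) == 0xF0) {     /* 11110xxx plus 3 continuation bytes */
        *cp = ((uint32_t)(s[0] & 0x07) << 18) |
              ((uint32_t)(s[1] & 0x3F) << 12) |
              ((uint32_t)(s[2] & 0x3F) << 6) | (s[3] & 0x3F);
        return 4;
    }
    return 0;                               /* not a valid leading byte */
}

int main(void)
{
    /* "A" (U+0041) followed by the Cyrillic "Я" (U+042F, two bytes in UTF-8). */
    const unsigned char text[] = { 0x41, 0xD0, 0xAF, 0x00 };
    for (int i = 0; text[i] != 0; ) {
        uint32_t cp;
        int n = utf8_decode(&text[i], &cp);
        if (n == 0) break;
        printf("U+%04X (%d byte%s)\n", (unsigned)cp, n, n == 1 ? "" : "s");
        i += n;
    }
    return 0;
}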
UTF-32 is another possible encoding. There is no real magic here (AFAIK): every character is simply 4 bytes wide. This works well for text whose characters need several bytes anyway (I imagine Chinese would qualify), because there's no complex counting of variable-width characters.
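The upside of the fixed width is that finding the Nth character is plain array indexing; a tiny sketch (the character values are just examples):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* In UTF-32 every code point occupies one 32-bit unit, so the Nth
       character is just array element N; no variable-width scanning needed. */
    const uint32_t text[] = { 0x0041, 0x042F, 0x4E2D, 0 };  /* 'A', 'Я', '中' */
    printf("3rd character: U+%04X\n", (unsigned)text[2]);
    return 0;
}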
So even if everybody is using Unicode, there are binary incompatibilities between the character encodings used to represent Unicode. If I send my text to you as UTF-16 and you try to interpret it as UTF-8 you're going to get the wrong text.
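For a concrete example of that kind of mix-up: the Cyrillic letter Я (U+042F) is the two bytes D0 AF in UTF-8, but a reader that assumes UTF-16LE glues those same two bytes into a single 16-bit code unit and sees a completely different character (a Hangul syllable):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* Cyrillic "Я" (U+042F) encoded as UTF-8 is the two bytes D0 AF. */
    const unsigned char bytes[] = { 0xD0, 0xAF };

    /* Misreading the same two bytes as a UTF-16LE code unit gives 0xAFD0,
       which is nothing like the character that was actually sent. */
    uint16_t misread = (uint16_t)(bytes[0] | (bytes[1] << 8));
    printf("UTF-8 bytes:          %02X %02X  -> U+042F\n", bytes[0], bytes[1]);
    printf("misread as UTF-16LE:  U+%04X\n", (unsigned)misread);
    return 0;
}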
When you save text it's entirely up to your text editor to determine which character encoding to write the file in. It could write it as UTF-7, UTF-8, UTF-16, UTF-32, or some completely non-Unicode encoding if it so chooses. Most editors allow you to change the encoding used.
Of course the Allegro API takes care of it. That is its job.
There are some things that Allegro can't possibly take care of. For example, if you open a text file, Allegro has no way to know what encoding that text file is written in. It can try to guess, but it can't be sure. That's why most applications that handle text allow you to change the encoding at run-time. If you see a bunch of missing characters or the characters appear to be gibberish (e.g., an English document appears as Chinese characters) then it could be that the editor is using the wrong encoding.
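To illustrate the kind of guess an editor can make, here's a hypothetical sketch (not an Allegro API) that looks for a byte order mark (BOM) at the start of a file's contents; most files carry no BOM at all, which is exactly why the guess can never be certain:

#include <stdio.h>
#include <string.h>

/* Guess an encoding from a BOM, if one is present. Files without a BOM
   (the common case) come back as "unknown". UTF-32 BOMs are ignored here
   to keep the sketch short. */
static const char *guess_encoding(const unsigned char *buf, size_t len)
{
    if (len >= 3 && memcmp(buf, "\xEF\xBB\xBF", 3) == 0) return "UTF-8";
    if (len >= 2 && memcmp(buf, "\xFF\xFE", 2) == 0)     return "UTF-16LE";
    if (len >= 2 && memcmp(buf, "\xFE\xFF", 2) == 0)     return "UTF-16BE";
    return "unknown";
}

int main(void)
{
    const unsigned char sample[] = { 0xEF, 0xBB, 0xBF, 'H', 'i' };
    printf("guess: %s\n", guess_encoding(sample, sizeof(sample)));
    return 0;
}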
Web browsers are probably the most common application that we use daily that encounter all sorts of character encodings.
The compiler doesn't need to know about the encoding format of the source code file.
Yes, it does. For the compiler to understand the text, it needs to know how the text is encoded. Older compilers, like C and C++ compilers, will generally assume ASCII (or something ASCII compatible, like UTF-8). IIRC, the Visual C# compiler expects UTF-16, but I'm not too sure (maybe it can guess).
I think when you save your file as UTF-8, what you're doing is telling the IDE to save the ""-surrounded strings as UTF-8 strings...
No. You're telling the editor to save the entire file as UTF-8. It just so happens that UTF-8 is ASCII compatible so C and C++ compilers shouldn't even notice.
Instead, if I set my IDE to use UTF-8 strings, my IDE would be writing the two bytes that are needed to represent that letter in UTF-8, which can be used with the Allegro API...
Yes, but remember that it's the Allegro API that understands those characters, not the compiler. The compiler sees those as completely different characters. However, because they're in a string literal (i.e., "") it doesn't care. It just adds those bytes to the executable program as they are and expects the executable program to know what they mean.
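You can see that pass-through behaviour directly. Assuming the source file really is saved as UTF-8 and the compiler copies the literal's bytes unchanged (GCC and Clang do by default; MSVC may need the /utf-8 switch), the single character é shows up as two bytes in the program:

#include <stdio.h>
#include <string.h>

int main(void)
{
    /* If this file is saved as UTF-8, the literal is stored in the executable
       as the two bytes C3 A9; the compiler never interprets them as a character. */
    const char *s = "é";

    printf("bytes in the literal: %zu\n", strlen(s));   /* 2, not 1 */
    for (size_t i = 0; i < strlen(s); i++)
        printf("%02X ", (unsigned char)s[i]);
    printf("\n");
    return 0;
}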
If I'm using UTF-8 then I would be saving UTF-8. If I have to load strings, they'd better be encoded in UTF-8, otherwise it wouldn't work.
If you are in control of all inputs and outputs then you can choose whatever you want. You can require your files to be UTF-8 and require network communication to be UTF-8, etc. This will work fine. It's the times when you don't know that you have to care. That probably won't happen with a game, but it might happen if you write other programs that need to deal with variable inputs and outputs. For example, unless Allegro has an API to determine the encoding used by the terminal (or Command Prompt), you would need to use some other method to determine what the expected output is for you to be able to write non-English characters reliably to stdout or stderr.
So the question now becomes, if Windows NT supports both ASCII and Unicode, why do Unicode programs run faster. To answer this you have to understand Windows NT itself. All operating systems have what is called a "kernel." The kernel is the heart of the OS; it is the lowest level, the innards or guts of the OS. In Windows NT the kernel is written in Unicode, and therefore only understands Unicode. When an ANSI program runs on Windows NT, the OS must convert the strings from ASCII to Unicode. This takes both time to convert everything, and memory to store both copies (ASCII and Unicode). Whereas a Unicode program has straight access to the kernel and is faster. Now on modern computers running at gigahertz speeds and having hundreds of megs of RAM this speed difference is minimal, but it does exist. The simple fact remains, the same program running as either ANSI or Unicode, the Unicode version will always run faster.
Notice that the author says "Unicode". They don't say "UTF-8" or "UTF-16". Either the kernel supports all Unicode encodings (possible, but unlikely) or it supports one (I would guess UTF-16). That means that if you use any other encoding (e.g., UTF-8) then it would be just as slow. Besides, the kernel only needs to understand text if you're giving it text to process. If you're reading from a device or file the kernel will give it to you exactly as it got it (it's binary data that could be text or could be an image; it doesn't know).
If You Just Want To Support UTF-8 In Your Game
Then you control all of the inputs and outputs by using Allegro's APIs. That is fine and should work flawlessly (writing to the screen, at least; I don't know about the terminal). Just know that Allegro won't necessarily know what to do if you need to process files or streams from uncontrolled sources, for example, an existing network protocol.
* I'm guessing because I don't know the exact number.