Allegro.cc - Online Community

char and string problem
AMCerasoli
Member #11,955
May 2010

So I was declaring and initializing a char like this:

char example=164;

so.. when I do:

std::cout<<example<<std::endl; it shows me a "ñ" letter...

Because this: char ene="ñ"; gives me an error
and this: char ene='ñ'; gives me a: ▓

but since al_draw_text only accepts a pointer, I have to do:

char *example;

and if I do:

example=164;

I get an error... (Obviously)

If I do:

example="Ñ";

the console shows me this: Ñ, two symbols...

Then if I do:

    char example[1];

    example[0]='ñ';

I get again this symbol: ▓

And if I do:

char example[]="ñ";

I get this: ├ ... another symbol...

if I do: char *example="ñ";
I get: ├▒

So... I don't know what else to do...

I just want one char, with just one letter...

The strange thing is that if I do...

#include <iostream>
#include <string>

#include "allegro5/allegro.h"
#include "allegro5/allegro_font.h"
#include "allegro5/allegro_ttf.h"

int main()
{
    ALLEGRO_DISPLAY *display;
    ALLEGRO_FONT *font;

    al_init();
    al_init_font_addon();
    al_init_ttf_addon();

    display = al_create_display(300, 300);
    font = al_load_ttf_font("consola.ttf", 40, 0);

    std::string example;
    const char *temp;

    example = "ñÑñÑ";
    temp = example.data();

    std::cout << example << std::endl;

    while (true) {
        al_draw_text(font, al_map_rgb(255, 255, 255), 0, 0, 0, temp);
        al_flip_display();
    }

    return 0;
}

[Attached image: ene.jpg]

al_draw_text shows the "ñ" letter correctly... So what can I do?

I don't get what is happening. :o

Conclusion:

What is happening is that the console is using Extended ASCII (a single-byte code page), and std::string was not made with UTF-8 in mind; that was the whole problem. It is perfectly possible to store UTF-8 strings in a std::string, but doing so has some implications. E.g. you can't count the number of characters with length() or size(), because those return the number of bytes. Instead one has to iterate through the string, parse all the UTF-8 multibyte sequences and count each sequence as one character.

For that reason it is better to use the Allegro UTF-8 API...

If I do char temp[]={(char)0xC3, (char)0xB1, '\0'}; Allegro shows me the "ñ" letter perfectly, since those are the bytes (written in hex) that correspond to "ñ" in UTF-8.

The console still doesn't know what is happening, so it shows me "├▒", but that is fine, I won't use the console...
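
The counting approach described in the conclusion can be sketched roughly as below. This is only an illustration, not part of Allegro or the standard library: utf8_length is a hypothetical helper name, and it assumes the std::string already holds valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Counts UTF-8 characters (code points) by counting every byte that is
// NOT a continuation byte. Continuation bytes have the form 10xxxxxx.
// Assumption: the input is valid UTF-8; no validation is performed.
std::size_t utf8_length(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char byte : s) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For the string "\xC3\xB1" (the letter "ñ"), size() reports 2 while utf8_length reports 1, which is exactly the mismatch discussed in this thread.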

Mark Oates
Member #1,146
March 2001

Did you try
al_draw_text(font,al_map_rgb(255, 255, 255),0 ,0 , 0, example.c_str());

gnolam
Member #2,030
March 2002

Your console and your code are using different character encodings.

--
Move to the Democratic People's Republic of Vivendi Universal (formerly known as Sweden) - officially democracy- and privacy-free since 2008-06-18!

AMCerasoli
Member #11,955
May 2010

Did you try
al_draw_text(font,al_map_rgb(255, 255, 255),0 ,0 , 0, example.c_str());

Thanks man, I didn't know that...

gnolam said:

Your console and your code are using different character encodings.

Well yes... Code::Blocks is using UTF-8... and the console I don't know...

The problem is that when I do example.erase(example.length()-1); to delete one character from the string, it works fine with normal characters. But the "ñ" character adds two chars to the string (e.g. ├▒), and erase removes only one of them, so one stray byte is left behind in the string. Then, when I type the next character, al_draw_text breaks.
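
One possible way to delete a whole character instead of a single byte is to step back over the UTF-8 continuation bytes first. The sketch below is only an illustration: utf8_pop_back is a hypothetical helper, and it assumes the string contains valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Removes the last UTF-8 encoded character from s (hypothetical helper).
// Assumption: s holds valid UTF-8.
void utf8_pop_back(std::string& s) {
    if (s.empty()) {
        return;
    }
    std::size_t pos = s.size() - 1;
    // Walk back over continuation bytes (10xxxxxx) until the lead byte.
    while (pos > 0 && (static_cast<unsigned char>(s[pos]) & 0xC0) == 0x80) {
        --pos;
    }
    // Erase from the lead byte to the end of the string.
    s.erase(pos);
}
```

Called on a string ending in "ñ", this removes both bytes, so no stray byte is left for al_draw_text to trip over.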

bamccaig
Member #7,536
July 2006

Long story short, your code editor and output device (terminal or command prompt window) are using different character sets or encodings so the data comes out wrong. You need to sync them up so that they're using the same formats. To do it in a proper way that should work on all machine configurations, you'll need to use an API that knows how to convert to and from the internal and external formats. This often requires determining (or guessing) the external format, as well as choosing your internal format.

It can be very complicated and I still haven't figured it out yet so I can't really give you any code advice. However, if you get everything speaking the same language then you shouldn't notice that your program is character-encoding-stupid, at least not on your own machine(s). If you actually intend to release the program for others to use then it's best to do it right though.

Generally, Unicode is the preferred character set these days because it can basically cover every human language (and then some) that you'd ever want to support. There are some exceptions still I think, but it's still a growing standard. There are many ways that Unicode can be represented in a computer however. Among the most popular are UTF-8 and UTF-16. UTF-8 is good where space or bandwidth are more limited than processing power, at least for text that is mostly ASCII, and UTF-16 (or UTF-32) are more preferred where space is less of an issue than processing power is. Some scripts (most East Asian ones, for example) need 3 bytes per character in UTF-8 but only 2 in UTF-16, so using UTF-8 for them won't save you anything (and will actually cost you space).

In any case, UTF-8 is a great choice for a character encoding because it's ASCII compatible, which means most editors and compilers will already understand the important characters. The characters above the ASCII range often don't matter to something like a compiler, so it doesn't hurt to use UTF-8 anyway. It might be easier to use UTF-16 or UTF-32 in memory to make processing strings easier, but only if you actually need to process characters. Keep in mind that UTF-16 is variable-width too: characters outside the Basic Multilingual Plane take two 16-bit units.

That's basically where your current program is going wrong. You're trying to process characters, assuming a character is a single byte (or char), when in reality a UTF-8 character can be anywhere from 1 to 4 bytes. You don't know until you check the string, byte by byte. You basically need to use an API that is aware of the character encoding.
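
To illustrate what "checking the string byte by byte" means for UTF-8: the first byte of each sequence tells you how long the sequence is (1 to 4 bytes under RFC 3629). The function below is a rough sketch with a made-up name, utf8_sequence_length; it only inspects the lead byte and does not reject overlong or otherwise invalid sequences.

```cpp
// Returns how many bytes the UTF-8 sequence starting with `lead` occupies,
// or 0 if `lead` cannot start a sequence (e.g. it is a continuation byte).
// Assumption: only the bit pattern of the lead byte is checked; this is
// not a full validator.
int utf8_sequence_length(unsigned char lead) {
    if (lead < 0x80) return 1;            // 0xxxxxxx: plain ASCII
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;                             // 10xxxxxx or invalid lead byte
}
```

For "ñ" the lead byte is 0xC3, so the function reports a 2-byte sequence; the second byte, 0xB1, is a continuation byte and yields 0.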

Long story short, if you can, get everything speaking UTF-8: your text editor and your terminal or command prompt window. If you have the time, look for tutorials on how to write character set and character encoding aware programs (and report back to us with the results :P). :-/ I think that parts of Allegro 5 are Unicode aware so maybe you can just use the Allegro 5 API to do it right, but I haven't learned those APIs yet so I'm no help with that either. :P

torhu
Member #2,727
September 2002

I'm assuming you're saving your source as utf-8. That's probably the best way to do it when using Allegro.

Try setting your console to utf-8, the command is 'chcp 65001'.

And if you have trouble with 'extended' characters in strings, you can use hexadecimal escape codes in strings, like:
const char *s = "A\x42" "C"; // "ABC"; the literal is split so the hex escape does not absorb the 'C'

bamccaig
Member #7,536
July 2006

torhu said:

Try setting your console to utf-8, the command is 'chcp 65001'.

For some reason, this completely foobars my console in my XP VM. Vim won't run when I do that, and it even insta-crashed the entire VM once. :o

AMCerasoli
Member #11,955
May 2010

I could change the code page with CHCP, but just temporarily... And now I realize that on Windows 7 my program just shows "I" characters and things like that, even though the code page on my Windows 7 machine is the same as on this PC (Win Vista)... both say "Active code page: 850"... What might be happening? Can I change the code page at run time?

On my XP machine it also runs fine, and the code page there is 435, or something like that...
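
On the question of changing the code page at run time: the Windows API has SetConsoleOutputCP, which is the programmatic counterpart of chcp. The sketch below wraps it in a hypothetical helper, use_utf8_console. It is an assumption that this helps in any given setup: the console font also has to contain the glyphs, and as reported in this thread, code page 65001 can misbehave on older Windows versions.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Asks the Windows console to interpret program output as UTF-8.
// Returns true on success. On non-Windows platforms it does nothing and
// reports success (hypothetical helper, not part of Allegro).
bool use_utf8_console() {
#ifdef _WIN32
    return SetConsoleOutputCP(CP_UTF8) != 0;  // CP_UTF8 is code page 65001
#else
    return true;
#endif
}
```

It would be called once near the start of main(), before anything is written to std::cout.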

I'm going to check the UTF-8 API of Allegro, but I think the problem is something else...

PS: I attached an example (an .exe), if someone wants to test it and see what happens.

EDIT: Well, I think I'm going to say goodbye to std::string; I'm using al_draw_ustr instead... and I think it's pretty much the same compared with std::string... the problem is that I'm going to need to save some text to disk... So... let's see.

EDIT again: Doesn't it have a function like std::string::data() or std::string::c_str()?

bamccaig
Member #7,536
July 2006

Joel Spolsky (a rather well known developer and blogger) wrote a pretty good article on the use of Unicode that I think somewhat tries to explain how to do things "right". It would have been nice if he'd have shown some programming code, preferably C, to back up the article, but it's a good read anyway.

SiegeLord
Member #7,827
October 2006

EDIT again: Doesn't it have a function like std::string::data() or std::string::c_str()?

al_cstr and al_cstr_dup respectively.

"For in much wisdom is much grief: and he that increases knowledge increases sorrow."-Ecclesiastes 1:18
[SiegeLord's Abode][Codes]:[DAllegro5]:[RustAllegro]

Edgar Reynaldo
Member #8,592
May 2007

AMCerasoli
Member #11,955
May 2010

bamccaig said:

Joel Spolsky (a rather well known developer and blogger) wrote a pretty good article on the use of Unicode [www.joelonsoftware.com] that I think somewhat tries to explain how to do things "right". It would have been nice if he'd have shown some programming code, preferably C, to back up the article, but it's a good read anyway.

Really really thank you... You opened my mind. Everyone should read that article.

char array[2] = {(char)164, '\0'};
al_draw_text(al_font, al_map_rgb(255,255,255), x, y, ALLEGRO_ALIGN_LEFT, array);

That doesn't work because byte value 164 is "ñ" only in the Extended ASCII code pages the console uses (437 and 850), and I'm using UTF-8, where a lone byte 164 (0xA4) is not a valid character... I don't know if I'm wrong, but since the "ñ" letter takes 2 bytes in UTF-8, I can't store it in a single char like other letters...
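
To make the link between the two encodings concrete: in Unicode, "ñ" is code point U+00F1, and UTF-8 turns that number into the two bytes 0xC3 0xB1. A minimal sketch of that encoding step follows; encode_utf8 is a hypothetical helper written for illustration, not an Allegro or standard library function.

```cpp
#include <string>

// Encodes one Unicode code point as UTF-8 (hypothetical helper).
// Returns an empty string for values that are not valid code points
// (surrogates, or anything above U+10FFFF).
std::string encode_utf8(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);                         // 1 byte (ASCII)
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));           // 2 bytes
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF) {
            return out;                                       // surrogate: reject
        }
        out += static_cast<char>(0xE0 | (cp >> 12));          // 3 bytes
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {
        out += static_cast<char>(0xF0 | (cp >> 18));          // 4 bytes
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

With this, encode_utf8(0xF1) produces the two-byte string that al_draw_text renders as "ñ", whereas the single byte 164 only means "ñ" to the console's code page.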

