char and string problem

AMCerasoli

Member #11,955

May 2010

So I was declaring and initializing a char like this:

char example=164;

so.. when I do:

std::cout<<example<<std::endl; it shows me a "ñ" letter...

Because this: char ene="ñ"; gives me an error,
and this: char ene='ñ'; gives me a: ▓

but since al_draw_text only accepts a pointer, I have to do:

char *example;

and if I do:

example=164;

I get an error... (Obviously)

If I do:

example="Ñ";

the console shows me this: ├æ, two symbols...

Then if I do:

    char example[1];

    example[0]='ñ';

I get again this symbol: ▓

And if I do:

char example[]="ñ";

I get this: ├ ... another symbol...

if I do: char *example="ñ";
I get: ├▒

So... I don't know what else to do...

I just want one char, with just one letter...

The strange thing is that if I do...

#include <iostream>
#include <string>

#include "allegro5/allegro.h"
#include "allegro5/allegro_font.h"
#include "allegro5/allegro_ttf.h"

int main()
{
    ALLEGRO_DISPLAY *display;
    ALLEGRO_FONT    *font;

    al_init();
    al_init_font_addon();
    al_init_ttf_addon();

    display = al_create_display(300, 300);
    font = al_load_ttf_font("consola.ttf", 40, 0);

    std::string example;
    const char *temp;

    example = "ñÑñÑ";
    temp = example.data();

    std::cout << example << std::endl;

    while (true) {
        al_draw_text(font, al_map_rgb(255, 255, 255), 0, 0, 0, temp);
        al_flip_display();
    }

    return 0;
}

[Attached image: ene.jpg]

al_draw_text shows the "ñ" letter correctly... So what can I do?

I don't get what is happening.

Conclusion:

What is happening is that the console uses an Extended ASCII code page while my source is UTF-8, and std::string is not UTF-8 aware; that was the whole problem... It is perfectly possible to store UTF-8 text in a std::string, but doing so has some implications. E.g. you can't count the number of characters with length() or size(), because they return the number of bytes. Instead you have to iterate through the string, parse the UTF-8 multibyte sequences and count each sequence as one character.

For that reason it is better to use the Allegro UTF-8 API...

If I do char temp[]={'\xC3','\xB1','\0'}; Allegro shows the "ñ" letter perfectly, since those are the bytes (written in hex) that correspond to "ñ" in UTF-8.

The console still doesn't know what is happening, so it shows me "├▒", but that is fine; I won't use the console...
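The point above about length() and size() counting bytes rather than characters can be sketched roughly like this. The helper name utf8_length is made up for illustration, and the sketch assumes the string already holds valid UTF-8 (it does no validation). It counts code points, which is not always the same as what a user sees as one character (combining accents, for example).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts UTF-8 code points in `s`.
// Every byte that is NOT a continuation byte (bit pattern 10xxxxxx)
// starts a new code point, so we count only those bytes.
// Assumption: `s` is well-formed UTF-8; nothing is validated here.
std::size_t utf8_length(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char byte : s) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For the "ñÑñÑ" string from the first post, saved as UTF-8, size() reports 8 bytes while this helper reports 4.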

Mark Oates

Member #1,146

March 2001

Did you try
al_draw_text(font,al_map_rgb(255, 255, 255),0 ,0 , 0, example.c_str());

gnolam

Member #2,030

March 2002

Your console and your code are using different character encodings.

--
Move to the Democratic People's Republic of Vivendi Universal (formerly known as Sweden) - officially democracy- and privacy-free since 2008-06-18!

AMCerasoli

Member #11,955

May 2010

Mark Oates said:

Did you try
al_draw_text(font,al_map_rgb(255, 255, 255),0 ,0 , 0, example.c_str());

Thanks man, I didn't know that...

gnolam said:

Your console and your code are using different character encodings.

Well yes... Code::Blocks is using UTF-8... and the console, I don't know...

The problem is that when I do example.erase(temp.length()-1); to delete one character from the string, it works fine with normal characters. But if I use the "ñ" character, it adds two chars to the string (e.g. ├▒), and erase removes only one of them because it thinks there is just one. So one stray byte remains in the string... And when I type the next character, al_draw_text breaks.
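One way to delete the last character without leaving half of a multibyte sequence behind is to walk backwards over the continuation bytes first. This is only a sketch: utf8_pop_back is a made-up helper name, and it assumes the std::string holds valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: removes the last UTF-8 code point from `s`.
// Assumption: `s` is well-formed UTF-8.
void utf8_pop_back(std::string& s) {
    if (s.empty()) {
        return;
    }
    std::size_t i = s.size() - 1;
    // Continuation bytes look like 10xxxxxx; step back until we reach
    // the lead byte (or the start of the string).
    while (i > 0 && (static_cast<unsigned char>(s[i]) & 0xC0) == 0x80) {
        --i;
    }
    // Erase from the lead byte to the end, so both bytes of "ñ" go together.
    s.erase(i);
}
```

With this, a string holding "a" followed by "ñ" becomes "a" after one call, instead of "a" plus a dangling 0xC3 byte.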

bamccaig

Member #7,536

July 2006

Long story short, your code editor and output device (terminal or command prompt window) are using different character sets or encodings so the data comes out wrong. You need to sync them up so that they're using the same formats. To do it in a proper way that should work on all machine configurations, you'll need to use an API that knows how to convert to and from the internal and external formats. This often requires determining (or guessing) the external format, as well as choosing your internal format.

It can be very complicated and I still haven't figured it out yet so I can't really give you any code advice. However, if you get everything speaking the same language then you shouldn't notice that your program is character-encoding-stupid, at least not on your own machine(s). If you actually intend to release the program for others to use then it's best to do it right though.

Generally, Unicode is the preferred character set these days because it can cover basically every human language (and then some) that you'd ever want to support. There are still some exceptions, I think, but it's a growing standard. There are many ways to represent Unicode in a computer, however. Among the most popular are UTF-8 and UTF-16. UTF-8 is good where space or bandwidth is more limited than processing power, and UTF-16 (or UTF-32) is preferred where space is less of an issue than processing power. There are also some languages whose characters take 3 bytes each in UTF-8 but only 2 in UTF-16, so using UTF-8 for them won't save you anything (and will actually cost you space).

In any case, UTF-8 is a great choice for a character encoding because it's ASCII compatible, which means most editors and compilers will already understand the important characters. The characters above the ASCII range often don't matter to something like a compiler, so it doesn't hurt to use UTF-8 anyway.

It might be easier to use a wider encoding in memory to make processing strings easier (UTF-16 covers most characters in one two-byte unit, though it needs two units for some; UTF-32 is truly fixed width), but only if you actually need to process characters. That's basically where your current program is going wrong. You're trying to process characters, assuming a character is a single byte (a char), when in UTF-8 a character can take anywhere from 1 to 4 bytes (the original design allowed up to 6). You don't know until you check the string, byte by byte. You basically need to use an API that is aware of the character encoding.
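To make the "check the string byte by byte" part concrete, here is a rough sketch that decodes UTF-8 into a fixed-width in-memory form, one char32_t per code point. The function name utf8_to_utf32 is made up, and this is not a full validator: it does not reject overlong encodings or surrogate values, and it simply substitutes U+FFFD for bytes it cannot decode.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decodes UTF-8 bytes into one char32_t per code point.
// Sketch only: malformed or truncated input becomes U+FFFD; overlong
// encodings and surrogates are NOT rejected.
std::u32string utf8_to_utf32(const std::string& in) {
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        std::size_t extra = 0;  // number of continuation bytes expected
        char32_t cp = 0;        // code point being assembled
        if (lead < 0x80) {                 // 0xxxxxxx: plain ASCII
            cp = lead;
        } else if ((lead & 0xE0) == 0xC0) { // 110xxxxx: 2-byte sequence
            extra = 1; cp = lead & 0x1F;
        } else if ((lead & 0xF0) == 0xE0) { // 1110xxxx: 3-byte sequence
            extra = 2; cp = lead & 0x0F;
        } else if ((lead & 0xF8) == 0xF0) { // 11110xxx: 4-byte sequence
            extra = 3; cp = lead & 0x07;
        } else {                            // stray continuation or invalid lead
            out.push_back(0xFFFD);
            ++i;
            continue;
        }
        if (i + extra >= in.size()) {       // sequence cut off at end of input
            out.push_back(0xFFFD);
            break;
        }
        bool ok = true;
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) { ok = false; break; }
            cp = (cp << 6) | (b & 0x3F);    // append the 6 payload bits
        }
        if (!ok) {
            out.push_back(0xFFFD);
            ++i;
            continue;
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

Once the text is in this form, indexing, length and erase work per code point, at the cost of four bytes for every character.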

Long story short, if you can, get everything speaking UTF-8: your text editor and your terminal or command prompt window. If you have the time, look for tutorials on how to write character set and character encoding aware programs (and report back to us with the results). :-/ I think that parts of Allegro 5 are Unicode aware, so maybe you can just use the Allegro 5 API to do it right, but I haven't learned those APIs yet so I'm no help with that either.

--
I mean the best with what I say. It doesn't always sound that way.

torhu

Member #2,727

September 2002

I'm assuming you're saving your source as utf-8. That's probably the best way to do it when using Allegro.

Try setting your console to utf-8, the command is 'chcp 65001'.

And if you have trouble with 'extended' characters in strings, you can use hexadecimal escape codes in strings, like:
const char *s = "A\x42" "C"; // "ABC" (literal split so the escape doesn't swallow the 'C' as a hex digit)
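A small sketch of the hex-escape approach applied to this thread's "ñ". The variable names abc and ene are made up for illustration. One thing to watch: a \x escape keeps consuming hex digits for as long as it finds them, so a letter like C right after the escape has to be separated from it.

```cpp
#include <cstring>

// Adjacent string literals are concatenated by the compiler, which ends
// the hex escape before the 'C'. Written as one literal, "\x42C" would be
// read as the single (out of range) value 0x42C.
const char* const abc = "A\x42" "C";

// "ñ" spelled out as its two UTF-8 bytes. This no longer depends on the
// encoding the source file was saved in.
const char* const ene = "\xC3\xB1";
```

Here std::strlen(ene) is 2, which is the same two-bytes-for-one-letter situation described earlier in the thread.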

---
Smokin' Guns - spaghetti western FPS action

bamccaig

Member #7,536

July 2006

torhu said:

Try setting your console to utf-8, the command is 'chcp 65001'.

For some reason, this completely foobars my console in my XP VM. Vim won't run when I do that, and it even insta-crashed the entire VM once.

AMCerasoli

Member #11,955

May 2010

I could change the code page with CHCP, but just temporarily... And now I realize that on Windows 7 my program just shows "I" characters and things like that, even though the code page on my Windows 7 machine is the same as on this PC (Win Vista)... both say "Active code page: 850"... What might be happening? Can I change the code page at run time?

On my XP machine it also runs fine, and the code page there is 435, or something like that...
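On the run-time question: the Win32 API has SetConsoleOutputCP, which sets the output code page of the console attached to the process. A minimal sketch, assuming a Windows build with <windows.h> available; the wrapper name use_utf8_console is made up, and on other platforms it deliberately does nothing. Whether the glyphs then show up correctly also depends on the console font, and the setting applies to the console itself, so it can outlive the program.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical wrapper: asks the console to interpret program output as
// UTF-8 (code page 65001). Returns true on success. On non-Windows
// platforms this is a no-op that returns true.
bool use_utf8_console() {
#ifdef _WIN32
    return SetConsoleOutputCP(CP_UTF8) != 0;
#else
    return true;
#endif
}
```

The idea would be to call use_utf8_console() once at the start of main(), before anything is written to std::cout.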

I'm going to check the UTF-8 API of Allegro, but I think the problem is something else...

PS: I attached an example, in case someone wants to test it and see what happens... it's an .exe

EDIT: Well, I think I'm going to say goodbye to std::string; I'm using al_draw_ustr instead... and I think it's pretty much the same as using std::string... the problem is that I'm going to need to save some text to disk... So... let's see.

EDIT again: Doesn't it have a function like std::string::data() or std::string::c_str()?

bamccaig

Member #7,536

July 2006

Joel Spolsky (a rather well known developer and blogger) wrote a pretty good article on the use of Unicode that I think somewhat tries to explain how to do things "right". It would have been nice if he'd have shown some programming code, preferably C, to back up the article, but it's a good read anyway.

SiegeLord

Member #7,827

October 2006

AMCERASOLI said:

EDIT again: Doesn't it have a function like std::string::data() or std::string::c_str()?

al_cstr and al_cstr_dup respectively.

"For in much wisdom is much grief: and he that increases knowledge increases sorrow."-Ecclesiastes 1:18

Edgar Reynaldo

Major Reynaldo

May 2007

AMCERASOLI said:

I just want one char, with just one letter...

char array[2] = {(char)164 , '\0'};
al_draw_text(al_font , al_map_rgb(255,255,255) , x , y , ALLEGRO_ALIGN_LEFT , array);

AMCerasoli

Member #11,955

May 2010

bamccaig said:

Joel Spolsky (a rather well known developer and blogger) wrote a pretty good article on the use of Unicode [www.joelonsoftware.com] that I think somewhat tries to explain how to do things "right". It would have been nice if he'd have shown some programming code, preferably C, to back up the article, but it's a good read anyway.

Really really thank you... You opened my mind. Everyone should read that article.

Edgar Reynaldo said:

char array[2] = {(char)164 , '\0'};
al_draw_text(al_font , al_map_rgb(255,255,255) , x , y , ALLEGRO_ALIGN_LEFT , array);

That doesn't work because 164 is the code for "ñ" in Extended ASCII, and I'm using UTF-8... I don't know if I'm wrong, but since the "ñ" letter takes 2 bytes in UTF-8, I can't store it in a single char like other letters...

For that reason it is better to use the Allegro UTF-8 API...

If I do char temp[]={'\xC3','\xB1','\0'}; Allegro shows the "ñ" letter perfectly, since those are the bytes (written in hex) that correspond to "ñ" in UTF-8.

The console still doesn't know what is happening, so it shows me "├▒", but that is fine; I won't use the console...