Allegro.cc - Online Community

UTF-8 text and terminal on Windows
Polybios
Member #12,293
October 2010

I'd like to output some UTF-8 text to a terminal on Windows. Of course, it doesn't work with cmd.exe. I've read about setting a 'magic' codepage 65001, but this doesn't work for me either.

So I'm looking for a replacement terminal app for Windows which supports that. I've tried MSYS already to no avail.
Do you have any suggestions? :)

torhu
Member #2,727
September 2002

Codepage 65001 works for me, but maybe you're doing something I'm not?

By the way, you have to make sure it's set to a font that supports the characters you want to see. Mine is set to Consolas.

Polybios
Member #12,293
October 2010

I'm getting two boxes with question marks for one UTF-8 character with either Consolas or Lucida Console. If it were just the glyphs missing, there should only be one of those tiny boxes per character, I guess. So it's probably not a font problem. I've checked the fonts, the glyphs are there. :-/
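
Two boxes for one character is consistent with the console treating each byte of a two-byte UTF-8 sequence as a separate character. A minimal sketch of the byte versus code point difference follows; the function name `utf8_code_points` is hypothetical, and the code assumes the input is already valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Count UTF-8 code points by skipping continuation bytes (bit pattern 10xxxxxx).
// Assumption: the input is valid UTF-8; no validation is performed.
std::size_t utf8_code_points(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char byte : s) {
        if ((byte & 0xC0) != 0x80) {
            ++count;  // lead byte or plain ASCII byte starts a new code point
        }
    }
    return count;
}
```

For the string `"\xC3\xA4"` (U+00E4), `size()` reports two bytes while `utf8_code_points` reports one code point, which matches the "two boxes per character" symptom if the bytes are decoded one at a time.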

torhu
Member #2,727
September 2002

Hm. Well, UTF-8 support in Windows still sucks.

Elias
Member #358
May 2000

Maybe output as UTF-16 instead of UTF-8; it's at least worth a try. al_ustr_encode_utf16 might be helpful.

--
"Either help out or stop whining" - Evert

Polybios
Member #12,293
October 2010

I've further tested this crap with cp 65001.
Looks like cout and puts do work with UTF-8, just the printf family of functions doesn't... Now why's that? ::)

torhu
Member #2,727
September 2002

Could be because printf outputs one byte at a time, while the others don't, since they have no need to inspect the contents of the string. Just guessing.

Polybios
Member #12,293
October 2010

It's ... very interesting behavior. When I put multibyte characters into a %s argument-string, it doesn't work either.
Reading input doesn't seem to work at all. As soon as there is a multibyte character, the usual functions just fail and return empty / garbage strings.

But I've finally managed to find something on the matter here. Input can be fixed by installing a custom stream buffer on the input streams, among some other stuff that needs to be done.

furinkan
Member #10,271
October 2008

There's some free OS out there that supports UTF-8 on the terminal. Was it... Line Ucks? ;D

Polybios
Member #12,293
October 2010

I know. But I need to port it to Windows.
Now I was finally able to read wstrings without problems via the ReadConsoleW WinAPI call, yay!

For wprintf to work at all, you have to call _setmode(_fileno(stdout), _O_U16TEXT) beforehand, and everything needs to be converted to wstrings, which I don't want to do.

I guess I'll just re#define printf to some custom function. snprintf-ing and then fputs-ing UTF-8 works with codepage 65001 ::)
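
A rough sketch of the "format first, then fputs" idea could look like this. The names `vformat` and `utf8_printf` are hypothetical, and the code assumes a C99-conforming `vsnprintf` that returns the required length when given a null buffer, which older Windows runtimes may not provide.

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Format like printf, but into a std::string instead of a stream.
std::string vformat(const char* fmt, va_list args) {
    va_list copy;
    va_copy(copy, args);
    // First pass: ask vsnprintf how many bytes the result needs.
    int needed = std::vsnprintf(nullptr, 0, fmt, copy);
    va_end(copy);
    if (needed < 0) {
        return std::string();
    }
    std::vector<char> buf(static_cast<std::size_t>(needed) + 1);
    // Second pass: format into the correctly sized buffer.
    std::vsnprintf(buf.data(), buf.size(), fmt, args);
    return std::string(buf.data(), static_cast<std::size_t>(needed));
}

// Hypothetical printf replacement: the whole string goes out in one fputs call.
// Returns the number of bytes written, or -1 if fputs reports an error.
int utf8_printf(const char* fmt, ...) {
    va_list args;
    va_start(args, fmt);
    std::string text = vformat(fmt, args);
    va_end(args);
    if (std::fputs(text.c_str(), stdout) < 0) {
        return -1;
    }
    return static_cast<int>(text.size());
}
```

The console still has to be on codepage 65001 for the bytes to show up as UTF-8. Redirecting existing calls with something like `#define printf utf8_printf` after the standard headers is one guess at how the re#define could be done.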

furinkan
Member #10,271
October 2008

Eww... I'm really sorry. :-/

You could use Allegro's routines to write the UTF-8 to a file. I believe you could use fputs() and al_fwrite(). Your editor obviously supports UTF-8...

Unless you need this log to be real time. ???

Polybios
Member #12,293
October 2010

Ok, it's solved:

  • Output via chcp 65001, re#defining printf to sprintf into a buffer which is then puts-ed out, since normal printf just won't work with multibyte chars

  • Input via ReadConsoleW and subsequent conversion to UTF-8 with WideCharToMultiByte
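
The input half of the list above might be sketched as follows. The helper names `trim_line_ending` and `read_console_line_utf8` are hypothetical; the sketch assumes stdin is an interactive console (ReadConsoleW fails on redirected input) and that one line fits in a fixed 1024-unit buffer. The Windows-specific part is guarded so the file still compiles elsewhere.

```cpp
#include <cstddef>
#include <string>

#ifdef _WIN32
#include <windows.h>
#endif

// Drop trailing '\r' and '\n' characters, which line-based console reads leave behind.
std::string trim_line_ending(std::string line) {
    while (!line.empty() && (line.back() == '\n' || line.back() == '\r')) {
        line.pop_back();
    }
    return line;
}

#ifdef _WIN32
// Hypothetical helper: read one console line as UTF-16, return it as UTF-8.
std::string read_console_line_utf8() {
    wchar_t wide[1024];
    DWORD units_read = 0;
    HANDLE in = GetStdHandle(STD_INPUT_HANDLE);
    if (!ReadConsoleW(in, wide, 1024, &units_read, nullptr) || units_read == 0) {
        return std::string();
    }
    // First call computes the UTF-8 byte count, second call converts.
    int bytes = WideCharToMultiByte(CP_UTF8, 0, wide, static_cast<int>(units_read),
                                    nullptr, 0, nullptr, nullptr);
    if (bytes <= 0) {
        return std::string();
    }
    std::string utf8(static_cast<std::size_t>(bytes), '\0');
    WideCharToMultiByte(CP_UTF8, 0, wide, static_cast<int>(units_read),
                        &utf8[0], bytes, nullptr, nullptr);
    return trim_line_ending(utf8);
}
#endif
```

Calling WideCharToMultiByte twice, once with a null output buffer to get the size, avoids guessing how many bytes the UTF-8 result needs.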

What a crap thing to do.

I was surprised that cmd.exe did pass all files found by * in a certain directory to my program via argc/argv, though. Last time I checked (long time ago), you had to do the scanning yourself. :o

torhu
Member #2,727
September 2002

Polybios said:

I was surprised that cmd.exe did pass all files found by * in a certain directory to my program via argc/argv, though. Last time I checked (long time ago), you had to do the scanning yourself. :o

Are you sure? I just tested with VS 9, and that definitely didn't happen... :-/

Polybios
Member #12,293
October 2010

Yes, it works. I'm using g++ / MinGW, though, maybe it's a special feature of their runtime?

torhu
Member #2,727
September 2002

Yes, GCC is doing it because Unix shells usually do it. In other words, cmd.exe had nothing to do with it.

Edgar Reynaldo
Member #8,592
May 2007

Why are you guys talking about compilers? What do they have to do with whether cmd.exe globs * into a file list? It's easy to see that it does, on Vista at least, with this tiny program:

#include <cstdio>

// Print every argument handed to main, one per line.
int main(int argc, char** argv) {
  for (int i = 0; i < argc; ++i) {
    std::printf("Arg %d = '%s'\n", i, argv[i]);
  }

  return 0;
}

Try passing * or *.* or something similar to the program and you will see cmd.exe turns the *s into batches of command line parameters.

Arthur Kalliokoski
Second in Command
February 2005

For compilers that do it the Microsoft way, you have to link in glob.obj or something, it's been that way lo these many years. DJGPP had a VMS-like way of globbing through all the subdirectories with a "../*" approach. The cmd.exe program only loads up the globbing program and passes on the arguments verbatim.

“Throughout history, poverty is the normal condition of man. Advances which permit this norm to be exceeded — here and there, now and then — are the work of an extremely small minority, frequently despised, often condemned, and almost always opposed by all right-thinking people. Whenever this tiny minority is kept from creating, or (as sometimes happens) is driven out of a society, the people then slip back into abject poverty. This is known as "bad luck."”

― Robert A. Heinlein

torhu
Member #2,727
September 2002

Which version of VS does that?

Arthur Kalliokoski
Second in Command
February 2005

This MSDN article says it's Setargv.obj. Maybe I was thinking of the old Borland compilers with glob.obj or something.

torhu
Member #2,727
September 2002

Wow. I guess Microsoft must have had powerful enemies at that time. Maybe God, Satan, and Hitler teamed up with Mighty Mouse or something. It's not every day that M$ do something that doesn't not make sense :P

Arthur Kalliokoski
Second in Command
February 2005

I did a lot of assembler programs on DOS back in the day, and the Program Segment Prefix only had room for 127 bytes to store parameters. For DOS compilers that needed a long command line, a '@' prefix was used to specify a file that had all the needed info.

Windows has improved on that somewhat in the meantime, be grateful.

torhu
Member #2,727
September 2002

That's not the same thing, though :P