UTF-8 text and terminal on Windows
Polybios

I'd like to output some UTF-8 text to a terminal on Windows. Of course, it doesn't work with cmd.exe. I've read about setting a 'magic' codepage 65001, but this doesn't work for me either.

So I'm looking for a replacement terminal app for Windows which supports that. I've tried MSYS already to no avail.
Do you have any suggestions? :)

torhu

Codepage 65001 works for me, but maybe you're doing something I'm not?

By the way, you have to make sure it's set to a font that supports the characters you want to see. Mine is set to Consolas.
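
For reference, the output code page can also be switched from inside the program instead of typing chcp first. This is only a minimal sketch, assuming a Windows build that can include windows.h; the helper name enable_utf8_console is made up for illustration, and on other platforms it does nothing:

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: switch the console's output code page to UTF-8
// (65001). Returns true on success. Outside Windows it is a no-op that
// reports success, since terminals there usually expect UTF-8 already.
bool enable_utf8_console() {
#ifdef _WIN32
    return SetConsoleOutputCP(CP_UTF8) != 0;
#else
    return true;
#endif
}
```

Call it once at startup, before the first output. It does not change the font, so the console still needs one that contains the glyphs.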

Polybios

I'm getting two boxes with question marks for each UTF-8 character, with either Consolas or Lucida Console. If only the glyphs were missing, there should be just one of those tiny boxes per character, I guess. So it's probably not a font problem. I've checked the fonts, the glyphs are there. :-/
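
Two boxes per character is what you would expect if each byte of a two-byte UTF-8 sequence gets decoded as a character of its own. A rough sketch for checking how many bytes and how many code points a string holds; the name count_code_points is made up here, and it assumes the input is valid UTF-8:

```cpp
#include <cstddef>
#include <string>

// Count code points in a UTF-8 string by skipping continuation bytes
// (bytes of the form 10xxxxxx). Assumes the input is valid UTF-8.
std::size_t count_code_points(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) {
            ++n;  // lead byte or plain ASCII byte starts a new code point
        }
    }
    return n;
}
```

For the two bytes 0xC3 0xA4 (an "ä"), size() reports 2 while count_code_points reports 1, so a console that decodes byte by byte would draw two placeholder glyphs for it.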

torhu

Hm. Well, UTF-8 support in Windows still sucks.

Elias

Maybe output as UTF-16 instead of UTF-8; it's at least worth a try. al_ustr_encode_utf16 might be helpful.

Polybios

I've further tested this crap with cp 65001.
Looks like cout and puts do work with UTF-8; only the printf family of functions doesn't... Now why's that? ::)

torhu

Could be because printf outputs one byte at a time, while the others don't, since they have no need to inspect the contents of the string. Just guessing.

Polybios

It's ... very interesting behavior. When I put multibyte characters into a %s argument-string, it doesn't work either.
Reading input doesn't seem to work at all. As soon as there is a multibyte character, the usual functions just fail and return empty / garbage strings.

But I've finally managed to find something on the matter here. Input can be fixed by installing a custom stream buffer on the input streams, among some other things that need to be done.

furinkan

There's some free OS out there that supports UTF-8 on the terminal. Was it... Line Ucks? ;D

Polybios

I know. But I need to port it to Windows.
Now I was finally able to read wstrings without problems via ReadConsoleW WinApi, yay!

For wprintf to work at all, you have to call _setmode(_fileno(stdout), _O_U16TEXT) beforehand, and everything needs to be converted to wstrings, which I don't want to do.

I guess I'll just re-#define printf to some custom function. snprintf-ing into a buffer and then fputs-ing the UTF-8 works with codepage 65001 ::)
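
A sketch of what such a custom function could look like: format into a buffer first, then hand the whole string to fputs in one go. The names vformat_utf8, format_utf8 and utf8_printf are made up for illustration, and the sizing call assumes a C99-conforming vsnprintf (older Microsoft runtimes behave differently there):

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Format into a std::string using vsnprintf (hypothetical helper).
std::string vformat_utf8(const char* fmt, std::va_list args) {
    assert(fmt != nullptr);
    std::va_list copy;
    va_copy(copy, args);
    // First pass: ask how many bytes the formatted text needs.
    int len = std::vsnprintf(nullptr, 0, fmt, copy);
    va_end(copy);
    if (len < 0) {
        return std::string();
    }
    std::vector<char> buf(static_cast<std::size_t>(len) + 1);
    // Second pass: format for real into the sized buffer.
    std::vsnprintf(buf.data(), buf.size(), fmt, args);
    return std::string(buf.data(), static_cast<std::size_t>(len));
}

// Convenience wrapper with a printf-style argument list.
std::string format_utf8(const char* fmt, ...) {
    std::va_list args;
    va_start(args, fmt);
    std::string s = vformat_utf8(fmt, args);
    va_end(args);
    return s;
}

// printf-like replacement: the formatted bytes reach stdout through a
// single fputs call. Returns the number of bytes written, or -1 on error.
int utf8_printf(const char* fmt, ...) {
    std::va_list args;
    va_start(args, fmt);
    std::string s = vformat_utf8(fmt, args);
    va_end(args);
    if (std::fputs(s.c_str(), stdout) == EOF) {
        return -1;
    }
    return static_cast<int>(s.size());
}
```

A line such as `#define printf utf8_printf`, placed after the standard headers, would then route existing calls through it. Whether that really cures the output on a given console is something to verify locally.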

furinkan

Eww... I'm really sorry. :-/

You could use Allegro's routines to write the UTF-8 to a file. I believe you could use fputs() and al_fwrite(). Your editor obviously supports UTF-8...

Unless you need this log to be real time. ???

Polybios

Ok, it's solved:

  • Output via chcp 65001, re-#defining printf to sprintf into a buffer which is then puts-ed out, since normal printf just won't work with multibyte chars

  • Input via ReadConsoleW and subsequent conversion to UTF-8 with WideCharToMultiByte
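
The input half of the list above could be sketched roughly like this. It is not the exact code from the thread: the helper names read_utf8_line and strip_line_ending are made up, the 1024-character buffer is an arbitrary choice, and the non-Windows branch is only a fallback so the sketch builds elsewhere:

```cpp
#include <iostream>
#include <string>

#ifdef _WIN32
#include <windows.h>
#endif

// Remove trailing CR/LF characters; ReadConsoleW hands back the line
// terminator as part of the data.
void strip_line_ending(std::wstring& s) {
    while (!s.empty() && (s.back() == L'\n' || s.back() == L'\r')) {
        s.pop_back();
    }
}

// Hypothetical helper: read one line from the console and return it as
// UTF-8 in 'out'. Returns false if reading or converting fails.
bool read_utf8_line(std::string& out) {
#ifdef _WIN32
    wchar_t buf[1024];
    DWORD got = 0;
    HANDLE in = GetStdHandle(STD_INPUT_HANDLE);
    // Fails if stdin is redirected, because the handle is then no console.
    if (!ReadConsoleW(in, buf, 1024, &got, nullptr)) {
        return false;
    }
    std::wstring wide(buf, got);
    strip_line_ending(wide);
    out.clear();
    if (wide.empty()) {
        return true;
    }
    // First call computes the UTF-8 size, second call converts.
    int n = WideCharToMultiByte(CP_UTF8, 0, wide.data(),
                                static_cast<int>(wide.size()),
                                nullptr, 0, nullptr, nullptr);
    if (n <= 0) {
        return false;
    }
    out.resize(static_cast<std::size_t>(n));
    WideCharToMultiByte(CP_UTF8, 0, wide.data(),
                        static_cast<int>(wide.size()),
                        &out[0], n, nullptr, nullptr);
    return true;
#else
    // Elsewhere the terminal usually delivers UTF-8 bytes directly.
    return static_cast<bool>(std::getline(std::cin, out));
#endif
}
```

Lines longer than the buffer come back in pieces, so real code would loop until it sees the line ending.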

What a crap thing to do.

I was surprised that cmd.exe did pass all files found by * in a certain directory to my program via argc/argv, though. Last time I checked (long time ago), you had to do the scanning yourself. :o

torhu
Polybios said:

I was surprised that cmd.exe did pass all files found by * in a certain directory to my program via argc/argv, though. Last time I checked (long time ago), you had to do the scanning yourself. :o

Are you sure? I just tested with VS 9, and that definitely didn't happen... :-/

Polybios

Yes, it works. I'm using g++ / MinGW, though, maybe it's a special feature of their runtime?

torhu

Yes, GCC's MinGW runtime is doing it, because Unix shells usually do it. In other words, cmd.exe had nothing to do with it.

Edgar Reynaldo

Why are you guys talking about compilers? What do they have to do with whether cmd.exe globs * into a file list? It's easy to see that it does, on Vista at least, with this tiny program:

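#include <cstdio>

// Print every argument the program received. If a wildcard was expanded
// before main() ran, each matching file shows up as its own entry.
int main(int argc, char** argv) {

  for (int i = 0; i < argc; ++i) {
    printf("Arg %d = '%s'\n", i, argv[i]);
  }

  return 0;
}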

Try passing * or *.* or something similar to the program and you will see cmd.exe turns the *s into batches of command line parameters.

Arthur Kalliokoski

For compilers that do it the Microsoft way, you have to link in glob.obj or something, it's been that way lo these many years. DJGPP had a VMS-like way of globbing through all the subdirectories with a "../*" approach. cmd.exe itself only loads the program and passes the arguments on verbatim; any globbing happens inside the program.

torhu

Which version of VS does that?

Arthur Kalliokoski

This MSDN article says it's Setargv.obj. Maybe I was thinking of the old Borland compilers with glob.obj or something.

torhu

Wow. I guess Microsoft must have had powerful enemies at that time. Maybe God, Satan, and Hitler teamed up with Mighty Mouse or something. It's not every day that M$ do something that doesn't not make sense :P

Arthur Kalliokoski

I did a lot of assembler programs on DOS back in the day, and the Program Segment Prefix only had room for 127 bytes to store parameters. For DOS compilers that needed a long command line, a '@' prefix was used to specify a file that had all the needed info.

Windows has improved on that somewhat in the meantime, be grateful.

torhu

That's not the same thing, though :P

Arthur Kalliokoski
Thread #614672. Printed from Allegro.cc