Sunday, March 20, 2011

How to output UTF-8 encoded characters correctly in a C/C++ console application?

Here's what I'm getting now by wprintf:

1胩?鳧?1敬爄汯?瑳瑡獵猆慴畴??

Is UTF-8 just not supported by Windows?

From stackoverflow
  • No, Windows doesn't support printing UTF-8 to the console.

    When Windows says "Unicode", it means UTF-16. You need to use MultiByteToWideChar to convert from UTF-8 to UTF-16. Something like this:

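    const char* text = "My UTF-8 text\n";
    // First call: ask for the required length in wchar_t units (terminator included).
    int len = MultiByteToWideChar(CP_UTF8, 0, text, -1, NULL, 0);
    wchar_t* unicode_text = new wchar_t[len];
    // Second call: convert into the buffer we just allocated.
    MultiByteToWideChar(CP_UTF8, 0, text, -1, unicode_text, len);
    wprintf(L"%ls", unicode_text);  // %ls is the standard specifier for a wide string
    delete[] unicode_text;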
    
    Alan : Is utf8 a subset of utf16?
    RichieHindle : @Alan: No, they are both methods of encoding Unicode text into bytes. UTF-16 uses at least 16 bits per character, whereas UTF-8 will use 8 bits per character for most Western characters. UTF-8 is popular on the web and on operating systems other than Windows, while Windows uses UTF-16. They do the same job, in different ways. See http://en.wikipedia.org/wiki/UTF-8 and http://en.wikipedia.org/wiki/UTF-16/UCS-2 for full details.
  • wprintf is supposed to receive a UTF-16 encoded string.

    Use MultiByteToWideChar with the CP_UTF8 code page to do the conversion (and don't blindly cast from char* to wchar_t*).

    Let_Me_Be : It is supposed to receive UTF-16 only on Windows (which is actually against the standard, because UTF-16 is a variable-length encoding). Most other platforms expect UTF-32.
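
To make the difference between the two encodings, and the danger of a blind cast, a bit more concrete, here is a small portable sketch. Both helpers (bytes_as_utf16le and utf8_to_utf16) are hypothetical names made up for illustration; they are not part of the Windows API or of any answer above. The converter assumes well-formed UTF-8 and does no validation, so on Windows MultiByteToWideChar remains the safer choice.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: read raw bytes pairwise as little-endian 16-bit units.
// This mimics what happens when a char* is blindly cast to wchar_t* on Windows.
std::vector<std::uint16_t> bytes_as_utf16le(const std::string& bytes) {
    std::vector<std::uint16_t> units;
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        const unsigned lo = static_cast<unsigned char>(bytes[i]);
        const unsigned hi = static_cast<unsigned char>(bytes[i + 1]);
        units.push_back(static_cast<std::uint16_t>(lo | (hi << 8)));
    }
    return units;
}

// Hypothetical helper: convert well-formed UTF-8 to UTF-16 code units.
// Assumption: the input is valid UTF-8; malformed input is not detected.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        // The lead byte tells us how many continuation bytes follow.
        const std::size_t extra = lead < 0x80 ? 0 : lead < 0xE0 ? 1 : lead < 0xF0 ? 2 : 3;
        if (i + extra >= in.size()) break;  // truncated sequence: stop
        // Keep only the payload bits of the lead byte.
        char32_t cp = extra == 0 ? lead : (lead & (0x3F >> extra));
        for (std::size_t k = 1; k <= extra; ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        }
        i += extra + 1;
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Code points above U+FFFF need a surrogate pair in UTF-16.
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

For instance, the two UTF-8 bytes of "é" ("\xC3\xA9") convert to the single UTF-16 unit 0x00E9, while reading the same two bytes as one 16-bit unit gives 0xA9C3, an unrelated character. Likewise, the ASCII bytes of "status" read pairwise give 0x7473, 0x7461, 0x7375, which appears consistent with the CJK-looking characters in the output quoted in the question.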
