I've heard that there are (or used to be?) ambiguous mappings between Unicode and SHIFT_JIS codes. This KB article seems to confirm it.
So the question is: will I lose any data if I take SHIFT_JIS-encoded text, convert it to Unicode and back?
Details: I'm talking about Windows (XP and later) and .NET (which in theory relies on the NLS API).
From stackoverflow
-
Yes, it looks like this will still lose data:
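    using System;
    using System.Text;

    class Test
    {
        static void Main(string[] args)
        {
            // Code page 932 is the Windows variant of Shift_JIS.
            Encoding shiftJis = Encoding.GetEncoding(932);

            // 0x87 0x90 is one of the byte pairs with a duplicate mapping.
            byte[] original = new byte[] { 0x87, 0x90 };

            // Decode to a Unicode string, then encode back to Shift_JIS.
            string text = shiftJis.GetString(original);
            byte[] backAgain = shiftJis.GetBytes(text);

            Console.WriteLine("{0:x}{1:x}", backAgain[0], backAgain[1]);
        }
    }

This prints 81e0 (bytes 0x81 0xE0) rather than the original 0x87 0x90, as predicted by the page you linked to.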
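The snippet above checks a single byte pair. To see how many double-byte sequences are affected on a given machine, the same round trip can be run over the whole double-byte range. This is only a rough sketch: the class name `RoundTripScan` is made up for illustration, and it assumes that requesting exception fallbacks makes undefined byte pairs throw so that they can be skipped. On .NET Core and later, code page 932 is presumably only available after registering the code pages encoding provider.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of the original answer.
class RoundTripScan
{
    static void Main()
    {
        // Assumption: exception fallbacks make undefined byte pairs throw
        // instead of silently decoding to a replacement character.
        Encoding shiftJis = Encoding.GetEncoding(
            932, EncoderFallback.ExceptionFallback, DecoderFallback.ExceptionFallback);

        int lossy = 0;
        for (int lead = 0x81; lead <= 0xFC; lead++)
        {
            // Lead bytes are 0x81-0x9F and 0xE0-0xFC; skip the gap between them.
            if (lead > 0x9F && lead < 0xE0) continue;

            for (int trail = 0x40; trail <= 0xFC; trail++)
            {
                if (trail == 0x7F) continue; // 0x7F is never a trail byte

                byte[] original = { (byte)lead, (byte)trail };
                try
                {
                    string text = shiftJis.GetString(original);
                    byte[] backAgain = shiftJis.GetBytes(text);

                    bool same = backAgain.Length == 2
                        && backAgain[0] == original[0]
                        && backAgain[1] == original[1];
                    if (!same)
                    {
                        lossy++;
                        Console.WriteLine("{0:X2}{1:X2} -> U+{2:X4} -> {3}",
                            original[0], original[1], (int)text[0],
                            BitConverter.ToString(backAgain));
                    }
                }
                catch (ArgumentException)
                {
                    // DecoderFallbackException and EncoderFallbackException both
                    // derive from ArgumentException; the pair is unassigned.
                }
            }
        }

        Console.WriteLine("Lossy double-byte sequences: {0}", lossy);
    }
}
```

Every line it prints is a byte pair that does not survive the trip to Unicode and back. If the list is non-empty and your data might contain any of those pairs, keep the original bytes around rather than relying on re-encoding.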