Comments on Illegal Argument Exception: I18N: comparing character encoding in C, C#, Java, Python and Ruby (feed updated 2023-06-02T15:10:12.074+01:00)

McDowell, 2010-11-16T20:25:48.776+00:00:
In UTF-16, a <b>code unit</b> is 16 bits (2 bytes). Each Unicode <i>code point</i> (or character) is encoded as one or two <b>code units</b>. The <b>code unit</b> values 0xD800-0xDBFF (high surrogates) and 0xDC00-0xDFFF (low surrogates) are reserved for forming surrogate pairs (4-byte sequences).<br /><br />UTF-16 is a variable-width encoding (like UTF-8 or some of the legacy Asian character sets).<br /><br />UCS2 is restricted to the…

Anonymous, 2010-11-16T05:56:30.490+00:00:
I have always had this question: is a w_char (or "Unicode", as Windows calls it) UTF-16? All the documentation describes w_char as a fixed size of 2 bytes, which looks more like UCS2, since UTF-16 is variable size (2-4 bytes).
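A minimal sketch in Python of the surrogate-pair arithmetic McDowell describes, assuming the standard UTF-16 rules (subtract 0x10000, split the remaining 20 bits into two 10-bit halves). The helper names `to_utf16_units`, `from_utf16_units` and `fits_ucs2` are hypothetical, written only for illustration; the result is cross-checked against Python's built-in `utf-16-be` codec.

```python
# Hypothetical helper names, for illustration only.

def to_utf16_units(cp: int) -> list[int]:
    """Encode one Unicode code point as a list of 16-bit UTF-16 code units."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: U+{cp:04X}")
    if cp <= 0xFFFF:
        # BMP code point: a single code unit, same value UCS-2 would store
        return [cp]
    v = cp - 0x10000            # 20-bit value
    high = 0xD800 + (v >> 10)   # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (v & 0x3FF)  # bottom 10 bits -> low (trail) surrogate
    return [high, low]


def from_utf16_units(units: list[int]) -> list[int]:
    """Decode a sequence of UTF-16 code units back into code points."""
    cps = []
    i = 0
    while i < len(units):
        u = units[i]
        nxt = units[i + 1] if i + 1 < len(units) else None
        if 0xD800 <= u <= 0xDBFF and nxt is not None and 0xDC00 <= nxt <= 0xDFFF:
            # Surrogate pair: recombine the two 10-bit halves
            cps.append(0x10000 + ((u - 0xD800) << 10) + (nxt - 0xDC00))
            i += 2
        elif 0xD800 <= u <= 0xDFFF:
            raise ValueError(f"unpaired surrogate 0x{u:04X} at index {i}")
        else:
            cps.append(u)
            i += 1
    return cps


def fits_ucs2(text: str) -> bool:
    """True if every character fits in one 16-bit unit, i.e. UCS-2 can represent it."""
    return all(ord(ch) <= 0xFFFF and not 0xD800 <= ord(ch) <= 0xDFFF for ch in text)


if __name__ == "__main__":
    text = "A\u00e9\U0001F600"  # 'A', 'é', and an emoji outside the BMP
    units = [u for ch in text for u in to_utf16_units(ord(ch))]
    print([f"{u:04X}" for u in units])  # ['0041', '00E9', 'D83D', 'DE00']
    # Cross-check against the built-in big-endian UTF-16 codec (no BOM)
    raw = text.encode("utf-16-be")
    assert units == [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]
    print(len(text), "code points,", len(units), "code units")  # 3 code points, 4 code units
    print(fits_ucs2(text))  # False
    assert [ord(c) for c in text] == from_utf16_units(units)
```

This bears on the w_char question: a fixed 2-byte type holds one UTF-16 code unit, not necessarily one character, so U+1F600 occupies two of them. A fixed-size unit does not make the encoding fixed-width; only text that never leaves the BMP (what `fits_ucs2` checks) looks the same in UCS2 and UTF-16.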