This is how:
For example, 'w' with diacritical circumflex accent: ŵ; with tilde: w̃.
Those characters are produced as a small 'w' followed by a combining diacritical mark, circumflex accent or tilde, code points 0x0302 and 0x0303, respectively.
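A minimal sketch of that composition in C#. The variable names are only illustrative, and the snippet assumes a compiler with top-level statements (C# 9 or later); on older compilers, put the statements in a Main method.

```csharp
using System;
using System.Linq;

// Base letter first, combining mark second (U+0302 circumflex, U+0303 tilde).
string wCircumflex = "w\u0302";
string wTilde = "w\u0303";

// Each string holds two .NET chars (two UTF-16 code units) shown as one glyph.
Console.WriteLine(wCircumflex.Length); // 2
Console.WriteLine(wTilde.Length);      // 2

// Print the code units in hex to see the order: letter, then mark.
Console.WriteLine(string.Join(" ", wCircumflex.Select(c => ((int)c).ToString("X4")))); // 0077 0302
Console.WriteLine(string.Join(" ", wTilde.Select(c => ((int)c).ToString("X4"))));      // 0077 0303
```

Whether the result is drawn as a single accented letter depends on the font and the rendering engine, not on the string itself.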
First of all, get the order right: the base character is followed by the diacritic, not prefixed by it. Also, remember that in little-endian order (the internal representation of a .NET string is UTF-16LE) the lower byte comes first, so the code points 0x0302 and 0x0303 come as the bytes 2, 3, 3, 3. However, the functions System.Text.Encoding.GetBytes and System.Text.Encoding.GetChars take care of that; see Encoding Class (System.Text).
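To check the byte order yourself, here is a small sketch using Encoding.Unicode (UTF-16 little-endian) and, for contrast, Encoding.BigEndianUnicode. The variable names are illustrative; top-level statements (C# 9 or later) are assumed.

```csharp
using System;
using System.Text;

string marks = "\u0302\u0303";

// Encoding.Unicode is UTF-16LE: the low byte of each code unit comes first.
byte[] littleEndian = Encoding.Unicode.GetBytes(marks);
Console.WriteLine(string.Join(", ", littleEndian)); // 2, 3, 3, 3

// The big-endian variant swaps the bytes inside each code unit.
byte[] bigEndian = Encoding.BigEndianUnicode.GetBytes(marks);
Console.WriteLine(string.Join(", ", bigEndian)); // 3, 2, 3, 3

// GetChars reverses the conversion, so you never shuffle bytes by hand.
string roundTrip = new string(Encoding.Unicode.GetChars(littleEndian));
Console.WriteLine(roundTrip == marks); // True
```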
If you don't want to insert Unicode text in code files, these functions are useful:
Char.ConvertFromUtf32 Method (Int32) (System),
Char.ConvertToUtf32 Method (Char, Char) (System).
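A short sketch of how these two functions could be used for the combining marks, keeping the source file pure ASCII. The variable names are illustrative, and top-level statements (C# 9 or later) are assumed. It uses the (String, Int32) overload of Char.ConvertToUtf32 to read a code point back at a given index.

```csharp
using System;

// Build the combining circumflex from its numeric code point.
string circumflex = char.ConvertFromUtf32(0x0302);
string wCircumflex = "w" + circumflex;

// Read the code point back from position 1 of the string.
int codePoint = char.ConvertToUtf32(wCircumflex, 1);
Console.WriteLine(codePoint.ToString("X4")); // 0302

// Note that ConvertFromUtf32 returns a string, not a char.
Console.WriteLine(circumflex.Length); // 1
```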
Here, it's important to understand: UTF-32 is the only UTF which represents a character as a word numerically equal to the Unicode code point. These functions do not work with single .NET characters but with .NET strings or pairs of characters. This is because .NET characters are not always really characters: some represent either the low or the high surrogate from a surrogate pair, so a "real" Unicode character (beyond the BMP) is represented as two .NET characters.
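To see a surrogate pair in practice, here is a sketch with U+1D11E (MUSICAL SYMBOL G CLEF), picked only as an example of a code point beyond the BMP. The variable names are illustrative; top-level statements (C# 9 or later) are assumed.

```csharp
using System;

// One code point beyond the BMP becomes two .NET chars.
string clef = char.ConvertFromUtf32(0x1D11E);
Console.WriteLine(clef.Length); // 2

// The two chars are the high and the low surrogate, in that order.
Console.WriteLine(char.IsHighSurrogate(clef[0])); // True
Console.WriteLine(char.IsLowSurrogate(clef[1]));  // True
Console.WriteLine(((int)clef[0]).ToString("X4") + " " + ((int)clef[1]).ToString("X4")); // D834 DD1E

// The (Char, Char) overload turns the pair back into one code point.
int codePoint = char.ConvertToUtf32(clef[0], clef[1]);
Console.WriteLine(codePoint.ToString("X")); // 1D11E
```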
See also:
UTF-16 - Wikipedia, the free encyclopedia (for surrogate pairs),
Universal Character Set characters - planes - Wikipedia, the free encyclopedia (for BMP).
(Don't get me wrong: the diacritical marks have nothing to do with surrogate pairs; with the diacritical marks, you only use the BMP, unless the code point of the "main" character is beyond it. I wrote the previous paragraph to explain the purpose of the .NET UTF-32 functions, which are not well explained by Microsoft's MSDN help.)
—SA