Not byte! How can it be? You should use
System.String
which is a Unicode string, so it supports most languages at the same time. When you communicate through a network or any other kind of stream, all text data is converted to and from an array of bytes anyway, but each character can take a different number of bytes, 1 to 4, because Unicode supports code points in the range 0 to 0x10FFFF. The particular representation depends on the UTF used for serialization. Internally, in memory, .NET (and Windows itself) uses UTF-16LE, where each character takes either one 2-byte word or two such words, called a
surrogate pair, which is needed for characters beyond the
Basic Multilingual Plane (BMP), which covers the first code points, 0 to 0xFFFF (excluding the special ranges reserved for the surrogates themselves).
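To see the difference between characters inside and beyond the BMP, here is a minimal sketch. The class Utf16Demo and its helper methods are my own names, made up for illustration; the calls on System.Char and System.Text.Encoding are the standard ones.

```csharp
using System;
using System.Text;

static class Utf16Demo
{
    // Number of UTF-16 code units (System.Char values) in the string.
    public static int CodeUnits(string s)
    {
        return s.Length;
    }

    // Number of Unicode code points; a surrogate pair counts as one.
    public static int CodePoints(string s)
    {
        int count = 0;
        for (int i = 0; i < s.Length; i++)
        {
            if (char.IsHighSurrogate(s[i]) && i + 1 < s.Length && char.IsLowSurrogate(s[i + 1]))
                i++; // skip the low half of the pair
            count++;
        }
        return count;
    }

    public static void Main()
    {
        // U+0041 and U+00E9 are in the BMP; U+1D11E is beyond it.
        string[] samples = { "A", "\u00E9", "\U0001D11E" };
        foreach (string s in samples)
        {
            Console.WriteLine("{0} {1} {2} {3} {4}",
                CodeUnits(s),
                CodePoints(s),
                Encoding.UTF8.GetByteCount(s),     // UTF-8
                Encoding.Unicode.GetByteCount(s),  // UTF-16LE
                Encoding.UTF32.GetByteCount(s));   // UTF-32LE
        }
        // Prints:
        // 1 1 1 2 4
        // 1 1 2 2 4
        // 2 1 4 4 4
    }
}
```

The last line is the interesting one: a single code point beyond the BMP occupies two System.Char values in memory, and 4 bytes in every UTF.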
All UTFs are equivalent: despite the number of bits shown in their names, every one of them can represent all code points. In files, they are usually detected by the BOM (byte order mark). Please see:
http://en.wikipedia.org/wiki/Unicode/,
http://en.wikipedia.org/wiki/Code_point/,
http://en.wikipedia.org/wiki/Byte_order_mark/,
http://unicode.org/,
http://unicode.org/faq/utf_bom.html.
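As an illustration of BOM detection, here is a rough sketch. BomDemo and DetectByBom are hypothetical names of mine, not part of .NET; the sketch only knows the four encodings listed in it (UTF-32BE, for one, is not covered) and relies on Encoding.GetPreamble, which returns the BOM bytes of an encoding.

```csharp
using System;
using System.Text;

static class BomDemo
{
    // Hypothetical helper: returns the encoding whose BOM starts the data,
    // or null if none of the known BOMs matches.
    public static Encoding DetectByBom(byte[] data)
    {
        // UTF-32LE must be tried before UTF-16LE: its BOM FF FE 00 00 begins with FF FE.
        Encoding[] candidates =
        {
            Encoding.UTF32, Encoding.UTF8, Encoding.Unicode, Encoding.BigEndianUnicode
        };
        foreach (Encoding candidate in candidates)
        {
            byte[] bom = candidate.GetPreamble();
            if (data.Length < bom.Length)
                continue;
            bool match = true;
            for (int i = 0; i < bom.Length; i++)
            {
                if (data[i] != bom[i]) { match = false; break; }
            }
            if (match)
                return candidate;
        }
        return null; // no BOM: the encoding has to be known from elsewhere
    }

    public static void Main()
    {
        byte[] utf8File = { 0xEF, 0xBB, 0xBF, 0x41 };  // UTF-8 BOM followed by "A"
        Encoding found = DetectByBom(utf8File);
        Console.WriteLine(found != null ? found.WebName : "no BOM"); // utf-8
    }
}
```

In real code you rarely need to do this by hand: as far as I know, System.IO.StreamReader has constructors with a detectEncodingFromByteOrderMarks parameter that do the detection for you.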
[EDIT]
In memory, you always work with strings. When you need to pass a string over the network or persist it in a file, you choose an encoding, which represents the text as an array of bytes and vice versa. You need to choose only one of the UTFs; prefer UTF-8. To do it directly, use the class
System.Text.Encoding
and/or its derived classes for each particular encoding. Please see:
http://msdn.microsoft.com/en-us/library/system.text.encoding.aspx.
You can directly use the methods
GetBytes
(text to an array of bytes) and
GetChars
(an array of bytes back to Unicode characters).
For example, to get a string from an array of bytes:
byte[] data = // ... (the bytes received from the network or read from a file)
string value = new string(System.Text.Encoding.UTF8.GetChars(data));
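A complete round trip could look like the sketch below. RoundTripDemo, ToUtf8 and FromUtf8 are names I made up for the illustration. It also uses Encoding.GetString, which returns the string directly, so you don't have to go through a char array.

```csharp
using System;
using System.Text;

static class RoundTripDemo
{
    // Text to bytes, e.g. before sending it over the network.
    public static byte[] ToUtf8(string text)
    {
        return Encoding.UTF8.GetBytes(text);
    }

    // Bytes back to text, e.g. after receiving them.
    public static string FromUtf8(byte[] data)
    {
        return Encoding.UTF8.GetString(data);
    }

    public static void Main()
    {
        // 6 ASCII characters followed by 6 Cyrillic letters.
        string original = "Text: \u043F\u0440\u0438\u0432\u0435\u0442";
        byte[] wire = ToUtf8(original);
        string restored = FromUtf8(wire);

        Console.WriteLine(original.Length);      // 12 (UTF-16 code units)
        Console.WriteLine(wire.Length);          // 18 (6 x 1 byte + 6 x 2 bytes)
        Console.WriteLine(restored == original); // True
    }
}
```

Both sides must agree on the encoding, of course. If the receiver decodes with a different one, no exception is thrown; you just get garbage text.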
—SA