Which data type should I use to read Chinese and English characters

Question


Which data type should I use to read Chinese and English characters from a stream?

Should I use Byte or Char?

Posted 22-Mar-12 0:53am

Balaji1982


Comments

Lakamraju Raghuram 22-Mar-12 6:59am

Which programming language are you talking about?

Balaji1982 22-Mar-12 7:06am

C#

2 solutions

Solution 1

Hi,

Use NVARCHAR on the database side for inserting data. On the front end, when passing a parameter, you need to add N before the value!

Check the link below!

http://forums.asp.net/t/1427585.aspx/1?C+datatype+for+all+world+languages

Happy coding!!!

Posted 22-Mar-12 1:11am

visnumca123

Comments

Uday P.Singh 25-Mar-12 5:54am

Where did the OP ask about database insertion?


This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Sergey Alexandrovich Kryukov · Accepted Answer · 2012-03-22T07:26:00

Not byte! How could it be? Use System.String, which is a Unicode string, so it supports most languages at the same time. When you communicate through a network or any other kind of stream, all text data is converted to and from an array of bytes anyway, but each character takes a different number of bytes (1 to 4 in UTF-8), because Unicode supports code points in the range 0 to 0x10FFFF. The particular representation depends on the UTF used for serialization. Internally, in memory, .NET (and Windows itself) uses UTF-16LE, where each character takes either one 2-byte word or two such words, called a surrogate pair. Surrogate pairs are needed for characters beyond the Basic Multilingual Plane (BMP), which covers code points 0 to 0xFFFF (excluding the special ranges reserved for the surrogates themselves).
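To make the byte counts concrete, here is a minimal sketch that measures a few sample characters. The class name EncodingSizes and the helper ByteCount are made up for illustration; the sample characters are my own choice, not from the question.

```csharp
using System;
using System.Text;

public static class EncodingSizes
{
    // Hypothetical helper: number of bytes the text occupies in the given encoding.
    public static int ByteCount(string text, Encoding encoding)
    {
        return encoding.GetByteCount(text);
    }

    public static void Main()
    {
        string latin = "A";               // U+0041
        string chinese = "中";            // U+4E2D, inside the BMP
        string beyondBmp = "\U00020000";  // U+20000, outside the BMP

        Console.WriteLine(ByteCount(latin, Encoding.UTF8));       // 1
        Console.WriteLine(ByteCount(chinese, Encoding.UTF8));     // 3
        Console.WriteLine(ByteCount(beyondBmp, Encoding.UTF8));   // 4
        Console.WriteLine(ByteCount(chinese, Encoding.Unicode));  // 2 (UTF-16LE)

        // In memory the character beyond the BMP is stored as two char values.
        Console.WriteLine(beyondBmp.Length);                      // 2
        Console.WriteLine(char.IsSurrogatePair(beyondBmp, 0));    // True
    }
}
```

Note that string.Length counts UTF-16 code units, not characters, which is why the last sample reports 2.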

All UTFs are equivalent: despite names that show a number of bits, they all support all code points. In files, the UTF in use is usually detected by the BOM (byte order mark). Please see:

http://en.wikipedia.org/wiki/Unicode/,
http://en.wikipedia.org/wiki/Code_point/,
http://en.wikipedia.org/wiki/Byte_order_mark/,

http://unicode.org/,
http://unicode.org/faq/utf_bom.html.
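As a rough illustration of BOM detection, the sketch below writes text with a BOM in two UTF-16 byte orders and lets StreamReader work out which one it is. BomDemo, Decode and EncodeWithBom are hypothetical names; the UTF-8 fallback for input without a BOM is an assumption of this sketch.

```csharp
using System;
using System.IO;
using System.Text;

public static class BomDemo
{
    // Decodes bytes with a StreamReader that checks for a byte order mark first
    // and falls back to UTF-8 when none is present.
    public static string Decode(byte[] bytes)
    {
        using (var stream = new MemoryStream(bytes))
        using (var reader = new StreamReader(stream, Encoding.UTF8, detectEncodingFromByteOrderMarks: true))
        {
            return reader.ReadToEnd();
        }
    }

    // Prepends the encoding's BOM (its "preamble" in .NET terms) to the encoded text.
    public static byte[] EncodeWithBom(string text, Encoding encoding)
    {
        byte[] bom = encoding.GetPreamble();
        byte[] body = encoding.GetBytes(text);
        byte[] result = new byte[bom.Length + body.Length];
        Buffer.BlockCopy(bom, 0, result, 0, bom.Length);
        Buffer.BlockCopy(body, 0, result, bom.Length, body.Length);
        return result;
    }

    public static void Main()
    {
        string text = "Hello 你好";
        byte[] littleEndian = EncodeWithBom(text, Encoding.Unicode);          // UTF-16LE
        byte[] bigEndian = EncodeWithBom(text, Encoding.BigEndianUnicode);    // UTF-16BE

        // Both decode to the same string even though the reader was told UTF-8.
        Console.WriteLine(Decode(littleEndian) == text);
        Console.WriteLine(Decode(bigEndian) == text);
    }
}
```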

[EDIT]

In memory, you always work with strings. When you need to pass the text over a network or persist it in a file, you choose an encoding, which represents the text as an array of bytes and vice versa. You only need to choose one of the UTFs; prefer UTF-8. To do it directly, use the class System.Text.Encoding and/or its derived classes for each particular encoding. Please see:
http://msdn.microsoft.com/en-us/library/system.text.encoding.aspx.

You can directly use the methods GetBytes (text to array of bytes), GetChars (array of bytes to Unicode characters) and GetString (array of bytes to a string).

For example, to get a string from an array of bytes:

C#

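// Stand-in for bytes received from the network
byte[] data = System.Text.Encoding.UTF8.GetBytes("中文 and English");

//...
// Decode the UTF-8 bytes back into a .NET string
string value = System.Text.Encoding.UTF8.GetString(data);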
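Since the question is about reading from a stream, here is a minimal sketch of the same idea using System.IO.StreamReader, assuming the sender encoded the text as UTF-8. The names StreamTextReader and ReadAllText are hypothetical, and the MemoryStream only stands in for a real network or file stream.

```csharp
using System;
using System.IO;
using System.Text;

public static class StreamTextReader
{
    // Reads all text from a stream, decoding the bytes as UTF-8.
    // Disposing the reader also closes the underlying stream.
    public static string ReadAllText(Stream stream)
    {
        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        // Stand-in for a network or file stream carrying UTF-8 text.
        byte[] payload = Encoding.UTF8.GetBytes("English and 中文");
        using (var stream = new MemoryStream(payload))
        {
            Console.WriteLine(ReadAllText(stream));
        }
    }
}
```

The reader decodes incrementally, so a multi-byte character that happens to be split across two underlying reads is still decoded correctly, which is hard to get right when handling raw bytes by hand.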

—SA