Traditional Chinese characters aren’t being read from network stream

Question

Hi Guys,

I am facing an unusual situation in my application. I have a Windows Service installed and running 24/7 on my server. The service listens to a port on an offsite server. The data received is a mix of English and Traditional Chinese characters. I write the data received from the offsite server to a database table and to a file.

The issue is that the Traditional Chinese characters are not read correctly, while the English characters are read perfectly.

I am using the following code:

TcpClient clientSocket; // connected elsewhere
NetworkStream networkStream = clientSocket.GetStream();

byte[] bytes = new byte[clientSocket.ReceiveBufferSize + 1];
networkStream.Read(bytes, 0, clientSocket.ReceiveBufferSize);

string clientdata = Encoding.GetEncoding(1252).GetString(bytes);

This clientdata contains the string received from the offsite server. For the Chinese characters, the data is not what I expected.

I have tried using both Big5 and 1252 encoding.

Any help would be good.

Thanks
Balaji V

Posted 21-Mar-12 19:46pm

Balaji1982

2 solutions

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Bernhard Hiller · Answer 1 · 2012-03-21T20:55:00

Solution 1

You have to use the same encoding that the "Offsite Server" used when sending the data. Try to contact the developers of that product; that's easier than trying every possible encoding.
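
Once the sender's encoding is known, the receiving side could look roughly like the sketch below. It is a minimal illustration, not the asker's actual service: the class and method names (StreamTextReader, ReadAllText) and the buffer size are hypothetical, and the encoding passed in is assumed to be the one the sender really uses. It decodes only the bytes that Read actually returned and uses a Decoder, which remembers a multi-byte character that was split across two reads.

```csharp
using System;
using System.IO;
using System.Text;

public static class StreamTextReader
{
    // Reads the stream until it ends and decodes the bytes with the given encoding.
    // Assumption: 'encoding' matches what the sender used; confirm this with the sender.
    public static string ReadAllText(Stream stream, Encoding encoding, int bufferSize = 4096)
    {
        Decoder decoder = encoding.GetDecoder(); // keeps partial characters between calls
        byte[] bytes = new byte[bufferSize];
        char[] chars = new char[encoding.GetMaxCharCount(bufferSize)];
        StringBuilder text = new StringBuilder();

        int bytesRead;
        while ((bytesRead = stream.Read(bytes, 0, bytes.Length)) > 0)
        {
            // Decode only the bytes that were actually received in this read.
            int charCount = decoder.GetChars(bytes, 0, bytesRead, chars, 0);
            text.Append(chars, 0, charCount);
        }
        return text.ToString();
    }
}
```

With a NetworkStream, Read returns 0 only when the remote side closes the connection, so a long-running service would more likely process the text inside the loop than wait for the end. If the sender turns out to use Big5, Encoding.GetEncoding("big5") would be the encoding to pass in; on .NET Core and later, code-page encodings may first need Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).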

Posted 21-Mar-12 20:55pm

Bernhard Hiller

Sergey Alexandrovich Kryukov · Answer 2 · 2012-03-21T21:03:00

Solution 2

No! Don't use Big5 or 1252. .NET works with Unicode. Well, you could use anything in between, but when you present the text, write it to a file, etc., you should only use Unicode. As for the 1252 encoding, it only supports Latin characters, not Chinese.

By the way, I'm pretty sure all your characters are transmitted through the communication channel just fine.

The pre-Unicode era is over; just face it.

--SA

Posted 21-Mar-12 21:03pm

Sergey Alexandrovich Kryukov

Comments

Balaji1982 22-Mar-12 3:15am

Hi SA,

Which Unicode encoding should I use that supports both English and Traditional Chinese?

There are quite a few, I guess:

UTF7, UTF8, Unicode, BigEndianUnicode, UTF32

Sergey Alexandrovich Kryukov 22-Mar-12 3:32am

Any of them, but UTF-8 is the most economical for mostly Latin text and is the de facto standard for the Web. As it is byte-oriented, it takes only one byte for Latin letters and basic punctuation (the code points in the ASCII range); other code points take two to four bytes (three for most Chinese characters). Overall, it gives considerable economy in size for such text. Besides, there is only one UTF-8; there is no UTF-8LE or UTF-8BE.
--SA
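
The size trade-off described above can be checked directly. The sketch below is only an illustration; the class and method names (EncodingSizes, Utf8Bytes, Utf16Bytes) are hypothetical helpers, while Encoding.UTF8, Encoding.Unicode (UTF-16 little-endian) and GetByteCount are standard .NET members.

```csharp
using System;
using System.Text;

public static class EncodingSizes
{
    // Number of bytes the string occupies when encoded as UTF-8.
    public static int Utf8Bytes(string s)
    {
        return Encoding.UTF8.GetByteCount(s);
    }

    // Number of bytes the string occupies when encoded as UTF-16 (Encoding.Unicode).
    public static int Utf16Bytes(string s)
    {
        return Encoding.Unicode.GetByteCount(s);
    }
}
```

For example, "A" takes 1 byte in UTF-8 and 2 in UTF-16, while "中" takes 3 bytes in UTF-8 and 2 in UTF-16, so which encoding is smaller depends on how much of the text is Chinese.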

Sergey Alexandrovich Kryukov 22-Mar-12 3:35am

All UTFs are equivalent. Despite the number of bits in their names, they all support the code points 0 to 0x10FFFF. In files, the encoding is usually detected by the BOM. Please see:

http://en.wikipedia.org/wiki/Unicode
http://en.wikipedia.org/wiki/Code_point
http://en.wikipedia.org/wiki/Byte_order_mark

http://unicode.org/
http://unicode.org/faq/utf_bom.html

--SA
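
As a rough illustration of the BOM detection mentioned above, the sketch below lets StreamReader pick the encoding from a byte order mark. The class and method names (BomDemo, ReadWithBomDetection) are hypothetical; the StreamReader constructor with detectEncodingFromByteOrderMarks is a standard .NET API. This applies to files; whether data arriving over a socket carries a BOM at all depends on the sender, so treat that as an assumption to verify.

```csharp
using System;
using System.IO;
using System.Text;

public static class BomDemo
{
    // Decodes the bytes, letting StreamReader detect UTF-8/UTF-16/UTF-32 from a BOM.
    // UTF-8 is only the fallback used when no BOM is present.
    public static string ReadWithBomDetection(byte[] fileBytes)
    {
        using (var reader = new StreamReader(new MemoryStream(fileBytes), Encoding.UTF8, true))
        {
            return reader.ReadToEnd();
        }
    }
}
```

The BOM bytes for a given encoding are available from Encoding.GetPreamble(), for example Encoding.Unicode.GetPreamble() for UTF-16 little-endian.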