Hi Guys,

I am facing an unusual situation in my application. I have a Windows Service installed and running 24/7 on my server. The service listens to a port on an offsite server. The data received is a mix of English and Traditional Chinese characters. I write the data received from the offsite server to a database table and to a file.

The issue is that the Traditional Chinese characters are not read correctly, while the English characters are read perfectly.

I am using the following code:

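TcpClient clientSocket; // already connected to the offsite server (setup not shown)
NetworkStream networkStream = clientSocket.GetStream();

byte[] bytes = new byte[clientSocket.ReceiveBufferSize];
// Read returns the number of bytes actually received, which may be less than the buffer size.
int bytesRead = networkStream.Read(bytes, 0, bytes.Length);

// Decode only the bytes that were received, not the unused tail of the buffer.
string clientdata = Encoding.GetEncoding(1252).GetString(bytes, 0, bytesRead);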

This clientdata contains the string received from the offsite server. The Chinese characters in it are not what I expected.

I have tried both the Big5 and 1252 encodings.

Any help would be good.

Thanks
Balaji V

You have to use the same encoding that the "Offsite Server" used when sending the data. Try contacting the developers of that product; that is easier than trying every possible encoding.
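To illustrate why the encodings on both ends have to match, here is a minimal sketch. It assumes, purely for the sake of the example, that the sender encodes its text as UTF-8; the real encoding used by the offsite server is not known from this thread. ISO-8859-1 stands in for a single-byte Western code page such as 1252, and the class and method names are made up for the illustration.

```csharp
using System;
using System.Text;

static class EncodingMismatchDemo
{
    // Decodes a received payload with the encoding the caller believes was used.
    public static string Decode(byte[] payload, Encoding encoding)
    {
        return encoding.GetString(payload, 0, payload.Length);
    }

    static void Main()
    {
        // "Hello" followed by two Chinese characters, written as escapes
        // so the source file's own encoding does not matter.
        string original = "Hello \u4E2D\u6587";

        // Assumption for this sketch: the sender encodes as UTF-8.
        byte[] payload = Encoding.UTF8.GetBytes(original);

        string matching = Decode(payload, Encoding.UTF8);
        string mismatched = Decode(payload, Encoding.GetEncoding("iso-8859-1"));

        Console.WriteLine(matching == original);   // True
        Console.WriteLine(mismatched == original); // False
        // The ASCII part survives the wrong decoding, the Chinese part does not.
        Console.WriteLine(mismatched.StartsWith("Hello ", StringComparison.Ordinal)); // True
    }
}
```

This matches the symptom in the question: English text looks fine while the Chinese characters come out garbled.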
 
 
No! Don't use Big5 or 1252. .NET works with Unicode. Well, you could use anything in between, but when you present the text, write it to a file, etc., you should only use Unicode. As for the 1252 encoding, it only supports Latin characters, not Chinese.

By the way, I'm pretty sure all your characters are transmitted through the communication channel just fine.

The pre-Unicode era is over; just face it.

--SA
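As a rough sketch of the "write it to a file as Unicode" advice, the example below saves a mixed English and Traditional Chinese string as UTF-8 and reads it back. The class name, the helper name and the use of a temporary file are assumptions made for the illustration, not part of the original service.

```csharp
using System;
using System.IO;
using System.Text;

static class Utf8FileDemo
{
    // Writes the text to a temporary file as UTF-8 and reads it back.
    public static string RoundTrip(string text)
    {
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllText(path, text, Encoding.UTF8);
            return File.ReadAllText(path, Encoding.UTF8);
        }
        finally
        {
            File.Delete(path); // clean up the temporary file
        }
    }

    static void Main()
    {
        // English text plus four Traditional Chinese characters (as escapes).
        string text = "Order 42: \u7E41\u9AD4\u4E2D\u6587";
        Console.WriteLine(RoundTrip(text) == text); // True
    }
}
```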
 
 
Comments
Balaji1982 22-Mar-12 3:15am    
Hi SA,

Which Unicode encoding should I use that supports both English and Traditional Chinese?

There are quite a few, I guess:

UTF7, UTF8, Unicode, BigEndianUnicode, UTF32
Sergey Alexandrovich Kryukov 22-Mar-12 3:32am    
Any of them, but UTF-8 is the most economical and is the de facto standard for the Web. As it is byte-oriented, it takes only one byte for basic Latin letters and basic punctuation (the code points falling into the ASCII range); other code points take two to four bytes. For text that is mostly ASCII, this gives a considerable saving in size. Besides, there is only one UTF-8; there are no UTF-8LE or UTF-8BE variants.
--SA
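The size trade-off can be checked with a small sketch that counts the encoded bytes for an English and a Chinese sample string. The sample strings and class name are chosen only for the illustration.

```csharp
using System;
using System.Text;

static class ByteCountDemo
{
    static void Main()
    {
        string english = "Hello";
        string chinese = "\u4E2D\u6587"; // two Chinese characters, written as escapes

        // ASCII-range text: one byte per character in UTF-8.
        Console.WriteLine(Encoding.UTF8.GetByteCount(english));    // 5
        Console.WriteLine(Encoding.Unicode.GetByteCount(english)); // 10
        Console.WriteLine(Encoding.UTF32.GetByteCount(english));   // 20

        // These Chinese characters: three bytes each in UTF-8, two in UTF-16.
        Console.WriteLine(Encoding.UTF8.GetByteCount(chinese));    // 6
        Console.WriteLine(Encoding.Unicode.GetByteCount(chinese)); // 4
        Console.WriteLine(Encoding.UTF32.GetByteCount(chinese));   // 8
    }
}
```

So UTF-8 is the most compact choice for mostly English data, while UTF-16 (`Encoding.Unicode`) is somewhat smaller for text that is mostly Chinese. All of them represent both scripts correctly.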
Sergey Alexandrovich Kryukov 22-Mar-12 3:35am    
All UTFs are equivalent. Despite the number of bits in their names, they all support code points from 0 to 0x10FFFF. In files, they are usually detected by the BOM. Please see:

http://en.wikipedia.org/wiki/Unicode
http://en.wikipedia.org/wiki/Code_point
http://en.wikipedia.org/wiki/Byte_order_mark

http://unicode.org/
http://unicode.org/faq/utf_bom.html

--SA
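For reference, a short sketch that prints the BOM (preamble) bytes that .NET's built-in encoding objects emit at the start of a file. The class and helper names are invented for the example.

```csharp
using System;
using System.Text;

static class BomDemo
{
    // Returns the encoding's byte order mark as a hex string such as "EF-BB-BF".
    public static string PreambleHex(Encoding encoding)
    {
        return BitConverter.ToString(encoding.GetPreamble());
    }

    static void Main()
    {
        Console.WriteLine(PreambleHex(Encoding.UTF8));             // EF-BB-BF
        Console.WriteLine(PreambleHex(Encoding.Unicode));          // FF-FE
        Console.WriteLine(PreambleHex(Encoding.BigEndianUnicode)); // FE-FF
        Console.WriteLine(PreambleHex(Encoding.UTF32));            // FF-FE-00-00
    }
}
```

Note that a BOM is normally a file convention. Whether a network stream starts with one depends entirely on the sender.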

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


