I have data encoded in UTF-8. When I convert the UTF-8 data to ISO, certain characters get broken. I need to remove all the broken Unicode characters from the ISO-encoded data. I would like to do this in C#. Please suggest a solution.

Current code is as follows:

C#
Encoding iso = Encoding.GetEncoding("ISO-8859-1");
Encoding utf8 = Encoding.UTF8;
// nonunicodeBytes holds the raw input data (UTF-8 encoded bytes).
strRawJobtext = utf8.GetString(nonunicodeBytes);
// Re-encode as ISO-8859-1; characters outside that character set are replaced here.
nonunicodeBytes = Encoding.Convert(utf8, iso, utf8.GetBytes(strRawJobtext));
strRawJobtext = iso.GetString(nonunicodeBytes);


Thanks,
Ruthra Vijayakumar
Posted
Updated 28-May-13 21:39pm
v2
Comments
Sergey Alexandrovich Kryukov 29-May-13 3:02am    
What do you mean by ISO, exactly? There are many ISOs. However, for .NET, Unicode is the standard, and Unicode is used internally for strings (UTF-16LE). Please show the declaration of ISO.

Anyway, only Unicode covers the entire character repertoire. Any of the UTFs will do, but nothing else.

—SA
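To illustrate the point above, here is a minimal sketch that round-trips a sample string through UTF-8 and through ISO-8859-1. The class name `RoundTripDemo`, the helper `RoundTrip`, and the sample text are made up for this illustration; only the `System.Text.Encoding` calls come from .NET.

```csharp
using System;
using System.Text;

public static class RoundTripDemo
{
    // Encodes the text with the given encoding and decodes it back.
    // If the encoding cannot represent a character, the result differs from the input.
    public static string RoundTrip(string text, Encoding encoding)
    {
        return encoding.GetString(encoding.GetBytes(text));
    }

    public static void Main()
    {
        // Hypothetical sample: U+00E9 exists in ISO-8859-1, U+20AC (euro sign) does not.
        string sample = "caf\u00E9 \u20AC5";
        Encoding iso = Encoding.GetEncoding("ISO-8859-1");

        // UTF-8 covers all of Unicode, so the text survives unchanged.
        Console.WriteLine(RoundTrip(sample, Encoding.UTF8) == sample); // True

        // ISO-8859-1 has no euro sign, so the text comes back altered.
        Console.WriteLine(RoundTrip(sample, iso) == sample); // False
    }
}
```

The second comparison fails because the euro sign has no ISO-8859-1 code, so the encoder has to substitute something else for it.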
Ruthra vijayakumar 29-May-13 3:39am    
I have added the declaration part

1 solution

Please see my comment on the question. Your input can be in any encoding, but if it is UTF-8, it can potentially contain any character in the Unicode repertoire. That means you should use only Unicode for further processing, nothing else. You can store the data in any of the UTFs; they are all equivalent in what they can represent, but UTF-8 is preferable in almost all cases. With other encodings, you may lose characters.

—SA
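If the target really must be ISO-8859-1, one possible way to drop the characters it cannot represent, instead of getting replacement characters, is to request the encoding with an empty replacement fallback. This is only a sketch, not part of the answer above: the class `Latin1Filter` and the method `StripNonLatin1` are hypothetical names, and it assumes the input bytes are UTF-8 as described in the question.

```csharp
using System;
using System.Text;

public static class Latin1Filter
{
    // Decodes UTF-8 bytes and returns only the characters that ISO-8859-1 can represent.
    public static string StripNonLatin1(byte[] utf8Bytes)
    {
        // An empty replacement string makes the encoder emit nothing for
        // characters that have no ISO-8859-1 code, instead of a substitute.
        Encoding iso = Encoding.GetEncoding(
            "ISO-8859-1",
            new EncoderReplacementFallback(string.Empty),
            new DecoderReplacementFallback(string.Empty));

        string text = Encoding.UTF8.GetString(utf8Bytes);
        byte[] isoBytes = iso.GetBytes(text);
        return iso.GetString(isoBytes);
    }

    public static void Main()
    {
        // Hypothetical input: the euro sign (U+20AC) is not in ISO-8859-1.
        byte[] input = Encoding.UTF8.GetBytes("caf\u00E9 \u20AC5");
        Console.WriteLine(StripNonLatin1(input) == "caf\u00E9 5"); // True
    }
}
```

Note that this silently discards data. Invalid UTF-8 byte sequences are decoded to U+FFFD first, and since that character is not in ISO-8859-1 either, it is dropped as well.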
 
Share this answer
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


