Remove Unicode characters from an ISO-encoded string in C#

Question

I have data that is encoded in UTF-8. When I convert the UTF-8 data to ISO-8859-1, certain characters get broken. I need to remove all the broken Unicode characters from the ISO-encoded data. I would like to do this in C#. Please suggest a solution.

Current code is as follows:

C#

// nonunicodeBytes holds the raw input data (UTF-8 bytes).
Encoding iso = Encoding.GetEncoding("ISO-8859-1");
Encoding utf8 = Encoding.UTF8;
strRawJobtext = utf8.GetString(nonunicodeBytes);
nonunicodeBytes = Encoding.Convert(utf8, iso, utf8.GetBytes(strRawJobtext));
strRawJobtext = iso.GetString(nonunicodeBytes);

Thanks,
Ruthra Vijayakumar

Posted 28-May-13 20:50pm

Ruthra vijayakumar

Updated 28-May-13 21:39pm

v2

Comments

Sergey Alexandrovich Kryukov 29-May-13 3:02am

What do you mean by ISO, exactly? There are many ISO encodings. In .NET, however, Unicode is the standard, and strings are stored internally as Unicode (UTF-16LE). Please show the declaration of ISO.

Anyway, only Unicode covers the whole character repertoire. Any of the UTFs will do, but nothing else.

—SA

Ruthra vijayakumar 29-May-13 3:39am

I have added the declaration part

1 solution

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Sergey Alexandrovich Kryukov · Answer 1 · 2013-05-28T21:05:00

Please see my comment to the question. Your input can be in any encoding, but if it is UTF-8, it can potentially contain any character of the Unicode repertoire. That means you should use only Unicode for further processing, nothing else. You can store the data in any of the UTFs, as they are all equivalent, but UTF-8 is preferable in almost all cases. With any other encoding, such as ISO-8859-1, you may lose characters.

—SA
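
If the target really has to be ISO-8859-1, the characters that encoding cannot represent will be lost either way, so one option is to drop them deliberately instead of letting them turn into "?" or other broken output. The following is only a minimal sketch of that idea, not the method from the answer above. The names Latin1Filter and StripNonLatin1 are hypothetical, and the input is assumed to be UTF-8 bytes as in the question. It relies on Encoding.GetEncoding with an EncoderReplacementFallback whose replacement string is empty, so every character outside ISO-8859-1 is replaced by nothing.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration; the class and method names are assumptions.
public static class Latin1Filter
{
    // Decodes UTF-8 bytes and returns the text with every character
    // that ISO-8859-1 cannot represent removed.
    public static string StripNonLatin1(byte[] utf8Bytes)
    {
        // Decode the raw input. Invalid UTF-8 sequences become U+FFFD,
        // which ISO-8859-1 cannot represent either, so they are dropped below.
        string text = Encoding.UTF8.GetString(utf8Bytes);

        // ISO-8859-1 encoding whose encoder fallback emits an empty string
        // for unmappable characters instead of the default replacement.
        Encoding latin1 = Encoding.GetEncoding(
            "ISO-8859-1",
            new EncoderReplacementFallback(string.Empty),
            new DecoderReplacementFallback(string.Empty));

        // Encoding drops the unmappable characters; decoding the result
        // gives back a string that contains only ISO-8859-1 characters.
        byte[] latin1Bytes = latin1.GetBytes(text);
        return latin1.GetString(latin1Bytes);
    }
}
```

For example, calling Latin1Filter.StripNonLatin1(Encoding.UTF8.GetBytes("na\u00EFve \u20AC5")) should keep the "ï" (U+00EF, which exists in ISO-8859-1) and drop the euro sign (U+20AC, which does not). Note that this silently discards data, which is exactly the loss the answer warns about, so it only makes sense when the consumer of the text cannot accept Unicode.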