Removing non-ASCII characters also removes Chinese characters

Question

I'm using the following regular expression to remove non-ASCII characters from a string:

Regex.Replace(item, @"[^\u0020-\u007E]", string.Empty);

I'm running into a problem when parsing a Chinese string: the expression removes every character, which I don't want. Is there a way around this?

What I have tried:

I've tried the following:

Regex.Replace(item, @"[^\u0020-\u007E]", string.Empty);

Posted 12-Jan-17 4:28am

Member 4336594

Updated 12-Jan-17 4:53am

Comments

Tomas Takac 12-Jan-17 10:48am

What exactly is your requirement? Chinese characters are not ASCII, so the "removing non-ASCII characters" part works as intended. If you want to remove all characters that are not letters or numbers, have a look at the Char.IsLetterOrDigit method.

Jochen Arndt 12-Jan-17 10:50am

If you have a string containing only Chinese characters, it will of course remove all of them, resulting in an empty string.
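To make that concrete, here is a small sketch that runs the exact pattern from the question against a Chinese-only string and a mixed string. The wrapper names (AsciiFilterDemo, StripNonAscii) are only illustrative.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AsciiFilterDemo
{
    // Same pattern as in the question: anything outside U+0020..U+007E
    // (printable ASCII) is replaced with an empty string.
    public static string StripNonAscii(string input) =>
        Regex.Replace(input, @"[^\u0020-\u007E]", string.Empty);

    public static void Main()
    {
        // A Chinese-only string has no characters in the allowed range.
        Console.WriteLine(StripNonAscii("你好").Length); // 0

        // In a mixed string only the ASCII part (including both spaces) remains.
        Console.WriteLine(StripNonAscii("abc 你好 123") == "abc  123"); // True
    }
}
```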

Member 4336594 12-Jan-17 11:00am

I encountered a problem using a web service: the string I passed to it contained a non-ASCII character and the call failed. To get around this I removed all non-ASCII characters. The requirement, however, is to still be able to process other languages, which this approach doesn't allow. The original character looked like a 't' without the tail (forgive me for the poor description).

1 solution

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

#realJSOP · Accepted Answer · 2017-01-12T04:53:00

Solution 1

Well, Chinese characters lie outside the ASCII range (U+0000 to U+007F), so they are, by definition, non-ASCII. What result did you expect?

Posted 12-Jan-17 4:53am

#realJSOP

Comments

Member 4336594 12-Jan-17 11:20am

This is the character I wanted to remove:
http://images.devs-on.net/Image/1UIb2MJ0yzmafaTM-Region.png
The above code removes it, but it also removes Chinese and other non-English text. I'm not sure I'm using the correct terminology. I would like to remove any characters like this one.
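One possible direction, sketched under assumptions: instead of whitelisting ASCII, filter by Unicode general category so that letters from any script are kept and only symbols and control characters are removed. The thread never identifies the actual character in the image, so whether this pattern removes it depends on its category; if it turns out to be punctuation, it would have to be added to the pattern explicitly. The names CategoryFilterDemo and RemoveSymbolsAndControls are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CategoryFilterDemo
{
    // Keep letters (L), combining marks (M), numbers (N), punctuation (P)
    // and separators such as spaces (Z) from any script. Everything else,
    // i.e. symbols (S) and control/format/surrogate characters (C), is removed.
    private static readonly Regex Unwanted =
        new Regex(@"[^\p{L}\p{M}\p{N}\p{P}\p{Z}]");

    public static string RemoveSymbolsAndControls(string input) =>
        Unwanted.Replace(input, string.Empty);

    public static void Main()
    {
        // U+2122 (trade mark sign) is a symbol; U+0007 is a control character.
        string cleaned = RemoveSymbolsAndControls("A\u2122 你好\u0007!");
        Console.WriteLine(cleaned == "A 你好!"); // True
    }
}
```

Because .NET regular expressions work on UTF-16 code units, characters stored as surrogate pairs (outside the Basic Multilingual Plane) would also be removed by this pattern, which may or may not matter for the data in question.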