I'm using the following regular expression to remove non-ASCII characters from a string:
Regex.Replace(item, @"[^\u0020-\u007E]", string.Empty);


I'm running into a problem when parsing a Chinese string. The expression removes all of the characters, which I don't want. Is there a way around this?

What I have tried:

I've tried the following
Regex.Replace(item, @"[^\u0020-\u007E]", string.Empty);
Updated 12-Jan-17 4:53am
Comments
Tomas Takac 12-Jan-17 10:48am    
What exactly is your requirement? Chinese characters are not ASCII, so the "removing non-ASCII characters" part works as intended. If you want to remove all characters that are not letters or numbers, have a look at the Char.IsLetterOrDigit method.
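A minimal sketch of the Char.IsLetterOrDigit approach mentioned above. The class and method names are made up for illustration, and keeping the space character is an assumption about what the poster wants. Note that this also drops punctuation, and that characters outside the Basic Multilingual Plane (stored as surrogate pairs) would be dropped as well, so treat it as a starting point only.

```csharp
using System;
using System.Linq;

public static class LetterDigitFilter
{
    // Hypothetical helper: keeps letters and digits from any script,
    // plus the ordinary space character, and drops everything else.
    public static string KeepLettersAndDigits(string input)
    {
        return new string(
            input.Where(c => char.IsLetterOrDigit(c) || c == ' ').ToArray());
    }
}
```

For example, `LetterDigitFilter.KeepLettersAndDigits("abc 中文 †123")` returns `"abc 中文 123"`: the Chinese letters survive, while the dagger symbol is removed.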
Jochen Arndt 12-Jan-17 10:50am    
If you have a string containing only Chinese characters, it will of course remove all of them, resulting in an empty string.
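To make that concrete, here is a small sketch that wraps the expression from the question in a method (the class and method names are illustrative, not from the original post). The character class `[^\u0020-\u007E]` matches everything except printable ASCII, so every Chinese character is matched and removed.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AsciiOnlyDemo
{
    // Same pattern as in the question: removes every character
    // that is not in the printable ASCII range U+0020..U+007E.
    public static string StripNonAscii(string item)
    {
        return Regex.Replace(item, @"[^\u0020-\u007E]", string.Empty);
    }
}
```

With this, `AsciiOnlyDemo.StripNonAscii("中文")` returns an empty string, and `AsciiOnlyDemo.StripNonAscii("Hello, 世界!")` returns `"Hello, !"`.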
Member 4336594 12-Jan-17 11:00am    
I encountered a problem using a web service: the string I passed to it contained a non-ASCII character, and the call failed. To get around this, I removed all non-ASCII characters. The requirement is to still be able to process other languages, which this approach does not allow. The original character looked like a 't' without the tail (forgive me for the poor description).

1 solution

Well, Chinese characters lie outside the ASCII range (U+0000 to U+007F), so they are, by definition, non-ASCII. What result did you expect?
Comments
Member 4336594 12-Jan-17 11:20am    
This is the character I wanted to remove:
http://images.devs-on.net/Image/1UIb2MJ0yzmafaTM-Region.png
The above code does remove it, but it also removes Chinese and other non-English text. I'm not sure I'm using the correct terminology. I would like to remove only characters like this one.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


