Hi,
I am trying to detect the encoding of a file: Unicode, UTF-8, UTF-8 with BOM, ANSI, etc. I can detect every encoding except ANSI (Encoding.Default/Windows-1252); I am not able to differentiate ANSI from UTF-8 without a BOM. I tried different custom classes (Ude, TextFileEncodingDetector, etc.) that guess the encoding, but they do not always get it right. Is there any way to do it?

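If the file contains no bytes >= 0x80, it is plain ASCII, and Windows-1252 and UTF-8 encode it identically, so the two are indistinguishable (unless a BOM is present).

If it does contain bytes >= 0x80, it is a matter of checking the document for tell-tale indicators. In UTF-8, such bytes only occur in well-formed multi-byte sequences (a lead byte in 0xC2-0xF4 followed by one to three continuation bytes in 0x80-0xBF); Windows-1252 text with accented characters almost never happens to form valid sequences. See: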

http://en.wikipedia.org/wiki/UTF-8#Codepage_layout
http://en.wikipedia.org/wiki/Windows-1252#Codepage_layout
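
A minimal sketch of that check, written in Python for brevity. The function name `guess_encoding` and the returned labels are made up for illustration, and the Windows-1252 result is only a fallback assumption: the code cannot prove a file is Windows-1252, only that it is not valid UTF-8.

```python
import codecs


def guess_encoding(data: bytes) -> str:
    """Heuristically label raw file bytes (hypothetical helper, not a library API)."""
    # A BOM is the only unambiguous marker. UTF-32 BOMs are not handled here.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith(codecs.BOM_UTF16_LE) or data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16"

    # No bytes >= 0x80: pure ASCII, identical in UTF-8 and Windows-1252.
    if all(b < 0x80 for b in data):
        return "ascii"

    # Bytes >= 0x80 are present: a strict decode succeeds only if every
    # such byte is part of a well-formed UTF-8 multi-byte sequence.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 is treated as ANSI.
        return "cp1252"


if __name__ == "__main__":
    print(guess_encoding("café".encode("utf-8")))
    print(guess_encoding("café".encode("cp1252")))
```

The same idea should carry over to .NET by decoding with `new UTF8Encoding(false, true)`, which throws on invalid bytes, and falling back to Windows-1252 when it does. Note that a short Windows-1252 file can, rarely, also happen to be valid UTF-8, so this remains a guess.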
Comments
jebin Cherian 18-Sep-12 10:22am    
Thanks for the reply, Yvan. Will that differ according to the language?
Yvan Rodrigues 18-Sep-12 10:30am    
This would be true of all languages.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


