Friends,

I want my file uploader to accept only .txt files.
But when I rename fileName.exe to fileName.txt and upload it, the upload succeeds.
That's clearly wrong!

Is it possible to check the content of a .txt file in C#?

I have come across ways to validate the file content of .pdf, .xls and .doc files.


Thanks in advance
~~Karthik.J~~

A text file can contain, well, any text. 'Text' means any printable character. So you could validate it by checking that it's valid UTF-8 and that all the code points are greater than 31 (apart from tab, carriage return and line feed, which any multi-line file contains), or, if it's not valid UTF-8, assume that it's ANSI and check that all the byte values are greater than 31 (with the same exceptions). Simply checking the byte values is enough in both cases, because UTF-8 encodes every non-ASCII character using only bytes of 0x80 and above, so a byte below 32 is always a genuine control character.
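Here is a minimal C# sketch of that check, assuming the whole upload fits in a byte array. The class and method names (`TextFileCheck`, `HasOnlyTextBytes`, `IsValidUtf8`) are hypothetical, and allowing tab, LF and CR is an assumption added so that ordinary multi-line files pass. Because bytes below 0x80 in valid UTF-8 are always plain ASCII, a single byte-level pass covers both the UTF-8 and the ANSI case; the strict UTF-8 decode only tells you which encoding to read the file with.

```csharp
using System;
using System.Text;

public static class TextFileCheck
{
    // Strict UTF-8 decoder: the second argument (throwOnInvalidBytes = true) makes
    // GetString throw DecoderFallbackException instead of substituting U+FFFD.
    private static readonly UTF8Encoding StrictUtf8 = new UTF8Encoding(false, true);

    // Returns true if no byte is a C0 control character (0-31), except the
    // whitespace controls tab (9), line feed (10) and carriage return (13).
    public static bool HasOnlyTextBytes(byte[] data)
    {
        foreach (byte b in data)
        {
            bool allowedControl = b == 9 || b == 10 || b == 13;
            if (b < 32 && !allowedControl)
                return false;
        }
        return true;
    }

    // Returns true if the bytes form well-formed UTF-8; otherwise the file is
    // presumably in a single-byte "ANSI" code page.
    public static bool IsValidUtf8(byte[] data)
    {
        try
        {
            StrictUtf8.GetString(data);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }
}
```

For example, `TextFileCheck.HasOnlyTextBytes(File.ReadAllBytes(path))` should return false for a typical renamed .exe, since executable headers contain zero bytes.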

If you expect to read UTF-16 text files, then you're pretty much screwed, because the byte values in there can be anything (even plain ASCII characters are stored with a zero byte next to them).

However, as long as you don't mark the file as executable (Linux) or give it an executable extension (Windows), does it actually matter if it's an EXE in disguise? It can't be run anyway.
 
 
You can try to read a file and see if it contains only what APPEARS to be valid text, but a txt file is raw data: there are no headers you can check.

Why do you think people are going to rename exe files and upload them?
 
 
Comments
[no name] 22-Aug-12 7:13am    
@Chris I need to validate this for my highly secured application.
Christian Graus 22-Aug-12 7:15am    
But if it's named .txt, what does it matter? As long as no one can get into your server to rename it and run it, you're fine; and if someone does have rename rights, they could probably write an exe there anyway. There's no clear way to do this 100% of the time. You could perhaps anticipate that you specifically don't want it to be an exe, and look for the exe header, but validating that it's a txt file and not a Gameboy ROM image, a weird, forgotten image format or something else? Very hard.
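Looking for an exe header could be sketched in C# like this. Windows executables (.exe and .dll) begin with the two ASCII bytes "MZ" (0x4D 0x5A), the old DOS header signature. The class and method names are hypothetical, and this is only a heuristic: it catches a renamed Windows executable, not ELF binaries or other formats, and a genuine text file whose first two characters happen to be "MZ" would be flagged as well.

```csharp
using System;
using System.IO;

public static class ExeHeaderCheck
{
    // DOS/Windows executables start with the ASCII bytes "MZ" (0x4D 0x5A).
    public static bool StartsWithMz(byte[] header) =>
        header.Length >= 2 && header[0] == (byte)'M' && header[1] == (byte)'Z';

    // Reads only the first two bytes of the stream (e.g. an uploaded file's stream).
    public static bool LooksLikeWindowsExecutable(Stream stream)
    {
        var header = new byte[2];
        int read = 0;
        while (read < header.Length)
        {
            // Stream.Read may return fewer bytes than requested, so loop.
            int n = stream.Read(header, read, header.Length - read);
            if (n == 0)
                break; // end of stream: the file is shorter than two bytes
            read += n;
        }
        return read == header.Length && StartsWithMz(header);
    }
}
```

If the stream is seekable and you still need to save the file afterwards, reset `stream.Position = 0` after the check.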

There is no foolproof method for what you are asking.

One thing you can do is to read the content as a byte array, strip out the non-readable characters, and compare the remaining length with the original array length.
If they are the same, then we can be sure there are no non-readable characters.

A better way is to use a threshold: for example, if more than 90% of the content is readable, treat it as a text file.

This method needs some testing.
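The 90% rule of thumb could look like this in C#. It is a sketch with hypothetical names. Counting printable ASCII (32 to 126) plus tab, LF and CR as "readable" is an assumption: bytes of 128 and above (accented letters in UTF-8 or an ANSI code page) count as non-readable here, so files with a lot of non-English text may need a lower threshold or a broader definition of readable.

```csharp
using System;

public static class TextRatioCheck
{
    // Fraction of bytes that are printable ASCII (32-126) or tab/LF/CR.
    public static double ReadableRatio(byte[] data)
    {
        if (data.Length == 0)
            return 1.0; // treat an empty file as text
        int readable = 0;
        foreach (byte b in data)
        {
            if ((b >= 32 && b <= 126) || b == 9 || b == 10 || b == 13)
                readable++;
        }
        return (double)readable / data.Length;
    }

    // Default threshold mirrors the "more than 90% readable" rule of thumb;
    // it is not a tested value.
    public static bool LooksLikeText(byte[] data, double threshold = 0.9) =>
        ReadableRatio(data) > threshold;
}
```

`ReadableRatio(data) == 1.0` corresponds to the strict "same length" comparison, and `LooksLikeText(data)` to the 90% threshold.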

Hope this helps.
cheers
 
 
You would obviously have to check the actual content of the file. See if this link helps.

Each format has its own byte layout, which should tell formats apart even if the extension is renamed.

[EDIT:] Obviously, a real text file is just plain text, so it has no format layout as such.
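Checking such a layout usually means comparing the first few bytes of the upload with known "magic numbers". The C# sketch below covers the formats mentioned in the question: PDF files start with "%PDF", legacy .doc/.xls files are OLE2 compound files starting with D0 CF 11 E0 A1 B1 1A E1, and .docx/.xlsx files are ZIP archives starting with "PK" 03 04. The class name and format labels are hypothetical. A match identifies a binary format; no match proves nothing, since plain text has no signature.

```csharp
using System;
using System.Linq;

public static class SignatureSniffer
{
    // Leading "magic number" bytes of a few common binary formats. Renaming a
    // file does not change these bytes, so they reveal the real format.
    private static readonly (string Format, byte[] Magic)[] Signatures =
    {
        ("PDF", new byte[] { 0x25, 0x50, 0x44, 0x46 }),                                     // "%PDF"
        ("OLE2 (.doc/.xls)", new byte[] { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 }),
        ("ZIP (.docx/.xlsx)", new byte[] { 0x50, 0x4B, 0x03, 0x04 }),                       // "PK\x03\x04"
    };

    // Returns the label of the first signature the header starts with, or null
    // if none matches. A null result does not prove the file is text.
    public static string DetectBinaryFormat(byte[] header)
    {
        foreach (var (format, magic) in Signatures)
        {
            if (header.Length >= magic.Length && header.Take(magic.Length).SequenceEqual(magic))
                return format;
        }
        return null;
    }
}
```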
 
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900