Dear guys,

I want to find and process all duplicate files on my system.

I was planning to compare files byte by byte, but the problem is that I need to compare each file with every other file in a directory, including all of its folders and subfolders. How can I find and list all files in a folder and its subfolder hierarchy?

Please give me an idea of how to find duplicate files in my directory tree, and some code if you have any.

With regards

Irfad C

Here is the idea of what you should do:

  1. Use the DirectoryInfo class to traverse the file system structure.
  2. Add every file found, with its full path, to a list, making sure not to add the same file more than once.
  3. Calculate the MD5 hash of the content of each file in the list and store that value along with the file's path (or use SHA-2 instead of MD5).
  4. Partition the files into sets so that two files are in the same set if and only if their MD5 (or SHA-2) hashes are identical.
  5. For each set containing more than one file, do a binary comparison of its files against each other.
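The steps above can be sketched in C# roughly as follows. This is a minimal sketch, not a tuned implementation: it assumes SHA-256 (`System.Security.Cryptography.SHA256`) as the SHA-2 variant, and the class `DuplicateFinder` and its method names are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

// Hypothetical helper class illustrating steps 1-5; SHA-256 stands in for "MD5 or SHA-2".
public static class DuplicateFinder
{
    // Steps 1 and 2: walk the tree with DirectoryInfo and keep each full path once.
    public static List<string> CollectFiles(string root)
    {
        var seen = new HashSet<string>(StringComparer.Ordinal);
        var dir = new DirectoryInfo(root);
        foreach (FileInfo file in dir.GetFiles("*", SearchOption.AllDirectories))
        {
            seen.Add(file.FullName); // HashSet ignores a path it already holds
        }
        return seen.ToList();
    }

    // Step 3: hash the whole content of one file.
    public static string HashFile(string path)
    {
        using (var sha = SHA256.Create())
        using (var stream = File.OpenRead(path))
        {
            return BitConverter.ToString(sha.ComputeHash(stream));
        }
    }

    // Step 5 helper: true only if both files contain exactly the same bytes.
    public static bool SameContent(string pathA, string pathB)
    {
        var a = new FileInfo(pathA);
        var b = new FileInfo(pathB);
        if (a.Length != b.Length) return false;

        using (var streamA = a.OpenRead())
        using (var streamB = b.OpenRead())
        {
            int byteA, byteB;
            do
            {
                byteA = streamA.ReadByte(); // -1 at end of file
                byteB = streamB.ReadByte();
                if (byteA != byteB) return false;
            } while (byteA != -1);
        }
        return true;
    }

    // Steps 4 and 5: group by hash, then confirm each group byte by byte
    // so that a hash collision cannot be reported as a duplicate.
    public static List<List<string>> FindDuplicates(string root)
    {
        var result = new List<List<string>>();

        foreach (var group in CollectFiles(root).GroupBy(HashFile))
        {
            var pending = group.ToList();
            while (pending.Count > 1)
            {
                string first = pending[0];
                var same = new List<string> { first };
                same.AddRange(pending.Skip(1).Where(p => SameContent(first, p)));

                if (same.Count > 1) result.Add(same);
                pending = pending.Except(same).ToList(); // always removes 'first'
            }
        }
        return result;
    }
}
```

`FindDuplicates` returns one list of paths per group of identical files. Because the byte-by-byte comparison only runs between files whose hashes already match, most files are read only once. Note that `CollectFiles` does not catch `UnauthorizedAccessException`, so it may stop at a folder the current user cannot read.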


Regards,

—MRB
 
Comments
Mehdi Gholam 9-Dec-11 5:12am    
5'ed
thatraja 9-Dec-11 5:15am    
5!
LanFanNinja 9-Dec-11 5:18am    
+5 I agree!
Michel [mjbohn] 9-Dec-11 5:19am    
and my 5
_Tushar Patil 18-Dec-11 22:48pm    
+5
This retrieves the full paths of all files in a directory, including those in its subdirectories:

C#
string[] files = System.IO.Directory.GetFiles(
    "PathToDirectory", "*.*", System.IO.SearchOption.AllDirectories);
 
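Be aware that `Directory.GetFiles` with `SearchOption.AllDirectories` may throw `UnauthorizedAccessException` and abandon the whole listing as soon as it reaches a folder the current user cannot read. A minimal sketch of a walk that skips such folders instead, assuming that silently ignoring them is acceptable for a duplicate scan (the class `SafeFileWalker` is a hypothetical name):

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: lists every file under 'root', one folder at a time,
// skipping folders that cannot be listed instead of aborting the whole scan.
public static class SafeFileWalker
{
    public static List<string> GetAllFiles(string root)
    {
        var files = new List<string>();
        var pending = new Stack<string>();
        pending.Push(root);

        while (pending.Count > 0)
        {
            string dir = pending.Pop();
            try
            {
                files.AddRange(Directory.GetFiles(dir)); // files directly in this folder
                foreach (string sub in Directory.GetDirectories(dir))
                {
                    pending.Push(sub); // visit subfolders later
                }
            }
            catch (UnauthorizedAccessException)
            {
                // No permission to list this folder: skip it.
            }
            catch (DirectoryNotFoundException)
            {
                // Folder was removed or moved during the scan: skip it.
            }
        }
        return files;
    }
}
```

This walker treats symbolic links and junctions like ordinary folders, so a tree containing a link that points back up the hierarchy would need an extra check.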

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


