
A Quick Program for File Hash Comparison with MD5 and SHA1

1 Apr 2016, CPOL, 2 min read
A quick program to compute file hashes and measure how long it takes to compare them

Introduction

To detect changes in a file system or a directory of files, we usually reach for the FileSystemWatcher class provided in .NET. However, after learning about its side effects, such as missed or duplicate events, I found that it only hints at changes and offers little real benefit for this purpose. Another reason not to use FileSystemWatcher is that it does not look at file content; it reports file system events in general. I have found hashing to be a better way.

Background

In this article, I will try to answer a common question programmers ask about hashing: how long does it take to compute the hash of the files in a directory, including any subfolders of the parent folder? Will it be fast enough for a normal application deployment file structure containing a few MB of files? To answer these questions, I wrote a small utility and ran it on a directory of around 45 files totalling a few MB. The result was fast enough: it took only 50-60 milliseconds to compute the hashes, and about the same time to validate them.

Using the Code

Please observe the code file below. I tried computing the hash with both the MD5 and SHA1 algorithms. In my test, both took about the same time to hash the file content. Note that we are hashing the actual file content here. Any change in file content, even a single added space or character, changes the hash of the whole file. However, it is also important to note that a change in file attributes, such as the last modification time, won't affect the hash result.

C#
using System;
using System.IO;
using System.Security.Cryptography;

public class DeploymentFile
{
    public string FilePath { get; set; }
    public bool IsFilePathValid { get; set; }
    public string HashedValue { get; set; }
    public bool IsFileModified { get; set; }

    public DeploymentFile(string filePath)
    {
        FilePath = filePath;
        IsFileModified = false;
        // Hash only files that exist; otherwise flag the path as invalid.
        IsFilePathValid = File.Exists(filePath);
        if (IsFilePathValid)
            HashedValue = ComputeHashSHA(filePath);
    }

    public bool IsExist(string filePath)
    {
        return File.Exists(filePath);
    }

    // MD5 alternative; swap it in for ComputeHashSHA to compare timings.
    //public string ComputeHashMD5(string filename)
    //{
    //    using (var md5 = MD5.Create())
    //    using (var stream = File.OpenRead(filename))
    //    {
    //        return BitConverter.ToString(md5.ComputeHash(stream)).Replace("-", "");
    //    }
    //}

    public string ComputeHashSHA(string filename)
    {
        using (var sha = SHA1.Create())
        using (var stream = File.OpenRead(filename))
        {
            // Hex-encode the digest. Decoding raw hash bytes with Encoding.Default
            // is lossy, so two different digests could end up as equal strings.
            return BitConverter.ToString(sha.ComputeHash(stream)).Replace("-", "");
        }
    }
}
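
The claim that content changes alter the hash while attribute changes do not can be checked with a few lines. The following is a minimal sketch, not part of the original utility: the class name `ContentHashSketch` and its methods are hypothetical, and it assumes the digest is hex-encoded with `BitConverter.ToString`.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class ContentHashSketch
{
    // Hypothetical helper: SHA1 digest of a file's content as a hex string.
    public static string HashFile(string path)
    {
        using (var sha = SHA1.Create())
        using (var stream = File.OpenRead(path))
        {
            return BitConverter.ToString(sha.ComputeHash(stream)).Replace("-", "");
        }
    }

    // True when only the last-write time changed and the hash stayed the same.
    public static bool HashSurvivesTimestampChange(string path)
    {
        string before = HashFile(path);
        File.SetLastWriteTime(path, DateTime.Now.AddDays(-1));
        return HashFile(path) == before;
    }

    // True when appending a single space produced a different hash.
    public static bool HashChangesWithContent(string path)
    {
        string before = HashFile(path);
        File.AppendAllText(path, " ");
        return HashFile(path) != before;
    }
}
```

Calling both checks on a scratch file should return true in each case, since the hash is computed from the file's bytes only.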

Shown below is the code for the Form that hosts all the controls. Note that I am using a stopwatch to measure the time taken by the whole validation pass, which recomputes every hash.
IMPORTANT: If the message box appears, the stopwatch keeps running until the user clicks and closes it. To measure accurately, disable the message box.

C#
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
using System.Windows.Forms;

public partial class FileValidator : Form
{
    public FileValidator()
    {
        InitializeComponent();
    }

    List<DeploymentFile> DeployList;
    List<DeploymentFile> ValidationList;
    String filePath;

    #region ComputeHash
    private void ComputeHash_Click(object sender, EventArgs e)
    {
        // Take the baseline snapshot: one hash per file in the folder tree.
        DeployList = new List<DeploymentFile>();
        foreach (var item in GetListOfFilesInDeployFolder())
            DeployList.Add(new DeploymentFile(item));
        FilesGrid.DataSource = DeployList;
    }
    #endregion ComputeHash

    #region ValidateFileHash
    private void ValidateHash_Click(object sender, EventArgs e)
    {
        // Nothing to validate until a baseline has been computed.
        if (DeployList == null)
        {
            MessageBox.Show("Compute the hashes first.");
            return;
        }

        Stopwatch stopwatch = new Stopwatch();
        // Begin timing.
        stopwatch.Start();
        bool abort = false;
        List<string> filesList = GetListOfFilesInDeployFolder();
        // Re-hash every file recorded in the baseline.
        ValidationList = new List<DeploymentFile>();
        foreach (var item in DeployList)
            ValidationList.Add(new DeploymentFile(item.FilePath));

        // Check that no files were added or deleted. Comparing the counts first
        // keeps the index-based path comparison below within bounds.
        if (ValidationList.Count != filesList.Count)
            abort = true;
        for (int i = 0; !abort && i < ValidationList.Count; i++)
        {
            if (ValidationList[i].FilePath != filesList[i]) abort = true;
        }
        // Check that every recorded file still exists in the directory.
        if (!abort && ValidationList.Exists(x => !x.IsFilePathValid))
            abort = true;

        if (abort)
        {
            // Disable this message box to get an accurate execution time from the stopwatch.
            MessageBox.Show("Files/Folder structure changed or modified since last check");
        }
        else
        {
            // The structure is unchanged, so compare the hashes pairwise.
            for (int i = 0; i < ValidationList.Count; i++)
                if (ValidationList[i].HashedValue != DeployList[i].HashedValue)
                    ValidationList[i].IsFileModified = true;
        }

        FilesGrid.DataSource = ValidationList;

        stopwatch.Stop();
        label1.Text = "Time taken in Validation : " + stopwatch.Elapsed;
    }
    #endregion

    private List<string> GetListOfFilesInDeployFolder()
    {
        filePath = textBox1.Text;
        return Directory.GetFiles(filePath, "*", SearchOption.AllDirectories).ToList();
    }

    private void FileValidator_Load(object sender, EventArgs e)
    {
        FilesGrid.AutoSizeColumnsMode = DataGridViewAutoSizeColumnsMode.DisplayedCells;
    }
}
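
One way to keep user interaction out of the measurement is to time only the hashing and stop the stopwatch before any UI is shown. The sketch below illustrates that idea outside the form; `TimingSketch` and `TimeHashing` are hypothetical names, and unlike the form code it lists the directory before the stopwatch starts, so only the hashing is timed.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Security.Cryptography;

public static class TimingSketch
{
    // Hypothetical helper: hashes every file under root and returns how long
    // the hashing alone took. No UI is touched while the stopwatch runs.
    public static TimeSpan TimeHashing(string root, out int fileCount)
    {
        string[] files = Directory.GetFiles(root, "*", SearchOption.AllDirectories);

        var stopwatch = Stopwatch.StartNew();
        using (var sha = SHA1.Create())
        {
            foreach (string file in files)
            {
                using (var stream = File.OpenRead(file))
                {
                    sha.ComputeHash(stream);
                }
            }
        }
        stopwatch.Stop();   // stop before reporting anything to the user

        fileCount = files.Length;
        return stopwatch.Elapsed;
    }
}
```

In the form itself, the same effect could be had by calling `stopwatch.Stop()` before `MessageBox.Show`, so that the time the user takes to dismiss the dialog is not counted.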

The form displays the time taken for computation and validation of the hashes. If a file is modified between clicking the compute hash button and clicking the check for modifications button, the modification shows up in the IsFileModified column. I also record the file structure and compare it with the current file structure; any file path that is no longer valid is shown in the IsFilePathValid column.
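
The form compares the two file lists by position, which reports that the structure changed but not which files were involved. As an alternative sketch, not the approach used in the utility above, the snapshots can be keyed by path so that added, removed, and modified files are reported separately. `SnapshotSketch` and its method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

public static class SnapshotSketch
{
    // Maps each file path under root to the hex SHA1 digest of its content.
    public static Dictionary<string, string> TakeSnapshot(string root)
    {
        var snapshot = new Dictionary<string, string>();
        using (var sha = SHA1.Create())
        {
            foreach (string file in Directory.GetFiles(root, "*", SearchOption.AllDirectories))
            {
                using (var stream = File.OpenRead(file))
                {
                    snapshot[file] = BitConverter.ToString(sha.ComputeHash(stream)).Replace("-", "");
                }
            }
        }
        return snapshot;
    }

    // Paths present now but missing from the baseline.
    public static List<string> Added(Dictionary<string, string> baseline, Dictionary<string, string> current)
    {
        return current.Keys.Where(p => !baseline.ContainsKey(p)).ToList();
    }

    // Paths present in the baseline but missing now.
    public static List<string> Removed(Dictionary<string, string> baseline, Dictionary<string, string> current)
    {
        return baseline.Keys.Where(p => !current.ContainsKey(p)).ToList();
    }

    // Paths present in both snapshots whose digests differ.
    public static List<string> Modified(Dictionary<string, string> baseline, Dictionary<string, string> current)
    {
        var modified = new List<string>();
        foreach (var entry in baseline)
        {
            string currentHash;
            if (current.TryGetValue(entry.Key, out currentHash) && currentHash != entry.Value)
                modified.Add(entry.Key);
        }
        return modified;
    }
}
```

Taking one snapshot at deployment time and another at validation time, then passing both to the three methods, gives the same information as the IsFilePathValid and IsFileModified columns without depending on the order in which the files are listed.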

Points of Interest

It is interesting to find that the SHA1 and MD5 algorithms take similar time for a small number of files. As the file count and file size increase, MD5 is more efficient than SHA1. However, SHA1 is more trusted among developers. I think MD5 is the better choice here because we are not really concerned with security; we are concerned with the integrity of file content against accidental change.
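
The relative speed of the two algorithms depends on the machine and the .NET runtime, so it is worth measuring rather than assuming. The following is a rough sketch for such a comparison over an in-memory buffer, which removes disk access from the picture. The names `AlgorithmTimingSketch`, `Time`, and `Compare`, the 16 MB buffer size, and the round count are all arbitrary assumptions.

```csharp
using System;
using System.Diagnostics;
using System.Security.Cryptography;

public static class AlgorithmTimingSketch
{
    // Times one hash algorithm over the same buffer, repeated `rounds` times.
    public static TimeSpan Time(HashAlgorithm algorithm, byte[] data, int rounds)
    {
        algorithm.ComputeHash(data);   // warm-up call, not timed

        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < rounds; i++)
            algorithm.ComputeHash(data);
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    // Prints the elapsed time for MD5 and SHA1 over identical input.
    public static void Compare()
    {
        var data = new byte[16 * 1024 * 1024];   // 16 MB, arbitrary size
        new Random(42).NextBytes(data);          // fixed seed for repeatable input

        using (var md5 = MD5.Create())
        using (var sha1 = SHA1.Create())
        {
            Console.WriteLine("MD5 : " + Time(md5, data, 10));
            Console.WriteLine("SHA1: " + Time(sha1, data, 10));
        }
    }
}
```

Running `Compare()` a few times and looking at the spread gives a better picture than a single run, since a single timing can be skewed by other activity on the machine.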

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer (Senior)
India
Hi, I am a software developer from Delhi, India. I work in .NET and related technologies.

Personally I like technology and playing with different new features & possibilities.
