FileDiff2 Optimized

13 Aug 2009, CPOL
A file diff utility.

Introduction

This application is pretty basic: it opens two files as FileStream objects, compares them byte by byte, and prints each run of bytes in the second file that differs from the first.

Using the code

C#
ASCIIEncoding Encode = new ASCIIEncoding();

// Open the files
//
FileStream streamA = File.OpenRead(args[0]);
FileStream streamB = File.OpenRead(args[1]);

// Get the stream length
// (so we don't have to calculate this a million times)
//
long lenA = streamA.Length - 1;
long lenB = streamB.Length - 1;

// Read the bytes
//
int byteA;
int byteB;

do
{
    // Read the streams
    //
    byteA = streamA.ReadByte();
    byteB = streamB.ReadByte();

    // Are they the same
    //
    if (byteA != byteB)
    {
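        // Remember where we parked the car: the byte just read from
        // streamB is the first differing one, so step back over it
        // (Math.Max covers an empty streamB, where nothing was read)
        //
        long startPos = Math.Max(streamB.Position - 1, 0);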

        // Read streamB until we = StreamA
        //
        do
        {
            byteB = streamB.ReadByte();
        }
        while (byteA != byteB && streamB.Position <= lenB);

        // How long is the difference?
        //
        long length = streamB.Position - startPos;

        // Read the bytes
        //
        byte[] theseBytes = new byte[length];
        streamB.Seek(length * -1, SeekOrigin.Current);
        streamB.Read(theseBytes, 0, (int)length);

        Console.WriteLine("Pos:{0}, Len:{1}, Str:{2}", startPos, 
                          length, Encode.GetString(theseBytes));
    }
}
while (streamA.Position <= lenA && streamB.Position <= lenB);

streamA.Close();
streamB.Close();
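
The loop above can also be tried on in-memory data instead of files. What follows is only a minimal sketch under stated assumptions: `DiffSketch`, `FindRuns` and `Ascii` are hypothetical names that are not part of the download, the runs are returned rather than printed, and a run is assumed to start at the first mismatching byte of the second stream.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

static class DiffSketch
{
    // Hypothetical helper: the same scan as the listing above, over any
    // seekable Stream, collecting (position, text) pairs.
    public static List<(long Pos, string Text)> FindRuns(Stream a, Stream b)
    {
        var runs = new List<(long Pos, string Text)>();
        long lenA = a.Length - 1;
        long lenB = b.Length - 1;
        int byteA, byteB;

        do
        {
            byteA = a.ReadByte();
            byteB = b.ReadByte();

            if (byteA != byteB)
            {
                // Assumption: the run begins at the byte that just mismatched.
                long startPos = Math.Max(b.Position - 1, 0);

                // Read b until it matches the current byte of a, or b runs out.
                do { byteB = b.ReadByte(); }
                while (byteA != byteB && b.Position <= lenB);

                // Step back and collect everything that was skipped over.
                var bytes = new byte[b.Position - startPos];
                b.Seek(-bytes.Length, SeekOrigin.Current);
                b.Read(bytes, 0, bytes.Length);
                runs.Add((startPos, Encoding.ASCII.GetString(bytes)));
            }
        }
        while (a.Position <= lenA && b.Position <= lenB);

        return runs;
    }

    // Hypothetical convenience wrapper for building test input.
    public static MemoryStream Ascii(string s) =>
        new MemoryStream(Encoding.ASCII.GetBytes(s));

    static void Main()
    {
        foreach (var run in FindRuns(Ascii("abcde"), Ascii("acde")))
            Console.WriteLine("Pos:{0}, Str:{1}", run.Pos, run.Text);
        // prints "Pos:1, Str:cde"
    }
}
```

Because the scan only ever searches forward in the second stream, a byte missing from it makes everything up to the end of that stream count as one run.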

History

  • Aug 12, 2009: Written.
  • Aug 13, 2009: Rewritten to be more explicit; I also modified the file seeking and some variables, which brought the run time down from 57 ms to 27 ms.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Web Developer
United States
I started programming for fun when I was about 10 on a Franklin Ace 1000.

I still do it just for fun, but it has gotten me a few jobs over the years. More than I can say for my Microsoft Certifications. :)

The way I learned was by example; now it's time to give back to the next generation of coders.



Comments and Discussions

 
Works, but that's it... (Rasqual Twilight, 22-Aug-09 0:58)
My solution (Pete Souza IV, 14-Aug-09 6:31)
Re: My solution (Matthew Hazlett, 14-Aug-09 8:51)
Re: My solution (Pete Souza IV, 20-Aug-09 10:33)
Re: My solution (Matthew Hazlett, 21-Aug-09 20:07)
Re: My solution (Pete Souza IV, 25-Aug-09 6:56)
Re: My solution (Matthew Hazlett, 25-Aug-09 8:09)
Re: My solution (Pete Souza IV, 25-Aug-09 8:15)
Re: My solution (Matthew Hazlett, 25-Aug-09 19:38)
Some thoughts... (Pete Souza IV, 13-Aug-09 13:50)

Re: Some thoughts... (Matthew Hazlett, 13-Aug-09 14:52)
You are quite correct in your analysis.

If fileA contains abcde and fileB contains acde, then according to this comparison the difference would be cde, not simply b. However, if you reversed the compare order, the result would indeed be b. It depends on who is being compared to whom (you can't compare both A to B and B to A when you were explicitly asked to compare A to B).

In your second example --
This is some text.
This is slightly more text.

You are again quite correct. I had considered comparing words or lines instead of bytes, but that would create some unique problems of its own, and it would only work for ASCII comparisons and not binary ones (yes, binary comparison was not a requirement).

If you were comparing line by line, as you suggested:

A: "This is some text"
B: "This is some
text"

Both lines would be flagged, when in reality there is only a single character difference between the two files. The same applies if you are splitting by words:

A: "This is so
me text"
B: "This is some text"

Again, there is only a single character difference, but if it were going off words there would be three discrepancies: "so", "me" and "some".
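
The line-by-line case can be checked with a few lines of code. This is only a rough sketch: `GranularitySketch`, `CountByteDiffs` and `CountLineDiffs` are hypothetical names, and both counters simply compare position by position (slot by slot for lines), which is an assumption about how a naive comparer would work.

```csharp
using System;
using System.Text;

static class GranularitySketch
{
    // Number of byte positions at which the two texts differ; positions
    // present in only one text count as differences.
    public static int CountByteDiffs(string a, string b)
    {
        byte[] x = Encoding.ASCII.GetBytes(a);
        byte[] y = Encoding.ASCII.GetBytes(b);
        int diffs = Math.Abs(x.Length - y.Length);
        for (int i = 0; i < Math.Min(x.Length, y.Length); i++)
            if (x[i] != y[i]) diffs++;
        return diffs;
    }

    // Number of line slots that differ when both texts are split on '\n';
    // lines present in only one text count as differences.
    public static int CountLineDiffs(string a, string b)
    {
        string[] x = a.Split('\n');
        string[] y = b.Split('\n');
        int diffs = Math.Abs(x.Length - y.Length);
        for (int i = 0; i < Math.Min(x.Length, y.Length); i++)
            if (x[i] != y[i]) diffs++;
        return diffs;
    }

    static void Main()
    {
        string a = "This is some text";
        string b = "This is some\ntext";
        Console.WriteLine(CountByteDiffs(a, b)); // prints 1
        Console.WriteLine(CountLineDiffs(a, b)); // prints 2
    }
}
```

One changed byte (a space turned into a newline) shows up as two flagged lines, which is the effect described above.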

Byte comparison has its advantages and disadvantages. But the contest did say to compare the files in the fastest time with the least amount of overhead (code bulk). Adding more logic, and thus increasing memory usage, executable size and execution time when it's not needed, seemed counterproductive.

Matthew Hazlett
Fighting the good fight for web usability.

Re: Some thoughts... (Pete Souza IV, 13-Aug-09 16:07)
Re: Some thoughts... (Matthew Hazlett, 13-Aug-09 16:41)
Re: Some thoughts... (Pete Souza IV, 14-Aug-09 1:59)
