
FileDiff2 Optimized

13 Aug 2009, CPOL
A file diff utility.

Introduction

This application is pretty basic: it uses two FileStream objects to read both files byte by byte and prints each span where the second file differs from the first.

Using the code

C#
// Namespaces the snippet relies on
//
using System;
using System.IO;
using System.Text;

ASCIIEncoding Encode = new ASCIIEncoding();

// Open the files
//
FileStream streamA = File.OpenRead(args[0]);
FileStream streamB = File.OpenRead(args[1]);

// Get the stream length
// (so we don't have to calculate this a million times)
//
long lenA = streamA.Length - 1;
long lenB = streamB.Length - 1;

// Read the bytes
//
int byteA;
int byteB;

do
{
    // Read the streams
    //
    byteA = streamA.ReadByte();
    byteB = streamB.ReadByte();

    // Are they the same
    //
    if (byteA != byteB)
    {
        // Remember where we parked the car
        //
        long startPos = streamB.Position;

        // Read streamB until its byte matches byteA (or streamB runs out)
        //
        do
        {
            byteB = streamB.ReadByte();
        }
        while (byteA != byteB && streamB.Position <= lenB);

        // How long is the difference?
        //
        long length = streamB.Position - startPos;

        // Read the bytes
        //
        byte[] theseBytes = new byte[length];
        streamB.Seek(length * -1, SeekOrigin.Current);
        streamB.Read(theseBytes, 0, (int)length);

        Console.WriteLine("Pos:{0}, Len:{1}, Str:{2}", startPos, 
                          length, Encode.GetString(theseBytes));
    }
}
while (streamA.Position <= lenA && streamB.Position <= lenB);

streamA.Close();
streamB.Close();

History

  • Aug 12, 2009: Written.
  • Aug 13, 2009: Rewritten to be more explicit. I also modified the file seeking and some variables, bringing the run time down from 57 ms to 27 ms.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL).


Written By
Web Developer
United States
I started programming for fun when I was about 10 on a Franklin Ace 1000.

I still do it just for fun, but it has gotten me a few jobs over the years. More than I can say for my Microsoft Certifications. :)

The way I learned was by example; now it's time to give back to the next generation of coders.



Comments and Discussions

 
Works, but that's it...
Rasqual Twilight, 22-Aug-09 0:58

My solution
Pete Souza IV, 14-Aug-09 6:31

Re: My solution
Matthew Hazlett, 14-Aug-09 8:51

Re: My solution
Pete Souza IV, 20-Aug-09 10:33

Re: My solution
Matthew Hazlett, 21-Aug-09 20:07

Re: My solution
Pete Souza IV, 25-Aug-09 6:56

Re: My solution
Matthew Hazlett, 25-Aug-09 8:09

Re: My solution
Pete Souza IV, 25-Aug-09 8:15

Re: My solution
Matthew Hazlett, 25-Aug-09 19:38

Some thoughts...
Pete Souza IV, 13-Aug-09 13:50
While reviewing your code, I came across the following considerations:

Your approach does provide a (very) simple solution to comparing the bytes of a second file against a first, with limited detection of where the discrepancies between the two end. Please correct me if I'm wrong, but doesn't your approach suffer greatly when a character appearing in fileA doesn't appear in fileB?

For example, if fileA's text contained the letter 'Z' only once and the only difference between the two files was a missing 'Z' in fileB, your approach would mark the entire remainder of the file as a change.

i.e.:

fileA:
A1234
B1234
Z1234
T1234
R1234

fileB:
A1234
B1234
1234
T1234
R1234

Instead of detecting only the missing 'Z', your approach will flag the entire remainder of the file as a difference. This is because it is very aware of text added going from fileA to fileB but not at all aware of text deleted going from fileA to fileB.
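
The deletion case can be reproduced without touching the disk. What follows is a minimal sketch, not the article's code as published: it moves the article's loop into a hypothetical helper named Scan, runs it over MemoryStream objects, and returns the spans instead of printing them. It assumes the sample files use "\n" line endings with no trailing newline.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class ForwardScanDemo
{
    // Hypothetical helper: the same forward scan as the article's loop,
    // generalized to any seekable streams. Pos is streamB's position just
    // after the first mismatching byte, as in the original code.
    public static List<(long Pos, long Len, string Text)> Scan(Stream a, Stream b)
    {
        var spans = new List<(long Pos, long Len, string Text)>();
        long lenA = a.Length - 1;
        long lenB = b.Length - 1;
        int byteA, byteB;

        do
        {
            byteA = a.ReadByte();
            byteB = b.ReadByte();

            if (byteA != byteB)
            {
                long startPos = b.Position;

                // Read b until a byte equal to byteA turns up, or b runs out
                do
                {
                    byteB = b.ReadByte();
                }
                while (byteA != byteB && b.Position <= lenB);

                long length = b.Position - startPos;

                // Step back and re-read the bytes that were skipped over
                byte[] bytes = new byte[length];
                b.Seek(-length, SeekOrigin.Current);
                b.Read(bytes, 0, (int)length);

                spans.Add((startPos, length, Encoding.ASCII.GetString(bytes)));
            }
        }
        while (a.Position <= lenA && b.Position <= lenB);

        return spans;
    }
}
```

Under those assumptions, calling ForwardScanDemo.Scan on the two samples above returns a single span with Pos 13 and Len 15: because fileB never contains a 'Z', the inner loop runs to the end of fileB, and the outer loop then stops.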

Also, and this is a gray area with regard to the actual contest, I would think it a poor idea to do a byte-by-byte comparison. While there's more overhead in line-by-line (or other) methods, you don't run the risk of this error:

This is some text.
This is slightly more text.

You should detect the removal of 'some' and the addition of 'slightly more', but your application will actually detect the addition of 'lightly m' and 're text.'. Even worse, at that point it will ignore the fact that 'me text.' was removed from the file (fileA had leftover parts that were never matched in fileB, because the scan kept waiting for an 'm' to follow an 'o', something that never occurs in fileB).
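
One way to act on this suggestion is to compare whole tokens (lines or words) instead of bytes. The sketch below is only an illustration and comes from neither the article nor this comment: it swaps in a standard longest-common-subsequence table, and the names TokenDiffSketch and Diff are hypothetical. It needs time and memory proportional to the product of the two token counts, so it suits small inputs only.

```csharp
using System;
using System.Collections.Generic;

public static class TokenDiffSketch
{
    // Returns one entry per token, prefixed with "- " (only in a),
    // "+ " (only in b), or "  " (present in both).
    public static List<string> Diff(string[] a, string[] b)
    {
        // lcs[i, j] = length of the longest common subsequence
        // of a[i..] and b[j..]
        int[,] lcs = new int[a.Length + 1, b.Length + 1];
        for (int i = a.Length - 1; i >= 0; i--)
        {
            for (int j = b.Length - 1; j >= 0; j--)
            {
                lcs[i, j] = a[i] == b[j]
                    ? lcs[i + 1, j + 1] + 1
                    : Math.Max(lcs[i + 1, j], lcs[i, j + 1]);
            }
        }

        // Walk the table from the start, emitting removals before additions
        // when both choices keep the same number of common tokens.
        var result = new List<string>();
        int x = 0, y = 0;
        while (x < a.Length && y < b.Length)
        {
            if (a[x] == b[y])
            {
                result.Add("  " + a[x]);
                x++;
                y++;
            }
            else if (lcs[x + 1, y] >= lcs[x, y + 1])
            {
                result.Add("- " + a[x]);
                x++;
            }
            else
            {
                result.Add("+ " + b[y]);
                y++;
            }
        }

        // Whatever is left on either side is a pure removal or addition
        while (x < a.Length) result.Add("- " + a[x++]);
        while (y < b.Length) result.Add("+ " + b[y++]);
        return result;
    }
}
```

Splitting the two sentences above on spaces and passing them to Diff marks 'some' as removed and 'slightly' and 'more' as added, while 'This', 'is' and 'text.' come back unchanged. Passing arrays of lines instead gives a line-by-line comparison.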

What are your thoughts?

Re: Some thoughts...
Matthew Hazlett, 13-Aug-09 14:52

Re: Some thoughts...
Pete Souza IV, 13-Aug-09 16:07

Re: Some thoughts...
Matthew Hazlett, 13-Aug-09 16:41

Re: Some thoughts...
Pete Souza IV, 14-Aug-09 1:59
