Compare two near-duplicate documents (PDF files) in C#

Question

I have developed an application in C# to compare two near-duplicate documents (PDF files).

It should actually compare the content of the two files: how much of the content of file1 matches file2, so that in the end I get the percentage of file1 that matches file2.

Whenever I compare the two PDF files, how do I get this percentage?

My code looks like this:

C#

using System;
using System.IO;
using System.Windows.Forms;

// Assumes a Windows Form named PdfCompare whose designer file declares
// textBox1, textBox2 and button1.
public partial class PdfCompare : Form
{
    public PdfCompare()
    {
        InitializeComponent();
    }

    // Returns true only if both files contain exactly the same bytes.
    private bool FileCompare(string file1, string file2)
    {
        // Determine if the same file was referenced two times.
        if (file1 == file2)
        {
            // Return true to indicate that the files are the same.
            return true;
        }

        // Open the two files read-only; "using" closes them even if an exception is thrown.
        using (var fs1 = new FileStream(file1, FileMode.Open, FileAccess.Read))
        using (var fs2 = new FileStream(file2, FileMode.Open, FileAccess.Read))
        {
            // Check the file sizes. If they are not the same, the files
            // are not the same.
            if (fs1.Length != fs2.Length)
            {
                return false;
            }

            int file1byte;
            int file2byte;

            // Read and compare a byte from each file until either a
            // non-matching set of bytes is found or until the end of
            // file1 is reached (ReadByte returns -1).
            do
            {
                file1byte = fs1.ReadByte();
                file2byte = fs2.ReadByte();
            }
            while (file1byte == file2byte && file1byte != -1);

            // "file1byte" is equal to "file2byte" at this point only if
            // the end of both files was reached without a mismatch.
            return file1byte == file2byte;
        }
    }

    private void PdfCompare_Load(object sender, EventArgs e)
    {
    }

    private void button1_Click(object sender, EventArgs e)
    {
        if (FileCompare(textBox1.Text, textBox2.Text))
        {
            MessageBox.Show("Files are equal.");
        }
        else
        {
            MessageBox.Show("Files are not equal.");
        }
    }
}

Please help me calculate the matching percentage when comparing two near-duplicate PDF files.

Posted 12-Mar-15 7:45am

Krishna Veni

Comments

phil.o 12-Mar-15 17:50pm

The code you show and claim as yours is a copy/paste from "How to create a File-Compare function in Visual C#". What about honesty?

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

manchanx · Answer 1 · 2015-03-12T11:38:00

Quote:
I have develop application

If you're saying here that you wrote that method: I don't believe you. The English of the code comments doesn't match the English of your question at all.

However, that method compares two files byte-by-byte. I don't think you actually want that; it wouldn't make a lot of sense. Take a text document, make a copy of it, and delete only the very first character in the copy. The contents would be the same except that one character is missing from the copy and all other characters are shifted by one position. But if you compare the files byte-by-byte, every byte after the deleted character is compared against its neighbour in the other file, so your percentage match would be essentially arbitrary and would say nothing about how similar the contents are. You need a completely different approach, and it is too complex to explain here in full. But I will give you these pointers:

Extracting text from PDF documents: Toxy

"Diff" algorithms/libraries:
http://stackoverflow.com/questions/138331/any-decent-text-diff-merge-engine-for-net
https://github.com/mmanela/diffplex
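To make the point about shifted content concrete, and to show the idea behind the diff libraries linked above, here is a minimal C# sketch. All names in it (SimilaritySketch, PositionalMatchPercent, LcsLength, LcsMatchPercent, Words) are hypothetical and are not part of Toxy or DiffPlex. It works on in-memory strings and assumes the text has already been extracted from the two PDFs. It contrasts a naive position-by-position percentage, which is roughly what a byte-by-byte comparison would give you, with a percentage based on the longest common subsequence (LCS), the measure that diff tools are built on.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

public static class SimilaritySketch
{
    // Naive approach: percentage of positions i where a[i] equals b[i],
    // measured against the longer sequence. A byte-by-byte percentage behaves like this.
    public static double PositionalMatchPercent<T>(T[] a, T[] b)
    {
        var eq = EqualityComparer<T>.Default;
        int longer = Math.Max(a.Length, b.Length);
        if (longer == 0) return 100.0;
        int matches = 0;
        for (int i = 0; i < Math.Min(a.Length, b.Length); i++)
        {
            if (eq.Equals(a[i], b[i])) matches++;
        }
        return 100.0 * matches / longer;
    }

    // Length of the longest common subsequence of a and b.
    // Classic dynamic programming keeping only two rows: O(a.Length * b.Length) time.
    public static int LcsLength<T>(T[] a, T[] b)
    {
        var eq = EqualityComparer<T>.Default;
        var prev = new int[b.Length + 1]; // row i-1 of the DP table
        var curr = new int[b.Length + 1]; // row i of the DP table (curr[0] stays 0)
        for (int i = 1; i <= a.Length; i++)
        {
            for (int j = 1; j <= b.Length; j++)
            {
                curr[j] = eq.Equals(a[i - 1], b[j - 1])
                    ? prev[j - 1] + 1
                    : Math.Max(prev[j], curr[j - 1]);
            }
            (prev, curr) = (curr, prev); // the finished row becomes "previous"
        }
        return prev[b.Length];
    }

    // Percentage of file1's items that also appear, in the same order, in file2.
    public static double LcsMatchPercent<T>(T[] file1, T[] file2)
    {
        if (file1.Length == 0) return file2.Length == 0 ? 100.0 : 0.0;
        return 100.0 * LcsLength(file1, file2) / file1.Length;
    }

    // Splits text into lower-case words, so whitespace and punctuation changes are ignored.
    public static string[] Words(string text) =>
        Regex.Split(text.ToLowerInvariant(), @"\W+").Where(w => w.Length > 0).ToArray();

    public static void Main()
    {
        string original = "The quick brown fox jumps over the lazy dog.";
        string firstCharDeleted = original.Substring(1);
        string edited = "The quick red fox jumped over the lazy dog.";

        // Character level: one deleted character shifts everything after it.
        Print("positional", PositionalMatchPercent(original.ToCharArray(), firstCharDeleted.ToCharArray()));
        Print("LCS", LcsMatchPercent(original.ToCharArray(), firstCharDeleted.ToCharArray()));

        // Word level, closer to what you would do with text extracted from two PDFs.
        Print("LCS words", LcsMatchPercent(Words(original), Words(edited)));
    }

    private static void Print(string label, double percent) =>
        Console.WriteLine(label + ": " + percent.ToString("F1", CultureInfo.InvariantCulture) + "%");
}
```

Running it should print 0.0% for the positional comparison and 97.7% for the LCS comparison of the one-character deletion, and 77.8% for the word-level comparison. The percentage here is asymmetric (the share of file1's words found, in order, in file2), which follows the wording of the question; dividing 2 * LCS by the total number of words in both files is a common symmetric alternative. The two-row LCS table takes time proportional to the product of the two word counts, so for long documents a dedicated diff library is likely the better choice.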