I have developed an application in C# to compare two near-duplicate documents (PDF files).

It compares the content of the two files: how much of file1's content matches file2, i.e. in the end, what percentage of file1 matches file2.

Whenever I compare the two PDF files, how can I get that percentage?

My code looks like this:

```csharp
using System;
using System.IO;
using System.Windows.Forms;

// The class declaration was not shown in the post; these members are assumed
// to belong to a Windows Forms form named PdfCompare (see PdfCompare_Load).
public partial class PdfCompare : Form
{
    private bool FileCompare(string file1, string file2)
    {
        int file1byte;
        int file2byte;

        // Determine if the same file was referenced two times.
        if (file1 == file2)
        {
            // Return true to indicate that the files are the same.
            return true;
        }

        // Open the two files for reading only. The "using" blocks close
        // both files even if an exception is thrown.
        using (FileStream fs1 = new FileStream(file1, FileMode.Open, FileAccess.Read))
        using (FileStream fs2 = new FileStream(file2, FileMode.Open, FileAccess.Read))
        {
            // Check the file sizes. If they are not the same, the files
            // are not the same.
            if (fs1.Length != fs2.Length)
            {
                // Return false to indicate files are different.
                return false;
            }

            // Read and compare a byte from each file until either a
            // non-matching set of bytes is found or until the end of
            // file1 is reached.
            do
            {
                // Read one byte from each file.
                file1byte = fs1.ReadByte();
                file2byte = fs2.ReadByte();
            }
            while ((file1byte == file2byte) && (file1byte != -1));

            // "file1byte" is equal to "file2byte" at this point only if
            // the files are the same.
            return file1byte == file2byte;
        }
    }

    private void PdfCompare_Load(object sender, EventArgs e)
    {
    }

    private void button1_Click(object sender, EventArgs e)
    {
        if (FileCompare(this.textBox1.Text, this.textBox2.Text))
        {
            MessageBox.Show("Files are equal.");
        }
        else
        {
            MessageBox.Show("Files are not equal.");
        }
    }
}
```


Please help me get a match percentage when comparing two near-duplicate PDF files.
Comments
phil.o 12-Mar-15 17:50pm    
The code you show and claim as yours is a copy/paste from How to create a File-Compare function in Visual C#. What about honesty?

1 solution

Quote:
I have develop application
If you're saying here that you wrote that method: I don't believe you. The English of the code comments doesn't match the English of your question at all.

However, that method compares two files byte by byte, and it only tells you whether they are identical, not how similar they are. I don't think you actually want that; it wouldn't make a lot of sense. Take a text document, make a copy of it and delete only the very first character in the copy. The contents would be the same except that one character is missing from the copy and all other characters are shifted by one position. But if you compare the files byte by byte, position for position, your percentage match would be essentially random. You need a completely different approach, and it is too complex to explain fully here. But I will give you these pointers:
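The shift problem described above is easy to check. The following is a minimal sketch, not part of the original code: `ShiftDemo` and `PositionalMatchPercent` are hypothetical names for a helper that computes what a naive byte-by-byte "percentage match" would report, namely the share of positions where both inputs hold the same byte.

```csharp
using System;

// Hypothetical demo class, written only to illustrate the shift problem.
static class ShiftDemo
{
    // Percentage of positions where both arrays hold the same byte, measured
    // against the longer array. This is what a naive byte-by-byte
    // "percentage match" would compute.
    public static double PositionalMatchPercent(byte[] a, byte[] b)
    {
        int longer = Math.Max(a.Length, b.Length);
        if (longer == 0) return 100.0; // two empty inputs count as identical

        int shorter = Math.Min(a.Length, b.Length);
        int same = 0;
        for (int i = 0; i < shorter; i++)
        {
            if (a[i] == b[i]) same++;
        }
        return 100.0 * same / longer;
    }
}
```

Calling `PositionalMatchPercent` on the ASCII bytes of `"the quick brown fox jumps over the lazy dog"` and of the same string with its first character deleted returns 0: no two neighbouring characters in that sentence are equal, so after the shift not a single position matches, even though 42 of the 43 characters are still present in the copy.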

Extracting text from PDF documents: Toxy

"Diff" algorithms/libraries:
http://stackoverflow.com/questions/138331/any-decent-text-diff-merge-engine-for-net
https://github.com/mmanela/diffplex
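Once the text of both PDFs has been extracted (that step is assumed here and not shown), a diff-style comparison can give the percentage the question asks for. This is a minimal sketch of the idea, not DiffPlex's API: it splits both texts into words and uses the length of the longest common subsequence (LCS), the measure classic diff tools are built on, to report what percentage of file1's words appear in the same order in file2. `TextSimilarity`, `LcsLength` and `MatchPercent` are hypothetical names.

```csharp
using System;

// Hypothetical helper class; the names are illustrative, not from any library.
static class TextSimilarity
{
    // Split text into words on whitespace. This is a simplifying assumption:
    // real extracted PDF text may also need case and punctuation normalisation.
    static string[] Words(string text) =>
        text.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);

    // Length of the longest common subsequence (LCS) of two word arrays,
    // using the classic dynamic-programming table kept as two rolling rows.
    public static int LcsLength(string[] a, string[] b)
    {
        var prev = new int[b.Length + 1];
        var curr = new int[b.Length + 1];
        for (int i = 1; i <= a.Length; i++)
        {
            for (int j = 1; j <= b.Length; j++)
            {
                curr[j] = a[i - 1] == b[j - 1]
                    ? prev[j - 1] + 1                 // words match: extend the diagonal
                    : Math.Max(prev[j], curr[j - 1]); // otherwise skip a word in a or b
            }
            (prev, curr) = (curr, prev); // the finished row becomes the previous row
        }
        return prev[b.Length];
    }

    // Percentage of file1's words that also appear, in the same order, in file2.
    public static double MatchPercent(string text1, string text2)
    {
        string[] w1 = Words(text1);
        string[] w2 = Words(text2);
        if (w1.Length == 0) return w2.Length == 0 ? 100.0 : 0.0;
        return 100.0 * LcsLength(w1, w2) / w1.Length;
    }
}
```

For example, comparing "the quick brown fox jumps over the lazy dog" with "the quick brown fox leaps over the lazy dog" gives about 88.9 (8 of 9 words in order). Dividing by file1's word count makes the result asymmetric, which matches the question's "how much of file1 is in file2"; dividing 2 × LCS by the total word count of both files would give a symmetric score instead. The table takes O(n·m) time, which is fine for short texts, but dedicated diff libraries typically use more efficient algorithms for large documents.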
 
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


