My code is as follows:

C#
        private void button1_Click(object sender, EventArgs e)
        {
            OpenFileDialog openFileDialog = new OpenFileDialog();
            openFileDialog.CheckFileExists = true;
            openFileDialog.AddExtension = true;
            openFileDialog.Filter = "PDF files (*.pdf)|*.pdf";
            DialogResult result = openFileDialog.ShowDialog();
            if (result == DialogResult.OK)
            {
                filename = Path.GetFileName(openFileDialog.FileName);
                path = Path.GetDirectoryName(openFileDialog.FileName);
                textBox1.Text = path + "\\" + filename;
            }
        }

        private void button2_Click(object sender, EventArgs e)
        {

            OpenFileDialog openFileDialog = new OpenFileDialog();
            openFileDialog.CheckFileExists = true;
            openFileDialog.AddExtension = true;
            openFileDialog.Filter = "PDF files (*.pdf)|*.pdf";
            DialogResult result = openFileDialog.ShowDialog();
            if (result == DialogResult.OK)
            {
                filename = Path.GetFileName(openFileDialog.FileName);
                path = Path.GetDirectoryName(openFileDialog.FileName);
                textBox2.Text = path + "\\" + filename;
            }
        }

        private void button3_Click(object sender, EventArgs e)
        {
            string s = ExtractTextFromPdf(textBox1.Text);
            string s1 = Extract(textBox2.Text);
            d = CalculateSimilarity(s, s1);
            d = Math.Round(d, 0);
            label1.Visible = true;
            label1.Text = Convert.ToString(d);
        }


     
        double CalculateSimilarity(string source, string target)
        {
            if ((source == null) || (target == null)) return 0.0;
            if ((source.Length == 0) || (target.Length == 0)) return 0.0;
            if (source == target) return 100.0; // identical texts: 100 percent, same scale as below

            int stepsToSame = ComputeLevenshteinDistance(source, target);
            // Similarity as a percentage: 100 means identical, 0 means every position differs.
            double ds = (1.0 - ((double)stepsToSame / Math.Max(source.Length, target.Length))) * 100;
            return ds;
        }


        int ComputeLevenshteinDistance(string source, string target)
        {
            // Treat null as an empty string so the length checks below apply.
            source = source ?? string.Empty;
            target = target ?? string.Empty;
            if (source == target) return 0; // identical strings need no edits

            // Note: these are character counts, not word counts.
            int sourceWordCount = source.Length;
            int targetWordCount = target.Length;

            // Step 1: if one string is empty, the distance is the length of the other
            if (sourceWordCount == 0)
                return targetWordCount;

            if (targetWordCount == 0)
                return sourceWordCount;

        
            int[,] distance = new int[sourceWordCount + 1, targetWordCount + 1];

            // Step 2
            for (int i = 0; i <= sourceWordCount; distance[i, 0] = i++) ;
            for (int j = 0; j <= targetWordCount; distance[0, j] = j++) ;

            for (int i = 1; i <= sourceWordCount; i++)
            {
                for (int j = 1; j <= targetWordCount; j++)
                {
                    // Step 3
                    int cost = (target[j - 1] == source[i - 1]) ? 0 : 1;

                    // Step 4
                    distance[i, j] = Math.Min(Math.Min(distance[i - 1, j] + 1, distance[i, j - 1] + 1), distance[i - 1, j - 1] + cost);
                }
            }

            return distance[sourceWordCount, targetWordCount];
        }



My application reports a similarity percentage when comparing two PDF files. To do that, I read the content of both PDF files into strings and then compare the two strings using the Levenshtein distance algorithm. An error occurs in the ComputeLevenshteinDistance method.

The statement that fails is:

C#
int[,] distance = new int[sourceWordCount + 1, targetWordCount + 1];


The error is:
An unhandled exception of type 'System.OutOfMemoryException' occurred in WindowsFormsApplication1.exe.

Please help me. Thank you.
Updated 24-Mar-15 22:28pm

Have you debugged it?

It seems that sourceWordCount or targetWordCount is too big.
 
Comments
Krishna Veni 25-Mar-15 4:41am    
Yes, I have debugged it. How can I solve this?

The implementation of the Levenshtein algorithm that you're using allocates an int[,] matrix with (FileLength1 + 1) * (FileLength2 + 1) cells. At 4 bytes per int, that is roughly 4 * FileLength1 * FileLength2 bytes of memory. Depending on the size of the PDF files you're trying to compare, this can quickly result in an OutOfMemoryException, especially if you're running your application as a 32-bit process.
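
To make that concrete, here is a small sketch that estimates the matrix size before allocating it. The class and method names (MatrixSizeEstimate, EstimateBytes) are made up for illustration and are not part of the code in the question; the estimate ignores the small array header.

```csharp
using System;

static class MatrixSizeEstimate
{
    // Approximate bytes needed for: new int[sourceLength + 1, targetLength + 1]
    // Uses long arithmetic so the product does not overflow int.
    public static long EstimateBytes(int sourceLength, int targetLength)
    {
        return ((long)sourceLength + 1) * ((long)targetLength + 1) * sizeof(int);
    }
}
```

For example, with two hypothetical texts of 100,000 characters each, MatrixSizeEstimate.EstimateBytes(100000, 100000) comes to about 40 billion bytes (roughly 40 GB), which is far more than a 32-bit process can address.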

Take a look at this alternative Levenshtein implementation, which, as its author claims, requires only 2*Min(StrLen1,StrLen2) bytes: "Fast, memory efficient Levenshtein algorithm".
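
The following is not the code from that article, just a minimal sketch of the general two-row idea it relies on: each row of the matrix depends only on the row before it, so two arrays sized to the shorter string are enough. The names TwoRowLevenshtein and Distance are assumptions chosen for this example.

```csharp
using System;

static class TwoRowLevenshtein
{
    // Levenshtein distance that keeps only two rows of the matrix in memory.
    // Memory use is proportional to the length of the shorter string.
    public static int Distance(string source, string target)
    {
        source = source ?? string.Empty;
        target = target ?? string.Empty;

        // The distance is symmetric, so make 'target' the shorter string
        // to keep the two rows as small as possible.
        if (target.Length > source.Length)
        {
            string tmp = source;
            source = target;
            target = tmp;
        }

        int[] previous = new int[target.Length + 1];
        int[] current = new int[target.Length + 1];

        // Row 0: building the first j characters of target from nothing takes j insertions.
        for (int j = 0; j <= target.Length; j++)
            previous[j] = j;

        for (int i = 1; i <= source.Length; i++)
        {
            // Column 0: reducing the first i characters of source to nothing takes i deletions.
            current[0] = i;

            for (int j = 1; j <= target.Length; j++)
            {
                int cost = (source[i - 1] == target[j - 1]) ? 0 : 1;
                current[j] = Math.Min(
                    Math.Min(previous[j] + 1, current[j - 1] + 1),
                    previous[j - 1] + cost);
            }

            // The row just computed becomes the previous row for the next iteration.
            int[] swap = previous;
            previous = current;
            current = swap;
        }

        return previous[target.Length];
    }
}
```

In the question's code, CalculateSimilarity could call TwoRowLevenshtein.Distance(source, target) in place of ComputeLevenshteinDistance. Note that this only reduces memory use; the running time is still proportional to the product of the two lengths, so very large documents will remain slow to compare.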
 
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


