I am not able to search for key information because Google Drive search only supports exact matches. Due to OCR errors, the OCR text does not exactly represent the text in the PDF documents. Are there any techniques or software that can search accurately in spite of poor-quality scanned documents (and the resulting OCR errors)?

What I have tried:

The scanned PDF documents I am handling have poor scan quality. I have tried searching them on Google Drive with OCR enabled, but the search still misses text because of the OCR errors.
Updated 24-Jan-22 3:37am

1 solution

"Bad quality" is not a precise measurement, so the question is really "are there any OCR tools that are better at converting poor-quality scanned documents to text?"

Well, that all depends on how bad the quality of the scanned document is. Nobody can tell you with any accuracy what's going to work with your documents. You simply have to try various OCR libraries until you find one that works with them.

The second part of this is searching against words that are misspelled by the OCR. When searching content like that, there is no such thing as "accurately". There are only close matches, each with a "confidence" value indicating how likely it is that the match is the word you're looking for. That comes down to the search engine you're using, or going to use. Such engines use various "fuzzy match" techniques to generate results.
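To make the "fuzzy match" idea concrete, here is a minimal sketch using only Python's standard `difflib` module. It is just an illustration of the technique, not what any particular search engine does internally. The function name `fuzzy_find`, the 0.75 threshold, and the sample OCR text are all made up for the example.

```python
from difflib import SequenceMatcher


def fuzzy_find(query, text, threshold=0.75):
    """Return (score, phrase) pairs for phrases in text that resemble query.

    Hypothetical helper: slides a window of as many words as the query
    has over the text and scores each window against the query.
    """
    target = " ".join(query.lower().split())
    n = len(target.split())
    words = text.split()
    hits = []
    for i in range(len(words) - n + 1):
        window = " ".join(words[i:i + n])
        # ratio() is a similarity score between 0.0 and 1.0
        score = SequenceMatcher(None, target, window.lower()).ratio()
        if score >= threshold:
            hits.append((round(score, 2), window))
    # Best matches first
    return sorted(hits, reverse=True)


# Made-up OCR output with typical misreads ("m" read as "rn", "e" as "c")
ocr_text = "Tbe invoice nurnber is 4471 and the paymcnt is due in March"

print("invoice number" in ocr_text)            # exact match fails
print(fuzzy_find("invoice number", ocr_text))  # fuzzy match finds the misread phrase
print(fuzzy_find("payment", ocr_text))
```

The score plays the role of the "confidence" value: lowering the threshold finds more misread words but also lets in more false hits, so it has to be tuned against your own documents.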


This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


