I have a project to read the content of 10-15 different vendor statements that are in PDF form and have different layouts. I assume I'll have to write a separate parser for each layout (up to 15) to capture the information, put it into a structured format, and store it in a database. I know there are many PDF-to-text APIs, but I'd like to hear from anyone who has done something similar and which methods worked best. I'm fluent in Java, C#, and Python. Does anyone have wisdom to offer?
Thanks,
Kasey
What I have tried:
I have looked at the PDFTools framework and a few C# and Python libraries.
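
To make the plan concrete, here is a minimal sketch of the structure I am picturing, in Python. It assumes each PDF has already been converted to plain text by whichever library ends up being used, so no PDF API is called here. The vendor key `acme`, the `Invoice No:` and `Total Due:` labels, the `Statement` fields, and the table schema are all made up for illustration; real statements would need their own patterns.

```python
import re
import sqlite3
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Statement:
    # Hypothetical common shape that every vendor parser returns.
    vendor: str
    invoice_number: str
    total: float


# Registry mapping a vendor key to the parser for that vendor's layout.
PARSERS: Dict[str, Callable[[str], Statement]] = {}


def register(vendor: str):
    """Decorator that adds a parser function to the registry."""
    def wrap(fn: Callable[[str], Statement]) -> Callable[[str], Statement]:
        PARSERS[vendor] = fn
        return fn
    return wrap


@register("acme")
def parse_acme(text: str) -> Statement:
    # The labels below are assumptions about one imaginary layout.
    number = re.search(r"Invoice No:\s*(\S+)", text)
    total = re.search(r"Total Due:\s*\$?([\d,]+\.\d{2})", text)
    if not number or not total:
        raise ValueError("text does not match the assumed acme layout")
    amount = float(total.group(1).replace(",", ""))
    return Statement("acme", number.group(1), amount)


def detect_vendor(text: str) -> Optional[str]:
    # Naive detection: the first registered vendor key found in the text.
    lowered = text.lower()
    for vendor in PARSERS:
        if vendor in lowered:
            return vendor
    return None


def store(conn: sqlite3.Connection, stmt: Statement) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS statements "
        "(vendor TEXT, invoice_number TEXT, total REAL)"
    )
    conn.execute(
        "INSERT INTO statements VALUES (?, ?, ?)",
        (stmt.vendor, stmt.invoice_number, stmt.total),
    )
    conn.commit()


def process(text: str, conn: sqlite3.Connection) -> Statement:
    """Pick the parser for the detected vendor, parse, and store the result."""
    vendor = detect_vendor(text)
    if vendor is None:
        raise ValueError("no registered parser recognizes this statement")
    stmt = PARSERS[vendor](text)
    store(conn, stmt)
    return stmt
```

With this shape, supporting another vendor would mean adding one more decorated function, while detection and storage stay shared. Whether plain regular expressions hold up on text extracted from real PDFs is exactly the part I am unsure about.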