I have a PDF file that contains a table with an irregular cell layout: sometimes a cell's title is on one line and its value is on the line below, sometimes several titles share a line and their values follow on the next lines, and sometimes the title and its value are on the same line.
I converted it to text using ASP.NET and C# with the IronOCR library.
The result is the data collected on consecutive lines.
How can I extract the field values so that I can store them in a database?


What I have tried:

pdf file

Title 1
Value 1

title2_____title3
Value2
___________Value3

title4____value4

_________________
_________________

Result as Text

Title1
value1
title2 title3
value2
value3
title4 value4
Updated 24-Nov-20 7:38am

1 solution

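// Needs: using System; using System.Collections.Generic; using System.Web.Script.Serialization;
public Dictionary<string, string> Parser(string jsonTemplate, string data)
{
    // Split the OCR text into rows (the newline escape is "\n", not "/n").
    var arrRow = data.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None);
    var serializer = new JavaScriptSerializer();
    // Assumption: the template maps each title to the row index of its value, e.g. {"title3": 4}.
    // For deserializing into a custom type, see https://stackoverflow.com/questions/2246694/how-to-convert-json-object-to-custom-c-sharp-object
    var template = serializer.Deserialize<Dictionary<string, int>>(jsonTemplate);
    var result = new Dictionary<string, string>();
    foreach (var field in template)
    {
        // arrRow is an array, so read the value at the position recorded for that title.
        if (field.Value >= 0 && field.Value < arrRow.Length)
            result[field.Key] = arrRow[field.Value].Trim();
    }
    return result;
}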


Sorry, I have no time to make it work, but I hope you get the idea.
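
If the row positions are not fixed, another option is to walk the lines and use the titles themselves as markers. Below is a minimal sketch for the sample text in the question only. It assumes the field titles are known in advance and are single words without spaces. OcrFieldExtractor, Extract and knownTitles are made-up names for illustration and are not part of IronOCR.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OcrFieldExtractor
{
    // knownTitles: hypothetical list of the field titles expected in the text
    // (assumed to be single words, matched case-insensitively).
    public static Dictionary<string, string> Extract(string text, IEnumerable<string> knownTitles)
    {
        var titles = new HashSet<string>(knownTitles, StringComparer.OrdinalIgnoreCase);
        var lines = text.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries)
                        .Select(l => l.Trim())
                        .Where(l => l.Length > 0)
                        .ToArray();
        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

        int i = 0;
        while (i < lines.Length)
        {
            var tokens = lines[i].Split(new[] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries);
            i++;
            if (!titles.Contains(tokens[0]))
                continue; // line does not start with a known title, skip it

            if (tokens.All(t => titles.Contains(t)))
            {
                // The line holds only titles: the following lines hold
                // their values, one line per title, in the same order.
                foreach (var title in tokens)
                {
                    if (i >= lines.Length) break;
                    result[title] = lines[i];
                    i++;
                }
            }
            else
            {
                // Title and value share one line: the rest of the line is the value.
                result[tokens[0]] = string.Join(" ", tokens.Skip(1));
            }
        }
        return result;
    }
}
```

Calling OcrFieldExtractor.Extract(ocrText, new[] { "Title1", "title2", "title3", "title4" }) on the "Result as Text" sample should give one dictionary entry per title, which can then be written to the database. The sketch does not handle titles that contain spaces, missing values, or a value that happens to look like a title, so real OCR output will likely need extra checks.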
 
 
Comments
Member 15000927 24-Nov-20 17:41pm    
thank you

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


