At the moment I am parsing a PDF. I have split the extracted text where I need it, which gives me:
CYT-HMI-S-005 CarrierSCAC: BPUS Stop Number:1 Release Type: AUTO
Route Name:
Days to
Pick-up Delivery
Final
Order Due @
Frequency: Frequency:
Location
Carrier Arrival Carrier Departure Plant Destination Dock Code Initial Dest Final Location
Initial Arrival
10:45 11:15 HMI IHMI1 HMI 14:45 14:45 0 MTWRFS MTWRFS
The data in this line
10:15 11:00 HMI IOSL1 OSL 17:15 17:15 0 MTWRFS MTWRFS
is just data: arrival time, departure time, company, ship-to site and so forth. The data itself isn't the problem. The issue is importing that line into the datagrid under specific columns.
So after every "Initial Arrival" line I would like to parse the text line that follows it, e.g.
10:15 11:00 HMI IOSL1 OSL 17:15 17:15 0 MTWRFS MTWRFS
and insert it into a datagrid.
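To make the goal concrete, here is a rough sketch of what that step could look like. It assumes the data line directly follows an "Initial Arrival" line and always has ten whitespace-separated fields; lines with a different field count are skipped. The class name RouteLineParser and all column names are hypothetical guesses based on the header text, not the real layout.

```csharp
using System;
using System.Data;

public static class RouteLineParser
{
    // Hypothetical column names guessed from the PDF header text; rename to match the real layout.
    private static readonly string[] ColumnNames =
    {
        "CarrierArrival", "CarrierDeparture", "Plant", "DockCode", "Destination",
        "OrderDueInitial", "OrderDueFinal", "DaysTo", "PickupFrequency", "DeliveryFrequency"
    };

    // Builds an empty table with one string column per expected field.
    public static DataTable CreateTable()
    {
        var table = new DataTable("Routes");
        foreach (string name in ColumnNames)
        {
            table.Columns.Add(name, typeof(string));
        }
        return table;
    }

    // Adds one row for every line that directly follows an "Initial Arrival" line.
    public static void AddRows(DataTable table, string pageText)
    {
        string[] lines = pageText.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None);
        for (int i = 0; i < lines.Length - 1; i++)
        {
            if (lines[i].Trim() != "Initial Arrival")
            {
                continue;
            }

            // A null separator array splits on any whitespace.
            string[] fields = lines[i + 1].Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
            if (fields.Length == ColumnNames.Length)
            {
                table.Rows.Add(fields);
            }
        }
    }
}
```

The resulting DataTable could then be bound to the grid, for example with dataGridView1.DataSource = table in WinForms (the control name is an assumption), after calling AddRows once per extracted page.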
Or Plan B:
I run a database query that returns the shipments and match its results against the parsed PDF data via a shared column.
TL;DR: Important data is stored in two different places. The PDF is sent monthly by a company. I need to match up columns from the database with the parsed PDF data.
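For Plan B, the matching could be sketched as a LINQ inner join. Everything here is an assumption for illustration: the Shipment, PdfRoute and MatchedShipment classes, their properties, and the choice of DockCode as the shared column. The real key depends on what the database query actually returns.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape of one row returned by the database query.
public class Shipment
{
    public string ShipmentId { get; set; }
    public string DockCode { get; set; }
}

// Hypothetical shape of one parsed PDF line.
public class PdfRoute
{
    public string DockCode { get; set; }
    public string CarrierArrival { get; set; }
    public string CarrierDeparture { get; set; }
}

// Combined row that could be shown in the datagrid.
public class MatchedShipment
{
    public string ShipmentId { get; set; }
    public string DockCode { get; set; }
    public string CarrierArrival { get; set; }
    public string CarrierDeparture { get; set; }
}

public static class ShipmentMatcher
{
    // Inner join on DockCode: shipments without a matching PDF line are dropped.
    public static List<MatchedShipment> Match(IEnumerable<Shipment> shipments, IEnumerable<PdfRoute> routes)
    {
        return shipments
            .Join(routes,
                  s => s.DockCode,
                  r => r.DockCode,
                  (s, r) => new MatchedShipment
                  {
                      ShipmentId = s.ShipmentId,
                      DockCode = s.DockCode,
                      CarrierArrival = r.CarrierArrival,
                      CarrierDeparture = r.CarrierDeparture
                  })
            .ToList();
    }
}
```

An inner join silently discards unmatched shipments. If those need to stay visible, a left join (GroupJoin followed by DefaultIfEmpty) would be the alternative.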
What I have tried:
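// Assumes iTextSharp 5.x, which matches the PdfTextExtractor call used below.
using System;
using System.Diagnostics;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

public static void ReadPDF()
{
    PdfReader reader = new PdfReader(@"file.pdf");
    int pageCount = reader.NumberOfPages;

    // iTextSharp pages are numbered from 1.
    for (int i = 1; i <= pageCount; i++)
    {
        string text = PdfTextExtractor.GetTextFromPage(reader, i, new LocationTextExtractionStrategy());

        // Split the page text at every "Delivery Frequency:" marker.
        // The string[] overload also compiles on .NET Framework.
        string[] words = text.Split(new[] { "Delivery Frequency:" }, StringSplitOptions.None);
        foreach (string chunk in words)
        {
            Debug.Print(chunk);
        }
    }

    reader.Close();
}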
This is the code I am using to parse the PDF data.