Using iTextSharp, I am trying to write a program that reads a PDF file, extracts every text price it finds (for example $2.00 or 0.20¢), and then displays the whole list.
I want to extract the prices from one particular section only, not from the entire PDF file. The program should read the PDF line by line: when a line contains the string "SUMMARY OF RATES AND CHARGES", it should start extracting text prices, and when it reads the string "Summer Commodity", it should break out of the loop.
Right now my code outputs every text price it finds in the file, which is not what I want. It only checks whether the string "SUMMARY OF RATES AND CHARGES" appears somewhere in the file and, if so, extracts text prices from the beginning of the PDF file to the end.
I do not want it to start from the beginning. It should start only once it has read the line containing "SUMMARY OF RATES AND CHARGES". From that line on, it should keep reading each line and extract every text price it finds. Once it reaches the line containing "Summer Commodity", it should break the loop and stop extracting any more text prices.
What I have tried:
Imports iTextSharp.text.pdf
Imports iTextSharp.text.pdf.parser
Imports iTextSharp.text
Imports System.IO
Imports System.Text.RegularExpressions

Public Class Form1
    Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
        GetTextFromPDF("C:\Users\Desktop\Tariffrr.pdf")
    End Sub

    Public Function GetTextFromPDF(ByVal PdfFileName As String) As String
        Dim oReader As New iTextSharp.text.pdf.PdfReader(PdfFileName)
        Dim sOut = ""
        For i = 1 To oReader.NumberOfPages
            Dim its As New iTextSharp.text.pdf.parser.SimpleTextExtractionStrategy
            sOut &= iTextSharp.text.pdf.parser.PdfTextExtractor.GetTextFromPage(oReader, i, its)
            Dim adrRx As Regex = New Regex("(\d+\.\d{1,4})")
            Dim tarrifs As New List(Of String)
            For Each item As Match In adrRx.Matches(sOut.ToLower)
                If sOut.ToUpper().Contains("SUMMARY OF RATES AND CHARGES") Then
                    tarrifs.Add(item.Value)
                    sOut.ToLower().Contains("R1-Demand")
                End If
            Next
            Dim emailsString As String = Join(tarrifs.Distinct.ToArray, " ")
            TextBox1.Text = emailsString
        Next
        Return sOut
    End Function
End Class
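For reference, the behavior I am after could be sketched roughly as below. This is only a minimal sketch under some assumptions, not a tested solution: the module name PriceScanner and the helpers ExtractPricesBetween and ReadPdfLines are hypothetical names made up for illustration, and it assumes that the text iTextSharp returns for a page is separated by line breaks that roughly match the visible lines, so that each marker appears whole on a single line. It reuses the same number pattern as my code above and ignores any prices on the marker lines themselves.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic
Imports iTextSharp.text.pdf
Imports iTextSharp.text.pdf.parser

Public Module PriceScanner
    ' Hypothetical helper: returns the prices found on the lines that lie
    ' strictly between the start marker line and the stop marker line.
    Public Function ExtractPricesBetween(lines As IEnumerable(Of String),
                                         startMarker As String,
                                         stopMarker As String) As List(Of String)
        Dim priceRx As New Regex("\d+\.\d{1,4}")
        Dim prices As New List(Of String)
        Dim inSection As Boolean = False

        For Each line As String In lines
            If Not inSection Then
                ' Skip every line until the start marker has been seen.
                If line.IndexOf(startMarker, StringComparison.OrdinalIgnoreCase) >= 0 Then
                    inSection = True
                End If
                Continue For
            End If

            ' Stop completely once the stop marker is reached.
            If line.IndexOf(stopMarker, StringComparison.OrdinalIgnoreCase) >= 0 Then
                Exit For
            End If

            For Each m As Match In priceRx.Matches(line)
                prices.Add(m.Value)
            Next
        Next

        Return prices
    End Function

    ' Hypothetical helper: reads all pages and splits the extracted text
    ' into lines (assumes line breaks in the extracted text are meaningful).
    Public Function ReadPdfLines(pdfFileName As String) As List(Of String)
        Dim lines As New List(Of String)
        Dim reader As New PdfReader(pdfFileName)
        Try
            For i As Integer = 1 To reader.NumberOfPages
                ' A fresh strategy per page, so text from earlier pages is not repeated.
                Dim pageText As String = PdfTextExtractor.GetTextFromPage(
                    reader, i, New SimpleTextExtractionStrategy())
                lines.AddRange(pageText.Split(
                    New Char() {ControlChars.Cr, ControlChars.Lf},
                    StringSplitOptions.RemoveEmptyEntries))
            Next
        Finally
            reader.Close()
        End Try
        Return lines
    End Function
End Module
```

From the button handler this could be called as ExtractPricesBetween(ReadPdfLines("C:\Users\Desktop\Tariffrr.pdf"), "SUMMARY OF RATES AND CHARGES", "Summer Commodity"), with the result joined by String.Join(" ", ...) and assigned to TextBox1.Text. The key difference from my current code is the inSection flag: matching happens per line and only while the flag is set, instead of testing the whole accumulated text for the marker.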