Using iTextSharp, I am trying to write a program that reads a PDF file, extracts every price it finds in the text (for example $2.00 or 0.20¢), and then displays the whole list.

I want to extract only the prices from a certain page, not from the entire PDF file. The program should read the file line by line: when a line contains the string "SUMMARY OF RATES AND CHARGES", it should start extracting prices, and when it reads the string "Summer Commodity", it should break out of the loop.

Right now my code outputs every price it finds in the file, which is not what I want. It only checks whether the string "SUMMARY OF RATES AND CHARGES" appears somewhere in the file and, if so, extracts prices from the beginning of the PDF file to the end.

I do not want it to start from the beginning. It should start only once it has read the line containing "SUMMARY OF RATES AND CHARGES", then keep reading line by line and extract each price it finds. Once it reaches the line containing "Summer Commodity", it should break out of the loop and stop extracting prices.

What I have tried:

Imports iTextSharp.text.pdf
Imports iTextSharp.text.pdf.parser
Imports iTextSharp.text
Imports System.IO
Imports System.Text.RegularExpressions

Public Class Form1
    Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
        GetTextFromPDF("C:\Users\Desktop\Tariffrr.pdf")
    End Sub

    Public Function GetTextFromPDF(ByVal PdfFileName As String) As String
        Dim oReader As New iTextSharp.text.pdf.PdfReader(PdfFileName)
        Dim sOut = ""
        For i = 1 To oReader.NumberOfPages
            Dim its As New iTextSharp.text.pdf.parser.SimpleTextExtractionStrategy
            sOut &= iTextSharp.text.pdf.parser.PdfTextExtractor.GetTextFromPage(oReader, i, its)
            Dim adrRx As Regex = New Regex("(\d+\.\d{1,4})")
            Dim tarrifs As New List(Of String)

            For Each item As Match In adrRx.Matches(sOut.ToLower)
                If sOut.ToUpper().Contains("SUMMARY OF RATES AND CHARGES") Then
                    tarrifs.Add(item.Value)
                    sOut.ToLower().Contains("R1-Demand")
                End If
            Next

            Dim emailsString As String = Join(tarrifs.Distinct.ToArray, "     ")
            TextBox1.Text = emailsString
        Next
        Return sOut
    End Function
End Class
Updated 1-Apr-20 22:57pm

1 solution

You just need a switch to tell you whether to save values or not. Something like:
boolean capture = false;
while not end of file
    read a string
    if string contains "SUMMARY OF RATES AND CHARGES"
        set capture = true
    if string contains "Summer Commodity"
        set capture = false
    if capture == true
        save the price
end while
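
The pseudocode above could be sketched in VB.NET roughly as follows. This is only an illustration under some assumptions: the module name PriceCapture and the function ExtractPrices are made up for this example, the text is assumed to have already been split into lines, and the price pattern is the one from the question. Because the question asks to break out of the loop at "Summer Commodity", this sketch uses Exit For at that point instead of setting the flag back to False.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module PriceCapture
    ' Hypothetical helper (not part of iTextSharp): returns the prices found
    ' between the start marker and the stop marker.
    Public Function ExtractPrices(lines As IEnumerable(Of String)) As List(Of String)
        Const StartMarker As String = "SUMMARY OF RATES AND CHARGES"
        Const StopMarker As String = "Summer Commodity"

        ' Same pattern as in the question: digits, a dot, then 1 to 4 digits.
        Dim priceRx As New Regex("\d+\.\d{1,4}")
        Dim prices As New List(Of String)
        Dim capture As Boolean = False

        For Each textLine As String In lines
            ' Switch capturing on once the heading has been seen.
            If textLine.IndexOf(StartMarker, StringComparison.OrdinalIgnoreCase) >= 0 Then
                capture = True
            End If

            ' Stop completely at the stop marker, but only after capturing has
            ' started, so an earlier mention of it does not end the loop too soon.
            If capture AndAlso textLine.IndexOf(StopMarker, StringComparison.OrdinalIgnoreCase) >= 0 Then
                Exit For
            End If

            ' While the switch is on, save every price found on this line.
            If capture Then
                For Each m As Match In priceRx.Matches(textLine)
                    prices.Add(m.Value)
                Next
            End If
        Next

        Return prices
    End Function
End Module
```

To feed it, the text returned by PdfTextExtractor.GetTextFromPage (as used in the question) would need to be split into lines first, for example with Split(ControlChars.Lf), assuming the extracted text separates lines with line-feed characters. Note that the check has to be made per line: testing Contains on the whole accumulated text, as the code in the question does, is true for every match once the heading appears anywhere.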
 
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


