Read more than one PDF file (iTextSharp)

Question

I wrote a console application with iTextSharp that reads a .pdf file and saves its text as a .csv file. The .pdf file name is currently hardcoded, but I would like to read more than 100 .pdf files and save the output as .csv.

The files will be named like this:

DT12345678, DT98765432, FR123567, FR988654 ...

C#

using System;
using System.IO;
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

static void Main(string[] args)
{
    string fileName = "test.pdf";
    StringBuilder text = new StringBuilder();

    if (File.Exists(fileName))
    {
        PdfReader pdfReader = new PdfReader(fileName);
        try
        {
            for (int page = 1; page <= pdfReader.NumberOfPages; page++)
            {
                // A fresh strategy per page keeps text from earlier pages out of the result.
                ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
                // GetTextFromPage already returns a .NET string, so no encoding round-trip is needed.
                text.Append(PdfTextExtractor.GetTextFromPage(pdfReader, page, strategy));
            }
        }
        finally
        {
            // Close the reader once, after all pages are read (not inside the page loop).
            pdfReader.Close();
        }
    }

    // The extracted text is written unchanged; it is not yet structured CSV data.
    File.WriteAllText("test.csv", text.ToString());
    Console.WriteLine(text.ToString());
}

What I have tried:

I couldn't try anything because I have no reference point.

Posted 20-Sep-20 20:22pm

Member 14783397

Updated 20-Sep-20 20:54pm

Comments

Sandeep Mewara 21-Sep-20 2:48am

It's not clear where you are stuck. It seems you wrote code to read and save one PDF. Now you want to extend it to multiple files, so what is the issue?

Member 14783397 21-Sep-20 6:05am

The issue is how to do that.

1 solution

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

OriginalGriff · Answer 1 · 2020-09-20T20:54:00

You have code to read a pdf file.
So extract that into a separate method that accepts a single parameter - the path to the file - and returns the entire content. Test it, and make sure it works.
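A minimal sketch of that extraction step, assuming iTextSharp 5.x (the same `iTextSharp.text.pdf` and `iTextSharp.text.pdf.parser` types the question uses). The class name `PdfText` and the method name `ExtractText` are hypothetical. The iTextSharp calls are the ones already in the question, with the reader closed once after the last page.

```csharp
using System.IO;
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

public static class PdfText
{
    // Hypothetical helper: returns the text of every page in one PDF file.
    public static string ExtractText(string path)
    {
        if (!File.Exists(path))
            throw new FileNotFoundException("PDF file not found.", path);

        StringBuilder text = new StringBuilder();
        PdfReader reader = new PdfReader(path);
        try
        {
            for (int page = 1; page <= reader.NumberOfPages; page++)
            {
                // New strategy per page so text from earlier pages is not repeated.
                ITextExtractionStrategy strategy = new SimpleTextExtractionStrategy();
                text.AppendLine(PdfTextExtractor.GetTextFromPage(reader, page, strategy));
            }
        }
        finally
        {
            reader.Close(); // close once, after all pages are read
        }
        return text.ToString();
    }
}
```

Calling `PdfText.ExtractText("test.pdf")` on a known file and printing the result is an easy way to check it before moving on.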

You can then call that method in a loop, once per file, to get the content of all the files.
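One way to sketch that loop, assuming all the PDFs sit in a single folder and each one should produce a .csv file of the same name. The class `BatchConverter`, the method `ConvertAll`, and the folder arguments are hypothetical. The extraction method is passed in as a delegate, so the loop does not depend on how the text is read.

```csharp
using System;
using System.IO;

public static class BatchConverter
{
    // Hypothetical helper: runs 'extract' on every *.pdf file in inputFolder and
    // writes the returned text to a file of the same name with a .csv extension.
    // Returns the number of files processed.
    public static int ConvertAll(string inputFolder, string outputFolder, Func<string, string> extract)
    {
        Directory.CreateDirectory(outputFolder);
        int count = 0;
        foreach (string pdfPath in Directory.EnumerateFiles(inputFolder, "*.pdf"))
        {
            // DT12345678.pdf -> DT12345678.csv
            string csvName = Path.ChangeExtension(Path.GetFileName(pdfPath), ".csv");
            File.WriteAllText(Path.Combine(outputFolder, csvName), extract(pdfPath));
            count++;
        }
        return count;
    }
}
```

A call would look like `BatchConverter.ConvertAll(inputFolder, outputFolder, path => YourReadMethod(path))`, where `YourReadMethod` stands for whatever single-file method you extracted. If everything should go into one .csv instead, append to a single `StreamWriter` inside the loop rather than writing one file per PDF.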

You will then probably need to process that content into actual data before outputting it as CSV, but that will depend on the data content, and we have no idea what your PDF files contain, or what you need in each column of the CSV. It is unlikely that the PDF content will be in CSV format already, but it is possible!
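As an illustration only, since the real column layout depends on what the PDFs contain: the sketch below assumes each non-blank line of extracted text becomes one row with two columns, the file name and the line. The `Csv` class and its methods are hypothetical. The quoting follows the usual CSV convention of wrapping a field in double quotes when it contains a comma, quote, or line break.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Csv
{
    // Quotes a field when it contains a comma, quote, or line break,
    // doubling any embedded quotes.
    public static string Escape(string field)
    {
        if (field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) < 0)
            return field;
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }

    // Assumed layout: one row per non-blank line of text; columns = file name, line.
    public static IEnumerable<string> ToRows(string fileName, string text)
    {
        return text
            .Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries)
            .Select(line => line.Trim())
            .Where(line => line.Length > 0)
            .Select(line => Escape(fileName) + "," + Escape(line));
    }
}
```

Real data will most likely need its own parsing step in place of `ToRows`, for example splitting each line into the fields you want in each column.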