Python: how to group/merge rows into a single row

Question

Greetings!

I converted a PDF file containing tables into a Pandas dataframe and then exported it to Excel. Some cells in the PDF contain multiline text.
I have done this with other PDFs before. In those, the cells with multiline text had a \n at the end of each line, so I was able to join the multiline text into a single line/cell. In this PDF there is no \n.

I want the text in one line/cell, but I am not able to get it there. Can anybody please help me with this?

I hope my question is clear.

This is what I get after exporting the dataframe to Excel:
INPUT1 — Postimages[^]

And this is what I want:
OUTPUT1 — Postimages[^]

What I have tried:

I have no idea what I should do.

Posted 12-Jan-23 1:08am

Member 15881193

Updated 16-Jan-23 10:06am

Comments

Richard MacCutchan 12-Jan-23 7:29am

"I have no idea what I should do."
The first thing you need to do is to show us the relevant code, and the extracted data that you are trying to save. No one here can guess what either of those may be.

Member 15881193 12-Jan-23 7:39am

I think that the answer is more Python-specific and has nothing to do with how I pulled the data from PDF.

1 solution

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Maciej Los · Answer 1 · 2023-01-16T10:06:00

Solution 1

Follow these links:
tabula extract table from pdf remove line break[^]
Opening a pdf and reading in tables with python pandas[^]

More at: How to Extract Tables from PDF in Python - Python Code[^]
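
If the extraction leaves each line of a multiline cell in its own row, one option is to merge the rows in pandas after the fact. The following is only a minimal sketch and is not taken from the links above. It assumes that a row with an empty key column (here `PostDate`) continues the text of the previous row; the `Description` column and all sample values are made up for illustration.

```python
import pandas as pd

# Hypothetical extracted data: rows with no PostDate are assumed to be
# continuation lines of the row above them.
df = pd.DataFrame({
    "PostDate": ["01-04-2012", None, "03-04-2012", None, None],
    "Description": ["Payment to", "supplier A", "Refund", "for order", "1234"],
})

# A row starts a new record when PostDate is neither missing nor blank.
has_date = df["PostDate"].notna() & df["PostDate"].astype(str).str.strip().ne("")

# The running count of record starts gives every row the id of its record.
record_id = has_date.cumsum().rename("record_id")

merged = (
    df.groupby(record_id)
      .agg({
          "PostDate": "first",  # first non-missing value in the record
          "Description": lambda s: " ".join(s.dropna().astype(str)),
      })
      .reset_index(drop=True)
)

print(merged)
```

Any other text column can be joined the same way by adding it to the `agg` dictionary. Whether an empty `PostDate` really marks a continuation line depends on your table, so check a few records by hand first.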

Posted 16-Jan-23 10:06am

Maciej Los

Comments

Member 15881193 18-Jan-23 6:59am

I tried everything, but it didn't work for me.

Member 15881193 18-Jan-23 7:01am

Can you tell me how to create a new column before the PostDate column that holds a unique number where PostDate contains a value and stays empty where PostDate contains nothing?

NewColumn PostDate
1 01-04-2012
2 03-04-2012
3 05-04-2012
----------
----------
----------
4 10-04-2012
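
One possible way to build such a column in pandas is sketched below. It assumes the empty `PostDate` cells are missing values or blank strings; the sample frame just mirrors the example above, and `NewColumn` is the name used there.

```python
import pandas as pd

# Sample frame mirroring the example: three dated rows, three empty rows, one dated row.
df = pd.DataFrame({
    "PostDate": ["01-04-2012", "03-04-2012", "05-04-2012", None, None, None, "10-04-2012"],
})

# True where PostDate holds a real value (not missing and not blank).
has_date = df["PostDate"].notna() & df["PostDate"].astype(str).str.strip().ne("")

# Running count over the dated rows; rows without a date become <NA>.
numbers = has_date.cumsum().where(has_date).astype("Int64")

# Insert the new column at PostDate's position so it sits directly before it.
df.insert(df.columns.get_loc("PostDate"), "NewColumn", numbers)

print(df)
```

The nullable `Int64` dtype keeps the numbers as integers while still allowing empty cells. Forward-filling this column with `ffill()` would give every row a record number that could then serve as a `groupby` key.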