Greetings!

I converted a PDF file containing tables into a Pandas DataFrame and then exported it to Excel. Some cells in the PDF contain multiline text.
I have done the same conversion with other PDFs before. In those files, the cells with multiline text had a \n at the end of each line, so I was able to join the multiline text into a single line/cell. In this PDF there is no \n; each line of a multiline cell ends up in a separate row.
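
For reference, the newline cleanup mentioned above could look roughly like the sketch below. The column names and sample values are hypothetical, since the original code was not posted; it only covers the earlier case where the line breaks are present inside the cell.

```python
import pandas as pd

# Hypothetical sample: one cell holds text broken across lines with "\n".
df = pd.DataFrame({
    "PostDate": ["01-04-2012", "03-04-2012"],
    "Description": ["Payment received\nfrom customer", "Bank charges"],
})

# Replace each embedded newline (and any whitespace around it) with one space.
df = df.replace(r"\s*\n\s*", " ", regex=True)
```

This does nothing for the PDF in question, where the extra lines arrive as separate rows rather than as embedded newlines.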

I want the text of each multiline cell in a single line/cell, but I have not been able to achieve this. Can anybody please help me with it?

I hope my question is clear.

This is what I get after exporting the DataFrame to Excel:
INPUT1 — Postimages[^]

And this is what I want:
OUTPUT1 — Postimages[^]

What I have tried:

I have no idea what I should do.
Updated 16-Jan-23 10:06am
Comments
Richard MacCutchan 12-Jan-23 7:29am    
"I have no idea what I should do."
The first thing you need to do is to show us the relevant code, and the extracted data that you are trying to save. No one here can guess what either of those may be.
Member 15881193 12-Jan-23 7:39am    
I think that the answer is more Python-specific and has nothing to do with how I pulled the data from PDF.

1 solution

Comments
Member 15881193 18-Jan-23 6:59am    
I tried everything, but it didn't work for me.
Member 15881193 18-Jan-23 7:01am    
Can you tell me how to create a new column before the PostDate column that holds a unique number in every row where PostDate contains a value, and stays empty in every row where PostDate is empty?

NewColumn PostDate
1 01-04-2012
2 03-04-2012
3 05-04-2012
----------
----------
----------
4 10-04-2012
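
One possible way to do this in pandas is sketched below. It is only an illustration: the sample data and the `Description` column are hypothetical, and it assumes that a row with an empty PostDate is a continuation of the last row that had a date.

```python
import pandas as pd

# Hypothetical data: an empty PostDate marks a continuation row.
df = pd.DataFrame({
    "PostDate": ["01-04-2012", "03-04-2012", "05-04-2012",
                 None, "", None, "10-04-2012"],
    "Description": ["Opening balance", "Cheque deposit", "Transfer to",
                    "savings", "account", "no. 2", "Bank charges"],
})

# Treat None/NaN and blank strings alike as "no value".
has_date = df["PostDate"].fillna("").astype(str).str.strip().ne("")

# Running count of dated rows; the nullable Int64 dtype lets the
# other rows stay empty (<NA>) instead of turning into floats.
numbers = has_date.cumsum().where(has_date).astype("Int64")

# Insert the new column immediately before PostDate.
df.insert(df.columns.get_loc("PostDate"), "NewColumn", numbers)

# Optional follow-up: use the same running count as a group key and
# join the text of each dated row with its continuation rows.
group_id = has_date.cumsum().rename("group")
merged = df.groupby(group_id).agg({
    "PostDate": "first",
    "Description": lambda s: " ".join(s.dropna().astype(str)),
})
```

With this approach the `<NA>` entries in `NewColumn` should be written as empty cells by `df.to_excel(...)`, and `merged` holds one row per dated entry with the multiline text joined into a single cell. Rows that appear before the first dated row would fall into group 0 and may need separate handling.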

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)