How to extract a Word table to Excel with Python?

Question

I am new to Python. I tried writing code to extract details from specific cells of a Word table and export them to Excel. However, I can't seem to extract the correct details from the Word table to the Excel sheet. I am able to extract details from rows 1-3, but not from the other rows.

The following is the Word table:

https://i.stack.imgur.com/0rQZS.png

What I have tried:

Python

import os
import xlsxwriter
import xlwt
import docx

from docx import Document

# read a single file
doc = Document('/Users/TP/approach/data/sample.docx')

# read multiple files
path = "/Users/TP/approach/data/"
files = os.listdir(path)
docx_list = []
for f in files:
    if os.path.splitext(f)[1] == '.docx':
        docx_list.append(path + '//' + f)
    else:
        pass

# read the tables
tb = doc.tables
# read the rows of the first table
rows = tb[0].rows
# read the cells of the first row
cols = rows[0].cells
# read the first cell
cell = cols[0]
text = cell.text

mat = []
for a in range(len(docx_list)):
    doc = Document(docx_list[a])
    tb = doc.tables[0]


row = []
# For rows 1-3, the code is the following.

# Get the 2nd row of data
for i in range(1, 7, 6):
    cell = tb.cell(1, i)
    txt = cell.text if cell.text != '' else ''  # No content with spaces
    row.append(txt)


# For rows 5-9, the code is the following.
# Get the 5th row of data
for l in range(1, 7, 6):
    cell = tb.cell(4, l)
    txt = cell.text if cell.text != '' else ''  # No content with spaces
    row.append(txt)
# Get the 6th row of data
for m in (1, 7, 6):
    cell = tb.cell(5, m)
    txt = cell.text if cell.text != '' else ''  # No content with spaces
    row.append(txt)
# Get the 7th row of data
for n in range(2, 7, 1):
    cell = tb.cell(6, n)
    txt = cell.text if cell.text != '' else ''  # No content with spaces
    row.append(txt)
# Get the 8th row of data
for o in range(3, 7, 1):
    cell = tb.cell(7, o)
    txt = cell.text if cell.text != '' else ''  # No content with spaces
    row.append(txt)


# Create the workbook
workbook = xlsxwriter.Workbook('/Users/TP/missed_approach/output/missed approach_output_7.xlsx')

# Add a sheet
xlsheet = workbook.add_worksheet('data')
# Add the header row
table_head = [' date ', 'time', 'aircraft_call_sign', 'runway_in_use', 'aircraft_type', 'persons_on_board', 'name_of_airline', 'point_of_depature', 'aircraft_registration', 'destination', 'reason']
headlen = len(table_head)

for i in range(headlen):
    xlsheet.write(0, i, table_head[i])
for i in range(len(mat)):
    for j in range(len(row)):
        xlsheet.write(i + 1, j, mat[i][j])
workbook.close()

Posted 30-Sep-20 23:59pm

lauuster

Updated 1-Oct-20 4:44am

Comments

Richard MacCutchan 1-Oct-20 10:47am

I see that you have updated your question, but you have still not explained what the problem is. Why are you using range statements to extract single items of data? For example, you have three blocks beginning

for i in range(1, 7, 6)

which could be combined into one.
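
For reference, a quick check of what the loop headers in the question actually iterate over. This uses only built-in Python and assumes nothing about the Word document itself.

```python
# range(start, stop, step): with a step of 6, only the start value fits below 7.
single = list(range(1, 7, 6))
print(single)  # [1]

# Without the word "range", the loop walks the tuple itself.
tuple_values = [m for m in (1, 7, 6)]
print(tuple_values)  # [1, 7, 6]

# The ranges used for the 7th and 8th rows do cover several columns.
wide = list(range(2, 7, 1))
narrow = list(range(3, 7, 1))
print(wide)    # [2, 3, 4, 5, 6]
print(narrow)  # [3, 4, 5, 6]
```

So each `range(1, 7, 6)` block reads exactly one cell (column 1), while the `for m in (1, 7, 6)` block reads columns 1, 7 and 6 of that row, which may not be what was intended.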

1 solution

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Richard MacCutchan · Answer 1 · 2020-10-01T03:14:00

Solution 1

Python

# Get the 5th row of data
for l in range(1, 7, 6):
    cell = tb.cell(4, j)

You are using the index named l for your range, but trying to use j to refer to the data item. The index j is never declared anywhere.
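
A minimal sketch of the same loop with one name used throughout. It assumes `tb` behaves like a python-docx table, that is, it has a `cell(row, col)` method returning an object with a `.text` attribute. `FakeCell` and `FakeTable` are hypothetical stand-ins so the snippet runs without a Word file.

```python
class FakeCell:
    """Hypothetical stand-in for a python-docx cell."""
    def __init__(self, text):
        self.text = text


class FakeTable:
    """Hypothetical stand-in for a python-docx table."""
    def __init__(self, grid):
        self._grid = grid

    def cell(self, row_idx, col_idx):
        return FakeCell(self._grid[row_idx][col_idx])


# An 8 x 7 grid whose cell text records its own position.
grid = [[f"r{r}c{c}" for c in range(7)] for r in range(8)]
tb = FakeTable(grid)

row = []
for col in range(1, 7, 6):   # yields only column 1
    cell = tb.cell(4, col)   # same name as the loop variable
    row.append(cell.text)

print(row)  # ['r4c1']
```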

Posted 1-Oct-20 3:14am

Richard MacCutchan

Comments

lauuster 1-Oct-20 10:44am

Thank you! I have amended the code accordingly. However, the extractions were still wrong. I have attached an image for reference above.

Richard MacCutchan 1-Oct-20 10:49am

Well, you need to explain what you mean by "the extractions were still wrong". We do not know what data you are trying to extract, or what actually happens when you try.
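
One way to narrow this down is to print every grid position of the table and then build one spreadsheet row per document. The following is only a sketch under assumptions: the function names `dump_table`, `extract_row` and `export_folder` are made up for illustration, and the `(row, col)` pairs passed as `coords` have to be looked up from the real table, since its layout is only visible in the linked image. In the posted code, `mat` is never filled and `row` is built after the file loop has finished, so the final write loop has nothing to write; the sketch builds and writes a row inside the loop instead.

```python
import os


def dump_table(table):
    """Return (row, col, text) for every grid position, to look up the right indexes."""
    return [
        (r, c, cell.text.strip())
        for r, table_row in enumerate(table.rows)
        for c, cell in enumerate(table_row.cells)
    ]


def extract_row(table, coords):
    """Collect the text of the cells at the given (row, col) positions."""
    return [table.cell(r, c).text.strip() for r, c in coords]


def export_folder(folder, out_path, headers, coords):
    """Write one spreadsheet row per .docx file found in folder."""
    # Imported here so the helpers above can be tried without these packages.
    import xlsxwriter
    from docx import Document

    workbook = xlsxwriter.Workbook(out_path)
    sheet = workbook.add_worksheet("data")
    sheet.write_row(0, 0, headers)

    names = sorted(n for n in os.listdir(folder) if n.endswith(".docx"))
    for i, name in enumerate(names, start=1):
        table = Document(os.path.join(folder, name)).tables[0]
        # One row per document, written as soon as it is extracted.
        sheet.write_row(i, 0, extract_row(table, coords))

    workbook.close()
```

A call could look like `export_folder(path, out_path, table_head, coords)`, where `coords` holds one `(row, col)` pair per header, in the same order. Running `dump_table` on one document first should show which pair belongs to which field. As far as I know, python-docx returns the top-left cell for any grid position inside a merged span, so the same text can show up under several column indexes in the dump.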