I am new to Python. I tried writing code to extract details from specific cells of a Word table and export them to Excel. However, I can't seem to extract the correct details from the Word table into the Excel sheet. I am able to extract details from rows 1-3, but not from the other rows.

The following is the Word table:

https://i.stack.imgur.com/0rQZS.png

What I have tried:

Python
import os
import xlsxwriter
import xlwt
import docx

from docx import Document
#read single file
doc = Document('/Users/TP/approach/data/sample.docx')

#read multiple files
path = "/Users/TP/approach/data/"
files = os.listdir(path)
docx_list = []
for f in files:
    if os.path.splitext(f)[1] == '.docx':
        docx_list.append(path + '//' + f)
    else:
        pass

#read form 
tb = doc.tables
#read line
rows = tb[0].rows
#read column
cols = rows[0].cells
#read cell
cell = cols[0]
text = cell.text

mat = []
for a in range(len(docx_list)):
    doc = Document(docx_list[a])
    tb = doc.tables[0]


row = []
#From row 1-3, the code is following.

# Get the 2nd row of data 
for i in range(1, 7, 6):
    cell = tb.cell(1, i)
    txt = cell.text if cell.text != '' else ''  # No content with spaces
    row.append(txt)


#From row 5-9, the code is following
# Get the 5th row of data
for l in range(1, 7, 6):
    cell = tb.cell(4, l)
    txt = cell.text if cell.text != '' else ''  # No content with spaces
    row.append(txt)
# Get the 6th row of data
for m in (1, 7, 6):
    cell = tb.cell(5, m)
    txt = cell.text if cell.text != '' else ''  # No content with spaces
    row.append(txt)
# Get the 7th row of data
for n in range(2, 7, 1):
    cell = tb.cell(6, n)
    txt = cell.text if cell.text != '' else ''  # No content with spaces
    row.append(txt)
# Get the 8th row of data
for o in range(3, 7, 1):
    cell = tb.cell(7, o)
    txt = cell.text if cell.text != '' else ''  # No content with spaces
    row.append(txt)


#Create workbook
workbook = xlsxwriter.Workbook('/Users/TP/missed_approach/output/missed approach_output_7.xlsx')

#add sheet
xlsheet = workbook.add_worksheet('data')
#add header
table_head = [' date ', 'time','aircraft_call_sign','runway_in_use','aircraft_type','persons_on_board','name_of_airline','point_of_depature','aircraft_registration','destination','reason']
headlen = len(table_head)

for i in range(headlen):
    xlsheet.write(0,i,table_head[i])
for i in range(len(mat)):
    for j in range(len(row)):
        xlsheet.write(i+1,j,mat[i][j])
workbook.close()
Updated 1-Oct-20 4:44am
v4
Comments
Richard MacCutchan 1-Oct-20 10:47am    
I see that you have updated your question but you have still not explained what the problem is. Why are you using range statements to extract single items of data? For example you have three blocks beginning
for i in  range ( 1 , 7 , 6 )

which could be combined into one.
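
To illustrate the point about the range statements: with a step of 6, range(1, 7, 6) yields only the single value 1, so each of those loops runs exactly once. The following is a minimal sketch, not the poster's actual code. It uses a hypothetical list of lists named `table` in place of the Word table so that it runs without python-docx, and the row indexes 1, 4 and 5 are assumed from the posted code.

```python
# With a step of 6, only the value 1 falls inside the half-open interval [1, 7).
single = list(range(1, 7, 6))
print(single)

# Hypothetical stand-in for the Word table:
# table[r][c] plays the role of tb.cell(r, c).text.
table = [[f"r{r}c{c}" for c in range(8)] for r in range(8)]

# The separate single-item loops collapse into one pass over the row indexes,
# always reading column 1.
row = [table[r][1] for r in (1, 4, 5)]
print(row)
```

With a real document, the same shape would be a single loop over the row indexes that calls tb.cell(r, 1).text for each one.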

1 solution

Python
# Get the 5th row of data
for l in  range ( 1 , 7 , 6 ) : 
    cell = tb . cell ( 4 , j ) 

You are using the index named l for your range, but trying to use j to refer to the data item. The index j is never declared anywhere.
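
As a small illustration of that mismatch, here is a sketch under stated assumptions: `cells` is a hypothetical list standing in for one table row, and the two helper functions are made up for this example. Using a name that the loop never assigns raises NameError, while using the loop variable itself works.

```python
cells = ["a", "b", "c"]  # hypothetical stand-in for the cells of one table row


def read_with_wrong_name():
    picked = []
    for l in range(1, 3):
        picked.append(cells[j])  # bug: the loop variable is l, j was never assigned
    return picked


def read_with_loop_variable():
    picked = []
    for l in range(1, 3):
        picked.append(cells[l])  # index with the same name the loop assigns
    return picked


try:
    read_with_wrong_name()
except NameError as exc:
    error_message = str(exc)
    print(error_message)

print(read_with_loop_variable())
```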
 
Comments
lauuster 1-Oct-20 10:44am    
Thank you! I have amended the code accordingly. However, the extractions were still wrong. I have attached an image for reference above.
Richard MacCutchan 1-Oct-20 10:49am    
Well you need to explain what you mean by "the extractions were still wrong". We do not know what data you are trying to extract, or what actually happens when you try.
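
One thing that may be worth checking in the code as posted, offered as a guess rather than a diagnosis: `mat` is created as an empty list and nothing is ever appended to it, and the cell-reading loops sit outside the `for a in range(len(docx_list))` loop. As written, the final write loop over `range(len(mat))` would have nothing to write below the header. The sketch below shows one way to collect one row per document into `mat`. The in-memory `fake_tables`, the helper `extract_row`, and the `positions` list are all hypothetical stand-ins for the python-docx calls and the real cell coordinates.

```python
# Hypothetical stand-ins: each "document" is a table given as a list of rows.
fake_tables = [
    [["", "A1"], ["", "B1"]],
    [["", "A2"], ["", "B2"]],
]


def extract_row(table, positions):
    """Collect the text at each (row, column) position into one flat list."""
    return [table[r][c] for r, c in positions]


positions = [(0, 1), (1, 1)]  # assumed cell positions, for illustration only

mat = []
for table in fake_tables:
    # Build the row inside the loop and append it,
    # so mat ends up with one entry per document.
    mat.append(extract_row(table, positions))

print(mat)
```

Each entry of `mat` then corresponds to one spreadsheet row, which is the shape the `xlsheet.write(i+1, j, mat[i][j])` loop in the question expects.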

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


