I want to write a program that copies text from one Word document and pastes it into another. I'm trying to do that with the python-docx library. The following code copies the text, but not its formatting: bold, italic, underlined and colored parts all come out as plain text:
Python
from docx import Document


input = Document('SomeDoc.docx')

paragraphs = []
for para in input.paragraphs:
    p = para.text
    paragraphs.append(p)

output = Document()
for item in paragraphs:
    output.add_paragraph(item)
output.save('OutputDoc.docx')


What I have tried:

I've tried copying the paragraph object directly into the output document, but it doesn't work either:
Python
from docx import Document


input = Document('SomeDoc.docx')
output = Document()

for para in input.paragraphs:
    output.add_paragraph(para)
output.save('OutputDoc.docx')
Updated 22-Feb-18 0:28am

To copy the text together with its formatting, you will need to write your own function, because python-docx has no built-in function for copying a paragraph from one document to another. This is the function I wrote:

Python
def get_para_data(output_doc, paragraph):
    """
    Copy paragraph into output_doc run by run, setting each run's bold,
    italic, underline, color and font data, then the paragraph's alignment.
    """

    output_para = output_doc.add_paragraph()
    for run in paragraph.runs:
        output_run = output_para.add_run(run.text)
        # Run's bold data (True, False, or None = inherit from the style)
        output_run.bold = run.bold
        # Run's italic data
        output_run.italic = run.italic
        # Run's underline data
        output_run.underline = run.underline
        # Run's color data (None leaves the color unset)
        output_run.font.color.rgb = run.font.color.rgb
        # Run's font data: typeface name and size
        output_run.font.name = run.font.name
        output_run.font.size = run.font.size
    # Paragraph's alignment data
    output_para.paragraph_format.alignment = paragraph.paragraph_format.alignment


How The Function Works
1. Adds a new, empty paragraph to the output document.
2. For each run in the source paragraph, adds a new run with the same text to that paragraph.
3. Copies bold, italic and underline. Each of these is True, False or None: True means the run is in that style, False means it isn't, and None means the setting is inherited from the paragraph's style.
4. Copies the run's RGB color, if it has one.
5. Copies the run's font.
6. Copies the source paragraph's alignment to the new paragraph (alignment belongs to the paragraph, not to individual runs).


How to Use the Function:
Pass it the output Document object (not its file name) and the paragraph you want to copy; it copies one paragraph per call.
For example:

Python
from docx import Document
# get_para_data is the function defined above

input_doc = Document('InputDoc.docx')
output_doc = Document()

# Call the function
get_para_data(output_doc, input_doc.paragraphs[3])

# Save the new file
output_doc.save('OutputDoc.docx')

If you'd like to copy every paragraph of the document, I suggest you do this:

Python
for para in input_doc.paragraphs:
    get_para_data(output_doc, para)

output_doc.save('OutputDoc.docx')
You need to get the style information from the paragraph; see Text-related objects in the python-docx 0.8.6 documentation.
Comments
E. Epstein 19-Feb-18 7:37am    
I'm sorry, but I've already checked the documentation out, and I couldn't find the solution. Could you please write the function you're referring to?

Python
# Clone your document
import docx
from copy import deepcopy

doc1 = docx.Document('original_file.docx')

# deepcopy duplicates the whole in-memory document, formatting included
copy_the_content = deepcopy(doc1)

copy_the_content.save('new_file.docx')

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900