Find and Replace Text in a Word Document

14 Jun 2016 | CPOL | 4 min read
Pure .NET solution for finding and replacing text in Word documents (DOCX file format)
In this article, you will find various approaches to searching and replacing text in a Word document using only the .NET Framework (without any third-party code).

(last updated on 14th June, 2016)

Introduction

Searching a Word document's text and replacing it from a .NET application is a rather common task. This article mentions various approaches we can use and shows how to search and replace a Word document's text using only the .NET Framework (without any third-party code). To follow the implementation details, a basic knowledge of WordprocessingML is required.

Details

If we have the option to use Word Automation (which requires having MS Word installed), then we can achieve the find and replace functionality with an API provided by Word Interop, as demonstrated here.

Another way would be to read the whole main part of the DOCX file (document.xml) as a string and perform a find and replace on it, as demonstrated here. This simple approach may be enough in some cases, but it fails when the searched text is not the value of a single XML element. For example, consider the following DOCX file:

Picture 1: Hello World sample document

The document's main part will look something like the following:

XML
<p>
  <r>
    <rPr><color val="FF0000"/></rPr>
    <t>Hello </t>
  </r>
  <r>
    <rPr><color val="0000FF"/></rPr>
    <t>World</t>
  </r>
</p>
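One quick way to see the problem is to compare two searches over the paragraph above. The first is a plain string search over the markup. The second searches the paragraph's concatenated text. The following C# sketch does both. It adds the standard WordprocessingML namespace (w:) that the simplified listing omits. The SplitRunDemo class and its members are hypothetical names used only for illustration.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical demo: why a plain string search over document.xml misses split text.
public static class SplitRunDemo
{
    // Standard WordprocessingML namespace used by document.xml.
    public static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // The "Hello World" paragraph from above, split across two differently colored runs.
    public const string ParagraphXml =
        "<w:p xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\">" +
        "<w:r><w:rPr><w:color w:val=\"FF0000\"/></w:rPr><w:t xml:space=\"preserve\">Hello </w:t></w:r>" +
        "<w:r><w:rPr><w:color w:val=\"0000FF\"/></w:rPr><w:t>World</w:t></w:r>" +
        "</w:p>";

    // Naive approach: treat the XML markup as a plain string.
    public static bool RawXmlContains(string xml, string find) =>
        xml.Contains(find);

    // Paragraph-aware approach: concatenate the values of all <w:t> descendants first.
    public static string ParagraphText(XElement paragraph) =>
        string.Concat(paragraph.Descendants(W + "t").Select(t => t.Value));
}
```

The raw XML search misses the phrase because the closing and opening run markup sits between "Hello " and "World". The concatenated paragraph text does contain it. Finding the text is therefore straightforward. Replacing it still requires mapping the match back to the individual <w:t> elements.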

Another situation, for example, is the following:

XML
<p>
  <r>
    <t>Hello</t>
    <t> </t>
    <t>World</t>
  </r>
</p>

So the text we're looking for may span multiple elements, and we need to take this into account when searching for it.

Implementation

We'll open the Word document and represent it with a FlatDocument object. This object reads the document parts (such as the body, headers, footers, comments, etc.) and stores them as a collection of XDocument objects.

The FlatDocument object also creates a set of FlatTextRange objects that represent searchable parts of the document's text content (a single FlatTextRange can represent a single paragraph, a single hyperlink, etc.). Each FlatTextRange contains FlatText objects with indexed text content: FlatText.StartIndex and FlatText.EndIndex give the location of the FlatText's text inside the FlatTextRange's text.
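To make the indexing idea concrete, here is a minimal sketch that assumes plain strings in place of the XML elements. IndexedPiece and IndexedRangeSketch are hypothetical stand-ins for FlatText and FlatTextRange, not the article's classes. Each piece records where its text starts in the range text. EndIndex is inclusive, matching the searchStartIndex + find.Length - 1 convention used in step 4.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for FlatText: one text fragment and its position in the range text.
public sealed class IndexedPiece
{
    public IndexedPiece(string text, int startIndex)
    {
        Text = text;
        StartIndex = startIndex;
    }

    public string Text { get; }
    public int StartIndex { get; }

    // Inclusive end index ("start + length - 1").
    public int EndIndex => StartIndex + Text.Length - 1;
}

// Hypothetical stand-in for FlatTextRange: fragments laid end to end.
public sealed class IndexedRangeSketch
{
    private readonly List<IndexedPiece> pieces = new List<IndexedPiece>();

    public IReadOnlyList<IndexedPiece> Pieces => pieces;

    // The searchable text of the whole range.
    public string Text => string.Concat(pieces.Select(p => p.Text));

    // Appends a fragment directly after the previous one.
    public void Add(string text)
    {
        int start = pieces.Count == 0 ? 0 : pieces[pieces.Count - 1].EndIndex + 1;
        pieces.Add(new IndexedPiece(text, start));
    }

    // Returns the fragment whose [StartIndex, EndIndex] interval contains the index, or null.
    public IndexedPiece FindPiece(int index) =>
        pieces.FirstOrDefault(p => p.StartIndex <= index && index <= p.EndIndex);
}
```

Suppose a match index comes from string.IndexOf on the range text. FindPiece then returns the fragment that holds it, which is the role FindNode plays in step 4.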

Steps

  1. Open the Word document:

    C#
    public sealed class FlatDocument : IDisposable
    {
        public FlatDocument(string path) :
            this(File.Open(path, FileMode.Open, FileAccess.ReadWrite)) { }
    
        public FlatDocument(Stream stream)
        {
            this.documents = XDocumentCollection.Open(stream);
            this.ranges = new List<FlatTextRange>();
    
            this.CreateFlatTextRanges();
        }
    
        // ...
    }
  2. Iterate through the Run elements of the supported document parts (body, headers, footers, comments, endnotes and footnotes, which are loaded as XDocument objects) and create FlatTextRange and FlatText instances:

    C#
    public sealed class FlatDocument : IDisposable
    {
        private void CreateFlatTextRanges()
        {
            foreach (XDocument document in this.documents)
            {
                FlatTextRange currentRange = null;
                foreach (XElement run in document.Descendants(FlatConstants.RunElementName))
                {
                    if (!run.HasElements)
                        continue;
    
                    FlatText flatText = FlattenRunElement(run);
                    if (flatText == null)
                        continue;
    
                    // If the current Run doesn't belong to the same parent
                    // (like a paragraph, hyperlink, etc.),
                    // create a new FlatTextRange, otherwise use the current one.
                    if (currentRange == null || currentRange.Parent != run.Parent)
                        currentRange = this.CreateFlatTextRange(run.Parent);
                    currentRange.AddFlatText(flatText);
                }
            }
        }
    
        // ...
    }
  3. Flatten the Run elements: split each Run element into multiple sequential Run elements, each with a single content child element. If the original Run starts with a RunProperties element, each new Run is preceded by a copy of it. Then create a FlatText object from the flattened Run element:

    Picture 2: Flatten Run element

    Picture 3: Flat objects

    Picture 4: Flat objects text content
    C#
    public sealed class FlatDocument : IDisposable
    {
        private static FlatText FlattenRunElement(XElement run)
        {
            XElement[] childs = run.Elements().ToArray();
            XElement runProperties = childs[0].Name == FlatConstants.RunPropertiesElementName
                ? childs[0]
                : null;
    
            int childCount = childs.Length;
            int flatChildCount = 1 + (runProperties != null ? 1 : 0);
    
            // Break the current Run into multiple Run elements that have one child,
            // or two children if it has RunProperties element as a first child.
            while (childCount > flatChildCount)
            {
                // Move the last child element from the current Run into the new Run,
                // which is added after the current Run.
                XElement child = childs[childCount - 1];
                run.AddAfterSelf(
                    new XElement(FlatConstants.RunElementName,
                        runProperties != null ? new XElement(runProperties) : null,
                        new XElement(child)));
    
                child.Remove();
                --childCount;
            }
    
            XElement remainingChild = childs[childCount - 1];
            return remainingChild.Name == FlatConstants.TextElementName ?
                new FlatText(remainingChild) : null;
        }
    
        // ...
    }
  4. Perform find and replace over FlatTextRange instances:

    C#
    public sealed class FlatDocument : IDisposable
    {
        public void FindAndReplace(string find, string replace)
        {
            this.FindAndReplace(find, replace, StringComparison.CurrentCulture);
        }
    
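        public void FindAndReplace(string find, string replace, StringComparison comparisonType)
        {
            // An empty search string matches at every position,
            // so the search loop in FlatTextRange would never end.
            if (string.IsNullOrEmpty(find))
                throw new ArgumentException("The search text cannot be empty.", nameof(find));

            this.ranges.ForEach(
                range => range.FindAndReplace(find, replace ?? string.Empty, comparisonType));
        }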
    
        // ...
    }
     
    internal sealed class FlatTextRange
    {
        public void FindAndReplace(string find, string replace, StringComparison comparisonType)
        {
            int searchStartIndex = -1, searchEndIndex = -1, searchPosition = 0;
            while ((searchStartIndex = this.rangeText.ToString()
                .IndexOf(find, searchPosition, comparisonType)) != -1)
            {
                searchEndIndex = searchStartIndex + find.Length - 1;
    
                // Find FlatText that contains the beginning of the searched text.
                LinkedListNode<FlatText> node = this.FindNode(searchStartIndex);
                FlatText flatText = node.Value;
    
                ReplaceText(flatText, searchStartIndex, searchEndIndex, replace);
    
                // Remove next FlatTexts that contain parts of the searched text.
                this.RemoveNodes(node, searchEndIndex);
    
                this.ResetRangeText();
                searchPosition = searchStartIndex + replace.Length;
            }
        }
    
        // ...
    }
  5. Finally, FlatDocument.Dispose will save the XDocument parts and close the Word document.
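The steps above can be condensed into a single self-contained sketch that works on one paragraph. It is a simplified illustration, not the article's FlatDocument implementation, and it rests on three assumptions:

- ParagraphReplacer and its members are hypothetical names.
- Runs are direct children of the paragraph; hyperlinks, fields, and other containers are ignored.
- Only <w:t> elements count as text.

The sketch flattens the runs and searches the concatenated text of the <w:t> elements. It writes the replacement into the element where the match starts, then trims or removes the elements that held the rest of the match.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical, simplified version of steps 2-4 for one paragraph (not the article's code).
// Assumptions: runs are direct children of the paragraph, and only <w:t> counts as text.
public static class ParagraphReplacer
{
    public static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    public static int FindAndReplace(XElement paragraph, string find, string replace,
        StringComparison comparison = StringComparison.Ordinal)
    {
        if (string.IsNullOrEmpty(find))
            throw new ArgumentException("Search text cannot be empty.", nameof(find));
        replace = replace ?? string.Empty;

        // Step 3: give every run at most one content element.
        foreach (XElement run in paragraph.Elements(W + "r").ToList())
            FlattenRun(run);

        int count = 0, position = 0;
        while (true)
        {
            // Steps 2 and 4: collect the text elements and search their concatenated text.
            List<XElement> texts = paragraph.Elements(W + "r").Elements(W + "t").ToList();
            string rangeText = string.Concat(texts.Select(e => e.Value));
            int start = rangeText.IndexOf(find, position, comparison);
            if (start < 0)
                return count;

            int end = start + find.Length; // exclusive end of the match
            int offset = 0;                // start index of the current <w:t> in rangeText
            bool replaced = false;
            foreach (XElement t in texts)
            {
                string value = t.Value;
                int tStart = offset, tEnd = offset + value.Length;
                offset = tEnd;
                if (tEnd <= start || tStart >= end)
                    continue; // this element does not overlap the match

                string tail = end < tEnd ? value.Substring(end - tStart) : string.Empty;
                if (!replaced)
                {
                    // The element holding the start of the match receives the replacement,
                    // so the new text keeps that run's formatting.
                    SetText(t, value.Substring(0, start - tStart) + replace + tail);
                    replaced = true;
                }
                else if (tail.Length == 0)
                    t.Parent.Remove(); // fully consumed: drop the whole flattened run
                else
                    SetText(t, tail);  // partially consumed: keep the unmatched tail
            }

            count++;
            position = start + replace.Length; // continue after the inserted text
        }
    }

    // Moves every content element after the first into its own run, copying <w:rPr>.
    private static void FlattenRun(XElement run)
    {
        XElement props = run.Element(W + "rPr");
        List<XElement> content = run.Elements().Where(e => e.Name != W + "rPr").ToList();
        for (int i = content.Count - 1; i >= 1; i--)
        {
            content[i].Remove();
            run.AddAfterSelf(new XElement(W + "r",
                props != null ? new XElement(props) : null, content[i]));
        }
    }

    private static void SetText(XElement t, string value)
    {
        t.Value = value;
        // Word may ignore leading/trailing spaces in <w:t> unless xml:space="preserve" is set.
        t.SetAttributeValue(XNamespace.Xml + "space", "preserve");
    }
}
```

Consider the first sample paragraph from the Details section. Replacing "Hello World" there leaves a single red run with the new text. In the single-run example, the three <w:t> elements become three flattened runs before the match is rewritten. Ordinal comparison is the default in this sketch because culture-sensitive matches can differ in length from the search string, which would break the find.Length arithmetic.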

Usage

The following sample code demonstrates how to use FlatDocument:

C#
class Program
{
    static void Main(string[] args)
    {
        // Open the Word file.
        using (var flatDocument = new FlatDocument("Sample.docx"))
        {
            // Search and replace the document's text content.
            flatDocument.FindAndReplace("Hello World", "New Value 1");
            flatDocument.FindAndReplace("Foo Bar", "New Value 2");
            // ...

            // Save the Word file on Dispose.
        }
    }
}
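The article doesn't show how XDocumentCollection reads and writes the document parts in steps 1 and 5. Here is a rough sketch of what that involves for the main part only. It assumes System.IO.Compression, which is available in .NET Framework 4.5+ and .NET Core. The hypothetical DocxMainPart helper loads word/document.xml into an XDocument and writes it back. A real DOCX locates its main part through the package relationships (_rels/.rels), so the fixed path is an assumption: it is only the conventional location.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Xml.Linq;

// Hypothetical helper: loads and saves the main document part of a DOCX package.
// Assumption: the main part lives at the conventional path word/document.xml.
public static class DocxMainPart
{
    public const string MainPartPath = "word/document.xml";

    public static XDocument Load(ZipArchive package)
    {
        ZipArchiveEntry entry = package.GetEntry(MainPartPath)
            ?? throw new InvalidDataException("The package has no word/document.xml part.");
        using (Stream stream = entry.Open())
            return XDocument.Load(stream);
    }

    public static void Save(ZipArchive package, XDocument document)
    {
        // Recreate the entry instead of overwriting it in place,
        // so no stale bytes from a longer previous version remain.
        package.GetEntry(MainPartPath)?.Delete();
        ZipArchiveEntry entry = package.CreateEntry(MainPartPath);
        using (Stream stream = entry.Open())
            document.Save(stream);
    }
}
```

Open the ZipArchive with ZipArchiveMode.Update on a readable, writable, seekable stream, for example File.Open(path, FileMode.Open, FileAccess.ReadWrite) as in step 1. The package is rewritten when the archive is disposed, which mirrors the save-on-Dispose behavior of FlatDocument.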

Points of Interest

An alternative to the algorithm above is to split each Run element into multiple sequential Run elements that have a single child (as above), but in this case each child element contains only a single character:

XML
<p>
  <r>
    <t>H</t>
  </r>
  <r>
    <t>e</t>
  </r>
  <r>
    <t>l</t>
  </r>
  <r>
    <t>l</t>
  </r>
  <r>
    <t>o</t>
  </r>
  <!--
      ...
  -->
</p>

We would then iterate through those elements, looking for a sequence of matching characters. You can find the details and an implementation of this approach in the following article: Search and Replace Text in an Open XML WordprocessingML Document. The Open XML PowerTools also use this approach (in the TextReplacer class).
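For comparison, here is a rough C# sketch of this per-character approach. It is not the code from the linked article or from TextReplacer. CharacterRunReplacer is a hypothetical name, and the sketch assumes runs are direct children of the paragraph. Runs made up only of <w:rPr> and <w:t> are split into one run per character. Any other run (for example, one holding a <w:tab/>) is left alone and simply breaks a match.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical sketch of the one-character-per-run approach (not the PowerTools code).
public static class CharacterRunReplacer
{
    public static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Replaces each plain text run with one run per character, copying its <w:rPr>.
    public static void SplitIntoCharacterRuns(XElement paragraph)
    {
        foreach (XElement run in paragraph.Elements(W + "r").ToList())
        {
            if (run.Elements().Any(e => e.Name != W + "rPr" && e.Name != W + "t"))
                continue; // runs with tabs, breaks, drawings, ... are not split
            string value = string.Concat(run.Elements(W + "t").Select(t => t.Value));
            if (value.Length == 0)
                continue;
            XElement props = run.Element(W + "rPr");
            run.ReplaceWith(value.Select(c => new XElement(W + "r",
                props != null ? new XElement(props) : null,
                new XElement(W + "t", new XAttribute(XNamespace.Xml + "space", "preserve"),
                    c.ToString()))));
        }
    }

    // Scans consecutive single-character runs for the search text and replaces each match.
    public static int FindAndReplace(XElement paragraph, string find, string replace)
    {
        if (string.IsNullOrEmpty(find))
            throw new ArgumentException("Search text cannot be empty.", nameof(find));

        SplitIntoCharacterRuns(paragraph);
        List<XElement> runs = paragraph.Elements(W + "r").ToList();
        int count = 0;
        for (int i = 0; i + find.Length <= runs.Count; i++)
        {
            if (!Enumerable.Range(0, find.Length).All(k => CharOf(runs[i + k]) == find[k]))
                continue;
            // The first matched run receives the whole replacement; the others are removed.
            runs[i].Element(W + "t").Value = replace ?? string.Empty;
            for (int k = 1; k < find.Length; k++)
                runs[i + k].Remove();
            count++;
            i += find.Length - 1; // resume scanning after the match
        }
        return count;
    }

    // Returns the run's single character, or null if it is not a one-character text run.
    private static char? CharOf(XElement run)
    {
        List<XElement> content = run.Elements().Where(e => e.Name != W + "rPr").ToList();
        return content.Count == 1 && content[0].Name == W + "t" && content[0].Value.Length == 1
            ? content[0].Value[0]
            : (char?)null;
    }
}
```

Because every character becomes its own run, a match is just a sequence of consecutive runs whose characters equal the search text. The trade-off is a heavily fragmented paragraph, so a practical implementation would probably merge adjacent runs with identical formatting afterwards.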

However, the problem with both of these algorithms is that they do not work on content that spans multiple paragraphs. In that case, we would need to flatten the entire content of the Word document to find the required text. GemBox.Document is a .NET component for processing Word files that represents a document with a content model hierarchy, which can be accessed as flat content through the ContentRange class. With it, we can search for content that spans multiple paragraphs. For details, see the following article: Find and Replace in Word with C# or VB.NET.
With this approach, we can find arbitrary content and replace it with any desired content (including tables, pictures, paragraphs, HTML-formatted text, RTF-formatted text, etc.).

Improvements

  • Currently, the replacement text takes the formatting used at the beginning of the found text. We could provide a FindAndReplace overload that accepts the desired formatting (for example, FlatDocument.FindAndReplace(string find, string replace, TextFormat format)). When formatting is provided, we would need to create a new RunProperties element based on it.
  • Currently, special characters (like tabs, line breaks, non-breaking hyphens, etc.) in the search and replace texts are not taken into account. To support them, FlatText should be aware of the different element types that FlatText.textElement can be (like <tab/>, <br/>, <noBreakHyphen/>, etc.) and return the appropriate FlatText.Text value for each.
  • Please feel free to post other suggestions for improvements in the comments!
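Here is a minimal sketch of the special-character idea from the second bullet. The hypothetical SpecialCharacters helpers map <w:tab/>, <w:br/>, and <w:noBreakHyphen/> to '\t', '\n', and U+2011 (NON-BREAKING HYPHEN) when building the searchable text. They convert those characters back into elements when inserting replacement text. The chosen character mapping is an assumption; for instance, a page break (<w:br w:type="page"/>) is also mapped to '\n' here.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Xml.Linq;

// Hypothetical helpers for the special-characters improvement.
// Assumed mapping: tab -> '\t', br -> '\n', noBreakHyphen -> U+2011.
public static class SpecialCharacters
{
    public static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Text that a single run content element contributes to the searchable range text.
    public static string TextOf(XElement content)
    {
        if (content.Name == W + "t") return content.Value;
        if (content.Name == W + "tab") return "\t";
        if (content.Name == W + "br") return "\n";
        if (content.Name == W + "noBreakHyphen") return "\u2011";
        return string.Empty; // other content (drawings, field codes, ...) is not searchable
    }

    // Converts replacement text back into run content elements.
    public static IEnumerable<XElement> ToRunContent(string text)
    {
        var buffer = new StringBuilder();
        foreach (char c in text)
        {
            string special = c == '\t' ? "tab" : c == '\n' ? "br" : c == '\u2011' ? "noBreakHyphen" : null;
            if (special == null)
            {
                buffer.Append(c);
                continue;
            }
            // Flush pending ordinary characters before emitting the special element.
            if (buffer.Length > 0)
            {
                yield return MakeText(buffer.ToString());
                buffer.Clear();
            }
            yield return new XElement(W + special);
        }
        if (buffer.Length > 0)
            yield return MakeText(buffer.ToString());
    }

    private static XElement MakeText(string value) =>
        new XElement(W + "t", new XAttribute(XNamespace.Xml + "space", "preserve"), value);
}
```

TextOf could back the FlatText.Text value for each flattened Run. ToRunContent could supply the content when the replacement text contains tabs or line breaks, with each resulting element placed in its own flattened Run.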

History

  • 14th June, 2016: Initial version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer GemBox Ltd.
Croatia
I'm a developer at GemBox Software, working on:

  • GemBox.Spreadsheet - Read, write, convert, and print XLSX, XLS, XLSB, CSV, HTML, and ODS spreadsheets from .NET applications.
  • GemBox.Document - Read, write, convert, and print DOCX, DOC, PDF, RTF, HTML, and ODT documents from .NET applications.
  • GemBox.Pdf - Read, write, edit, and print PDF files from .NET applications.
  • GemBox.Presentation - Read, write, convert, and print PPTX, PPT, and PPSX presentations from .NET applications.
  • GemBox.Email - Read, write, and convert MSG, EML, and MHTML email files, or send and receive email messages using POP, IMAP, SMTP, and EWS from .NET applications.
  • GemBox.Imaging - Read, convert, and transform PNG, JPEG, and GIF images from .NET applications.
