
MS Word Document Search Command Line Utility

4 Mar 2021 | CPOL | 3 min read
Console .NET application using Microsoft.Office.Interop.Word for keyword search
This utility searches a series of Word documents using the Microsoft.Office.Interop.Word library. It selects and writes extended paragraphs that satisfy user-specified requirements for keyword presence or absence. In addition, it contains simple exploratory commands that write specified paragraphs and words as defined by the Interop.Word library.

Background

For years, I have kept small random notes in MS Word documents. Each new note or topic is introduced by a line starting with a hyphen, and a new document is created every quarter or half year. This NoteSearch utility provides a convenient way to search multiple Word documents, such as years of my collected notes, for entries with user-specified keywords. In addition, exploratory commands were included to help in understanding how the Microsoft.Office.Interop.Word library divides a document into paragraphs and words.

Introduction

Given the need to search multiple Word documents, the use of the Microsoft.Office.Interop.Word library seemed an obvious choice. The documentation provided by Microsoft was very brief, so the exploratory commands were written first to provide basic familiarity with using the library to open and close the document and examine paragraphs and words. The search commands go further by using the library to search for the presence or absence of specified keywords. This program does not utilize the library's ability to modify documents.

Documentation - Example 1

The initial examples are based on a document containing the U.S. Constitution. The text was downloaded from the website of the National Constitution Center in Philadelphia. In this first snippet, we use the "af" (add file) command to select the usconstitution.docx document, then use the "i" (include) and "x" (exclude) commands to set the search to include the word "defence" and exclude the word "jury". The note search ("nn" command) then examines the document for paragraphs satisfying these rules. The commands and responses are:

af usconstitution.docx
ns
i defence
x jury
nn

Paragraphs 1 - 1 from C:\...\data\usconstitution.docx
We the People of the United States, in Order to form a more perfect Union, establish Justice, insure domestic Tranquility, provide for the common defence, ... do ordain and establish this Constitution for the United States of America.
ready
Press [return]/q for next/quit
[RETURN]


Paragraphs 35 - 35 from C:\...\data\usconstitution.docx
The Congress shall have Power To lay and collect Taxes, ... and provide for the common Defence and general Welfare of the United States; but all Duties, Imposts and Excises shall be uniform throughout the United States;
ready
Press [return]/q for next/quit
[RETURN]


End of notes.

Note that these snippets omit some details. The documentation provides a full listing of all commands for these examples.
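
The include/exclude rule used in this example can be sketched independently of Word. The following Python sketch is not the utility's own code (the utility is a .NET application). It assumes case-insensitive substring matching, which is consistent with "defence" matching "Defence" in paragraph 35, and it assumes that every include keyword must be present. The names paragraph_matches and search_paragraphs and the sample text are hypothetical.

```python
def paragraph_matches(text, include, exclude):
    """Return True when text contains every include keyword and no exclude keyword.

    Assumption: matching is a case-insensitive substring test.
    """
    lowered = text.lower()
    has_all = all(word.lower() in lowered for word in include)
    has_none = not any(word.lower() in lowered for word in exclude)
    return has_all and has_none


def search_paragraphs(paragraphs, include, exclude):
    """Yield (1-based paragraph number, text) for each matching paragraph."""
    for number, text in enumerate(paragraphs, start=1):
        if paragraph_matches(text, include, exclude):
            yield number, text


# Hypothetical stand-in for paragraph text read from a document.
sample = [
    "provide for the common defence",
    "the trial shall be by jury",
    "provide for the common Defence and general Welfare",
]
hits = list(search_paragraphs(sample, include=["defence"], exclude=["jury"]))
print([number for number, _ in hits])  # [1, 3]
```

Keeping the test in a small pure function means the same rule can later be applied to larger units than a single paragraph.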

Documentation - Example 2

In the previous example, the document was not divided into subdivisions, so the search ran paragraph by paragraph. We can also search using larger document divisions, which we call notes. The framers divided the Constitution into Sections, so the word SECTION provides a convenient divider for this document. In this example, we use the ns (note start) command to choose the word SECTION as the marker that begins each division and then perform the same search.

af usconstitution.docx
i defence 
x jury
ns section
nn

Paragraphs 1 - 2 from C:\Projects\NoteSearch\data\usconstitution.docx
We the People of the United States, in Order to form a more perfect Union, establish Justice, insure domestic Tranquility, provide for the common defence, promote the general Welfare, and secure the Blessings of Liberty to ourselves and our Posterity, do ordain and establish this Constitution for the United States of America.
Article 1
ready
Press [return]/q for next/quit


[RETURN]
Paragraphs 34 - 52 from C:\Projects\NoteSearch\data\usconstitution.docx
Section 8: Powers of Congress
The Congress shall have Power To lay and collect Taxes, Duties, Imposts and Excises, to pay the Debts and provide for the common Defence and general Welfare of the United States; but all Duties, Imposts and Excises shall be uniform throughout the United States;
To borrow Money on the credit of the United States;
[listing of enumerated powers]
To make all Laws which shall be necessary and proper for carrying into Execution the foregoing Powers...
ready
Press [return]/q for next/quit


[RETURN]
End of notes.
ready

The difference here is that the search is by section rather than by paragraph, so the entire matching section of the Constitution (in this case, paragraphs 34 through 52) is written.
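
One way to picture the grouping of paragraphs into notes is the Python sketch below. It is an illustration under stated assumptions, not the utility's implementation: a note is assumed to start at any paragraph whose text begins with the marker (case-insensitive), and paragraphs before the first marker are assumed to form a note of their own, as the "Paragraphs 1 - 2" result above suggests. The name split_notes and the sample paragraphs are hypothetical.

```python
def split_notes(paragraphs, note_start):
    """Group paragraphs into notes.

    A new note begins at each paragraph whose text starts with note_start
    (case-insensitive). Returns a list of (first, last, texts) tuples with
    1-based paragraph numbers.
    """
    marker = note_start.lower()
    notes = []
    for number, text in enumerate(paragraphs, start=1):
        starts_note = text.strip().lower().startswith(marker)
        if starts_note or not notes:
            # Open a new note at a marker line, or at the very first paragraph.
            notes.append([number, number, [text]])
        else:
            # Extend the current note with this paragraph.
            notes[-1][1] = number
            notes[-1][2].append(text)
    return [(first, last, texts) for first, last, texts in notes]


# Hypothetical, abbreviated stand-in for a document's paragraphs.
sample = [
    "opening text",
    "Article 1",
    "Section 1",
    "text of the first section",
    "Section 2",
    "text of the second section",
]
ranges = [(first, last) for first, last, _ in split_notes(sample, "section")]
print(ranges)  # [(1, 2), (3, 4), (5, 6)]
```

In this sketch, an empty marker makes every paragraph its own note, which reduces to the paragraph-by-paragraph behaviour of Example 1.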

Documentation - Example 3

My primary goal in writing this program was to locate notes in my journal documents. You can see the structure of these documents in the following excerpt:

Monday, February 9, 2015
- Fell in driveway today. Bumped head on concrete stairs. Ouch!
- I had a car problem with hitting a couple of potholes...
I didn't notice anything with the car but it seemed to have a drag on it when I tried to drive on Sunday...
Finally gave up and called AAA...
I called dealer at 7:30 on Monday and they had the tire in stock ....
I also told them they could align the wheels so the total bill was about $525.
- Installed the cover on the side basement window. I should have cut openings for the phone wires and it would have fit better. We'll see whether this does any good.

The basic format consists of date lines followed by individual notes on random topics. Each note begins with a line starting with a hyphen and can contain multiple paragraphs. In the snippet below, I search two documents with a single search command: two af commands add the Word files, the l command lists them, and the search looks for note entries containing the word "car".

af notes-q1-2015.docx
ready
af notes-q2-2015.docx
ready
l
  0 - C:\...\notes-q1-2015.docx
  1 - C:\...\notes-q2-2015.docx
i car
ready
nn


Monday, February 09, 2015 Paragraphs 20 - 23 from C:\...\data\notes-q1-2015.docx
-  I had a car problem with hitting a couple of potholes...
   I didn't notice anything with the car but it seemed to have a drag on it when I tried to drive on Sunday... 
   Finally gave up and called AAA...
   I called dealer at 7:30 on Monday and they had the tire in stock ....
   I also told them they could align the wheels so the total bill was about $525. 


ready
Press [return]/q for next/quit
[RETURN]


Monday, June 08, 2015 Paragraphs 33 - 35 from C:\Projects\NoteSearch\data\notes-q2-2015.docx

- Last week I was driving car to PM when the oil low light came on.  It said it was okay to continue driving but I could add a quart of oil.
    I drove to the dealer.  ...
[etc]   

The search continues through all of the listed documents and displays every note containing the string "car".
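
The output above labels each note with the date line that precedes it. The Python sketch below shows one way such tracking could work across several documents. It is a guess at the behaviour, not the utility's code: date lines are assumed to follow the "Monday, February 9, 2015" pattern, a date line is assumed to end the current note, and the document names, contents, and function names are hypothetical.

```python
from datetime import datetime

DATE_FORMAT = "%A, %B %d, %Y"  # e.g. "Monday, February 9, 2015"


def parse_date_line(text):
    """Return a datetime if text is a date line, otherwise None."""
    try:
        return datetime.strptime(text.strip(), DATE_FORMAT)
    except ValueError:
        return None


def dated_notes(paragraphs, note_start="-"):
    """Yield (date, first, last, texts) for each note.

    date is the most recent date line above the note (or None);
    first and last are 1-based paragraph numbers.
    """
    current_date = None
    note = None  # [date, first, last, texts] while a note is open
    for number, text in enumerate(paragraphs, start=1):
        parsed = parse_date_line(text)
        if parsed is not None:
            if note:
                yield tuple(note)
            note = None
            current_date = parsed
        elif text.lstrip().startswith(note_start):
            if note:
                yield tuple(note)
            note = [current_date, number, number, [text]]
        elif note:
            note[2] = number
            note[3].append(text)
    if note:
        yield tuple(note)


def search_documents(documents, keyword):
    """Yield (name, date heading, first, last) for notes containing keyword."""
    for name, paragraphs in documents.items():
        for date, first, last, texts in dated_notes(paragraphs):
            if any(keyword.lower() in t.lower() for t in texts):
                heading = date.strftime(DATE_FORMAT) if date else ""
                yield name, heading, first, last


# Hypothetical in-memory stand-ins for two Word documents.
documents = {
    "notes-a.docx": [
        "Monday, February 9, 2015",
        "- Fell in driveway today.",
        "- I had a car problem with hitting a couple of potholes...",
        "Finally gave up and called AAA...",
    ],
    "notes-b.docx": [
        "Monday, June 8, 2015",
        "- Installed the cover on the side basement window.",
    ],
}
results = list(search_documents(documents, "car"))
print(results)
```

Because the match is a plain substring test, a keyword such as "car" would also match words like "care"; whole-word matching would be a separate design choice.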

Conclusion and Points of Interest

This program provides a very simple introduction to programming with the Interop.Word library. It is also an easy-to-use program for searching a series of MS Word documents for keywords. For this application, the ability to examine multiple documents and to write an entire paragraph or section makes it a useful search tool. Adding further commands and search types could turn the program into a more versatile document search tool.

History

  • 4th March, 2021: Initial version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


