How to Extract Text from a Word Document?

Question

Hi Friends,
I am trying to extract specific text from a Word document based on coordinates. I have searched many sites for this requirement, but with no luck. How can I use coordinates to locate text in a Word document?

If anyone knows how to do this, could you share the answer?

Regards
Nanda Kishore.CH

Posted 23-Mar-15 1:59am

nandkishorre

Comments

Richard MacCutchan 23-Mar-15 8:08am

What do you mean by co-ordinates, in reference to a Word document?

nandkishorre 23-Mar-15 8:17am

In a PDF document, we extracted data by coordinates using (llx, lly, urx, ury). Likewise, I want to extract data from a Word document. Is there any way to get the data like this?

Richard MacCutchan 23-Mar-15 8:38am

I have no idea how that is supposed to work, even for a PDF document. Where do these co-ordinates come from, and what do they refer to?

Leo Chapiro 23-Mar-15 8:12am

The only way I know to navigate through a Word document is by using bookmarks. Is that what you mean?

nandkishorre 23-Mar-15 8:17am

In a PDF document, we extracted data by coordinates using (llx, lly, urx, ury). Likewise, I want to extract data from a Word document. Is there any way to get the data like this?

Member 10762095 29-Jul-17 12:45pm

Could you share the source code?

1 solution

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Mario Z · Answer 1 · 2015-03-23T03:08:00

First, regarding extracting specific text from Word documents, you can check out my post: Find Text in Word Documents.
The code provided there converts DOCX files into a string. As for your second requirement (extracting by coordinates), that is essentially impossible.
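
The code from the linked post is not reproduced here. As a rough sketch of the same idea (DOCX to string, then search), the following uses only the Python standard library and relies on the fact that a DOCX file is a ZIP archive whose main body lives in word/document.xml. The function names docx_to_text and find_paragraphs are hypothetical, chosen for illustration, and the sketch reads only the main body (not headers, footers, or footnotes).

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by elements in word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_to_text(path_or_file):
    """Return the body text of a DOCX file, one line per paragraph.

    Hypothetical helper: accepts a file path or a binary file-like object.
    """
    with zipfile.ZipFile(path_or_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is usually split across several runs,
        # so join all of its w:t pieces back together.
        pieces = [node.text or "" for node in para.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(pieces))
    return "\n".join(paragraphs)


def find_paragraphs(text, needle):
    """Return (paragraph_index, paragraph_text) pairs that contain needle."""
    return [
        (index, line)
        for index, line in enumerate(text.split("\n"))
        if needle in line
    ]
```

A call such as find_paragraphs(docx_to_text("report.docx"), "Invoice") would then list the paragraphs that mention "Invoice", identified by paragraph index rather than by any position on a page ("report.docx" is a placeholder file name).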

Word files are flow documents, and their content is not fixed in place the way it is in PDF files. In other words, in a PDF a specific piece of text is defined at specific coordinates, and it will always be rendered in the same location for anyone viewing the file.
In a Word file, however, a piece of text carries no information about where it will be rendered. It could end up on the first page or the fifth; the content itself does not care. The viewing application decides, and different applications can render the same document differently.
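
Because the text has no page coordinates, it can only be addressed by its logical place in the document. One such anchor is the bookmark mentioned in the comments above. The following is a minimal sketch, assuming the region of interest has been bookmarked in Word beforehand; bookmark_text is a hypothetical helper name, and the code reads the w:bookmarkStart and w:bookmarkEnd markers from word/document.xml using only the Python standard library.

```python
import zipfile
import xml.etree.ElementTree as ET

# Clark-notation prefix for the WordprocessingML namespace
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def bookmark_text(path_or_file, name):
    """Return the text enclosed by the bookmark called name, or None.

    Hypothetical helper: accepts a file path or a binary file-like object.
    Returns None if the bookmark is absent or never closed.
    """
    with zipfile.ZipFile(path_or_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    active_id = None  # id of the bookmark we are currently inside
    pieces = []
    for node in root.iter():  # iter() walks elements in document order
        if node.tag == W + "bookmarkStart" and node.get(W + "name") == name:
            active_id = node.get(W + "id")
        elif active_id is not None and node.tag == W + "bookmarkEnd":
            # Start and end markers are paired by their w:id attribute.
            if node.get(W + "id") == active_id:
                return "".join(pieces)
        elif active_id is not None and node.tag == W + "t":
            pieces.append(node.text or "")
    return None
```

The bookmark marks a position in the flow of content, so the same call keeps working even when edits or a different viewer move the text to another page.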