Can anyone tell me how to extract only the required text from a Word document using C#?

About Word document:

The Word document contains both simple text and audio text, and I want to tell the two apart and extract them from the Word document into Excel.

The storyboard screenshot (linked above) contains simple text and audio text, in two text colors: blue and black. The problem is identifying the black text, because it includes both simple text and audio text, and I want that text in Excel in separate columns, as shown in the screenshot.

I am able to extract the text from the Word document into Excel, but the problem is how to identify simple text and audio text without changing the font style and color.

Any help with this will be highly appreciated.

Thanks in advance.

What I have tried:

I have tried two approaches, but neither is an appropriate solution for me:
1. Add a separate bookmark as an identifier for each piece of text, so that the text can be identified and extracted easily.
2. Change the font style and colour of the text to be extracted.
Both approaches are time-consuming and not a proper solution for me, so if anyone has another idea for identifying the different kinds of text, please suggest it.
Updated 3-Jul-16 17:58pm
Comments
U. G. Leander 30-Jun-16 2:56am    
Have you tried using OpenXML? It's quite a powerful (yet poorly documented) SDK. It might help you out.
Rohit027 30-Jun-16 5:22am    
Thank you for your reply. Yes, I have tried OpenXML as well; I used it when implementing the two approaches above and got the same result. I don't want to use those approaches, so if you have any other idea, please suggest it.

1 solution

You could try saving the Word document as an HTML file and then using jQuery to extract the colored elements. You will have to make two modifications to the generated HTML file, though.

1. Insert jQuery CDN script
<script src="https://code.jquery.com/jquery-3.0.0.min.js" integrity="sha256-JmvOoLtYsmqlsWxa7mDSLMwa6dZ9rrIdtrrVYRnDRH0=" crossorigin="anonymous"></script>


2. Insert custom jQuery code.
JavaScript
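var fin = []; // collects the text of every black span

// Run once the DOM is ready so that all spans exist.
$(function () {
    var texts = $('span');
    $.each(texts, function (i, v) {
        // .css('color') returns the computed color, which browsers report as rgb(r, g, b).
        if ($(v).css('color') === 'rgb(0, 0, 0)') {
            fin.push($(v).text());
            console.log($(v).text());
        }
    });
});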


You can use the 'fin' array as you see fit. I tried it. Does work. :)
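As one possible way to get the collected text into Excel, here is a minimal sketch that splits the text into two CSV columns by color. It is not part of the answer above: the helper names (normalizeColor, csvField, toCsv), the column headings and the { color, text } item shape are all hypothetical. It assumes you have already collected each span's color and text, for example inside the $.each loop, and that colors may arrive as rgb(...), as a hex value or as the keyword black. Note that it separates text only by color; it does not distinguish simple text from audio text when both are black.

```javascript
// Hypothetical helper: reduce a CSS color string to lowercase #rrggbb.
// Handles only rgb(r, g, b), #rgb, #rrggbb and the keyword "black".
function normalizeColor(value) {
    var v = String(value).trim().toLowerCase();
    if (v === 'black') {
        return '#000000';
    }
    var rgb = v.match(/^rgb\(\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\)$/);
    if (rgb) {
        return '#' + rgb.slice(1, 4).map(function (n) {
            // Two-digit hex per channel, e.g. 0 -> "00", 192 -> "c0".
            return ('0' + Number(n).toString(16)).slice(-2);
        }).join('');
    }
    var shortHex = v.match(/^#([0-9a-f])([0-9a-f])([0-9a-f])$/);
    if (shortHex) {
        return '#' + shortHex[1] + shortHex[1] +
            shortHex[2] + shortHex[2] +
            shortHex[3] + shortHex[3];
    }
    return v; // anything else is passed through unchanged
}

// Quote a CSV field and double any embedded quotes.
function csvField(text) {
    return '"' + String(text).replace(/"/g, '""') + '"';
}

// Hypothetical helper: one CSV row per item, with black text in the
// first column and text of any other color in the second.
function toCsv(items) {
    var lines = ['"Black text","Other text"'];
    items.forEach(function (item) {
        var isBlack = normalizeColor(item.color) === '#000000';
        lines.push(isBlack
            ? csvField(item.text) + ',""'
            : '"",' + csvField(item.text));
    });
    return lines.join('\r\n');
}
```

The resulting string can be saved as a .csv file and opened in Excel. To feed it from the loop above, you could push { color: $(v).css('color'), text: $(v).text() } into an array for every span rather than only for the black ones.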
 
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


