I have this document: https://mega.nz/file/EgIXQCCS#V3SjNh9-H32MCjMqQvRIGjXfj6qxlrPmLdwdIQcAxWQ

I want to read every paragraph in the document. I iterate over the paragraphs in a loop, and when the loop reaches pages 2552-2620 and 2695-2954 it slows down dramatically; reading all the paragraphs in the file can take days.

I have worked out how to retrieve the paragraphs of a specific page, so that I could skip that many paragraphs, but the paragraphs found on the page do not match the paragraphs seen by the loop over the document's paragraphs.

The content of those pages does not interest me because it consists of tables. If possible, I would like to skip those pages. Is there any solution?

C#
// Requires: using Microsoft.Office.Interop.Word;
Application application = new Application();
Document document = application.Documents.Open("C:\\word.doc");

foreach (Paragraph MyParagraph in document.Paragraphs)
{
    // Page on which the paragraph ends; this is the query that becomes very slow.
    int Page = (int)MyParagraph.Range.Information[WdInformation.wdActiveEndPageNumber];

    if ((Page >= 2552 && Page <= 2620) || (Page >= 2695 && Page <= 2954))
    {
        // Pages that contain only tables: nothing to read here.
        continue;
    }
}
application.Quit();


As soon as the foreach loop reaches paragraphs on those pages, it practically stops, and there are hundreds of such pages.

What I have tried:

I have searched everywhere, including on the internet.
Updated 13-Nov-21 4:42am

1 solution

You could try to restrict to the pages that you're interested in before enumerating them:
C#
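// Requires: using System.Linq;
var flag = Microsoft.Office.Interop.Word.WdInformation.wdActiveEndPageNumber;

// Paragraphs only implements the non-generic IEnumerable, so Cast<T>() is needed before Where().
var desiredParagraphs = document.Paragraphs
   .Cast<Microsoft.Office.Interop.Word.Paragraph>()
   .Where(p => {
      int endPage = (int)p.Range.Information[flag];
      return (endPage < 2552) || ((endPage > 2620) && (endPage < 2695)) || (endPage > 2954);
   });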

foreach (Microsoft.Office.Interop.Word.Paragraph MyParagraph in desiredParagraphs)
{
   // ...
}

Hopefully that will save some enumerations, maybe enough to get rid of the performance issue.
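Since the filter above still asks Word for the page number of every paragraph, another option might be to look up where the unwanted pages begin and end, and only enumerate the paragraphs in the ranges between them. The following is a rough sketch under several assumptions, not a tested solution: it has not been run against the linked document, the page numbers are the ones from the question, the helper names `StartOfPage` and `ParagraphsOutsideTablePages` are made up for illustration, and it assumes a reference to Microsoft.Office.Interop.Word and C# 7 or later (for the tuples). Whether enumerating `Paragraphs` on a sub-range is fast enough would need to be measured.

```csharp
using System.Collections.Generic;
using Word = Microsoft.Office.Interop.Word;

static class PageRangeReader
{
    // Hypothetical helper: character offset at which the given page begins.
    static int StartOfPage(Word.Document doc, int page)
    {
        Word.Range target = doc.GoTo(
            What: Word.WdGoToItem.wdGoToPage,
            Which: Word.WdGoToDirection.wdGoToAbsolute,
            Count: page);
        return target.Start;
    }

    // Hypothetical helper: yields the paragraphs that lie outside
    // pages 2552-2620 and 2695-2954 (the table pages from the question).
    public static IEnumerable<Word.Paragraph> ParagraphsOutsideTablePages(Word.Document doc)
    {
        var keep = new[]
        {
            (Start: 0, End: StartOfPage(doc, 2552)),
            (Start: StartOfPage(doc, 2621), End: StartOfPage(doc, 2695)),
            (Start: StartOfPage(doc, 2955), End: doc.Content.End),
        };

        foreach (var span in keep)
        {
            // Skip empty or inverted spans, e.g. if the document is shorter than expected.
            if (span.End <= span.Start) continue;

            Word.Range range = doc.Range(Start: span.Start, End: span.End);
            foreach (Word.Paragraph paragraph in range.Paragraphs)
                yield return paragraph;
        }
    }
}
```

It could then be used as `foreach (var p in PageRangeReader.ParagraphsOutsideTablePages(document)) { ... }`. The idea is that only six page lookups are made in total, instead of one `Information` query per paragraph, and the table pages are never enumerated at all.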
 
Comments
Member 14890678 13-Nov-21 10:40am    
In Visual Studio I get an error: Where is not supported. CS1061: Paragraphs does not contain a definition for Where ...
Also, my goal is to save time by not querying the page of each paragraph, and the lambda still queries it, so nothing is gained.
phil.o 13-Nov-21 10:47am    
You should add a using directive for the System.Linq namespace to get rid of the nasty CS1061. Then you should give it a try and test it anyway, because LINQ may not work the way you expect :) The lambda gives the iterator a means to skip those items which do not satisfy the predicate.
Member 14890678 13-Nov-21 22:45pm    
I have added using System.Linq and I get the same error.
phil.o 14-Nov-21 3:48am    
I strongly doubt that you get a CS1061 saying there is no Where method on Paragraphs if System.Linq is imported. The Paragraphs property implements IEnumerable.
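One possible explanation for the error persisting: the interop `Paragraphs` collection implements the non-generic `IEnumerable`, whereas `Where` is an extension method on `IEnumerable<T>`, so importing `System.Linq` alone does not make it available; `Cast<T>()` is the usual bridge. The sketch below illustrates this without Word, using a made-up stand-in class (`FakeParagraphs`, holding page numbers instead of paragraphs). It also counts predicate calls, to show that `Where` still evaluates the predicate once per element.

```csharp
using System.Collections;
using System.Collections.Generic;
using System.Linq;

static class CastDemo
{
    // Stand-in for a COM collection: exposes only the non-generic IEnumerable.
    sealed class FakeParagraphs : IEnumerable
    {
        private readonly int[] pages;
        public FakeParagraphs(params int[] pages) { this.pages = pages; }
        public IEnumerator GetEnumerator() => pages.GetEnumerator();
    }

    // Number of times the predicate ran during the last call to PagesOutside.
    public static int PredicateCalls;

    // Returns the page numbers that fall outside [first, last].
    public static List<int> PagesOutside(int first, int last, params int[] pages)
    {
        PredicateCalls = 0;
        IEnumerable source = new FakeParagraphs(pages);

        // source.Where(...) would not compile here (CS1061), with or without System.Linq.
        return source.Cast<int>()
                     .Where(p => { PredicateCalls++; return p < first || p > last; })
                     .ToList();
    }
}
```

For example, `CastDemo.PagesOutside(3, 5, 1, 2, 3, 4, 5, 6)` keeps 1, 2 and 6, and `PredicateCalls` ends up at 6: every element is still inspected, which matches the concern that filtering by page does not avoid the per-paragraph page query.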

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


