I need to extract content controls (mainly check boxes and combo boxes) from RTF files and replace them with their text values.
OpenXML is not an option because the XML part of the Word doc was lost when the content was saved as RTF.

I got as far as the following regex, but it has a problem with the last part.

((\}\{\\field\\fldpriv)([.\s\S]*)FORMCHECKBOX([.\s\S]*)(\{\\fldrslt \}\}))

How does this need to be changed so that it finds the FIRST occurrence of {\fldrslt }} after FORMCHECKBOX? As it stands, it seems to find the last one.
I would welcome any suggestions on how to make this work. I don't want to write code that finds FORMCHECKBOX and then steps back and forth one character at a time to locate the start and end of the section.

Thanking you in advance.
Comments
CBadger 9-Oct-14 4:41am    
I am not sure I understand correctly, so I will post this as a comment instead of a solution :-)

Have you tried to incorporate a non-greedy match in the regex?

1 solution

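I have done some research and believe the problem is indeed that the quantifiers are greedy. A greedy [.\s\S]* first consumes the rest of the input and then backtracks only as far as it has to, so the {\fldrslt }} it finally settles on is the last one in the text, not the first. The fix is to make the matching lazy (non-greedy) by adding ? after each *.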

Try using it like this:

((\}\{\\field\\fldpriv)([.\s\S]*?)FORMCHECKBOX([.\s\S]*?)(\{\\fldrslt \}\}))

An online regex tester is a great way to try this out.
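To see the difference, here is a minimal Python sketch. The question does not say which language is being used, so Python's re module is an assumption. SAMPLE_RTF is a hypothetical, heavily simplified stand-in for real Word RTF, and replace_checkboxes with its "[x]" placeholder is illustrative only; reading the actual checked state of each box is not shown. Two details are worth noting: the pattern's leading \} consumes the closing brace of the group just before the field, so the sketch writes that brace back, and [.\s\S] is the same as [\s\S] (a . inside a character class is a literal dot).

```python
import re

# Hypothetical, heavily simplified RTF: two checkbox form fields separated by
# ordinary text. Real Word RTF carries far more data inside each field.
SAMPLE_RTF = (
    r"{\pard }{\field\fldpriv{\*\fldinst{ FORMCHECKBOX }}{\fldrslt }}"
    r" {\b first }"
    r"{\field\fldpriv{\*\fldinst{ FORMCHECKBOX }}{\fldrslt }}"
    r" second"
)

# The question's pattern: greedy * runs on to the LAST {\fldrslt }}.
GREEDY = r"((\}\{\\field\\fldpriv)([.\s\S]*)FORMCHECKBOX([.\s\S]*)(\{\\fldrslt \}\}))"
# The answer's pattern: lazy *? stops at the FIRST {\fldrslt }} after FORMCHECKBOX.
LAZY = r"((\}\{\\field\\fldpriv)([.\s\S]*?)FORMCHECKBOX([.\s\S]*?)(\{\\fldrslt \}\}))"


def replace_checkboxes(rtf: str, placeholder: str) -> str:
    """Replace every checkbox field with a (hypothetical) placeholder text.

    The leading "\\}" in the pattern swallows the closing brace of the group
    before the field, so it is written back to keep the braces balanced.
    A function is used as the replacement so backslashes in the placeholder
    are not interpreted as group references.
    """
    return re.sub(LAZY, lambda m: "}" + placeholder, rtf)


if __name__ == "__main__":
    greedy_match = re.search(GREEDY, SAMPLE_RTF).group(0)
    lazy_match = re.search(LAZY, SAMPLE_RTF).group(0)
    print(greedy_match.count("FORMCHECKBOX"))  # 2: one match spans both fields
    print(lazy_match.count("FORMCHECKBOX"))    # 1: the match ends at the first field
    print(replace_checkboxes(SAMPLE_RTF, "[x]"))
```

One caveat: laziness only stops the match at the first {\fldrslt }} after FORMCHECKBOX. If a field that is not a checkbox (such as a FORMDROPDOWN drop-down) comes first, the match still starts at that field and the first [.\s\S]*? stretches across it to reach the next FORMCHECKBOX, so documents that mix field types may need a tighter pattern.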

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900