Hi
I would like to use a regex to parse a string. I need to get the text between <w:r> and </w:r> in the string shown below. I already know how to get the data between <w:t> tags using the pattern <w:t>.*?</w:t>

The problem is that <w:r> contains other elements (<w:t> and <w:rPr>), and I don't know how to get the whole string between <w:r> and </w:r>.
Can you help me read the data between <w:r> and </w:r>?


Thank you

XML
  <w:p w:rsidR="006C121D" w:rsidRDefault="00A462A4">
  <w:pPr>
  <w:rPr>
  <w:lang w:val="en-US" />
  </w:rPr>
  </w:pPr>
  <w:bookmarkStart w:id="0" w:name="_GoBack" />
  <w:bookmarkEnd w:id="0" />
  <w:r>
  <w:rPr>
  <w:lang w:val="en-US" />
  </w:rPr>
  <w:t>TOPIC</w:t>
  </w:r>
  </w:p>


[edit]Code block added - OriginalGriff[/edit]
Updated 2-Nov-14 0:55am
Comments
Afzaal Ahmad Zeeshan 2-Nov-14 5:57am    
What do you mean by "between and"? Can you use "" to quote the characters? A better option is to use <code> tags to show the code in your post.
OriginalGriff 2-Nov-14 6:01am    
Even with tags encoded, your sample doesn't contain the data you are looking for - so we can't really tell what your problem is.
Perhaps if you show us the raw input data, the regex you use, and the results it gives, together with the result you want we might be able to help?

Use the "Improve question" widget to edit your question and provide better information.
smoula99 2-Nov-14 6:30am    
Hi
How do I get the data between <w:r> and </w:r> using a regex?

Thank you
BillWoodruff 2-Nov-14 9:22am    
0. would a solution that used plain old C# programming, or Linq, rather than a RegEx be okay?

1. exactly what do you want: a List<string> where each element is the content between <w:r> and </w:r> ?

2. is it possible you will have nested <w:r> tags? if so, does that matter?

3. is it possible you'll have the opening <w:r> tag, but no matching closing </w:r> tag?
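
To make the "plain old C#" option in question 0 concrete, here is a minimal sketch that returns a List<string> without any regex. The class and method names (PlainRunScanner, GetRuns) are made up for illustration. It assumes the opening tag is exactly <w:r> with no attributes, does not handle nested <w:r> tags, and simply stops when an opening tag has no matching close.

```csharp
using System;
using System.Collections.Generic;

public static class PlainRunScanner
{
    // Hypothetical helper: collects the text between each <w:r> and the next </w:r>.
    public static List<string> GetRuns(string input)
    {
        const string open = "<w:r>";
        const string close = "</w:r>";
        var runs = new List<string>();
        int pos = 0;

        while (true)
        {
            int start = input.IndexOf(open, pos, StringComparison.Ordinal);
            if (start < 0) break;            // no more opening tags
            start += open.Length;            // skip past the opening tag itself

            int end = input.IndexOf(close, start, StringComparison.Ordinal);
            if (end < 0) break;              // opening tag without a matching close: stop

            runs.Add(input.Substring(start, end - start));
            pos = end + close.Length;        // continue scanning after the closing tag
        }
        return runs;
    }

    public static void Main()
    {
        string sample = "<w:r>\n<w:rPr>\n</w:rPr>\n<w:t>TOPIC</w:t>\n</w:r>";
        foreach (string run in GetRuns(sample))
        {
            Console.WriteLine(run.Trim());
        }
    }
}
```

Because the search string includes the closing '>', tags such as <w:rPr> are not mistaken for <w:r>.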

First of all, the example you have given is not unstructured text but formatted xml.

I would advise loading the XML document and then reading the elements or attributes, whichever is applicable.

Google can give you a head start. So try it out, and if you still have issues or questions, come back and someone here will be willing to help!
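
As a rough sketch of what "load the XML document and read the elements" could look like with LINQ to XML: the class and method names (RunReader, GetRunTexts) are hypothetical, and the code assumes the w prefix is declared with the WordprocessingML namespace, as it is in a real DOCX document part. The fragment in the question omits that declaration, so the sample below adds it on the <w:p> element.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class RunReader
{
    // Namespace that the "w" prefix is bound to in WordprocessingML.
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical helper: returns the concatenated <w:t> text of every <w:r> element.
    public static List<string> GetRunTexts(string xml)
    {
        XDocument doc = XDocument.Parse(xml);
        return doc.Descendants(W + "r")
                  .Select(run => string.Concat(run.Elements(W + "t").Select(t => t.Value)))
                  .ToList();
    }

    public static void Main()
    {
        string xml =
            "<w:p xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\">" +
            "<w:pPr><w:rPr><w:lang w:val=\"en-US\" /></w:rPr></w:pPr>" +
            "<w:r><w:rPr><w:lang w:val=\"en-US\" /></w:rPr><w:t>TOPIC</w:t></w:r>" +
            "</w:p>";

        foreach (string text in GetRunTexts(xml))
        {
            Console.WriteLine(text);   // prints: TOPIC
        }
    }
}
```

Querying by element name means <w:rPr> is never confused with <w:r>, and attributes on <w:r> do not matter.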
 
Comments
George Jonsson 2-Nov-14 6:14am    
I second that.
smoula99 2-Nov-14 7:39am    
Hi
Yes, this text is XML (from a DOCX file). But I can't use XDocument or another XML class; I need to work with this as text (not XML). Because of a restriction I am working under, I must use only Regex.

Best regards
How do I get the data between <w:r> and </w:r> using a regex?

Personally, I'd go with Manas' solution and process the file properly as an XML document. The results you will get are likely to be much, much better, and a lot more maintainable.

But...
(?<=<w:r>).*?(?=</w:r>)
Should do it.
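
For reference, here is a minimal sketch of how this pattern might be applied in C#. The class and method names (RunExtractor, GetRuns) are made up for illustration. One assumption worth stating: the run in the question spans several lines, and by default '.' does not match a newline in .NET, so the sketch passes RegexOptions.Singleline. It also assumes the opening tag is exactly <w:r> with no attributes.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RunExtractor
{
    // The lookbehind and lookahead keep the <w:r> and </w:r> tags out of the match.
    // RegexOptions.Singleline lets '.' match newlines as well, which is needed
    // when a run is spread over several lines.
    private static readonly Regex RunPattern =
        new Regex(@"(?<=<w:r>).*?(?=</w:r>)", RegexOptions.Singleline);

    // Hypothetical helper: returns the raw text found inside each <w:r>...</w:r> pair.
    public static List<string> GetRuns(string input)
    {
        var runs = new List<string>();
        foreach (Match m in RunPattern.Matches(input))
        {
            runs.Add(m.Value);
        }
        return runs;
    }

    public static void Main()
    {
        string sample =
            "<w:p>\n<w:r>\n<w:rPr>\n<w:lang w:val=\"en-US\" />\n</w:rPr>\n" +
            "<w:t>TOPIC</w:t>\n</w:r>\n</w:p>";

        foreach (string run in GetRuns(sample))
        {
            Console.WriteLine(run.Trim());
        }
    }
}
```

The literal '>' in both lookarounds is what stops <w:rPr> and </w:rPr> from being treated as run boundaries.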

If you are going to play with regexes, then get a copy of Expresso - it's free, and it examines and generates regular expressions. I use it a lot, and wish I'd written it!

[edit]Where the heck did that come from?[/edit]
 
Comments
smoula99 2-Nov-14 7:14am    
When I use your pattern, the match result is NULL.

What am I doing wrong?
The result can be checked on this site (https://www.myregextester.com/index.php)

Raw Match Pattern:
(?<=<w:r>).*?(?=</w:r>)
Matches Found:
NO MATCHES.
OriginalGriff 2-Nov-14 7:29am    
NULL is not likely - it's going to depend on exactly what you are doing.
Show the code that uses the Regex!
smoula99 2-Nov-14 7:31am    
When you select the "C#.NET" option on this site (https://www.myregextester.com/index.php), it generates source code in C#.
OriginalGriff 2-Nov-14 7:45am    
No, I don't go to random websites! :laugh:
And I need to see the exact code you used to test the regex, not what a different system's regex processor thinks will happen.
smoula99 2-Nov-14 7:48am    
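String sourcestring = "source string to match with pattern";
// Note: without RegexOptions.Singleline, '.' does not match newlines,
// so this pattern finds nothing when a run spans several lines.
Regex re = new Regex(@"(?<=<w:r>).*?(?=</w:r>)");
MatchCollection mc = re.Matches(sourcestring);
int mIdx = 0;
foreach (Match m in mc)
{
    // Print every group of every match, labelled with its group name.
    for (int gIdx = 0; gIdx < m.Groups.Count; gIdx++)
    {
        Console.WriteLine("[{0}][{1}] = {2}", mIdx, re.GetGroupNames()[gIdx], m.Groups[gIdx].Value);
    }
    mIdx++;
}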

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


