I'm trying to get the text "BlaBla" between the HTML tags with C#, but I get an error on the two \s sequences in the regex.
The text can contain any characters.

Regex:
C#
Match m = Regex.Match(file, "<h1 class=\"header\"> <span class=\"itemprop\" itemprop=\"name\">\s*(.+?)\s*</span>");



Example html:
HTML
<h1 class="header"> <span class="itemprop" itemprop="name">Text I_need</span> 
            <span class="nobr">(<a href="/year/2013/?ref_=tt_ov_inf" >2013</a>)</span>


Thanks!

1 solution

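It's down to string processing: \s is what the regex needs, but inside a regular C# string literal the compiler tries to treat "\s" as a string escape sequence. It isn't one the compiler recognizes, hence the error. Replace it with "\\s" so that the regex engine receives \s, and it should be fine: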
C#
Match m = Regex.Match(file, "<h1 class=\"header\"> <span class=\"itemprop\" itemprop=\"name\">\\s*(.+?)\\s*</span>");
(You could instead prefix the string with an '@' character to make it a verbatim string, but then you would have to double up all the quotes and remove the backslashes before them...)
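For comparison, here is a minimal sketch of both spellings side by side. The class name HeaderTextDemo, the helper ExtractName, and the two constant names are made up for illustration; the sample input is just the first line of the example HTML from the question. It only aims to show that the escaped literal and the '@' literal produce the same pattern text.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; the names here are illustrative, not from the question.
public static class HeaderTextDemo
{
    // Regular string literal: quotes are escaped with \" and every
    // backslash meant for the regex engine is doubled.
    public const string EscapedPattern =
        "<h1 class=\"header\"> <span class=\"itemprop\" itemprop=\"name\">\\s*(.+?)\\s*</span>";

    // Verbatim string literal: backslashes are taken literally,
    // and each quote is written as "" instead.
    public const string VerbatimPattern =
        @"<h1 class=""header""> <span class=""itemprop"" itemprop=""name"">\s*(.+?)\s*</span>";

    // Returns the text captured by group 1, or null when nothing matches.
    public static string ExtractName(string html, string pattern)
    {
        Match m = Regex.Match(html, pattern);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        // First line of the example HTML from the question.
        string file =
            "<h1 class=\"header\"> <span class=\"itemprop\" itemprop=\"name\">Text I_need</span>";

        // Both literals compile to the same pattern text.
        Console.WriteLine(EscapedPattern == VerbatimPattern);   // True
        Console.WriteLine(ExtractName(file, EscapedPattern));   // Text I_need
    }
}
```

Which form to use is mostly a matter of taste: the verbatim form keeps the regex escapes readable, while the regular form keeps the HTML attribute quotes closer to how they appear in the page.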
 
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


