I have been doing a little research on regular expression support in the .NET Framework and have been impressed with my findings. For a developer, regular expressions are a very useful tool to keep in the arsenal. They can help programs search text efficiently, and they can be used to validate input (think phone numbers or email addresses, among other things).
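To make the validation use case concrete, here is a minimal sketch of checking a phone number with Regex.IsMatch. The InputValidator class, the IsValidPhone method, and the NNN-NNN-NNNN format are assumptions chosen for illustration; real phone number rules are usually looser and vary by country.

```csharp
using System;
using System.Text.RegularExpressions;

public static class InputValidator
{
    // Hypothetical format: NNN-NNN-NNNN. The ^ and $ anchors force the
    // pattern to cover the whole input rather than any substring of it.
    private static readonly Regex PhonePattern =
        new Regex(@"^[0-9]{3}-[0-9]{3}-[0-9]{4}$");

    public static bool IsValidPhone(string input)
    {
        // Regex.IsMatch throws on null, so reject null up front.
        return input != null && PhonePattern.IsMatch(input);
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(InputValidator.IsValidPhone("555-867-5309"));          // True
        Console.WriteLine(InputValidator.IsValidPhone("call 555-867-5309 now")); // False
    }
}
```

Without the anchors, the second call would also succeed, because IsMatch only asks whether the pattern occurs somewhere in the input.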
Two great sources of information for expression syntax are regular-expressions.info and MSDN. These two sources help make sense of the metacharacters, escapes, anchors, groups, and other artifacts of a well-formed expression.
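As a rough illustration of how those pieces fit together, the sketch below uses anchors, a character-class escape, an escaped metacharacter, and named groups in one pattern. The VersionParser class and the "v2.10" style version format are made up for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class VersionParser
{
    // ^ and $       anchors: match at the start and end of the input
    // \d+           character-class escape with a quantifier: one or more digits
    // \.            escaped metacharacter: a literal dot instead of "any character"
    // (?<major>...) named group: captures part of the match for later lookup
    private static readonly Regex VersionPattern =
        new Regex(@"^v(?<major>\d+)\.(?<minor>\d+)$");

    public static bool TryParse(string text, out string major, out string minor)
    {
        Match match = VersionPattern.Match(text);
        // On a failed match, the groups exist but their Value is an empty string.
        major = match.Groups["major"].Value;
        minor = match.Groups["minor"].Value;
        return match.Success;
    }
}

public static class Program
{
    public static void Main()
    {
        string major, minor;
        if (VersionParser.TryParse("v2.10", out major, out minor))
        {
            Console.WriteLine("major=" + major + ", minor=" + minor); // major=2, minor=10
        }
    }
}
```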
The syntax for searching a string using the .NET System.Text.RegularExpressions.Regex class is very straightforward, and the class seems very efficient when compared with other string search methods. If you drop the following into a console application, you can compare the time needed to find the string “simple text” in the string represented by the “ipsum” variable. In this example, I generated the ipsum variable using the itools Lorem Ipsum generator. I generated a single string 250,000 words long.
    // Requires: using System; using System.Diagnostics; using System.Text.RegularExpressions;

    // Assign the generated Lorem Ipsum text (a single 250,000-word string) here.
    var ipsum =
    var stopwatch = new Stopwatch();

    // 1. Interpreted (non-compiled) regex
    var regexSearch = new Regex("simple text", RegexOptions.None);
    stopwatch.Restart();
    for (int i = 0; i < 1000; i++)
    {
        regexSearch.IsMatch(ipsum);
    }
    stopwatch.Stop();
    Console.WriteLine("Elapsed time with Non-compiled Regex: " + stopwatch.ElapsedMilliseconds);

    // 2. Compiled regex
    var compiledRegexSearch = new Regex("simple text", RegexOptions.Compiled);
    stopwatch.Restart();
    for (int i = 0; i < 1000; i++)
    {
        compiledRegexSearch.IsMatch(ipsum);
    }
    stopwatch.Stop();
    Console.WriteLine("Elapsed time with Compiled Regex: " + stopwatch.ElapsedMilliseconds);

    // 3. Plain string.Contains
    stopwatch.Restart();
    for (int i = 0; i < 1000; i++)
    {
        ipsum.Contains("simple text");
    }
    stopwatch.Stop();
    Console.WriteLine("Elapsed time with Contains: " + stopwatch.ElapsedMilliseconds);

    // Keep the console window open until Enter is pressed.
    Console.ReadLine();
The run times were pretty telling. When searching a string with 250,000 words of random length, the non-compiled regex search was much faster.
Writing regular expressions can be complicated, but there are some really useful tools to help generate them. The de facto tool right now seems to be RegexBuddy (for a $40 license fee). Also, Roy Osherove has several regular expression tools available at his website. Finally, you can search for community-submitted expressions at regexlib.com, where there are many, many examples from which to choose.
Your mileage (and mine) may vary from project to project, but using the Regex class to dig through large amounts of text is definitely worth considering.
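Since mileage varies, it may be worth measuring on your own data before committing to one approach. Below is a minimal sketch of a reusable timing helper. The SearchTimer class, the untimed warm-up call, and the stand-in text are my own assumptions and are not part of the test described earlier.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

public static class SearchTimer
{
    // Runs one untimed warm-up call, then times `iterations` calls of `search`.
    // The warm-up keeps one-time costs (such as JIT compilation) out of the result.
    public static TimeSpan Time(Func<bool> search, int iterations)
    {
        if (search == null) throw new ArgumentNullException("search");

        search(); // warm-up, not measured

        Stopwatch stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            search();
        }
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}

public static class Program
{
    public static void Main()
    {
        // Stand-in text for illustration only; substitute your own data.
        string text = new string('a', 100000) + " simple text";
        Regex regex = new Regex("simple text", RegexOptions.None);

        TimeSpan regexTime = SearchTimer.Time(() => regex.IsMatch(text), 100);
        TimeSpan containsTime = SearchTimer.Time(() => text.Contains("simple text"), 100);

        Console.WriteLine("Regex:    " + regexTime.TotalMilliseconds + " ms");
        Console.WriteLine("Contains: " + containsTime.TotalMilliseconds + " ms");
    }
}
```

Passing the search as a Func<bool> keeps the loop identical for every candidate, so any difference in the numbers comes from the search itself rather than from the harness.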