Click here to Skip to main content
15,867,568 members
Articles / Programming Languages / C++/CLI
Article

xp_regex: Regular Expressions in SQL Server 2000

Rate me:
Please Sign up or sign in to vote.
4.95/5 (33 votes)
6 Oct 20035 min read 463.8K   2.6K   86   94
This is a SQL Server 2000 Extended Stored Procedure writted in Managed C++. It allows you to use regular expressions in T-SQL.

Introduction

Regular Expressions are probably the best tool for text parsing. Using Regular Expressions GREATLY simplifies anything for which you would normally use CHARINDEX, PATINDEX, SUBSTRING, STUFF, REPLACE, etc.

xp_regex is an Extended Stored Procedure for SQL Server 2000 written in Managed C++ that lets you use Regular Expressions from T-SQL. In case you’re skeptical about the performance when mixing non-.NET code (in this case SQL Server) and .NET code in the same process, don't be. I've written a non-.NET version and the performance is about the same (although I really haven't done any exhaustive testing). If you have other problems that are preventing you from using the .NET Framework on your SQL Server, you can get the non-.NET version, <A href="http://www.codeproject.com/database/xp_pcre.asp">xp_pcre</A>, which is also posted on Code Project.

Note to users of .NET 1.0: This DLL will not work on SQL Servers that are running with the /3GB switch in the boot.ini file. You'll probably get an error like:

Cannot load the DLL xp_regex.dll, or one of the DLLs it references.
Reason: 1114(A dynamic link library (DLL) initialization routine failed.).

This limitation has been removed in .NET 1.1.

Also note: There are some issues associated with running managed code in-process with SQL Server. Check out Clemens Vasters' blog entry here. Also, check out this MSKB article which discusses extended stored procedures written using .NET.

I can tell you that I have been running this on several servers since I wrote it and have not had any problems. The servers I run it on are used more for reporting than OLTP-style transactions. I'm not pushing either the CPU or RAM to the limit and am not using fibers. This may account for the fact that I have not encountered any issues.

Overview

There are four Extended Stored Procedures in the DLL:

  • xp_regex_match
  • xp_regex_format
  • xp_regex_split
  • xp_regex_replace

XP_REGEX_MATCH

Syntax:

xp_regex_match @input, @regex, @result OUTPUT

@input is the text to match against.
@regex is the regular expression to match with.
@result is an output parameter that will hold either 'Matched', 'Failed' or NULL in the case of an invalid regex.

All parameters are either (N)VARCHAR or (N)CHAR of any ength. ((N)TEXT might work too, haven't tried it.)

xp_regex_match is used to check an input string against a regular expression. It will return either 'Matched' or 'Failed'. If the regular expression was not valid, it will return NULL.

XP_REGEX_FORMAT

Syntax:

xp_regex_format @input, @regex, @format_string, @result OUTPUT

@input is the text to parse.
@regex is the regular expression to match.
@format_string is used to format the results.
@result is an output parameter that will hold the formatted results.

xp_regex_format is used to parse an input string and format the results. Probably the best example to demonstrate is by doing some telephone number parsing. I’ll assume you know the Regular Expression syntax for this documentation.

The regex [^\d]*(\d{3})[^\d]*(\d{3})[^\d]*(\d{4}) will parse just about any phone-number-like string you throw at it. For instance, this code:

SQL
DECLARE @out VARCHAR(50)

EXEC xp_regex_format '(310)555-1212', '[^\d]*(\d{3})[^\d]*(\d{3})[^\d]*(\d{4})',
    '($1) $2-$3', @out OUTPUT
PRINT @out

EXEC xp_regex_format '310.555.1212', '[^\d]*(\d{3})[^\d]*(\d{3})[^\d]*(\d{4})',
    '($1) $2-$3', @out OUTPUT
PRINT @out

EXEC xp_regex_format ' 310!555 hey! 1212',
    '[^\d]*(\d{3})[^\d]*(\d{3})[^\d]*(\d{4})',
    '($1) $2-$3', @out OUTPUT
PRINT @out

EXEC xp_regex_format ' hello, ( 310 ) 555.1212 is my phone number. Thank you.',
    '[^\d]*(\d{3})[^\d]*(\d{3})[^\d]*(\d{4})', '($1) $2-$3', @out OUTPUT
PRINT @out

prints out:

(310) 555-1212
(310) 555-1212
(310) 555-1212
(310) 555-1212

XP_REGEX_SPLIT

Syntax:

xp_regex_split @input, @regex, @column_number, @result OUTPUT

@input is the text to parse.
@regex is a regular expression that matches the delimiter.
@column_number: We’re basically doing a "text-to-columns" here, so @column_number lets you specify which of the resulting columns should be passed back in the @result parameter.
@result is an output parameter that will hold the formatted results

@input, @regex and @result are either (N)VARCHAR or (N)CHAR of any length.

@column_number is an INT. Columns are numbered starting at 1.

This function splits text data on some sort of delimiter (comma, pipe, whatever). The cool thing about a split using regular expressions is that the delimiter does not have to be as consistent as you would normally expect.

For example, take this line as your source data:

one ,two|three : four

In this case, our delimiter is either a comma, pipe or colon with any number of spaces either before or after (or both). In regex form, that is written: \s*[,|:]\s*.

For example:

SQL
DECLARE @out VARCHAR(8000)

EXEC xp_regex_split 'one  ,two|three  : four', '\s*[,|:]\s*', 1, @out OUTPUT
PRINT @out

EXEC xp_regex_split 'one  ,two|three  : four', '\s*[,|:]\s*', 2, @out OUTPUT
PRINT @out

EXEC xp_regex_split 'one  ,two|three  : four', '\s*[,|:]\s*', 3, @out OUTPUT
PRINT @out

EXEC xp_regex_split 'one  ,two|three  : four', '\s*[,|:]\s*', 4, @out OUTPUT
PRINT @out

prints out:

one  
two
three  
four

Please note that in the case above, there is no performance penalty for running the same split more than once. xp_regex_split caches both the input string and the regular expression, so calling it multiple times while changing only the @column_number parameter is perfectly fine. The actual split is only done the first time and the rest is pulled from cache.

XP_REGEX_REPLACE

Syntax:

xp_regex_replace @input, @regex, @replacement, @result OUTPUT

@input is the text to parse.
@regex is a regular expression.
@replacement will replace every piece of text that matches the regex
@result is an output parameter that will hold the results.

For example, this is how you would remove all whitespace from an input string:

SQL
DECLARE @out VARCHAR(8000)
EXEC xp_regex_replace ' one  two    three four ', '\s*', '', @out OUTPUT
PRINT '[' + @out + ']'

prints out:

[onetwothreefour]

To replace all numbers (regardless of length) with "###":

SQL
DECLARE @out VARCHAR(8000)
EXEC xp_regex_replace '12345 is less than 99999, but not 1, 12, or 123', 
                      '\d+', '###', @out OUTPUT
PRINT @out

prints out:

### is less than ###, but not ###, ###, or ###

FN_XP_REGEX_MATCH, FN_XP_REGEX_FORMAT, FN_XP_REGEX_SPLIT and FN_XP_REGEX_REPLACE

These are user-defined functions that wrap the stored procedures. This way you can use the function as part of a SELECT list or a WHERE clause:

SQL
USE pubs
GO

SELECT dbo.fn_regex_format(
  phone_number,
  '[^\d]*(\d{3})[^\d]*(\d{3})[^\d]*(\d{4})',
  '$1 $2 $3'
  )

FROM
  authors

This would format every phone number in the "authors" table.

Please note that FN_XP_REGEX_MATCH, returns a 1 or 0 instead of 'Matched' or 'Failed' (like XP_REGEX_MATCH does). You can certainly change this back in the INSTALL.SQL code. Also note, you'll need to create the UDFs in each database that you call them from.

Installation

  1. Copy xp_regex.dll to your Program Files\Microsoft SQL Server\MSSQL\bin folder.
  2. Run the SQL script INSTALL.SQL. This will register the procedures and create the user-defined functions.

Performance Counters

I've created a set of Performance Counters (the Performance Object is called "xp_regex"). You can use the Windows 2000 System Monitor to see various statistics about xp_regex while it is running.

If you want to avoid the small overhead of the performance counters, just comment out this line in PerfCounters.h and rebuild:

#define XP_REGEX_USING_PERF_COUNTERS

Misc

Comments/corrections/additions are welcome. Please let me know if you find this useful! Thanks!

History

  • 06 Oct 03 - Included a version of xp_regex.dll that was compiled without the performance counters. Added the warning about running managed code in-process with SQL Server.
  • 19 Jul 03 - Complete refactoring to a more object-oriented design. Also added support for Unicode parameters. Changed the OUTPUT parameter to Unicode. Greatly improved parameter checking and error handling.
  • 07 May 03 - Rebuilt using Visual Studio .NET 2003 and .NET Framework 1.1
  • 29 Mar 03 - Updated download

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

A list of licenses authors might use can be found here


Written By
Web Developer
United States United States
This member has not yet provided a Biography. Assume it's interesting and varied, and probably something to do with programming.

Comments and Discussions

 
Questionmssql 2008 express x64 Pin
Lavrinenko Evgeni16-Feb-12 21:28
Lavrinenko Evgeni16-Feb-12 21:28 
Questionxp_regex in SQL2008 x64 Pin
Jeromos.hu21-Mar-11 14:00
Jeromos.hu21-Mar-11 14:00 
GeneralCould not load the DLL xp_pcre.dll, or one of the DLLs it references. Reason: 126(The specified module could not be found.). Pin
Venkat1.V18-Aug-09 21:40
Venkat1.V18-Aug-09 21:40 
QuestionUsing Split Pin
HiMik200314-Aug-09 2:03
HiMik200314-Aug-09 2:03 
GeneralBroken Link Pin
Tim_Higgison6-Aug-09 21:29
Tim_Higgison6-Aug-09 21:29 
Generalxp_regex_replace Pin
Calvin Wrightsman28-Jan-09 8:27
Calvin Wrightsman28-Jan-09 8:27 
GeneralAwesome, I thought this was built in to SQL 2005 Pin
philmee9524-Sep-08 14:53
philmee9524-Sep-08 14:53 
QuestionCould not load the DLL....SQL 2005! Pin
Almir Martinelli16-Sep-08 9:14
Almir Martinelli16-Sep-08 9:14 
AnswerRe: Could not load the DLL....SQL 2005! Pin
philmee9524-Sep-08 14:50
philmee9524-Sep-08 14:50 
QuestionRegEx to insert data every so many characters? Pin
Member 56973910-Jun-08 3:15
Member 56973910-Jun-08 3:15 
QuestionIs there a 64bit version of the xp_pcre.dll (regular expressions in SqlServer2005)? Pin
Jaspers Yvo14-May-08 6:07
Jaspers Yvo14-May-08 6:07 
QuestionSQL Message Field Type Pin
m_r4-Jun-07 8:47
m_r4-Jun-07 8:47 
QuestionWhat REGX format dose this code follow? Pin
81u3ph14m318-May-06 21:04
81u3ph14m318-May-06 21:04 
AnswerRe: What REGX format dose this code follow? Pin
Garth J Lancaster18-May-06 21:16
professionalGarth J Lancaster18-May-06 21:16 
QuestionShould xp_regex work with .SQL Server 2005 Pin
netedk23-Mar-06 8:39
netedk23-Mar-06 8:39 
GeneralRe: Should xp_regex work with .SQL Server 2005 Pin
ajwilson229-Oct-06 21:10
ajwilson229-Oct-06 21:10 
GeneralRe: Should xp_regex work with .SQL Server 2005 Pin
dombrowsky10-Oct-06 6:54
dombrowsky10-Oct-06 6:54 
QuestionRoman numerals Pin
VeroVige17-Mar-06 5:14
VeroVige17-Mar-06 5:14 
AnswerRe: Roman numerals Pin
Dan Farino17-Mar-06 5:25
Dan Farino17-Mar-06 5:25 
Generalerror message.... Pin
Mathew Arakal10-May-05 15:03
Mathew Arakal10-May-05 15:03 
GeneralRe: error message.... Pin
Dan Farino10-May-05 15:08
Dan Farino10-May-05 15:08 
GeneralFunctions returns 0 Pin
Maxime Ducharme14-Feb-05 9:14
sussMaxime Ducharme14-Feb-05 9:14 
GeneralRe: Functions returns 0 Pin
Dan Farino14-Feb-05 9:24
Dan Farino14-Feb-05 9:24 
GeneralRe: Functions returns 0 Pin
Maxime Ducharme14-Feb-05 9:40
sussMaxime Ducharme14-Feb-05 9:40 
GeneralRe: Functions returns 0 Pin
Maxime Ducharme15-Feb-05 9:14
sussMaxime Ducharme15-Feb-05 9:14 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.