
DEELX - Regular Expression Engine for C++

25 Dec 2006, CPOL, 3 min read
DEELX is a regular expression engine for C++ designed to be convenient and easy to use.

Download for C++

Download Unit for Delphi (statically linked into Delphi project)

Download ActiveX for VB

Download Dynamic Link Version

Introduction

DEELX is a simple regular expression engine coded in pure C++.

All of DEELX's source code is in a single header file, deelx.h. Because there are no other .cpp files or libraries, you do not need to create a separate project for DEELX when you want to use it, and you do not need to worry about link problems.

DEELX is highly portable. It compiles with Visual C++ 6.0, 7.1 and 8.0 (Windows), gcc (Cygwin, Linux and FreeBSD), Turbo C++ 3.0 (DOS), C++ Builder (Windows), and others. DEELX is written as templates, so char, wchar_t and other simple types can be used as its base character type.
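To illustrate why writing the engine as templates makes the character type interchangeable, here is a small hypothetical function (LeadingDigits is not part of DEELX). It is written once against a template parameter and works unchanged for both char and wchar_t strings, in the same way that CRegexpT<char> and CRegexpT<wchar_t> share one implementation.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical illustration, not DEELX code.
// Returns how many decimal digits appear at the start of `text`,
// i.e. the length that the pattern ^\d+ would match (0 if none).
template <class CHART>
std::size_t LeadingDigits(const CHART* text) {
    std::size_t n = 0;
    // CHART('0') and CHART('9') convert the literals to the base type,
    // so the same comparison works for char and wchar_t alike.
    // The terminating zero is below '0', so the loop stops there.
    while (text[n] >= CHART('0') && text[n] <= CHART('9')) {
        ++n;
    }
    return n;
}
```

Calling LeadingDigits("123abc") instantiates the char version, and LeadingDigits(L"45x") instantiates the wchar_t version. Both come from the same source, with nothing to compile or link separately.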


Features

DEELX supports Perl-compatible regular expression syntax. Besides the basic pattern syntax, DEELX implements many extended syntax features:

  • Right-to-left matching mode
  • Named capture groups
  • Remarks (comments)
  • Zero-width assertions
  • Independent (atomic) expressions
  • Conditional expressions
  • Recursive expressions
  • Replace operations

Ideas

The central idea in DEELX is the concept of an "element of a regular expression". In the source code, I call it an "ELX".

I treat every kind of element as an abstract element, "ElxInterface". ElxInterface has two methods: Match() and MatchNext(). Match() makes the first attempt to match. If Match() returns true but the match is not the one you need, calling MatchNext() discards that result and tries to find another successful match. If that result is still not what you need, keep calling MatchNext() until it returns false or you get the match you want.

For example, consider the two elements of (.*)(a):

  1. Calling the "Match()" method of the first element (.*) makes it match all of the text. The second element (a) then fails to match, so the result of that "Match()" is not what I want.
  2. The next step is to call the "MatchNext()" method of the first element (.*). This step is called "backtracking". The first element (.*) reduces its repeat count, and then the second element (a) tries to match again.
  3. This continues. One possible final outcome is that even after the first element (.*) has been reduced to zero repetitions, the second element still fails to match, so the whole regular expression fails to match.
  4. The other possible final outcome is that when the first element (.*) has been reduced to some repeat count, the second element matches successfully, so the whole regular expression matches.

The match operation of every kind of element can be abstracted into these two operations, "Match()" and "MatchNext()".

That is DEELX's idea.
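The Match()/MatchNext() protocol and the four steps above can be sketched in a few dozen lines. The classes below (Elx, ElxAnyStar, ElxChar) and the function MatchSequence are hypothetical simplifications written for this illustration, not DEELX's actual ElxInterface code. They handle only the pattern (.*)(a) anchored at the start of the text, and, unlike a real engine, this (.*) is allowed to cross newlines.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical abstract element, modelled on the idea described above.
struct Elx {
    virtual ~Elx() {}
    // First attempt to match at `pos`; on success stores the end offset.
    virtual bool Match(const std::string& text, std::size_t pos, std::size_t& end) = 0;
    // Discard the previous result and try the next alternative.
    virtual bool MatchNext(const std::string& text, std::size_t pos, std::size_t& end) = 0;
};

// (.*) : greedy; each MatchNext() gives back one character (backtracking).
struct ElxAnyStar : Elx {
    std::size_t count = 0;  // current repeat count
    bool Match(const std::string& text, std::size_t pos, std::size_t& end) override {
        count = text.size() - pos;  // take all remaining text
        end = pos + count;
        return true;
    }
    bool MatchNext(const std::string&, std::size_t pos, std::size_t& end) override {
        if (count == 0) return false;  // already reduced to zero repetitions
        --count;
        end = pos + count;
        return true;
    }
};

// (a) : a single literal character; it has no alternative matches.
struct ElxChar : Elx {
    char ch;
    explicit ElxChar(char c) : ch(c) {}
    bool Match(const std::string& text, std::size_t pos, std::size_t& end) override {
        if (pos < text.size() && text[pos] == ch) { end = pos + 1; return true; }
        return false;
    }
    bool MatchNext(const std::string&, std::size_t, std::size_t&) override {
        return false;
    }
};

// Matches (.*)(a) at offset 0, backtracking the first element on failure.
bool MatchSequence(const std::string& text, std::size_t& matchEnd) {
    ElxAnyStar first;
    ElxChar second('a');
    std::size_t mid = 0;
    for (bool ok = first.Match(text, 0, mid); ok; ok = first.MatchNext(text, 0, mid)) {
        if (second.Match(text, mid, matchEnd)) return true;  // step 4: overall success
    }
    return false;  // step 3: (.*) reached zero repetitions and (a) still failed
}
```

For "banana x", (.*) first takes all eight characters. It then gives them back one at a time until (a) finds the final a, so the match covers "banana" and ends at offset 6. For "xyz", every MatchNext() eventually fails, and the whole expression fails.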

Demo in C++

C++
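#include <stdio.h>
#include "deelx.h"

int main(int argc, char * argv[])
{
    // text to search (string literals are const in C++)
    const char * text = "12.5, a1.1, 0.123, 178";

    // declare: word boundary, digits, a dot, digits
    static CRegexpT <char> regexp("\\b\\d+\\.\\d+", IGNORECASE | MULTILINE);

    // find the first match
    MatchResult result = regexp.Match(text);

    // loop over all matches; this prints "12.5" and "0.123"
    while( result.IsMatched() )
    {
        printf("%.*s\n", result.GetEnd() - result.GetStart(), text + result.GetStart());

        // get next: continue searching from the end of the current match
        result = regexp.Match(text, result.GetEnd());
    }

    return 0;
}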

Regex flag definitions:

C++
enum REGEX_FLAGS
{
 NO_FLAG        = 0,
 SINGLELINE     = 0x01,
 MULTILINE      = 0x02,
 GLOBAL         = 0x04,
 IGNORECASE     = 0x08,
 RIGHTTOLEFT    = 0x10,
 EXTENDED       = 0x20,
};
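Each flag occupies its own bit, so flags are combined with bitwise OR, as in IGNORECASE | MULTILINE in the demo above. Because the bits never overlap, adding flags with + (as the Delphi wrapper below does) gives the same value. The sketch copies the enum so that it compiles without deelx.h. HasFlag and kDemoFlags are hypothetical names for this illustration only.

```cpp
#include <cassert>

// Copy of the REGEX_FLAGS values above, so this sketch compiles on its own.
enum REGEX_FLAGS
{
    NO_FLAG     = 0,
    SINGLELINE  = 0x01,
    MULTILINE   = 0x02,
    GLOBAL      = 0x04,
    IGNORECASE  = 0x08,
    RIGHTTOLEFT = 0x10,
    EXTENDED    = 0x20
};

// Hypothetical helper: true if every bit of `flag` is set in `flags`.
inline bool HasFlag(int flags, int flag)
{
    return (flags & flag) == flag;
}

// IGNORECASE | MULTILINE == 0x08 | 0x02 == 0x0A
const int kDemoFlags = IGNORECASE | MULTILINE;
```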

Wrap for Delphi (Statically Linked into Delphi Project)

Borland C++ Builder is used to compile DEELX into a .obj file, which is then linked into a Delphi unit, DEELX.dcu.

Delphi
uses
  DEELX;

var
  result: TMatchResult;
  re: TRegexpA;

begin
  result := TMatchResult.Create();
  re := TRegexpA.Create(Edit1.Text, IGNORECASE + MULTILINE); // the 2nd parameter is the flags
  try
    re.Match(Edit2.Text, result);

    if result.IsMatched() then
    begin
      // select the matched text in Edit2
      Edit2.SelStart := result.GetStart();
      Edit2.SelLength := result.GetEnd() - result.GetStart();
    end
    else
      Edit2.SelLength := 0;
  finally
    // Free is the idiomatic way to destroy objects in Delphi
    re.Free;
    result.Free;
  end;
end;

Regex flag definitions:

Delphi
const
  NO_FLAG        = $00;
  SINGLELINE     = $01;
  MULTILINE      = $02;
  GLOBAL         = $04;
  IGNORECASE     = $08;
  RIGHTTOLEFT    = $10;
  EXTENDED       = $20;

Wrap as ActiveX for VB

DEELX is wrapped as an ActiveX component, so it can be used from VB or from ASP pages.

VB
Private pos As Integer
Private re As New RegExLab.RegExp

Private Sub Command1_Click()
    ' no parentheses: a Sub call with two arguments and no Call keyword
    re.Compile Text1.Text, "igm" ' the 2nd parameter is the flags

    re.Match Text2.Text, pos

    If re.IsMatched Then
        pos = re.End ' the next click continues searching from here
        Text2.SelStart = re.Begin
        Text2.SelLength = re.End - re.Begin
    Else
        pos = -1
        Text2.SelLength = 0
    End If
End Sub

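The flags are passed as a string of letters, in the style of JScript's RegExp flags (g, i, m), with additional letters for the other options: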

s  -  SINGLELINE
m  -  MULTILINE
g  -  GLOBAL
i  -  IGNORECASE
r  -  RIGHTTOLEFT
x  -  EXTENDED
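This sketch shows how a flag string such as "igm" maps onto the numeric REGEX_FLAGS values defined earlier. It is a hypothetical helper, not code taken from DEELX or the ActiveX wrapper. Rejecting unknown letters by returning -1 is an assumption made for this illustration.

```cpp
#include <cassert>

// Flag values copied from the REGEX_FLAGS enum above.
enum REGEX_FLAGS
{
    NO_FLAG     = 0,
    SINGLELINE  = 0x01,
    MULTILINE   = 0x02,
    GLOBAL      = 0x04,
    IGNORECASE  = 0x08,
    RIGHTTOLEFT = 0x10,
    EXTENDED    = 0x20
};

// Hypothetical helper: converts a flag string such as "igm" into a bit mask.
// Returns -1 if the string contains a letter not listed in the table above.
int ParseFlagString(const char* flags)
{
    int result = NO_FLAG;
    for (const char* p = flags; *p != '\0'; ++p)
    {
        switch (*p)
        {
            case 's': result |= SINGLELINE;  break;
            case 'm': result |= MULTILINE;   break;
            case 'g': result |= GLOBAL;      break;
            case 'i': result |= IGNORECASE;  break;
            case 'r': result |= RIGHTTOLEFT; break;
            case 'x': result |= EXTENDED;    break;
            default:  return -1;             // unknown flag letter
        }
    }
    return result;
}
```

With this mapping, the "igm" string used in the VB example corresponds to IGNORECASE | GLOBAL | MULTILINE, which is 0x0E.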

DLL Version of DEELX

The DLL version of DEELX uses the stdcall calling convention for every exported function, because Visual Basic can only call stdcall functions.

The demo.zip contains two projects: one is in Visual Basic, the other is in Delphi.

References and Acknowledgements

Homepage: the homepage of DEELX. I am the author of DEELX.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)