I'm just basically using C# to generate an array for my C++ code to traverse. The C# code is a console application.
I feed it a regular expression on the command line and it produces a small amount of C++ code to declare an array as its output - for example:
int16_t dfa_table[] = {
-1, 1, 6, 1, 34, 34, -1, 1, 12, 1, 117, 117, -1, 1, 18,
1, 110, 110, -1, 1, 24, 1, 105, 105, -1, 1, 30, 1, 120, 120,
-1, 1, 36, 1, 116, 116, -1, 1, 42, 1, 105, 105, -1, 1, 48,
1, 109, 109, -1, 1, 54, 1, 101, 101, -1, 1, 60, 1, 34, 34,
-1, 1, 66, 1, 58, 58, 0, 0
};
That's a DFA table. What it is is a state machine encoded into an array. I have C++ code that can walk it in order to run the regular expression. The walking code is easy and efficient. Generating the array is not easy.
That C# console application uses a regular expression engine I wrote (in C#) in order to generate that C++ array. The code to run the regular expression is simple and is in C++:
bool match(const int16_t* dfa, int16_t(read_cb)(void*), void* cb_state = nullptr) {
int tlen;
int tto;
int prlen;
int pmin;
int pmax;
int i;
int j;
int ch;
int state = 0;
bool done;
bool found = false;
int acc = -1;
ch = read_cb(cb_state);
while (ch != -1) {
acc = -1;
done = false;
while (!done) {
start_dfa:
done = true;
acc = dfa[state++];
tlen = dfa[state++];
for (i = 0; i < tlen; ++i) {
tto = dfa[state++];
prlen = dfa[state++];
for (j = 0; j < prlen; ++j) {
pmin = dfa[state++];
pmax = dfa[state++];
if (ch < pmin) break;
if (ch <= pmax) {
found = true;
ch = read_cb(cb_state);
state = tto;
done = false;
goto start_dfa;
}
}
}
}
if (acc != -1) {
return found;
}
ch = read_cb(cb_state);
state = 0;
}
return false;
}
To err is human. Fortune favors the monsters.
|