Introduction
The classes in this project allow you to parse text expressions entered by a user and compile them to a .NET assembly. This assembly can be executed on the fly, or saved to a DLL. Pre-compiling expressions allows for a high level of portability, and enables us to evaluate user entered logic extremely efficiently. In addition, we can use Microsoft’s ildasm.exe tool to open and inspect the underlying MSIL code being generated.
There are a lot of cool features that come with the .NET framework, but for my money, the Reflection.Emit
namespace offers about as much bang for your geek buck as you can find. The Reflection.Emit
namespace allows you to create your own .NET code at runtime by dynamically creating .NET types and inserting MSIL instructions into the body.
MSIL is Microsoft’s intermediate language for the .NET framework. IL is what your C# and VB.NET code gets compiled into and is sent to the JIT compiler when .NET programs are run. MSIL is a very low level language that is very fast, and working with it gives you exceptional control over your programs. I won’t go into depths about MSIL in this article, but there are several other resources available on the web, and if you are interested in learning more, I have included some links at the end of this article.
Background
Let’s have a quick overview of what our parser/compiler will be doing. The user will enter a string expression that matches our parser’s grammar. This expression will be turned into a tiny .NET program that will run and output the result.
To do this, the parser will read in the sequential list of characters, and break it down into a hierarchal parse tree as seen below. The nodes are evaluated in the order shown. When a node is matched, the appropriate instruction is called for that node type. For instance, when a number is matched, we push that number onto the stack. When the “*” token is matched, we call the multiply instruction, and so on. Adding up all the instructions in their proper order gives us the “program” seen to the right.
Expression: 3 * 2 + 1
Now, let’s take a look at how our program executes and compare it to the original text expression. The first two instructions push the integers 3 and 2 on to the stack. The multiply instruction pops those two values off the stack, multiplies them, and pushes the product 6 back onto the stack. Instruction #4 pushes the integer 1 onto the stack. Instruction #5 pops those two values (6 and 1), adds them, and pushes the result (7) back on the stack. Finally, the return command pops the value 7 off the stack and returns it as the result.
Brilliant! This may seem simple and obvious to most computer programmers, but this clever idea is pretty much the foundation for all programming and compiling, and I think it is worth a look. Here is what this program would look like in MSIL. For example, the ldc.r8
represents the load constant instruction, and loads the double 3.0 onto the stack.
IL_0000: ldc.r8 3.
IL_0009: ldc.r8 2.
IL_0012: mul
IL_0013: ldc.r8 1.
IL_001c: add
IL_0023: ret
Using the code
This project contains two classes for parsing the expression and compiling it into MSIL. The first class is RuleParser
, which is an abstract parsing class that contains all the lexing and parsing logic for our particular grammar. This class parses the statement but doesn’t take any actions. The code excerpt below shows that when the ttAdd
token is found, the parser calls the matchAdd()
method which is an abstract method defined on the RuleParser
class. It is up to the concrete class to implement the method body and the corresponding semantic action.
This pattern allows us to implement a separate concrete class to handle semantic actions, and means that we can implement different concrete classes depending on what we are trying to accomplish. This code was previously setup to evaluate expressions on the fly by calculating nodes as soon as they were found. We can now swap in our MsilParser
to compile the expression to an IL program using the same parsing class.
The MsilParser
does this by implementing all of the necessary token functions and emitting the appropriate IL instructions. For example, the matchAdd()
function simply inserts an Add command. When a variable is matched, we load the variable name with the Ldstr
instruction, then call the GetVar
method.
protected override void matchAdd()
{
this.il.Emit(OpCodes.Add);
}
protected override void matchVar()
{
string s = tokenValue.ToString();
il.Emit(OpCodes.Ldstr, s);
il.Emit(OpCodes.Call, typeof(MsilParser).GetMethod(
"GetVar", new Type[] { typeof(string) }));
}
Once all of the tokens have been setup, we can call the CompileMsil()
method of our MsilParser
class which runs the parser and returns the compiled .NET type using the AssemblyBuilder
classes in the Relection.Emit
namespace.
public Type CompileMsil(string expr)
{
string assemblyName = "Expression";
string modName = "expression.dll";
string typeName = "Expression";
string methodName = "RunExpression";
AssemblyName name = new AssemblyName(assemblyName);
AppDomain domain = System.Threading.Thread.GetDomain();
AssemblyBuilder builder = domain.DefineDynamicAssembly(
name, AssemblyBuilderAccess.RunAndSave);
ModuleBuilder module = builder.DefineDynamicModule
(modName, true);
TypeBuilder typeBuilder = module.DefineType(typeName,
TypeAttributes.Public | TypeAttributes.Class);
MethodBuilder methodBuilder = typeBuilder.DefineMethod(methodName,
MethodAttributes.HideBySig | MethodAttributes.Static
| MethodAttributes.Public,
typeof(Object), new Type[] { });
ILGenerator ilGenerator = methodBuilder.GetILGenerator();
this.il = ilGenerator;
this.Run(expr);
this.il.Emit(OpCodes.Conv_R8);
this.il.Emit(OpCodes.Box, typeof(Double));
this.il.Emit(OpCodes.Ret);
Type myClass = typeBuilder.CreateType();
builder.Save(modName);
return myClass;
}
The end result is a .NET assembly that can be executed, cached, or saved to disk. Here is a look at the IL code for our method that was created by our compiler:
.method public hidebysig static object
RunExpression() cil managed
{
.maxstack 2
IL_0000: ldc.r8 3.
IL_0009: ldc.r8 2.
IL_0012: mul
IL_0013: ldc.r8 1.
IL_001c: add
IL_001d: conv.r8
IL_001e: box [mscorlib]System.Double
IL_0023: ret
}
The main benefit of this approach is that parsing the expression takes much longer than just executing the instructions. By pre-compiling the expression to IL, we only need to parse the expression once instead of every time it’s evaluated. Although this example only uses one expression, a real implementation could involve thousands of expressions precompiled and executed on demand. In addition, we also have our code packaged up in a nice .NET DLL that we can do whatever we want with. This example can be evaluated over 1 million times in less than 3 one hundredths of a second!
Using the Sample Project
The sample project allows you to enter an expression in the top left textbox. When you click Parse, the form will parse the expression and create a .NET assembly with your compiled code in the RunExpression()
function. The program will then call that function the specified number of times, and show how long it took to execute. Finally, the program will save the assembly as expression.dll and run Microsoft’s ildasm.exe to output the full MSIL code for the assembly so you can see the code that was generated for your program.
Points of interest
How our dynamic method gets called will considerably affect performance. For example, simply using the Invoke()
method on a Dynamic Method will dramatically slow down the performance when called 1 million times. Using a generic delegate signature, like in the code below, gives us about 20X better performance.
MsilParser em = new MsilParser();
Type t = em.CompileMsil(textBox1.Text);
MethodInfo m = t.GetMethod("RunExpression");
Delegate d = Delegate.CreateDelegate(
typeof(MsilParser.ExpressionInvoker<Object>), m);
MsilParser.ExpressionInvoker<Object> method =
(MsilParser.ExpressionInvoker<Object>)d;
Object result = method();
Calling ILDASM.EXE
The sample project will also let you view the entire MSIL code for your newly created assembly. It does this by calling ildasm.exe in the background and outputting the result to a textbox. The ildasm.exe is a very useful tool for anyone working with IL code or the System.Reflection.Emit
namespace. The code below shows how to use this executable in your program using the System.Diagnostics
namespace. Checkout Microsoft’s documentation on ildasm.exe in the links below.
string modName = "expression.dll";
Process p = new Process();
p.StartInfo.FileName = "ildasm.exe";
p.StartInfo.Arguments = "/text /nobar \"" + modName;
p.StartInfo.UseShellExecute = false;
p.StartInfo.CreateNoWindow = true;
p.StartInfo.RedirectStandardOutput = true;
p.StartInfo.WindowStyle = ProcessWindowStyle.Hidden;
p.Start();
string s = p.StandardOutput.ReadToEnd();
p.WaitForExit();
p.Close();
txtMsil.Text = s;
Links
Stephen Marsh has over 10 years of experience developing enterprise applications built on the .Net framework. He specializes in building expert systems that serve the financial industry.