CodeDomUtility: Abbreviate Your Code Generation

honey the codewitch

Nov 27, 2019

MIT

A helper class that dramatically reduces the amount of code you need to write for the CodeDOM

Download source code - 6.2 KB

Introduction

I find myself doing a lot of code generation with Microsoft's CodeDOM. That usually involves creating a reference implementation of the target code, "porting" it to CodeDOM constructs, and then mixing in the dynamic, generated bits.

Anyone who has ever used the CodeDOM can attest to its repetitive-strain-inducing object model. It's so verbose it absolutely kills your wrists and fingers. Worse, it's hard to understand what you wrote afterward, because it's easy to get lost in all the nested constructs.

We won't solve all of the CodeDOM's problems here, but a drop-in source file will ease our burden. The code isn't flashy; it's mostly wrappers, but useful ones. Aside from saving typing, a big part of the goal is to make your code generation look more like the code it generates.

Using the Code

First, let's look at an extreme but real-world example from one of my projects:

using CD = CDU.CodeDomUtility;
...
var result = CD.Method(typeof(bool), "_MoveNextInput");
var input = CD.FieldRef(CD.This, "_input");
var state = CD.FieldRef(CD.This, "_state");
var line = CD.FieldRef(CD.This, "_line");
var column = CD.FieldRef(CD.This, "_column");
var position = CD.FieldRef(CD.This, "_position");
var current = CD.PropRef(input,"Current");
result.Statements.AddRange(new CodeStatement[] {
    CD.If(CD.Invoke(input,"MoveNext"),
        CD.IfElse(CD.NotEq(state,CD.Literal(_BeforeBegin)), new CodeStatement[] {
            CD.Let(position,CD.Add(position,CD.One)),
            CD.IfElse(CD.Eq(CD.Literal('\n'),current),new CodeStatement[] {
                CD.Let(column,CD.One),
                CD.Let(line,CD.Add(line,CD.One))
            },
                CD.IfElse(CD.Eq(CD.Literal('\t'),current),new CodeStatement[]
                {
                    CD.Let(column,CD.Add(column,CD.Literal(_TabWidth)))
                },
                    CD.Let(column,CD.Add(column,CD.One))
                )
            )
        },
        CD.IfElse(CD.Eq(CD.Literal('\n'),current),new CodeStatement[] {
            CD.Let(column,CD.One),
            CD.Let(line,CD.Add(line,CD.One))
        },
            CD.If(CD.Eq(CD.Literal('\t'),current),
                CD.Let(column,CD.Add(column,CD.Literal(_TabWidth-1))))
            )
        ),
        CD.Return(CD.True)
    ),
    CD.Let(state,CD.Literal(_InnerFinished)),
    CD.Return(CD.False)
});
return result;

It looks like hell at first, but it's surprisingly easy to get used to. Take a minute with it and notice how its nesting mirrors the nesting of the code it generates, and how the statements read more declaratively, closer to an actual language: compare CD.Let(...) with its raw counterpart, new CodeAssignStatement(...). Nothing stops you from declaring one statement at a time if you prefer, or from mixing in plain CodeDOM object creation expressions. I just find this style easier to read and write once it clicks. It's vaguely LISPy.
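To make that comparison concrete, here's a minimal sketch of what a few such wrappers could look like. The class CdSketch and its method bodies are hypothetical: they borrow the names used above (This, One, FieldRef, Add, Let) but are not CodeDomUtility's actual source. Each wrapper is a one-liner over a standard System.CodeDom constructor, and the wrapped version of `this._position = this._position + 1` fits on one readable line.

```csharp
using System.CodeDom;

// Hypothetical wrappers named after the calls in the example above.
// These bodies are a sketch, not CodeDomUtility's actual source.
public static class CdSketch
{
    public static CodeThisReferenceExpression This => new CodeThisReferenceExpression();
    public static CodePrimitiveExpression One => new CodePrimitiveExpression(1);

    public static CodeFieldReferenceExpression FieldRef(CodeExpression target, string name)
        => new CodeFieldReferenceExpression(target, name);

    public static CodeBinaryOperatorExpression Add(CodeExpression left, CodeExpression right)
        => new CodeBinaryOperatorExpression(left, CodeBinaryOperatorType.Add, right);

    // Reads like an assignment instead of "new CodeAssignStatement(...)".
    public static CodeAssignStatement Let(CodeExpression target, CodeExpression value)
        => new CodeAssignStatement(target, value);

    // this._position = this._position + 1, spelled out with raw CodeDOM constructors.
    public static CodeAssignStatement IncrementPositionRaw()
        => new CodeAssignStatement(
            new CodeFieldReferenceExpression(new CodeThisReferenceExpression(), "_position"),
            new CodeBinaryOperatorExpression(
                new CodeFieldReferenceExpression(new CodeThisReferenceExpression(), "_position"),
                CodeBinaryOperatorType.Add,
                new CodePrimitiveExpression(1)));

    // The same statement built with the wrappers.
    public static CodeAssignStatement IncrementPosition()
    {
        var position = FieldRef(This, "_position");
        return Let(position, Add(position, One));
    }
}
```

Both methods build equivalent object graphs, so a CodeDOM provider renders them identically; the wrappers only change how the generator source reads.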

There's nothing dynamic in the _MoveNextInput example; it generates a fixed piece of code. In fact, the only reason to use the CodeDOM there is so that this static code can be emitted in any language that has a CodeDOM provider. Here's the code it generates in C#, for comparison:

bool _MoveNextInput() {
    if (this._input.MoveNext()) {
        if ((false 
                    == (this._state == -3))) {
            this._position = (this._position + 1);
            if (('\n' == this._input.Current)) {
                this._column = 1;
                this._line = (this._line + 1);
            }
            else {
                if (('\t' == this._input.Current)) {
                    this._column = (this._column + 4);
                }
                else {
                    this._column = (this._column + 1);
                }
            }
        }
        else {
            if (('\n' == this._input.Current)) {
                this._column = 1;
                this._line = (this._line + 1);
            }
            else {
                if (('\t' == this._input.Current)) {
                    this._column = (this._column + 3);
                }
            }
        }
        return true;
    }
    this._state = -1;
    return false;
}

Even a cursory look should show an experienced CodeDOM user how much less code this took. It's also easier to read once you're used to it, because it more closely mirrors the code it generates. Again, you don't have to declare everything inline as I have above; I just find that doing so makes the generator look closest to its output.

As you can see, the generated code sometimes looks a little odd, with constructs like this:

if ((false 
                    == (this._state == -3)))

Don't worry; this is by design. CodeDomUtility.NotEq() has to produce this because the CodeDOM has no language-independent value inequality operator: CodeBinaryOperatorType offers ValueEquality and IdentityInequality (an identity comparison), but no ValueInequality. Comparing the equality result against false is as close as it gets. The generated code is semantically equivalent to what you expressed with CodeDomUtility, even when it looks a little odd. If anything, using this class gives you a slightly better chance of your code actually ending up language independent, particularly where binary operators are concerned.
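Here's a minimal sketch of how such a NotEq() could be built, assuming it follows the pattern described above. NotEqSketch is a hypothetical class, not CodeDomUtility's source: it builds a ValueEquality comparison and then compares that result against a false literal, which every provider can render.

```csharp
using System.CodeDom;

// Hypothetical sketch, not CodeDomUtility's actual implementation.
public static class NotEqSketch
{
    // left == right, using value (not identity) equality.
    public static CodeBinaryOperatorExpression Eq(CodeExpression left, CodeExpression right)
        => new CodeBinaryOperatorExpression(left, CodeBinaryOperatorType.ValueEquality, right);

    // CodeBinaryOperatorType has no ValueInequality member, so express
    // "left != right" as "false == (left == right)".
    public static CodeBinaryOperatorExpression NotEq(CodeExpression left, CodeExpression right)
        => new CodeBinaryOperatorExpression(
            new CodePrimitiveExpression(false),
            CodeBinaryOperatorType.ValueEquality,
            Eq(left, right));
}
```

Rendered with the C# provider, NotEq applied to this._state and a literal -3 produces the `(false == (this._state == -3))` shape shown above, apart from line breaks.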

Types are set up for easy declaration using the Class(), Struct(), and Enum() methods:

var result = CD.Struct("Token", false,
                CD.Field(typeof(int), "Line", MemberAttributes.Public),
                CD.Field(typeof(int), "Column", MemberAttributes.Public),
                CD.Field(typeof(long), "Position", MemberAttributes.Public),
                CD.Field(typeof(int), "SymbolId", MemberAttributes.Public),
                CD.Field(typeof(string), "Value", MemberAttributes.Public));

You don't have to supply fields, methods, or properties in this call; you can always add them later. Still, I find creating members "inline" as above particularly helpful, since it saves a lot of typing.
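A Struct() helper along these lines is mostly a matter of accepting the members as a params array. This is a hypothetical sketch (StructSketch and its signatures are mine, not CodeDomUtility's), and it deliberately leaves out the bool argument that the real CD.Struct() call above takes, since its meaning isn't described here.

```csharp
using System;
using System.CodeDom;

// Hypothetical sketch; omits the bool parameter of the real CD.Struct().
public static class StructSketch
{
    public static CodeMemberField Field(Type type, string name, MemberAttributes attributes)
        => new CodeMemberField(type, name) { Attributes = attributes };

    // Members can be passed "inline" here, or added to Members later.
    public static CodeTypeDeclaration Struct(string name, params CodeTypeMember[] members)
    {
        var result = new CodeTypeDeclaration(name) { IsStruct = true };
        result.Members.AddRange(members);
        return result;
    }
}
```

Because the result is an ordinary CodeTypeDeclaration, you can keep adding methods or properties to its Members collection after the initial call.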

This utility class has a very powerful feature buried in CodeDomUtility.Literal(). Unlike CodePrimitiveExpression, which handles only simple values, Literal() will instantiate arrays, including nested arrays, and with some prep work using Microsoft's TypeConverter framework with InstanceDescriptor, you can tell it how to instantiate your own complex classes and structs. Be careful instantiating arrays of generics such as KeyValuePair<TKey,TValue>[], though: a bug in Microsoft's VBCodeProvider prevents it from rendering arrays of generic types correctly. The C# provider renders them just fine, but if you rely on that, you lose language independence.
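For the TypeConverter route, the idea is that your type's converter can turn an instance into an InstanceDescriptor naming a constructor and its arguments. The sketch below is only an illustration under that assumption: Point2 and Point2Converter are hypothetical types, and DescriptorSketch.ToCode() is a simplified stand-in for what a Literal()-style serializer could do with the descriptor (it only handles constructor descriptors), not CodeDomUtility's actual code.

```csharp
using System;
using System.CodeDom;
using System.ComponentModel;
using System.ComponentModel.Design.Serialization;
using System.Globalization;
using System.Reflection;

// Hypothetical example type. The attribute tells TypeDescriptor which converter to use.
[TypeConverter(typeof(Point2Converter))]
public struct Point2
{
    public readonly int X;
    public readonly int Y;
    public Point2(int x, int y) { X = x; Y = y; }
}

// Converts a Point2 into an InstanceDescriptor meaning "call new Point2(x, y)".
public class Point2Converter : TypeConverter
{
    public override bool CanConvertTo(ITypeDescriptorContext context, Type destinationType)
        => destinationType == typeof(InstanceDescriptor) || base.CanConvertTo(context, destinationType);

    public override object ConvertTo(ITypeDescriptorContext context, CultureInfo culture,
        object value, Type destinationType)
    {
        if (destinationType == typeof(InstanceDescriptor) && value is Point2 p)
        {
            ConstructorInfo ctor = typeof(Point2).GetConstructor(new[] { typeof(int), typeof(int) });
            return new InstanceDescriptor(ctor, new object[] { p.X, p.Y });
        }
        return base.ConvertTo(context, culture, value, destinationType);
    }
}

// Simplified stand-in for a Literal()-style serializer: primitives become
// CodePrimitiveExpression; other values are rebuilt from their InstanceDescriptor.
public static class DescriptorSketch
{
    public static CodeExpression ToCode(object value)
    {
        if (value == null || value is string || value.GetType().IsPrimitive)
            return new CodePrimitiveExpression(value);

        TypeConverter converter = TypeDescriptor.GetConverter(value);
        if (!converter.CanConvertTo(typeof(InstanceDescriptor)))
            throw new NotSupportedException("No InstanceDescriptor for " + value.GetType());

        var descriptor = (InstanceDescriptor)converter.ConvertTo(value, typeof(InstanceDescriptor));
        if (!(descriptor.MemberInfo is ConstructorInfo ctor))
            throw new NotSupportedException("Only constructor descriptors are handled here.");

        // new DeclaringType(arg0, arg1, ...), recursing into each argument.
        var create = new CodeObjectCreateExpression(ctor.DeclaringType);
        foreach (object argument in descriptor.Arguments)
            create.Parameters.Add(ToCode(argument));
        return create;
    }
}
```

Rendered in C#, ToCode(new Point2(3, 4)) comes out as a constructor call equivalent to `new Point2(3, 4)`. The converter is the "prep work": once it exists, the serializer needs no knowledge of Point2 itself.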

I typically use the array rendering when I need to do things like render state machine tables or parse tables. It's one call:

var array = new string[] {"Hello","World"};
...
// generate the expression to instantiate our array 
var expr = CD.Literal(array);
// render the code in (default) C#
Console.WriteLine(CD.ToString(expr));

This will output:

new string[] {
        "Hello",
        "World"}

I've found this extremely expedient for the complicated nested array structures I typically generate (I write a lot of table-driven code). In extreme cases, it saves me several dozen lines of code. Probably 90% of the time you'll use it to render scalar values and non-nested arrays of scalars like the one above, but even then it saves your fingers some, and for the complex cases it's a lifesaver.
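The recursion that makes nested arrays cheap is easy to sketch. The following is a hypothetical, simplified version (ArrayLiteralSketch is not CodeDomUtility's code, and it skips the TypeConverter path entirely): arrays become CodeArrayCreateExpression objects whose initializers are built by recursing into each element, and everything else is handed to CodePrimitiveExpression.

```csharp
using System;
using System.CodeDom;

// Hypothetical sketch of the array half of a Literal()-style helper.
public static class ArrayLiteralSketch
{
    public static CodeExpression Literal(object value)
    {
        if (value is Array array)
        {
            if (array.Rank != 1)
                throw new NotSupportedException("Only one-dimensional (possibly jagged) arrays.");
            // Use the array's own type (string[], int[][], ...) as the create type.
            var result = new CodeArrayCreateExpression(array.GetType());
            foreach (object element in array)
                result.Initializers.Add(Literal(element)); // recurse into nested arrays
            return result;
        }
        // null, strings, numbers, chars and bools: what CodePrimitiveExpression covers.
        return new CodePrimitiveExpression(value);
    }
}
```

Passing the array's own type as the create type means a jagged int[][] renders as `new int[][] { new int[] { ... }, ... }` without any special-casing for nesting depth.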

To render any code object painlessly, pass it to CD.ToString(), optionally with a language parameter; C# is the default.
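A renderer like this mostly needs to pick the right CodeDomProvider method, because the provider exposes a separate GenerateCodeFrom... method for each kind of code object. Here's a hypothetical sketch (RenderSketch.Render is my name, not CodeDomUtility's); it resolves the language through CodeDomProvider.CreateProvider, which accepts names such as "CSharp" and "VisualBasic".

```csharp
using System;
using System.CodeDom;
using System.CodeDom.Compiler;
using System.IO;

// Hypothetical sketch of a ToString()-style renderer; not CodeDomUtility's source.
public static class RenderSketch
{
    // language: any name CodeDomProvider.CreateProvider accepts, e.g. "CSharp" or "VisualBasic".
    public static string Render(CodeObject code, string language = "CSharp")
    {
        if (code == null) throw new ArgumentNullException(nameof(code));
        using (CodeDomProvider provider = CodeDomProvider.CreateProvider(language))
        {
            var options = new CodeGeneratorOptions();
            var writer = new StringWriter();
            // CodeTypeDeclaration derives from CodeTypeMember, so it must be matched first.
            switch (code)
            {
                case CodeCompileUnit unit:
                    provider.GenerateCodeFromCompileUnit(unit, writer, options);
                    break;
                case CodeNamespace ns:
                    provider.GenerateCodeFromNamespace(ns, writer, options);
                    break;
                case CodeTypeDeclaration type:
                    provider.GenerateCodeFromType(type, writer, options);
                    break;
                case CodeTypeMember member:
                    provider.GenerateCodeFromMember(member, writer, options);
                    break;
                case CodeStatement statement:
                    provider.GenerateCodeFromStatement(statement, writer, options);
                    break;
                case CodeExpression expression:
                    provider.GenerateCodeFromExpression(expression, writer, options);
                    break;
                default:
                    throw new NotSupportedException("Cannot render " + code.GetType().FullName);
            }
            return writer.ToString();
        }
    }
}
```

Rendering the same statement twice, once with the default and once with "VisualBasic", is a quick way to check that a generator really is language independent.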

All of the code is doc-commented, so IntelliSense should tell you what each method does. Roughly 90% of it is shorthand wrappers. The other 10% is the Literal() serialization code and an internal helper for chaining binary operator expressions in series. I might add more functionality down the road, but for now I prefer to keep it simple and to the point, unlike the CodeDOM itself.
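A helper for binary operators in series can be as small as a left fold over the operands. This is a minimal, hypothetical sketch (SeriesSketch is not the library's internal helper, and I'm not claiming CodeDomUtility's own wrappers accept more than two operands): three operands combined with Add become ((a + b) + c).

```csharp
using System;
using System.CodeDom;

// Hypothetical sketch of chaining one binary operator across several operands.
public static class SeriesSketch
{
    public static CodeExpression Series(CodeBinaryOperatorType op, params CodeExpression[] operands)
    {
        if (operands == null || operands.Length == 0)
            throw new ArgumentException("At least one operand is required.", nameof(operands));
        // Fold left to right: ((operands[0] op operands[1]) op operands[2]) ...
        CodeExpression result = operands[0];
        for (int i = 1; i < operands.Length; i++)
            result = new CodeBinaryOperatorExpression(result, op, operands[i]);
        return result;
    }

    // Example wrappers built on top of the fold.
    public static CodeExpression Add(params CodeExpression[] operands)
        => Series(CodeBinaryOperatorType.Add, operands);

    public static CodeExpression And(params CodeExpression[] operands)
        => Series(CodeBinaryOperatorType.BooleanAnd, operands);
}
```

A single operand is returned unchanged, so callers don't need to special-case short lists.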

I haven't included demo code because this is basically a preview of code from an upcoming project I'm submitting, and writing even a demo code generator is a lot of work, helper classes or not.

Hopefully, between the doc comments and the article, you'll find this useful and usable. If not, wait for my upcoming article on How To Build A Lexer/Tokenizer Generator, which uses this class.

Points of Interest

The goal of making CodeDOM code look more like the code it generates wasn't originally part of this project. Only after I started using it did I realize this was one of the upshots, so I began coding in that direction. I've found it makes code generation a whole lot easier to debug. I've even built entire libraries in the CodeDOM in order to make their source available in any CodeDOM language, something I never would have attempted without this little tool.

History

27th November, 2019 - Initial submission