Click here to Skip to main content
15,867,568 members
Articles / Programming Languages

How I rolled my own simple (DSL) Domain specific Language and you can too!

Rate me:
Please Sign up or sign in to vote.
5.00/5 (4 votes)
7 Aug 2014CPOL7 min read 24.8K   11   10
Simple (DSL) Domain specific Languages

FYI I'm talking about this....

http://www.youtube.com/embed/ZfdAwV0HlEU

Preamble

So a project required me to take the following statement

"C#" or "Vb.net" and Not "Java"

and create a filter based on it that equates to the following logic.

if((file.Contains("C#")||file.Contains("Vb.net"))&&(!file.Contains("Java")))
{
  SearchResult.Add(file);
}

This is what we in the biz refer to as a Mini-Language. Essentially an especially small and (hopefully) easy-to-understand programming language that we use for a very specific purpose, a type of Domain Specific Language(DSL for short). To achieve this we turn to a process called Lexical analysis.

  1. Breaking down the expression into manageable parts. Referred to as the "Scanner" or "Tokenizer"; This is the first half of the lexical analysis process.
  2. Transforming those parts into a form can be evaluated by the system that needs to be interacted with a.k.a. a "Processed Value". (be it db access, programming logic, a modeling schema whatever) Referred to as the "Evaluator" or "Parser" This is the second half of the lexical analysis process.

Why This? Why Now?

There are numerous ways that this can be done (and I was back and forth on a couple) and this is not the first time I've had the pleasure of working through them however, its not terribly often so I'm writing this as a means of solidifying it into my memory or at least having something readily available so I don't "Forget to remember" how to do this sort of thing.

Stuff You should know

The programming logic doesn't contain anything that is overtly language specific(Though I do use linq at spots for filtering out results. This can just amount to looping through a series of value the filtered action based on a specific set of criteria). The most specialized tool in my arsenal for this project is Regular Expressions (an exceptionally powerful string search and evaluation DSL that IMO you should really have some familiarity with as a programmer as they have totally saved my ass a ton of extra time and work in many cases). If your existing language does not support them or something equally powerful you are gonna have a bad time.

The Plan

What will it look like?

Hacking away at this in Linqpad the following represents the way this would be envisioned 

void Main()
{
    //The user input expression
    var expression = "\"java\" and \"SQL\" and \"C#\" or \"VB.NET\""; 
    
    //The collection of tokens that we will use to evaluate the logic in the expression
    var tokens = Tokenize(expression);
    //tokens.Dump();
    
    //Our evaluated representation of the tokens; We can use this to 
    //convert the expression to whatever language we wish
    var ConditionNode = Eval(tokens);
    
    //Transforms our Nodes into our desired programming language
    var endResult = CreateQueryString(ConditionNode );
    
}

So that is our path to paradise.

The Tokenizer AKA Scanner's job is breaking down string expressions into manageable parts or "Tokens".

The Tokenizer AKA Scanner's  job is breaking down string expressions into manageable parts or "Tokens".

The Tokenizer

So begins the tokenization process. It amounts to taking our users expression

"C#" or "Vb.net" and Not "Java"

and splits it into pieces such that we end up with a collection of strings that looks like the following.

  1. C#
  2. or
  3. Vb.net
  4. and
  5. Not
  6. Java

If you still haven't learned Regular Expressions (seriously go freakin learn them), this is a summary or what it does (more or less).
Looks for an optional not followed by whitespace as a capture group to store the "not" then an expression starting with a double, containing any number of characters and and ending with a double quote " (this is our representation of a string) followed optionally by at least one character of whitespace and a logical operation that identifies the relationship of the condition to it sibling on the right if there is one. This will evaluate the entire length of the string and capture each fitting expression accordingly. See line 12.

public static List<string> Tokenize(string expression)
{
    //essentiall replaces double quotes with &quot; and apostrophys with "&apos;"
    //I prefer dealing with searching for &quot; then " as they tend to be a pain in the regex
    var tempExpression = System.Security.SecurityElement.Escape(expression);
    string doubleQuote = "&quot;";
    
    //looks for an optional not and whitespace then an expression starting with a quote 
    //and ending with a quote" our representation of a string
    //followed by whitespace and an optional logical operation that identifies the 
    //relationship of the condition to it sibling on the right if there is one
    var regEx = string.Format("(not\\s+)?{0}(.+?){0}(and|or)?",doubleQuote);
    Regex RE = new Regex(regEx);
    
    //splits the expression by each capture group and trims the result 
    var result = (RE.Split(tempExpression)).Select (r => r.Trim());
    
    //While there shouldnt be any additional whitespace in the expression 
    //i make a second pass at removing it and converting the expression to a list 
    //(so I can use linq on it later)
    return result.Where (re => !String.IsNullOrWhiteSpace(re)).ToList();
}

and with that we have our collection of tokens.

Make the required preparations. For we shall parse as dawn!

So before we can parse our tokens into something of interest we have to decide what data we need to retrieve and what will we need to do with it. Looking once again at our requirements and text

"C#" or "Vb.net" and Not "Java"
if((file.Contains("C#")||file.Contains("Vb.net"))&&(!file.Contains("Java")))
{
  SearchResult.Add(file);
}

So we established before that we were going to need a basic string to hold the content that we were filtering but we also have theNOT (!), OR (||) and AND (&&) to deal with. In computer programming as well as natural language (I.E. the English language in this case) and just about any other Syntax these are referred to as Logical Connectives or more commonly Logical Operators. Logical operators are what allow us communicate multiples of ideas without a lot of additional dialog (i.e. Compound Sentences). Without them the sentence

"Jack and Jill went up the hill."

Becomes

"Jack went up the hill."
"Jill went up the hill."

(UGH.... I think I'd go a bit crazy if people talked like that all the time.)

Logical Operators do a ton more than that but, for our requirements this should be enough to give you an idea of their significance at the core of most any and all forms of language, programming or otherwise.

So with that we have a class with which to parse.

To recap:This is what we have so far is our "condition". What gives our parser its reason for being.  

class Condition
- string expression
- bool HasNot
- LogicalOperator LogicalOperator

To elaborate on the "LogicalOperator" type is just an enumeration of all the possible Logical Operators we are concerned with and looks like this.

enum LogicalOperator
{
  None = 0,
  And = 1,
  Or = 2,
}

Now we can probably stop here and just make a collection of "Conditions" and maybe get away with it for now. However that would not accurately reflect the relationship between the items in the collection and will, more than likely, make implementing our parser (and thus maintenance it) much more difficult in the long run should we want to implement more language features (such as additional operators) to make our baby DSL into a strong and able bodied language one day. Because of that we are going to be building a Tree data structure using Nodes

Nodes .... NODES EVERYWHERE!!!!

For our purposes a node is a data structure that can have a concept of its relationship to its neighbor(s). A group of these nodes with relationships to each other become a tree. Now, as with most of the point of discussion, there are tons of other areas of study and topic of discussion in this area of computer science and I encourage you to continue forward in checking out all the neat stuff people do with nodes and trees but, like so many of the things covered in this post, I aim to keep it simple.

So with that I have here a simple node example in C#

public class Node
{
  public string Value{get;set;}
  public Node Right {get;set;}
}

At its heart that is pretty much it. The more interesting pieces or on how one goes about navigating these things. Not a lot too that either ... in C#

public static Node MoveRight(this Node node)
{
    return node.Right;
}

Note* that this does not account for Null nodes. It will break on a null node and throw a null ref exception. If you indented to navigate to the last node in a tree it would just be a matter of recursion. //goes to the last node in the tree

public static Node GetRightMostNode(Node node)
{
  while(node.Value!=null)
  {
    node = node.MoveRight();
  }
  return node;
}

Same thing if I wanted to get all the values in a tree

public static string GetValues(Node node)
{
  StringBuilder sb = new StringBuilder();
  while (node.Value!=null)
  {
    sb.Append(node.Value);
    node = node.Right;
}
  return sb.ToString();
}

Converting a collection of strings to nodes

public static void RecurseSet(this Node node, List<string> vals)
{
  Node tNode;
  for(var i=0;i&lt;vals.Count();i++)
  {
    tNode = new Node();
    tNode.Value = vals[i];
    node.Right = tNode;
    vals.Remove(vals[i]);
    RecurseSet(tNode,vals);
  }
}

Here is a link to the full example typed up in linqpad.

Taking it Home

Armed with additional node and tree xp lets go back and preform some "augmentations" to our condition class.

We now have...

public class Condition
{
  public int Order {get;set;}
  public string expression {get;set;}
  public LogicalOperator LogicalOperator {get;set;}
  public bool HasNot{get;set;}
  public Condition Right {get;set;}
}

Note* the order was used as a means of validating the order of the node tree from the token collection and is completely optional. The traversal methods are just about the same.

public static Condition MoveRight(this Condition node)
{
    return node.Right;
}

Finally we just need to Transform our tree into whatever form we require. In this case we will be using it to build out our C# expression's if statement .... just because.

public static string CreateCode(Condition node)
{
    StringBuilder sb = new StringBuilder();
    sb.Append("if(");
    while (node!=null)
    {
        if(node.HasNot)
        {
            sb.Append("!");
        }
        
        sb.Append("val.Contains==");

        sb.Append("\""+node.expression+"\" ");

        if(node.LogicalOperator!=LogicalOperator.None)
        {
            sb.Append(GetLogicOperationSymbol(node.LogicalOperator) +" ");
        }
        node = node.Right;

    }
    sb.Append(")");
    sb.AppendLine("");
    sb.AppendLine("{");
    sb.AppendLine("//do stuff");
    sb.AppendLine("}");

    return sb.ToString();
}

public static string GetLogicOperationSymbol(LogicalOperator op)
{
    var retval= "";
    switch (op)
    {
        case LogicalOperator.And:
            retval= "&&";
            break;
        case LogicalOperator.Or:
            retval= "||";
            break;
        case LogicalOperator.Not:
            retval= "!";
            break;
        default:
            break;
    }
    return retval;
}

And that ladies and laddies is but one way of building out your own itty-bitty baby DSL.

Additional things I'd like to point out.

  • Instead of a condition class the entire class could be split into node tokens with even simpler parts. This implementation works for me but you are more than free to experiment with more tree node goodness.
  • There are tools out there such as that preform many of the parsing and tokenizing related tasks mentioned in this post with ease and then some now that you have a handle on a hand rolled implementation.

I first learned about Mini-Languages, DSLs,among a ton of other awesome stuff reading The Art Of Unix Programming. A freely available book by Eric Steven Raymond(one of the godfathers of the open-source movement).

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
CEO Hacker Ferret Software
United States United States
Owner at Hacker Ferret Software
Software Consultant with over 8 years in the trenches.

Specialized in .Net, javascript, and Android development. But adaptable enough for whatever you can dish out. I have a spiritual neck-beard just not a physical one.

For more info about me check out Hacker Ferret Software where we focus on hacking together your software with love.

We now offer a Free 30 Minute Consultation

Comments and Discussions

 
QuestionIrony.Net is also a good alternative Pin
Antonino Porcino8-Aug-14 20:55
Antonino Porcino8-Aug-14 20:55 
QuestionOrder of Evaluation Pin
PeejayAdams8-Aug-14 3:09
PeejayAdams8-Aug-14 3:09 
AnswerRe: Order of Evaluation Pin
musicm1228-Aug-14 4:14
professionalmusicm1228-Aug-14 4:14 
GeneralRe: Order of Evaluation Pin
PeejayAdams8-Aug-14 5:06
PeejayAdams8-Aug-14 5:06 
GeneralRe: Order of Evaluation Pin
musicm1228-Aug-14 5:22
professionalmusicm1228-Aug-14 5:22 
GeneralMy vote of 5 Pin
Vercas7-Aug-14 23:22
Vercas7-Aug-14 23:22 
GeneralRe: My vote of 5 Pin
musicm1228-Aug-14 4:24
professionalmusicm1228-Aug-14 4:24 
GeneralMy vote of 5 Pin
Volynsky Alex7-Aug-14 10:00
professionalVolynsky Alex7-Aug-14 10:00 
GeneralRe: My vote of 5 Pin
musicm1228-Aug-14 6:21
professionalmusicm1228-Aug-14 6:21 
Thank you much!

GeneralRe: My vote of 5 Pin
Volynsky Alex8-Aug-14 7:46
professionalVolynsky Alex8-Aug-14 7:46 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.