Click here to Skip to main content
15,890,527 members
Articles / Programming Languages / ASM

Anonymus, a New Optimal Language

Rate me:
Please Sign up or sign in to vote.
4.92/5 (129 votes)
6 Mar 2012GPL312 min read 380.9K   2.7K   153  
A new general purpose language that makes programming easier
This is an old version of the currently published article.

Facebook page

Introduction

I developed this language because I don't find the other languages perfect for the things that I want to do. My problem with C++ is that it's slow to develop in it, and doesn't have such features that C# has, but .NET languages are not native and use CIL. This doesn't allow to use standard x86 assembly and maybe also slower.

I have been working on developing Anonymus since March 2010 in C#. Its performance is competitive with C++ while it will be a high level language. The syntax and libraries are similar to C#, C++, so I don't think it would be hard to learn, but I also change syntax if I think it could be better.

Creating Programs with Anonymus

Some functions are used from BlitzMax modules, because I haven't implemented all of the graphical libraries yet. Some of the sample programs use C++ to compare the performance. To run these, you'll need to install MinGW. To compile the programs written in Anonymus, you should run the “As.exe” which is in the “Binaries” directory. In the command line, you can use options and the others will be the input files. They can be Anonymus files, C, C++, Object files, etc. or libraries which start with "-l". For example:

As.exe -x -lBlitzMax -lAnonymus -format app -out Something.exe

The Anonymus library should be always used. The -x option means that the compiler should run the output file, in this case "Something.exe” file. The output format can be app (.exe), obj (.o), or arc (.a).

1st Sample Program: Squares

This sample draws fading squares. When the mouse is over the square, it appears. I have created some constants and an array that contains the fading rate. The Width and Height variables contain how many squares should be in the graphics window:

C++
using System
using BlitzMax

const var SquareSize = 32,
          Width = 24,
          Height = 20

float[Width, Height] Array

The type doesn't have to be specified, use can use var unlike C# where you can't use var for globals and class members and only one declaration is allowed.

There are two helper functions. The first decreases a value, the second returns true if the mouse is in the specified area:

C++
float Decrease(float Val)
    return if Val < 0.02: 0 else Val - 0.02

bool InArea(int x, y, w, h)
    return x <= MouseX() < x + w and y <= MouseY() < y + h

The Update function updates the squares and draws them. If the mouse is on the square, it draws a white square and sets the fading rate to 1 which means it's fully visible, otherwise it draws with a color based on square's position:

C++
void Update()
    for var x, y in 0 .. Width, 0 .. Height
        var Value = Array[x, y]
        var XPos = x * SquareSize
        var YPos = y * SquareSize

        if InArea(XPos, YPos, SquareSize, SquareSize)
            Value = 1
            SetColor 255, 255, 255
            DrawRect XPos, YPos, SquareSize, SquareSize

        else if Value > 0
            var Color = 255 * Value
            var Red = (int)(((float)x / Width) * Color)
            var Greed = (int)(((1 - (float)y / Height)) * Color)
            var Blue = (int)(((float)y / Height) * Color)

            SetColor Red, Greed, Blue
            DrawRect XPos, YPos, SquareSize, SquareSize

        Array[x, y] = Decrease(Value)

The Reset function sets the array to zero, because it can contain any data. The Main function creates a graphics window, and updates it until the user hits the escape key or clicks the exit button:

C++
void Reset()
    for var x, y in 0 .. Width, 0 .. Height
        Array[x, y] = 0

public stdcall void Main()
    Reset
    Graphics SquareSize * Width, SquareSize * Height

    while !KeyHit(Keys.Escape) and !AppTerminate()
        Cls
        Update
        Flip

The for until loop is different from for to loop. It doesn't reach the value at the right side, so subtracting one can be saved in some cases.

2ns Sample Program: Circles

In this sample, I calculate the color of each pixel. This sample's purposes is to compare the performance with C++. It calculates the distance from the middle point of the window and it takes the sine and cosine of it. To make it faster, it creates a smaller image that it draws to, and then it's scaled up.

I made some helper functions. The ArgbColor function combines the four components of a color to a single number. The GetDistance function simply calculates a distance, and the Wrap function cuts down parts of a integer that cannot fit in a byte.

The UpdateImagefunction does the main task of the program. It works as I wrote it in the beginning of the sample.

C++
using System
using BlitzMax

int ArgbColor(byte a, r, g, b)
    return (int)a << 24 | (int)r << 16 | (int)g << 8 | b

double GetDistance(double x1, y1, x2, y2)
    var x = x2 - x1, y = y2 - y1
    return Math.Sqrt(x * x + y * y)

byte Wrap(int x)
    if x < 0: x = 0
    if x > 255: x = 255
    return (byte)x

void UpdateImage(IntPtr Image)
    var Pixmap = LockImage(Image, 0, true, true)
    var Width = ImageWidth(Image)
    var Height = ImageHeight(Image)
    var Time = MilliSecs()

    for var x = 0 until Width
        for var y = 0 until Height
            var RelX = (float)x / Width
            var RelY = (float)y / Height
            var Distance = GetDistance(RelX, RelY, 0.5, 0.5) * 3 - (float)Time / 1000

            var Light = (int)((RelY * 100) * Math.Abs(Math.Sin(Distance / 1.5)))
            var Red = (int)((RelX * 255) * Math.Abs(Math.Cos(Distance))) + Light
            var Green = (int)(((1 - RelY) * 255) * Math.Abs(Math.Sin(Distance))) + Light
            var Blue = (int)((RelY * 255) * Math.Abs(Math.Cos(Distance / 3))) + Light

            var Color = ArgbColor(255, Wrap(Red), Wrap(Green), Wrap(Blue))
            WritePixel Pixmap, x, y, Color

    UnlockImage Pixmap, 0

public stdcall void Main()
    Graphics 1024, 720
    var Image = CreateImage(512, 360, 1, -1)

    while not KeyHit(Keys.Escape) and not AppTerminate()
        Cls
        UpdateImage Image
        DrawImageRect Image, 0, 0, 1024, 720, 0
        DrawFrameStats
        Flip

Here are the performance results (frames per second) on my Intel Core 2 Dou 2.2 Ghz processor:

Image SizeGNU C++ compiler Clang C++ compiler (LLVM)Anonymus
512x3608912
256x180313547

Analyzing the Created Assembly Code

The ArgbColor Function

The ArgbColor function takes 4 byte parameters. These are passed in the al, ah, dl, dh registers, in that order. al is the first byte, ah is the second byte of the eax register. The result will be stored in the eax register too.

The ah register will be overwritten, so it copies it to the cl register. Then it clears the high 3 bytes of the eax register by doing a bitwise and on them and shifts the last byte from blue color place to the alpha. It does almost the same with the others.

ASM
_ArgbColor:
    mov cl, ah
    and eax, 0xFF
    shl eax, 24
    and ecx, 0xFF
    shl ecx, 16
    or eax, ecx
    movzx ecx, dl
    shl ecx, 8
    or eax, ecx
    movzx ecx, dh
    or eax, ecx
    ret

The GetDistance Function

The float numbers can be only stored in memory with FPU, so the parameters will be on the stack. The return value will be on the FPU stack. Anonymus doesn't save the stack pointer to ebp register, it rather uses it to store data, but it makes assembly less readable, because you may don't know the variables' positions. In this function, the parameters starts at esp + 4, because the caller's eip is stored in the stack. It doesn't need to subtract 16 from esp, because no functions are called by GetDistance function.

ASM
_GetDistance:
    fld qword[esp + 20]
    fsub qword[esp + 4]
    fstp qword[esp - 16]
    fld qword[esp + 28]
    fsub qword[esp + 12]
    fstp qword[esp - 8]
    fld qword[esp - 16]
    fmul qword[esp - 16]
    fld qword[esp - 8]
    fmul qword[esp - 8]
    faddp
    fsqrt
    ret 32

The Wrap Function

This is the most simple function. It uses conditional moves. The disadvantage of this is that the cmov instruction's second parameter must be a memory position, and reading memory is slow. But I read the conditional jumps are also slow if we do it a lot because of branch prediction.

ASM
_Wrap:
    cmp eax, 0
    cmovl eax, dword[_11]
    cmp eax, 255
    cmovg eax, dword[_18]
    ret

The Other Functions

I think describing them doesn't make much sense. The only thing I would mention is that in the future, I will add function inlining to improve performance. I suppose G++ does the same.

3rd Sample: Fire

This sample increases values based on the distance to the mouse. These values are stored in a global array called Array. The colors assigned to the them is stored in the Colors array. It uses tuples extensively to reduce code size.

The sample uses some of functions from the previous sample. The GetDistance function is vectorized, it takes two tuples as parameters. The GetValue function returns the value of Array safely, if the parameters are outside of ImageSize, it returns 0.

The Update functions firstly blurs the image by getting the avarage of nearby pixels. It divides with 7.1 instead of 7 to make it fade out. Then it increases the values in Array based on distance to the mouse. UpdateImage function updates the image which will be scaled up to the whole window.

C#
using System
using BlitzMax

const var
    WindowSize = (1024, 720),
    ImageSize = (256, 180),
    RectSize = 35,
    ColorNumber = 768
          
float[ImageSize.0, ImageSize.1] Array
int[ColorNumber] Colors

int ArgbColor(byte a, r, g, b)
    return (int)a << 24 | (int)r << 16 | (int)g << 8 | b

float GetDistance((float, float) P1, P2)
    var x = P1.0 - P2.0, y = P1.1 - P2.1
    return (float)Math.Sqrt(x * x + y * y)
      
byte Wrap(int x)
    if x < 0: x = 0
    if x > 255: x = 255
    return (byte)x

float GetValue(int x, y)
    if 0 < x < ImageSize.0 and 0 < y < ImageSize.1
        return Array[x, y]
        
    return 0f

void Update()
    for var x, y in 0 .. ImageSize.0, 0 .. ImageSize.1
        var Value = Array[x, y] * 5
        Value += GetValue(x + 1, y + 1) * 0.5
        Value += GetValue(x - 1, y + 1) * 0.75
        Value += GetValue(x - 2, y + 2) * 0.5
        Value += GetValue(x - 3, y + 3) * 0.25
        Array[x, y] = Value / 7.1

    var Mouse = ((float)MouseX(), (float)MouseY()) * ImageSize / WindowSize
    var P1 = (int, int)Mouse - RectSize, P2 = (int, int)Mouse + RectSize
                
    if P1.0 < 0: P1.0 = 0
    if P1.1 < 0: P1.1 = 0
    if P2.0 >= ImageSize.0: P2.0 = ImageSize.0 - 1
    if P2.1 >= ImageSize.1: P2.1 = ImageSize.1 - 1

    for var x, y in P1.0 ... P2.0, P1.1 ... P2.1
        var Distance = 1 - GetDistance(Mouse, (x, y)) / RectSize
        if Distance > 0f
            var NewVal = Array[x, y] + Distance / 10f
            if NewVal > 1f: NewVal = 1f
            Array[x, y] = NewVal

void UpdateImage(IntPtr Image)
    var Pixmap = LockImage(Image, 0, true, true)
    for var x, y in 0 .. ImageSize.0, 0 .. ImageSize.1
        var Color = Colors[(int)(Array[x, y] * (ColorNumber - 1))]
        WritePixel Pixmap, x, y, Color
    
void Initialize()
    for var x, y in 0 .. ImageSize.0, 0 .. ImageSize.1
        Array[x, y] = 0f
        
    for var i in 0 .. ColorNumber
        var Value = (float)i / (ColorNumber - 1)
        var Red = (int)((Math.Min(Value, 1f / 3)) * 3 * 255)
        var Green = (int)((Math.Min(Value, 1.65f / 3) - 0.65f / 3) * 3 * 255)
        var Blue = (int)((Math.Min(Value, 1f) - 2f / 3) * 3 * 255)
        Colors[i] = ArgbColor(255, Wrap(Red), Wrap(Green), Wrap(Blue))
 
public stdcall void Main()
    Initialize
    Graphics WindowSize
    
    var Image = CreateImage(ImageSize, 1, -1)
    while not KeyHit(Keys.Escape) and not AppTerminate()
        Cls
        Update
        UpdateImage Image
        DrawImageRect Image, (0, 0), WindowSize, 0
        DrawFrameStats
        Flip

Some Basic Syntax

Literals

The type of integer literals can be specified by a suffix that is a lower case short form of the type name. The hexadecimal numbers have to be upper case in order to distinguish them:

C#
$FFb        '' An unsigned byte
-$1Asb        '' A signed byte
%100        '' A binary number

For loops

C++
for var x, y in 0 ... Width, 0 ... Height

C# equivalent:

C#
for (var x = 0; x <= Width; x++)
    for (var y = 0; y <= Height; y++)

In certain cases you don't want to reach the value at right side. This can be indicated by 2 dots instead of 3:

C++
for var x, y in 0 .. Width, 0 .. Height

C# equivalent:

C#
for (var x = 0; x < Width; x++)
    for (var y = 0; y < Height; y++)

Structures and Enumerations

Basically, these are the same as C#:

C++
struct Rect
    public float X, Y, Width, Height

    '' Constructor: This is called when a Rect is created
    public Rect(float X, Y, Width, Height)
        this.X = X
        this.Y = Y
        this.Width = Width
        this.Height = Height

    public Rect Copy()
        return new Rect(X, Y, Width, Height)

    public float Area:
        get return Width * Height

    public float GetValue(int x)
        switch x
            case 0: return X
            case 1: return Y
            case 2: return Width
            case 3: return Height
            default return 0

static class Console
    public static void PrintInt(int x)
    public static void PrintLong(long x)
    public static void PrintDouble(double x)

enum Keys
    A = 65, B, C, D, E, F, G, H, I, J, K, L, M, N, O
    P, Q, R, S, T, U, V, W, X, Y, Z,
    F1 = 112, F2, F3, F4, F5, F6, F7, F8, F9, F10, F11, F12

flag Modifiers
    None = 0
    Shift = 1, Control, Option, System
    Alt = Option, Menu = Option, Apple = System, Windows = System

For the flag type, the automatic value is always multiplied by 2 unlike C#. It adds 1 even if it's declared with Flags attribute. The switch command doesn't need case statements to jump out from the switch.

Named and Optional Parameters

This is also the same in C#. This is how BlitzMax.Grapics function looks in Anonymus:

C#
public extern cdecl asmname("_brl_graphics_Graphics")
    IntPtr Graphics(int Width, Height, Depth = 0, Hertz = 60, Flags = 0)

To call it, it's possible to miss the last parameters, or by specifying each parameter individually:

C#
Graphics 800, 600
Graphics Width: 800, Height: 600
Graphics 800, 600, Hertz: 75

Tuples

The tuples allow storing multiple values in a variable with different types. You can write the multiplication of vectors in Anonymus language like this:

C++
type float3 = (float x, y, z)

float3 Cross(float3 a, b)
    return a.y * b.z – a.z * b.y,
           a.z * b.x – a.x * b.z,
           a.x * b.y – a.y * b.x

If I don’t define the float3 type or name the members, they can be identified by numbers:

C++
float, float, float Cross((float, float, float) a, b)
    return a.1 * b.2 – a.2 * b.1,
           a.2 * b.0 – a.0 * b.2,
           a.0 * b.1 – a.1 * b.0

For the simple vector operations (addition, subtraction), you don’t have to call functions, it's enough to write a + b.
If I want to manage the members of the tuple, it's enough to separate the components by commas at the left of the assignment:

C++
float x, y, z
x, y, z = Cross(a, b)

It’s also possible to declare and give values to the variable, but in this case you have to write out the types to be unambiguous:

C++
(var x, var y, var z) = Cross(a, b)

Swapping of Variables

It swaps two or more data simple without used temporary variable:

C++
a, b = b, a

Without this, the only way is to declare new variables or create a function that does the swapping, but this is also not perfect because the parameter can be only type. In order to be able to output values in parameters, I use the ref types. When you pass a parameter with ref, you pass the address of the variable, not its value:

C++
void SwapInts(ref int a, b)
    var c = a
    a = b
    b = c

'' Usage:
SwapInts ref a, ref b

In C, pointers are also needed:

C++
void SwapInts(int* a, int* b)
{
    int c = *a;
    *a = *b;
    *b = c;
}

// Usage:
SwapInts(&a, &b); 

Function Return Value

Returning with a variable that is created at the beginning of a function is a common case. This function calculates the length of a zero-terminated string where the return command is not necessary:

C++
public static UInt_PtrSize StringLength(char* Str)
    : Ret = 0
    while Str[Ret] != '~0'
        Ret++

It's possible to create multiple variables, in this case the return type will be a tuple.

Referencing Function Written in Other Languages

You can reference them with the extern modifier. The asmname enables you to specify the name that this identifier will have in the compiled code. You can also give the calling convention of the function. Currently, my language supports cdecl, stdcall, and ascall:

C++
public extern cdecl asmname("_brl_polledinput_KeyHit") bool KeyHit(Keys key)

In C, there wouldn't be asmname setting, you need to use it with the full name or declare a macro:

C++
void brl_polledinput_KeyHit(uint KeyCode);
#define KeyHit brl_polledinput_KeyHit

The default function call is ascall that uses eax, edx and ecx for parameters, but passes pointers, classes, etc. first, then the numbers (except the floating point numbers because it's not possible to calculate if they are in general registers). It can also handle high and low byte of the registers that have that variant to store two data in the same register.

Including Assembly in Source Code

Here is an example (System.Math.Truncate from Anonymus library):

C#
public static asm double Truncate(double x)
    fldcw $[_x86Helper.TruncateFPUControlWord]
    fld qword[esp + 4]
    frndint
    fldcw $[_x86Helper.DefaultFPUControlWord]
    ret 8

As you can see, it's possible to use globals with $[ ]. For function parameters direct location is needed, if I implement it, the code would be less optimal, because not all of the instructions' operand data types are allowed.

Overview of Anonymus's Structure

Identifiers

These can be types, variable or functions. All variables and functions have types. The latter is determined by the return value and the parameter’s type. The variables are also divided into ground. They can be local, global or member of a structure.

Modifiers

They can be found before the declaration (e.g., cdecl, asmname, etc.). They are isolated and stored in a list that will receive a function that does the declaration. For example, if a variable is marked with static, then it will create a global identifier.

Expressions

These are parsed by different recognizers. It’s possible to create a new or replace one without a large modification in the language. An expression recognizer can recognize more operations. E.g., The Multiplicative recognizes multiplication, division and modulus.
If it succeeded, then it should call the plugin that the recognizer received by parameter.

Plugins

The plugins can modify the recognized expression. The MultiPlugin class manages multiple plugin joint work. The type manager plugin’s task is to determine the type of the expressions and do the conversions. The calculator plugin calculates a value if we want to operate with constants.

If the expression recognizer wants to declare a variable, it should call a plugin’s function. If it can’t do this, it gives an error message. The multi plugin redirects the declaration to the compiler plugin, if it exists.

Preprocessor

It processes all statements starting with #. You can create a macro or if a condition is true, run another code. There’s a preprocessor plugin that replaces the expressions created by call recognizer to the given expression:

C++
#define min(x, y) if x < y then x else y
#define max(x, y) if x > y then x else y

int Something(int i, j)
    return max(i, j) / min(i, j)

The preprocessor produces this from it:

C++
int Something(int i, j)
    return (if i > j: i else j) / (if i < j: i else j) 

Scopes

In the Anonymus language, statements that have the same number of whitespace characters belong to a scope. First the compiler creates a global scope, then calls the preprocessor, processes the types and then the functions and variables. These are processed multi-threaded.

In the code, scopes can contain commands that can also contain things. E.g., an if command has an expression which is the condition and one or more code scopes. A function scope is the outermost scope in a function that is also a code scope.

Architectures

It’s possible to do more architecture that create assembly code. If necessary, it can use a plugin or attach data to identifiers, identifier containers. Currently the x86 architecture is implemented.

A Basic Insight into Code Parsing

The language consists of recognizers, and supporting new languages can be done by replacing recognizers. The functions, types, variables will be available between Anonymus and the new languages. I don't use parser generators, because I think using other software could prevent from implementing some of the functionalities or I would have to do things in a different way.

The CompilerState class has a Language object that contains all recognizers. Here is the Language class:

C#
public abstract class Language : LanguageNode
{
    public IList<IExprRecognizer> ExprRecognizers;
    public IList<IIdRecognizer> IdRecognizers;
    public IList<ICommRecognizer> CommRecognizers;
    public IList<IModRecognizer> ModRecognizers;
    public IList<INameGenerator> NameGenerators;
    public IList<IResultSkippingHandler> GlobalHandlers;

    public IArgRecognizer ArgRecognizer;
    public IGenericRecognizer GenericRecognizer;
    public IGroupRecognizer GroupRecognizer;

    public IRetValLessRecognizer RetValLessRecognizer;
    public IInnerScopeRecognizer InnerScopeRecognizer;
    public IVarDeclRecognizer VarDeclRecognizer;
    public ITypeDeclRecognizer TypeDeclRecognizer;
    public IConstDeclRecognizer ConstDeclRecognizer;
    public INamespaceDeclRecognizer NamespaceDeclRecognizer;
    public IDeclarationRecognizer DeclarationRecognizer;

    public IGlobalScopeProcessor GlobalScopeProcessor;
    public ICodeFileProcessor CodeFileProcessor;
    public ICodeProcessor CodeProcessor;
    ...
}

String Handling

Anonymus uses StringRange to store strings to avoid a new string allocation when it calls string.Substring. This structure also has functions for bracket handling and other operations. The CodeString structure consists of a StringRange and a reference to CodeFile class that can also manage lines and indents of them. CodeString has a Line field and when its substring is needed, it updates the Line member. Most of the CodeString's methods are just wrapper around StringRange's functions.

The ISkippingHandler interface is used to skip results that are in a string constant or bracket. The recognizers can store data in ResultSkippingManager.Datas, if it's needed. In some cases, it's necessary to make sure that the brackets are not skipped. A bracket recognizer shouldn't skip brackets if ResultSkippingManager.DoNotSkipBrackets is true.

C#
public class BracketRecognizer : LanguageNode, IExprRecognizer,
                IIdRecognizer, IResultSkippingHandler
{
    ...

    public int SkipResult(ref ResultSkippingManager RSM)
    {
        if (!RSM.DoNotSkipBrackets && Helper.GetBracket(RSM.CurrentChar, RSM.Back))
        {
            var Handlers = RSM.SkippingHandlers;
            if (RSM.Back)
            {
                var NewString = RSM.String.Substring(0, RSM.Current + 1);
                var Pos = NewString.GetBracketPos(true, Handlers);
                if (Pos != -1) return Pos;
            }
            else
            {
                var NewString = RSM.String.Substring(RSM.Current);
                var Pos = NewString.GetBracketPos(false, Handlers);
                if (Pos != -1) return Pos + RSM.Current;
            }
        }

        return -1;
    }
}

So this is the BracketRecognizer class. It has to return an index where the string search should be continued or -1. The GetBracketPos method returns the position of the bracket's pair which the string ends or starts with. If the first parameter is false, the string has to begin with a bracket that looks to the right.

LanguageNode Class and Recognizers

The LanguageNode class contains an array of strings called Operators that it recognizes and another array of strings (LanguageNode.Skip) that it should skip, because the operators, that it has to recognise, is the part of them. E.g., an addition recognizer has +, - operators then the skip array should contain ++, --. The NewLine strings contain which operators need the previous or the next line. E.g., if the line ends with +, then it will continue with the next:

C#
public abstract class LanguageNode
{
    public DataList Data = new DataList();
    public Language Root;
    public LanguageNode Parent;
    public IList<LanguageNode> Children;
    public IList<IResultSkippingHandler> SkippingHandlers;

    public string[] Operators;
    public string[] SkipFind;
    public string[] Skip;

    public string[] NewLineLeft;
    public string[] NewLineRight;
    public string[] OnlyLeft;
    public string[] OnlyRight;

    ...
}

Recognizations are done in a recursive way. I think it's simple but maybe not the fastest way to parse a code file. To decrease the time spent on string operations, I use my own functions which are optimized for these things and significantly faster, although far more time spent on code generation. Currently my compiler can process 10 000 lines of code in about 1 second.

An expression recognizer should inherit from this class and IExprRecognizer interface. Recognizers can use Find2 function to determine which operators aren't unary. If the left side of the result ends with an operator or there's nothing, it will ignore that result. When it's not enough to use the LanguageNode.Operators, for example the cast recognizer needs to implement the IFind2Handler interface. The following code is the addition recognizer:

C#
public class AdditiveRecognizer : LanguageNode, IExprRecognizer
{
    public AdditiveRecognizer()
    {
        Operators = new string[] { "+", "-" };
        NewLineLeft = NewLineRight = Operators;
    }

    public ExprRecognizerRes Recognize
        (CodeString Code, ExprPlugin Plugin, ref ExpressionNode Ret)
    {
        var Result = ExprRecognizerHelper.Find2(this, Plugin.Container, Code.String);
        if (Result.Position != -1)
        {
            var Ch = ExprRecognizerHelper.TwoParamOpNode(Code, Result, Plugin);
            if (Ch == null) return ExprRecognizerRes.Failed;

            Operator Op;
            if (Result.Index == 0) Op = Operator.Add;
            else if (Result.Index == 1) Op = Operator.Subract;
            else throw new ApplicationException();

            Ret = new OpExpressionNode(Op, Ch, Code);
            return ExprRecognizerRes.Succeeded;
        }

        return ExprRecognizerRes.Unknown;
    }
}

This still wouldn't be compatible with the NumberRecognizer, because floating point numbers can have e notation (2e+2 = 2 * 102). To solve this problem, the NumberRecognizer adds a result skipper to LanguageNode.SkippingHandlers. ExprRecognizerHelper.Find2 takes a LanguageNode parameter first, which is used to get the skipping handlers from.

The Code Generator

This is most complex part of Anonymus. It is about the third of the whole project. These are major steps it does to compile a function:

  1. Processes expressions
  2. Calculates the expression result storage locations
  3. Calculates the locals' locations
  4. Checks whether there is a need for temporary register (in x86 assembly most instructions can only have one memory operand, some must be either memory location or general register, etc.)
  5. If a temporary register needed restarts the data location allocation, allocates the temporary registers and continues at 2nd point.
  6. Generates assembly (without jumps)
  7. Jump optimizations
  8. Jump replacements

The Jump Optimizations and Replacements

C#
int Fib(int x)
    if x < 2: return 1
    else return Fib(x - 1) + Fib(x - 2)

This function would have the following assembly without jump optimizations:

ASM
_Fib:
    push ebx
    push ebp
    mov ebp, eax
    cmp ebp, 2
    jge _26
    mov eax, 1
    jmp _2
    jmp _24
_26:
    mov eax, ebp
    dec eax
    call _Fib
    mov ebx, eax
    mov eax, ebp
    sub eax, 2
    call _Fib
    add eax, ebx
    jmp _2
_24:
_2:
    pop ebp
    pop ebx
    ret

The label _26 and _24 are for the condition and the label _2 is for function return. There are some unnecessary jumps and labels that should be deleted:

ASM
_Fib:
    push ebx
    push ebp
    mov ebp, eax
    cmp ebp, 2
    jge _26
    mov eax, 1
    jmp _2
_26:
    mov eax, ebp
    dec eax
    call _Fib
    mov ebx, eax
    mov eax, ebp
    sub eax, 2
    call _Fib
    add eax, ebx
_2:
    pop ebp
    pop ebx
    ret

It could save a jump if it does the same as where it jumps to. Unless it is too long, it replaces the jump:

ASM
_Fib:
    push ebx
    push ebp
    mov ebp, eax
    cmp ebp, 2
    jge _26
    mov eax, 1
    pop ebp
    pop ebx
    ret
_26:
    mov eax, ebp
    dec eax
    call _Fib
    mov ebx, eax
    mov eax, ebp
    sub eax, 2
    call _Fib
    add eax, ebx
    pop ebp
    pop ebx
    ret

Temporary Register Handling

The IsEqual function needs 3 temporary register to fulfil that the mov instruction can only have one memory operand, and the memory location can't have a memory address:

C#
long* g_Pointers
int g_Index
long g_Value

bool IsEqual()
    return g_Value == g_Pointers[g_Index]


_IsEqual:
    mov eax, dword[_g_Pointers]
    mov edx, dword[_g_Index]
    mov ecx, dword[eax + edx * 8]
    cmp dword[_g_Value], ecx
    jne _27
    mov ecx, dword[eax + edx * 8 + 4]
    cmp dword[_g_Value + 4], ecx
    jne _27
    mov al, 1
    ret
_27:
    xor al, al
    ret

Condition Assembly Generation

This is the main function for condition assembly generation (Anonymus.CodeGenerator.GetConditionCode):

C#
void GetConditionCode(ExpressionNode Node, int Then, int Else, bool ElseAfterCondition)

The Then and Else are the index of labels. If ElseAfterCondition is

  • True, then the condition is followed by the else scope and the jump's condition is the same as the operator and it jumps to the Then label
  • False, then the condition is followed by the then scope and the jump's condition is negated and it jumps to the Else label
Here is an example:

C++
void WhileTest(bool Loop)
    var a = 0
    while Loop
        a++

void DoWhileTest(bool Loop)
    var a = 0

    do
        a++
    while Loop
    
    
; This is the assembly without jump replacements
_WhileTest:
    xor edx, edx
_1:
    test al, al
    jz _13
_72:
    inc edx
    jmp _1
_13:
    ret

; After jump replacements
_WhileTest:
    xor edx, edx
    test al, al
    jz _13
_72:
    inc edx
    test al, al
    jnz _72
_13:
    ret

_DoWhileTest:
    xor edx, edx
_76:
    inc edx
    test al, al
    jnz _76
    ret

In the WhileTest function, the condition is followed by the then scope, ElseAfterCondition is false. In the DoWhileTest it's followed by the return command (which happens if the condition is false), the ElseAfterCondition is true.

This is how the and and or operators work:

C++
void While_OrTest(bool x, y, z)
    var a = 0
    while x or y or z
        a++

void DoWhile_OrTest(bool x, y, z)
    var a = 0

    do
        a++
    while x or y or z

void While_AndTest(bool x, y, z)
    var a = 0
    while x and y and z
        a++

void DoWhile_AndTest(bool x, y, z)
    var a = 0

    do
        a++
    while x and y and z
    
    
_While_OrTest:
    xor ecx, ecx
_45:
    test al, al
    jnz _44
    test ah, ah
    jnz _44
    test dl, dl
    jz _7
_44:
    inc ecx
    jmp _45
_7:
    ret

_DoWhile_OrTest:
    xor ecx, ecx
_50:
    inc ecx
    test al, al
    jnz _50
    test ah, ah
    jnz _50
    test dl, dl
    jnz _50
    ret

_While_AndTest:
    xor ecx, ecx
_57:
    test al, al
    jz _9
    test ah, ah
    jz _9
    test dl, dl
    jz _9
    inc ecx
    jmp _57
_9:
    ret

_DoWhile_AndTest:
    xor ecx, ecx
_62:
    inc ecx
    test al, al
    jz _10
    test ah, ah
    jz _10
    test dl, dl
    jnz _62
_10:
    ret

In all the while and do-while tests, the last condition remains the same, the ElseAfterCondition is only changed for the first two conditions. For the and conditions, the ElseAfterCondition is false, for the or conditions it's true.

Here is an example that contains both operators:

C#
void While_AndOrTest(bool x, y, z, w)
    var a = 0
    while (x or y) and (z or w)
        a++

void DoWhile_AndOrTest(bool x, y, z, w)
    var a = 0

    do
        a++
    while (x or y) and (z or w)
    
    
_While_AndOrTest:
    xor ecx, ecx
_71:
    test al, al
    jnz _72
    test ah, ah
    jz _11
_72:
    test dl, dl
    jnz _70
    test dh, dh
    jz _11
_70:
    inc ecx
    jmp _71
_11:
    ret

_DoWhile_AndOrTest:
    xor ecx, ecx
_77:
    inc ecx
    test al, al
    jnz _79
    test ah, ah
    jz _12
_79:
    test dl, dl
    jnz _77
    test dh, dh
    jnz _77
_12:
    ret

Plans and Ideas

Currently the most important plan is to make the language stable and usable. In this version, objects can be created, but inheritance with properties doesn't work, and there's no gc yet. I hope I will be able to implement a proper string class within 1-2 months. This takes much time, because I want to program all libraries in Anonymus, so I have a lot of things to do among operator functions.

Inlined Functions Inside Expressions

This could help to create IEnumerable<T> instances. I would create a recognizer to find commands, and then it would create an inline lambda like function and call it. C# doesn't support even the yield keyword in lamdas.

C++
void Something(int* pInt, int Count)
    AnotherFunction for var i = 0 until Count
                        yield pInt[i]

void AnotherFunction(IEnumerable(:int) list)

License

This article, along with any associated source code and files, is licensed under The GNU General Public License (GPLv3)


Written By
Software Developer
Hungary Hungary
This member has not yet provided a Biography. Assume it's interesting and varied, and probably something to do with programming.

Comments and Discussions

Discussions on this specific version of this article. Add your comments on how to improve this article here. These comments will not be visible on the final published version of this article.