Introduction
I developed this language because I don't find the other languages perfect for the things that I want to do. My problem with C++ is that it's slow to develop in it, and doesn't have such features that C# has, but .NET languages are not native and use CIL. This doesn't allow to use standard x86 assembly and maybe also slower.
I have been working on developing Anonymus since March 2010 in C#. Its performance is competitive with C++ while it will be a high level language. The syntax and libraries are similar to C#, C++, so I don't think it would be hard to learn, but I also change syntax if I think it could be better.
Creating Programs with Anonymus
Some functions are used from BlitzMax modules, because I haven't implemented all of the graphical libraries yet. Some of the sample programs use C++ to compare the performance. To run these, you'll need to install MinGW. To compile the programs written in Anonymus, you should run the “As.exe” which is in the “Binaries” directory. In the command line, you can use options and the others will be the input files. They can be Anonymus files, C, C++, Object files, etc. or libraries which start with "-l". For example:
As.exe -x -lBlitzMax -lAnonymus -format app -out Something.exe
The Anonymus library should be always used. The -x
option means that the compiler should run the output file, in this case "Something.exe” file. The output format can be app (.exe), obj (.o), or arc (.a).
1st Sample Program: Squares
This sample draws fading squares. When the mouse is over the square, it appears. I have created some constants and an array that contains the fading rate. The Width and Height variables contain how many squares should be in the graphics window:
using System
using BlitzMax
const var SquareSize = 32,
Width = 24,
Height = 20
float[Width, Height] Array
The type doesn't have to be specified, use can use var unlike C# where you can't use var for globals and class members and only one declaration is allowed.
| |
There are two helper functions. The first decreases a value, the second returns true
if the mouse is in the specified area:
float Decrease(float Val)
return if Val < 0.02: 0 else Val - 0.02
bool InArea(int x, y, w, h)
return x <= MouseX() < x + w and y <= MouseY() < y + h
The Update
function updates the squares and draws them. If the mouse is on the square, it draws a white square and sets the fading rate to 1
which means it's fully visible, otherwise it draws with a color based on square's position:
void Update()
for var x, y in 0 .. Width, 0 .. Height
var Value = Array[x, y]
var XPos = x * SquareSize
var YPos = y * SquareSize
if InArea(XPos, YPos, SquareSize, SquareSize)
Value = 1
SetColor 255, 255, 255
DrawRect XPos, YPos, SquareSize, SquareSize
else if Value > 0
var Color = 255 * Value
var Red = (int)(((float)x / Width) * Color)
var Greed = (int)(((1 - (float)y / Height)) * Color)
var Blue = (int)(((float)y / Height) * Color)
SetColor Red, Greed, Blue
DrawRect XPos, YPos, SquareSize, SquareSize
Array[x, y] = Decrease(Value)
The Reset
function sets the array to zero, because it can contain any data. The Main
function creates a graphics window, and updates it until the user hits the escape key or clicks the exit button:
void Reset()
for var x, y in 0 .. Width, 0 .. Height
Array[x, y] = 0
public stdcall void Main()
Reset
Graphics SquareSize * Width, SquareSize * Height
while !KeyHit(Keys.Escape) and !AppTerminate()
Cls
Update
Flip
The for until
loop is different from for to
loop. It doesn't reach the value at the right side, so subtracting one can be saved in some cases.
2ns Sample Program: Circles
In this sample, I calculate the color of each pixel. This sample's purposes is to compare the performance with C++. It calculates the distance from the middle point of the window and it takes the sine and cosine of it. To make it faster, it creates a smaller image that it draws to, and then it's scaled up.
I made some helper functions. The ArgbColor function combines the four components of a color to a single number. The GetDistance function simply calculates a distance, and the Wrap function cuts down parts of a integer that cannot fit in a byte.
The UpdateImage function does the main task of the program. It works as I wrote it in the beginning of the sample.
| |
using System
using BlitzMax
int ArgbColor(byte a, r, g, b)
return (int)a << 24 | (int)r << 16 | (int)g << 8 | b
double GetDistance(double x1, y1, x2, y2)
var x = x2 - x1, y = y2 - y1
return Math.Sqrt(x * x + y * y)
byte Wrap(int x)
if x < 0: x = 0
if x > 255: x = 255
return (byte)x
void UpdateImage(IntPtr Image)
var Pixmap = LockImage(Image, 0, true, true)
var Width = ImageWidth(Image)
var Height = ImageHeight(Image)
var Time = MilliSecs()
for var x = 0 until Width
for var y = 0 until Height
var RelX = (float)x / Width
var RelY = (float)y / Height
var Distance = GetDistance(RelX, RelY, 0.5, 0.5) * 3 - (float)Time / 1000
var Light = (int)((RelY * 100) * Math.Abs(Math.Sin(Distance / 1.5)))
var Red = (int)((RelX * 255) * Math.Abs(Math.Cos(Distance))) + Light
var Green = (int)(((1 - RelY) * 255) * Math.Abs(Math.Sin(Distance))) + Light
var Blue = (int)((RelY * 255) * Math.Abs(Math.Cos(Distance / 3))) + Light
var Color = ArgbColor(255, Wrap(Red), Wrap(Green), Wrap(Blue))
WritePixel Pixmap, x, y, Color
UnlockImage Pixmap, 0
public stdcall void Main()
Graphics 1024, 720
var Image = CreateImage(512, 360, 1, -1)
while not KeyHit(Keys.Escape) and not AppTerminate()
Cls
UpdateImage Image
DrawImageRect Image, 0, 0, 1024, 720, 0
DrawFrameStats
Flip
Here are the performance results (frames per second) on my Intel Core 2 Dou 2.2 Ghz processor:
Image Size | GNU C++ compiler | Clang C++ compiler (LLVM) | Anonymus |
512x360 | 8 | 9 | 12 |
256x180 | 31 | 35 | 47 |
Analyzing the Created Assembly Code
The ArgbColor Function
The ArgbColor
function takes 4 byte parameters. These are passed in the al
, ah
, dl
, dh
registers, in that order. al
is the first byte, ah
is the second byte of the eax
register. The result will be stored in the eax
register too.
The ah
register will be overwritten, so it copies it to the cl
register. Then it clears the high 3 bytes of the eax
register by doing a bitwise and on them and shifts the last byte from blue color place to the alpha. It does almost the same with the others.
_ArgbColor:
mov cl, ah
and eax, 0xFF
shl eax, 24
and ecx, 0xFF
shl ecx, 16
or eax, ecx
movzx ecx, dl
shl ecx, 8
or eax, ecx
movzx ecx, dh
or eax, ecx
ret
The GetDistance Function
The float numbers can be only stored in memory with FPU, so the parameters will be on the stack. The return value will be on the FPU stack. Anonymus doesn't save the stack pointer to ebp
register, it rather uses it to store data, but it makes assembly less readable, because you may don't know the variables' positions. In this function, the parameters starts at esp + 4
, because the caller's eip
is stored in the stack. It doesn't need to subtract 16
from esp
, because no functions are called by GetDistance
function.
_GetDistance:
fld qword[esp + 20]
fsub qword[esp + 4]
fstp qword[esp - 16]
fld qword[esp + 28]
fsub qword[esp + 12]
fstp qword[esp - 8]
fld qword[esp - 16]
fmul qword[esp - 16]
fld qword[esp - 8]
fmul qword[esp - 8]
faddp
fsqrt
ret 32
The Wrap Function
This is the most simple function. It uses conditional moves. The disadvantage of this is that the cmov
instruction's second parameter must be a memory position, and reading memory is slow. But I read the conditional jumps are also slow if we do it a lot because of branch prediction.
_Wrap:
cmp eax, 0
cmovl eax, dword[_11]
cmp eax, 255
cmovg eax, dword[_18]
ret
The Other Functions
I think describing them doesn't make much sense. The only thing I would mention is that in the future, I will add function inlining to improve performance. I suppose G++ does the same.
3rd Sample: Fire
This sample increases values based on
the distance to the mouse. These values are stored in a global array called Array .
The colors assigned to the them is stored in the Colors array. It uses
tuples extensively to reduce code size.
The sample uses some of functions from the previous sample. The GetDistance function
is vectorized, it takes two tuples as parameters. The GetValue function returns the value of
Array safely, if the parameters are outside of ImageSize , it returns 0.
The Update functions firstly blurs the image by getting the avarage of nearby pixels.
It divides with 7.1 instead of 7 to make it fade out. Then it increases the
values in Array based on distance to the mouse.
UpdateImage function updates the image which will be scaled up to the whole window.
| |
using System
using BlitzMax
const var
WindowSize = (1024, 720),
ImageSize = (256, 180),
RectSize = 35,
ColorNumber = 768
float[ImageSize.0, ImageSize.1] Array
int[ColorNumber] Colors
int ArgbColor(byte a, r, g, b)
return (int)a << 24 | (int)r << 16 | (int)g << 8 | b
float GetDistance((float, float) P1, P2)
var x = P1.0 - P2.0, y = P1.1 - P2.1
return (float)Math.Sqrt(x * x + y * y)
byte Wrap(int x)
if x < 0: x = 0
if x > 255: x = 255
return (byte)x
float GetValue(int x, y)
if 0 < x < ImageSize.0 and 0 < y < ImageSize.1
return Array[x, y]
return 0f
void Update()
for var x, y in 0 .. ImageSize.0, 0 .. ImageSize.1
var Value = Array[x, y] * 5
Value += GetValue(x + 1, y + 1) * 0.5
Value += GetValue(x - 1, y + 1) * 0.75
Value += GetValue(x - 2, y + 2) * 0.5
Value += GetValue(x - 3, y + 3) * 0.25
Array[x, y] = Value / 7.1
var Mouse = ((float)MouseX(), (float)MouseY()) * ImageSize / WindowSize
var P1 = (int, int)Mouse - RectSize, P2 = (int, int)Mouse + RectSize
if P1.0 < 0: P1.0 = 0
if P1.1 < 0: P1.1 = 0
if P2.0 >= ImageSize.0: P2.0 = ImageSize.0 - 1
if P2.1 >= ImageSize.1: P2.1 = ImageSize.1 - 1
for var x, y in P1.0 ... P2.0, P1.1 ... P2.1
var Distance = 1 - GetDistance(Mouse, (x, y)) / RectSize
if Distance > 0f
var NewVal = Array[x, y] + Distance / 10f
if NewVal > 1f: NewVal = 1f
Array[x, y] = NewVal
void UpdateImage(IntPtr Image)
var Pixmap = LockImage(Image, 0, true, true)
for var x, y in 0 .. ImageSize.0, 0 .. ImageSize.1
var Color = Colors[(int)(Array[x, y] * (ColorNumber - 1))]
WritePixel Pixmap, x, y, Color
void Initialize()
for var x, y in 0 .. ImageSize.0, 0 .. ImageSize.1
Array[x, y] = 0f
for var i in 0 .. ColorNumber
var Value = (float)i / (ColorNumber - 1)
var Red = (int)((Math.Min(Value, 1f / 3)) * 3 * 255)
var Green = (int)((Math.Min(Value, 1.65f / 3) - 0.65f / 3) * 3 * 255)
var Blue = (int)((Math.Min(Value, 1f) - 2f / 3) * 3 * 255)
Colors[i] = ArgbColor(255, Wrap(Red), Wrap(Green), Wrap(Blue))
public stdcall void Main()
Initialize
Graphics WindowSize
var Image = CreateImage(ImageSize, 1, -1)
while not KeyHit(Keys.Escape) and not AppTerminate()
Cls
Update
UpdateImage Image
DrawImageRect Image, (0, 0), WindowSize, 0
DrawFrameStats
Flip
Some Basic Syntax
Literals
The type of integer literals can be specified by a suffix that is a lower case short form of the type name. The hexadecimal numbers have to be upper case in order to distinguish them:
$FFb '' An unsigned byte
-$1Asb '' A signed byte
%100 '' A binary number
For loops
for var x, y in 0 ... Width, 0 ... Height
C# equivalent:
for (var x = 0; x <= Width; x++)
for (var y = 0; y <= Height; y++)
In certain cases you don't want to reach the value at right side. This can be indicated by 2 dots instead of 3:
for var x, y in 0 .. Width, 0 .. Height
C# equivalent:
for (var x = 0; x < Width; x++)
for (var y = 0; y < Height; y++)
Structures and Enumerations
Basically, these are the same as C#:
struct Rect
public float X, Y, Width, Height
'' Constructor: This is called when a Rect is created
public Rect(float X, Y, Width, Height)
this.X = X
this.Y = Y
this.Width = Width
this.Height = Height
public Rect Copy()
return new Rect(X, Y, Width, Height)
public float Area:
get return Width * Height
public float GetValue(int x)
switch x
case 0: return X
case 1: return Y
case 2: return Width
case 3: return Height
default return 0
static class Console
public static void PrintInt(int x)
public static void PrintLong(long x)
public static void PrintDouble(double x)
enum Keys
A = 65, B, C, D, E, F, G, H, I, J, K, L, M, N, O
P, Q, R, S, T, U, V, W, X, Y, Z,
F1 = 112, F2, F3, F4, F5, F6, F7, F8, F9, F10, F11, F12
flag Modifiers
None = 0
Shift = 1, Control, Option, System
Alt = Option, Menu = Option, Apple = System, Windows = System
For the flag
type, the automatic value is always multiplied by 2 unlike C#. It adds 1
even if it's declared with Flags
attribute. The switch
command doesn't need case
statements to jump out from the switch
.
Named and Optional Parameters
This is also the same in C#. This is how BlitzMax.Grapics
function looks in Anonymus:
public extern cdecl asmname("_brl_graphics_Graphics")
IntPtr Graphics(int Width, Height, Depth = 0, Hertz = 60, Flags = 0)
To call it, it's possible to miss the last parameters, or by specifying each parameter individually:
Graphics 800, 600
Graphics Width: 800, Height: 600
Graphics 800, 600, Hertz: 75
Tuples
The tuples allow storing multiple values in a variable with different types. You can write the multiplication of vectors in Anonymus language like this:
type float3 = (float x, y, z)
float3 Cross(float3 a, b)
return a.y * b.z – a.z * b.y,
a.z * b.x – a.x * b.z,
a.x * b.y – a.y * b.x
If I don’t define the float3
type or name the members, they can be identified by numbers:
float, float, float Cross((float, float, float) a, b)
return a.1 * b.2 – a.2 * b.1,
a.2 * b.0 – a.0 * b.2,
a.0 * b.1 – a.1 * b.0
For the simple vector operations (addition, subtraction), you don’t have to call functions, it's enough to write a + b.
If I want to manage the members of the tuple, it's enough to separate the components by commas at the left of the assignment:
float x, y, z
x, y, z = Cross(a, b)
It’s also possible to declare and give values to the variable, but in this case you have to write out the types to be unambiguous:
(var x, var y, var z) = Cross(a, b)
Swapping of Variables
It swaps two or more data simple without used temporary variable:
a, b = b, a
Without this, the only way is to declare new variables or create a function that does the swapping, but this is also not perfect because the parameter can be only type. In order to be able to output values in parameters, I use the ref
types. When you pass a parameter with ref
, you pass the address of the variable, not its value:
void SwapInts(ref int a, b)
var c = a
a = b
b = c
'' Usage:
SwapInts ref a, ref b
In C, pointers are also needed:
void SwapInts(int* a, int* b)
{
int c = *a;
*a = *b;
*b = c;
}
SwapInts(&a, &b);
Function Return Value
Returning with a variable that is created at the beginning of a function is a common case. This function calculates the length of a zero-terminated string
where the return command is not necessary:
public static UInt_PtrSize StringLength(char* Str)
: Ret = 0
while Str[Ret] != '~0'
Ret++
It's possible to create multiple variables, in this case the return type will be a tuple.
Referencing Function Written in Other Languages
You can reference them with the extern
modifier. The asmname
enables you to specify the name that this identifier will have in the compiled code. You can also give the calling convention of the function. Currently, my language supports cdecl
, stdcall
, and ascall
:
public extern cdecl asmname("_brl_polledinput_KeyHit") bool KeyHit(Keys key)
In C, there wouldn't be asmname
setting, you need to use it with the full name or declare a macro:
void brl_polledinput_KeyHit(uint KeyCode);
#define KeyHit brl_polledinput_KeyHit
The default function call is ascall
that uses eax
, edx
and ecx
for parameters, but passes pointers, classes, etc. first, then the numbers (except the floating point numbers because it's not possible to calculate if they are in general registers). It can also handle high and low byte of the registers that have that variant to store two data in the same register.
Including Assembly in Source Code
Here is an example (System.Math.Truncate
from Anonymus library):
public static asm double Truncate(double x)
fldcw $[_x86Helper.TruncateFPUControlWord]
fld qword[esp + 4]
frndint
fldcw $[_x86Helper.DefaultFPUControlWord]
ret 8
As you can see, it's possible to use globals with $[ ]
. For function parameters direct location is needed, if I implement it, the code would be less optimal, because not all of the instructions' operand data types are allowed.
Overview of Anonymus's Structure
Identifiers
These can be types, variable or functions. All variables and functions have types. The latter is determined by the return value and the parameter’s type. The variables are also divided into ground. They can be local, global or member of a structure.
Modifiers
They can be found before the declaration (e.g., cdecl
, asmname
, etc.). They are isolated and stored in a list that will receive a function that does the declaration. For example, if a variable is marked with static
, then it will create a global identifier.
Expressions
These are parsed by different recognizers. It’s possible to create a new or replace one without a large modification in the language. An expression recognizer can recognize more operations. E.g., The Multiplicative recognizes multiplication, division and modulus.
If it succeeded, then it should call the plugin that the recognizer received by parameter.
Plugins
The plugins can modify the recognized expression. The MultiPlugin
class manages multiple plugin joint work. The type manager plugin’s task is to determine the type of the expressions and do the conversions. The calculator plugin calculates a value if we want to operate with constants.
If the expression recognizer wants to declare a variable, it should call a plugin’s function. If it can’t do this, it gives an error message. The multi plugin redirects the declaration to the compiler plugin, if it exists.
Preprocessor
It processes all statements starting with #
. You can create a macro or if a condition is true
, run another code. There’s a preprocessor plugin that replaces the expressions created by call recognizer to the given expression:
#define min(x, y) if x < y then x else y
#define max(x, y) if x > y then x else y
int Something(int i, j)
return max(i, j) / min(i, j)
The preprocessor produces this from it:
int Something(int i, j)
return (if i > j: i else j) / (if i < j: i else j)
Scopes
In the Anonymus language, statements that have the same number of whitespace characters belong to a scope. First the compiler creates a global scope, then calls the preprocessor, processes the types and then the functions and variables. These are processed multi-threaded.
In the code, scopes can contain commands that can also contain things. E.g., an if
command has an expression which is the condition and one or more code scopes. A function scope is the outermost scope in a function that is also a code scope.
Architectures
It’s possible to do more architecture that create assembly code. If necessary, it can use a plugin or attach data to identifiers, identifier containers. Currently the x86 architecture is implemented.
A Basic Insight into Code Parsing
The language consists of recognizers, and supporting new languages can be done by replacing recognizers. The functions, types, variables will be available between Anonymus and the new languages. I don't use parser generators, because I think using other software could prevent from implementing some of the functionalities or I would have to do things in a different way.
The CompilerState
class has a Language
object that contains all recognizers. Here is the Language
class:
public abstract class Language : LanguageNode
{
public IList<IExprRecognizer> ExprRecognizers;
public IList<IIdRecognizer> IdRecognizers;
public IList<ICommRecognizer> CommRecognizers;
public IList<IModRecognizer> ModRecognizers;
public IList<INameGenerator> NameGenerators;
public IList<IResultSkippingHandler> GlobalHandlers;
public IArgRecognizer ArgRecognizer;
public IGenericRecognizer GenericRecognizer;
public IGroupRecognizer GroupRecognizer;
public IRetValLessRecognizer RetValLessRecognizer;
public IInnerScopeRecognizer InnerScopeRecognizer;
public IVarDeclRecognizer VarDeclRecognizer;
public ITypeDeclRecognizer TypeDeclRecognizer;
public IConstDeclRecognizer ConstDeclRecognizer;
public INamespaceDeclRecognizer NamespaceDeclRecognizer;
public IDeclarationRecognizer DeclarationRecognizer;
public IGlobalScopeProcessor GlobalScopeProcessor;
public ICodeFileProcessor CodeFileProcessor;
public ICodeProcessor CodeProcessor;
...
}
String Handling
Anonymus uses StringRange
to store string
s to avoid a new string
allocation when it calls string.Substring
. This structure also has functions for bracket handling and other operations. The CodeString
structure consists of a StringRange
and a reference to CodeFile
class that can also manage lines and indents of them. CodeString
has a Line
field and when its substring is needed, it updates the Line
member. Most of the CodeString
's methods are just wrapper around StringRange
's functions.
The ISkippingHandler
interface is used to skip results that are in a string
constant or bracket. The recognizers can store data in ResultSkippingManager.Datas
, if it's needed. In some cases, it's necessary to make sure that the brackets are not skipped. A bracket recognizer shouldn't skip brackets if ResultSkippingManager.DoNotSkipBrackets
is true
.
public class BracketRecognizer : LanguageNode, IExprRecognizer,
IIdRecognizer, IResultSkippingHandler
{
...
public int SkipResult(ref ResultSkippingManager RSM)
{
if (!RSM.DoNotSkipBrackets && Helper.GetBracket(RSM.CurrentChar, RSM.Back))
{
var Handlers = RSM.SkippingHandlers;
if (RSM.Back)
{
var NewString = RSM.String.Substring(0, RSM.Current + 1);
var Pos = NewString.GetBracketPos(true, Handlers);
if (Pos != -1) return Pos;
}
else
{
var NewString = RSM.String.Substring(RSM.Current);
var Pos = NewString.GetBracketPos(false, Handlers);
if (Pos != -1) return Pos + RSM.Current;
}
}
return -1;
}
}
So this is the BracketRecognizer
class. It has to return an index where the string
search should be continued or -1
. The GetBracketPos
method returns the position of the bracket's pair which the string
ends or starts with. If the first parameter is false
, the string
has to begin with a bracket that looks to the right.
LanguageNode Class and Recognizers
The LanguageNode
class contains an array of string
s called Operators
that it recognizes and another array of string
s (LanguageNode.Skip
) that it should skip, because the operators, that it has to recognise, is the part of them. E.g., an addition recognizer has +, - operators then the skip array should contain ++, --. The NewLine
string
s contain which operators need the previous or the next line. E.g., if the line ends with +, then it will continue with the next:
public abstract class LanguageNode
{
public DataList Data = new DataList();
public Language Root;
public LanguageNode Parent;
public IList<LanguageNode> Children;
public IList<IResultSkippingHandler> SkippingHandlers;
public string[] Operators;
public string[] SkipFind;
public string[] Skip;
public string[] NewLineLeft;
public string[] NewLineRight;
public string[] OnlyLeft;
public string[] OnlyRight;
...
}
Recognizations are done in a recursive way. I think it's simple but maybe not the fastest way to parse a code file. To decrease the time spent on string
operations, I use my own functions which are optimized for these things and significantly faster, although far more time spent on code generation. Currently my compiler can process 10 000 lines of code in about 1 second.
An expression recognizer should inherit from this class and IExprRecognizer
interface. Recognizers can use Find2
function to determine which operators aren't unary. If the left side of the result ends with an operator or there's nothing, it will ignore that result. When it's not enough to use the LanguageNode.Operators
, for example the cast recognizer needs to implement the IFind2Handler
interface. The following code is the addition recognizer:
public class AdditiveRecognizer : LanguageNode, IExprRecognizer
{
public AdditiveRecognizer()
{
Operators = new string[] { "+", "-" };
NewLineLeft = NewLineRight = Operators;
}
public ExprRecognizerRes Recognize
(CodeString Code, ExprPlugin Plugin, ref ExpressionNode Ret)
{
var Result = ExprRecognizerHelper.Find2(this, Plugin.Container, Code.String);
if (Result.Position != -1)
{
var Ch = ExprRecognizerHelper.TwoParamOpNode(Code, Result, Plugin);
if (Ch == null) return ExprRecognizerRes.Failed;
Operator Op;
if (Result.Index == 0) Op = Operator.Add;
else if (Result.Index == 1) Op = Operator.Subract;
else throw new ApplicationException();
Ret = new OpExpressionNode(Op, Ch, Code);
return ExprRecognizerRes.Succeeded;
}
return ExprRecognizerRes.Unknown;
}
}
This still wouldn't be compatible with the NumberRecognizer
, because floating point numbers can have e notation (2e+2 = 2 * 102). To solve this problem, the NumberRecognizer
adds a result skipper to LanguageNode.SkippingHandlers
. ExprRecognizerHelper.Find2
takes a LanguageNode
parameter first, which is used to get the skipping handlers from.
The Code Generator
This is most complex part of Anonymus. It is about the third of the whole project. These are major steps it does to compile a function:
- Processes expressions
- Calculates the expression result storage locations
- Calculates the locals' locations
- Checks whether there is a need for temporary register (in x86 assembly most instructions can only have one memory operand, some must be either memory location or general register, etc.)
- If a temporary register needed restarts the data location allocation, allocates the temporary registers and continues at 2nd point.
- Generates assembly (without jumps)
- Jump optimizations
- Jump replacements
The Jump Optimizations and Replacements
int Fib(int x)
if x < 2: return 1
else return Fib(x - 1) + Fib(x - 2)
This function would have the following assembly without jump optimizations:
_Fib:
push ebx
push ebp
mov ebp, eax
cmp ebp, 2
jge _26
mov eax, 1
jmp _2
jmp _24
_26:
mov eax, ebp
dec eax
call _Fib
mov ebx, eax
mov eax, ebp
sub eax, 2
call _Fib
add eax, ebx
jmp _2
_24:
_2:
pop ebp
pop ebx
ret
The label _26
and _24
are for the condition and the label _2
is for function return. There are some unnecessary jumps and labels that should be deleted:
_Fib:
push ebx
push ebp
mov ebp, eax
cmp ebp, 2
jge _26
mov eax, 1
jmp _2
_26:
mov eax, ebp
dec eax
call _Fib
mov ebx, eax
mov eax, ebp
sub eax, 2
call _Fib
add eax, ebx
_2:
pop ebp
pop ebx
ret
It could save a jump if it does the same as where it jumps to. Unless it is too long, it replaces the jump:
_Fib:
push ebx
push ebp
mov ebp, eax
cmp ebp, 2
jge _26
mov eax, 1
pop ebp
pop ebx
ret
_26:
mov eax, ebp
dec eax
call _Fib
mov ebx, eax
mov eax, ebp
sub eax, 2
call _Fib
add eax, ebx
pop ebp
pop ebx
ret
Temporary Register Handling
The IsEqual
function needs 3 temporary register to fulfil that the mov
instruction can only have one memory operand, and the memory location can't have a memory address:
long* g_Pointers
int g_Index
long g_Value
bool IsEqual()
return g_Value == g_Pointers[g_Index]
_IsEqual:
mov eax, dword[_g_Pointers]
mov edx, dword[_g_Index]
mov ecx, dword[eax + edx * 8]
cmp dword[_g_Value], ecx
jne _27
mov ecx, dword[eax + edx * 8 + 4]
cmp dword[_g_Value + 4], ecx
jne _27
mov al, 1
ret
_27:
xor al, al
ret
Condition Assembly Generation
This is the main function for condition assembly generation (Anonymus.CodeGenerator.GetConditionCode)
:
void GetConditionCode(ExpressionNode Node, int Then, int Else, bool ElseAfterCondition)
The Then
and Else
are the index of labels. If ElseAfterCondition
is
True
, then the condition is followed by the else scope and the jump's condition is the same as the operator and it jumps to the Then
labelFalse
, then the condition is followed by the then scope and the jump's condition is negated and it jumps to the Else
label
Here is an example:
void WhileTest(bool Loop)
var a = 0
while Loop
a++
void DoWhileTest(bool Loop)
var a = 0
do
a++
while Loop
; This is the assembly without jump replacements
_WhileTest:
xor edx, edx
_1:
test al, al
jz _13
_72:
inc edx
jmp _1
_13:
ret
; After jump replacements
_WhileTest:
xor edx, edx
test al, al
jz _13
_72:
inc edx
test al, al
jnz _72
_13:
ret
_DoWhileTest:
xor edx, edx
_76:
inc edx
test al, al
jnz _76
ret
In the WhileTest
function, the condition is followed by the then scope, ElseAfterCondition
is false
. In the DoWhileTest
it's followed by the return command (which happens if the condition is false
), the ElseAfterCondition
is true
.
This is how the and
and or
operators work:
void While_OrTest(bool x, y, z)
var a = 0
while x or y or z
a++
void DoWhile_OrTest(bool x, y, z)
var a = 0
do
a++
while x or y or z
void While_AndTest(bool x, y, z)
var a = 0
while x and y and z
a++
void DoWhile_AndTest(bool x, y, z)
var a = 0
do
a++
while x and y and z
_While_OrTest:
xor ecx, ecx
_45:
test al, al
jnz _44
test ah, ah
jnz _44
test dl, dl
jz _7
_44:
inc ecx
jmp _45
_7:
ret
_DoWhile_OrTest:
xor ecx, ecx
_50:
inc ecx
test al, al
jnz _50
test ah, ah
jnz _50
test dl, dl
jnz _50
ret
_While_AndTest:
xor ecx, ecx
_57:
test al, al
jz _9
test ah, ah
jz _9
test dl, dl
jz _9
inc ecx
jmp _57
_9:
ret
_DoWhile_AndTest:
xor ecx, ecx
_62:
inc ecx
test al, al
jz _10
test ah, ah
jz _10
test dl, dl
jnz _62
_10:
ret
In all the while
and do
-while
tests, the last condition remains the same, the ElseAfterCondition
is only changed for the first two conditions. For the and
conditions, the ElseAfterCondition
is false
, for the or
conditions it's true
.
Here is an example that contains both operators:
void While_AndOrTest(bool x, y, z, w)
var a = 0
while (x or y) and (z or w)
a++
void DoWhile_AndOrTest(bool x, y, z, w)
var a = 0
do
a++
while (x or y) and (z or w)
_While_AndOrTest:
xor ecx, ecx
_71:
test al, al
jnz _72
test ah, ah
jz _11
_72:
test dl, dl
jnz _70
test dh, dh
jz _11
_70:
inc ecx
jmp _71
_11:
ret
_DoWhile_AndOrTest:
xor ecx, ecx
_77:
inc ecx
test al, al
jnz _79
test ah, ah
jz _12
_79:
test dl, dl
jnz _77
test dh, dh
jnz _77
_12:
ret
Plans and Ideas
Currently the most important plan is to make the language stable and usable. In this version, objects can be created, but inheritance with properties doesn't work, and there's no gc yet. I hope I will be able to implement a proper string
class within 1-2 months. This takes much time, because I want to program all libraries in Anonymus, so I have a lot of things to do among operator functions.
Inlined Functions Inside Expressions
This could help to create IEnumerable<T>
instances. I would create a recognizer to find commands, and then it would create an inline lambda like function and call it. C# doesn't support even the yield
keyword in lamdas.
void Something(int* pInt, int Count)
AnotherFunction for var i = 0 until Count
yield pInt[i]
void AnotherFunction(IEnumerable(:int) list)
This member has not yet provided a Biography. Assume it's interesting and varied, and probably something to do with programming.