ARM Assembly for eVC with the Mono Jit Macros

14 Jul 2007 · CPOL · 5 min read
Screenshot - ArmJitCE.jpg

Introduction

Microsoft's eMbedded Visual C++ (eVC) compilers offer no inline assembly for the ARM family of microprocessors other than emitting raw opcodes. The macros of the Mono Jit for ARM make that considerably more convenient, and they are rather easy to use. This article only deals with assembly for ARMV4 versions of Windows CE, which means versions before Windows Mobile 5. The methods, in the exact form presented here, may not work on ARMV4I and ARMV4T platforms; that has not been tested.

Background

Code written in machine language is often still faster than code written in C, but we generally no longer use assembly to compete with C on speed. In the past couple of years, however, bytecode languages such as Lua and CIL have become increasingly important, and they can be sped up dramatically with Jits. A Jit first translates the otherwise "bureaucratically" interpreted individual bytecodes into one or more assembly instructions, puts them together in a row, and only then executes them. This is one of the reasons that Jits start up slowly but subsequently run very fast.

I feel that abstract bytecode and Jits are the way to go for the future, because the combination is theoretically platform-independent and simpler to use for the common programmer. Unfortunately, no generally used standard for bytecode has yet surfaced: there are (way too) many variations on the same theme around, but hopefully this will change in time.

I discovered that the macros of the Mono ARM Jit can easily be used with eVC by looking at the source code of an OpenGL ES implementation for Windows CE that can be downloaded from SourceForge. Its author seems to have developed more general wrappers around the macros. I will not use them here, although I think that an abstract machine language could in itself be very useful. It might even turn out to be a precondition for developing a bytecode standard.

Using the Code

The Mono ARM assembly macros are in a couple of header files, most notably arm-codegen.h and arm_dpimacros.h, and optionally arm-dis.h for disassembling the generated code, which I found really neat. I used Mono version 1.2.4 and corrected a bug for backward branches:

C++
#define ARM_DEF_BR(offs, l, cond) \
((offs) | ((l) << 24) | (ARM_BR_TAG) | (cond << ARMCOND_SHIFT))

in arm-codegen.h should be:

C++
#define ARM_DEF_BR(offs, l, cond) \
(((offs) & 0x00FFFFFF) | ((l) << 24) | (ARM_BR_TAG) | ((cond) << ARMCOND_SHIFT))

because branch offsets are 24 bits wide on the ARM, while a negative offset passed to the branch macros is sign-extended to 32 bits and so overwrites the upper bits of the opcode. Possibly Mono uses backward branches in a different way; I looked into that too briefly to tell. Branch offsets count instructions relative to the program counter, which on the ARM reads two instructions ahead of the instruction being executed. The branch instruction itself is therefore at offset -2, the next instruction at -1, the previous one at -3, and so on.
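The effect of the mask and the offset convention can be checked on any machine with a small portable sketch. The names prefixed with SKETCH_ and sketch_ are hypothetical stand-ins, not the identifiers from arm-codegen.h; the bit positions are those of the ARM B/BL encoding (condition in bits 31-28, the pattern 101 in bits 27-25, the link bit in bit 24, the offset in bits 23-0). It assumes <stdint.h> is available, which may not be the case for eVC itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the header's constants (ARM B/BL encoding). */
#define SKETCH_BR_TAG      (5u << 25)   /* bits 27..25 = 101 */
#define SKETCH_COND_SHIFT  28
#define SKETCH_COND_AL     0xEu         /* "always" condition */

/* Unmasked form: a negative offset sign-extends over bits 24..31. */
static uint32_t sketch_br_unmasked(int32_t offs, uint32_t l, uint32_t cond)
{
    return (uint32_t)offs | (l << 24) | SKETCH_BR_TAG
         | (cond << SKETCH_COND_SHIFT);
}

/* Masked form: only the low 24 bits of the offset reach the opcode. */
static uint32_t sketch_br_masked(int32_t offs, uint32_t l, uint32_t cond)
{
    assert(offs >= -(1 << 23) && offs < (1 << 23)); /* must fit in 24 bits */
    return ((uint32_t)offs & 0x00FFFFFFu) | (l << 24) | SKETCH_BR_TAG
         | (cond << SKETCH_COND_SHIFT);
}

/* Offset to pass to a branch macro, counted in instructions: the program
   counter reads two instructions ahead of the branch itself. */
static int32_t sketch_br_offset(int32_t branch_index, int32_t target_index)
{
    return target_index - (branch_index + 2);
}
```

With these definitions, a branch-with-link at instruction index 5 that targets index 0 gets offset -7, and the masked encoding with the "always" condition is 0xEBFFFFF9. The unmasked form yields 0xFFFFFFF9 for the same input, so the condition field is no longer the one that was asked for.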

Some of the more complicated basic ARM operations are not implemented as macros but as functions, such as arm_mov_reg_imm32(). In these functions, the pointer into the instruction array is a local copy, so the caller's pointer is not updated automatically; instead, the new value is returned. I put these functions and the disassembler functions in a library named ArmJit.lib. To use the macros, include the appropriate header files in your source code; if you need the functions, link with the lib as well.
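The difference between the two calling styles can be sketched in portable C. SKETCH_EMIT and sketch_emit_two are hypothetical helpers made up for this illustration; they are not the Mono macros or functions, and the real signatures are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Macro style: the expansion advances the caller's own pointer variable. */
#define SKETCH_EMIT(p, word) do { *(p)++ = (uint32_t)(word); } while (0)

/* Function style: p is a local copy, so the advanced pointer is returned
   and the caller has to store it. */
static uint32_t *sketch_emit_two(uint32_t *p, uint32_t first, uint32_t second)
{
    assert(p != NULL);
    SKETCH_EMIT(p, first);
    SKETCH_EMIT(p, second);
    return p;
}

/* Emits three words using both styles; returns the number of words written. */
static size_t sketch_emit_demo(uint32_t *buf)
{
    uint32_t *p = buf;

    SKETCH_EMIT(p, 0xE1A00000u);                      /* MOV r0, r0 (no-op) */
    p = sketch_emit_two(p, 0xE1A00000u, 0xE1A0F00Eu); /* no-op; MOV pc, lr  */
    return (size_t)(p - buf);
}
```

Forgetting the assignment in the function style would make the next macro overwrite the words the function has just written, because the caller's pointer would still point at the old position.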

To make the procedure more understandable, I made a small test program that implements the simple Fibonacci benchmark in 13 ARM instructions. For fun, its speed is compared to that of the commonly used C implementation, and the ARM version turns out to be over 30% faster; but, as stated, competing with C is generally no longer the purpose of using assembly. Here is the program; my comments follow below it:

C++
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>   /* atoi(), exit() */
#include <arm-codegen.h>
#include <arm-dis.h>

/* Reference implementation in C; note that fib(0) = fib(1) = 1 here. */
unsigned long fib_c(unsigned long n) {
    if (n < 2)
        return 1;
    else
        return fib_c(n-2) + fib_c(n-1);
}

/* Writes the 13 opcodes of the ARM version to the array that pins points into. */
void setup_fib_jit (unsigned int *pins) {

    /* label1 */
    ARM_CMP_REG_IMM8 (pins, ARMREG_R0, 2); /* is n < 2 ? */
    ARM_MOV_REG_IMM8_COND (pins, ARMREG_R0, 1, ARMCOND_LO); /* if yes return value is 1 */
    ARM_MOV_REG_REG_COND (pins, ARMREG_PC, ARMREG_LR, ARMCOND_LO);
                                            /* if yes return address in PC; */
                                            /* and exit to main or previous recursive call */
    ARM_PUSH2 (pins, ARMREG_R0, ARMREG_LR); /* save n and return address to the stack */
    ARM_SUB_REG_IMM8 (pins, ARMREG_R0, ARMREG_R0, 2); /* n = n-2 */
    ARM_BL (pins, -7); /* recurse to label1 for fib(n-2) */

    ARM_LDR_IMM (pins, ARMREG_R1, ARMREG_SP, 0); /* load n from the stack */
    ARM_STR_IMM (pins, ARMREG_R0, ARMREG_SP, 0); /* store result fib(n-2) */

    ARM_SUB_REG_IMM8 (pins, ARMREG_R0, ARMREG_R1, 1); /* n = n-1 */
    ARM_BL (pins, -11); /* recurse to label1 for fib(n-1) */
    ARM_POP2 (pins, ARMREG_R1, ARMREG_LR); /* pop result fib(n-2) and return address */

    ARM_ADD_REG_REG (pins, ARMREG_R0, ARMREG_R0, ARMREG_R1); /* add both results */

    ARM_MOV_REG_REG (pins, ARMREG_PC, ARMREG_LR);
                                            /* return address in PC; */
                                            /* and exit to main or previous recursive call */
}

int main (int argc, char *argv[]) {

    unsigned int n, ins[500], *pins = ins;
    unsigned long (*fib_jit)(int n) = (unsigned long (*)(int n)) ins;
    unsigned long r1, r2, t0, t1, t2;

    setup_fib_jit (pins);           /* fill ins[] with the opcodes */
    _armdis_dump (stdout, ins, 56); /* disassemble the generated code */

    if (argc <= 2) {
        if (argc == 1)
            n = 1;
        else
            n = atoi (argv[1]);
        t0 = GetTickCount();
        r1 = fib_c (n);
        t1 = GetTickCount();
        r2 = fib_jit (n);
        t2 = GetTickCount();
    }
    else {
        fprintf (stderr, "%s: Wrong number of arguments\n", argv[0]);
        exit (-1);
    }

    /* GetTickCount() returns milliseconds; print seconds */
    printf ("  fib_c(%u) result: %lu\n\texecution time: %f\n", n, r1, (t1-t0) / 1000.0);
    printf ("fib_jit(%u) result: %lu\n\texecution time: %f\n", n, r2, (t2-t1) / 1000.0);

    return 0;
}

Using the macros presupposes a good deal of basic knowledge of machine language programming in general, and some basic knowledge of the ARM microprocessor family in particular. The assembly instructions in the function setup_fib_jit() are commented in the program, and I will not go into them here; that is beyond the scope of this article. Comparing the comments to the C version above it will probably give an adequate impression of what goes on; it is virtually a one-to-one translation of the C algorithm. I will concentrate instead on setting up the code and using the macros in practice.
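For readers who want to check the instruction comments against the C version on a desktop machine, here is a rough register-by-register model in portable C. It is only a sketch: the sketch_ names are hypothetical, the variable slot stands in for the stack word at [sp], and the saved return address is left to the C call mechanism rather than modeled.

```c
#include <assert.h>

/* Plain recursive reference with the same base case as fib_c(). */
static unsigned long sketch_fib_ref(unsigned long n)
{
    return (n < 2) ? 1 : sketch_fib_ref(n - 2) + sketch_fib_ref(n - 1);
}

/* Model of the 13 instructions; r0 and r1 mirror the ARM registers. */
static unsigned long sketch_fib_steps(unsigned long r0)
{
    unsigned long slot;  /* models the stack word at [sp] */
    unsigned long r1;

    if (r0 < 2)                      /* CMP r0, #2                    */
        return 1;                    /* MOVLO r0, #1 ; MOVLO pc, lr   */
    slot = r0;                       /* PUSH {r0, lr}: n saved        */
    r0 = sketch_fib_steps(r0 - 2);   /* SUB r0, r0, #2 ; BL label1    */
    r1 = slot;                       /* LDR r1, [sp]: reload n        */
    slot = r0;                       /* STR r0, [sp]: keep fib(n-2)   */
    r0 = sketch_fib_steps(r1 - 1);   /* SUB r0, r1, #1 ; BL label1    */
    r1 = slot;                       /* POP {r1, lr}: r1 = fib(n-2)   */
    return r0 + r1;                  /* ADD r0, r0, r1 ; MOV pc, lr   */
}

/* Asserts that the model and the reference agree for 0..max_n. */
static void sketch_check(unsigned long max_n)
{
    unsigned long n;

    for (n = 0; n <= max_n; n++)
        assert(sketch_fib_steps(n) == sketch_fib_ref(n));
}
```

The one stack word is reused: it first holds n and then the result of fib(n-2), which is why the ARM version needs only a single push and pop per call.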

First we need an array for the actual opcodes; in this case the array is named ins[]. ARM opcodes are 32 bits each on the ARMV4 platform, which corresponds to unsigned int on my version of Windows CE. Use UINT32 as the element type of the array instead if you want to be safe. A pointer into this array, pins, is also needed so that the macros know where to put the next opcode. The macros advance this pointer themselves, so we can use them without taking care of that. Note that, unlike in normal programming, setup_fib_jit() only "sets up" the assembly function and is not itself that function: the actual code will not be "in" setup_fib_jit(), but in the array ins[]!
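Because setup_fib_jit() receives pins by value, the copy in main() still points at the start of ins[] afterwards, and the length of the generated code has to be known by other means. One possible variation, sketched here under the assumption that the setup routine returns its advanced pointer, is to compute the size from the two pointers. The sketch_ names are hypothetical, and a no-op word stands in for the real opcodes so that the example runs anywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_CAPACITY 500  /* same number of elements as ins[] */

/* Placeholder for a setup routine: writes count words and returns the
   advanced pointer instead of discarding it. */
static uint32_t *sketch_setup(uint32_t *p, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++)
        *p++ = 0xE1A00000u;  /* MOV r0, r0, standing in for a real opcode */
    return p;
}

/* Number of bytes of code between start and end. */
static size_t sketch_code_bytes(const uint32_t *start, const uint32_t *end)
{
    assert(end >= start && (size_t)(end - start) <= SKETCH_CAPACITY);
    return (size_t)(end - start) * sizeof *start;
}
```

For 13 instructions of 32 bits each, the computed size is 52 bytes. The assertion also catches code that has run past the end of the array.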

After the opcode array has been filled with the instructions, we need a way to jump to it. We get one by casting the array to a function pointer, which is done with the declaration:

C++
unsigned long (*fib_jit)(int n) = (unsigned long (*)(int n)) ins;

We can then call fib_jit() like an ordinary function. The argument n is passed to it in ARM register 0, which is usual for the first parameter of a __cdecl function on the ARM WinCE platform. Register 0 is also used to pass the return value back to main() when the function exits. I think that the rest of main() is self-explanatory. Input and output of the benchmark go through the console, so you need one to run this program. I personally use PocketConsole, which is available on the Internet; I adapted it to work in VGA, hence hidpi.res in the project.

Points of Interest

Once I got started, I found using the Mono code surprisingly easy and very interesting. I have long planned to write a C interpreter for my Toshiba E800 Pocket PC, and maybe this tool will get me to actually do it. Don't count on it though; rather write it yourself ;).

History

The original zip file and the article have had a minor update: the ARM function of the test program has been optimized for speed.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Netherlands
Sjoerd Bakker was a 6510 (C64) machine language editor for a Dutch computer magazine in the mid eighties of the previous century.

Comments and Discussions

 
General: how to put program into RAM with ARM? (Andry_st, 29-Apr-11 14:07)
Question: ARM11 or higher version support? (wheregone, 1-Jul-10 0:57)
Answer: Re: ARM11 or higher version support? [modified] (Sjoerd_B, 2-Jul-10 9:31)
General: Help with ARM PROGRAM (crazychikens, 2-May-08 5:17)
General: Re: Help with ARM PROGRAM (Sjoerd_B, 9-Sep-08 7:42)
Question: Question in Compiling (hosila, 21-Sep-07 14:15)
Answer: Re: Question in Compiling [modified] (Sjoerd_B, 20-Oct-07 10:29)
General: //nice (fox_of_1978, 12-Jun-07 2:35)
General: Re: //nice (Sjoerd_B, 12-Jun-07 9:26)
General: Good Job (wwaidvl, 6-Jun-07 15:46)
General: Re: Good Job (Sjoerd_B, 6-Jun-07 17:59)
General: Fantastic [modified] (Vince Ricci, 5-Jun-07 7:44)
General: Re: Fantastic [modified] (Sjoerd_B, 5-Jun-07 12:50)
Thanks for your enthusiastic response; that's the spirit in which I wrote the article. I updated the beginning of the article to make clear that emitting is possible; it was self-evident to me, but it may not be for other readers. Actually, emitting code seems more appropriate for real inlining. I don't see how you can do that with the macros; you always seem to have to call a function. So emitting saves the calling overhead, which could be notably faster for small fragments of code.

Also, I think it is fair to say that this is not really "my" way. I just used the available Mono macros, which must have been a lot of work to write, and I only noticed that they were portable to Windows CE by looking at the source code of an OpenGL ES implementation. If this helps people make use of the macros, that would be cool though. I wish you success with them.



-- modified at 20:01 Tuesday 5th June, 2007
