Introduction
About WMLScript
Today, more and more cell phones and other mobile clients support WAP browsing. To extend what the browser side can do, many WAP browsers now support WMLScript. The WMLScript language is based on ECMAScript [ECMA262], but it has been modified to better support low-bandwidth communication and thin clients. WMLScript can be used not only with WML but also as a standalone tool.
One of the main differences between ECMAScript and WMLScript is that WMLScript has a defined bytecode and an interpreter reference architecture, which can give better performance in narrowband and small-memory environments. To make the language smaller and easier to compile into bytecode, many advanced features of ECMAScript have been dropped. For example, WMLScript is a procedural language, and it supports locally installed standard libraries.
What DWmlsc can do
Since WMLScript can be compiled into bytecode (usually stored with the file extension .wmlsc), we sometimes need to decompile the bytecode to view the source. So I wrote a tool named "DWmlsc" to do this job.
Background
Resources about WML and WMLScript can be found at:
Implementation
DWmlsc is an MFC SDI program. When a user opens a .wmlsc file, this function is called:
void CDWmlscDoc::Serialize(CArchive& ar)
{
    if(ar.IsStoring() == FALSE)
    {
        // Loading: read the whole .wmlsc file into a buffer.
        m_codeLen = ar.GetFile()->GetLength();
        m_binCodeBuf = new BYTE[m_codeLen];
        ar.Read(m_binCodeBuf,m_codeLen);
        m_result_code.RemoveAll();
        // Decompile the buffer into a list of text lines.
        if(DeCompile(m_binCodeBuf,m_codeLen,m_result_code) == false)
        {
            MessageBox(NULL,"Sorry, decompile failed!","Error",MB_OK|MB_ICONSTOP);
            return;
        }
        SetModifiedFlag(TRUE);
    }
}
In the function Serialize, I read the whole file into a buffer and then call the core function:
bool DeCompile(BYTE *bin_code,int len,CList<CString,CString&> &result);
The output parameter result is used to store the decompilation result.
The WMLScript bytecode consists of the following sections: HeadInfo, ConstantPool, PragmaPool and FunctionPool. (Please refer to the WMLScript specification.)
The function DeCompile reads and parses the file into these parts:
- The information read from ConstantPool is stored in a list, g_ConstTable.
- The information in PragmaPool is mostly ignored.
- The information in FunctionPool is stored in a list, g_FuncTable.
Now we can start to decompile the bytecode of each function. Each entry of g_FuncTable is a Function structure:
struct Function
{
    BYTE findex;
    CString func_name;
    BYTE arg_num;
    BYTE lvar_num;
    unsigned int func_size;
    BYTE *CodeArray;
};
A loop walks through each function's code and calls TransCode, which does the real decompiling job:
i = 0;
while(i < func.func_size)
{
    int n = TransCode(func.CodeArray + i, i,func.arg_num);
    if(n < 0)
    {
        return false;
    }
    CString code;
}
The function TransCode translates the bytecode into textual instructions:
int TransCode(BYTE *data,int addr,int arg_num)
To make the bytecode smaller, WMLScript uses the "inline parameters" technique: some instructions pack a small parameter into the opcode byte itself. In the signatures below, the X bits select the instruction and the P bits hold the parameter.
Signature | Available Instructions | Used for                                       |
1XXPPPPP  | 4                      | JUMP_FW_S, JUMP_BW_S, TJUMP_FW_S, LOAD_VAR_S   |
010XPPPP  | 2                      | STORE_VAR_S, LOAD_CONST_S                      |
011XXPPP  | 4                      | CALL_S, CALL_LIB_S, INCR_VAR_S                 |
00XXXXXX  | 63                     | The rest of the instructions                   |
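The signature table can be turned directly into bit masks. The following is a minimal sketch, not the code DWmlsc actually uses: the names OpClass, DecodedOp and split_opcode are made up for illustration. It only splits an opcode byte into its class, selector bits and inline parameter; mapping a selector to a concrete instruction name is left to the specification.

```cpp
#include <cassert>
#include <cstdint>

// Which row of the signature table an opcode byte falls into (hypothetical names).
enum class OpClass { Param5, Param4, Param3, Plain };

struct DecodedOp
{
    OpClass  cls;       // signature class
    unsigned selector;  // the X bits: which instruction inside the class
    unsigned param;     // the P bits: the inline parameter (0 for Plain)
};

// Split one opcode byte according to the signature table.
DecodedOp split_opcode(std::uint8_t b)
{
    if (b & 0x80)                  // 1XXPPPPP: 2 selector bits, 5 parameter bits
        return { OpClass::Param5, (b >> 5) & 0x03u, b & 0x1Fu };
    if ((b & 0xE0) == 0x40)        // 010XPPPP: 1 selector bit, 4 parameter bits
        return { OpClass::Param4, (b >> 4) & 0x01u, b & 0x0Fu };
    if ((b & 0xE0) == 0x60)        // 011XXPPP: 2 selector bits, 3 parameter bits
        return { OpClass::Param3, (b >> 3) & 0x03u, b & 0x07u };
    return { OpClass::Plain, b & 0x3Fu, 0u };   // 00XXXXXX: plain 6-bit opcode
}

// Small self-check of the bit arithmetic.
void check_split_opcode()
{
    DecodedOp d = split_opcode(0xE3);           // 111 00011
    assert(d.cls == OpClass::Param5 && d.selector == 3 && d.param == 3);
    d = split_opcode(0x09);                     // 00 001001
    assert(d.cls == OpClass::Plain && d.selector == 9 && d.param == 0);
}
```

Because the parameter lives in the opcode byte, instructions in the first three classes take one byte in total, which is where the size saving comes from.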
TransCode parses these "inline parameter" instructions with an if/else chain.
The other 63 instructions are parsed by indexing the array Instruction InArray[]:
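const int ins_count = sizeof(InArray)/sizeof(InArray[0]);
if(op_code >= ins_count)
{
    return -1;  // unknown opcode
}
Instruction *ip = InArray + op_code;
if(ip->parser == NULL)
{
    // No operands: the mnemonic alone is the whole instruction.
    sprintf(tmp,"%s",ip->ins_name);
}
else
{
    // The parser decodes the operands and returns the instruction length.
    int n = ip->parser(data,addr,arg_num);
    i = i + n - 1;
}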
What, then, is InArray? See this:
Instruction InArray[] =
{
    {"",           NULL},
    {"JUMP_FW",    JUMP_FW},
    {"JUMP_FW_W",  JUMP_FW_W},
    {"JUMP_BW",    JUMP_BW},
    {"JUMP_BW_W",  JUMP_BW_W},
    {"TJUMP_FW",   TJUMP_FW},
    {"TJUMP_FW_W", TJUMP_FW_W},
    {"TJUMP_BW",   TJUMP_BW},
    {"TJUMP_BW_W", TJUMP_BW_W},
    {"CALL",       CALL},
    {"CALL_LIB",   CALL_LIB},
    // ... (the remaining entries follow the same pattern)
};
JUMP_FW, JUMP_FW_W, etc. are all function pointers of type parser_t (see decompiler.h):
typedef int (* parser_t)(BYTE *data,int addr,int arg_num);
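To see the pieces work together, here is a small self-contained sketch of the same table-driven idea. It is a toy, not the DWmlsc source: DemoArray, parse_u8_operand, dispatch and the DEMO_ mnemonics are invented for illustration, and I assume here that a parser returns the total number of bytes the instruction occupies.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

typedef unsigned char BYTE;
typedef int (*parser_t)(BYTE *data, int addr, int arg_num);

struct Instruction
{
    const char *ins_name;  // mnemonic printed in the listing
    parser_t    parser;    // operand parser, or NULL when there are no operands
};

// Hypothetical parser: one 8-bit operand follows the opcode byte.
static int parse_u8_operand(BYTE *data, int addr, int arg_num)
{
    (void)arg_num;  // unused in this toy parser
    std::printf("%04d: DEMO_JUMP %u\n", addr, (unsigned)data[1]);
    return 2;       // opcode byte + one operand byte
}

// A toy table indexed by opcode; names and opcodes are placeholders.
static Instruction DemoArray[] =
{
    {"",          NULL},
    {"DEMO_JUMP", parse_u8_operand},
    {"DEMO_NOP",  NULL},
};

// Returns the number of bytes consumed, or -1 for an unknown opcode.
int dispatch(BYTE *data, int addr, int arg_num)
{
    const std::size_t count = sizeof(DemoArray) / sizeof(DemoArray[0]);
    BYTE op_code = data[0];
    if (op_code >= count)
        return -1;
    const Instruction *ip = DemoArray + op_code;
    if (ip->parser == NULL)
    {
        std::printf("%04d: %s\n", addr, ip->ins_name);
        return 1;   // opcode byte only
    }
    return ip->parser(data, addr, arg_num);
}

// Small self-check: a two-byte instruction followed by a one-byte one.
void check_dispatch()
{
    BYTE code[] = { 1, 7, 2 };
    assert(dispatch(code, 0, 0) == 2);
    assert(dispatch(code + 2, 2, 0) == 1);
}
```

Adding an instruction then means adding one table row and, if it has operands, one parser function; the lookup code itself never changes.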
The program finds the parsing function for an instruction by indexing InArray. Simple, and very fast.
Points of Interest
Multi-byte Integer Format
In many places, the bytecode uses the "Multi-byte Integer Format" to represent an integer.
A multi-byte integer consists of a series of octets, where the most significant bit is the continuation flag and the remaining seven bits are a scalar value. The continuation flag is used to indicate that an octet is not the end of the multibyte sequence. A single integer value is encoded into a sequence of N octets. The first N-1 octets have the continuation flag set to a value of one (1). The final octet in the series has a continuation flag value of zero.
The remaining seven bits in each octet are encoded in big-endian order, i.e., the most significant bit first. The octets are also arranged in big-endian order, i.e., the most significant seven bits are transmitted first. Where the initial octet has fewer than seven bits of value, all unused bits must be set to zero (0).
For example, the integer value 0xA0 would be encoded with the two-byte sequence 0x81 0x20. The integer value 0x60 would be encoded with the one-byte sequence 0x60.
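The two examples can be reproduced with a small encoder. DWmlsc itself only needs to decode, so encode_mb_uint below is a hypothetical helper written just to illustrate the format.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: encode a value in the multi-byte integer format,
// most significant seven-bit group first.
std::vector<unsigned char> encode_mb_uint(unsigned int value)
{
    std::vector<unsigned char> out;
    // The final octet carries the low seven bits with the continuation flag clear.
    out.push_back(static_cast<unsigned char>(value & 0x7F));
    value >>= 7;
    // Every earlier octet carries seven more bits with the continuation flag set.
    while (value != 0)
    {
        out.insert(out.begin(), static_cast<unsigned char>(0x80 | (value & 0x7F)));
        value >>= 7;
    }
    return out;
}

// Small self-check using the two values from the text.
void check_encode_mb_uint()
{
    const std::vector<unsigned char> two_bytes = { 0x81, 0x20 };
    const std::vector<unsigned char> one_byte  = { 0x60 };
    assert(encode_mb_uint(0xA0) == two_bytes);
    assert(encode_mb_uint(0x60) == one_byte);
}
```

For 0xA0 (binary 1 0100000), the low seven bits give 0x20 and the remaining bit gives 0x01, which becomes 0x81 once the continuation flag is set.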
The function get_mb_uint helps us to decode a multi-byte integer:
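unsigned int get_mb_uint(BYTE *data,int len,int &k)
{
    unsigned int r = 0;
    int i = 0;
    for(i=0;i<len;i++)
    {
        BYTE b = data[i];
        r = (r << 7) | (b & 0x7F);  // append the low seven bits
        if( (b & 0x80)==0 )         // continuation flag clear: last octet
        {
            break;
        }
    }
    k = k + i + 1;                  // advance the caller's offset past the integer
    return r;
}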
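A typical use is reading several fields in a row while one offset walks through the buffer. The sketch below repeats get_mb_uint so that it compiles on its own; read_two is a hypothetical helper, not part of DWmlsc.

```cpp
#include <cassert>

typedef unsigned char BYTE;

// Same logic as get_mb_uint above, repeated to keep the example self-contained.
unsigned int get_mb_uint(BYTE *data, int len, int &k)
{
    unsigned int r = 0;
    int i = 0;
    for (i = 0; i < len; i++)
    {
        BYTE b = data[i];
        r = (r << 7) | (b & 0x7F);
        if ((b & 0x80) == 0)
            break;
    }
    k = k + i + 1;
    return r;
}

// Hypothetical helper: decode two consecutive multi-byte integers,
// advancing 'offset' past each one.
void read_two(BYTE *buf, int len, unsigned int &a, unsigned int &b, int &offset)
{
    a = get_mb_uint(buf + offset, len - offset, offset);
    b = get_mb_uint(buf + offset, len - offset, offset);
}

// Small self-check: 0x81 0x20 decodes to 0xA0, then 0x60 decodes to 0x60.
void check_read_two()
{
    BYTE buf[] = { 0x81, 0x20, 0x60 };
    unsigned int a = 0, b = 0;
    int offset = 0;
    read_two(buf, 3, a, b, offset);
    assert(a == 0xA0 && b == 0x60 && offset == 3);
}
```

Note that if the buffer ends before an octet with a clear continuation flag is found, the offset ends up one past len, so a caller that may see truncated files should compare the offset against the buffer length afterwards.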
Name Translation of WMLScript Standard Libraries
WMLScript bytecode uses "lib index" and "func index" to identify which standard library function is to be called.
char * make_call_name(int lindex,int findex);
make_call_name looks up the library index and function index in an internal string table and returns the matching name. Users can then read the library and function names directly in the decompiled text instead of checking the documentation.
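One possible shape for such a lookup is sketched below. This is not the DWmlsc implementation: call_name, LibEntry and the tables are invented for illustration, and the table is deliberately partial. The indices shown (Lang as library 0 with abs, min, max as functions 0 to 2; Dialogs as library 5 with prompt, confirm, alert as functions 0 to 2) are my reading of the WMLScript Standard Libraries specification and should be checked against it before being relied on.

```cpp
#include <cassert>
#include <string>

// One row of a hypothetical name table.
struct LibEntry
{
    int                lindex;      // library index used in the bytecode
    const char        *name;        // library name
    const char *const *funcs;       // function names, indexed by function index
    int                func_count;  // number of entries in funcs
};

// Partial, illustrative tables; verify the indices against the specification.
static const char *const kLangFuncs[]    = { "abs", "min", "max" };
static const char *const kDialogsFuncs[] = { "prompt", "confirm", "alert" };

static const LibEntry kLibs[] =
{
    { 0, "Lang",    kLangFuncs,    3 },
    { 5, "Dialogs", kDialogsFuncs, 3 },
};

// Returns "Library.function", or a numeric placeholder when the pair is unknown.
std::string call_name(int lindex, int findex)
{
    for (const LibEntry &lib : kLibs)
    {
        if (lib.lindex == lindex && findex >= 0 && findex < lib.func_count)
            return std::string(lib.name) + "." + lib.funcs[findex];
    }
    return "lib" + std::to_string(lindex) + ".func" + std::to_string(findex);
}

// Small self-check of a known pair and the fallback.
void check_call_name()
{
    assert(call_name(0, 2) == "Lang.max");
    assert(call_name(9, 1) == "lib9.func1");
}
```

Falling back to a numeric placeholder keeps the listing readable even when the bytecode calls a library function the table does not know about.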
Summary
Currently, DWmlsc can only decompile the bytecode into a "WMLScript assembly language". In the future, I will enhance it to decompile bytecode into WMLScript source, to make it a real "decompiler" :).
I am a Chinese programmer. I am interested in compilers, operating systems, program debugging, mobile device programming...
In 2001, after I graduated from JiNan University (www.jnu.edu.cn), I joined NetEase (www.163.com) and became an online-game programmer.
Welcome to visit my blog:
http://spaces.msn.com/members/AAMissile/