Click here to Skip to main content
15,946,063 members
Articles / Programming Languages / Assembler

Implementing a Universal Turing Machine With Only mov Instructions

Rate me:
Please Sign up or sign in to vote.
5.00/5 (14 votes)
5 Feb 2017CPOL5 min read 25.2K   134   13   4
UTM based on mov is Turing-complete paper x86 and x86-64


A paper by Stephen Dolan which proved that the x86 mov instruction is Turing-complete was released in 2013 and followed by a movfuscator project which made an entire compiler out of the project. See the paper: Inside the paper, a Universal Turing Machine (UTM) is proposed as the proof of the Turing-completeness using only a single jump instruction.

This code also serves as a reference for some details on how to achieve x86 and x86-64 conditional compilation, with both inline assembly and external assembly and some semantics unique perhaps to the Windows platform though applicable to any.


Knowing of models of computation, Turing-completeness, Universal Turing Machines, Turing Machines and the C and x86 assembly programming languages is preferable along with inline assembler and Windows technologies or knowledge of how to port them to other platforms such as Linux and Mac (which should not be too difficult).

Using the Code

2 extra Turing machine models one which is not general and state limited, the other which is not but both of which are input tape size limited are included in the download package for additional examples in both C and assembly formats.

The definition followed by the code are presented which allows for a full C emulation of the research paper. Some cheating using a cmp and je instructions to gracefully abort the code in C without crashing is used as the original research paper allowed a segmentation fault with bad memory address 0 access to halt.

Deductions based on the code had to be made to figure out the data structure and format of the Turing machine, as well as the input tape and how to properly initialize them. Defining registers based on looking at mutual exclusion, and frequency of use was done to do it in the lowest impact way. The C prolog before the assembly adapts a simple encoding of a Turing machine which is assumed to be in current state sorted order into a memory addressed encoded version.

The code compiles and runs successfully on Microsoft Visual Studio 2015.

First, a Turing Machine which does some computation must be codified based on having the following format:

//{old state, old symbol, new symbol, direction, new state}
char turingMachine[][5] = {
    { '1', '0', '0', 'R', '1' },
    { '1', '1', '1', 'R', '1' },
    { '1', '_', '_', 'R', '3' },
    { '3', '0', '0', 'R', '3' },
    { '3', '1', '1', 'R', '4' },
    { '3', '_', '_', 'R', '2' },
    { '4', '0', '0', 'R', '4' },
    { '4', '1', '1', 'R', '4' },
    { '4', '_', '_', 'L', '5' },
    { '5', '0', '1', 'L', '5' },
    { '5', '1', '0', 'L', '6' },
    { '6', '0', '0', 'L', '6' },
    { '6', '1', '1', 'L', '6' },
    { '6', '_', '_', 'L', '7' },
    { '7', '1', '0', 'L', '7' },
    { '7', '0', '1', 'R', '1' },
    { '7', '_', '1', 'R', '1' }
}; //must be sorted
char Input[] = "0100_1001_";

Image 1

This is a Turing machine to perform binary addition using a decrement, then increment until 0 strategy. The initial input is encoded as 4 and 5 so the result should be 9 and 0. It must be in sorted order by current state value so it can be processed so if not presorted then quick sort or another option can be used. The following function is used to do binary search lookups on the new state:

int TMCmp(const void* l, const void* r)
	if (*(unsigned char*)l == *(unsigned char*)r) return 0;
	return *(unsigned char*)l > *(unsigned char*)r ? 1 : -1;

For external assembler support which is required for x64:

#include <basetsd.h>
extern "C" void __cdecl MovTM

 And the rest of the code is as follows:

	//<pointer to trigger symbol> <pointer to next state> 
	//<trigger symbol> <pointer to new symbol> 
    //<new symbol> <pointer to direction> 
    //<direction> <pointer to pointer to new state> 
    //<pointer to new state>
	ULONG_PTR* TransitionTape = (ULONG_PTR*)malloc((sizeof(turingMachine) / 5) 
       * 9 * sizeof(ULONG_PTR)); //{old state, old symbol, new symbol, direction, new state}
	ULONG_PTR* InputTape = (ULONG_PTR*)malloc(sizeof(Input) * sizeof(ULONG_PTR) * 2);
	ULONG_PTR* InputTapeLeft = (ULONG_PTR*)malloc(sizeof(Input) * sizeof(ULONG_PTR) * 2);
	for (int i = 0; i < sizeof(turingMachine) / 5; i++) {
		TransitionTape[i * 9] = (ULONG_PTR)(TransitionTape + i * 9) + sizeof(ULONG_PTR) * 2;
		TransitionTape[i * 9 + 3] = (ULONG_PTR)(TransitionTape + i * 9) + sizeof(ULONG_PTR) * 4;
		TransitionTape[i * 9 + 5] = (ULONG_PTR)(TransitionTape + i * 9) + sizeof(ULONG_PTR) * 6;
		TransitionTape[i * 9 + 7] = (ULONG_PTR)(TransitionTape + i * 9) + sizeof(ULONG_PTR) * 8;
		char* ptr = (char*)bsearch(&turingMachine[i][4], turingMachine, 
                     sizeof(turingMachine) / 5, 5, TMCmp);
		int idx;
		if (ptr != NULL) {
			idx = (ptr - (char*)turingMachine) / 5;
			while (idx != 0 && turingMachine[idx - 1][0] == turingMachine[i][4]) idx--;
		TransitionTape[i * 9 + 8] = ptr == NULL ? (turingMachine[i][4] == '2' ? 
        (ULONG_PTR)InputTapeLeft : (ULONG_PTR)InputTapeLeft) : 
        (ULONG_PTR)(TransitionTape + (idx * 9));
		TransitionTape[i * 9 + 2] = turingMachine[i][1];
		TransitionTape[i * 9 + 4] = turingMachine[i][2];
		TransitionTape[i * 9 + 6] = turingMachine[i][3] == 'R' ? sizeof(ULONG_PTR) : 0;
		TransitionTape[i * 9 + 1] = ((i == sizeof(turingMachine) / 5 - 1) || 
           turingMachine[i + 1][0] != turingMachine[i][0]) ? (ULONG_PTR)InputTapeLeft : 
           (ULONG_PTR)(TransitionTape + (i + 1) * 9);
	for (int i = 0; i < sizeof(Input); i++) {
		InputTape[i * 2] = Input[i];
		InputTapeLeft[i * 2] = '_';
		InputTape[i * 2 + 1] = (ULONG_PTR)(InputTape + (i + 1) * 2);
		InputTapeLeft[i * 2 + 1] = (ULONG_PTR)(InputTapeLeft + (i + 1) * 2);
	ULONG_PTR scratch[256]; // must be the size of the alphabet range
	ULONG_PTR TransitionTapePtr = (ULONG_PTR)TransitionTape;
	ULONG_PTR InputTapePtr = 
	(ULONG_PTR)InputTape + sizeof(ULONG_PTR) * 2; //part of tape right of current position,
                                // initially at beginning of input tape after current symbol
	ULONG_PTR InputTapeLeftPtr = 
	(ULONG_PTR)InputTapeLeft; //part of tape left of 
                                  //current position, initially empty
	                              //temporary registers, 
	                              //4 of which are mutually exclusive in usage
MovTM((ULONG_PTR)TransitionTapePtr, (ULONG_PTR)InputTape, 
(ULONG_PTR)InputTapeLeft, (ULONG_PTR)scratch, (ULONG_PTR)InputTapePtr, (ULONG_PTR)InputTapeLeftPtr);
#if !defined(_WIN64)	
	for (int i = 0; i < sizeof(Input); i++) {
		InputTape[i * 2] = Input[i];
		InputTapeLeft[i * 2] = '_';
		InputTape[i * 2 + 1] = (ULONG_PTR)(InputTape + (i + 1) * 2);
		InputTapeLeft[i * 2 + 1] = (ULONG_PTR)(InputTapeLeft + (i + 1) * 2);
#define ADDRPLIER 4
#define SIZEWORD dword
#define X eax
#define Y ebx
#define M ecx
#define Z Y
#define D Z
#define H M
//5 globally used registers
#define L edi
#define R L
#define T R
#define S edx
#define N esi
	__asm {
		mov T, TransitionTapePtr;; current state, initially Q0
		mov S, InputTape;; current symbol, initially Q0
		mov N, InputTapeLeft;; scratch part of tape left of current position, initially empty

		mov X, [T];; get transition
		mov X, [X];; get trigger symbol
		mov Y, [S];; get current symbol
		mov SIZEWORD ptr [scratch + Y * ADDRPLIER], 0;; compare symbols
		mov SIZEWORD ptr [scratch + X * ADDRPLIER], 1 * ADDRPLIER
		mov M, SIZEWORD ptr [scratch + Y * ADDRPLIER]

		mov X, [T];; get transition
		mov X, [X + 1 * ADDRPLIER];; skip trigger symbol
		mov X, [X];; load new symbol
		mov Y, [S];; load old symbol
		mov [N], Y;; select between X and Y
		mov [N + 1 * ADDRPLIER], X
		mov Z, [N + M]
		mov [S], Z;; write the new symbol

		mov D, [T];; get transition
		mov D, [D + 1 * ADDRPLIER];; skip trigger symbol
		mov D, [D + 1 * ADDRPLIER];; skip new symbol
		mov D, [D];; load direction

			mov R, InputTapePtr
		mov [N], R;; select new value for[S + 1]
			mov L, InputTapeLeftPtr
		mov [N + 1 * ADDRPLIER], L
		mov X, [N + D]
		mov [S + 1 * ADDRPLIER], X
		mov [N], L;; select new value for L
		mov [N + 1 * ADDRPLIER], S
		mov L, [N + D]
			mov InputTapeLeftPtr, L
		mov [N], S;; select new value for R
			mov R, InputTapePtr
		mov [N + 1 * ADDRPLIER], R
		mov R, [N + D]
			mov InputTapePtr, R

		mov SIZEWORD ptr [N], 1 * ADDRPLIER;; set X = not D
		mov SIZEWORD ptr [N + 1 * ADDRPLIER], 0
		mov X, [N + D]
		mov [N], X;; select between D and X
		mov [N + 1 * ADDRPLIER], D
		mov D, [N + M]

			mov L, InputTapeLeftPtr
		mov [N], L;; select new value of S
			mov R, InputTapePtr
		mov [N + 1 * ADDRPLIER], R
		mov S, [N + D]
		mov X, [S + 1 * ADDRPLIER];; get new start of L or R
		mov [N], X;; select new value for L
			mov L, InputTapeLeftPtr
		mov [N + 1 * ADDRPLIER], L
		mov L, [N + D]
			mov InputTapeLeftPtr, L
			mov R, InputTapePtr
		mov [N], R;; select new value for R
		mov [N + 1 * ADDRPLIER], X
		mov R, [N + D]
			mov InputTapePtr, R

			mov T, TransitionTapePtr
		mov X, [T + 1 * ADDRPLIER];; get next transition of this state
		mov Y, [T];; get current transition
		mov Y, [Y + 1 * ADDRPLIER];; skip trigger symbol
		mov Y, [Y + 1 * ADDRPLIER];; skip new symbol
		mov Y, [Y + 1 * ADDRPLIER];; skip direction
		mov Y, [Y];; load transition list of next state
		mov [N], X;; select next transition
		mov [N + 1 * ADDRPLIER], Y
		mov T, [N + M]
			mov TransitionTapePtr, T

		mov X, [T]
		mov SIZEWORD ptr [N], 1 * ADDRPLIER
		mov SIZEWORD ptr [T], 0
		mov H, [N]
		mov [T], X

		mov SIZEWORD ptr [N], 0;; select between 0 and N
		mov [N + 1 * ADDRPLIER], N
		mov X, [N + H]
			cmp X, 0
			je finish
		mov X, [X];; load from 0 or N
		jmp start

x86 Pre-Link Event command (safeseh likely only needed on the release build): ml.exe /safeseh /Fo "$(OutDir)tm.obj" /c tm.asm

x64 Pre-Link Event command: ml64.exe /DX64 /Fo "$(OutDir)tm.obj" /c tm.asm

x86/64 Linker Input Additional Dependency: $(OutDir)\tm.obj

x64 Post-Build Event for debug build only (C runtime issue for assembly linkage): copy "$(MSBuildProgramFiles32)\Microsoft SDKs\Windows Kits\10\ExtensionSDKs\Microsoft.UniversalCRT.Debug\10.0.14393.0\Redist\Debug\x64\ucrtbased.dll" "$(OutDir)" copy "$(MSBuildProgramFiles32)\Microsoft Visual Studio 14.0\VC\redist\debug_nonredist\x64\Microsoft.VC140.DebugCRT\vcruntime140d.dll" "$(OutDir)"

Assembly x64 prologue:

ifndef X64
.model flat, C
option casemap :none


ifndef X64


;eax, ecx, edx are considered volatile
MovTM PROC USES ebx edi esi TransitionTapePtr:DWORD, InputTape:DWORD, InputTapeLeft: DWORD, 
scratch: DWORD, InputTapePtr: DWORD, InputTapeLeftPtr: DWORD

scratch TEXTEQU <r9>
TransitionTapePtr TextEQU <T>
InputTape TextEQU <S>
InputTapeLeft TextEQU <N>

;rax, rcx, rdx, r8, r9, r10, r11 are considered volatile
;x64 calling convention passes arguments fastcall: RCX, RDX, R8, R9, on stack 
;but using default argument scheme uses stack and without first 4 indexes them incorrectly
;PROC -> DWORD PTR[rsp + 40] also could work here to use a more appropriate esp based indexing 
;instead of ebp

MovTM PROC USES rbx rdi Dummy1: QWORD, Dummy2: QWORD, Dummy3: QWORD, 
Dummy4: QWORD, InputTapePtr: QWORD, InputTapeLeftPtr: QWORD

The code inside the __asm {} block above should be put in between followed by this assembly x64 epilogue:



For the sake of simplicity of understanding the dual linked list stack and memory pointer transition table constructs, the following C pseudo code which introduces control flow simplification represents the general functionality:

    ULONG_PTR CurInputTapePtr = (ULONG_PTR)InputTape;
	TransitionTapePtr = (ULONG_PTR)TransitionTape;
	InputTapePtr = (ULONG_PTR)InputTape + sizeof(ULONG_PTR) * 2;
	InputTapeLeftPtr = (ULONG_PTR)InputTapeLeft;
	do {
		if (*(ULONG_PTR*)(*(ULONG_PTR*)TransitionTapePtr) == *(ULONG_PTR*)CurInputTapePtr) {
			*(ULONG_PTR*)CurInputTapePtr = *(ULONG_PTR*)(*(ULONG_PTR*)
            (*(ULONG_PTR*)TransitionTapePtr + sizeof(ULONG_PTR)));
			if (*(ULONG_PTR*)(*(ULONG_PTR*)(*(ULONG_PTR*)(*(ULONG_PTR*)TransitionTapePtr + 
            sizeof(ULONG_PTR)) + sizeof(ULONG_PTR))) == 0) {
				*(ULONG_PTR*)(CurInputTapePtr + sizeof(ULONG_PTR)) = InputTapePtr;
				InputTapePtr = CurInputTapePtr;
				CurInputTapePtr = InputTapeLeftPtr;
				InputTapeLeftPtr = *(ULONG_PTR*)(CurInputTapePtr + sizeof(ULONG_PTR));
			} else {
				*(ULONG_PTR*)(CurInputTapePtr + sizeof(ULONG_PTR)) = InputTapeLeftPtr;
				InputTapeLeftPtr = CurInputTapePtr;
				CurInputTapePtr = InputTapePtr;
				InputTapePtr = *(ULONG_PTR*)(CurInputTapePtr + sizeof(ULONG_PTR));
			TransitionTapePtr = *(ULONG_PTR*)(*(ULONG_PTR*)(*(ULONG_PTR*)(*(ULONG_PTR*)
                                (*(ULONG_PTR*)TransitionTapePtr + sizeof(ULONG_PTR)) + 
                                sizeof(ULONG_PTR)) + sizeof(ULONG_PTR)));
		} else {
			TransitionTapePtr = *(ULONG_PTR*)(TransitionTapePtr + sizeof(ULONG_PTR));
	} while (TransitionTapePtr != (ULONG_PTR)InputTapeLeft);

Points of Interest

The data structure of the Turing machine encoding format which uses memory pointers and lists of state transitions within a given state was deduced, and from the typical encoding derived with some simple C code. The linked list format of the input tape and its left part were also deduced and simulated.

2 errors in the research paper were found, namely around here, we see R was improperly placed instead of L: mov [N], X ;; select new value for L \\ mov [N+1], R \\ mov L, [N + D] and we see that 'H is 1, we must halt the machine' but then in fact the code selects to halt if H is 0 not 1. To correct it in the preceding code, we swap the 0 and 1, mov [N], 0 \\ mov [T], 1 Now if H is 0, we must halt the machine which is done with 2 cheating instructions, a compare and conditional jump simply for successful simulation in C with graceful termination.

Research papers can have small mistakes which one must carefully peruse and spend time fixing. We have tried to remain as exactly faithful to the pseudo code of the paper as possible while only adjusting for register additions, transition from C, and graceful termination which is noted with tabification.

The x86 limited number of registers requires looking for mutual exclusivity and saving or reloading them in a careful manner. The esp and ebp should not be used especially from C where recovering them or accessing the data would become more difficult, though with great effort and care, it is indeed possible. 64-bit code has access to the r8-r15 registers making it possible to have a register for every needed variable. The calling semantics are peculiar and must be very specially handled.

An example of an equivalent UTM in C, as well as a 64-bit conditionally compilable version make a more general and well understood implementation.

Conclusion is that indeed the research paper UTM which although more theoretical and not practically implemented as much as the result of the paper and the ability to use mov instructions for obfuscation, is correct.


  • Initial version
  • x64 and external assembly file support


This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Written By
Software Developer (Senior)
United States United States
This member has not yet provided a Biography. Assume it's interesting and varied, and probably something to do with programming.

Comments and Discussions

GeneralI always like this kind of article! Pin
Southmountain1-Apr-17 10:06
Southmountain1-Apr-17 10:06 
GeneralMy vote of 5 Pin
Hans Flocken30-Jan-17 11:37
Hans Flocken30-Jan-17 11:37 
PraiseA very interesting idea! Pin
Daniel Pfeffer29-Jan-17 4:24
professionalDaniel Pfeffer29-Jan-17 4:24 
GeneralMy vote of 5 Pin
Frank Lindecke24-Jan-17 6:25
Frank Lindecke24-Jan-17 6:25 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.