DirectX and Pure Assembly Language: Doing What Can't be Done - Part III

CMalcheski

4.98/5 (19 votes)

Jul 10, 2017

CPOL

25 min read

15384

App Initialization and Main Window Creation

[Apologies for the misformatted code snippets that appear sporadically in this article. I've been over these, deleted then retyped them, redone the formatting, all to no avail. As near as I can tell, this appears to be a bug in the CodeProject HTML - if not, I'm just not seeing what the issue is. The line breaks are not present when I submit the article. They show up afterward; horizontal scrolling in the snippets turns off and the line breaks appear out of nowhere.]

The source code for this article can be downloaded from http://www.starjourneygames.com/demo part 3.zip

Beginning the Main Module

Unlike 32-bit assembly, relevant directives (such as .686P) are few and far between. I have not yet had occasion to use one. With this being the case, the main module begins with the following:

     include     constants.asm                           ;
     include     externals.asm                           ;
     include     macros.asm                              ;
     include     structuredefs.asm                       ;
     include     wincons.asm                             ;

     .data                                               ;

     include     lookups.asm                             ;
     include     riid.asm                                ;
     include     routers.asm                             ;
     include     strings.asm                             ;
     include     structures.asm                          ;
     include     variables.asm                           ;

     .code                                               ;

With this, an adjustment is made to the main source file’s beginning so that utility files, such as typedefs, structure declarations, and macros, come before the .data directive. The files that contain no code or data (those appearing before the .data directive) define macros, structures, etc. for the compiler. They generate no actual output, so these can be placed before the .data directive, where the compiler doesn't yet have any idea where to place generated output and thus would reject actual code or data.

The linker will set the entry point (discussed next) at the start of the .code section, so the ordering of the code and data sections is irrelevant. This app places data first.

Each of the include files listed above is self-explanatory as to its content. The accompanying source code contains the complete files.

The Windows Entry Point

An application’s WinMain function isn’t really its entry point. When an assembly language application declares its code segment with

.code

the first executable instruction after .code becomes the app’s true entry point. WinMain is purely superfluous and is not actually required at all. I don’t use it; in an assembly language application it provides no benefit over creating the app without it. While one could argue that the parameters passed to WinMain might be critical to initializing an application, that argument is zero sum when it comes to an assembly app because you, the developer, need to set up that call to WinMain if you’re going to use it. If you have to retrieve all the information typically sent to WinMain before calling it, why use the function at all?

That said, there is certainly no measurable harm in including WinMain if you want to do it. You may have sound reasons for wanting to include it, so add it if you feel it’s necessary. Just be aware that your own startup code – which begins execution immediately after the .code statement – must manually set up the WinMain call.

To call WinMain, the nCmdShow parameter is retrieved from the wShowWindow field of the STARTUPINFO structure, which is passed to GetStartupInfo.

GetCommandLine returns the lpCmdLine parameter; pass that value as-is to WinMain.

hPrevInstance is always null, per MSDN documentation for WinMain.

hInstance is retrieved by calling GetModuleHandle (0).

Below is the complete initialization source for calling WinMain. Note that even if you're not going to include WinMain, you may still need some or all of the parameters that are passed to it. The discussion that follows covers retrieval of this information, with or without implementation of WinMain.

Declare the STARTUPINFO structure in structuredefs.asm (or wherever you prefer to put it):

STARTUPINFO         struct
cb                  qword     sizeof ( STARTUPINFO )         
lpReserved          qword     ?         
lpDesktop           qword     ?         
lpTitle             qword     ?         
dwX                 dword     ?         
dwY                 dword     ?         
dwXSize             dword     ?         
dwYSize             dword     ?         
dwXCountChars       dword     ?         
dwYCountChars       dword     ?         
dwFillAttribute     dword     ?         
dwFlags             dword     ?         
wShowWindow         word      ?         
cbReserved2         word      3 dup ( ? )
lpReserved2         qword     ?         
hStdInput           qword     ?         
hStdOutput          qword     ?         
hStdError           qword     ?         
STARTUPINFO         ends

In the externals.asm file, declare the functions to be called for initialization:

extrn              __imp_GetCommandLineA:qword
GetCommandLine     textequ     <__imp_GetCommandLineA>

extrn              __imp_GetModuleHandleA:qword
GetModuleHandle    textequ     <__imp_GetModuleHandleA>

extrn              __imp_GetStartupInfoA:qword
GetStartupInfo     textequ     <__imp_GetStartupInfoA>

If you’re using Unicode, you should declare GetCommandLineW, GetModuleHandleW, and GetStartupInfoW instead of the “A” functions shown above.

With the required functions now a known quantity to the compiler, variables need to be declared for holding the parameters to pass to WinMain – these are placed in the file variables.asm:

hInstance          qword     ?
lpCmdLine          qword     ?

In the file structures.asm, declare the STARTUPINFO structure:

startup_info STARTUPINFO <> ; cbSize is already set in the structure declaration

The entry point to the application can then be coded as follows (Startup can be renamed to anything you like - if you're going to call it WinMain, be aware that it doesn't inherently conform to the documentation for WinMain):

.code

align          qword
Startup        proc                          ; Declare the startup function; this is declared as /entry in the linker command line

local          holder:qword                  ; Required for the WinCall macro

xor            rcx, rcx                      ; The first parameter (NULL) always goes into RCX
WinCall        GetModuleHandle, 1, rcx       ; 1 parameter is passed to this function
mov            hInstance, rax                ; RAX always holds the return value when calling Win32 functions

WinCall        GetCommandLine, 0             ; No parameters on this call
mov            lpCmdLine, rax                ; Save the command line string pointer

lea            rcx, startup_info             ; Set lpStartupInfo
WinCall        GetStartupInfo, 1, rcx        ; Get the startup info
xor            rax, rax                      ; Zero all bits of RAX
mov            ax, startup_info.wShowWindow  ; Get the incoming nCmdShow

; Since this is the last "setup" call, there is no reason to place nCmdShow into a memory variable then
; pull it right back out again to pass in a register.  Register-to-register moves are exponentially
; faster than memory access, so all that needs to be done is to move RAX into R9 for the call to WinMain.

mov            r9, rax                       ; Set nCmdShow
mov            r8, lpCmdLine                 ; Set lpCmdLine
xor            rdx, rdx                      ; Zero RDX for hPrevInst
mov            rcx, hInstance                ; Set hInstance
call           WinMain                       ; Execute call to WinMain

xor            rax, rax                      ; Zero final return – or use the return from WinMain
ret                                          ; Return to caller

Startup        endp                          ; End of startup procedure

NOTE: If you’re strictly adhering to 64-bit calling convention, then the startup code’s call to WinMain should use the WinCall macro and not a direct call instruction. I don’t do this in my apps, and I won’t be doing it in this sample code. Stack usage means memory usage, and my first rule of programming is to avoid memory hits. When accessing the shadow area on the stack for RCX, RDX, R8, and R9 during a call, indirect addressing must be used, which further slows the app. As mentioned earlier, the in trepidation over violating the 64-bit calling convention may be strong. However it will still be unfounded as far as necessity is concerned. I simply see no benefit to using that convention within my app’s local functions – it requires a relatively large amount of setup code, which, across an entire app, takes too much time to execute; it costs memory, and it adds to development time. Any function declared within my app (including this one) does not use the 64-bit calling convention directly – parameters are still passed in RCX, RDX, R8, and R9 for the first four, but space for shadowing them is not reserved on the stack. For additional parameters, I simply use other registers. The stack is not used for any parameter data. Some will call this reckless, but “reckless” would only be relative to the standard being compared to.

Further, in my own apps, calls to local functions use a single pointer (in RCX) pointing to a structure that holds all the parameters to be passed to that function. Doing things any other way, the first thing most local functions would need to do would be to save the incoming parameters in local variables for later access; from the CPU’s perspective this is little different from using the 64-bit convention as it was created. I’m not doing that here; individual registers carry the parameters into local functions as they’re called. The reason for this is that from a tutorial standpoint it’s much more confusing to work with a pointer to an array of parameters for every function, many of which will be pointers to other things. When is enough, enough? So it is again reiterated: modify the code as required to suit your own preferences.

The Almighty RAX

Born in the earliest recorded incarnations of Intel CPU design, the RAX register (building on EAX, which built on AX) always holds the return value from a function. I have not seen an exception to this rule in any WinAPI function or method, no matter what it is. Even drivers use it across the board. All functions within any application I write do the same, so it will apply here: no matter what you call, when you call it, or where you call it from, RAX holds the return value.

Registering the Window Class

To create the app’s main window, its window class has to be registered. This necessitates declaring the WNDCLASSEX structure, which in this sample code will be done in a separate file called structuredefs.asm. As always, you’re free to move things around and rename files as you see fit. However, at this point an adjustment is being made to the main source file’s beginning so that utility files, such as typedefs, structure declarations, and macros, come before the .data directive.

The following is placed in the structuredefs.asm file:

WNDCLASSEX        struct
cbSize            dword     ?
dwStyle           dword     ?                   
lpfnCallback      qword     ?                    
cbClsExtra        dword     ?
cbWndExtra        dword     ?
hInst             qword     ?
hIcon             qword     ?
hCursor           qword     ?
hbrBackground     qword     ?
lpszMenuName      qword     ?
lpszClassName     qword     ?
hIconSm           qword     ?
WNDCLASSEX        ends

Nesting structures will be covered when the subject of DirectX structures is delved into – when there are actual examples to work with.

Mind your D’s and Q’s – be careful when copying code to accurately type dword and qword declarations. One typo here could (and probably would) crash the app.

Strictly for the sake of keeping the source readable, I use a single constant to represent long chains of flags that are logically OR’d together. In this app, WNDCLASSEX.dwStyle will equate to:

classStyle equ CS_VREDRAW OR CS_HREDRAW OR CS_DBLCLKS OR CS_OWNDC OR CS_PARENTDC

(The CS_xxx constants come from the Windows header files; they're declared in wincons.asm in the source code attached to this article.)

The OR directive is reserved by the assembler; it functions the same as the | character in most other languages. I add the above line to my constants.asm file; you can place it (if you use it at all) wherever you like.

With this done, the actual WNDCLASSEX structure, which will be used to create the main window, can now be declared. In this sample code, it’s done in the structures.asm file. This is where all fixed data that can be is placed directly into the structure fields. There’s no reason to move data around any more than is required, so on-the-fly initialization where it isn’t required makes no sense at all. With WNDCLASSEX, the hInst, hIcon, hbrBackground, and hIconSm fields won’t be known until runtime, so they’re initialized at zero.

There are several options when declaring the actual data for the WNDCLASSEX structure (which will be named wcl). The most traditional form can be used:

wcl WNDCLASSEX <sizeof (WNDCLASSEX), [. . .]>

Using the format above, all fields must be accounted for if any are declared (however this will vary depending on the actual assembler being used).

The way data is declared in assembly language allows for some flexibility in declaring structures, and at least for me, this comes in handy. Declaring wcl as a label of type WNDCLASSEX allows me to list each field separately, while still having the debugger recognize the structure as WNDCLASSEX. In the source code, I prefer having access to the individual fields per-line; it keeps the source cleaner and makes things easier to read and update. Of course, doing things this way also opens the door to errors; if the field-by-field declaration doesn’t exactly match the WNDCLASSEX definition, there are going to be problems. So using this method may not be for everybody. If you don’t like it, just use the first form shown above.

wcl     label     WNDCLASSEX
        dword     sizeof ( WNDCLASSEX )  ; cbSize
        dword     classStyle             ; dwStyle
        qword     mainCallback           ; lpfnCallback
        dword     0                      ; cbClsExtra
        dword     0                      ; cbWndExtra
        qword     ?                      ; hInst
        qword     ?                      ; hIcon
        qword     ?                      ; hCursor
        qword     ?                      ; hbrBackground
        qword     mainName               ; lpszMenuName
        qword     mainClass              ; lpszClassName
        qword     ?                      ; hIconSm

The function mainCallback is the callback function for the window class; it will be discussed next. The variable mainClass is the class name string, which I define in the strings.asm file, along with the main window name (window text), as:

mainClass     byte     ‘DemoMainClass’, 0  ; Main window class name
mainName      byte     ‘Demo Window’, 0    ; Main window title bar text

Note that the terminating 0 must be declared at the end of the string. The assembler doesn’t automatically place it.

From this point forward, assume that any constant used in Win32 calls must be declared somewhere in your source. I put my Win32 constants in the file wincons.asm, and constants that are unique to my app in the file constants.asm.

To fill in hCursor, LoadImage is called as follows. In constants.asm, declare:

lr_cur equ LR_SHARED OR LR_VGACOLOR OR LR_DEFAULTSIZE

This is an optional step. If you prefer to use the values directly, simply replace lr_cur below with LR_SHARED OR LR_VGACOLOR OR LR_DEFAULTSIZE.

     xor                 r11, r11                                ; Set cyDesired; uses default if zero: XOR R11 with itself zeroes the register
     xor                 r9, r9                                  ; Set cxDesired; uses default if zero: XOR R9 with itself zeroes the register
     mov                 r8, image_cursor                        ; Set uType
     mov                 rdx, ocr_normal                         ; Set lpszName
     xor                 rcx, rcx                                ; Set hInstance to 0 for a global Windows cursor
     WinCall         LoadImage, 6, rcx, rdx, r8, r9, r11, lr_cur ; Load the standard cursor
     mov                 wcl.hCursor, rax                        ; Set wcl.hCursor

NOTE: many data structures requiring runtime initialization will be of dword, or smaller, size. You must pay close attention to this, as assembly will not stop you from writing a qword into a dword location. It’ll simply overwrite the next four bytes after the dword when that isn’t what you want to do. If wcl.hCursor were a dword, then the 32-bit EAX, not the 64-bit RAX, would be written there. If it were a 2-byte word, then the 16-bit AX would be written, and if it were a single 8-bit byte, then AL would be written.

There are countless references to Intel architecture registers online. Use them as needed. The hardware designers, as a predominant rule, are very consistent in their naming of things, so it won’t take long to memorize how the registers work. Just keep using them; the information will stick.

hInstance is assigned when the app starts up; anybody with any appreciable WinAPI experience knows that it’s used a lot. It’ll be required when initializing DirectX, among other things. It isn't used when loading the standard cursor because it's a stock object provided by Windows. If hInstance is specified, Windows will search the calling application for the resource. It won't find it, and the call will fail.

In the call above, the WinCall macro is invoked. LoadImage takes six parameters, so six parameters are specified. 64-bit calling convention requires that RCX, RDX, R8, and R9 hold the first four parameters, meaning these cannot be altered – other registers cannot be used in their place. R11 is arbitrary; any open register except RAX or R10 (which the WinCall macro uses internally) can be used to store the cyDesired parameter because it’s not one of the first four parameters. The WinCall macro will properly set up the stack for the call to LoadImage, caring nothing for which registers are used to carry values on entry into the macro. Just remember R10 is used by the macro itself as a taxi service so don't use it, or RAX, to give your parameters a sendoff into WinCall.

For setting the hIcon and hIconSm fields, LoadImage is used the same as shown above; just change the cxDesired and cyDesired parameters to the appropriate dimensions of the icon; uType is set to image_icon, and of course lpszName is set to the name of the resource as it’s declared in your resource file. hInstance must be set to the application's hInstance, since these resources are part of the application - they are not Windows-global. The resource file is compiled by RC.EXE so nothing changes from its format in any C++ application (or any other language that handles resource files the same as C++). The resource file lines for the icons are shown below:

LARGE_ICON   ICON   DISCARDABLE   “LargeIcon.ico”
SMALL_ICON   ICON   DISCARDABLE   “SmallIcon.ico”

In the assembly source, declare the names of the resources (I place these in the strings.asm file):

LargeIconResource     byte     ‘LARGE_ICON’, 0 ;
SmallIconResource     byte     ‘SMALL_ICON’, 0 ;

Pointers to the variables LargeIconResource and SmallIconResource are passed in turn to each of the two LoadImage calls that load the icons. The results are placed into wcl.hIcon and wcl.hIconSm respectively. These are qword fields so RAX, on return from LoadImage, is assigned to each. To actually load the pointers to the strings, use the assembly lea instruction. This is “load effective address;” it was designed to perform on-the-fly calculation of memory address, and it will be used to its full potential later in the app. For now, it simply loads the pointer with no calculation:

lea rdx, LargeIconResource ;

The reason this is necessary is that ML64.EXE eliminated the offset directive that used to be a part of Microsoft's assembly language. Had they not retired it, the line could simply be encoded as “mov rdx, offset LargeIconResource.” I have no idea why they did it; it’s another step backward in the eternal march toward abstraction de-evolution.

With all the above handled, the window class can now be declared:

lea     rcx, wcl                    ; Set lpWndClass
winCall RegisterWindowClass, 1, rcx ; Register the window class

The return value is not used by this app.

Registering the window class finally allows creation of the window that will serve as the DirectX render target. Win32 is Win32 (64 bits or otherwise) so logically the window creation process continues the same as it would in a C++ application.

The complete startup code is in the attached source for this article.

Your Window to the Future

This section will focus primarily on the callback function for the main window. The main difference between this app and the average C++ application is that here, the switch statement will not be used. I have always disliked that statement, finding it clunky, primitive, and quite inefficient (given that I look at everything from the viewpoint of what is actually executing on the CPU).

In place of switch, the CPU’s group of scan instructions is used. These are CPU-level instructions that scan a given array of bytes (, words, dwords, qwords) in memory until a match is found (or not). The value to scan for is always contained in RAX for qwords, EAX for dwords, AX for words and AL for bytes. RCX holds the number of qwords, etc. to scan, and RDI points to the location in memory to begin scanning.

One of the very convenient uses for the scan instructions is sizing a string. The following code performs this task:

mov rdi, <location to scan>    ; Set the location to begin scanning
mov rcx, -1                    ; Set the max unsigned value for the # of bytes to scan
xor al, al                     ; Zero AL as the value to scan for
repnz scasb                    ; Stop scanning when a match is encountered or the RCX counter reaches zero
not rcx                        ; Apply a logical NOT to the negative RCX count
dec rcx                        ; Adjust for overshot (the actual match of 0 is counted and needs to be undone)

Whatever is encountered in memory that stops the scan (if repnz is used, for “repeat while not zero [repeat while the CPU’s zero flag is clear]”), RDI will point at the next byte, word, dword, or qword after that value. In the sample code above, when the scan completes, RDI will point at the byte immediately after the string’s terminating zero. For Unicode strings, the code would look like this:

mov rdi, <location to scan>   ; Set the location to begin scanning
mov rcx, -1                   ; Set the max unsigned value for the # of bytes to scan
xor ax, ax                    ; Wide strings use a 16-bit word as a terminator
repnz scasw                   ; Scan words, not bytes
not rcx                       ; Apply a logical NOT to the negative RCX count
dec rcx                       ; Adjust for overshot

RCX then holds the length of the string. However, all good things come with a down side; if you forget to add the terminating 0 after a string (whether that string is wide or ANSI), the wrong size will be returned as the scan will continue until either RCX reaches 0 (that’s potentially a whole lot of bytes to scan), or the next 0 value is encountered somewhere in memory after the scan start position. Still, in such a situation, forgetting the terminating 0 is not going to end well regardless of the approach used to sizing the string.

In my apps, I typically precede all strings with a size qword so that I can simply lodsq that size qword into RAX, leaving RSI pointing at the string start. (All the lods? Instructions move data into RAX, EAX, AX, or AL.) If a string is static and isn’t going to change (i.e. the window class name), there is no point in sizing it at all at runtime, let alone multiple times, when the correct size can be set during compile.

You won’t always want to use this approach. If you have a limited size buffer and want to be sure you don’t scan beyond it, then you’ll have to set RCX to the buffer size. However doing this will force you to calculate the string size by either subtracting the ending count in RCX from the starting count, or by subtracting the ending pointer (plus one) from the starting pointer after the scan.

Call Me Right Back, K?

Regarding the window callback, a lookup table contains all the messages handled by the callback. This table always begins with a qword that holds the entry count. The offset into the table is then calculated, and the same offset into a corresponding router table holds the location of the code for handling that message.

     mov                 rax, rdx                                ; Set the value to scan (incoming message)
     lea                 rdi, message_table                      ; Point RSI @ the table start (entry count qword)
     mov                 rcx, [ rdi ]                            ; Load the entry count
     scasq                                                       ; Skip over the entry count qword
     mov                 rsi, rdi                                ; Save pointer @ table start
     repnz               scasq                                   ; Scan until match found or counter reaches 0
     jnz                 call_default                            ; No match; use DefWindowProc
     sub                 rdi, rsi                                ; Get the offset from the first entry of the table (not including entry count qword)
     lea                 rax, message_router                     ; Point RAX at the base of the router table
     call                qword ptr [ rax + rdi - 8 ]             ; Call the handler

For this app, only four messages will be initially handled: WM_ERASEBKGND, WM_PAINT, WM_CLOSE, and WM_DESTROY.

The handler for WM_ERASEBKGND does nothing but return TRUE (1) in RAX. Returning TRUE for this message tells Windows that all handling for the message is complete and nothing more needs to be done. (Relative to all Windows messages a callback might process, the value that your app must return is FALSE (0) far more often than not, but it’s never safe to assume – always check the documentation for each message handled; some require a very different and specific return value depending on how you handled the message. This is especially true in dialog boxes, in particular with NM_ notifications sent through WM_NOTIFY.)

There are cases where an application engaging in complex drawing operations may want to know when the background is being erased, but even these odd men out may no longer exist. The WM_ERASEBKGND message itself is an ancient artefact of a bygone era. In the earliest days of Windows, just drawing a stock window with a frame could seriously tax the graphics adapter. As such, a slew of tricks and compensations had to be employed to speed up the process when and where that could be done. One of these innovations was the idea of identifying the exact portion of a window’s client area that actually needed to be redrawn. This “update region” often does not include the entire client area, and any time a few CPU cycles could be saved, it was worth doing.

So the concept of “update areas” or “update regions” (a region is a collection of rectangles) was created. Within the client area, a region was declared as “invalid,” meaning it had to redrawn. In modern times, the code required to process update regions is arguably slower and bulkier than simply redrawing the entire client area off screen then drawing it in one shot onto the window itself. DirectX employs this “double buffering” technique extensively to avoid flicker, which comes from “erase then redraw” on-screen. Still, even in Windows 10, the remnants of the old window painting system remain; the handler for the WM_PAINT message must explicitly “validate” the update area. If it doesn’t, WM_PAINT messages will repeat forever, flowing into a window’s callback function fast and furious, dragging down the performance of the entire application. This sample code’s WM_PAINT handler uses the Win32 ValidateRect function as its only task, since DirectX takes over all drawing in the window client area.

A Volatile Situation

Note that per MSDN, the following registers are considered nonvolatile – any function called will preserve their values across its entire execution:

R12, R13, R14, R15; RDI, RSI, RBX, RBP, RSP

These must be saved then restored if they’re altered in the window callback (or any other) function. For this application, all of the nonvolatile registers are saved regardless of the message being handled - saving is done before the incoming message is even looked at. You may or may not want to alter this behavior in your own application’s internal functions, but when you call any Windows function, you have to be aware that the contents of volatile registers can never be relied on to persist across the call. One function or another may indeed leave a volatile register unchanged on its return, but specs are specs and that behavior could easily change at any time, especially given the any-time-is-a-good-time update policy for Windows 10.

The complete function for the window callback is shown below, noting that the assembler will handle saving and restoring RBP and RSP for each function declared:

mainCallback       proc                                                        ;

                   local               holder:qword, hwnd:qword, message:qword, wParam:qword, lParam:qword

                   ; Save nonvolatile registers

                   push                rbx                                     ;
                   push                rsi                                     ;
                   push                rdi                                     ;
                   push                r12                                     ;
                   push                r13                                     ;
                   push                r14                                     ;
                   push                r15                                     ;

                   ; Save the incoming parameters

                   mov                 hwnd, rcx                               ;
                   mov                 message, rdx                            ;
                   mov                 wParam, r8                              ;
                   mov                 lParam, r9                              ;

                   ; Look up the incoming message

                   mov                 rax, rdx                                ; Set the value to scan (incoming message)
                   lea                 rdi, message_table                      ; Point RSI @ the table start (entry count qword)
                   mov                 rcx, [ rdi ]                            ; Load the entry count
                   scasq                                                       ; Skip over the entry count qword
                   mov                 rsi, rdi                                ; Save pointer @ table start
                   repnz               scasq                                   ; Scan until match found or counter reaches 0
                   jnz                 call_default                            ; No match; use DefWindowProc
                   sub                 rdi, rsi                                ; Get the offset from the first entry of the table (not including entry count qword)
                   lea                 rax, message_router                     ; Point RAX at the base of the router table
                   call                qword ptr [ rax + rdi - 8 ]             ; Call the handler
                   jmp                 callback_done                           ; Skip default handler

call_default:                                                                  ; The only changed register holding incoming parameters is RCX so only reset that
                   mov                 rcx, hWnd                               ; Set hWnd
                   WinCall             DefWindowProc, 4, rcx, rdx, r8, r9      ; Call the default handler

callback_done:     pop                 r15                                     ;
                   pop                 r14                                     ;
                   pop                 r13                                     ;
                   pop                 r12                                     ;
                   pop                 rdi                                     ;
                   pop                 rsi                                     ;
                   pop                 rbx                                     ;

                   ret                                                         ; Return to caller

mainCallback       ends                                                        ; End procedure declaration

When popping registers, they must be popped in the exact reverse order from how they were saved (pushed). Intel architecture uses a LIFO stack model – last in, first out. Each push instruction (assuming a qword is pushed) stores that qword in memory at the location pointed to by RSP; RSP is then backed up (moves toward 0) by 8 bytes. The “lowest” address on the stack is the “top” of the stack. Entry into this callback will place items on the stack in the order they’re pushed – the lowest (closest to 0) address holds r15; the highest holds RBX (referencing the code above).

One of the most common bugs in an assembly language application is forgetting to pop an item that was pushed onto the stack. You’re not in Kansas anymore, Toto, so you have to do these things manually. Extra power carries extra responsibility. When this occurs, the return from the function (the ret instruction) will itself pop the qword off the top of the stack and jump to whatever address it holds. So inadvertently leaving even one qword on the stack will completely demolish the return from the function and typically send the app careening down to Mother Earth in flames.

The code above represents the entirety of the callback function (minus the individual message handlers) for the main window. Each handler can be thought of as a “subroutine” in the original BASIC language, for those who can remember that far back. The handler code for each message is actually part of the mainCallback function – it lives inside the function itself. Since all the handlers are coded after the ret instruction, they will never execute unless explicitly called.

Within the handlers, the only oddity is the use of the ret instruction to return from the handler into the main code of the function. From the assembler’s viewpoint, you can’t do this. The assembler sees ret and it assumes you’re returning from the function itself. As such, it will insert code that attempts to restore RBP and RSP, doing so at a location where you most definitely don’t want that happening. This is where I employ a semi-unorthodox method: I directly encode the ret statement as:

byte     0C3h     ; Return to caller

Alternatively, you could use a textequ statement to change that to something more palatable, like:

HandlerReturn     textequ     <byte 0C3h> ;

Text equates evaporate beyond the source code level; the assembler will simply replace all occurrences of HandlerReturn with byte 0C3h. You just won’t have to look at it in your source code.

At the CPU level, the call instruction doesn’t do anything with RBP or RSP directly. The instruction itself simply pushes the address of the next instruction after call, then jumps to wherever you’re calling. Correspondingly, ret doesn’t do anything except pop whatever value is waiting at the top of the stack and jump to that value, as an address. (The code to restore RBP, reset RSP [from its setup for local variable usage], and return is generated by the assembler.) The CPU has no concept of what a function is; the assembler gives all of that meaning completely separately from the CPU. So hard-coding 0C3h as the ret statement prevents the compiler from trying to helpfully crash your program by resetting RBP and RSP in preparation for a return from the function. “But wait, I’m still on the potty!” Encoding the byte directly in the code stream is just another benefit of the flexibility afforded by the very direct assembly language data model (which is “no model at all”).

If you don’t like this method, you’ll have to encode a separate function for each message you actually process. Then you can simply call as required, but you’ll have to reload the parameters coming into the mainCallback function before doing so (to the extent that each handler needs the information). This, however, carries the extra overhead of setting up and tearing down a function (saving registers, etc.) as well as placing mainCallback's incoming parameters back into the required registers for passing to each message handling function. It all adds up to a lot of extra overhead just for the sake of ritual. It's hardly worth it. Avoiding this extra overhead is the reason for using “in-function handlers” for message processing. Each handler has direct access to the mainCallback local variables (which hold the incoming parameters) because, from the compiler's viewpoint, each handler is still part of mainCallback.

Accessing local variables – in particular, the incoming parameters to mainCallback – is no problem at all. Within each handler, you’re still in the “space” of the mainCallback function, therefore all its local variables are fully intact and accessible.

The lookup table message_table is shown below:

message_table            qword       (message_table_end – message_table_start ) / 8
message_table_start      qword       WM_ERASEBKGND
                         qword       WM_PAINT
                         qword       WM_CLOSE
                         qword       WM_DESTROY
message_table_end        label       byte ; Any size declaration will work; byte is used as the smallest choice

The router list for the callback function is:

message_router     qword     main_wm_erasebkgnd
                   qword     main_wm_paint
                   qword     main_wm_close
                   qword     main_wm_destroy

The WM_ERASEBKGND handler is shown below:

                   align               qword                                   ;
main_wm_erasebkgnd label               near                                    ;

                   mov                 rax, 1                                  ; Set TRUE return
                   byte                0C3h                                    ; Return to caller

WM_PAINT: with DirectX handling all of the drawing for the main client area, the only thing the WM_PAINT handler needs to do is to validate the invalid client area.

First, ensure the ValidateRect function is declared in the externals.asm file:

extrn            _imp__ValidateRect:qword
ValidateRect     textequ     <_imp__ValidateRect>

The WM_PAINT handler consists of a single call to ValidateRect:

                   align               qword                                   ;
main_wm_paint      label               near                                    ;

                   xor                 rdx, rdx                                ; Zero LPRC for entire update area
                   mov                 rcx, hwnd                               ; Set window handle
                   WinCall             ValidateRect, 2, rcx, rdx               ; Validate the invalid area

                   xor                 rax, rax                                ; Set FALSE return
                   byte                0C3h                                    ; Return to caller

Failure to validate the update area will cause a torrent of WM_PAINT messages to flow into the window’s callback function; Windows will continue sending them forever, thinking there’s an update area still needing to be redrawn.

The WM_CLOSE handler destroys the window. The DestroyWindow function needs to be declared in the externals.asm file, or wherever you’re keeping your externals:

extrn             __imp_DestroyWindow:qword
DestroyWindow     textequ     <__imp_DestroyWindow>

The WM_CLOSE handler follows:

                   align               qword                                   ;
main_wm_close      label               near                                    ;

                   mov                 rcx, hwnd                               ; Set the window handle
                   WinCall             DestroyWindow, 1, rcx                   ; Destroy the main window

                   ; Here it's assumed that the call succeeded.  If it failed,
                   ; RAX will be 0 and GetLastError will need a call to figure
                   ; out what's blocking the window from being destroyed.

                   xor                 rax, rax                                ; DestroyWindow leaves RAX at TRUE if it succeeds so it needs to be zeroed here
                   byte                0C3h                                    ; Return to caller

The handler for WM_DESTROY is a notification, sent after the window has been removed from the screen. This is where the DirectX closeouts occur. The local ShutdownDirectX function will be covered in Part IV, which discusses DirectX initialization and shutdown:

                   align               qword                                   ;
main_wm_destroy    label               near                                    ;

                 ; The DirectX shutdown function is commented out below; it
                 ; has not been covered yet as of part III of the series.  It
                 ; will be uncommented and coded in the source for part IV.

                 ; call                ShutdownDirectX                         ; Calling a local function does not require the WinCall macro

                   xor                 rax, rax                                ; Return 0 from this message
                   byte                0C3h                                    ; Return to caller

The callback function is closed out with a single line:

main_callback endp

The only task remaining, to make the code presented so far a complete application, is the actual startup code. WinMain is not used, so the startup code moves directly into creating the main window, then enters the message loop. The entire block of entry code is shown below:

;*******************************************************************************
;
; DEMO - Stage 1 of DirectX assembly app: create main window
;
; Chris Malcheski 07/10/2017

                   include             constants.asm                           ;
                   include             externals.asm                           ;
                   include             macros.asm                              ;
                   include             structuredefs.asm                       ;
                   include             wincons.asm                             ;

                   .data                                                       ;

                   include             lookups.asm                             ;
                   include             riid.asm                                ;
                   include             routers.asm                             ;
                   include             strings.asm                             ;
                   include             structures.asm                          ;
                   include             variables.asm                           ;

                   .code                                                       ;

Startup            proc                                                        ; Declare the startup function; this is declared as /entry in the linker command line

                   local               holder:qword                            ; Required for the WinCall macro

                   xor                 rcx, rcx                                ; The first parameter (NULL) always goes into RCX
                   WinCall             GetModuleHandle, 1, rcx                 ; 1 parameter is passed to this function
                   mov                 hInstance, rax                          ; RAX always holds the return value when calling Win32 functions

                   WinCall             GetCommandLine, 0                       ; No parameters on this call
                   mov                 r8, rax                                 ; Save the command line string pointer

                   lea                 rcx, startup_info                       ; Set lpStartupInfo
                   WinCall             GetStartupInfo, 1, rcx                  ; Get the startup info
                   xor                 r9, r9                                  ; Zero all bits of RAX
                   mov                 r9w, startup_info.wShowWindow           ; Get the incoming nCmdShow
                   xor                 rdx, rdx                                ; Zero RDX for hPrevInst
                   mov                 rcx, hInstance                          ; Set hInstance

; RCX, RDX, R8, and R9 are now set exactly as they would be on entry to the WinMain function.  WinMain is not
; used, so the code after this point proceeds exactly as it would inside WinMain.

; Load the cursor image

                   xor                 r11, r11                                ; Set cyDesired; uses default if zero: XOR R11 with itself zeroes the register
                   xor                 r9, r9                                  ; Set cxDesired; uses default if zero: XOR R9 with itself zeroes the register
                   mov                 r8, image_cursor                        ; Set uType
                   mov                 rdx, ocr_normal                         ; Set lpszName
                   xor                 rcx, rcx                                ; Set hInstance
                   WinCall         LoadImage, 6, rcx, rdx, r8, r9, r11, lr_cur ; Load the standard cursor
                   mov                 wcl.hCursor, rax                        ; Set wcl.hCursor

; Load the large icon

                   mov                 r11, 32                                 ; Set cyDesited
                   mov                 r9, 32                                  ; Set cxDesired
                   mov                 r8, image_icon                          ; Set uType
                   lea                 rdx, LargeIconResource                  ; Set lpszName
                   mov                 rcx, hInstance                          ; Set hInstance
                   WinCall         LoadImage, 6, rcx, rdx, r8, r9, r11, lr_cur ; Load the large icon
                   mov                 wcl.hIcon, rax                          ; Set wcl.hIcon

; Load the small icon

                   mov                 r11, 32                                 ; Set cyDesited
                   mov                 r9, 32                                  ; Set cxDesired
                   mov                 r8, image_icon                          ; Set uType
                   lea                 rdx, SmallIconResource                  ; Set lpszName
                   mov                 rcx, hInstance                          ; Set hInstance
                   WinCall         LoadImage, 6, rcx, rdx, r8, r9, r11, lr_cur ; Load the large icon
                   mov                 wcl.hIconSm, rax                        ; Set wcl.hIcon

; Register the window class

                   lea                 rcx, wcl                                ; Set lpWndClass
                   winCall             RegisterClassEx, 1, rcx                 ; Register the window class

; Create the main window

                   xor                 r15, r15                                ; Set hWndParent
                   mov                 r14, 450                                ; Set nHeight
                   mov                 r13, 800                                ; Set nWidth
                   mov                 r12, 100                                ; Set y
                   mov                 r11, 100                                ; Set x
                   mov                 r9, mw_style                            ; Set dwStyle
                   lea                 r8, mainName                            ; Set lpWindowName
                   lea                 rdx, mainClass                          ; Set lpClassName
                   xor                 rcx, rcx                                ; Set dwExStyle
                   WinCall             CreateWindowEx, 12, rcx, rdx, r8, r9, r11, r12, r13, r14, r15, 0, hInstance, 0
                   mov                 main_handle, rax                        ; Save the main window handle

; Ensure main window displayed and updated

                   mov                 rdx, sw_show                            ; Set nCmdShow
                   mov                 rcx, rax                                ; Set hWnd
                   WinCall             ShowWindow, 2, rcx, rdx                 ; Display the window

                   mov                 rcx, main_handle                        ; Set hWnd
                   WinCall             UpdateWindow, 1, rcx                    ; Ensure window updated

; Execute the message loop

wait_msg:          xor                 r9, r9                                  ; Set wMsgFilterMax
                   xor                 r8, r8                                  ; Set wMsgFilterMin
                   xor                 rdx, rdx                                ; Set hWnd
                   lea                 rcx, mmsg                               ; Set lpMessage
                   WinCall             PeekMessage, 4, rcx, rdx, r8, r9, pm_remove

                   test                rax, rax                                ; Anything waiting?
                   jnz                 proc_msg                                ; Yes -- process the message

                 ; call                RenderScene                             ; <--- Placeholder; will uncomment and implement in Part IV article

proc_msg:          lea                 rcx, mmsg                               ; Set lpMessage
                   WinCall             TranslateMessage, 1, rcx                ; Translate the message

                   lea                 rcx, mmsg                               ; Set lpMessage
                   WinCall             DispatchMessage, 1, rcx                 ; Dispatch the message

                   jmp                 wait_msg                                ; Reloop for next message

breakout:          xor                 rax, rax                                ; Zero final return – or use the return from WinMain

                   ret                                                         ; Return to caller

Startup            endp                                                        ; End of startup procedure

                   include             callbacks.asm                           ;

                   end                                                         ; Declare end of module

As a skeleton program, the demo application is now complete.

As stated in the README.TXT file (in the accompanying .ZIP file), many improvements will be made to the source code for this specific article, so don't spend too much time modifying this code. Not yet. This code was intended to relate concepts and act as a tutorial; as such, its focus was not on efficiency. This will change in Part IV, which will cover initializing DirectX.

In addition, several handlers will be added to the callback function, to create a custom window frame.

Note that DirectX goes absolutely bonkers with constant declarations and nested structures. (DirectX 11 is used, because DirectX 12 only runs on Windows 10.) Because of the sheer volume of typing to move those declarations and definitions over to assembly, they will no longer be detailed in subsequent articles after this one. It's presumed that by now, you get the point and understand the basics of declaring structures, constants, etc. in assembly - with the exception of nesting structures, which will be covered in the next article.

An assembly language app is much like screen printing: MOST of the time involved is in the initial setup - getting your externals declared, your structures defined, etc. All of this is one-time work and can be used for an infinite number of applications without having to be repeated. As this series continues, the accompanying source code will do much of that work for you. Feel free to copy and paste the nasty declarations and definitions as desired so you don't have to research and type them all manually.

The pace will pick up in Part IV, where DirectX will be initialized, shut down when the main window is destroyed, and the message loop will only render a blank scene with no vertices.