The OX Boot Loader

Roger Doss


Sep 5, 2013

GPL3


The OX kernel features its own custom boot loader designed to boot a 32 bit protected mode kernel.

Introduction

The OX kernel features its own custom boot loader designed to boot a 32-bit protected mode kernel. The loader is implemented in two stages: stage 1 in s1.s and stage 2 in s2.s. The first stage is a traditional boot sector whose code is organized at address 0x7C00, the address at which the PC BIOS loads it. Stage 1 necessarily consists of 512 bytes of 16-bit assembler and uses the BIOS int 0x13 interrupt to load stage 2. Stage 2 is a larger 16-bit assembler program of 16384 bytes (0x4000). Thus, stage 1 and stage 2 combined occupy 16 * 1024 + 512 = 16896 bytes, or 0x4200. The kernel follows and is a 32-bit executable whose size is limited to roughly 512 KB.

The vmox.img file contains the stage 1 loader followed immediately by the stage 2 loader followed immediately by the 32-bit OX kernel. The image is then padded with null bytes (value 0x0) out to 1474560 bytes, which is the size of a classic 1.44 MB floppy disk. This is needed so that the kernel can be booted from a Virtual Box virtual machine or a Bochs PC emulator using a floppy boot. The program that creates the floppy disk image is supplied with the OX kernel distribution and is called mkboot; its source is in boot/mkboot.c. There is also a tool for extracting regions of a binary file, called get_data, whose source is in boot/elf/get_data.c. The get_data program can be used to retrieve sections of the vmox.img file. For example:

./get_data vmox.img 0x0 0x200 s1
./get_data vmox.img 0x200 0x4000 s2
./get_data vmox.img 0x4200 0x2746d vmox.boot

The above commands extract the binary files from the vmox.img boot disk. The first retrieves the 512 bytes of the stage 1 boot loader into a file called s1. The second retrieves the 16384 bytes (0x4000) of the stage 2 boot loader into a file called s2. The third retrieves the 32-bit kernel image into a file called vmox.boot. Thus, the arguments to the get_data program are the image file name, the offset in hex at which to start reading, the length in hex, and the output file that receives the extracted bytes.
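The offsets used in these commands follow directly from the image layout described earlier. The following is a minimal sketch of that arithmetic in C; the sizes come from this article, while the struct and function names are illustrative and are not taken from mkboot.c.

```c
#include <stdint.h>

/* Sizes taken from the article; names are illustrative only. */
enum {
    S1_SIZE     = 512,        /* stage 1: one boot sector */
    S2_SIZE     = 16 * 1024,  /* stage 2: 0x4000 bytes */
    FLOPPY_SIZE = 1474560     /* 1.44 MB floppy: 2880 * 512 */
};

struct image_layout {
    uint32_t s2_offset;      /* where stage 2 starts in the image */
    uint32_t kernel_offset;  /* where the kernel starts in the image */
    uint32_t pad_bytes;      /* zero bytes appended after the kernel */
};

/* Returns 0 on success, -1 if the kernel does not fit on the floppy. */
int compute_layout(uint32_t kernel_size, struct image_layout *out)
{
    uint32_t kernel_offset = S1_SIZE + S2_SIZE;   /* 0x4200 */

    if (kernel_size > FLOPPY_SIZE - kernel_offset)
        return -1;

    out->s2_offset     = S1_SIZE;
    out->kernel_offset = kernel_offset;
    out->pad_bytes     = FLOPPY_SIZE - kernel_offset - kernel_size;
    return 0;
}
```

Calling compute_layout with the kernel length from the third command (0x2746d) yields the 0x200 and 0x4200 offsets used above, plus the number of padding bytes needed to reach the floppy size.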

The get_data tool can also be used to extract ELF sections from an ELF file. Given a 32 bit statically linked ELF image (as in the vmox kernel), the various section details can be viewed by first using:

readelf -e vmox

Where for example the program headers can be:

  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD           0x001000 0x00100000 0x00100000 0x1f8c8 0x1f8c8 R E 0x1000
  LOAD           0x021000 0x00120000 0x00120000 0x00e70 0x2d9c0 RW  0x1000
  LOAD           0x0220d4 0x080480d4 0x080480d4 0x00024 0x00024 R   0x1000
  NOTE           0x0220d4 0x080480d4 0x080480d4 0x00024 0x00024 R   0x4
  GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RWE 0x4

Note that the image has three loadable sections (the LOAD entries) and that these are the ones the loader places in memory. Offset gives where in the image file each section starts, and FileSiz indicates its size within the file. VirtAddr and PhysAddr give the address at which the section is placed in memory, and MemSiz is its size in memory. If FileSiz is less than MemSiz, the difference must be zeroed out by the loader. To extract the first section using get_data, pass its file offset and file size:

./get_data vmox 0x001000 0x1f8c8 vmox.section1
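The behaviour described for get_data (copy a given number of bytes from a given file offset into an output file) can be sketched in a few lines of C. This is an illustrative reconstruction, not the code in boot/elf/get_data.c; the function name extract_region and its error handling are assumptions.

```c
#include <stdio.h>

/* Copies `length` bytes starting at `offset` from in_path to out_path.
 * Returns 0 on success, -1 on any I/O error or short read. */
int extract_region(const char *in_path, long offset, long length,
                   const char *out_path)
{
    FILE *in = fopen(in_path, "rb");
    FILE *out = NULL;
    char buf[512];
    int rc = -1;

    if (in == NULL)
        return -1;
    if (offset < 0 || length < 0 || fseek(in, offset, SEEK_SET) != 0)
        goto done;
    out = fopen(out_path, "wb");
    if (out == NULL)
        goto done;

    while (length > 0) {
        size_t want = length < (long)sizeof buf ? (size_t)length : sizeof buf;
        size_t got = fread(buf, 1, want, in);

        if (got == 0)
            goto done;                  /* ran past the end of the file */
        if (fwrite(buf, 1, got, out) != got)
            goto done;
        length -= (long)got;
    }
    rc = 0;

done:
    if (out != NULL && fclose(out) != 0)
        rc = -1;
    fclose(in);
    return rc;
}
```

A command-line wrapper would parse the offset and length arguments with strtol(arg, NULL, 16), since the tool takes both values in hex.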

The stage 1 loader loads the stage 2 loader and then jumps to the stage 2 loader's starting address in memory. Stage 2 loads the entire region from the floppy drive starting with sector 1, not sector 0, because sector 0 holds the stage 1 loader. The stage 2 sources show that the kernel's starting base address is:

%define _K_BASE 0x14000

This is the address just after the stage 2 loader, which is loaded at memory segment 0x1000 (address 0x10000). The 32-bit kernel image is therefore at physical address 0x14000, given 0x4000 (16384) bytes for stage 2. The loaded image then occupies memory from segment 0x1000 to 0x9000. Stage 2 thus consists of three main parts:

initialization code for protected mode, re-enabling real mode, enabling a20 line
loading the stage 2 and kernel image from floppy disk
converting the 32bit ELF image into a flat binary and subsequently jumping to the kernel start.

The initialization code is derived from John Fine's boot loader [1]. It switches the processor into protected mode and enables the a20 line, which allows the processor to access memory above 1 MB. The code to do this is in the second stage boot loader in the file s2.s:

cli 
enable_a20 
enable_pmode 
enable_rmode 
sti

Note that in this loader, NASM assembler macros were used to make the code more readable and modular. One can view the details of the macros by searching for them in the sources. Additional GDT setup logic was derived from the boot loader originally developed by Gareth Owen [2] for GazOS. The licenses for the software from [1] and [2] are public domain and GPLv2 respectively. The OX boot loader is therefore available as open source under GPLv2 as well. The GDT logic sets up three segment descriptors: one for a NULL segment, one for code, and one for data. This is needed by the enable_pmode logic.

Both the first stage and second stage loaders need logic to read binaries from the floppy drive and execute them. The s1 loader loads the s2 loader, and both are flat 16-bit binaries. This means executing the code is a matter of copying it from the floppy drive into the proper location in RAM and jumping to it. The s2 loader, however, can load either a 32-bit flat binary kernel or a 32-bit statically linked ELF kernel. ELF is the preferred format because it allows the loader to initialize the kernel's .bss segment and uninitialized memory. The code for reading from floppy and loading into RAM is as follows in s1:

%define _S2_LEN         0x21    ; 33: loop bound; sectors 1..32 are read, 32 * 512 == 0x4000
%define _S2_BASE        0x600   ; stage 2 base address
%define _S2_LOAD_SEG    0x60    ; segment to place s2 loader

%define _S2_SIGN_OFF    0x3FFE  ; signature offset

    mov bx,_S2_LOAD_SEG
    mov es,bx
    mov ax,1                    ; first sector to read (sector 0 is stage 1)
    mov cx,[load_len]           ; load_len == _S2_LEN == 33, load load_len - 1 or 32 sectors
    mov di,1

load_s2:
    call sector_read
    inc ax                      ; next sector
    mov bx,es
    add bx,32                   ; advance es by 32 paragraphs == 512 bytes
    mov es,bx

    cmp ax,cx
    jne load_s2

    call turn_off_floppy

    mov ax,[_S2_BASE + _S2_SIGN_OFF]
    cmp ax,0xAA55               ; check the stage 2 signature
    jne .load_err

    mov si,BOOT_MSG             ; display loader msg
    call bprint
    exec_bin_kernel _S2_BASE

Note that the load_s2 code relies on a routine called sector_read, which uses BIOS int 0x13 to read the floppy drive. The code reads sequentially from the drive and copies the data into RAM. The macro exec_bin_kernel _S2_BASE then jumps to that address in memory, immediately executing the stage 2 loader.
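The bookkeeping in the load_s2 loop can be checked with a small C simulation that walks the same counters. This is only a sketch: it assumes sector_read stores each sector at es:0, which the excerpt does not show, and the C names are invented for illustration.

```c
#include <stdint.h>

enum {
    S2_LOAD_SEG  = 0x60,  /* _S2_LOAD_SEG */
    S2_LEN       = 0x21,  /* _S2_LEN: loop stops when the counter reaches 33 */
    SECTOR_BYTES = 512
};

/* Real-mode address translation: physical = segment * 16 + offset. */
uint32_t phys_addr(uint16_t seg, uint16_t off)
{
    return ((uint32_t)seg << 4) + off;
}

/* Walks the same counters as load_s2 and returns the number of sectors
 * read; *end_addr receives the first address past the loaded data. */
int simulate_load_s2(uint32_t *end_addr)
{
    uint16_t es = S2_LOAD_SEG;
    uint16_t ax = 1;              /* first sector after the boot sector */
    int sectors = 0;

    do {
        /* sector_read would copy 512 bytes to es:0 here (assumption) */
        sectors++;
        ax++;                     /* inc ax */
        es += SECTOR_BYTES / 16;  /* add bx,32: advance one sector */
    } while (ax != S2_LEN);       /* cmp ax,cx / jne load_s2 */

    *end_addr = phys_addr(es, 0);
    return sectors;
}
```

Under these assumptions the loop reads 32 sectors and ends at physical address 0x4600, so the signature word at _S2_BASE + _S2_SIGN_OFF (0x45FE) is the last two bytes of the loaded stage 2.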

The stage 2 loader has the following logic to load the kernel using a more advanced macro to load from segment 0x1000 to segment 0x9000:

%define _K_LEN          0x9000      ; 512kb == 0x90000 / 0x10
%define _K_BASE         0x14000     ; address where kernel loaded
%define _K_LOAD_SEG     0x1000      ; segment where kernel loaded

%define _K_END_SEG      _K_LOAD_SEG + _K_LEN

%define _S1_BASE        0x7C00      ; stage 1 base address

%define _S2_CURR_SECT   0x1         ; chs == lba 0x21 == 33
%define _S2_CURR_HEAD   0x0
%define _S2_CURR_TRACK  0x0
%define _S2_NR_SECT     0x12        ; 18 sectors per track

read_tracks _S2_CURR_SECT, _S2_CURR_HEAD, _S2_CURR_TRACK, _K_LOAD_SEG, _K_END_SEG, _S2_NR_SECT, boot_drive

Here the macro read_tracks does the heavy lifting. Note that the load starts at address 0x10000 (segment 0x1000) and continues to address 0x90000 (segment 0x9000), so 0x90000 - 0x10000 = 0x80000 bytes are loaded. Because the kernel actually starts at address 0x14000, this leaves 0x80000 - 0x4000 = 8 * 16^4 - 4 * 16^3 = 507904 bytes for the 32-bit kernel. As an optimization, there is no need to start loading from the first sector after the boot sector; stage 2 could read starting from offset 0x4200 on the floppy, since as written it loads the stage 2 loader a second time. For the purposes of the current OX kernel this is acceptable, as the kernel is about 177773 bytes, or about 140032 bytes when stripped. The kernel currently includes a file system and memory allocator as well as rudimentary process handling (e.g., fork, exec, scheduling).
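The size budget can be written down as a short C sketch. The addresses are the ones quoted in the text; the enum and function names are illustrative and do not appear in s2.s.

```c
#include <stdint.h>

enum {
    K_LOAD_ADDR = 0x10000,  /* segment 0x1000: start of the loaded region */
    K_END_ADDR  = 0x90000,  /* segment 0x9000: end of the loaded region */
    K_BASE      = 0x14000   /* _K_BASE: first byte of the kernel image */
};

/* Total bytes read from the floppy into the load region. */
uint32_t region_bytes(void)
{
    return K_END_ADDR - K_LOAD_ADDR;
}

/* Bytes left for the kernel once stage 2 has taken its 0x4000 bytes. */
uint32_t kernel_capacity(void)
{
    return K_END_ADDR - K_BASE;
}

/* Non-zero if a kernel image of the given size fits in the region. */
int kernel_fits(uint32_t kernel_size)
{
    return kernel_size <= kernel_capacity();
}
```

Both kernel sizes mentioned above (177773 bytes unstripped, 140032 bytes stripped) fit comfortably within the 507904 bytes available.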

Unlike the load of the stage 2 loader, loading the 32-bit kernel requires the loader to parse and relocate the kernel in RAM prior to jumping to it. Effectively, what has to happen in 16-bit unreal mode is that the kernel image is converted from 32-bit ELF to a 32-bit flat binary, with proper zeroing of the .bss segment. In this process, the kernel segments are relocated to the physical addresses that the kernel will run from. These locations are important because the memory allocator inside the kernel needs to know which physical memory locations belong to the kernel. The conversion from ELF to a flat binary is done in the following macro call:

mov edx,_K_BASE 
exec_elf_kernel edx,nr_sections,kernel_start

After this call, the kernel has been relocated to its starting place in memory and the address of its start routine, _start, is stored in the kernel_start variable. To see the expected value of kernel_start at the command line, run:

nm vmox | grep _start

or use readelf -e vmox and look for "Entry point address:" in the output. These values are given in hex. In the s2.s sources, you can print the address using:

mov eax, [kernel_start]
print_reg eax

which will also print the entry point. The two values must match for the load to work, as this is the ELF e_entry value, which is the address of the C _start routine.

Note that in C, the starting function is not main. It is a function called _start, which in the case of a kernel contains the logic for initializing the kernel. In normal C programs, it contains initialization code for starting up a process to run on the native operating system.

The exec_elf_kernel macro uses the ELF header to find the loadable sections and move them into memory. If a section has a file size smaller than its memory size, the difference is zeroed out. Since this happens in so-called unreal mode, the processor is in 16-bit mode with the a20 line enabled so that 32-bit memory access is possible.

As an aside, the memmove implementation used by exec_elf_kernel in the stage 2 loader had a bug: it could not correctly load a kernel larger than a 16-bit segment, because the loop instruction was not set up to use the ecx register. To fix this, the assembler prefix 'a32' was added so that the loop instruction, assembled in 16-bit code, uses ecx rather than cx as its counter. This is why there are effectively two sources for s2.s, one of them designated s2.s.DEBUG, a debug version used to find this issue. Debugging a loader is difficult, as there is no debugger that works at that level, so print statements must be used to output register contents and trace the program's progress. The .DEBUG file will help if this is needed in the future.
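The copy-and-zero step described for exec_elf_kernel can be illustrated in hosted C. This is a sketch under stated assumptions, not a translation of the macro: it runs against an ordinary byte buffer standing in for physical memory, assumes a little-endian host, and declares its own minimal ELF32 structures. The name load_elf32 is hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define PT_LOAD 1u   /* program header type for loadable segments */

struct elf32_ehdr {  /* ELF32 file header, fields in on-disk order */
    unsigned char e_ident[16];
    uint16_t e_type, e_machine;
    uint32_t e_version, e_entry, e_phoff, e_shoff, e_flags;
    uint16_t e_ehsize, e_phentsize, e_phnum;
    uint16_t e_shentsize, e_shnum, e_shstrndx;
};

struct elf32_phdr {  /* ELF32 program header */
    uint32_t p_type, p_offset, p_vaddr, p_paddr;
    uint32_t p_filesz, p_memsz, p_flags, p_align;
};

/* Copies every PT_LOAD segment of `image` into `mem`, which models
 * physical memory starting at address `mem_base`. Returns the entry
 * point, or 0 if the image is malformed or a segment does not fit. */
uint32_t load_elf32(const uint8_t *image, size_t image_len,
                    uint8_t *mem, size_t mem_len, uint32_t mem_base)
{
    struct elf32_ehdr eh;
    uint16_t i;

    if (image_len < sizeof eh)
        return 0;
    memcpy(&eh, image, sizeof eh);
    if (memcmp(eh.e_ident, "\177ELF", 4) != 0)
        return 0;
    if (eh.e_phentsize != sizeof(struct elf32_phdr))
        return 0;

    for (i = 0; i < eh.e_phnum; i++) {
        struct elf32_phdr ph;
        size_t at = (size_t)eh.e_phoff + (size_t)i * sizeof ph;
        size_t dst;

        if (at > image_len || image_len - at < sizeof ph)
            return 0;
        memcpy(&ph, image + at, sizeof ph);
        if (ph.p_type != PT_LOAD)
            continue;
        if (ph.p_filesz > ph.p_memsz || ph.p_offset > image_len ||
            image_len - ph.p_offset < ph.p_filesz)
            return 0;
        if (ph.p_paddr < mem_base)
            return 0;
        dst = ph.p_paddr - mem_base;
        if (dst > mem_len || mem_len - dst < ph.p_memsz)
            return 0;

        /* Copy the bytes present in the file, then zero the rest (.bss). */
        memcpy(mem + dst, image + ph.p_offset, ph.p_filesz);
        memset(mem + dst + ph.p_filesz, 0, ph.p_memsz - ph.p_filesz);
    }
    return eh.e_entry;
}
```

The returned value plays the role of kernel_start: it is the e_entry field that nm and readelf report for _start.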

The load places each section at the physical address determined by the compilation and linking of the 32 bit kernel. Thus, the kernel is linked using the following directives:

cc $(OBJS) -o vmox -m32 -nostdinc -nostdlib -nostartfiles -nodefaultlibs -static -Ttext 0x100000

The directive -m32 compiles to 32 bits (as the development machine is now 64-bit). The directives -nostdinc -nostdlib -nostartfiles -nodefaultlibs instruct the compiler not to use the standard includes, standard libraries, standard startup files, and default libraries. This is required since the kernel cannot use user-level libraries. The directive -static links the executable without any shared objects. The directive -Ttext 0x100000 instructs the linker to organize the code and data relative to this physical address, which is the 1 MB mark in RAM. This address is used by the exec_elf_kernel macro to place the kernel at that location in memory. The actual space used can be computed using:

size vmox

which may return:

   text    data     bss     dec     hex filename
 129257    3696  183104  316057   4d299 vmox

Note that the total is 0x4d299, or 316057 in decimal. The bss segment is the largest component and is most directly affected by the variable BLOCK_ARRAY_SIZE, which is the size of the buffer cache in the file system. See the file include/ox/fs/block.h for more details. Knowing the size of the kernel in RAM and its location at 0x100000, the memory allocator can hand out RAM pages to user processes and to dynamic kernel memory users starting at an address greater than 0x100000 + kernel_memory_size (0x4d299). For example, address 0x200000 would likely be a good place to start.
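As a worked example of this calculation, the following C sketch rounds the end of the kernel up to a page boundary. The 4 KB page size and the function names are assumptions made for illustration; the OX allocator may choose its starting address differently.

```c
#include <stdint.h>

#define KPAGE_SIZE 0x1000u   /* assumed 4 KB x86 page */

/* Rounds addr up to the next multiple of KPAGE_SIZE (a power of two). */
uint32_t page_align_up(uint32_t addr)
{
    return (addr + KPAGE_SIZE - 1) & ~(KPAGE_SIZE - 1);
}

/* First page-aligned address past a kernel of `size` bytes at `base`. */
uint32_t first_free_page(uint32_t base, uint32_t size)
{
    return page_align_up(base + size);
}
```

For the figures above, first_free_page(0x100000, 0x4d299) gives 0x14e000, so starting the allocator at 0x200000 leaves a generous margin for kernel growth.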

Once the kernel_start address is obtained and the kernel has been converted from ELF to a flat binary and relocated to its starting address in physical RAM, the second stage loader then re-enables protected mode and jumps to the kernel_start address. This effectively executes the 32bit kernel.

An example 32 bit "bare bones" kernel can be as follows:

Basic Kernel

/*
 * Test kernel
 * for loader.
 */

void main ( void );

/* without C startup code,
 * we need to write our own _start
 * entry point
 */
void
_start( void )
{
         main();

}/* start */

char *message = "--> Executing test kernel <--\n";

/* Force a large .bss segment for testing the loader. */
#define D_SIZE 0x76a0b
char data[D_SIZE] = {0};

void
main ( void )
{
     char *vram = (char *)0xB8000;

     while(*message) {
       *vram++        = *message++;
       *vram++        = 0x7;
     }

     for( ; ; )
        /* idle */;

}/* main */

Note that the kernel simply prints "--> Executing test kernel <-- " by writing directly to video RAM at address 0xB8000 and then idles the CPU. It has a large .bss segment because it declares a large character array. The array exists only to test the ELF loader's ability to zero out those bytes and otherwise has no effect.
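The character/attribute layout that the test kernel relies on can be tried out on a normal host by writing into an ordinary buffer instead of 0xB8000. The sketch below is a hosted illustration with a hypothetical vga_write helper; unlike the kernel loop, it also stops at the end of the buffer.

```c
#include <stddef.h>
#include <stdint.h>

enum { VGA_ATTR_GREY = 0x07 };  /* light grey on black, as in the kernel */

/* Writes `msg` as character/attribute byte pairs, stopping at the end of
 * the string or of the buffer. Returns the number of characters written. */
size_t vga_write(uint8_t *vram, size_t vram_len, const char *msg)
{
    size_t n = 0;

    while (msg[n] != '\0' && 2 * n + 1 < vram_len) {
        vram[2 * n]     = (uint8_t)msg[n];   /* character cell */
        vram[2 * n + 1] = VGA_ATTR_GREY;     /* attribute cell */
        n++;
    }
    return n;
}
```

Each visible character therefore occupies two bytes, which is why the kernel advances vram twice per character of the message.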

Loading in Virtual Box

Low-level code can now be tested using virtual machines. This removes the need to run the software on real hardware, where a mistake may have undesirable consequences such as damaging the data on a hard drive. Since the OX boot loader loads from floppy, we can simulate a floppy boot using Virtual Box. Given the image file vmox.img, follow the instructions in [3]. They walk you through selecting a floppy drive medium for boot, after which Virtual Box should read the image from the drive and load it in the virtual machine.

References

[1] Fine, J. S. (1999). Protected mode programming examples, system utilities, building embedded systems. Retrieved April 14, 2013, from http://geezer.osdevbrasil.net/johnfine/index.htm
[2] Owen, G. (1999). GazOS. Retrieved May 1999, from http://gazos.sourceforge.net/
[3] Hoffman, C. (2013). How Do I Use a Floppy Disc Image in VirtualBox? Retrieved April 14, 2013, from http://www.ehow.com/how_8456703_do-floppy-disc-image-virtualbox.html