sprintf_s: Speed Bumps Ahead

David A. Gray

Jun 1, 2015

CPOL

This article documents issues that I have identified in the new secure overloads of the CRT buffered print routines.

Download SecurePrintFHazard.ZIP - 767.4 KB

Introduction

A few weeks ago, I upgraded a program that I initially developed using Visual Studio 6, and wrote entirely in C++, to use the new CRT library that ships with Visual Studio 2013. Since the Security Development Lifecycle checks are enabled by default (even in a project that is an upgrade from Visual C++ 6), the first compiler log flagged all of its many calls to swprintf [1] as potentially insecure, and recommended replacing them with calls to swprintf_s [2].

Although the Visual C++ team thoughtfully provided convenience macros to ease the transition, they cannot be used unless the output buffer is a static array [3]. Unfortunately, all but two of the swprintf calls in question use static buffers that are accessed through pointers. Nevertheless, those weren’t the problem, because the DLL that owns the buffers exports companion functions that return their sizes.

The problem arose with the two others, both of which use smaller buffers, also accessed through pointers. Alas, when I added the sizeOfBuffer argument to them, I habitually specified the same size as the bigger buffers exported by the DLL. Suddenly, I had a very unexpected, and ugly, buffer overflow exception. What happened?

Background

Since the overwritten memory included the stack frame of the function that owned the buffer, the immediate cause was obvious, but why was a CRT routine creating a buffer overflow? The answer lay deep in the new code that differentiates swprintf_s from its legacy predecessor, swprintf. To eliminate distractions, I wrote a very simple test routine that allocated a fixed-size buffer on the local stack, as did the program in which the problem arose, and called that routine from a loop that varies the value passed as the sizeOfBuffer argument to swprintf_s. Table 1 lists 4 cases; only the last one matters to the issue at hand, though case 1 deserves a word.

Table 1 is a summary of the test cases implemented by the demonstration program.

Case	sizeOfBuffer (TCHARs)	Relative to szBuffer	Test Outcome
1	128	Smaller	The formatted message needs more than 128 characters, so swprintf_s fails and calls the invalid parameter handler installed through _set_invalid_parameter_handler [4].
2	255	Smaller	The formatted message fits comfortably, so printing succeeds without causing a buffer overflow.
3	260	Same	The outcome is the same as for case 2.
4	384	Bigger	Output is formatted, and the subsequent call to _tprintf to display it on the console succeeds. The overflow isn’t caught and reported until the test routine attempts to return to the loop in the main routine.

Cautious single-step debugging identified the culprit, deep in the CRT library. A new feature of swprintf_s is macro _SECURECRT__FILL_STRING, which hides a call to memset. However, it isn’t in swprintf_s itself; you must drill down a level, to _vswprintf_s_l, and follow that routine almost to its very end.

The last statement in _vswprintf_s_l (Listing 2) that does anything significant is implemented as a macro, _SECURECRT__FILL_STRING(string, sizeInWords, retvalue + 1), shown in Listing 3, that expands into a call to CRT function memset. The prototype of memset is as follows.

void *memset(
   void *dest,
   int c,
   size_t count
);

In the context of _vswprintf_s_l, the argument values are as follows.

dest = the address of the character just past the null character that terminates the output written starting at the address given by string

c = the fill character, expressed as an integer, 0xfe

count = the number of bytes to fill with character c, starting at address dest, given by the formula discussed next

The third argument to memset, which specifies the number of bytes to write, is a ternary expression whose result is multiplied by sizeof(*(_String)). In a debug build, the threshold that it tests, _SECURECRT_FILL_BUFFER_THRESHOLD, maps to __crtDebugFillThreshold, which is normally larger than any buffer, so the comparison is false and the significant part is the false branch that follows the colon. The effective byte count is therefore ((_Size) - (_Offset)) * sizeof(*(_String)).
Substituting the macro arguments into that expression yields the following expression, which becomes part of the C code that replaces the macro: ((sizeInWords) - (retvalue + 1)) * sizeof(*(string)), where:
- sizeInWords = the buffer size asserted by the caller, expressed in characters (TCHARs)
- retvalue = characters written by _vswprintf_helper (the workhorse of the formatted printing routines), also expressed in characters (TCHARs), which eventually becomes the return value of swprintf_s.
- Since string is a wchar_t pointer (the type to which TCHAR maps when _UNICODE is defined), sizeof(*(string)) is sizeof(wchar_t), which is 2 on Windows.

Table 2 lists actual values from the test program, taken from notes made during a careful debugging session, which should make it a lot easier to visualize what happens. The value of string plays no role in the arithmetic.

Base	string	sizeInWords	retvalue	sizeof(*(string))
Decimal	4,454,748	384	148	2
Hexadecimal	0x0043f95c	0x00000180	0x00000094	0x00000002

The following example uses the decimal values shown in Table 2.

Macro expression	`((_Size) - (_Offset)) * sizeof(*(_String))`
Expanded expression	`((sizeInWords) - (retvalue + 1)) * sizeof(*(string))`
Substituting values	`((384) - (148 + 1)) * 2`
Evaluation, step 1	`(384 - 149) * 2`
Evaluation, step 2	`235 * 2`
Result	`470`

Contrast this result with the actual number of slack bytes in the buffer.

Macro expression	`((_Size) - (_Offset)) * sizeof(*(_String))`
Expanded expression	`((sizeInWords) - (retvalue + 1)) * sizeof(*(string))`
Substituting values	`((260) - (148 + 1)) * 2`
Evaluation, step 1	`(260 - 149) * 2`
Evaluation, step 2	`111 * 2`
Result	`222`

To summarize, the size of the buffer overrun is 248 bytes, more than enough to trample the stack frame that sits above it.

Slack space computed based on invalid size argument of 384 (bytes)	470
Slack space computed based on actual buffer size of 260 (bytes)	222
Amount of overrun (bytes)	248

Proof:
Invalid buffer size (TCHARs)	384
Correct buffer size (TCHARs)	260
Excess TCHARs	124
Size of TCHAR (bytes)	2
Overrun (bytes)	248
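The arithmetic above fits in a few lines of C. This is a sketch, not CRT code: backfill_bytes and overrun_past_buffer are hypothetical helper names, and the character size is a parameter because wchar_t is 2 bytes on Windows but commonly 4 bytes elsewhere. With the Table 2 values, it reproduces the 470-, 222-, and 248-byte figures.

```c
#include <stddef.h>

/* Bytes the debug backfill asks memset to write: from the character after
 * the terminating null up to the ASSERTED end of the buffer, that is,
 * (sizeInWords - (retvalue + 1)) * sizeof(wchar_t).
 * Precondition (checked by the real macro): chars_written + 1 < size_in_chars. */
static size_t backfill_bytes(size_t size_in_chars, size_t chars_written,
                             size_t char_size)
{
    size_t offset = chars_written + 1;   /* skip the text and its null */
    return (size_in_chars - offset) * char_size;
}

/* Bytes written past the end of the real buffer (0 if the fill fits). */
static size_t overrun_past_buffer(size_t asserted_chars, size_t actual_chars,
                                  size_t chars_written, size_t char_size)
{
    size_t requested = backfill_bytes(asserted_chars, chars_written, char_size);
    size_t available = backfill_bytes(actual_chars, chars_written, char_size);
    return requested > available ? requested - available : 0;
}
```

Because the offset cancels out, the overrun depends only on the difference between the asserted and actual sizes, (384 - 260) * 2, and not on how much text was written.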

The code shown in Listing 1 through Listing 3 is copied verbatim from the CRT library source files that ship with Microsoft Visual Studio 2013. Their default installation directory is C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\crt\src. Function _vswprintf_s_l and the other formatted printing routines call upon one function, _vswprintf_helper, to process the format control string and optional arguments. Since that routine is long, complex, and has no bearing on the buffer overflow, I omitted it from these listings. If you are curious, it is also in the CRT source files, in vswprint.c.

To keep the source listings close to these examples, the narrative resumes below Listing 3.

int __cdecl swprintf_s (
        wchar_t *string,
        size_t sizeInWords,
        const wchar_t *format,
        ...
        )
{
    va_list arglist;

    va_start(arglist, format);

    return _vswprintf_s_l(string, sizeInWords, format, NULL, arglist);
}

Listing 1 is all of function swprintf_s, which creates a private reference to the optional arguments that follow the format control string, then returns through _vswprintf_s_l.

int __cdecl _vswprintf_s_l (
        wchar_t *string,
        size_t sizeInWords,
        const wchar_t *format,
        _locale_t plocinfo,
        va_list ap
        )
{
    int retvalue = -1;

    /* validation section */
    _VALIDATE_RETURN(format != NULL, EINVAL, -1);
    _VALIDATE_RETURN(string != NULL && sizeInWords > 0, EINVAL, -1);

    retvalue = _vswprintf_helper(_woutput_s_l, string, sizeInWords, format, plocinfo, ap);
    if (retvalue < 0)
    {
        string[0] = 0;
        _SECURECRT__FILL_STRING(string, sizeInWords, 1);
    }
    if (retvalue == -2)
    {
        _VALIDATE_RETURN(("Buffer too small", 0), ERANGE, -1);
    }
    if (retvalue >= 0)
    {
        _SECURECRT__FILL_STRING(string, sizeInWords, retvalue + 1);
    }

    return retvalue;
}

Listing 2 is every line of function _vswprintf_s_l, which is also deceptively simple, but includes macro _SECURECRT__FILL_STRING, which is the source of the overrun. If it succeeds, _vswprintf_helper returns the number of characters actually written into the output buffer, excluding the trailing null character; the call to _SECURECRT__FILL_STRING compensates for the null by passing retvalue + 1 as the offset. The first and second arguments, string and sizeInWords, are passed along unchanged from swprintf_s.

#if !defined (_SECURECRT_FILL_BUFFER_THRESHOLD)
#ifdef _DEBUG
#define _SECURECRT_FILL_BUFFER_THRESHOLD __crtDebugFillThreshold
#else  /* _DEBUG */
#define _SECURECRT_FILL_BUFFER_THRESHOLD ((size_t)0)
#endif  /* _DEBUG */
#endif  /* !defined (_SECURECRT_FILL_BUFFER_THRESHOLD) */

#if _SECURECRT_FILL_BUFFER
#define _SECURECRT__FILL_STRING(_String, _Size, _Offset)                            \
    if ((_Size) != ((size_t)-1) && (_Size) != INT_MAX &&                            \
        ((size_t)(_Offset)) < (_Size))                                              \
    {                                                                               \
        memset((_String) + (_Offset),                                               \
            _SECURECRT_FILL_BUFFER_PATTERN,                                         \
            (_SECURECRT_FILL_BUFFER_THRESHOLD < ((size_t)((_Size) - (_Offset))) ?   \
                _SECURECRT_FILL_BUFFER_THRESHOLD :                                  \
                ((_Size) - (_Offset))) * sizeof(*(_String)));                       \
    }
#else  /* _SECURECRT_FILL_BUFFER */
#define _SECURECRT__FILL_STRING(_String, _Size, _Offset)
#endif  /* _SECURECRT_FILL_BUFFER */

#if _SECURECRT_FILL_BUFFER
#define _SECURECRT__FILL_BYTE(_Position)                \
    if (_SECURECRT_FILL_BUFFER_THRESHOLD > 0)           \
    {                                                   \
        (_Position) = _SECURECRT_FILL_BUFFER_PATTERN;   \
    }
#else  /* _SECURECRT_FILL_BUFFER */
#define _SECURECRT__FILL_BYTE(_Position)
#endif  /* _SECURECRT_FILL_BUFFER */

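Listing 3 shows macro _SECURECRT__FILL_STRING, its companion _SECURECRT__FILL_BYTE, and the _SECURECRT_FILL_BUFFER_THRESHOLD definition on which they depend, all of which are controlled by _SECURECRT_FILL_BUFFER and defined in internal.h. (Look for internal.h in the directory where your CRT source code is installed. Mine is in C:\Program Files (x86)\Microsoft Visual Studio 12.0\VC\crt\src\.)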
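To see how Listings 2 and 3 interact without the CRT, the following C sketch restates them as ordinary functions. It is a model under stated assumptions, not the CRT's code: model_fill_string, model_vswprintf_s_tail, and MODEL_FILL_PATTERN are hypothetical names; the threshold parameter stands in for _SECURECRT_FILL_BUFFER_THRESHOLD; and passing SIZE_MAX assumes that the debug threshold never caps the fill, which is consistent with the 470-byte count computed earlier. A threshold of 0, the non-debug value in Listing 3, writes nothing. The sketch also assumes that _VALIDATE_RETURN, as its ERANGE argument suggests, stores that code in errno, so a "buffer too small" result (-2) from the helper reaches the caller as -1.

```c
#include <errno.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <wchar.h>

/* 0xfe is the fill value reported in the text. */
#define MODEL_FILL_PATTERN 0xFE

/* Function form of _SECURECRT__FILL_STRING (Listing 3). Returns the byte
 * count passed to memset. Like the real macro, it trusts 'size'. */
static size_t model_fill_string(wchar_t *string, size_t size, size_t offset,
                                size_t threshold)
{
    size_t count = 0;
    if (size != (size_t)-1 && size != (size_t)INT_MAX && offset < size) {
        size_t slack = size - offset;                  /* characters left */
        count = (threshold < slack ? threshold : slack) * sizeof(*string);
        memset(string + offset, MODEL_FILL_PATTERN, count);
    }
    return count;
}

/* Model of the tail of _vswprintf_s_l (Listing 2). helper_result stands in
 * for _vswprintf_helper's result: a character count, -2 for "buffer too
 * small", or another negative value for other failures. */
static int model_vswprintf_s_tail(wchar_t *string, size_t size,
                                  int helper_result, size_t threshold)
{
    if (helper_result < 0) {
        string[0] = L'\0';
        model_fill_string(string, size, 1, threshold);
    }
    if (helper_result == -2) {
        errno = ERANGE;  /* assumed effect of _VALIDATE_RETURN(..., ERANGE, -1) */
        return -1;       /* returned if the invalid parameter handler returns  */
    }
    if (helper_result >= 0)
        model_fill_string(string, size, (size_t)helper_result + 1, threshold);
    return helper_result;
}
```

Like the real macro, model_fill_string cannot know the true size of the array; calling it with a size larger than the array is exactly the overrun described in this article.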

Anatomy of a Buffer Overflow

In most cases, it is enough to know that function arguments and automatic variables are allocated somewhere on the stack; their machine addresses are not especially important. In this case, they matter a great deal: the order in which variables are defined has significant consequences, and it exposes the risk of using automatic variables as buffers.

For the benefit of readers who aren’t thoroughly familiar with how the C and C++ compilers assign memory to arguments and variables, the following two sections offer a brief illustrated tutorial.

Please feel free to skip ahead to the section titled Kaboom! if you know this cold.

Memory for Arguments and Automatic Variables

Management of function arguments (parameters) and automatic variables in Windows programs revolves largely around two CPU registers, while five others play minor parts, as summarized in Table 3.

Table 3 summarizes the roles played by CPU registers in managing argument lists and automatic variables.

Abbr.	Full Name	Role
EBP	Extended Base Pointer	Within each function, the addresses of its arguments and local (automatic) variables are usually encoded as offsets relative to the address stored in this register, which lies somewhere within the address space reserved for the stack.
ESP	Extended Stack Pointer	The stack pointer is primarily used to get function arguments and return addresses on and off the stack. A secondary, but related role of the stack pointer is to mark off the boundaries of a function’s working storage, from which its automatic variables are allocated.
ESI	Extended Source Index	In the function body of a debug build, a copy of the ESP register is saved in ESI, so that the stack pointer can be sanity checked when control returns from a function that follows the `__cdecl` calling convention.
EDI	Extended Destination Index	This register plays two roles. (1) In the function prologue of a debug build, the EDI register is used, along with the ECX register (discussed next), to initialize the memory that is being set aside for its automatic variables. (2) It fulfills the roles usually fulfilled by the ESI register when a function call is nested inside another function call, since ESI is tied up tracking the stack frame for the outer function.
ECX	Extended Counter	In the prologue of a debug build, the ECX register is loaded with the number of machine words that are being reserved for the function’s automatic variables, which tells the `rep stosd` instruction that initializes it when to stop. When an instance method is called, a pointer to the object (the ubiquitous this variable) goes into the ECX register immediately before the method is invoked via the call instruction. Since the prologue of a debug build needs ECX, it is the last item pushed onto the stack before the memory initialization code is set up and run. Following the initialization, ECX is popped off the stack, and then a copy is saved into the very top of the function’s automatic storage area.
EIP	Extended Instruction Pointer	Throughout its run, the EIP register points to the instruction that is about to execute. Since the calling routine expects to resume where it left off when the function returns, before it transfers control to the first instruction of the called function, the call instruction pushes the address of the instruction that immediately follows the call onto the stack. The call, itself, is executed by jumping to the address of the first instruction in the called routine. Like any other jump, this changes the EIP to refer to the instruction at that location.
EAX	Extended Accumulator	Regardless of which of several common calling conventions a function follows, almost all functions place their return value into the EAX register. Both of the most common conventions, `__cdecl` and `__stdcall`, return through EAX.

The stack is just a big block of memory, reserved in the process’s address space by the loader when the process starts. By default, most applications get a one megabyte stack, which sounds like a lot, until you realize how many things go into it. When a process starts, its stack pointer (ESP) points to the highest address in the space reserved for the stack.

Since the stack pointer points to its top address, its value decreases as things are added to the stack (by pushing them onto it) and increases when items are removed from it (by popping them off).

When it is first allocated to a process, the stack pointer (ESP) and base pointer (EBP) point to the same location, but this state of affairs is short lived. The first thing that any routine does is push the current value of EBP onto the stack, then set EBP to the new value of ESP, which is 4 bytes less than it was before the push executed.
The EBP register doesn’t change again until another routine is called, when the process just described is repeated. This process is repeated each time your code dives deeper into its lower level routines, makes a system call, or invokes a runtime library routine, such as printf. Since many library routines use helper routines, calls can go much deeper in a hurry.
There are two circumstances that cause the stack pointer (ESP) to change during the lifetime of a function.
- The prologue decreases its value by the number of bytes that it needs to reserve for automatic variables. The effect of this adjustment is that subsequent additions to the stack happen beneath the local storage used by the function, preventing subsequent uses of the stack from overwriting its data.
- When one function calls another, the arguments, if any, are pushed onto the stack, working from right to left as they appear in its prototype, so that the first argument goes on last. For example, when you call printf with a format control string and a series of variables to substitute into it, the format string goes onto the stack last. As described previously, once the last argument is on the stack, the call instruction pushes the address of the instruction that immediately follows it, and hands control off to the called routine.
As each function completes its work and prepares to return, the processes that happened during its prologue are reversed. Items are popped off the stack in reverse order, and the decrement that moved the stack pointer below the function’s local storage is reversed by adding the same amount to the stack pointer. Finally, the function copies its own base pointer into the stack pointer, which then points to the location where the caller’s base pointer was pushed. It is then popped off, and the function returns. If the calling convention is __stdcall, the return instruction has one operand, which tells it how many bytes to add to the stack pointer to account for the function’s arguments. Otherwise (for example, under __cdecl, where the caller removes the arguments after the call returns), the return simply pops off the return address, which becomes the target of a reverse jump.

Important: Although the arguments and return address are not actually removed from the stack, increasing the stack pointer as each function returns to its caller conserves the finite space reserved for the stack, which is reused for subsequent function calls.
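The bookkeeping described above can be modeled with a toy simulation in C. It does not generate or run machine code: toy_cpu, toy_push, toy_pop, toy_prologue, and toy_epilogue are hypothetical names, addresses are plain integers, and the prologue and epilogue mirror the push ebp / mov ebp, esp / sub esp, n and mov esp, ebp / pop ebp sequences typical of unoptimized 32-bit code.

```c
#include <stdint.h>
#include <string.h>

#define TOY_STACK_BYTES 4096u

/* Toy model of a 32-bit stack that grows toward lower addresses. */
typedef struct {
    uint8_t  mem[TOY_STACK_BYTES]; /* simulated stack memory       */
    uint32_t base;                 /* simulated address of mem[0]  */
    uint32_t esp;                  /* simulated stack pointer      */
    uint32_t ebp;                  /* simulated base pointer       */
} toy_cpu;

/* Both pointers start at the top of the reserved block. */
static void toy_init(toy_cpu *c, uint32_t base)
{
    memset(c->mem, 0, sizeof c->mem);
    c->base = base;
    c->esp = c->ebp = base + TOY_STACK_BYTES;
}

/* push: decrement ESP by 4, then store the value at the new ESP. */
static void toy_push(toy_cpu *c, uint32_t value)
{
    c->esp -= 4;
    memcpy(&c->mem[c->esp - c->base], &value, sizeof value);
}

/* pop: load the value at ESP, then increment ESP by 4. */
static uint32_t toy_pop(toy_cpu *c)
{
    uint32_t value;
    memcpy(&value, &c->mem[c->esp - c->base], sizeof value);
    c->esp += 4;
    return value;
}

/* Models: push ebp / mov ebp, esp / sub esp, locals */
static void toy_prologue(toy_cpu *c, uint32_t locals)
{
    toy_push(c, c->ebp);
    c->ebp = c->esp;
    c->esp -= locals;
}

/* Models: mov esp, ebp / pop ebp (a following ret pops the return address) */
static void toy_epilogue(toy_cpu *c)
{
    c->esp = c->ebp;
    c->ebp = toy_pop(c);
}
```

For example, pushing one argument and a return address, then calling toy_prologue with 520 bytes of locals, leaves EBP 12 bytes below the starting top of the stack and ESP another 520 bytes lower; toy_epilogue then restores EBP and leaves ESP pointing at the return address.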

Memory for Automatic Variables

The preceding section alluded to a block of memory set aside by the function prologue for use by its local (automatic) variables. The last concept to be grasped in order to understand why the buffer overflow happened is how memory from this block is assigned to variables.

Since it uses memory allocated from the stack, it is not very surprising to learn that variable assignments begin at the top, and work down, so that the address of each new variable is lower than that of the one defined above it. Significantly, address assignments are made as soon as a variable is defined, even if initialization is deferred, as it is in the case of the second variable, szBuffer, shown in Listing 4. This is necessary because the compiler must avoid assigning another variable to the same address, or there would be serious and frequent chaos. Table 4 shows how this plays out for the local variables defined at the top of the demonstration routine in the included sample application, Exercise_stprintf_s. Especially noteworthy is that the address of szBuffer is 528 bytes below that of rintResult. The reason for the large gap is that szBuffer needs 520 bytes, enough room for MAX_PATH wide characters, plus the 8-byte gap that the compiler leaves between variables.

    int rintResult = SPH_TEST_SUCCEEDED;

    TCHAR szBuffer [ SPH_BUFFER_ACTUAL_SIZE ] ;

    INT32 intNumericVariable1of2 = SPH_NUMERIC_VARIABLE_VALUE_1 ;
    INT32 intNumericVariable2of2 = SPH_NUMERIC_VARIABLE_VALUE_2 ;

    _invalid_parameter_handler oldHandler , newHandler ;
    oldHandler = _set_invalid_parameter_handler ( newHandler ) ;

Listing 4 shows the local variables defined at the top of function Exercise_stprintf_s, which gives them function scope.

Table 4 lists the machine addresses of the local (automatic) variables defined by the test function, Exercise_stprintf_s, and reported by it. Note that the value of the stack pointer is lower than the base pointer by 864, which is 28 bytes more than the space reserved by its prologue. The extra space is occupied by hidden data structures used to manage its exception handlers.

Label	Address (hex)	Address (decimal)	Contents (hex)	Contents (decimal)
Base Pointer (EBP)	0x0031faf0	3,275,504
Stack Pointer (ESP)	0x0031f78c	3,274,636
rintResult	0x0031fad4	3,275,476	0x00000000	0
szBuffer	0x0031f8c4	3,274,948
intNumericVariable1of2	0x0031f8b8	3,274,936	0x00001000	4,096
intNumericVariable2of2	0x0031f8ac	3,274,924	0x0000ffff	65,535
newHandler	0x0031f894	3,274,900	0x010010b9	16,781,497
oldHandler	0x0031f8a0	3,274,912	0x00000000	0

Kaboom!

Taking into account where the output buffer intended for use by swprintf_s is allocated, the puzzle almost solves itself. When the code generated by macro _SECURECRT__FILL_STRING (Listing 3) calls on memset to backfill the buffer, it derives the count argument of memset from the capacity that it is told the buffer has, through the new sizeInWords argument. Like any good program, memset obeys its master, filling the specified number of bytes, starting just past the null character that ends the text written by swprintf_s. What happens next is made painfully clear by the last column of Table 5: the fill runs 248 bytes past the end of szBuffer, through rintResult and into the saved base pointer, the return address, and the argument list above them. Since the backfill value is 0xfe (decimal 254), the debug build’s run-time checks detect the corrupted stack and contain the damage by forcibly killing the application. The specific message is, “Run-Time Check Failure #2 - Stack around the variable 'rintResult' was corrupted.”

Table 5 summarizes the relationship between the asserted buffer size and the stack frame data that sits above the buffer, which holds the argument list and the return address that points the way back to the caller.

Test Case	1	2	3	4
Address of `szBuffer`	3,274,948	3,274,948	3,274,948	3,274,948
Asserted buffer size (`TCHAR`s)	128	255	260	384
Size of 1 `TCHAR` (bytes)	2	2	2	2
Asserted buffer size (bytes)	256	510	520	768
Actual buffer size (bytes)	520	520	520	520
Underrun (-) or overrun (+) (bytes)	-264	-10	0	248
Headroom (bytes)	36	36	36	36
Overlap beyond headroom (bytes)	none	none	none	212
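The rows of Table 5 can be recomputed from the addresses in Table 4. In this C sketch, the table5_* function names are hypothetical, and the headroom is assumed to be the distance from the first byte past szBuffer to the address held in EBP, which is what the 36-byte figure works out to: 3,275,504 - (3,274,948 + 520) = 36.

```c
/* Values copied from Table 4 (decimal addresses). */
#define SZBUFFER_ADDR  3274948L  /* address of szBuffer                    */
#define EBP_ADDR       3275504L  /* value of the base pointer              */
#define ACTUAL_BYTES   520L      /* 260 TCHARs (MAX_PATH), 2 bytes each    */
#define TCHAR_BYTES    2L        /* sizeof(wchar_t) on Windows             */

/* Negative: unused room (underrun); positive: bytes written past the end. */
static long table5_overrun(long asserted_tchars)
{
    return asserted_tchars * TCHAR_BYTES - ACTUAL_BYTES;
}

/* Distance from the first byte past szBuffer to the address held in EBP. */
static long table5_headroom(void)
{
    return EBP_ADDR - (SZBUFFER_ADDR + ACTUAL_BYTES);
}

/* How far an overrun reaches beyond that headroom (0 if it doesn't). */
static long table5_overlap(long asserted_tchars)
{
    long beyond = table5_overrun(asserted_tchars) - table5_headroom();
    return beyond > 0 ? beyond : 0;
}
```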

Thankfully, the buffer overflow is easy to spot in the Visual Studio debugger, though you will need to display the Memory window, and enter the address of the buffer to see it. In case you’ve missed it, the memory window is accessible during any debugging session by pressing ALT-6 (that is, the ALT key and the numeral 6 key on the top row of your keyboard).

[Figure 1: Output buffer, showing output followed by backfill]

Figure 1 shows the overrun buffer as it appears in the Visual Studio 2013 debugger. The legitimate text is in the top portion of the buffer, followed by the backfill. For the eagle eyes among you, the machine address shown here differs from that shown in the other examples, because my development machine has EMET installed and configured to enforce mandatory Address Space Layout Randomization (ASLR).

When the code executes from the desktop or a command prompt, the error report comes in the form of the large, rather ugly message box shown in Figure 2. Since the message box is displayed with its Application Modal flag disabled, you can get an unobstructed view of the output window shown in Figure 3. This is very handy, since the default action is Abort, which promptly terminates the program, causing its output to disappear if it was launched from the desktop.

[Figure 2: Buffer overrun message from debug build, run from command prompt]

Figure 2 is the message box that reports the fatal error when you run the debug build from a command prompt.

[Figure 3: Command window, showing evidence of the buffer overflow, which overwrote the argument list]

Figure 3 is the command window, which can be activated because the message box is displayed with its application modal flag switched off. The bogus test number is further evidence of the buffer overrun, since the message should read, “Test # 4 Done.”

Important: The release build of swprintf_s doesn’t backfill, because in a release build the _SECURECRT__FILL_STRING macro is null (that is, it generates no code).

There are two reasons that I was relieved to discover that a retail build doesn’t backfill buffers.

- Setting the buffer size too high doesn’t cause an overrun.
- Backfilling wastes processor cycles and time.

As with most such engineering matters, this, too, is a compromise.

- Although no backfilling occurs, if a print operation uses more space than is actually reserved for the buffer, you can still get a buffer overrun. If you’re lucky, the overrun will cause a spectacular crash.

- On the plus side, even the retail builds of the new overloads of swprintf and its cousins fail, reporting a trappable error, if the specified size indicates that the buffer is too small to accommodate the formatted output. There are two ways to detect this error.

1. swprintf_s returns -1 and sets errno to ERANGE. (The -2 tested in Listing 2 is the internal result of _vswprintf_helper, which _VALIDATE_RETURN turns into the -1 reported in Listing 5.) The return value can be evaluated without creating a scratch variable by wrapping the function call in a switch or if statement.
2. When the buffer is too small, swprintf_s invokes the invalid parameter handler. By default, a debug build raises an assertion, and the default handler terminates the program in both debug and release builds. However, a program, or one of its functions, can install its own handler by calling _set_invalid_parameter_handler. I did so, and my handler’s output, when it executes in a retail build, is at the bottom of Listing 5. The output generated in a debug build, shown in Listing 6, is a bit more useful.
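The return-value check in item 1 can be sketched portably. Because swprintf_s and _set_invalid_parameter_handler are specific to the Microsoft CRT, this C sketch swaps in ISO C vswprintf, which also takes the buffer size in wide characters but reports output that doesn’t fit by returning a negative value instead of calling a handler. format_checked is a hypothetical name; normalizing failures to -1 and emptying the buffer imitate what Listing 2 does.

```c
#include <stdarg.h>
#include <stddef.h>
#include <wchar.h>

/* Formats into buf, whose TRUE capacity in wide characters is count.
 * Returns the number of characters written, or -1 if the output plus its
 * terminating null doesn't fit; on failure the buffer is left empty. */
static int format_checked(wchar_t *buf, size_t count, const wchar_t *fmt, ...)
{
    va_list ap;
    int n;

    if (buf == NULL || count == 0)
        return -1;

    va_start(ap, fmt);
    n = vswprintf(buf, count, fmt, ap);   /* negative if it doesn't fit */
    va_end(ap);

    if (n < 0) {
        buf[0] = L'\0';                   /* like string[0] = 0 in Listing 2 */
        return -1;
    }
    return n;
}
```

For example, a 148-character message fails with a 128-character buffer and succeeds with a 260-character one. As with swprintf_s, the check is only as good as the count passed in.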

Begin Test # 1: Asserted buffer size = 0x00000080 (128 decimal):

    Buffer Address                   = 0x003efc50 (4127824 decimal)
       Actual Size (bytes)           = 0x00000208 (520 decimal)
       Actual Size (TCHARs)          = 0x00000104 (260 decimal)
       Actual Top                    = 0x003efe58 (4128344 decimal)

    Numeric Variable 1 of 2: Address = 0x003efc4c (4127820 decimal)
                             Value   = 0x00001000 (4096 decimal)

    Numeric Variable 2 of 2: Address = 0x003efc48 (4127816 decimal)
                             Value   = 0x0000ffff (65535 decimal)

    Base Pointer (EBP)               = 0x003efe5c (4128348 decimal)
    Stack Pointer (ESP)              = 0x003efc34 (4127796 decimal)

    ERROR: Invalid parameter detected in function (null).
           File:       (null)
           Line:       0
           Expression: (null)

    ERROR: Nothing printed!

    Test # 1 Done
    Total characters printed by last output statement = -1
    Outcome of test # 1 = Success

Listing 5 is the output of the first of the four tests that use swprintf_s, which fails because the specified buffer size of 128 characters is 21 characters too small to accommodate the formatted output and its terminal null character, which require a buffer size of at least 149.

    ERROR: Invalid parameter detected in function _vswprintf_s_l.
           File:       f:\dd\vctools\crt\crtw32\stdio\vswprint.c
           Line:       280
           Expression: ("Buffer too small", 0)

Listing 6 is the output generated by my invalid parameter handler when it runs in a debug build. The output of the debug version is considerably more useful, although it still leaves a lot to be desired. Nevertheless, compared to the output generated by the same routine when it runs in a release build, shown in Listing 5, its first line gives you a place to start.

Using the Code

The demonstration project is the program that generated all of the output shown in the foregoing tables and listings. Since only the debug build exhibits the buffer overrun, its output directory (the \Debug directory off the main project directory) deserves the most attention. Nevertheless, I left the retail build, so that you can quickly see for yourself that it completes without incident.

The first time you open the project in Visual Studio, the size of the solution directory will balloon when the IntelliSense database file, SecurePrintFHazard.sdf, is regenerated. To reduce the size of the download, I deleted that file from the package, because Visual Studio recreates it from scratch when it is missing.

Unlike many of my projects, including the demonstrations for the last two articles I wrote about C++ applications, this solution is completely self-contained. However, there are a few things that I must call to your attention.

Mixed Languages: The modules that comprise this project target three distinct programming languages, each with its own compiler.
1. The two main modules, SecurePrintFHazard.cpp and Exercise_stprintf_s.CPP, are implemented in C++.
2. One of two helper routines, ProgramIDFromArgV, defined in module ProgramIDFromArgV.C, is imported from another project, and is implemented in straight ANSI C.
3. The other helper routine, CPURegisterPeek, defined in module CPURegisterPeek.ASM, is written in assembly language, and must be assembled by a downlevel assembler, MASM 6.11. The need for the downlevel assembler is explained in item 4 below.
No precompiled headers: Due to the unavoidable overlap in header usage between the C and C++ modules, precompiled headers are impractical. Since this project contains only 3 modules that target the C/C++ compiler, the whole project builds from scratch in only a few seconds, and they aren’t missed.
No stdafx.h: Since I dispensed with precompiled headers, I renamed stdafx.h to SecurePrintFHazard.h. This is something that I almost always do, to remind myself that precompiled headers are disabled. At the same time, I delete stdafx.cpp; if you delete it with File Explorer and forget to remove it from Solution Explorer, your next attempt to build the solution fails with a fatal compiler error.
CPURegisterPeek.ASM is incompatible with MASM 12.0.31101.0: I used the copy of Microsoft ® Macro Assembler Version 6.11 that I have installed on an older machine to assemble it. The newer assembler emitted error A2071: initializer magnitude too large for specified size, calling out the endp directive at the bottom of the source file, although I suspect that the real issue is the expression at line 167, _ARG_UPPER_LIMIT equ ( __REG_INDEX_END - _REG_INDEX ) / _SIZEOF_DWORD. However, since the older assembler assembled a provably correct version, I didn’t investigate further. Today, I removed CPURegisterPeek.ASM from the solution, so that you and I don’t have to deal with it when the build engine decides that it needs to rebuild from scratch. This is more common than you might guess; it happens when anything in the project configuration changes.

Points of Interest

The main routine has the generic TCHAR mapped name _tmain and the standard two-argument signature, and is defined in module SecurePrintFHazard.cpp. Following four static arrays that require no further explanation is the first executable block, which is well protected by guard code that ensures that it is excluded from the compilation unless both preprocessor symbols _DEBUG and _PROGRAMIDFROMARGV_DBG are defined. I could have condensed the guard code into a single line on each side, but I didn’t, since the outer test was an afterthought.

    #if defined ( _DEBUG )
           #if defined ( _PROGRAMIDFROMARGV_DBG )
                  #pragma message ( "Preprocessor symbol _PROGRAMIDFROMARGV_DBG is defined.")
                  DebugBreak ( );
           #else
                  #pragma message ( "Preprocessor symbol _PROGRAMIDFROMARGV_DBG is UNdefined.")
           #endif /* #if defined ( _PROGRAMIDFROMARGV_DBG ) */
    #endif /* #if defined ( _DEBUG ) */

Listing 7 is the guard code around the call to Windows API routine DebugBreak, which I used to help me coerce a debug build of the program started from a command prompt into the Visual Studio debugger.

Next comes the first call into an application defined function, ProgramIDFromArgV, which extracts the base name of the program, SecurePrintFHazard, from the name by which it was invoked, which it receives in the form of a pointer to the first element in the argv array. Since it is unrelated to the subject at hand, I leave its analysis as an exercise for insatiably curious readers.

The main body of the routine is the switch statement nested inside the two for loops shown in Listing 8. The outer for statement defines and uses unsigned integer uintOutputMethod to iterate over the elements of the two-element array s_enmOutputMethod, which holds one each of the nonzero members of the SPH_OUTPUT_METHOD enumeration.

The inner for statement defines and uses uintTestIndex, another unsigned integer, to iterate over the s_auintAssertedSizes array, a collection of unsigned integers representing the buffer sizes that are passed, in turn, to swprintf_s (through its generic text mapping, _stprintf_s) on the second iteration of the outer loop.

The main thing that happens within the inner loop is a call to the other major application-defined function, Exercise_stprintf_s, which is the first, and principal, routine defined in Exercise_stprintf_s.CPP. The only remaining task for the inner loop is to use the value returned by Exercise_stprintf_s to choose which of three messages to display about the outcome. Other than the fact that the calls to wprintf are made through its generic text mapping, _tprintf, the print statements are unremarkable.

Since I do my best to avoid calculating anything more than once, uintTestNumber stores the ordinal test number (used twice), which starts at one, even though deriving it from the zero-based index of the inner loop is trivial. All that remains to be said about the main routine is that, although the limit values of the two loops are nominally data driven, they appear as constants in the emitted code: the calculation depends entirely on values that are known at compile time, so the compiler performs it and writes the answers into the generated code as constants.

In general, this is true of any value that is expressed either as sizeof a variable or type, or as an expression composed entirely of such expressions and basic arithmetic operators. This concept plays a key role in Exercise_stprintf_s, too, as well as in a great many of the macros that I usually employ.

    for ( unsigned int uintOutputMethod = 0 ;
                       uintOutputMethod < sizeof ( s_enmOutputMethod ) / sizeof ( SPH_OUTPUT_METHOD ) ;
                       uintOutputMethod++ )
    {
        _tprintf ( TEXT ( "\nTest group %d: %s\n\n" ) ,
                   ( uintOutputMethod + 1 ) ,                                          // The test group number is derived from the array subscript and is not stored.
                   s_szOutputMethodMsg [ s_enmOutputMethod [ uintOutputMethod ] ] ) ;  // The descriptions are read from a parallel table of static strings.

        for ( unsigned int uintTestIndex = 0 ;
                           uintTestIndex < sizeof ( s_auintAssertedSizes ) / sizeof ( unsigned int ) ;
                           uintTestIndex++ )
        {
            unsigned int uintTestNumber = uintTestIndex + 1 ;

            switch ( int intResult = Exercise_stprintf_s ( uintTestNumber ,                               // Ordinal number of test case
                                                           s_auintAssertedSizes [ uintTestIndex ] ,       // Tell _stprintf_s that the buffer has a capacity of this many TCHARs.
                                                           s_enmOutputMethod [ uintOutputMethod ] ) )     // Tell Exercise_stprintf_s whether to output directly through _tprintf, or indirectly through _stprintf_s.
            {
                case SPH_TEST_SUCCEEDED:
                case SPH_TEST_FAILED:
                    _tprintf ( TEXT ( "    Outcome of test # %d = %s\n\n" ) ,
                               uintTestNumber ,                 // Substitute for token %d
                               s_szResultMsg [ intResult ] ) ;  // Substitute for token %s
                    break ;     // Cases SPH_TEST_SUCCEEDED and SPH_TEST_FAILED do the same thing, and end here.

                case SPH_TEST_REPORTING_ERROR:
                    _tprintf ( TEXT ( "    Test # %d reported that a call to function _tprintf produced nothing.\n\n" ) ,
                               uintTestNumber ) ;               // Substitute for token %d
                    break ;     // Case SPH_TEST_REPORTING_ERROR has a dedicated message, and ends here.

                default:
                    _tprintf ( TEXT ( "    Test # %d reported an unexpected result code of 0x%08x (%d decimal)\n\n" ) ,
                               uintTestNumber ,
                               intResult ,                      // Hexadecimal (format token 0x%08x)
                               intResult ) ;                    // Decimal (format token %d)
                    break ;
            }   // switch ( int intResult = Exercise_stprintf_s ( uintTestNumber , s_auintAssertedSizes [ uintTestIndex ] , s_enmOutputMethod [ uintOutputMethod ] ) )
        }   // for ( unsigned int uintTestIndex = 0 ; ... ; uintTestIndex++ )
    }   // for ( unsigned int uintOutputMethod = 0 ; ... ; uintOutputMethod++ )

Listing 8 is the core of the main routine: a switch statement that evaluates the value returned by application-defined function Exercise_stprintf_s, which is called from within two nested for loops that iterate the pair of arrays supplying its two key arguments.
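Both loop limits in Listing 8 are sizeof quotients, which is why the compiler can fold them into constants. The sketch below, with a made-up table standing in for s_auintAssertedSizes, checks that property with static_assert and shows the related trick of letting a function template deduce an array's element count. The same deduction lets a wrapper take a buffer's capacity from its type instead of from a hand-typed argument, and it is also why such a wrapper refuses a bare pointer. EXAMPLE_ELEMENT_COUNT, ArrayCount, and FormatIntoArray are hypothetical names, and std::snprintf stands in for the Microsoft secure functions to keep the sketch portable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Made-up stand-in for s_auintAssertedSizes; the values are illustrative only.
static const unsigned int s_auintExampleSizes[] = { 8u, 64u, 128u, 512u };

// Classic idiom: both operands are sizeof expressions, so the quotient is a
// compile-time constant that the compiler writes into the code.
#define EXAMPLE_ELEMENT_COUNT(a) ( sizeof ( a ) / sizeof ( ( a ) [ 0 ] ) )

static_assert(EXAMPLE_ELEMENT_COUNT(s_auintExampleSizes) == 4,
              "the element count is known at compile time");

// A template deduces the same count, but only from a real array;
// a pointer carries no size, so passing one here does not compile.
template <typename T, std::size_t N>
constexpr std::size_t ArrayCount(const T (&)[N]) noexcept
{
    return N;
}

static_assert(ArrayCount(s_auintExampleSizes) == 4, "deduced from the array type");

// Hypothetical wrapper: the capacity comes from the buffer's type, so it cannot
// disagree with the buffer's real size the way a hand-typed argument can.
template <std::size_t N, typename... Args>
int FormatIntoArray(char (&buffer)[N], const char* format, Args... args)
{
    return std::snprintf(buffer, N, format, args...);
}
```

A call such as FormatIntoArray(pszBuffer, ...) with a char * argument is rejected at compile time, which is the point: the size mismatch described in this article becomes a compile error instead of a run-time overflow.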

Function Exercise_stprintf_s, the heart of the test program, was almost completely dissected above, in the section titled “Anatomy of a Buffer Overflow.” This routine takes three arguments, all effectively unsigned integers, as shown in Listing 9.

The first argument, puintTestNumber, goes into a couple of messages, and is otherwise ignored.
The second argument, puintAsserteSize, is ignored unless the third argument, penmOutputMethod, is SPH_INDIRECT (2), which is true on the second iteration of the outer loop in the main routine.
You have probably already guessed that penmOutputMethod determines whether the result of the simple math problem represented by the first statement within the try block is printed directly, via wprintf, or indirectly, by calling swprintf_s to write it into a buffer, then sending the buffer to wprintf.

    int    __stdcall Exercise_stprintf_s
    (
        const unsigned int      puintTestNumber ,    // Ordinal number of test case
        const unsigned int      puintAsserteSize ,   // Tell _stprintf_s that the buffer has a capacity of this many TCHARs.
        const SPH_OUTPUT_METHOD penmOutputMethod     // Tell Exercise_stprintf_s whether to output directly through _tprintf, or indirectly through _stprintf_s.
    ) ;

Listing 9 is the prototype of function Exercise_stprintf_s, the real workhorse of this program.
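To make the two output paths concrete, here is a portable sketch under stated assumptions: the enumerations are hypothetical stand-ins for SPH_OUTPUT_METHOD and the SPH_TEST_* codes (their real values live in SecurePrintFHazard.h), the report is reduced to a single line, and std::snprintf replaces swprintf_s. Note that std::snprintf truncates when the text does not fit, whereas swprintf_s invokes the invalid parameter handler, so the failure path here is a simplification.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical stand-ins for SPH_OUTPUT_METHOD and the test result codes.
enum ExampleOutputMethod { EXAMPLE_DIRECT = 1, EXAMPLE_INDIRECT = 2 };
enum ExampleResult { EXAMPLE_SUCCEEDED = 0, EXAMPLE_FAILED = 1 };

// Sketch of the two paths: print directly, or format into a buffer and print that.
ExampleResult ExerciseOutput(ExampleOutputMethod method,
                             char* buffer, std::size_t assertedCapacity,
                             int a, int b)
{
    const int product = a * b;

    if (method == EXAMPLE_DIRECT)
        return std::printf("%d x %d = %d\n", a, b, product) > 0 ? EXAMPLE_SUCCEEDED
                                                                 : EXAMPLE_FAILED;

    // Indirect path. assertedCapacity MUST NOT exceed the real size of buffer;
    // nothing in this function can verify that.
    const int needed = std::snprintf(buffer, assertedCapacity, "%d x %d = %d\n", a, b, product);
    if (needed < 0 || static_cast<std::size_t>(needed) >= assertedCapacity)
        return EXAMPLE_FAILED;              // encoding error, or the text was truncated

    std::fputs(buffer, stdout);
    return EXAMPLE_SUCCEEDED;
}
```

The indirect path is only as safe as assertedCapacity: passing the true size of the array is correct, while passing a larger number lets the formatter write past the end of the array whenever the text is long enough.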

Though it is by far the biggest function in the entire project, Exercise_stprintf_s is straightforward.

Four variables are declared at function scope; three of them are scalars (two INT32 and one int), and all three are initialized in their declarations.
Next, two _invalid_parameter_handler function pointers are declared. The first is set aside to hold a pointer to the default invalid parameter handler, while the second is initialized with the address of a custom handler, SPH_InvalidParameterHandler, which is declared in SecurePrintFHazard.h and defined near the end of Exercise_stprintf_s.CPP. In retrospect, I would have been better off writing _CrtSetReportMode ( _CRT_ASSERT , _CRTDBG_MODE_FILE ), followed by _CrtSetReportFile ( _CRT_ASSERT , _CRTDBG_FILE_STDERR ), to divert the standard assertion message to the console window.
Next come five fairly unremarkable calls to wprintf (through the generic TCHAR mapping macro _tprintf). The accompanying format string contains two format items, 0x%08x, followed shortly by %d. The first format item renders the argument that replaces it as a hexadecimal string, while the second renders the same argument as a plain decimal integer.
Although the foregoing technique works well for pointers to strings, displaying the address and value of an integer takes a bit more work. This is the domain of SPH_ShowAddressAndValueOfInt32, which takes the address of the integer (e.g., &rintResult for the first call) and pointers to two format strings (e.g., ( LPCTSTR ) &m_szRetCdeAddrTpl1 and ( LPCTSTR ) &m_szScalarValueTpl).
1. SPH_ShowAddressAndValueOfInt32 first calls SPH_ShowAddressOfScalar with the address of the variable and the first of the two format strings.
2. SPH_ShowAddressOfScalar wraps a simple call to wprintf (through the _tprintf macro, as above) and returns the number of characters that it wrote. This function could be folded into SPH_ShowAddressAndValueOfInt32, or replaced with a macro. I did neither, because it was easier to test and document thoroughly as a separate routine, and because the same routine, or the macro that supersedes it, can display the address of any scalar. Anticipating that, I declared its plpScalar argument as const void * instead of const INT32 *.
3. The second call to wprintf uses the second format string and dereferences pintValue along the way (hence the asterisk preceding it in the argument list), so that the print statement renders its value. That is why pintValue is declared as const INT32 *, instead of const void *.
Next is a pair of print statements that display the current values of the EBP and ESP registers. While the print statements are more of the same, CPURegisterPeek, the function that reads the CPU registers, deserves a short explanation. The original version of this routine used the two short blocks of straightforward inline assembly shown in Listing 12. Its successor, CPURegisterPeek, can report the current contents of any of the general-purpose registers, though not EFL, the flags register. Because of its complexity, and because it is written entirely in assembly language, CPURegisterPeek is treated here as a black box and is out of scope. I have tested it sufficiently to cover the use cases applicable to this program, and the source code is in the main solution directory. I may dissect it in a future article.
Finally, a simple multiplication problem is solved to generate enough material for a small, but nontrivial, report, followed by one or two function calls to print it. The first four cases print the report directly, via wprintf, which succeeds in all four. The second four cases repeat the same calculation and invoke swprintf_s to write the report into a buffer, which is expected to fail for the first and fourth buffer sizes. The first case is expected to write nothing and report that the buffer is too small, while the fourth case is the overflow that motivated me to create this program.
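The overflow in the fourth case comes from the debug CRT's backfilling: after formatting, the debug versions of the secure functions fill the unused remainder of the buffer, up to the size the caller asserted, with the byte 0xFE (the behavior can be tuned through _CrtSetDebugFillThreshold). The following is a hypothetical, byte-oriented simulation of that behavior, not CRT source code; swprintf_s works in wide characters, but the arithmetic is the same. SimulatedDebugSecurePrint is an illustrative name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// 0xFE is the fill byte that the Visual C++ debug CRT uses for its secure functions.
const unsigned char kDebugFillByte = 0xFE;

// Hypothetical simulation, not CRT code: copy the text, then fill every remaining
// byte up to the ASSERTED capacity, as a debug-build secure print does.
// The function cannot learn the buffer's real size, so it trusts assertedCapacity.
int SimulatedDebugSecurePrint(char* buffer, std::size_t assertedCapacity, const char* text)
{
    const std::size_t length = std::strlen(text);

    if (length + 1 > assertedCapacity)
    {
        if (assertedCapacity > 0)
            buffer[0] = '\0';   // the real CRT also invokes the invalid parameter handler here
        return -1;
    }

    std::memcpy(buffer, text, length + 1);                    // text plus terminating null
    std::memset(buffer + length + 1, kDebugFillByte,
                assertedCapacity - length - 1);               // backfill up to the asserted size
    return static_cast<int>(length);
}
```

Picture a 32-byte array playing the part of the stack frame, of which only the first 8 bytes belong to the buffer: with an asserted size of 8, the fill stays inside those 8 bytes, but with an asserted size of 32 it overwrites the bytes that stand in for the neighboring variables, even though "abc" needs only 4 bytes.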

Since I wasn’t completely sure how the test routine would behave, I put the math problem and the routines that print the report inside a try block, followed by an ellipsis catch block. I discovered that neither C++ try/catch blocks nor C-style Structured Exception Handling plays any role, because changes in the new CRT library force any program in which a buffer overflow is detected to terminate. However, since its presence was harmless, I left the try/catch block in place.

    _invalid_parameter_handler oldHandler , newHandler ;

    newHandler = SPH_InvalidParameterHandler ;
    oldHandler = _set_invalid_parameter_handler ( newHandler ) ;

    _CrtSetReportMode ( _CRT_ASSERT , 0 ) ;  // Report mode 0 sends assertion reports nowhere, which disables the message box.

Listing 10 is the section of Exercise_stprintf_s that registers a custom invalid parameter handler and disables the assertion message box, which it replaces.
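As the Lessons Learned section points out, the custom handler is useful only in a debug build, so the hook can be wrapped in a _DEBUG test. The sketch below is one way to do that, written as a hypothetical RAII class; ScopedInvalidParameterHandler and ReportInvalidParameter are illustrative names, not part of the test program. Unlike Listing 10, it also restores the previous handler and assertion report mode when it goes out of scope. It relies only on _set_invalid_parameter_handler and _CrtSetReportMode, and it compiles to an empty guard in release builds and on other compilers.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstdlib>   // _set_invalid_parameter_handler (Visual C++)
#include <cwchar>
#if defined(_MSC_VER)
#include <crtdbg.h>  // _CrtSetReportMode
#endif

#if defined(_MSC_VER) && defined(_DEBUG)
// Must match _invalid_parameter_handler. In a debug CRT the first four arguments
// describe the failed check; in a release CRT they are null or zero.
static void __cdecl ReportInvalidParameter(const wchar_t* expression,
                                           const wchar_t* function,
                                           const wchar_t* file,
                                           unsigned int line,
                                           std::uintptr_t /* reserved */)
{
    std::fwprintf(stderr, L"Invalid parameter: %ls\n  in %ls (%ls, line %u)\n",
                  expression ? expression : L"(unknown)",
                  function ? function : L"(unknown)",
                  file ? file : L"(unknown)",
                  line);
}
#endif

// Hypothetical guard: installs the handler and silences assertion reports for its
// lifetime, then restores the previous settings. Outside an MSVC debug build it does nothing.
class ScopedInvalidParameterHandler
{
public:
    ScopedInvalidParameterHandler()
    {
#if defined(_MSC_VER) && defined(_DEBUG)
        previousHandler_ = _set_invalid_parameter_handler(ReportInvalidParameter);
        previousAssertMode_ = _CrtSetReportMode(_CRT_ASSERT, 0);   // returns the old mode
        installed_ = true;
#endif
    }

    ~ScopedInvalidParameterHandler()
    {
#if defined(_MSC_VER) && defined(_DEBUG)
        _CrtSetReportMode(_CRT_ASSERT, previousAssertMode_);
        _set_invalid_parameter_handler(previousHandler_);
#endif
    }

    ScopedInvalidParameterHandler(const ScopedInvalidParameterHandler&) = delete;
    ScopedInvalidParameterHandler& operator=(const ScopedInvalidParameterHandler&) = delete;

    bool installed() const { return installed_; }

private:
    bool installed_ = false;
#if defined(_MSC_VER) && defined(_DEBUG)
    _invalid_parameter_handler previousHandler_ = nullptr;
    int previousAssertMode_ = 0;
#endif
};
```

Declaring a ScopedInvalidParameterHandler at the top of the test routine would replace the three statements of Listing 10 and undo them automatically when the routine returns.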

int  __stdcall SPH_ShowAddressAndValueOfInt32
(
    const INT32 *   pintValue ,          // Pointer to value, which MUST be passed by reference to yield the correct result.
    LPCTSTR         plpAddressFormat ,   // Format string for address, which must first specify a hexadecimal format token (0x%08x), followed by a decimal format token (%d).
    LPCTSTR         plpValueFormat       // Format string for value, which must first specify a hexadecimal format token (0x%08x), followed by a decimal format token (%d).
)
{
    int rintRC = SPH_ShowAddressOfScalar ( pintValue , plpAddressFormat ) ;

    if ( rintRC > 0 )                    // _tprintf returns a negative value on error, which must not count as success.
    {
        rintRC = _tprintf ( plpValueFormat ,    // Format string for value
                            *pintValue ,        // Hexadecimal representation
                            *pintValue ) ;      // Decimal     representation
        return rintRC ;
    }   // TRUE (expected outcome) block, if ( rintRC > 0 )
    else
    {
        return SPH_ERROR_NOTHING_PRINTED ;
    }   // FALSE (UNexpected outcome) block, if ( rintRC > 0 )
}   // SPH_ShowAddressAndValueOfInt32


int __stdcall SPH_ShowAddressOfScalar
(
    const void *    plpScalar ,          // Address to print
    LPCTSTR         plpFormat            // Format string, which must first specify a hexadecimal format token (0x%08x), followed by a decimal format token (%d).
)
{
    // Passing a pointer to 0x%08x and %d assumes a 32-bit build, in which a pointer and an int are the same size.
    return _tprintf ( plpFormat ,        // Format string
                      plpScalar ,        // Hexadecimal representation, replaces 0x%08x format token.
                      plpScalar ) ;      // Decimal representation, replaces %d format token.
}   // SPH_ShowAddressOfScalar

Listing 11 is function SPH_ShowAddressAndValueOfInt32, followed by its dependent function, SPH_ShowAddressOfScalar.
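The format items 0x%08x and %d consume an unsigned int and an int, so handing them a pointer, as SPH_ShowAddressOfScalar does, works only where a pointer and an int are the same size, as in a 32-bit x86 build such as the one this program appears to target. A portable variant might print the address with %p and format the 32-bit value with the <cinttypes> macros. The sketch below (FormatAddressAndValueOfInt32 is a hypothetical name) writes into a caller-supplied buffer so that the result can be checked; the textual form of %p is implementation-defined.

```cpp
#include <cassert>
#include <cinttypes>   // PRId32, PRIx32
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>     // std::strstr, handy for checking the output

// Hypothetical portable counterpart of SPH_ShowAddressAndValueOfInt32.
// %p prints the address at the platform's native pointer width, and the
// PRI macros match std::int32_t / std::uint32_t exactly on every platform.
int FormatAddressAndValueOfInt32(char* buffer, std::size_t capacity, const std::int32_t* value)
{
    return std::snprintf(buffer, capacity,
                         "address %p, value 0x%08" PRIx32 " (%" PRId32 ")",
                         static_cast<const void*>(value),
                         static_cast<std::uint32_t>(*value),   // hexadecimal representation
                         *value);                              // decimal representation
}
```

For an INT32 holding 42, the value part of the output reads value 0x0000002a (42), whatever the width of the address.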

    {
        VOID * szEBPAddress = NULL ;

        __asm
        {
            lea eax , [ EBP ] ;
            mov dword ptr [ szEBPAddress ] , eax ;
        }   /* __asm */
    }   // szEBPAddress goes out of scope.


    {
        VOID * szESPAddress = NULL ;

        __asm
        {
            lea eax , [ ESP ] ;
            mov dword ptr [ szESPAddress ] , eax ;
        }   /* __asm */
    }   // szESPAddress goes out of scope.

Listing 12 is the inline assembly code that CPURegisterPeek replaced.
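Visual C++ supports __asm blocks only when targeting 32-bit x86, so Listing 12 cannot be compiled for x64. Where an approximate answer is enough, compiler intrinsics can stand in for the inline assembly. The sketch below is a hypothetical, portable approximation, not a replacement for CPURegisterPeek: the address of a local variable approximates ESP, GCC and Clang expose __builtin_frame_address(0), and Visual C++ offers _AddressOfReturnAddress, whose result is related to, but not equal to, EBP. The function names are illustrative.

```cpp
#include <cassert>
#include <cstdint>
#if defined(_MSC_VER)
#include <intrin.h>   // _AddressOfReturnAddress
#endif

// Approximates the stack pointer with the address of a local variable in the
// current frame. noinline keeps the frame from being merged into the caller's.
#if defined(_MSC_VER)
__declspec(noinline)
#elif defined(__GNUC__)
__attribute__((noinline))
#endif
std::uintptr_t ApproximateStackAddress()
{
    volatile int marker = 0;
    return reinterpret_cast<std::uintptr_t>(&marker);
}

// Returns an address tied to the current frame, or 0 if no intrinsic is known.
std::uintptr_t CurrentFrameAddress()
{
#if defined(__GNUC__) || defined(__clang__)
    return reinterpret_cast<std::uintptr_t>(__builtin_frame_address(0));
#elif defined(_MSC_VER)
    // Address of the slot that holds this function's return address; in a classic
    // x86 EBP-based frame, that slot sits 4 bytes above the saved EBP.
    return reinterpret_cast<std::uintptr_t>(_AddressOfReturnAddress());
#else
    return 0;
#endif
}
```

Neither function returns a register's exact contents; both return addresses inside the current frame, which may be enough for the kind of frame inspection described in this article.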

To close this section, I call to your attention that all function prototypes and macros are in SecurePrintFHazard.h. This practice affords maximum flexibility, because having the prototype in the master header means that the function can be defined in any source file. Unless the definition needs macros or typedefs defined in the master header, the file in which it is defined can omit that header. For example, SecurePrintFHazard.h is omitted from ProgramIDFromArgV.C, which compiles and links just fine: declaring ProgramIDFromArgV in SecurePrintFHazard.h and including ProgramIDFromArgV.C in the project is enough to get it compiled and linked.
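One detail worth spelling out: if ProgramIDFromArgV.C is compiled as C, as its extension implies by default, while SecurePrintFHazard.cpp is compiled as C++, then the prototype that the C++ side sees must have C linkage; otherwise the compiler emits a call to a decorated C++ name and the linker reports an unresolved external. SecurePrintFHazard.h is not reproduced here, so the following is an assumed, generic arrangement with a hypothetical function, ExampleLastComponentLength, shown as a header section followed by the C source that it declares.

```cpp
/* ---- Header section (hypothetical), safe to include from both C and C++ ---- */
#include <assert.h>   /* for checking the example */
#include <stddef.h>
#include <string.h>

#ifdef __cplusplus
extern "C" {
#endif

/* Length of the final path component. Defined in a C source file. */
size_t ExampleLastComponentLength(const char *path);

#ifdef __cplusplus
}   /* extern "C" */
#endif

/* ---- C source section (hypothetical), shown here so the sketch builds as one unit ---- */
size_t ExampleLastComponentLength(const char *path)
{
    const char *start = path;

    /* Remember the position just past the last separator seen so far. */
    for (const char *p = path; *p != '\0'; ++p)
        if (*p == '\\' || *p == '/')
            start = p + 1;

    return strlen(start);
}
```

The #ifdef __cplusplus guards let the same header serve both languages: C sees a plain prototype, and C++ sees the same prototype inside extern "C".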

Lessons Learned

I learned several things from this exercise, some of which were more like blunt force reminders.

The new “secure” functions are a double-edged sword.
- They are no panacea, because they introduce new hazards, which won’t necessarily manifest as dramatically as they did in this case. The bugs they introduce may be more subtle, harder to identify, and more dangerous than the deficiencies they are intended to address.
- Pay close attention to the output buffer when you test a new call to any of the functions that use this backfilling technique. This is sound practice whenever you use a function that writes to a buffer. Thankfully, the fill pattern is easy to identify (Figure 1 was generated from the test program).
I discovered a new way to report invalid parameters, although the limitations imposed by its interface make it unlikely that I will use it, unless I find a way to use the knowledge that went into this article to extend its capabilities by peering into the previous stack frame.
- In a debug build, its signature exposes details that would be significantly more useful when user code invokes the handler. Nevertheless, even in a debug build, the information directly available through its arguments is only marginally useful.
- Since its arguments are null in a release build, a garden-variety implementation such as the one in the test program is useless in a release build, and the code that hooks it may as well be suppressed by wrapping it in a test for the presence of the _DEBUG preprocessor symbol. To fight code bloat, the routine itself should also be guarded, although its prototype may safely be left unguarded, since its purpose is to give the compiler a template against which to validate the syntax of a call.
Studying its signature to discover what useful tidbits are available for use in the error report reminded me of the limitations that all callback functions impose on the code that registers and invokes them, and the desirability of anticipating future requirements when designing a callback interface.
I discovered CRT function _CrtSetReportMode, which can disable the assertion message box if it gets to be too annoying or, together with _CrtSetReportFile, send the information to STDERR.
A callback function that is used before it is defined needs a prototype, even if it implements a system defined interface such as _invalid_parameter_handler.
I rediscovered DebugBreak, a Windows API function that can be used to force any process into a debugger. Since the Visual Studio IDE insists on a fully qualified file name for the program to load into the debugger, this is the only way I could monitor its behavior when called from a command prompt or batch file by its unqualified name. I needed to do this to find and fix a character truncation error in helper function ProgramIDFromArgV.
Assembly language modules can be incorporated into a Visual Studio project, which ships with an assembler. I could use the current assembler to assemble CPURegisterPeek.ASM until I got cute with an equate that simulates the behavior of an expression constructed from several sizeof expressions.
Any extra file that is neither code nor content, even one such as the readme.txt created by the New Project Wizard, remains visible in the Solution Explorer if you delete it directly from the file system, and it causes the solution to be perpetually marked as out of date, prompting a request to rebuild whenever you start the debugger. Use the context menu in the Solution Explorer to remove the file from the solution, and the message goes away.
You can force the build engine to skip a target by setting its modified date to the current time, which makes it newer than its inputs, so that the build engine treats it as up to date. I used the 64-bit version of FSTouch, a free utility published by Funduc Software, and available at http://www.funduc.com/fstouch.htm.
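If you would rather not install a separate utility, a minimal C++17 stand-in for FSTouch can be written with std::filesystem. This is a sketch (TouchFile is a hypothetical name); it sets only the last-write time, and it reports failure, for example when the file does not exist, instead of creating the file as some touch utilities do.

```cpp
#include <cassert>
#include <chrono>
#include <filesystem>
#include <fstream>       // used to create a scratch file when trying the sketch
#include <system_error>

namespace fs = std::filesystem;

// Minimal portable "touch": sets a file's last-write time to the current time.
// Returns false, rather than throwing, if the file is missing or the update fails.
bool TouchFile(const fs::path& target)
{
    std::error_code ec;
    fs::last_write_time(target, fs::file_time_type::clock::now(), ec);
    return !ec;
}
```

Touching an output this way makes it newer than its inputs; touching an input instead would have the opposite effect and force a rebuild.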

References

1	“sprintf, _sprintf_l, swprintf, _swprintf_l, __swprintf_l,” the classic family of buffered formatted printing functions, is documented at https://msdn.microsoft.com/en-us/library/ybk95axf.aspx and elsewhere.
2	“sprintf_s, _sprintf_s_l, swprintf_s, _swprintf_s_l,” the corresponding collection of “secure” overloads, is documented at https://msdn.microsoft.com/en-us/library/ce3zzk1k.aspx.
3	“Secure Template Overloads” documents the template macros intended to simplify upgrading, at https://msdn.microsoft.com/en-us/library/ms175759.aspx.
4	“Howto prevent process crash on CRT error C++,” at http://stackoverflow.com/questions/10719626/howto-prevent-process-crash-on-crt-error-c, gets credit for steering me to the solution that enabled me to create a detailed report about an expected invalid parameter error in the first of four test cases in the demonstration program, even though I eventually concluded that it would have been simpler to redirect the assert to STDERR.
5	“Using Static Buffers to Improve Error Reporting Success,” David A. Gray, 9 April 2015, http://www.codeproject.com/Articles/894564/Using-Static-Buffers-to-Improve-Error-Reporting-Su.

History

20 January 2016 - Corrected a technical error in the table that describes the uses of the various CPU registers as they relate to memory management.