Introduction to Native DLLs - Part 1: Boilerplate

Bruno van Dooren

11 Jan 2024, CPOL, 10 min read

In this article, I explain how to create win32 style DLLs and what the various significant intricacies are.

In Windows, Win32-style DLLs are used to provide a transparent and functional interface to client applications. The main reason for using this particular type of interface is that it is programming language agnostic. Virtually every programming language or scripting language in existence can interface with old-style DLL interfaces. There are various rules, requirements and limitations in place when doing this. In this article, I will be explaining the boilerplate.

Introduction

This is the first part of a short series about building native DLLs, meaning DLLs that are built in plain C or C++. In this first part, I explain the boilerplate code and the various restrictions and considerations. There is not going to be a lot of code in this article. It's mostly explanation of things you need to know and understand before starting your project.

Since this part is already quite lengthy, I've decided to split the series up. Explaining the actual process of building a DLL and the various techniques is also lengthy, and I didn't want to put everything in one mammoth article.

Background

As a software developer, there is a good chance that at some point, you will be implementing code that needs to be called by other programs. This could mean software algorithms but it's equally possible that you may be a developer for a hardware company, providing third party software developers a means to perform a voltage measurement, to open a valve or to open an inter-dimensional portal.

There are various ways in which you could do that. There is a wide variety of technologies such as .NET component libraries, ActiveX objects, DCOM, etc. All these methods have their merits but the problem, in terms of accessibility, is that they limit the range of client applications that can use them. A .NET library cannot easily be used by non-.NET applications. DCOM objects cannot be used by all scripting languages. Both ActiveX and DCOM need type libraries and security configuration, etc.

However, there is one technology that is usable by virtually every programming language and scripting language: the dynamic-link library (DLL) that exports functions. The reason is simple: every language needs to be able to interact with the Windows platform APIs. These are provided as exported functions in various DLLs, so any language needs to support that functionality just to exist.

If you provide a DLL functional interface, it is also trivial to wrap a .NET class library or DCOM object around it to provide client applications with a full spectrum of interface technologies. The concept of a DLL is rather simple. You have a C or C++ project in which you implement certain functions that are exported, and those functions are compiled into a binary file with a .DLL extension. This file can then be loaded by client applications, which can call any of the exported functions.

Depending on how the client application is made, the DLL can be added as a dependency during the client build process (meaning it is statically loaded, always), or the client application can be programmed to manually load the DLL at a certain point when it is running (dynamic loading). Why you would choose one way over the other is explained later.

The DllMain Function

Just like any Windows executable has a function that acts as the entry point where the program starts executing (usually main or WinMain), DLLs have an entry-point function of their own, called DllMain. This function is called by the Windows DLL loader at various points in the DLL's lifecycle.

The default function body looks like this:

C++

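#include <windows.h>

// Entry point that the Windows loader calls on the DLL's behalf.
BOOL APIENTRY DllMain( HMODULE hModule,            // handle identifying this DLL
                       DWORD  ul_reason_for_call,  // why the loader is calling
                       LPVOID lpReserved           // extra info for process attach / detach
                     )
{
    switch (ul_reason_for_call)
    {
    case DLL_PROCESS_ATTACH:   // DLL is being loaded into the process
    case DLL_THREAD_ATTACH:    // a new thread is starting in the process
    case DLL_THREAD_DETACH:    // a thread is exiting cleanly
    case DLL_PROCESS_DETACH:   // DLL is being unloaded from the process
        break;
    }
    return TRUE;               // FALSE on DLL_PROCESS_ATTACH makes the load fail
}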

hModule is a handle that identifies the DLL to the Windows subsystem. ul_reason_for_call identifies the reason for which the DllMain function is executed at that particular time. lpReserved can be ignored most of the time, except during process attach / detach: on DLL_PROCESS_ATTACH it is non-NULL if the DLL is statically loaded and NULL if it is dynamically loaded, and on DLL_PROCESS_DETACH it is non-NULL if the process is terminating and NULL if the DLL is being unloaded explicitly.

DllMain Function Call Reasons

During the lifecycle of the DLL, the DllMain function will be called in the context of a specific thread at various points in time.

DLL_PROCESS_ATTACH and DLL_PROCESS_DETACH are the call reasons when the DLL is loaded into, and unloaded from, the application's memory space. The call is made on whichever thread triggers the load or unload.

DLL_THREAD_ATTACH is the call reason whenever a new thread is created in the process while the DLL is loaded, whether or not that thread ever calls into the DLL. If multiple threads are created, then the DllMain function will be called multiple times by the Loader. DLL_THREAD_DETACH is the call reason whenever a thread exits cleanly. Note that if DllMain is executed for a DLL_PROCESS_... reason, the thread on which that is done will not call DllMain for the corresponding DLL_THREAD_... reason.

You can use that switch statement to implement some initialization of variables such as counters, thread-local storage, etc. But there are a couple of issues you need to be aware of.

It's Unreliable

The first issue is that the DllMain function is not executed for threads that already exist when the DLL is loaded. This is not an issue when the DLL is statically linked to the client, because it is then always loaded before any additional threads are created. But if it is dynamically loaded, there may be threads for which that initialization has not been done. And the DLL_THREAD_DETACH case may be executed for a thread even though its DLL_THREAD_ATTACH case never was.

Furthermore, DLL_PROCESS_DETACH may be executed on a different thread than the one on which DLL_PROCESS_ATTACH was executed. If that happens, DLL_THREAD_DETACH will be called for the thread where DLL_PROCESS_ATTACH was called, and DLL_PROCESS_DETACH will be executed on whatever the active thread is. That thread will then not receive the DLL_THREAD_DETACH call.

And of course, if a process just terminates, nothing is executed.

Obviously, you cannot prevent a process from being terminated, so that is not something you even need to worry about. Thread level initialization or global DLL initialization are things you should worry about if they are applicable to your use case. I will cover that later in a follow-up article.
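One possible way around the missing DLL_THREAD_ATTACH notifications, and not necessarily the approach taken in the follow-up article, is to create per-thread state lazily on first use instead of in DllMain. The sketch below uses the standard C++ thread_local keyword and assumes Windows Vista or later, where this also works in dynamically loaded DLLs. ThreadState, GetThreadState and CountCallsOnThisThread are hypothetical names.

```cpp
// Hypothetical per-thread state that a DLL might need.
struct ThreadState
{
    int callCount = 0;
};

// Returns the calling thread's state, creating it on that thread's first
// call. This works whether or not DLL_THREAD_ATTACH was ever delivered
// for the thread.
ThreadState& GetThreadState()
{
    thread_local ThreadState state;   // constructed lazily, once per thread
    return state;
}

// Hypothetical exported-style function that uses the per-thread state.
extern "C" int CountCallsOnThisThread()
{
    return ++GetThreadState().callCount;
}
```

Because the state is created by the first call that needs it, a thread that existed before the DLL was loaded is treated exactly like any other thread.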

Load Order Issues

DllMain is called during the process of loading and unloading modules. Only 1 such operation can be in progress at any given time in a single process.

The documentation says to never rely on anything in a DllMain function that is not either in the DLL itself, or in kernel32.dll. Kernel32.dll is the only DLL guaranteed to be in the address space already. This means that whatever initialization or cleanup you do in a DLL can never, under any circumstance, rely on code that is located in another DLL.

If your DllMain function calls User, Shell, or COM functions for example (or other code relying on such functions), that may result in calls to DLLs that are not yet loaded and initialized, causing access violations.

Loader Locking

The locking issue boils down to this: In any particular application context, the DllMain function is executed under the protection of a Loader lock. So if two different threads try to load a DLL at the same time, the system will use the lock to guarantee that only 1 DllMain function can execute at the same time.

It is unlikely that you would design your program in a way that you'd be using LoadLibrary in multiple threads at the same time or recursively, but it may be called invisibly.

At some point in my past career, I was maintaining a library for interfacing with databases through ODBC connections. And I thought it prudent to close all open ODBC handles when DLL_PROCESS_DETACH happens. It worked well on my own system and passed all tests. Yet I had one colleague who reported application hangs on exit. As it turned out, he was using a database for which the ODBC driver used a specific DLL that was unloaded in the background as soon as the last connection handle was closed.

The unloading of that DLL would trigger an immediate attempt to acquire the loader lock to protect the calls to its DllMain function. But the lock was already held in the DllMain function where I was closing the ODBC handles, resulting in a deadlock.
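A common way to avoid this kind of deadlock is to move such work out of DllMain into explicitly exported functions that the client calls on an ordinary thread, outside the loader lock. The following is only a sketch under that assumption. The function names and return codes are invented, and the actual resource handling is left as comments.

```cpp
#include <atomic>

// Hypothetical return codes for this sketch.
enum : int
{
    MYLIB_OK                  = 0,
    MYLIB_ALREADY_INITIALIZED = 1,
    MYLIB_NOT_INITIALIZED     = 2
};

// Constant-initialized flag, so no constructor runs while the DLL loads.
static std::atomic<bool> g_initialized{false};

// Called by the client after loading the DLL. It runs outside the loader
// lock, so it may open connections or cause other DLLs to be loaded.
extern "C" int MyLib_Initialize()
{
    if (g_initialized.exchange(true))
        return MYLIB_ALREADY_INITIALIZED;
    // ... open handles, connect to services, etc. ...
    return MYLIB_OK;
}

// Called by the client before it unloads the DLL or exits.
extern "C" int MyLib_Shutdown()
{
    if (!g_initialized.exchange(false))
        return MYLIB_NOT_INITIALIZED;
    // ... close handles here instead of in DLL_PROCESS_DETACH ...
    return MYLIB_OK;
}
```

The drawback is that the client has to remember to call both functions, so this contract needs to be documented as part of the DLL's interface.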

Global Object Constructors / Destructors

The same restrictions that apply to DllMain apply to constructors and destructors that are executed in global scope, because the runtime executes them as part of the same load and unload operations, under the same loader lock. The construction and destruction of global objects should not rely on DLLs other than kernel32.dll.
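One way to keep a non-trivial constructor out of the load sequence is to replace the global object with a function-local static, which is constructed the first time the function is called. The sketch below illustrates the idea. Registry, GetRegistry and MyLib_GetRegistryCount are hypothetical names.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical type whose constructor does non-trivial work.
class Registry
{
public:
    Registry() { names_.push_back("default"); }
    std::size_t Count() const { return names_.size(); }

private:
    std::vector<std::string> names_;
};

// Instead of a `static Registry g_registry;` at global scope, which would be
// constructed while the DLL is being loaded, construct the object on first
// use from an ordinary function call.
Registry& GetRegistry()
{
    static Registry instance;   // initialized once; thread-safe since C++11
    return instance;
}

// Hypothetical exported function that uses the registry.
extern "C" int MyLib_GetRegistryCount()
{
    return static_cast<int>(GetRegistry().Count());
}
```

This only defers construction. The destructor of such an object still runs when the DLL is unloaded, so the destructor remains subject to the restrictions described above.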

Conclusion

While it is certainly tempting, do not use DllMain for anything but simple value initialization, and initializing thread local storage or synchronization objects. Anything more can lead to hard to diagnose problems.

DLL Design Restrictions

With that part out of the way, it's important to note that there are some more restrictions that need consideration.

The Runtime / Memory Ownership

Every C++ or C project is linked against a runtime. This runtime takes care of many things such as allocating and deallocating memory. Applications and the DLLs they use can be built with different languages or different versions of the same language. Even if they are built with the same development tools, they could use different versions of the same runtime (debug vs release).

This means that a piece of memory that is new'ed in a certain module (DLL or EXE) must be delete'd in the same module. Memory ownership stays with the module that allocated the memory, otherwise heap corruption or crashes at run time will be the result.

If you need to allocate memory in your DLL and have it cleaned up by the client, you can do this by either exporting a dedicated memory release function in your DLL or (in case you use the Win32 API to allocate dynamic memory) documenting which Win32 function the client needs to call to release the memory.
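The first option, a dedicated release function, could look roughly like the sketch below. The MYLIB_API macro and both function names are made up for this example. The point is only that allocation and release happen in the same module and therefore in the same runtime.

```cpp
#include <cstddef>
#include <cstring>
#include <new>

// Hypothetical export macro. On other platforms it only selects C linkage.
#if defined(_WIN32)
#  define MYLIB_API extern "C" __declspec(dllexport)
#else
#  define MYLIB_API extern "C"
#endif

// Allocates a copy of `text` inside the DLL. Returns nullptr on failure.
// The caller must hand the pointer back to MyLib_FreeString.
MYLIB_API char* MyLib_DuplicateString(const char* text)
{
    if (text == nullptr)
        return nullptr;

    const std::size_t size = std::strlen(text) + 1;
    char* copy = new (std::nothrow) char[size];   // the DLL's runtime allocates
    if (copy != nullptr)
        std::memcpy(copy, text, size);
    return copy;
}

// Releases memory previously returned by MyLib_DuplicateString, using the
// same runtime that allocated it. Passing nullptr is allowed.
MYLIB_API void MyLib_FreeString(char* text)
{
    delete[] text;
}
```

The client never calls delete or free on the pointer itself. It only passes the pointer back to the DLL, which makes the client's own runtime irrelevant.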

Using Classes / Structs

C++ has no binary interface standard. Every compiler and compiler version is free to implement things their own way. This means that you cannot pass a pointer to an object to a client (or from a client into the DLL) and expect to dereference the pointer. There is no guarantee that 2 modules have the same idea about how a class is mapped to memory and how an object pointer can be dereferenced.

You can use structs if you are using them as plain old data structures without using member functions, constructors, etc. This is why the win32 API itself uses structs in many of its function calls. If you do use structs, be sure to explicitly specify and document the memory packing so that client applications can use the structs properly even if their default pack parameters are different.
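As an illustration of explicitly specified packing, the sketch below declares a plain data struct with fixed-width members and checks its layout at compile time. The struct and its fields are hypothetical. A packing of 1 is chosen here only because it makes the layout easy to verify, and it comes at the cost of unaligned members. The Win32 headers themselves generally use 8-byte packing.

```cpp
#include <cstddef>
#include <cstdint>

#pragma pack(push, 1)          // no padding between members
struct Measurement             // hypothetical plain-old-data struct
{
    std::uint8_t channel;      // offset 0, 1 byte
    double       voltage;      // offset 1, 8 bytes
    std::int32_t status;       // offset 9, 4 bytes
};
#pragma pack(pop)              // restore the previous packing

// Compile-time checks that document the layout and break the build if a
// compiler or setting produces something else.
static_assert(sizeof(Measurement) == 13, "unexpected padding");
static_assert(offsetof(Measurement, voltage) == 1, "unexpected layout");
static_assert(offsetof(Measurement, status) == 9, "unexpected layout");
```

Using fixed-width types such as std::uint8_t instead of int or long removes one more source of disagreement between the DLL and its clients.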

Error Handling

No doubt, the ability to report errors is crucial when implementing a library for client applications. Sadly, we are bound by the same runtime limitations. You cannot use C++ exceptions, because there is no binary standard for doing so. Even if that were not a problem, if your client application is a PowerShell script, or a LabVIEW project, or a VB application, it simply cannot deal with C++ exceptions.

The same is true for Structured Exception Handling (SEH). Not all client applications can handle it, so it is inadvisable to rely on it.

When all is said and done, the only real option is to use return codes. The ability to deal with function return values is universal.
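In practice this usually means that every exported function catches whatever its internal C++ code throws and translates it into a return code before returning to the client. The sketch below shows one possible shape for that. The error codes and function names are invented for this example.

```cpp
#include <new>
#include <stdexcept>
#include <string>

// Hypothetical error codes for this sketch.
enum MyLibResult : int
{
    MYLIB_OK               = 0,
    MYLIB_INVALID_ARGUMENT = 1,
    MYLIB_OUT_OF_MEMORY    = 2,
    MYLIB_INTERNAL_ERROR   = 3
};

// Internal C++ code is free to throw.
static int ParsePositive(const std::string& text)
{
    const int value = std::stoi(text);   // throws on malformed input
    if (value <= 0)
        throw std::invalid_argument("value must be positive");
    return value;
}

// Exported function: every exception is caught and translated into a return
// code, so nothing ever propagates across the DLL boundary.
extern "C" int MyLib_ParsePositive(const char* text, int* result) noexcept
{
    if (text == nullptr || result == nullptr)
        return MYLIB_INVALID_ARGUMENT;
    try
    {
        *result = ParsePositive(text);
        return MYLIB_OK;
    }
    catch (const std::invalid_argument&) { return MYLIB_INVALID_ARGUMENT; }
    catch (const std::bad_alloc&)        { return MYLIB_OUT_OF_MEMORY; }
    catch (...)                          { return MYLIB_INTERNAL_ERROR; }
}
```

The actual result is passed back through an output parameter, which leaves the return value free to carry the status.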

The Exception to All These Limitations

The previous restrictions all explain what you cannot / should not do. There is one exception to this rule. If you have a very large codebase with multiple teams working on it, and it is split into different projects for organizational and management reasons, everything can be compiled and linked in the same build environment.

If that is the case, all those restrictions can be ignored because you have the organizational guarantee that the compiler version is the same. This means the vendor specific implementation details can be considered universal, which means that it is possible to ignore these limitations.

Of course, that doesn't necessarily make it a smart design decision to new memory in one module, and delete it in another. But it makes it ok to use classes and objects across module boundaries for example.

Points of Interest

When all is said and done, the reality of having code in different modules forces a large number of design constraints on what you can or cannot do. The explanations above are pretty boring, but ultimately it is important to keep them in mind.

The next article is much more interesting and demonstrates the various coding techniques and considerations.

History

11th January, 2024: First version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Written By

Bruno van Dooren

Software Developer

Belgium

I am a former professional software developer (now a system admin) with an interest in everything that is about making hardware work. In the course of my work, I have programmed device drivers and services on Windows and Linux.

I have written firmware for embedded devices in C and assembly language, and have designed and implemented real-time applications for testing of satellite payload equipment.

Generally, finding out how to interface hardware with software is my hobby and job.
