Deadlocks are a common problem in multi-threaded programming. The issue developers run into most often is the handling of critical sections. It is not uncommon to use more than one lock, but if you do not pay attention to the order in which the locks are taken, or to the context in which they are taken (e.g., from within a callback), deadlocks can form. (Deadlocks can also arise for reasons other than critical sections, e.g., two threads that each wait for the other to signal an event, but we will not discuss those here.)
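As a minimal illustration of the lock-order problem described above, the following sketch shows two flows that take the same pair of locks in opposite orders. The class and method names are hypothetical and chosen only for this example. The two flows are run one after the other on a single thread, so the program itself never blocks; the danger appears only when two threads run the flows at the same time.

```csharp
using System;

static class LockOrderInversion
{
    static readonly object LockA = new object();
    static readonly object LockB = new object();

    // Flow 1 takes A first, then B.
    public static string TransferForward()
    {
        lock (LockA)
        {
            lock (LockB)
            {
                return "A then B";
            }
        }
    }

    // Flow 2 takes B first, then A: the opposite order.
    // If one thread is inside Flow 1 holding A while another is inside
    // Flow 2 holding B, each waits for the lock the other one holds.
    public static string TransferBackward()
    {
        lock (LockB)
        {
            lock (LockA)
            {
                return "B then A";
            }
        }
    }

    static void Main()
    {
        // Sequential calls on one thread: nothing can block here.
        Console.WriteLine("flow 1 order: " + TransferForward());
        Console.WriteLine("flow 2 order: " + TransferBackward());
    }
}
```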
As with anything related to threads, timing is everything. The most problematic deadlocks are those that occur rarely; they have an amazing habit of occurring at your client's site...
What if we could make the rare case the normal case? Recall that a potential deadlock usually stays hidden because the two threads that might deadlock happen not to be at the problematic places at the same time. So, all we have to do is record their "visits" to the problematic places and verify that the lock order is always the same. If it is not, we output the stack traces and notify the developer that we found a mismatch in the lock order.
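The idea of recording visits and checking the order can be sketched as follows. This is not the code of the attached DLL; `LockOrderRecorder`, `OnAcquire`, and `OnRelease` are hypothetical names, and locks are identified by plain strings to keep the example short. A real tool would also capture a stack trace (for example via `Environment.StackTrace`) at each recorded visit.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical helper: remembers every "X was held while Y was acquired" pair
// and reports a mismatch when the reverse pair has been seen before.
class LockOrderRecorder
{
    private readonly HashSet<(string, string)> observed = new HashSet<(string, string)>();
    private readonly object sync = new object();

    // Locks held by the current thread, in acquisition order.
    private readonly ThreadLocal<List<string>> held =
        new ThreadLocal<List<string>>(() => new List<string>());

    public List<string> Mismatches { get; } = new List<string>();

    public void OnAcquire(string name)
    {
        lock (sync)
        {
            foreach (string earlier in held.Value)
            {
                if (earlier == name) continue; // recursive lock, no ordering issue
                if (observed.Contains((name, earlier)))
                    Mismatches.Add($"{earlier} -> {name} contradicts earlier {name} -> {earlier}");
                observed.Add((earlier, name));
            }
        }
        held.Value.Add(name);
    }

    public void OnRelease(string name)
    {
        int index = held.Value.LastIndexOf(name);
        if (index >= 0) held.Value.RemoveAt(index);
    }
}

static class Demo
{
    static void Main()
    {
        var recorder = new LockOrderRecorder();

        // Flow 1: A, then B.
        recorder.OnAcquire("A"); recorder.OnAcquire("B");
        recorder.OnRelease("B"); recorder.OnRelease("A");

        // Flow 2: B, then A. No thread ever blocks, yet the order differs.
        recorder.OnAcquire("B"); recorder.OnAcquire("A");
        recorder.OnRelease("A"); recorder.OnRelease("B");

        foreach (string mismatch in recorder.Mismatches)
            Console.WriteLine(mismatch);
    }
}
```

Both flows run on one thread and never wait for each other, yet the recorder still reports one mismatch, because it compares the recorded orders rather than waiting for an actual deadlock.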
The attached ZIP file contains a DLL that does exactly that. The DLL intercepts the common monitor calls, including the TryEnter methods and, of course, the .NET lock keyword (it can easily be extended to support other calls, if they are used), and keeps track of the lock order. Once it finds a problem in the order, it creates two stack traces and directs you to a sample of the problematic locks. After fixing the error, repeat the test to see whether the detected locks cause another problem in another flow.
Note that the deadlock does not need to actually occur; it is only important that the suspected flows (or all flows) are executed at least once.
Using the code
- Add the file incslock.cs to your project
- Add a reference to the slockimp.dll
- Compile your component and execute it
Analyzing the stacks (slockimp based)
- Once a problem is detected, the console (if one exists) reports that the last lock conflicts with the nth lock on the stack
- Two files are created in the working directory: first_xxx.txt and now_yyy (xxx and yyy represent numbers)
- Go to the "now" file and find the last lock (prior to the last four calls, which are internal to the DLL). This is the lock whose acquisition caused the problem
- In order to find the other problematic lock, you can either:
  - Spot the nth lock from the beginning of the stack (not counting locks that were locked recursively, or locks that were locked and then released)
  - Find the last lock in the "first" file
- Go over the "first" file stack, and find the place in which the lock from 3.a was locked
This is the second version of the implementation, which now supports more complex scenarios such as the dining philosophers (thanks to Sergey's question below). The stack files are numbered as follows: 0_xxx.txt, 1_xxx.txt, and so on.
0_xxx points to the lock whose acquisition caused the problem. The other files point to the other locks that together form the circular wait.
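A circular wait that involves more than two locks can be found by treating the recorded orders as a directed graph and looking for a cycle. The sketch below is one possible way to do this, not the DLL's actual implementation; `LockGraph` and `AddEdge` are hypothetical names, and an edge from X to Y means "X was held while Y was acquired". Recursive locks (X to X) are assumed to be filtered out by the caller.

```csharp
using System;
using System.Collections.Generic;

class LockGraph
{
    private readonly Dictionary<string, HashSet<string>> edges =
        new Dictionary<string, HashSet<string>>();

    // Records from -> to. If the new edge closes a cycle, returns the existing
    // path to -> ... -> from that it completes; otherwise returns null.
    public List<string> AddEdge(string from, string to)
    {
        var path = new List<string>();
        bool closesCycle = FindPath(to, from, new HashSet<string>(), path);

        if (!edges.TryGetValue(from, out var targets))
            edges[from] = targets = new HashSet<string>();
        targets.Add(to);

        return closesCycle ? path : null;
    }

    // Depth-first search for a path node -> ... -> goal along recorded edges.
    private bool FindPath(string node, string goal, HashSet<string> visited, List<string> path)
    {
        path.Add(node);
        if (node == goal) return true;
        if (visited.Add(node) && edges.TryGetValue(node, out var next))
        {
            foreach (string candidate in next)
                if (FindPath(candidate, goal, visited, path)) return true;
        }
        path.RemoveAt(path.Count - 1);
        return false;
    }
}

static class DiningDemo
{
    static void Main()
    {
        // Three philosophers: philosopher i holds fork i, then takes fork (i + 1) % 3.
        var graph = new LockGraph();
        for (int i = 0; i < 3; i++)
        {
            var cycle = graph.AddEdge("fork" + i, "fork" + ((i + 1) % 3));
            if (cycle != null)
                Console.WriteLine("circular wait: " + string.Join(" -> ", cycle) + " -> " + cycle[0]);
        }
    }
}
```

In this run, the first two edges are recorded silently; the third one (fork2 to fork0) closes the cycle, and each lock on the returned path would correspond to one of the numbered stack files.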
Points of interest
Notice that you do not need to change a single existing line of code. Rather, you add two files to your project. The trick here is to cause the compiler to use our reference for implementing monitor calls rather than .NET's. (In the DLL itself, the locks are locked and released properly, so your program should work fine critical-section wise.) This trick is similar to replacing a header file in C.
Notice that the DLL is not meant for production, since it affects performance. Also, notice that the DLL allows recursive locking of the same lock. The DLL will, however, report possible deadlocks even if the waiting time is not infinite. (Although this is technically a false alarm, it indicates bad behavior: such a wait might hurt performance, and it would deadlock if the timeout were set to infinite.)
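The general mechanism of redirecting calls without editing the call sites can be sketched with ordinary C# name lookup. This is an assumption about the style of trick involved, not a reproduction of incslock.cs: a type named `Monitor` declared in the caller's own namespace takes precedence over `System.Threading.Monitor` imported by a using directive. The namespace `MyApp` and the `EnterCount` field are hypothetical, and the sketch covers only explicit `Monitor` calls, not the lock keyword.

```csharp
using System;
using System.Threading;

namespace MyApp
{
    // Hypothetical stand-in. Because it lives in the same namespace as the
    // calling code, the simple name "Monitor" binds to this class first.
    static class Monitor
    {
        public static int EnterCount;

        public static void Enter(object obj)
        {
            Interlocked.Increment(ref EnterCount);   // bookkeeping hook
            System.Threading.Monitor.Enter(obj);     // then take the real lock
        }

        public static void Exit(object obj)
        {
            System.Threading.Monitor.Exit(obj);      // release the real lock
        }
    }

    static class Program
    {
        static readonly object Gate = new object();

        static void Main()
        {
            Monitor.Enter(Gate);   // unchanged call site, now routed to MyApp.Monitor
            try
            {
                Console.WriteLine("in critical section");
            }
            finally
            {
                Monitor.Exit(Gate);
            }
            Console.WriteLine("tracked Enter calls: " + Monitor.EnterCount);
        }
    }
}
```

The wrapper still forwards to the real monitor, so the critical section behaves as before; only the bookkeeping is new.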
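The two behaviors mentioned above, recursive locking and timed waits, look like this with the standard .NET `Monitor` API. The class and method names are hypothetical, and the example runs on a single thread, so the timed attempt simply succeeds. In a multi-threaded run where another flow takes First and then Second, the same code could stall until the timeout expires.

```csharp
using System;
using System.Threading;

static class TimedLockDemo
{
    static readonly object First = new object();
    static readonly object Second = new object();

    // Recursive acquisition by the same thread is legal for .NET monitors.
    public static bool ReenterFirst()
    {
        lock (First)
        {
            lock (First)
            {
                return Monitor.IsEntered(First);
            }
        }
    }

    // Takes Second, then tries First with a finite timeout. The order
    // Second -> First conflicts with any flow that takes First -> Second,
    // even though the wait here can never be infinite.
    public static bool TryFirstWhileHoldingSecond(TimeSpan timeout)
    {
        lock (Second)
        {
            if (!Monitor.TryEnter(First, timeout))
                return false;   // timed out: no deadlock, but time was wasted
            try
            {
                return true;
            }
            finally
            {
                Monitor.Exit(First);
            }
        }
    }

    static void Main()
    {
        Console.WriteLine("re-entered: " + ReenterFirst());
        Console.WriteLine("acquired within timeout: " +
            TryFirstWhileHoldingSecond(TimeSpan.FromMilliseconds(100)));
    }
}
```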