Multithreaded, high-throughput applications and services must choose their synchronization mechanisms carefully to achieve optimal throughput. I ran several tests on different computers to compare the common synchronization mechanisms in the Microsoft .NET CLR.
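Here are the results. Each value is the total time for 1,000,000 iterations of the operation, measured on a single thread without contention.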
Test Results
Processor | Empty Loop | Interlocked | Locked | Polling | Reader Writer Lock (Read / Write) |
AMD Opteron 4174 HE 2.3 GHz | 10.4 ms | 30.6 ms | 90.8 ms | 1095 ms | 208 ms / 182 ms |
AMD Athlon 64 X2 5600+ 2.9 GHz | 7.1 ms | 19.9 ms | 37.8 ms | 546 ms | 88.9 ms / 82.8 ms |
Intel Core 2 Quad Q9550 2.83 GHz | 4.3 ms | 19.3 ms | 56.2 ms | 443 ms | 99 ms / 82.6 ms |
Azure A1 (Intel Xeon E5-2660 2.2 GHz) | 8.0 ms | 19.9 ms | 57.5 ms | 502 ms | 108 ms / 104 ms |

Processor | Auto Reset Event | Manual Reset Event | Semaphore | Mutex | Thread Static |
AMD Opteron 4174 HE 2.3 GHz | 2927 ms | 3551 ms | 2732 ms | 2764 ms | 12.8 ms |
AMD Athlon 64 X2 5600+ 2.9 GHz | 1833 ms | 2052 ms | 1870 ms | 1733 ms | 18.2 ms |
Intel Core 2 Quad Q9550 2.83 GHz | 1015 ms | 1328 ms | 1169 ms | 1099 ms | 13.2 ms |
Azure A1 (Intel Xeon E5-2660 2.2 GHz) | 1576 ms | 2215 ms | 1760 ms | 1847 ms | 8.4 ms |
As we see, there are huge differences in the throughput of the synchronization mechanisms. Next, I will discuss each mechanism and its use cases.
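The per-operation costs, the "times slower" factors and the throughput figures below can all be derived from a single table entry. Here is a minimal sketch, assuming that each entry is the total time for 1,000,000 iterations, as in the test program at the end of this article. `BenchmarkMath` and its methods are hypothetical helpers and are not part of the original code.

```csharp
// Hypothetical helper (not part of the original test program): turns a table
// entry, i.e. total milliseconds for a number of iterations, into derived figures.
public static class BenchmarkMath
{
    // Average cost of one operation in nanoseconds.
    public static double NanosecondsPerOperation(double totalMs, int iterations)
    {
        return totalMs * 1_000_000.0 / iterations;
    }

    // Operations per second on a single thread.
    public static double OperationsPerSecond(double totalMs, int iterations)
    {
        return iterations / (totalMs / 1000.0);
    }

    // How many times slower than the empty-loop baseline.
    public static double FactorVsBaseline(double totalMs, double baselineMs)
    {
        return totalMs / baselineMs;
    }
}
```

For example, the Athlon's Interlocked entry (19.9 ms) works out to 19.9 ns per operation, about 50,000,000 operations per second, and about 2.8 times the empty loop (7.1 ms).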
.NET Synchronization Mechanisms
Interlocked (Interlocked.Increment ...)
Interlocked access is about 3-4 times slower than unsynchronized access. It is the best choice when a single variable must be modified. If more than one variable must be modified, consider using the lock statement. On standard hardware, a throughput of about 30,000,000-50,000,000 Interlocked operations per second can be achieved on a single thread (1,000,000 operations took 19-31 ms in the tests).
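A minimal sketch of the single-variable case, assuming several threads share one counter. `InterlockedCounterDemo` is a hypothetical name; `Interlocked.Increment` and `Volatile.Read` are the real System.Threading APIs.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class InterlockedCounterDemo
{
    private static int counter;

    // Starts `threads` tasks that each increment the shared counter
    // `incrementsPerThread` times and returns the final value.
    public static int Run(int threads, int incrementsPerThread)
    {
        counter = 0;
        var tasks = new Task[threads];
        for (int t = 0; t < threads; t++)
        {
            tasks[t] = Task.Run(() =>
            {
                for (int i = 0; i < incrementsPerThread; i++)
                    Interlocked.Increment(ref counter); // atomic read-modify-write
            });
        }
        Task.WaitAll(tasks);
        return Volatile.Read(ref counter);
    }
}
```

Replacing the Interlocked call with a plain `counter++` can lose updates under contention, because the read, the increment and the write are separate steps.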
Locked (C# lock Statement)
Locked access is about 5-10 times slower than unsynchronized access. When two variables must be modified, one lock costs roughly as much as two Interlocked operations. In general it is fast, and there is no need for tricks such as double-checked locking, which is easy to get wrong, to avoid it. On standard hardware, a throughput of about 10,000,000-25,000,000 lock operations per second can be achieved on a single thread.
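When two variables must change together, two separate Interlocked calls are not enough: another thread could observe one variable updated and the other not. Here is a minimal sketch with the lock statement, assuming a running count and total; `RunningTotal` is a hypothetical class.

```csharp
// Two fields that must always change together are guarded by one lock.
public sealed class RunningTotal
{
    private readonly object sync = new object();
    private long count;
    private long total;

    public void Add(long value)
    {
        lock (sync)
        {
            // Both updates happen inside the same critical section.
            count++;
            total += value;
        }
    }

    // Returns a consistent snapshot of both fields.
    public (long Count, long Total) Snapshot()
    {
        lock (sync)
        {
            return (count, total);
        }
    }
}
```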
Polling (Thread.Sleep)
Thread.Sleep(0) itself adds an overhead of about 100 times unsynchronized access, which is in the same order of magnitude as more complex synchronization mechanisms such as events and semaphores. Polling with Thread.Sleep(0) can achieve a throughput of about 1,500,000 operations per second, but it burns a lot of CPU cycles. Because Thread.Sleep(1) slept about 1.5 ms on the test machines (the exact value depends on the operating system's timer resolution), a real polling mechanism that doesn't burn CPU cycles can't get much more than about 670 operations per second (1,000 ms / 1.5 ms). Polling is a no-go for high-throughput synchronization.
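A sketch contrasting a polling wait with a blocking wait. `PollingDemo.PollUntil` is a hypothetical helper: with `sleepMs` = 0 it keeps a core busy, and with `sleepMs` = 1 every check is delayed by the sleep granularity. The blocking variant uses the real `WaitHandle.WaitOne(TimeSpan)` API.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class PollingDemo
{
    // Hypothetical helper: checks the condition, sleeps, and checks again.
    // sleepMs = 0 keeps a core busy; sleepMs = 1 delays each check by the
    // sleep granularity of the operating system.
    public static bool PollUntil(Func<bool> condition, int sleepMs, TimeSpan timeout)
    {
        Stopwatch watch = Stopwatch.StartNew();
        while (!condition())
        {
            if (watch.Elapsed >= timeout)
                return false;
            Thread.Sleep(sleepMs);
        }
        return true;
    }

    // Blocking alternative: the thread uses no CPU while it waits for Set().
    public static bool WaitForSignal(WaitHandle signal, TimeSpan timeout)
    {
        return signal.WaitOne(timeout);
    }
}
```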
Reader Writer Lock (ReaderWriterLockSlim)
A reader/writer lock is about two times slower than locked access, with an overhead of about 15-20 times that of unsynchronized access. Acquiring a read lock costs nearly the same as acquiring a write lock. On standard hardware it achieves roughly 5,000,000-12,000,000 lock operations per second on a single thread. It is meant for resources with many readers and one (or only a few) writers, and in that scenario you should use it.
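A minimal sketch of the read-mostly scenario, assuming a dictionary-backed cache. `ReadMostlyCache` is hypothetical; `EnterReadLock`/`ExitReadLock` and `EnterWriteLock`/`ExitWriteLock` are the real ReaderWriterLockSlim methods used in the test program. Unlike the benchmark, the sketch releases the lock in a `finally` block so that an exception cannot leave it held; it omits disposing the lock for brevity.

```csharp
using System.Collections.Generic;
using System.Threading;

public sealed class ReadMostlyCache<TKey, TValue>
{
    private readonly Dictionary<TKey, TValue> items = new Dictionary<TKey, TValue>();
    private readonly ReaderWriterLockSlim rwLock = new ReaderWriterLockSlim();

    public bool TryGet(TKey key, out TValue value)
    {
        rwLock.EnterReadLock(); // many readers may hold the read lock at once
        try
        {
            return items.TryGetValue(key, out value);
        }
        finally
        {
            rwLock.ExitReadLock();
        }
    }

    public void Set(TKey key, TValue value)
    {
        rwLock.EnterWriteLock(); // exclusive: waits until all readers have left
        try
        {
            items[key] = value;
        }
        finally
        {
            rwLock.ExitWriteLock();
        }
    }
}
```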
Auto Reset Event (AutoResetEvent)
An auto reset event is about 200-300 times slower than unsynchronized access. It has a throughput of about 700,000 operations per second.
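To show what an auto reset event is typically used for, here is a sketch of a "work available" signal between one producer and one consumer. `AutoResetEventDemo` is hypothetical. Each `Set()` releases at most one waiting thread, and the event resets itself as soon as that thread passes `WaitOne()`.

```csharp
using System.Collections.Concurrent;
using System.Threading;

public static class AutoResetEventDemo
{
    // Hypothetical demo: a producer enqueues items and signals a consumer.
    public static int ProcessItems(int itemCount)
    {
        using (var signal = new AutoResetEvent(false)) // starts unsignaled
        {
            var queue = new ConcurrentQueue<int>();
            int processed = 0;

            var consumer = new Thread(() =>
            {
                while (processed < itemCount)
                {
                    signal.WaitOne(); // blocks until Set(); then resets automatically
                    while (queue.TryDequeue(out _))
                        processed++;
                }
            });
            consumer.Start();

            for (int i = 0; i < itemCount; i++)
            {
                queue.Enqueue(i);
                signal.Set(); // Set() on an already signaled event has no extra effect
            }

            consumer.Join();
            return processed;
        }
    }
}
```

Because several `Set()` calls can collapse into one signal, the consumer drains the whole queue after each wake-up instead of taking a single item.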
Manual Reset Event (ManualResetEvent)
A manual reset event is about 280-340 times slower than unsynchronized access. It has a throughput of about 500,000 operations per second. Check whether your design really needs manual reset semantics, i.e. a signal that releases all waiting threads and stays set until Reset() is called, and use the faster auto reset event if possible.
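Here is a sketch of the case where manual reset is actually required: a group of worker threads that must all wait for one start signal. `ManualResetEventDemo` is hypothetical. With an AutoResetEvent, the single `Set()` would release only one worker.

```csharp
using System.Threading;

public static class ManualResetEventDemo
{
    // Starts `workerCount` threads that wait on one gate, opens the gate once,
    // and returns how many workers got through.
    public static int ReleaseAll(int workerCount)
    {
        using (var gate = new ManualResetEvent(false)) // gate starts closed
        {
            int started = 0;
            var workers = new Thread[workerCount];
            for (int i = 0; i < workerCount; i++)
            {
                workers[i] = new Thread(() =>
                {
                    gate.WaitOne(); // waits until the gate is opened
                    Interlocked.Increment(ref started);
                });
                workers[i].Start();
            }

            gate.Set(); // one call releases every waiter; the event stays signaled
            foreach (var worker in workers)
                worker.Join();
            return started;
        }
    }
}
```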
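Semaphore (Semaphore)
A semaphore is about 200-300 times slower than unsynchronized access and has a throughput of about 700,000 operations per second. None of the other mechanisms offers its counting semantics, i.e. letting up to N threads into a section at the same time. Within a single process, SemaphoreSlim provides the same semantics as a lightweight alternative; it was not measured here.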
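A sketch of the counting semantics, assuming a section that at most `maxConcurrent` threads may enter at once. `SemaphoreDemo` and `RaiseTo` are hypothetical; `Semaphore.WaitOne()` and `Release()` are the calls the test program measures. SemaphoreSlim offers the same pattern with `Wait()` and `Release()`.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class SemaphoreDemo
{
    // Runs `workers` tasks through a section guarded by a Semaphore with
    // `maxConcurrent` slots and returns the highest number seen inside at once.
    public static int MaxObservedConcurrency(int workers, int maxConcurrent)
    {
        using (var semaphore = new Semaphore(maxConcurrent, maxConcurrent))
        {
            int inside = 0;
            int peak = 0;
            var tasks = new Task[workers];
            for (int i = 0; i < workers; i++)
            {
                tasks[i] = Task.Run(() =>
                {
                    semaphore.WaitOne(); // blocks while all slots are taken
                    try
                    {
                        int now = Interlocked.Increment(ref inside);
                        RaiseTo(ref peak, now);
                        Thread.Sleep(10); // simulated work
                    }
                    finally
                    {
                        Interlocked.Decrement(ref inside);
                        semaphore.Release(); // frees one slot
                    }
                });
            }
            Task.WaitAll(tasks);
            return Volatile.Read(ref peak);
        }
    }

    // Atomically raises target to candidate if candidate is larger.
    private static void RaiseTo(ref int target, int candidate)
    {
        int current = Volatile.Read(ref target);
        while (candidate > current)
        {
            int previous = Interlocked.CompareExchange(ref target, candidate, current);
            if (previous == current)
                return;
            current = previous;
        }
    }
}
```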
Mutex (Mutex)
A mutex is about 200-300 times slower than unsynchronized access and has a throughput of about 700,000 operations per second.
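A sketch of the usual acquire/release pattern, assuming several threads update one counter. `MutexDemo` is hypothetical. A Mutex has thread affinity, so `ReleaseMutex()` must run on the thread that acquired it. Unlike a lock, a Mutex can also be given a name, e.g. `new Mutex(false, "SomeName")`, so that separate processes can synchronize on it; the sketch uses an unnamed one.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class MutexDemo
{
    private static readonly Mutex mutex = new Mutex(); // not initially owned
    private static int counter;

    public static int Run(int threads, int incrementsPerThread)
    {
        counter = 0;
        Parallel.For(0, threads, _ =>
        {
            for (int i = 0; i < incrementsPerThread; i++)
            {
                mutex.WaitOne();
                try
                {
                    counter++; // only one thread at a time gets here
                }
                finally
                {
                    mutex.ReleaseMutex(); // runs on the thread that acquired it
                }
            }
        });
        return counter;
    }
}
```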
Thread Static Field ([ThreadStatic] static)
Last but not least, the thread static field. It isn't a synchronization mechanism, but it is a great way to avoid synchronization altogether. With a thread static field, each thread keeps its own copy of the value or object it works with, so there may be no need to synchronize with other threads at all. Thread static field access is up to about 3 times slower than unsynchronized access.
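A minimal sketch of a per-thread scratch buffer, assuming each thread only uses the buffer it gets itself. `PerThreadBuffer` is hypothetical. The field is created lazily, because an inline initializer on a `[ThreadStatic]` field runs only once, so every other thread would still see `null`.

```csharp
using System;
using System.Text;

public static class PerThreadBuffer
{
    // Each thread sees its own value of this field. Do not initialize it inline:
    // the initializer would run only once, so other threads would still see null.
    [ThreadStatic]
    private static StringBuilder buffer;

    // Returns this thread's buffer, creating it on first use and clearing it.
    public static StringBuilder Get()
    {
        if (buffer == null)
            buffer = new StringBuilder(256);
        buffer.Clear();
        return buffer;
    }
}
```

The guarantee holds only as long as the returned builder is not handed to another thread.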
Conclusion
If you have to synchronize a piece of code that runs much longer than 0.1 milliseconds, or that runs fewer than 10,000 times per second, don't worry about the performance of the synchronization mechanism. They are all fast enough and will have nearly no performance impact on the rest of your code. It is more important to avoid over-synchronization than to choose the fastest method. If you have to synchronize small operations at 100,000 or more operations per second, however, choose the synchronization mechanism carefully so that it does not eat up a significant share of the throughput.
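To put these thresholds into numbers, the share of one core spent inside a synchronization mechanism is its cost per operation times the operation rate. Here is a minimal sketch, assuming the table's timings for 1,000,000 uncontended iterations; `SyncBudget.CoreShare` is a hypothetical helper.

```csharp
public static class SyncBudget
{
    // Fraction of one CPU core spent in a synchronization mechanism.
    // totalMs and iterations describe a table entry (e.g. 2927 ms for
    // 1,000,000 iterations); operationsPerSecond is the expected rate.
    public static double CoreShare(double totalMs, int iterations, double operationsPerSecond)
    {
        double secondsPerOperation = totalMs / 1000.0 / iterations;
        return secondsPerOperation * operationsPerSecond;
    }
}
```

With the Opteron's AutoResetEvent entry (2927 ms), 10,000 operations per second cost about 3% of one core, while 100,000 operations per second cost about 29%. The same 100,000 operations with lock (90.8 ms) cost under 1%.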
Source Code
Here is the C# source code for the tests:
using System;
using System.Diagnostics;
using System.Threading;

namespace ThreadingPerformance
{
    // Holds the fields for the static vs. [ThreadStatic] access tests.
    public class StaticFieldTestClass
    {
        public static int StaticField;

        [ThreadStatic]
        public static int ThreadStaticField;
    }

    // Single-threaded micro benchmark: every test runs its operation
    // `iterations` times on the main thread, so all timings are uncontended.
    public class Program
    {
        public static int mtResource;
        private static object resourceLock;
        private static ReaderWriterLockSlim resourceReaderWriterLockSlim;
        private static AutoResetEvent resourceAutoResetEvent;
        private static ManualResetEvent resourceManualResetEvent;
        private static Semaphore resourceSemaphore;
        private static Mutex resourceMutex;
        public static void Main(string[] args)
        {
            Console.WriteLine("Performance Tests");
            Console.WriteLine(" Stopwatch Resolution (nS): " +
                (1000000000.0 / Stopwatch.Frequency).ToString());

            resourceLock = new object();
            resourceReaderWriterLockSlim = new ReaderWriterLockSlim();
            // Both events start signaled so the first WaitOne() does not block.
            resourceAutoResetEvent = new AutoResetEvent(true);
            resourceManualResetEvent = new ManualResetEvent(true);
            // Binary semaphore: one free slot, at most one slot.
            resourceSemaphore = new Semaphore(1, 1);
            resourceMutex = new Mutex();

            RunTests(1000000);

            Console.WriteLine("Tests Finished, press any key to stop...");
            Console.ReadKey();
        }
        public static void RunTests(int iterations)
        {
            Console.WriteLine(" Iterations: " + iterations.ToString());
            Stopwatch watch = new Stopwatch();

            // Each test calls its method twice before timing it, so JIT
            // compilation and first-call costs are not measured.
            // Elapsed milliseconds = ElapsedTicks / Stopwatch.Frequency * 1000.

            Console.WriteLine();
            Console.WriteLine(" Simple (Empty) Call - Bias for all tests");
            SimpleCall();
            SimpleCall();
            watch.Reset();
            watch.Start();
            for (int i = 0; i < iterations; ++i)
                SimpleCall();
            watch.Stop();
            Console.WriteLine(" Simple (Empty) Call Elapsed Time (mS): " +
                ((double)watch.ElapsedTicks / Stopwatch.Frequency * 1000).ToString());

            Console.WriteLine();
            Console.WriteLine(" Interlocked Call");
            InterlockedCall();
            InterlockedCall();
            watch.Reset();
            watch.Start();
            for (int i = 0; i < iterations; ++i)
                InterlockedCall();
            watch.Stop();
            Console.WriteLine(" Interlocked Call Elapsed Time (mS): " +
                ((double)watch.ElapsedTicks / Stopwatch.Frequency * 1000).ToString());

            Console.WriteLine();
            Console.WriteLine(" Locked Call");
            LockCall();
            LockCall();
            watch.Reset();
            watch.Start();
            for (int i = 0; i < iterations; ++i)
                LockCall();
            watch.Stop();
            Console.WriteLine(" Locked Call Elapsed Time (mS): " +
                ((double)watch.ElapsedTicks / Stopwatch.Frequency * 1000).ToString());

            Console.WriteLine();
            Console.WriteLine(" Polling Call");
            PollingCall();
            PollingCall();
            watch.Reset();
            watch.Start();
            for (int i = 0; i < iterations; ++i)
                PollingCall();
            watch.Stop();
            Console.WriteLine(" Polling Call Elapsed Time (mS): " +
                ((double)watch.ElapsedTicks / Stopwatch.Frequency * 1000).ToString());

            Console.WriteLine();
            Console.WriteLine(" ReaderWriterLockSlim Read Call");
            ReaderWriterLockSlimReadCall();
            ReaderWriterLockSlimReadCall();
            watch.Reset();
            watch.Start();
            for (int i = 0; i < iterations; ++i)
                ReaderWriterLockSlimReadCall();
            watch.Stop();
            Console.WriteLine(" ReaderWriterLockSlim Read Call Elapsed Time (mS): " +
                ((double)watch.ElapsedTicks / Stopwatch.Frequency * 1000).ToString());

            Console.WriteLine();
            Console.WriteLine(" ReaderWriterLockSlim Write Call");
            ReaderWriterLockSlimWriteCall();
            ReaderWriterLockSlimWriteCall();
            watch.Reset();
            watch.Start();
            for (int i = 0; i < iterations; ++i)
                ReaderWriterLockSlimWriteCall();
            watch.Stop();
            Console.WriteLine(" ReaderWriterLockSlim Write Call Elapsed Time (mS): " +
                ((double)watch.ElapsedTicks / Stopwatch.Frequency * 1000).ToString());

            Console.WriteLine();
            Console.WriteLine(" AutoResetEvent Call");
            AutoResetEventCall();
            AutoResetEventCall();
            watch.Reset();
            watch.Start();
            for (int i = 0; i < iterations; ++i)
                AutoResetEventCall();
            watch.Stop();
            Console.WriteLine(" AutoResetEvent Call Elapsed Time (mS): " +
                ((double)watch.ElapsedTicks / Stopwatch.Frequency * 1000).ToString());

            Console.WriteLine();
            Console.WriteLine(" ManualResetEvent Call");
            ManualResetEventCall();
            ManualResetEventCall();
            watch.Reset();
            watch.Start();
            for (int i = 0; i < iterations; ++i)
                ManualResetEventCall();
            watch.Stop();
            Console.WriteLine(" ManualResetEvent Call Elapsed Time (mS): " +
                ((double)watch.ElapsedTicks / Stopwatch.Frequency * 1000).ToString());

            Console.WriteLine();
            Console.WriteLine(" Semaphore Call");
            SemaphoreCall();
            SemaphoreCall();
            watch.Reset();
            watch.Start();
            for (int i = 0; i < iterations; ++i)
                SemaphoreCall();
            watch.Stop();
            Console.WriteLine(" Semaphore Call Elapsed Time (mS): " +
                ((double)watch.ElapsedTicks / Stopwatch.Frequency * 1000).ToString());

            Console.WriteLine();
            Console.WriteLine(" Mutex Call");
            MutexCall();
            MutexCall();
            watch.Reset();
            watch.Start();
            for (int i = 0; i < iterations; ++i)
                MutexCall();
            watch.Stop();
            Console.WriteLine(" Mutex Call Elapsed Time (mS): " +
                ((double)watch.ElapsedTicks / Stopwatch.Frequency * 1000).ToString());

            Console.WriteLine();
            Console.WriteLine(" StaticField Setter Call");
            StaticFieldSetter();
            StaticFieldSetter();
            watch.Reset();
            watch.Start();
            for (int i = 0; i < iterations; ++i)
                StaticFieldSetter();
            watch.Stop();
            Console.WriteLine(" StaticField Setter Call Elapsed Time (mS): " +
                ((double)watch.ElapsedTicks / Stopwatch.Frequency * 1000).ToString());

            Console.WriteLine();
            Console.WriteLine(" ThreadStaticField Setter");
            ThreadStaticFieldSetter();
            ThreadStaticFieldSetter();
            watch.Reset();
            watch.Start();
            for (int i = 0; i < iterations; ++i)
                ThreadStaticFieldSetter();
            watch.Stop();
            Console.WriteLine(" ThreadStaticField Setter Call Elapsed Time (mS): " +
                ((double)watch.ElapsedTicks / Stopwatch.Frequency * 1000).ToString());

            Console.WriteLine();
            Console.WriteLine(" StaticField Getter Call");
            StaticFieldGetter();
            StaticFieldGetter();
            watch.Reset();
            watch.Start();
            for (int i = 0; i < iterations; ++i)
                StaticFieldGetter();
            watch.Stop();
            Console.WriteLine(" StaticField Getter Call Elapsed Time (mS): " +
                ((double)watch.ElapsedTicks / Stopwatch.Frequency * 1000).ToString());

            Console.WriteLine();
            Console.WriteLine(" ThreadStaticField Getter");
            ThreadStaticFieldGetter();
            ThreadStaticFieldGetter();
            watch.Reset();
            watch.Start();
            for (int i = 0; i < iterations; ++i)
                ThreadStaticFieldGetter();
            watch.Stop();
            Console.WriteLine(" ThreadStaticField Getter Call Elapsed Time (mS): " +
                ((double)watch.ElapsedTicks / Stopwatch.Frequency * 1000).ToString());
        }
        // Baseline: an unsynchronized increment.
        public static void SimpleCall()
        {
            ++mtResource;
        }

        // Atomic increment without a lock.
        public static void InterlockedCall()
        {
            Interlocked.Increment(ref mtResource);
        }

        // Monitor-based lock (C# lock statement).
        public static void LockCall()
        {
            lock (resourceLock)
            {
                ++mtResource;
            }
        }

        // Thread.Sleep(0) gives up the rest of the time slice if another thread
        // is ready to run; the lock keeps the test comparable with LockCall.
        public static void PollingCall()
        {
            Thread.Sleep(0);
            lock (resourceLock)
            {
                ++mtResource;
            }
        }

        // Writing under a read lock is only acceptable here because the
        // benchmark runs on a single thread.
        public static void ReaderWriterLockSlimReadCall()
        {
            resourceReaderWriterLockSlim.EnterReadLock();
            ++mtResource;
            resourceReaderWriterLockSlim.ExitReadLock();
        }

        public static void ReaderWriterLockSlimWriteCall()
        {
            resourceReaderWriterLockSlim.EnterWriteLock();
            ++mtResource;
            resourceReaderWriterLockSlim.ExitWriteLock();
        }

        // The event is signaled, so WaitOne() returns at once and resets it;
        // Set() signals it again for the next call.
        public static void AutoResetEventCall()
        {
            resourceAutoResetEvent.WaitOne();
            ++mtResource;
            resourceAutoResetEvent.Set();
        }

        // WaitOne() followed by Reset() is not atomic: two threads could both
        // pass WaitOne() before either calls Reset(). This emulated lock is
        // only valid because the benchmark runs on a single thread.
        public static void ManualResetEventCall()
        {
            resourceManualResetEvent.WaitOne();
            resourceManualResetEvent.Reset();
            ++mtResource;
            resourceManualResetEvent.Set();
        }

        // System.Threading.Semaphore (a kernel object) with a single slot.
        public static void SemaphoreCall()
        {
            resourceSemaphore.WaitOne();
            ++mtResource;
            resourceSemaphore.Release();
        }

        public static void MutexCall()
        {
            resourceMutex.WaitOne();
            ++mtResource;
            resourceMutex.ReleaseMutex();
        }

        public static void StaticFieldSetter()
        {
            StaticFieldTestClass.StaticField = ++mtResource;
        }

        public static void StaticFieldGetter()
        {
            mtResource += StaticFieldTestClass.StaticField;
        }

        public static void ThreadStaticFieldSetter()
        {
            StaticFieldTestClass.ThreadStaticField = ++mtResource;
        }

        public static void ThreadStaticFieldGetter()
        {
            mtResource += StaticFieldTestClass.ThreadStaticField;
        }
    }
}