Click here to Skip to main content
15,867,488 members
Articles / Programming Languages / C#

Unified Concurrency I - Introduction

Rate me:
Please Sign up or sign in to vote.
5.00/5 (10 votes)
24 Mar 2019CPOL14 min read 26.2K   191   32   11
The Cross-Platform Object-Oriented approach to Synchronization Primitives for .NET and .NET Core based on one shared pattern between two interfaces for General Threading and Async/Await.
In this introductory article, I present a list of implemented synchronization primitives with a statement of their properties, performance assessment and code examples without complexities of performance measurement/analysis.

Image 1

Articles Series

Introduction

The Unified Concurrency framework is designed to provide simple to use synchronization primitives designed with OOP principles in mind with the interface and pattern-based approach for general and async/await threading scenarios. Implemented synchronization primitives cover general thread locking, spin-locking, ticket spin-locking, async locking, async spin-locking and async ticket spin-locking. The Unified Concurrency framework has been developed in such way to allow agile changes of the synchronization primitive with limited changes of source code, usually one line. It is also easier to move from general threading to async/await style and back, thanks to the shared pattern. Implemented synchronization primitives usually offer better performance than what can be done with C# lock, offer abilities not implemented in .NET or wrapup existing synchronization primitive with minimum overhead.

Unified Concurrency framework intends to offer an agile tool with the ability to easily change synchronization primitive with minimalistic code changes. After all, today's code is tomorrows petrified Line Of Business legacy code nobody dares to touch. One has to choose well and keep the doors opened for later changes.

This introductory article is first in the row. It is my intent to present here a list of implemented synchronization primitives with a statement of their properties, performance assessment and code examples without complexities of performance measurement/analysis. It is the job for the next articles to dig into complex issues, measurement scenarios, methods, detailed analysis and concurrent unit testing of implemented synchronization primitives.

The Unified Concurrency framework is implemented in the open-source GreenSuperGreen library available on the GitHub and the Nuget (3 packages, lib, unit tests, benchmark). Examples will be provided in the form of Visual Studio solution with a project targeting necessary Nuget package as in real use case.

NET 4.6
NetStandard 2.0
NetCore 2.1
Net 4.7.2

Interface and Pattern-based

Every general synchronization primitive must implement the interface ILockUC:

C#
public interface ILockUC
{
    SyncPrimitiveCapabilityUC Capability { get; }

    EntryBlockUC Enter();
    EntryBlockUC TryEnter();
    EntryBlockUC TryEnter(int milliseconds);
}

Every async/await style synchronization primitive must implement the interface IAsyncLockUC:

C#
public interface IAsyncLockUC
{
    SyncPrimitiveCapabilityUC Capability { get; }

    AsyncEntryBlockUC Enter();
    AsyncEntryBlockUC TryEnter();
    AsyncEntryBlockUC TryEnter(int milliseconds);
}

EntryBlockUC is a struct with an implemented IDisposable interface ready for the using statement in the C# language. The implemented method Dispose() can be used only once and it should let the using construct to invoke the method at the end of executing the block as is the common rule in using pattern.

The using statement is expecting an object with an implemented method: Dispose() and it will call it at the end of the using statement block.
Even when the EntryBlockUC is a struct, using statement in C# is not going to box the EntryBlockUC in order to gain access to the method: Dispose().
But, if we will store the EntryBlockUC to variable of type object or IDisposable, the boxing will happen. That is why all the examples either won't specify returned type if they do not need it for method Enter() or they will specify the returned type as EntryBlockUC.

The AsyncEntryBlockUC is async/await style awaiter, it must be awaited and will provide a result of the EntryBlockUC type.

The EntryBlockUC is giving us information, whether we got the entry or not with property, bool HasEntry.
It is useful only in case of TryEnterTryEnter(timeout) methods. In the case of the method: Enter() it is always a successful entry.

Reasonable Limitations

The Unified Concurrency framework in order to unify multiple synchronizations primitives had to be limited in certain abilities. These limitations are actually very reasonable.

  • Synchronization primitives under Unified Concurrency won't support recursive access to itself. There is no language support to make it happen for async/await scenarios efficiently and not all synchronization primitives are capable of providing recursive access.
  • Synchronization primitives under Unified Concurrency are not thread-affine, thus access entry and access exit can occur on unrelated threads.

These limitations are actually making possible to await inside any locked block created by Unified Concurrency synchronization primitive.

Access Fairness

One of the defining qualities of each synchronization primitive is the ability to guarantee access-fairness. Some synchronization primitives do not guarantee this ability at all. For example, general SpinLocks do not as they use atomic instructions and hardware is to decide who gets the access first and the decision is driven by cache state. Unfair access is then causing thread starving, it is easy to show, that SpinLocks during heavy contention like to starve some threads for minutes if the time spent inside the protected block gets over some critical time which is number of threads, number of cores and hardware dependent. The only interesting point is that this critical time is getting smaller with new hardware architectures and more cores and thus the issue is going to play higher roles with each passing year.

On the other hand, some synchronization primitives can guarantee access fairness and these will be noticed here.
It is also interesting that RealTime Operating Systems tend to guarantee access fairness as it is bringing very important quality, ensured load balancing.

Fairness, or rather lack thereof is not one kind, but there are more ways it can happen and also fair/unfair can have a broader meaning.

Access Fairness: KernelRoughlyFIFO

One way unfairness can occur is well described in the Joe Duffy's Concurrent Programming on Windows: "Because monitors use kernel objects internally, they exhibit the same roughly-FIFO behavior that the OS synchronization mechanisms also exhibit (described in the previous chapter). Monitors are unfair, so if another thread tries to acquire the lock before an awakened waiting thread tries to acquire the lock, the sneaky thread is permitted to acquire a lock. Threads awakened that fail to acquire the resource for which they have been awakened will have to rewait and do it over again at some point."

Access Fairness: AtomicInstructionsUnfairness

Another way unfairness can occur is with atomic instructions based synchronization mechanisms as SpinLocks in specific scenarios, where a thread enters the SpinLock and keeps access for some time while other threads start contending the SpinLock then the thread owning access is leaving the SpinLock, but immediately is trying to enter the SpinLock again. The thread owning the SpinLock access last time has somewhat pre-opened doors than others, it has everything needed already in the cache, so it does not have to lose time and is ready to enter easily. If there were many threads contending the SpinLock then we can observe some threads being starved for longer time and threads unfairly gaining the access out of order.

Observing or proving that this unfairness has occurred is not easy.

Access Fairness: EnhancedAtomicInstructionsUnfairness

The previous type of unfairness, AtomicInstructionsUnfairness, is usually even enhanced if the SpinLock is enhanced to waste fewer CPU resources just by burning instruction cycles. Enhancements come in form of SpinWaiting, thread yielding, thread sleeping and combinations of previous techniques. While some CPU resources can be saved, especially if the access is kept longer time, like hundreds of microseconds or even milliseconds, but it also invites the unfairness because now a thread can exit from SpinLock and enter and can get access out of order very easily. It is all just a matter of pure chance and wrong timing and thread can sit and starve even for seconds. Creating this kind of starving is quite easy and it will be specifically mentioned in the third article.

Implemented Synchronization Primitives

  • MonitorLockUC - implemented as a thin wrapper around C# lock, Monitor, implemented only internally in the framework, for comparison and benchmarking, the C# lock has significant performance and side-effect issues on many-core CPU's, it does not guarantee access fairness,
  • LockUC - replacement for C# lock, Monitor.Enter(), it has in general better performance than C# lock, it has guaranteed access fairness,
  • SpinLockUC - a thin wrapper around the .NET SpinLock struct, it does not guarantee access fairness,
  • TicketSpinLockUC - a simple Ticket Spin Lock, it has guaranteed access fairness, does not implement TryEnter methods,
  • SemaphoreLockUC - Semaphore WaitOne/Release based lock, operating system dependent, on windows roughly FIFO, fairness is not guaranteed.
  • SemaphoreSlimLockUC - SemaphoreSlim Wait/Release based lock incorporates a hybrid approach with atomic instructions which does not play well in FIFO style, unfair access that can cause threads to stall!
  • MutexLockUC - This synchronization primitive is not accessible, only to predefined benchmarking projects, because it requires thread affinity on entering and exit calls, which is not supported in the Unified Concurrency by design, but for the benchmarking it is maintainable and interesting for gathering data.
  • AsyncLockUC - the async/await style synchronization primitive is a most performant synchronization primitive implemented in the Unified Concurrency framework, it has guaranteed access fairness,
  • AsyncSpinLockUC - async/await style Spin Lock, it does not guarantee access fairness,
  • AsyncTicketSpinLockUC - async/await style Ticket Spin Lock, it has guaranteed access fairness.
  • AsyncSemaphoreSlimLockUC - SemaphoreSlim WaitAsync/Release based lock, which seems to play well in FIFO style, fair access. Performance-wise similar to the AsyncLockUC.

MonitorLockUC

The MonitorLockUC is a thin wrapper internally implemented in the Unified Concurrency framework for benchmarking and testing purposes. It is based on .NET Monitor and used by C# lock. Both references will be used interchangeably to reference each other. The reason why it is not accessible is that firstly, there are other more performant synchronization primitives and secondly, the C# lock has some unforeseen performance issues and side effects that do not appear until heavy contention is achieved. In the complex code, it is hard to artificially create a necessary condition to cause the same effects yet they occur in production. That makes the C# lock problematic for testing and analyzing the cause of the issue.

At some point in history, the C# lock was clean threading suspend/resume synchronization primitive. It made the behavior more predictable with fewer side effects. That changed fast with invent of multi-core CPUs. The Monitor is now a hybrid synchronization primitive, threads are actually spinning at the application level for some time before the call to the Operating System is executed, that allows gaining the lock faster. Unfortunately, this schema has scaling issues based on a number of cores. Where there were 4 cores, there are now 24 or 40 cores. If the number of threads accessing the lock is this large, we start to see limitations. Cores are spinning longer and often and the waste of CPU cycles on cores is growing fast than actual work being done, the synchronization primitive throughput period is extremely short yet CPU resources can be completely wasted. This kind of behavior can be explained as CPU gridlock.

One more issue to mention is, that the Monitor / C# lock cannot guarantee access fairness. It is probably caused by hybrid nature, but also thread suspend/resume FIFO behavior can be overridden in certain cases at the Operating System level. The thread that actually came later can get access sooner.

Heavy contention through C# lock when analyzed for its performance issues and side effects says a lot about the design of software. Obviously, it is the case of throwing too much parallelism on the synchronous problem than is healthy. New designs should count on these issues and avoid them everywhere.

C#
//it is internal class, not accessible in general
ILockUC Lock { get; } = new MonitorLockUC();

using (Lock.Enter())
{
  //locked area
}

//immediate access or no entry at all
using (EntryBlockUC entry = Lock.TryEnter())
{
    if (entry.HasEntry)
    {
        //locked area
    }
}

//immediate access or access within timeout milliseconds or no entry at all
using (EntryBlockUC entry = Lock.TryEnter(msTimeout: 25))
{
    if (entry.HasEntry)
    {
        //locked area
    }
}

LockUC

The LockUC synchronization primitive seems to be more performant than the Monitor, the CPU waste growth is way slower. In the very extreme cases of heavy contention and extremely short times in the locked block, it is losing on the Monitor, but these kind cases are well beyond any reasonable usability for LockUC and the Monitor / C# lock as well, pushing to the areas where spin-locking or ticket spin-locking is playing best. LockUC does not try to spin-wait in the heuristic attempt to avoid thread suspension, quite differently, it is trying to suspend as fast as possible if the access is already taken by another thread.

Suspend/Resume based on threads is for scenarios where we expect hundreds of microseconds, milliseconds or longer waiting to gain access to the locked area. It is also useful for cases where we do not expect high contention and/or we are not specifically troubled by performance. But, we should always be on guard to disallow less performant parts of our code base to dissipate CPU resources needlessly. For this kind scenarios, the LockUC is the best case in general threading and is also a good C# lock replacement.

It is the main issue of the C# lock / Monitor to try to work across the whole spectrum from long processing in locked areas to super short ones, from heavy contention to no contention at all.

C#
ILockUC Lock { get; } = new LockUC();

using (Lock.Enter())
{
  //locked area
}

//immediate access or no entry at all
using (EntryBlockUC entry = Lock.TryEnter())
{
    if (entry.HasEntry)
    {
        //locked area
    }
}

//immediate access or access within timeout milliseconds or no entry at all
using (EntryBlockUC entry = Lock.TryEnter(msTimeout: 25))
{
    if (entry.HasEntry)
    {
        //locked area
    }
}

SpinLockUC

C#
ILockUC Lock { get; } = new SpinLockUC();

using (Lock.Enter())
{
  //locked area
}

//immediate access or no entry at all
using (EntryBlockUC entry = Lock.TryEnter())
{
    if (entry.HasEntry)
    {
        //locked area
    }
}

//immediate access or access within timeout milliseconds or no entry at all
using (EntryBlockUC entry = Lock.TryEnter(msTimeout: 25))
{
    if (entry.HasEntry)
    {
        //locked area
    }
}

TicketSpinLockUC

The TicketSpinLockUC synchronization primitive is an implementation of classic Ticket Spin-Lock missing in the .NET. The main goal is to provide lock-free access similar to the SpinLock, but ensuring access fairness. It is then little less performant than the SpinLockUC as throughput period is obviously little worse, but thread starving is avoided and load balancing is ensured.

Unfortunately, the class Interlocked does not implement all the necessary atomic operations to implement easily TryEnter methods. So these are not implemented in the TicketSpinLockUC and throw an appropriate exception if used.

C#
ILockUC Lock { get; } = new TicketSpinLockUC();

using (Lock.Enter())
{
  //locked area
}

//not implemented yet
//using (EntryBlockUC entry = Lock.TryEnter())

//not implemented yet 
//using (EntryBlockUC entry = Lock.TryEnter(msTimeout: 25))

SemaphoreLockUC

Semaphore WaitOne/Release based lock, operating system dependent, on windows roughly FIFO, fairness is not guaranteed.

C#
ILockUC Lock { get; } = new SemaphoreLockUC();


using (Lock.Enter())
{
  //locked area
}

//immediate access or no entry at all
using (EntryBlockUC entry = Lock.TryEnter())
{
    if (entry.HasEntry)
    {
        //locked area
    }
}

//immediate access or access within timeout milliseconds or no entry at all
using (EntryBlockUC entry = Lock.TryEnter(msTimeout: 25))
{
    if (entry.HasEntry)
    {
        //locked area
    }
}

SemaphoreSlimLockUC

SemaphoreSlim Wait/Release based lock incorporates a hybrid approach with atomic instructions which does not play well in FIFO style, unfair access that can cause threads to stall!

C#
ILockUC Lock { get; } = new SemaphoreSlimLockUC();


using (Lock.Enter())
{
  //locked area
}

//immediate access or no entry at all
using (EntryBlockUC entry = Lock.TryEnter())
{
    if (entry.HasEntry)
    {
        //locked area
    }
}

//immediate access or access within timeout milliseconds or no entry at all
using (EntryBlockUC entry = Lock.TryEnter(msTimeout: 25))
{
    if (entry.HasEntry)
    {
        //locked area
    }
}

MutexLockUC

This synchronization primitive is not accessible, only to predefined benchmarking projects, because it requires thread affinity on entering and exit calls, which is not supported in the Unified Concurrency by design, but for the benchmarking, it is maintainable and interesting for gathering data.

C#
ILockUC Lock { get; } = new MutexLockUC();

using (Lock.Enter())
{
  //locked area
}

//immediate access or no entry at all
using (EntryBlockUC entry = Lock.TryEnter())
{
    if (entry.HasEntry)
    {
        //locked area
    }
}

//immediate access or access within timeout milliseconds or no entry at all
using (EntryBlockUC entry = Lock.TryEnter(msTimeout: 25))
{
    if (entry.HasEntry)
    {
        //locked area
    }
}

AsyncLockUC

The AsyncLockUC is an async/await style synchronization primitive, it has guaranteed access fairness and it is the most performant synchronization primitive for anything that takes more than a hundred microseconds inside the locked area. It is the only synchronization primitive from Unified Concurrency framework, that is known to prevent CPU gridlock under heavy contention. The CPU gridlock can be explained as a moment when synchronization primitive under heavy contention is spending most of the CPU resources on synchronization itself and almost none or bare minimum on any useful work. The reason the AsyncLockUC is capable of avoiding CPU gridlock is that there is no suspend/resume of threads waiting for the access. If nobody has access, then incoming thread executes the locked area immediately and synchronously. If the incoming thread cannot get access, as it is already taken, then TaskCompletionSource is created and enqueued. Each thread before returning the access dequeues the TaskCompletionSource before leaving, that is if any is present and initiates its completion, effectively scheduling the continuation on the ThreadPool.

C#
IAsyncLockUC Lock { get; } = new AsyncLockUC();

using (await Lock.Enter())
{
  //locked area
}

//immediate access or no entry at all
using (EntryBlockUC entry = await Lock.TryEnter())
{
    if (entry.HasEntry)
    {
        //locked area
    }
}

//immediate access or access within timeout milliseconds or no entry at all
using (EntryBlockUC entry = await Lock.TryEnter(msTimeout: 25))
{
    if (entry.HasEntry)
    {
        //locked area
    }
}

AsyncSpinLockUC

The AsyncSpinLockUC is an async/await style synchronization primitive wrapper around .NET SpinLock. Awaiting in the general is partially synchronous up to the first await which is not yet completed. From that point, the call returns and the rest have to be awaited asynchronously. Thus the Enter() call to the AsyncSpinLockUC is actually spin-waiting to gain access and always returning completed awaiter, so awaiting is synchronously proceeding forward. Awaiters here are structs, so there is no allocation involved. Rest is the same with the SpinLocUC, access fairness is not guaranteed and has to be dealt with.

C#
IAsyncLockUC Lock { get; } = new AsyncSpinLockUC();

using (await Lock.Enter())
{
  //locked area
}

//immediate access or no entry at all
using (EntryBlockUC entry = await Lock.TryEnter())
{
    if (entry.HasEntry)
    {
        //locked area
    }
}

//immediate access or access within timeout milliseconds or no entry at all
using (EntryBlockUC entry = await Lock.TryEnter(msTimeout: 25))
{
    if (entry.HasEntry)
    {
        //locked area
    }
}

AsyncTicketSpinLockUC

The AsyncTicketSpinLockUC is an async/await style synchronization primitive similar to TicketSpinLockUC. Awaiting in the general is partially synchronous up to the first await which is not yet completed. From that point, the call returns and the rest have to be awaited asynchronously. Thus the Enter() call to the AsyncTicketSpinLockUC is actually ticket spin-waiting to gain access and always returning completed awaiter, so awaiting is synchronously proceeding forward. Awaiters here are structs, so there is no allocation involved. Rest is the same with the TicketSpinLocUC, access fairness is guaranteed. TryEnter methods are not implemented.

C#
IAsyncLockUC Lock { get; } = new AsyncTicketSpinLockUC();

using (await Lock.Enter())
{
  //locked area
}

//not implemented yet
//using (EntryBlockUC entry = await Lock.TryEnter())

//not implemented yet
//using (EntryBlockUC entry = await Lock.TryEnter(msTimeout: 25))

AsyncSemaphoreSlimLockUC

SemaphoreSlim WaitAsync/Release based lock, which seems to play well in FIFO style, fair access.
Performance-wise similar to the AsyncLockUC.

C#
IAsyncLockUC Lock { get; } = new AsyncSemaphoreSlimLockUC();

using (await Lock.Enter())
{
  //locked area
}

//immediate access or no entry at all
using (EntryBlockUC entry = await Lock.TryEnter())
{
    if (entry.HasEntry)
    {
        //locked area
    }
}

//immediate access or access within timeout milliseconds or no entry at all
using (EntryBlockUC entry = await Lock.TryEnter(msTimeout: 25))
{
    if (entry.HasEntry)
    {
        //locked area
    }
}

Summary

The main purpose of this article is to present the list of synchronization primitives implemented in the Unified Concurrency framework and present their abilities and examples.

Revision History

  • 24th March, 2019
    • Extended article series
    • Latest implemented synchronization primitives from the 4th article
    • Download includes the latest examples from the 4th article and cross-platform nuget targets

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer
Czech Republic Czech Republic
Hi, I am Marek Pavlu,

I am a Software Engineer and Applied Physicist.
I love complex problems because they come with structure, constraints, limitations, interactions - it always helps me remember, understand, manipulate and solve the problem with a limited number of principles and rules helping me build solution capable to reduce the complexity of the problem.

I love logic, humanism, and ethics.
I like to follow politics of USA and especially congressional/senate hearingsSmile | :) .
I like to plant new trees in the large garden around the family house and in recent years I learned how to successfully grow roses.


www.linkedin.com/in/ipavlu

Comments and Discussions

 
GeneralMy vote of 5 Pin
Eugene Sadovoi16-May-19 15:18
Eugene Sadovoi16-May-19 15:18 
QuestionNot sure about your results... Pin
Paulo Zemek26-Apr-18 9:35
mvaPaulo Zemek26-Apr-18 9:35 
AnswerRe: Not sure about your results... Pin
ipavlu28-Apr-18 20:31
ipavlu28-Apr-18 20:31 
GeneralRe: Not sure about your results... Pin
Paulo Zemek29-Apr-18 4:51
mvaPaulo Zemek29-Apr-18 4:51 
GeneralRe: Not sure about your results... Pin
ipavlu29-Apr-18 12:53
ipavlu29-Apr-18 12:53 
GeneralRe: Not sure about your results... Pin
Paulo Zemek30-Apr-18 8:44
mvaPaulo Zemek30-Apr-18 8:44 
GeneralRe: Not sure about your results... Pin
ipavlu30-Apr-18 12:01
ipavlu30-Apr-18 12:01 
GeneralRe: Not sure about your results... Pin
Paulo Zemek30-Apr-18 8:49
mvaPaulo Zemek30-Apr-18 8:49 
Anyways, I will try the library in different contexts to see the results.

What worries me is that, even if it tries to avoid unfairness by spending only a little time inside the SpinLock to queue the threads, it might actually be spending too much time there.

2 main reasons:
* When adding items to a queue, its inner array might get resized.
* Memory allocations. These will be used to allocate any new items in the heap and to resize the queue.

And all memory allocations come with the potential of doing their own locks, starting garbage collections, reserving large areas of the heap (to make the new allocations faster)... and all of those will reduce fairness and will increase the time spent on the "fast" SpinLock.

GeneralRe: Not sure about your results... Pin
ipavlu30-Apr-18 12:29
ipavlu30-Apr-18 12:29 
QuestionI like what you are doing Pin
Stephen Russell AAI MCT27-Mar-18 3:58
professionalStephen Russell AAI MCT27-Mar-18 3:58 
AnswerRe: I like what you are doing Pin
ipavlu29-Mar-18 15:13
ipavlu29-Mar-18 15:13 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.