Unified Concurrency IV - Going Cross-platform

24 Mar 2019 | CPOL | 5 min read
A cross-platform, object-oriented approach to synchronization primitives for .NET and .NET Core, based on one pattern shared between two interfaces: one for general threading and one for async/await.
In this article, I use cross-platform benchmarks to show the great potential of, and incentive for, upgrading from .NET to .NET Core 2.1+. The gains come from JIT compilation improvements and from improvements to the C# lock (Monitor class) that reduce CPU waste in multithreaded code.

Articles Series

Introduction

Today, I want to announce that Unified Concurrency is going cross-platform!
Unified Concurrency is now compliant with .NET Standard 2.0, which makes it available on .NET 4.7+, .NET Core 2.0+ and Mono 5.4+.

Cross-platform development in the .NET world seems unstoppable, and since .NET Standard 2.0 its adoption appears to have accelerated in open-source and commercial projects alike. This was a great incentive for Unified Concurrency to become cross-platform. I still intend to keep the .NET 4.6 version alive for as long as possible, but the main development has shifted to the .NET Standard projects.

Moving the GreenSuperGreen library (which contains Unified Concurrency) to .NET Standard 2.0 also required the benchmarking and cross-benchmarking libraries to undergo the same update. This opened the opportunity to run benchmarks, cross-benchmarks and cross-platform benchmarks on .NET, .NET Core and potentially Mono. It would also be interesting to benchmark on Linux, but the benchmarking currently depends on PerformanceCounters, which are Windows-specific, and a solution to this problem remains an open question for future versions of .NET Standard.

Examples of all the implemented synchronization primitives, a good starting point for experiments, are included as a unit test project targeting .NET Core 2.1 here:

The Unified Concurrency framework is implemented in the open-source GreenSuperGreen library, available on GitHub and NuGet.

.NET 4.6
.NET Standard 2.0
.NET Core 2.1
.NET 4.7.2

More Synchronization Primitives

Three more synchronization primitives are now available, and one additional primitive is internal only (for benchmarking purposes).

AsyncSemaphoreSlimLockUC : IAsyncLockUC

A lock based on SemaphoreSlim WaitAsync/Release, which seems to provide fair, FIFO-style access.

Its performance is similar to that of AsyncLockUC.
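
To make the WaitAsync/Release idea concrete, here is a minimal sketch of an awaitable lock built on SemaphoreSlim. It is not the GreenSuperGreen implementation: the names AsyncSemaphoreLockSketch, EnterAsync and AsyncLockDemo are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical illustration type; NOT the library's AsyncSemaphoreSlimLockUC.
public sealed class AsyncSemaphoreLockSketch
{
    // Initial count 1, maximum count 1: at most one holder at a time.
    private readonly SemaphoreSlim _semaphore = new SemaphoreSlim(1, 1);

    // Awaits entry and returns a token whose Dispose() releases the lock.
    public async Task<IDisposable> EnterAsync()
    {
        await _semaphore.WaitAsync().ConfigureAwait(false);
        return new Releaser(_semaphore);
    }

    private sealed class Releaser : IDisposable
    {
        private SemaphoreSlim _semaphore;
        public Releaser(SemaphoreSlim semaphore) => _semaphore = semaphore;

        // Interlocked.Exchange makes a second Dispose() call harmless.
        public void Dispose() => Interlocked.Exchange(ref _semaphore, null)?.Release();
    }
}

public static class AsyncLockDemo
{
    // Starts 'tasks' workers; each increments a shared counter 'iterations' times under the lock.
    public static async Task<int> RunAsync(int tasks, int iterations)
    {
        var asyncLock = new AsyncSemaphoreLockSketch();
        int counter = 0;

        async Task Worker()
        {
            for (int i = 0; i < iterations; i++)
            {
                using (await asyncLock.EnterAsync())
                {
                    counter++;          // protected by the lock
                    await Task.Yield(); // the lock stays held across an await
                }
            }
        }

        var workers = new Task[tasks];
        for (int t = 0; t < tasks; t++) workers[t] = Task.Run(() => Worker());
        await Task.WhenAll(workers);
        return counter;
    }

    public static async Task Main()
    {
        // No update is lost, so the total equals tasks * iterations.
        Console.WriteLine(await RunAsync(4, 1000));
    }
}
```

Because a semaphore has no owner thread, the continuation after an await may release the lock on a different thread than the one that entered it, which is what makes this style usable with async/await.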

SemaphoreSlimLockUC : ILockUC

A lock based on SemaphoreSlim Wait/Release. SemaphoreSlim incorporates a hybrid approach with atomic instructions, which does not provide FIFO-style access: access is unfair and can cause threads to stall!

SemaphoreLockUC : ILockUC

A lock based on Semaphore WaitOne/Release. Its behavior depends on the operating system: on Windows, access is roughly FIFO, but fairness is not guaranteed.
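
The two blocking variants above differ only in the underlying primitive. The following sketch puts them side by side behind one small interface. All type names here (ISimpleLockSketch, SemaphoreSlimLockSketch, SemaphoreLockSketch, ReleaseScope) are hypothetical and are not the library's ILockUC types; the sketch shows the Wait/Release and WaitOne/Release pairing and makes no statement about fairness.

```csharp
using System;
using System.Threading;

// Hypothetical illustration types; NOT the library's ILockUC implementations.
public interface ISimpleLockSketch
{
    IDisposable Enter();
}

// Disposable token that runs the release action exactly once.
public sealed class ReleaseScope : IDisposable
{
    private Action _release;
    public ReleaseScope(Action release) => _release = release;
    public void Dispose() => Interlocked.Exchange(ref _release, null)?.Invoke();
}

// SemaphoreSlim.Wait/Release: lightweight, spins briefly before blocking.
public sealed class SemaphoreSlimLockSketch : ISimpleLockSketch
{
    private readonly SemaphoreSlim _semaphore = new SemaphoreSlim(1, 1);

    public IDisposable Enter()
    {
        _semaphore.Wait();
        return new ReleaseScope(() => _semaphore.Release());
    }
}

// Semaphore.WaitOne/Release: wraps an operating system semaphore.
public sealed class SemaphoreLockSketch : ISimpleLockSketch
{
    private readonly Semaphore _semaphore = new Semaphore(1, 1);

    public IDisposable Enter()
    {
        _semaphore.WaitOne();
        return new ReleaseScope(() => _semaphore.Release());
    }
}

public static class SyncLockDemo
{
    // Each thread increments a shared counter 'iterations' times under the given lock.
    public static int Run(ISimpleLockSketch gate, int threads, int iterations)
    {
        int counter = 0;
        var workers = new Thread[threads];
        for (int t = 0; t < threads; t++)
        {
            workers[t] = new Thread(() =>
            {
                for (int i = 0; i < iterations; i++)
                    using (gate.Enter()) counter++;
            });
            workers[t].Start();
        }
        foreach (var worker in workers) worker.Join();
        return counter;
    }

    public static void Main()
    {
        Console.WriteLine(Run(new SemaphoreSlimLockSketch(), 4, 10000));
        Console.WriteLine(Run(new SemaphoreLockSketch(), 4, 10000));
    }
}
```

Both runs should report threads * iterations, since neither variant loses updates; how long individual threads wait for their turn is where the two primitives differ.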

MutexLockUC : ILockUC - internal, specific usage, benchmarking only

This synchronization primitive is accessible only to predefined benchmarking projects, because it requires thread affinity between the enter and exit calls, which Unified Concurrency does not support by design. For benchmarking, however, this is manageable, and the primitive is interesting for gathering data.
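
The thread affinity mentioned above can be observed directly with the plain System.Threading.Mutex class: a mutex must be released by the thread that acquired it. The sketch below uses only the standard Mutex API; the MutexAffinityDemo class and its method name are hypothetical.

```csharp
using System;
using System.Threading;

public static class MutexAffinityDemo
{
    // Acquires a Mutex on the calling thread, then attempts to release it from another thread.
    // Returns true if the foreign release was rejected with an ApplicationException.
    public static bool ForeignReleaseIsRejected()
    {
        using (var mutex = new Mutex())
        {
            mutex.WaitOne(); // the calling thread now owns the mutex
            bool rejected = false;

            var other = new Thread(() =>
            {
                try
                {
                    mutex.ReleaseMutex(); // this thread is not the owner
                }
                catch (ApplicationException)
                {
                    rejected = true;
                }
            });
            other.Start();
            other.Join();

            mutex.ReleaseMutex(); // the owner releases normally
            return rejected;
        }
    }

    public static void Main() => Console.WriteLine(ForeignReleaseIsRejected());
}
```

A lock that hands out a disposable token and may be left on a different thread (for example after an await) cannot satisfy this ownership rule, which is consistent with keeping a Mutex-based lock out of the public interface.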

.NET Core Speedup

Based on reports from Microsoft, outside sources and technical communities, it has become general knowledge that .NET Core can speed up an existing code base.

With cross-platform benchmarks, I can report improvements on two fronts.

.NET Core Speedup: JIT Compilation

In benchmarking scenarios, the sequential baseline for the throughput period is a useful tool for measuring potential speedup on the same hardware. The given code was 1.997 times faster on .NET Core 2.1 than on .NET 4.7.2. This does not mean that all code will be this much faster, only that certain code can be JIT-compiled more efficiently and thus run faster. The potential speedup is always code-dependent, and there will be cases where further optimization is not possible. Similar speedups have been reported by Stephen Toub for some specific cases.
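
As a rough illustration of how such a speedup figure is formed, the sketch below times a single-threaded workload and expresses speedup as the ratio of two throughput periods. It is not the benchmarking code used for this article: SequentialBaselineSketch, MeasurePeriodMs, Speedup and SampleWork are hypothetical names, and the workload is an arbitrary placeholder. The same program would be built and run once per runtime, and the two measured periods then compared.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.InteropServices;

public static class SequentialBaselineSketch
{
    // Average time per call of 'work' in milliseconds (the sequential throughput period).
    public static double MeasurePeriodMs(Action work, int iterations)
    {
        work(); // warm-up call so that JIT compilation of 'work' is not timed
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) work();
        stopwatch.Stop();
        return stopwatch.Elapsed.TotalMilliseconds / iterations;
    }

    // Speedup of the candidate runtime relative to the baseline runtime.
    public static double Speedup(double baselinePeriodMs, double candidatePeriodMs)
        => baselinePeriodMs / candidatePeriodMs;

    // Placeholder CPU-bound workload; an assumption for this sketch only.
    public static void SampleWork()
    {
        double sum = 0;
        for (int i = 1; i <= 100000; i++) sum += Math.Sqrt(i);
        GC.KeepAlive(sum); // keeps the result observable so the loop is not discarded
    }

    public static void Main()
    {
        double periodMs = MeasurePeriodMs(SampleWork, 200);
        // Record this line once per runtime, then feed both periods into Speedup().
        Console.WriteLine(RuntimeInformation.FrameworkDescription + ": " + periodMs + " ms per iteration");
    }
}
```

With periods measured this way, a result such as the 1.997 reported above would correspond to Speedup(periodOnNet472, periodOnNetCore21), where both arguments come from runs on the same hardware.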

Chart 1: Sequential Throughput Speedup .NET / .NET Core (Speedup is code dependent)

.NET Core Speedup: C# lock - Monitor Class Improvements

The cross-platform benchmarks show a considerable improvement of the C# lock (Monitor class) under both the Heavy Load scenario and the Bad Neighbor scenario.

The .NET implementation is prone to CPU gridlock, a state in which the C# lock (Monitor class) wastes most CPU resources while very little work gets done; synchronization costs effectively consume most of the CPU. This has been reported in the previous articles of this series.
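
One way to get a feeling for this effect is to let several threads fight over a single C# lock and compare the CPU time the process consumed with the wall-clock time. The sketch below is only an approximation of such a heavy-load scenario and is not the benchmark behind the charts in this article: HeavyLoadSketch, LoadResult and all parameters are hypothetical, and Thread.SpinWait merely stands in for real work inside the lock.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical result holder for this sketch.
public sealed class LoadResult
{
    public TimeSpan Wall;
    public TimeSpan Cpu;
    public long Iterations;

    // Fraction of the total CPU capacity (all cores) used during the run.
    public double CpuUtilization =>
        Cpu.TotalMilliseconds / (Wall.TotalMilliseconds * Environment.ProcessorCount);
}

public static class HeavyLoadSketch
{
    private static readonly object Gate = new object();

    // 'threads' workers repeatedly take the lock and spin inside it for 'spinIterations'.
    public static LoadResult Run(int threads, TimeSpan duration, int spinIterations)
    {
        long iterations = 0;
        bool stop = false;

        var process = Process.GetCurrentProcess();
        TimeSpan cpuStart = process.TotalProcessorTime;
        var wall = Stopwatch.StartNew();

        var workers = new Thread[threads];
        for (int t = 0; t < threads; t++)
        {
            workers[t] = new Thread(() =>
            {
                while (!Volatile.Read(ref stop))
                {
                    lock (Gate)
                    {
                        Thread.SpinWait(spinIterations); // simulated work under the lock
                        iterations++;                    // safe: only the lock holder writes
                    }
                }
            });
            workers[t].Start();
        }

        Thread.Sleep(duration);
        Volatile.Write(ref stop, true);
        foreach (var worker in workers) worker.Join();
        wall.Stop();

        process.Refresh();
        return new LoadResult
        {
            Wall = wall.Elapsed,
            Cpu = process.TotalProcessorTime - cpuStart,
            Iterations = iterations
        };
    }

    public static void Main()
    {
        LoadResult result = Run(Environment.ProcessorCount, TimeSpan.FromSeconds(2), 10000);
        Console.WriteLine("iterations: " + result.Iterations);
        Console.WriteLine("CPU utilization: " + result.CpuUtilization.ToString("P1"));
    }
}
```

Since only one thread can do useful work inside the lock at any time, useful work alone would occupy roughly one core. Utilization far above 1 / ProcessorCount therefore hints that the remaining cores are busy with synchronization overhead rather than with work.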

.NET Core 2.1 seems to have a much better implementation of the C# lock (Monitor class, backed by the AwareLock class in the C++ code of the .NET Core 2.1 runtime).

Chart 2: CPU resource waste of C# lock (Monitor class) on .NET and .NET Core, and of LockUC on .NET Core.

Chart 2 makes it easy to see where .NET Core 2.1 gains performance. C# locks are usually spread all over the code in most projects and are part of many common libraries, including the runtime itself. Here we see a considerable improvement in CPU usage: in certain timing cases, CPU waste is reduced by more than 80%! Please compare the blue and green trend lines for the C# lock (Monitor class).

JIT improvements are important, but multithreaded code full of C# locks is still prevalent in many projects, from simple ones to line-of-business code bases.
Even for simple projects the performance gains can be considerable, and the incentive to upgrade to .NET Core 3.0, with its access to WinForms and WPF, can be very interesting.

This is an important improvement in .NET Core 2.1, but there is still room for more. As an example, consider LockUC: comparing the green and red trend lines suggests that there is still about 10% to gain. This gain is usually paid for in part by slightly worse throughput for throughput periods below 1 ms, where synchronization primitives based on atomic instructions can help gain throughput while keeping CPU waste reasonable. That, however, requires modern architectural designs that take the many-core era of processors into account.

Cross-platform Benchmarks

Chart 3: Cross-benchmark of C# lock (Monitor) / LockUC under the Heavy Load scenario, .NET 4.7.2, 16 cores.

Chart 4: Cross-benchmark of C# lock (Monitor) / LockUC under the Heavy Load scenario, .NET Core 2.1, 16 cores.

Summary

This article serves as an announcement of the .NET Standard 2.0 version of the GreenSuperGreen library (with built-in Unified Concurrency), including some improvements to the library.

We have discussed and shown with cross-platform benchmarks the great potential of, and incentive for, upgrading from .NET to .NET Core 2.1+. The gains come from JIT compilation improvements and from improvements to the C# lock (Monitor class) that reduce CPU waste in multithreaded code.

Revision History

  • 16th March, 2019: First version
  • 24th March, 2019: Correcting a few typos and 2 more cross-benchmarks

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer
Czech Republic
Hi, I am Marek Pavlu,

I am a Software Engineer and Applied Physicist.
I love complex problems because they come with structure, constraints, limitations and interactions. That always helps me remember, understand, manipulate and solve the problem with a limited number of principles and rules, which in turn helps me build a solution capable of reducing the complexity of the problem.

I love logic, humanism, and ethics.
I like to follow the politics of the USA, especially congressional/senate hearings :).
I like to plant new trees in the large garden around the family house and in recent years I learned how to successfully grow roses.


www.linkedin.com/in/ipavlu
