Introduction
Nowadays, almost all personal computers and workstations come with multiple cores. Most .NET applications fail to harness the full potential of this computing power. Even when developers attempt to do so, it is generally by manipulating threads and locks at a low level. This often leads to code that is either unreadable or full of potential threats, such as race conditions and deadlocks. These threats often go undetected when the code runs on a single-core machine.
The Task Parallel Library allows you to write code that is human readable, less error prone, and adapts to the number of cores available. Your software can therefore take advantage of better hardware as its environment is upgraded, without code changes.
What Kind of Performance Boost are we Talking About?
What is the first thing you try when you see parts of your code not performing well? Lazy loading, LINQ queries, optimizing for loops, and so on. We often overlook parallelizing the time-consuming, independent units of work.
Most often, the CPU will show you the following story during your performance intensive routines.
Shouldn’t your CPU be utilized more like this?
Task Parallel Library
The Task Parallel Library (TPL) is a set of public types and APIs in the System.Threading and System.Threading.Tasks namespaces in the .NET Framework 4.0. The TPL scales the degree of concurrency dynamically to efficiently use all the cores that are available. By using TPL, you can maximize the performance of your code while focusing on the work that your program is designed to accomplish.
The Task Parallel Library introduces the concept of “Task”. Task parallelism is the process of running these tasks in parallel. A Task is an independent unit of work, which runs within a program. Benefits of identifying tasks within your system are:
- More efficient and more scalable use of system resources.
- More programmatic control than is possible with a thread or work item.
The Task Parallel Library uses threads under the hood to execute these tasks in parallel. The runtime decides dynamically whether to use additional threads and how many.
Why Tasks? Why Not Threads?
Creating a thread is expensive. Creating a large number of threads within your application also adds context-switching overhead. In a single-core environment, this can even degrade performance, since a single core has to serve all of the threads.
With tasks, on the other hand, the runtime decides dynamically whether separate threads of execution are needed. It uses the ThreadPool under the hood to distribute the work, avoiding the overhead of Thread creation and unnecessary context switching.
Fig 1. The time difference between a traditional Thread based approach, and a task based approach.
The following code snippet shows the creation of parallel tasks using Threads and Task.
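The original snippet is not reproduced here, so the following is only a minimal sketch of the comparison, not the author's sample. The class name ThreadsVersusTasks and the SumTo workload are hypothetical and chosen purely for illustration.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ThreadsVersusTasks
{
    // Hypothetical CPU-bound unit of work, used only for illustration.
    public static long SumTo(int n)
    {
        long total = 0;
        for (int i = 1; i <= n; i++) total += i;
        return total;
    }

    // Thread-based version: one dedicated thread per unit of work.
    public static long[] RunWithThreads(int workers, int n)
    {
        var results = new long[workers];
        var threads = new Thread[workers];
        for (int w = 0; w < workers; w++)
        {
            int slot = w; // copy so each lambda captures its own value
            threads[w] = new Thread(() => results[slot] = SumTo(n));
            threads[w].Start();
        }
        foreach (var t in threads) t.Join(); // wait for every thread by hand
        return results;
    }

    // Task-based version: the scheduler decides how many pool threads to use.
    public static long[] RunWithTasks(int workers, int n)
    {
        var tasks = new Task<long>[workers];
        for (int w = 0; w < workers; w++)
            tasks[w] = Task.Factory.StartNew(() => SumTo(n));
        Task.WaitAll(tasks);
        return Array.ConvertAll(tasks, t => t.Result);
    }

    public static void Main()
    {
        long[] viaThreads = RunWithThreads(4, 1000);
        long[] viaTasks = RunWithTasks(4, 1000);
        Console.WriteLine(viaThreads[0] + " " + viaTasks[0]); // prints "500500 500500"
    }
}
```

Both versions compute the same values. The difference is that the thread version always creates four threads, while the task version leaves that decision to the scheduler.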
You can download the sample used above.
So how is this different from creating a thread? One of the first advantages of using Tasks over Threads is that it becomes easier to maximize the performance of your application on any given system. For example, if I fire off multiple threads that all do heavy CPU-bound work on a single-core machine, the work is likely to take significantly longer. Threading has overhead, and if you try to run more CPU-bound threads than the machine has cores, you can run into problems. Each time the CPU switches from one thread to another, there is a bit of overhead. With many threads running at once, this switching happens often, and the work can take longer than if it had simply been executed sequentially. This diagram might help spell that out a bit better:
As you can see, if we do not switch between pieces of work, we avoid the context switches between threads. When the work is interleaved on one core, the total cumulative time is much longer, even though the same amount of work is done. If two cores were available, the two sets of work could simply execute simultaneously, one per core, providing the highest possible efficiency.
Why Tasks? Why Not ThreadPools?
Now that we have a rough idea of Tasks and what they can do, let us look at them in a little more detail and see how they differ from ThreadPools.
Let us see how you can start a new execution on a ThreadPool.
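As a rough sketch of what queuing work on the pool can look like, the example below uses ThreadPool.QueueUserWorkItem. The class name, the LastMessage field, and the "job-1" state value are hypothetical and only serve the illustration.

```csharp
using System;
using System.Threading;

public static class ThreadPoolStart
{
    // Hypothetical shared field so the caller can observe what the work item did.
    public static volatile string LastMessage;

    public static void Main()
    {
        // Queue a work item; the pool picks a thread and runs the callback on it.
        ThreadPool.QueueUserWorkItem(state =>
        {
            LastMessage = "Working on a pool thread: " + state;
            Console.WriteLine(LastMessage);
        }, "job-1");

        // Crude pause so the process does not exit before the work item runs.
        // QueueUserWorkItem hands back nothing to wait on.
        Thread.Sleep(500);
    }
}
```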
Let us see what you will have to do if you wish to Wait() for the thread to finish.
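One common way to do this, sketched here under the assumption that a ManualResetEvent is used for signalling, is to set the event at the end of the work item and block on it in the caller. The class and method names are hypothetical.

```csharp
using System;
using System.Threading;

public static class ThreadPoolWait
{
    public static int RunAndWait()
    {
        int result = 0;
        using (var done = new ManualResetEvent(false))
        {
            ThreadPool.QueueUserWorkItem(_ =>
            {
                result = 6 * 7; // the "return value" has to travel through shared state
                done.Set();     // signal completion by hand
            });
            done.WaitOne();     // block the caller until the work item signals
        }
        return result;
    }

    public static void Main()
    {
        Console.WriteLine(RunAndWait()); // prints 42
    }
}
```

Every additional work item needs its own event or a shared counter, which is where this approach starts to get unwieldy.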
Messy, isn’t it?
What if you have to wait for 15 threads to finish?
How do you capture the return values from multiple threads?
How do you return the control back to GUI thread?
There are answers: delegates and raised events, for example. But these lead to error-prone code once we drill into a chain of multithreaded actions.
Let us see how Tasks handle this situation elegantly:
Creation of a New Task
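A minimal sketch of creating a task with Task.Factory.StartNew follows. The class name CreateTask and the squaring workload are hypothetical.

```csharp
using System;
using System.Threading.Tasks;

public static class CreateTask
{
    public static Task<int> StartSquare(int x)
    {
        // StartNew creates the task and schedules it in one step.
        return Task.Factory.StartNew(() => x * x);
    }

    public static void Main()
    {
        Task<int> task = StartSquare(12);
        // Result blocks until the task has finished, then yields its value.
        Console.WriteLine(task.Result); // prints 144
    }
}
```

Unlike a queued work item, the returned Task<int> is an object the caller can wait on and read a result from.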
Waiting on Tasks:
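The sketch below shows the three usual ways of waiting, using fifteen tasks to mirror the earlier question about fifteen threads. The class name and the workload are hypothetical.

```csharp
using System;
using System.Threading.Tasks;

public static class WaitOnTasks
{
    public static int[] RunFifteen()
    {
        var tasks = new Task<int>[15];
        for (int i = 0; i < tasks.Length; i++)
        {
            int id = i; // per-iteration copy for the lambda to capture
            tasks[i] = Task.Factory.StartNew(() => id * id);
        }

        tasks[0].Wait();      // wait for one specific task
        Task.WaitAny(tasks);  // wait until at least one task has completed
        Task.WaitAll(tasks);  // wait for all fifteen

        // Return values are read straight off the tasks, no shared state needed.
        var results = new int[tasks.Length];
        for (int i = 0; i < tasks.Length; i++) results[i] = tasks[i].Result;
        return results;
    }

    public static void Main()
    {
        int[] results = RunFifteen();
        Console.WriteLine(results[14]); // prints 196
    }
}
```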
Execute another async task when the current task is done:
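Chaining is typically done with ContinueWith. The following is a small sketch in which the class name, the method name, and the doubling step are hypothetical.

```csharp
using System;
using System.Threading.Tasks;

public static class ChainTasks
{
    public static Task<string> ComputeThenFormat(int input)
    {
        // First task: produce a value.
        Task<int> first = Task.Factory.StartNew(() => input * 2);

        // Second task: starts only after the first has completed
        // and receives the finished task as its argument.
        return first.ContinueWith(prev => "Result: " + prev.Result);
    }

    public static void Main()
    {
        Console.WriteLine(ComputeThenFormat(21).Result); // prints "Result: 42"
    }
}
```

In a UI application, passing TaskScheduler.FromCurrentSynchronizationContext() to ContinueWith is the usual way to run the continuation back on the GUI thread. It is left out here because a console program has no synchronization context.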
In real world scenarios, we often have multiple operations which we want to perform asynchronously. Look at the following code snippet and see how you can model it alternatively.
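The original snippet is not available, so this is only one possible way to model several asynchronous operations: start them independently and combine their results with Task.Factory.ContinueWhenAll. The class name, method name, and squaring workload are hypothetical.

```csharp
using System;
using System.Threading.Tasks;

public static class CombineTasks
{
    public static Task<int> SumOfSquares(int[] inputs)
    {
        // Start one independent task per input.
        var parts = new Task<int>[inputs.Length];
        for (int i = 0; i < inputs.Length; i++)
        {
            int value = inputs[i]; // per-iteration copy for the lambda
            parts[i] = Task.Factory.StartNew(() => value * value);
        }

        // Run a final task once every part has completed, without blocking the caller.
        return Task.Factory.ContinueWhenAll<int, int>(parts, done =>
        {
            int total = 0;
            foreach (Task<int> t in done) total += t.Result;
            return total;
        });
    }

    public static void Main()
    {
        Task<int> total = SumOfSquares(new[] { 1, 2, 3, 4 });
        Console.WriteLine(total.Result); // prints 30
    }
}
```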
Parallel Extensions
Parallel extensions were introduced along with the Task Parallel Library to achieve data parallelism. Data parallelism refers to scenarios in which the same operation is performed concurrently (that is, in parallel) on elements in a source collection or array. .NET provides the Parallel.For and Parallel.ForEach constructs to achieve data parallelism.
Let us see how we can use these:
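A minimal sketch of both constructs follows. The class name ParallelLoops and the two workloads are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelLoops
{
    // Parallel.For: iterations may run concurrently and in any order.
    public static double[] SquareRoots(int count)
    {
        var roots = new double[count];
        // Each iteration writes to its own slot, so no locking is needed.
        Parallel.For(0, count, i => { roots[i] = Math.Sqrt(i); });
        return roots;
    }

    // Parallel.ForEach: the same idea over any IEnumerable<T>.
    public static int TotalLength(IEnumerable<string> words)
    {
        int total = 0;
        // All iterations update one shared variable, so the update must be atomic.
        Parallel.ForEach(words, word => { Interlocked.Add(ref total, word.Length); });
        return total;
    }

    public static void Main()
    {
        Console.WriteLine(SquareRoots(10)[9]);                          // prints 3
        Console.WriteLine(TotalLength(new[] { "task", "parallel" }));   // prints 12
    }
}
```

Both calls return only after all iterations have finished, so the results are safe to read immediately afterwards.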
The above-mentioned Parallel.ForEach construct utilizes multiple cores and thus enhances performance in the same fashion.
The following graphs show how parallel extensions improve the performance of the system:
Fig 1. Matrix multiplication running on a Dual Core machine. The parallel extensions consume less time.
Fig 2. Matrix multiplication running on a Quad Core machine. The same code consumes far less time without any modifications.
Fig 3. Matrix multiplication running on a single core machine. The execution time remains identical.
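For reference, a parallel matrix multiplication along these lines might look like the sketch below. This is not the benchmark code behind the figures. It is an assumed implementation that parallelizes the outer loop over rows, and the class and method names are hypothetical.

```csharp
using System;
using System.Threading.Tasks;

public static class MatrixDemo
{
    public static double[,] Multiply(double[,] a, double[,] b)
    {
        int rows = a.GetLength(0);
        int inner = a.GetLength(1);
        int cols = b.GetLength(1);
        var c = new double[rows, cols];

        // Each row of the result is independent, so rows can be computed in parallel.
        Parallel.For(0, rows, i =>
        {
            for (int j = 0; j < cols; j++)
            {
                double sum = 0;
                for (int k = 0; k < inner; k++) sum += a[i, k] * b[k, j];
                c[i, j] = sum;
            }
        });
        return c;
    }

    public static void Main()
    {
        var a = new double[,] { { 1, 2 }, { 3, 4 } };
        var b = new double[,] { { 5, 6 }, { 7, 8 } };
        double[,] c = Multiply(a, b);
        Console.WriteLine(c[0, 0] + " " + c[0, 1] + " " + c[1, 0] + " " + c[1, 1]); // prints "19 22 43 50"
    }
}
```

The only change from a sequential version is replacing the outer for loop with Parallel.For, which is why the same code can scale from one core to four without modification.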
You can download the code from the link at the top of this article.
Conclusion
The parallel extensions and the Task Parallel Library help developers leverage the full potential of the available hardware. The same code adjusts itself to give you the benefits across various hardware. It also improves the readability of the code and thus reduces the risk of introducing nasty bugs that drive developers crazy.
History
- 9th April, 2012: Initial version