
Parallel Processing

21 Sep 2009 | CPOL | 3 min read
Utilizing your CPU cores with Parallel Extensions (TPL).

Introduction

As you can probably tell from my previous articles, I have a thing for optimizing code to make full use of hardware resources, primarily CPU cores. This article is no exception: it deals with the Task Parallel Library (TPL). The example uses the CTP release of Parallel Extensions for .NET 3.5 and is geared towards an intermediate audience.

Background

The goal of this project was to use the TPL to speed up the calculations behind a simple report. The example groups the data with a LINQ group by, then performs the calculations in parallel using the TPL. The code first runs a fairly typical sequential calculation and reports the time taken, then runs the same calculations in parallel and reports that time for comparison.

Since this project is geared towards intermediate developers, I will not go into detail on some aspects of this code. The reader should have a decent understanding of LINQ, Predicates, and of course, TPL.

Using the code

The project uses an abstract class because I didn't want to write the timer code twice, or thrice if I were adventurous enough to attempt a third approach.

The method that needs to be overridden is StartCalculations; the rest of the class handles the timing, so the various approaches can be compared, including any I add later.

C#
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

namespace ParallelClass
{
    public abstract class ReportCalculations
    {
        // Elapsed time of the last Begin call, in milliseconds.
        private long _elapsedTime;

        public long Elapsed_ms
        {
            get { return _elapsedTime; }
        }


        public void Begin(IEnumerable<IGrouping<int, 
               IGrouping<int, CompanyInfo>>> companyGroups)
        {
            var sw = new Stopwatch();
            sw.Start();

            StartCalculations(companyGroups);

            sw.Stop();
            _elapsedTime = sw.ElapsedMilliseconds;
        }

        public virtual string Name
        {
            get { return "Generic Report Class"; }
        }

        // Each strategy (sequential, TPL, ...) supplies its own implementation.
        public abstract void StartCalculations(IEnumerable<IGrouping<int, 
               IGrouping<int, CompanyInfo>>> companyGroups);
    }
}

The rest of the code is pretty straightforward. The Process class accepts an int that defines the number of transaction lines to generate. The "database" in this example is just an in-memory collection of CompanyInfo objects.

In the abstract class, you'll notice the methods are passed an interesting IEnumerable object. It is the result of the nested LINQ query in the GetGrouping method below, which groups the objects by CompanyId, then by TransactionCode, so that each group can be handed to the TPL as a separate calculation.
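
To make the shape of that nested grouping concrete, here is a minimal sketch in method syntax. It is an assumed equivalent of the query expression, not the article's code: Txn is a hypothetical stand-in for CompanyInfo, and GroupingDemo, Group, and Totals are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the article's CompanyInfo class.
public class Txn
{
    public int CompanyId { get; set; }
    public int TransactionCode { get; set; }
    public decimal Amount { get; set; }
}

public static class GroupingDemo
{
    // Group by company, split each company by transaction code,
    // then regroup the inner groups under their company key.
    public static IEnumerable<IGrouping<int, IGrouping<int, Txn>>> Group(IEnumerable<Txn> rows)
    {
        return rows
            .GroupBy(r => r.CompanyId)
            .SelectMany(c => c.GroupBy(r => r.TransactionCode),
                        (c, t) => new { c.Key, Inner = t })
            .GroupBy(x => x.Key, x => x.Inner);
    }

    // Walks the two levels and sums each inner group, keyed as "company/code".
    public static Dictionary<string, decimal> Totals(IEnumerable<Txn> rows)
    {
        var totals = new Dictionary<string, decimal>();
        foreach (var company in Group(rows))
            foreach (var code in company)
                totals[company.Key + "/" + code.Key] = code.Sum(r => r.Amount);
        return totals;
    }
}
```

The outer key is the company, the inner key is the transaction code, and each inner group holds the raw rows. That is why the calculation classes need a loop per level.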

The PopulateCompanyTransactions method randomly generates all the transactions that will be grouped and summed.

In this example, two classes derive from our abstract class ReportCalculations: MySeq and MyTPL. MySeq runs a typical sequential loop that calculates each transaction group on a single thread. MyTPL calculates the sums using all available CPU cores where possible.

C#
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading; // Parallel lives here in the .NET 3.5 CTP; in .NET 4 and later it is in System.Threading.Tasks

namespace ParallelClass
{
    public class CompanyInfo
    {
        public int CompanyId { get; set; }
        public int TransactionCode { get; set; }
        public decimal Amount { get; set; }
    }

    public class Process
    {
        
        public Process(int recordsToProcess)
        {
            var rec = PopulateCompanyTransactions(recordsToProcess);
            var grouping = GetGrouping(rec);

            var calcClasses = new List<ReportCalculations> { new MySeq(), new MyTPL() };
            
            foreach(var calc in calcClasses)
            {
                calc.Begin(grouping);
                Console.WriteLine("{0} : {1}", calc.Name, calc.Elapsed_ms);
            }
            
            Console.ReadLine();
        }
       
        //Group records by Company then by Transaction
        private static IEnumerable<IGrouping<int, IGrouping<int, 
                CompanyInfo>>> GetGrouping(IEnumerable<CompanyInfo> companyInfos)
        {
            var query = from company in companyInfos
                        group company by company.CompanyId
                        into companyGroup
                            from transactionGroup in
                            (
                                from company in companyGroup
                                group company by company.TransactionCode
                            )
                            group transactionGroup by companyGroup.Key;

            return query;
        }


        //Populate record values with random data
        private static List<CompanyInfo> PopulateCompanyTransactions(int totalRecords)
        {
            var rnd = new Random();
            var companyInfo = new List<CompanyInfo>();

            for (int count = 0; count < totalRecords; count++)
                companyInfo.Add(new CompanyInfo
                            {
                                Amount = (decimal) (rnd.Next(-50, 1000)*rnd.NextDouble()),
                                CompanyId = rnd.Next(0, 100),
                                TransactionCode = rnd.Next(100, 120)
                            });
            return companyInfo;
        }
    }

    public class MySeq : ReportCalculations
    {
        private readonly List<CompanyInfo> _totals = new List<CompanyInfo>();
        public override string Name { get { return "Sequential"; } }

        public override void StartCalculations(IEnumerable<IGrouping<int, 
               IGrouping<int, CompanyInfo>>> companyGroups)
        {
            foreach (var firstGroup in companyGroups)
            {
                foreach (var secondGroup in firstGroup)
                {
                    decimal total = 0;
                    foreach (var details in secondGroup)
                        total += details.Amount;

                    _totals.Add(new CompanyInfo { Amount = total, 
                       CompanyId = firstGroup.Key, TransactionCode = secondGroup.Key });
                }
            }

        }
    }

    public class MyTPL : ReportCalculations
    {
        private readonly List<CompanyInfo> _totals = new List<CompanyInfo>();

        public override string Name { get { return "TPL"; } }

        public override void StartCalculations(IEnumerable<IGrouping<int, 
               IGrouping<int, CompanyInfo>>> companyGroups)
        {
            
            foreach (var firstGroup in companyGroups)
                Parallel.ForEach(firstGroup, group => Calculate(group, firstGroup.Key));
        }

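        //TPL Parallel method
        private void Calculate(IGrouping<int, CompanyInfo> grouping, int companyID)
        {
            decimal total = 0;
            // Caution: "total" is shared by every thread running this lambda, and
            // decimal += is not atomic, so concurrent updates can be lost.
            Parallel.ForEach(grouping, g => { total += g.Amount; });
            // List<T>.Add is not thread-safe and Calculate itself runs in parallel,
            // so the shared list is guarded with a lock.
            lock (_totals)
            {
                _totals.Add(new CompanyInfo { Amount = total, 
                   CompanyId = companyID, TransactionCode = grouping.Key });
            }
        }   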
    }
}
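
Parallel loops that update shared state need care: a captured decimal accumulator such as total in Calculate is not updated atomically, and List<T> is not safe for concurrent Add calls without synchronization. The sketch below shows one possible way around both issues. It assumes .NET 4 or later (where Parallel lives in System.Threading.Tasks) rather than the CTP, and SafeParallelSum, Sum, and SumGroups are hypothetical names that are not part of the article's project.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class SafeParallelSum
{
    // Sums a sequence in parallel. Each worker keeps a private subtotal,
    // and the shared total is touched only once per worker, under a lock.
    public static decimal Sum(IEnumerable<decimal> amounts)
    {
        decimal total = 0m;
        object gate = new object();

        Parallel.ForEach<decimal, decimal>(
            amounts,
            () => 0m,                                            // per-worker starting subtotal
            (amount, loopState, subtotal) => subtotal + amount,  // no shared state touched here
            subtotal => { lock (gate) { total += subtotal; } }); // merge once per worker

        return total;
    }

    // Sums each group on whichever thread picks it up (outer loop parallel,
    // inner sum sequential) and collects results in a thread-safe bag.
    public static Dictionary<int, decimal> SumGroups(IEnumerable<IGrouping<int, decimal>> groups)
    {
        var results = new ConcurrentBag<KeyValuePair<int, decimal>>();
        Parallel.ForEach(groups, g =>
            results.Add(new KeyValuePair<int, decimal>(g.Key, g.Sum())));
        return results.ToDictionary(p => p.Key, p => p.Value);
    }
}
```

Whether to parallelize the outer loop, the inner sum, or both is a design choice. With many small groups, parallelizing only the outer loop usually means less scheduling overhead, but that is an assumption worth measuring for your own data.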

Conclusion

Initially, I didn't expect a big difference in the calculations. My hypothesis was that by the time the underlying thread handler had finished its analysis, the last thread would already have completed, and this seems to be the case with small numbers of transactions.

In my tests, the parallel version pulls ahead when transaction groups contain enough lines that the thread is still alive by the time the handler checks. In this example, the gain starts at about 1 million transactions. I consistently saw a 23%-25% improvement with 10 million transactions, but the parallel version was much slower than the sequential one with anything below 1 million.

The takeaway should be clear: although parallel models are theoretically more efficient, some cases need investigation and benchmarking before implementation.
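
As a starting point for that kind of benchmarking, the sketch below times a sequential sum against a parallel one at several data sizes. It is not the article's harness: it assumes .NET 4 or later, swaps in PLINQ (AsParallel) for the parallel variant to stay short, and uses the hypothetical names BreakEvenProbe, TimeMs, and Run.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

public static class BreakEvenProbe
{
    // Returns the wall-clock time of one action in milliseconds.
    public static long TimeMs(Action work)
    {
        var sw = Stopwatch.StartNew();
        work();
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    // For each size, builds the same random data once, then times both variants.
    public static void Run(params int[] sizes)
    {
        foreach (int size in sizes)
        {
            var rnd = new Random(42); // fixed seed so every run sees the same data
            decimal[] data = Enumerable.Range(0, size)
                .Select(i => (decimal)rnd.Next(-50, 1000))
                .ToArray();

            decimal seq = 0m, par = 0m;
            long seqMs = TimeMs(() => seq = data.Sum());
            long parMs = TimeMs(() => par = data.AsParallel().Sum());

            Console.WriteLine("{0,10} rows: sequential {1} ms, parallel {2} ms, totals match: {3}",
                size, seqMs, parMs, seq == par);
        }
    }
}
```

Calling BreakEvenProbe.Run(10000, 1000000, 10000000) prints one line per size. The crossover point will depend on the machine and the workload, so treat a single run as a hint and repeat the measurement several times before drawing conclusions.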

This example should not be treated as a definitive benchmarking guide for your own work, since each case of parallel design is going to be unique.

For fun, try creating a new calculation class, and outperform both the MySeq and MyTPL classes at any number of transactions.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



Comments and Discussions

 
Question: C# Parallel.For multi thread parallel
ytyet2005, 8-Oct-14 14:31
With C# Parallel.For, my test on a local 64-bit Windows 7 machine runs many threads in parallel, but after deploying to Windows Server 2008 R2, at most 16 threads run in parallel. Does anyone know why? I have already set the maximum thread count so that it is not limited.
