Click here to Skip to main content
15,867,568 members
Articles / General Programming / Performance

So You've Got 10M Particles to Simulate

Rate me:
Please Sign up or sign in to vote.
4.76/5 (21 votes)
8 Sep 2021CPOL3 min read 33.2K   257   27   55
Performance of data structures in C++ vs. C#
I've experienced that raw arrays, structs, and classes have different performance profiles. I wanted to put this experience to the test with programs written in C++ and C#.

Introduction

Let's say you're writing a particle simulator physics program. You need to simulate a lot of particles, say 10 million. To represent each particle, you have position, velocity, and acceleration. At each iteration of your program, you need to update the position with the velocity, the velocity with the acceleration, and for the sake of this article, the acceleration is constant. There would be x, y, and z, but let's just focus on one dimension.

To model your data in a computer program, you could use raw arrays for the position, velocity, and acceleration for each particle, or you could create a struct - C++ or C# - and put the position, velocity, and acceleration in there, or you could create a class like the struct.

It would seem that raw arrays would have the best performance, then C++ structs, then C++ classes, then C# structs, then C# classes. I say this because C++ structs are so thin, and C++ classes are identical to C++ structs in memory, and C# structs are pretty lean, but C# classes incur a whole other level of performance overhead, namely that an array of C# structs have data locality, but C# classes are all over the place.

The Test Programs

Using the Code

The code.zip contains the solution. plusplusms is the C++ test rig, and sharpms is the C# one.

plusplusms has a types.h file with the C++ data structures:

C++
#pragma once

// UPDATE: adding a dummy field for cache alignment (RE: Daniel Pfeffer) did not pay off
struct BareStruct
{
    double p, v, a, dummy;
};

class HeaderClass
{
public:
	HeaderClass()
	{
		set(0, 0, 0);
	}

	void set(double p, double v, double a)
	{
		m_p = p;
		m_v = v;
		m_a = a;
	}

    // UPDATE: Separating out updating position from updating velocity did not pay off
	void updatep()
	{
		m_p += m_v;
	}

	void updatev()
	{
		m_v += m_a;
	}

    // NOTE: Having one update function that calls the other two performed very well,
    //       inlining all around.
	void update()
	{
		updatep();
		updatev();
	}

private:
	double m_p, m_v, m_a, dummy;
};

class SourceClass
{
public:
	SourceClass();

	void set(double p, double v, double a);

    // UPDATE: Separating out...did not pay off
	//void updatep();
	//void updatev();
	void update();

private:
	double m_p, m_v, m_a, dummy;
};

types.cpp has the implementation of the SourceClass, same code as HeaderClass, just separated out. This separation was to test Link Time Code Generation (LTCG) performance, whether it would result in the same performance as the header-only implementation.  UPDATE: LTCG definitely works as advertised.

The C++ main.cpp has macros for the scale of the tests and timing results, and functions that do the data processing.

C++
#include "types.h"

#include <stdio.h>
#include <stdlib.h>

#include <chrono>

#define SAMPLE_SIZE 10 * 1024 * 1024
#define TEST_RUNS 4

double parr[SAMPLE_SIZE];
double varr[SAMPLE_SIZE];
double aarr[SAMPLE_SIZE];

BareStruct structs[SAMPLE_SIZE];

HeaderClass headers[SAMPLE_SIZE]; 
SourceClass sources[SAMPLE_SIZE];

typedef std::chrono::high_resolution_clock timer;
typedef std::chrono::milliseconds ms;
typedef std::chrono::duration<int> fsec;

#define START_RUN(name) { timer t; printf("%s: ", name); auto start = t.now();
#define END_RUN() \
    auto end = t.now(); \
    int elapsedMs = (int)std::chrono::duration_cast<ms>(end - start).count(); \
    printf("%d ms\n", elapsedMs); \
}

void updateArrays1()
{
    for (size_t i = 0; i < SAMPLE_SIZE; ++i)
    {
        parr[i] += varr[i];
    }

    for (size_t i = 0; i < SAMPLE_SIZE; ++i)
    {
        varr[i] += aarr[i];
    }
}

void updateArrays2()
{
    for (size_t i = 0; i < SAMPLE_SIZE; ++i)
    {
        parr[i] += varr[i];
        varr[i] += aarr[i];
    }
}

void updateStructs1()
{
    for (size_t i = 0; i < SAMPLE_SIZE; ++i)
    {
        structs[i].p += structs[i].v;
    }

    for (size_t i = 0; i < SAMPLE_SIZE; ++i)
    {
        structs[i].v += structs[i].a;
    }
}

void updateStructs2()
{
    for (size_t i = 0; i < SAMPLE_SIZE; ++i)
    {
        structs[i].p += structs[i].v;
        structs[i].v += structs[i].a;
    }
}

void updateHeaders1()
{
    for (size_t i = 0; i < SAMPLE_SIZE; ++i)
    {
        headers[i].updatep();
    }

    for (size_t i = 0; i < SAMPLE_SIZE; ++i)
    {
        headers[i].updatev();
    }
}

void updateHeaders2()
{
    for (size_t i = 0; i < SAMPLE_SIZE; ++i)
    {
        headers[i].update();
    }
}

void updateSources1()
{
    for (size_t i = 0; i < SAMPLE_SIZE; ++i)
    {
        sources[i].updatep();
    }

    for (size_t i = 0; i < SAMPLE_SIZE; ++i)
    {
        sources[i].updatev();
    }
}

void updateSources2()
{
    for (size_t i = 0; i < SAMPLE_SIZE; ++i)
    {
        sources[i].update();
    }
}

int main()
{
    printf("Setting up...\n");
    srand(0);
    for (int i = 0; i < SAMPLE_SIZE; ++i)
    {
        structs[i].p = parr[i] = rand();
        structs[i].v = varr[i] = rand();
        structs[i].a = aarr[i] = rand();

        headers[i].set(structs[i].p, structs[i].v, structs[i].a);
        sources[i].set(structs[i].p, structs[i].v, structs[i].a);
    }

    for (int o = 1; o <= 3; ++o)
    {
        START_RUN("arrays separate");
        for (int r = 1; r <= TEST_RUNS; ++r)
        {
            updateArrays1();
        }
        END_RUN();

        START_RUN("arrays together");
        for (int r = 1; r <= TEST_RUNS; ++r)
        {
            updateArrays2();
        }
        END_RUN();

        START_RUN("structs separate");
        for (int r = 1; r <= TEST_RUNS; ++r)
        {
            updateStructs1();
        }
        END_RUN();

        START_RUN("structs together");
        for (int r = 1; r <= TEST_RUNS; ++r)
        {
            updateStructs2();
        }
        END_RUN();

        START_RUN("header classes separate");
        for (int r = 1; r <= TEST_RUNS; ++r)
        {
            updateHeaders1();
        }
        END_RUN();

        START_RUN("header classes together");
        for (int r = 1; r <= TEST_RUNS; ++r)
        {
            updateHeaders2();
        }
        END_RUN();

        START_RUN("source classes separate");
        for (int r = 1; r <= TEST_RUNS; ++r)
        {
            updateSources1();
        }
        END_RUN();

        START_RUN("source classes together");
        for (int r = 1; r <= TEST_RUNS; ++r)
        {
            updateSources2();
        }
        END_RUN();

        printf("\n");
    }
}

The C# code is more concise, as to be expected.

C#
using System;
using System.Diagnostics;

namespace sharpms
{
    // UPDATE: Adding the dummy field did not pay off in C# either
    struct Struct
    {
        public double p, v, a, dummy; 
    }

    // UPDATE: marked class sealed (RE: Andre_Prellwitz) did not pay off
    sealed class Class 
    {
        public void updatep()
        {
            p += v;
        }

        public void updatev()
        {
            v += a;
        }

        public void update()
        {
            p += v;
            v += a;
        }

        public double p, v, a, dummy;
    }

    class Program
    {
        const int SAMPLE_SIZE = 10 * 1024 * 1024;
        const int TEST_RUNS = 4;

        static double[] parr = new double[SAMPLE_SIZE];
        static double[] varr = new double[SAMPLE_SIZE];
        static double[] aarr = new double[SAMPLE_SIZE];

        static Struct[] structs = new Struct[SAMPLE_SIZE];
        static Class[] classes = new Class[SAMPLE_SIZE];

        static void updateArrays1()
        {
            for (int i = 0; i < SAMPLE_SIZE; ++i)
            {
                parr[i] += varr[i];
            }

            for (int i = 0; i < SAMPLE_SIZE; ++i)
            {
                varr[i] += aarr[i];
            }
        }

        static void updateArrays2()
        {
            for (int i = 0; i < SAMPLE_SIZE; ++i)
            {
                parr[i] += varr[i];
                varr[i] += aarr[i];
            }
        }

        // UPDATE: Resorting to unsafe code (// RE: igrulya) performs well,
        //         but is not necessary as the next solution shows
        static unsafe void updateArrays3() 
        {
            fixed (double* pparr0 = &parr[0])
            fixed (double* pvarr0 = &varr[0])
            fixed (double* paarr0 = &aarr[0])
            {
                double* pparr = pparr0;
                double* pvarr = pvarr0;
                double* paarr = paarr0;
                int i = SAMPLE_SIZE;
                while (i-- > 0)
                {
                    *pparr++ += *pvarr;
                    *pvarr++ += *paarr++;
                }
            }
        }

        // RE: <a href="https://www.codeproject.com/script/Membership/View.aspx?mid=15342246">OracPrime</a>
        static void updateArrays4() 
        {
            var pref = parr;
            var vref = varr;
            var aref = aarr;
            for (int i = 0; i < SAMPLE_SIZE; ++i)
            {
                pref[i] += vref[i];
                vref[i] += aref[i];
            }
        }

        static void updateStructs1()
        {
            for (int i = 0; i < SAMPLE_SIZE; ++i)
            {
                structs[i].p += structs[i].v;
            }

            for (int i = 0; i < SAMPLE_SIZE; ++i)
            {
                structs[i].v += structs[i].a;
            }
        }

        static void updateStructs2()
        {
            for (int i = 0; i < SAMPLE_SIZE; ++i)
            {
                structs[i].p += structs[i].v;
                structs[i].v += structs[i].a;
            }
        }

        static void updateClasses1()
        {
            for (int i = 0; i < SAMPLE_SIZE; ++i)
            {
                classes[i].updatep();
            }

            for (int i = 0; i < SAMPLE_SIZE; ++i)
            {
                classes[i].updatev();
            }
        }

        static void updateClasses2()
        {
            for (int i = 0; i < SAMPLE_SIZE; ++i)
            {
                classes[i].update();
            }
        }

        static void updateClasses3()
        {
            for (int i = 0; i < SAMPLE_SIZE; ++i)
            {
                classes[i].p += classes[i].v;
                classes[i].v += classes[i].a;
            }
        }

        static Stopwatch sw = new Stopwatch();

        static void Start(string phase)
        {
            Console.Write(phase + ": ");
            sw.Restart();
        }

        static void Finish()
        {
            Console.WriteLine(sw.ElapsedMilliseconds + " ms");
        }

        static void Main(string[] args)
        {
            Random rnd = new Random(0);
            for (int i = 0; i < SAMPLE_SIZE; ++i)
            {
                structs[i] = new Struct();
                classes[i] = new Class();

                classes[i].p = structs[i].p = parr[i] = rnd.Next();
                classes[i].v = structs[i].v = varr[i] = rnd.Next();
                classes[i].a = structs[i].a = aarr[i] = rnd.Next();
            }

            for (int run = 1; run <= 3; ++run)
            {
                Start("arrays separate");
                for (int r = 1; r <= TEST_RUNS; ++r)
                {
                    updateArrays1();
                }
                Finish();

                Start("arrays together");
                for (int r = 1; r <= TEST_RUNS; ++r)
                {
                    updateArrays2(); // bug fix RE: Kermel
                }
                Finish();

                Start("arrays unsafe");
                for (int r = 1; r <= TEST_RUNS; ++r)
                {
                    updateArrays3();
                }
                Finish();

                Start("arrays with refs");
                for (int r = 1; r <= TEST_RUNS; ++r)
                {
                    updateArrays4();
                }
                Finish();

                Start("structs separate");
                for (int r = 1; r <= TEST_RUNS; ++r)
                {
                    updateStructs1();
                }
                Finish();

                Start("structs together");
                for (int r = 1; r <= TEST_RUNS; ++r)
                {
                    updateStructs2();
                }
                Finish();

                Start("classes separate");
                for (int r = 1; r <= TEST_RUNS; ++r)
                {
                    updateClasses1();
                }
                Finish();

                Start("classes together");
                for (int r = 1; r <= TEST_RUNS; ++r)
                {
                    updateClasses2();
                }
                Finish();

                Start("classes inlined"); // RE: Andre_Prellwitz
                for (int r = 1; r <= TEST_RUNS; ++r)
                {
                    updateClasses3();
                }
                Finish();
            }
        }
    }
}

The C++ Results

C++
arrays separate: 75 ms
arrays together: 76 ms

structs separate: 182 ms
structs together: 100 ms

header classes separate: 182 ms
header classes together: 98 ms

source classes separate: 189 ms
source classes together: 100 ms

You can see that using raw arrays is much faster, 2X, than doing the separate technique with structs or classes. Note that LTCG inlines the Source class's code perfectly, and classes are a bit faster than structs. Interesting.

The C# Results

C#
arrays separate: 129 ms
arrays together: 112 ms

arrays unsafe: 91 ms
arrays with refs: 90 ms

structs separate: 196 ms
structs together: 104 ms

classes separate: 384 ms
classes together: 191 ms
classes inlined: 204 ms

Arrays in C# do not have the great performance gain over others as in C++. Note that C# structs have comparable performance with C++ classes. And C# structs are faster than C# arrays. Strange. As expected, C# classes do not perform well, especially with separated updating.

Conclusion and Points of Interest

  • If you want optimum data processing performance, parallel arrays in C/C++ are the way to go. You had to know that going in.
  • If that code pattern is unacceptable, using C++ classes or C# structs are good choices. The code is more manageable than parallel arrays, and the performance is good.
  • C# classes should be avoided. Sealing them and inlining their code does not make them worthwhile.
  • Using separate loops for handling different fields in classes or structs does not pay off at all, and should be avoided due to the negative effect is has on code quality. Data caching seems to trump code caching big time in this use case.
  • In C#, for parallel arrays you can get the same performance as unsafe code by using local variables to "pin" the data in place while you work with it.

History

  • 13th August, 2021: Initial version
  • 26th August, 2021: Took Andre_Prellwitz's comment and applied it to the code and the article

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer
United States United States
Michael Balloni is a manager of software development at a cybersecurity software and services provider.

Check out https://www.michaelballoni.com for all the programming fun he's done over the years.

He has been developing software since 1994, back when Mosaic was the web browser of choice. IE 4.0 changed the world, and Michael rode that wave for five years at a .com that was a cloud storage system before the term "cloud" meant anything. He moved on to a medical imaging gig for seven years, working up and down the architecture of a million-lines-code C++ system.

Michael has been at his current cybersecurity gig since then, making his way into management. He still loves to code, so he sneaks in as much as he can at work and at home.

Comments and Discussions

 
GeneralMy vote of 5 Pin
Ștefan-Mihai MOGA3-Nov-21 4:35
professionalȘtefan-Mihai MOGA3-Nov-21 4:35 
GeneralRe: My vote of 5 Pin
Michael Sydney Balloni3-Nov-21 5:59
professionalMichael Sydney Balloni3-Nov-21 5:59 
QuestionMessage Closed Pin
9-Sep-21 2:20
Lucy Chitamaluka9-Sep-21 2:20 
QuestionPadding Pin
Paramecium136-Sep-21 5:26
professionalParamecium136-Sep-21 5:26 
AnswerRe: Padding Pin
Michael Sydney Balloni7-Sep-21 10:33
professionalMichael Sydney Balloni7-Sep-21 10:33 
Question"Let's say you're writing a particle simulator physics program" Pin
Jon6-Sep-21 0:48
Jon6-Sep-21 0:48 
AnswerRe: "Let's say you're writing a particle simulator physics program" Pin
Michael Sydney Balloni7-Sep-21 10:31
professionalMichael Sydney Balloni7-Sep-21 10:31 
GeneralIt's all about Data Locality Pin
David On Life3-Sep-21 11:55
David On Life3-Sep-21 11:55 
GeneralRe: It's all about Data Locality Pin
Michael Sydney Balloni5-Sep-21 6:22
professionalMichael Sydney Balloni5-Sep-21 6:22 
Thanks, I'm glad you like. It's been great to see so many ideas. I think we've all learned a lot.

In terms of your suggestion, I tried it, and it was consistently around 6% faster than direct struct offsets. The numbers I saw were like 116ms versus 123ms. I think the compiler optimizes away all the offsets, but you're definitely on to something.

A similar approach in C++...

void updateAray2()
{
for (auto& obj : aray)
{
obj.update();
}
}

...results in similar performance. And isn't that pretty?

QuestionAll described approaches are bad Pin
SMD1113-Sep-21 10:40
SMD1113-Sep-21 10:40 
AnswerRe: All described approaches are bad Pin
Michael Sydney Balloni5-Sep-21 6:10
professionalMichael Sydney Balloni5-Sep-21 6:10 
GeneralRe: All described approaches are bad Pin
Southmountain8-Oct-21 16:54
Southmountain8-Oct-21 16:54 
GeneralRe: All described approaches are bad Pin
Michael Sydney Balloni10-Oct-21 9:54
professionalMichael Sydney Balloni10-Oct-21 9:54 
QuestionC++ has generic array classs Pin
James Rinkevich3-Sep-21 8:49
James Rinkevich3-Sep-21 8:49 
AnswerRe: C++ has generic array classs Pin
Michael Sydney Balloni4-Sep-21 17:38
professionalMichael Sydney Balloni4-Sep-21 17:38 
QuestionNice to know. Pin
SlovakBird013-Sep-21 6:07
SlovakBird013-Sep-21 6:07 
QuestionC# can be made faster and still safe... Pin
OracPrime31-Aug-21 3:04
OracPrime31-Aug-21 3:04 
AnswerRe: C# can be made faster and still safe... Pin
Michael Sydney Balloni31-Aug-21 15:16
professionalMichael Sydney Balloni31-Aug-21 15:16 
GeneralRe: C# can be made faster and still safe... Pin
OracPrime1-Sep-21 4:18
OracPrime1-Sep-21 4:18 
GeneralRe: C# can be made faster and still safe... Pin
Michael Sydney Balloni1-Sep-21 18:33
professionalMichael Sydney Balloni1-Sep-21 18:33 
GeneralRe: C# can be made faster and still safe... Pin
OracPrime7-Sep-21 3:31
OracPrime7-Sep-21 3:31 
GeneralRe: C# can be made faster and still safe... Pin
Andre_Prellwitz18-Sep-21 11:06
Andre_Prellwitz18-Sep-21 11:06 
GeneralRe: C# can be made faster and still safe... Pin
Michael Sydney Balloni20-Sep-21 4:19
professionalMichael Sydney Balloni20-Sep-21 4:19 
QuestionMessage Closed Pin
27-Aug-21 1:22
Member 1533816327-Aug-21 1:22 
QuestionC++ weakness Pin
jmaida26-Aug-21 17:59
jmaida26-Aug-21 17:59 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.