Click here to Skip to main content
15,887,214 members
Articles / Programming Languages / C# 7.0
Tip/Trick

Real Case of Premature Micro-Optimization

Rate me:
Please Sign up or sign in to vote.
4.33/5 (3 votes)
31 Jan 2020CPOL2 min read 8.8K   19   1   3
Ternary Operator vs Lookup Table Benchmark

The benchmark is hosted at Github.

I discovered a real case of premature micro-optimization when you don't measure it. You know it is pretty bad when you read premature and micro in the same sentence. On page 44 of Optimizing C++ ebook by Agner Fog, it is written ...

In some cases it is possible to replace a poorly predictable branch by a table lookup. For example:

C++
float a;
bool b;
a = b ? 1.5f : 2.6f;

The ?: operator here is a branch. If it is poorly predictable then replace it by a table lookup:

C++
float a;
bool b = 0;
const float lookup[2] = {2.6f, 1.5f};
a = lookup[b];

If a bool is used as an array index then it is important to make sure it is initialized or comes from a reliable source so that it can have no other values than 0 or 1. In some cases the compiler can automatically replace a branch by a conditional move.

I was trying to implement this lookup table optimization (to replace ternary operator) on a floating-point value which the code is compiled with G++ and ran on Linux. I also ran the integer benchmark and on other compilers such like Visual C++ 2019 and Clang and also the Visual C# 7 to see their differences. Below is the C++ benchmark code. The lookup array is declared as a static local variable inside the function.

C++
int64_t IntTernaryOp(bool value)
{
    return value ? 3 : 4;
}

int64_t IntArrayOp(bool value)
{
    static const int64_t arr[2] = { 3, 4 };
    return arr[value];
}

double FloatTernaryOp(bool value)
{
    return value ? 3.0f : 4.0f;
}

double FloatArrayOp(bool value)
{
    static const double arr[2] = { 3.0f, 4.0f };
    return arr[value];
}

int main()
{
    const size_t MAX_LOOP = 1000000000;
    
    int64_t sum = 0;
    double sum_f = 0;

    timer stopwatch;

    stopwatch.start("IntTernaryOp");
    sum = 0;
    for (size_t k = 0; k < MAX_LOOP; ++k)
    {
        sum += IntTernaryOp(k % 2);
    }
    stopwatch.stop(sum);

    stopwatch.start("IntArrayOp");
    sum = 0;
    for (size_t k = 0; k < MAX_LOOP; ++k)
    {
        sum += IntArrayOp(k % 2);
    }
    stopwatch.stop(sum);

    stopwatch.start("FloatTernaryOp");
    sum_f = 0;
    for (size_t k = 0; k < MAX_LOOP; ++k)
    {
        sum_f += FloatTernaryOp(k % 2);
    }
    stopwatch.stop(sum_f);

    stopwatch.start("FloatArrayOp");
    sum_f = 0;
    for (size_t k = 0; k < MAX_LOOP; ++k)
    {
        sum_f += FloatArrayOp(k % 2);
    }
    stopwatch.stop(sum_f);

    return 0;
}

In Visual C# code, static local variable is not permitted so the lookup array is declared as a static member variable inside the class. C# does not allow casting integer bool to integer (to be used as array index), so I use an integer instead.

C#
static Int64[] arrInt64 = new Int64[] { 3, 4 };
static Double[] arrDouble = new Double[] { 3.0, 4.0 };

static Int64 IntTernaryOp(Int64 value)
{
    return (value==1) ? 3 : 4;
}

static Int64 IntArrayOp(Int64 value)
{
    return arrInt64[value];
}

static double FloatTernaryOp(Int64 value)
{
    return (value == 1) ? 3.0f : 4.0f;
}

static double FloatArrayOp(Int64 value)
{
    return arrDouble[value];
}

static void Main(string[] args)
{
    const int MAX_LOOP = 1000000000;

    Int64 sum = 0;
    double sum_f = 0.0;

    Stopwatch stopWatch = new Stopwatch();
    stopWatch.Start();
    sum = 0;
    for (Int64 k = 0; k < MAX_LOOP; ++k)
    {
        sum += IntTernaryOp(k % 2);
    }
    stopWatch.Stop();
    DisplayElapseTime("IntTernaryOp", stopWatch.Elapsed, sum);

    stopWatch = new Stopwatch();
    stopWatch.Start();
    sum = 0;
    for (Int64 k = 0; k < MAX_LOOP; ++k)
    {
        sum += IntArrayOp(k % 2);
    }
    stopWatch.Stop();
    DisplayElapseTime("IntArrayOp", stopWatch.Elapsed, sum);

    stopWatch = new Stopwatch();
    stopWatch.Start();
    sum_f = 0;
    for (Int64 k = 0; k < MAX_LOOP; ++k)
    {
        sum_f += FloatTernaryOp(k % 2);
    }
    stopWatch.Stop();
    DisplayElapseTime("FloatTernaryOp", stopWatch.Elapsed, sum_f);

    stopWatch = new Stopwatch();
    stopWatch.Start();
    sum_f = 0;
    for (Int64 k = 0; k < MAX_LOOP; ++k)
    {
        sum_f += FloatArrayOp(k % 2);
    }
    stopWatch.Stop();
    DisplayElapseTime("FloatArrayOp", stopWatch.Elapsed, sum_f);
}

These are the benchmark results on a computer with Intel i7-8700 at 3.2Ghz with 16GB RAM.

VC++ /Ox results
       IntTernaryOp:  562ms, result:3500000000
         IntArrayOp:  523ms, result:3500000000
     FloatTernaryOp: 3972ms, result:3.5e+09
       FloatArrayOp: 1030ms, result:3.5e+09
G++ 7.4.0 -O3 results
       IntTernaryOp:  306ms, result:3500000000
         IntArrayOp:  519ms, result:3500000000
     FloatTernaryOp: 1030ms, result:3.5e+09
       FloatArrayOp: 1030ms, result:3.5e+09
Clang++ 6.0.0 -O# results
       IntTernaryOp:  585ms, result:3500000000
         IntArrayOp:  523ms, result:3500000000
     FloatTernaryOp: 1030ms, result:3.5e+09
       FloatArrayOp: 1030ms, result:3.5e+09
C# 7 Release Mode, .NET Framework 4.7.2
       IntTernaryOp: 1311ms, result:3500000000
         IntArrayOp: 1038ms, result:3500000000
     FloatTernaryOp: 2448ms, result:3500000000
       FloatArrayOp: 1036ms, result:3500000000

For the integer benchmark, the lookup table got worse performance than lookup table with G++, while in Visual C++/C# and Clang++, there is improvement. As it turns out in the floating benchmark I am looking for (in G++), improvement is only seen in Visual C++/C# while G++ and Clang++ retained the same timing for both ternary and lookup table code. I did check the assembly code at the Godbolt Compiler Explorer, none of the ternary operator is converted to use conditional move as suggested by Agner Fog's book so I guess the short branch is still with the CPU cache, therefore it makes little difference in G++ and Clang++ floating-point test. The morale of the story is never take a book's word at its value and always measure to confirm your hypothesis. In my case, I forgo this lookup micro-optimization.

History

  • 31st January, 2020: Initial version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Software Developer (Senior)
Singapore Singapore
Shao Voon is from Singapore. His interest lies primarily in computer graphics, software optimization, concurrency, security, and Agile methodologies.

In recent years, he shifted focus to software safety research. His hobby is writing a free C++ DirectX photo slideshow application which can be viewed here.

Comments and Discussions

 
QuestionBenchmarking conditions Pin
YvesDaoust4-Feb-20 3:20
YvesDaoust4-Feb-20 3:20 
QuestionNot sure if I got the purpose. Pin
Paulo Zemek31-Jan-20 16:44
mvaPaulo Zemek31-Jan-20 16:44 
By the title I see this is micro-optimization. I see lot of numbers... but did you got anything meaningful from this?

In my opinion, something you can do billions of times per second rarely needs the optimization... it is already fast enough.

Also, micro-benchmarks usually forget the most important things: Memory access.
CPUs have all kinds of memory-caching. Things can be cached to the register level, to L1, L2, L3 cache etc.
Also, either CPU specific of OS specific, there's instruction cache, data-cache etc.

When you do a micro-optimization test, you only see the results of that particular micro-optimization being run without anything else interfering... while in the real situation that thing might be getting out of cache, and then back into cache, when really needed, and by using the most "obvious" approach, that might not happen.

So... what's the real lesson here?
Even if it is possible to see an advantage in some benchmark, never ever try to do that?

AnswerRe: Not sure if I got the purpose. Pin
Shao Voon Wong2-Feb-20 11:53
mvaShao Voon Wong2-Feb-20 11:53 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.