In high-performance code I often see #define macros and inline functions used to speed up programs.
I ran some speed tests, and according to the results these efforts are useless on Windows.
For me #define is only useful on Linux, but there I tested on an old 1 GHz computer.

Here are the results:
Windows 7, Intel 3.5 GHz:
Release mode:
#define = 1.009 nanoseconds/operation
function = 0.997 nanoseconds/operation
inline function = 1.031 nanoseconds/operation

Debug mode:
#define = 3.467 nanoseconds/operation
function = 11.09 nanoseconds/operation
inline function = 11.81 nanoseconds/operation

Linux, AMD 1 GHz:
#define = 10.06 nanoseconds/operation
function = 14.28 nanoseconds/operation
inline function = 13.20 nanoseconds/operation


What I have tried:

On Linux, it was compiled with: gcc -std=c++11 -lstdc++ -o prue define_timing.cpp

I tried the following code:

#include <iostream>
#include <cstdio>   // getchar
#include <ctime>    // clock, clock_t, CLOCKS_PER_SEC

using namespace std;

// Macro version: x is evaluated up to three times.
#define INLIMIT2(x,xmin,xmax) ( (x)<(xmin) ? (xmin) : ((x)<(xmax) ? (x) : ((xmax))))

// Plain function with the same clamping logic.
double inlimit2(double x, double xmin, double xmax)
{
	if (x < xmin)
		return xmin;
	else if (x >= xmax)
		return xmax;
	return x;
}

// Same function marked inline (only a hint to the compiler).
inline double inlimit3(double x, double xmin, double xmax)
{
	if (x < xmin)
		return xmin;
	else if (x >= xmax)
		return xmax;
	return x;
}

int main()
{
	const int top = 1000000000;	// 1e9 doubles: about 8 GB of memory
	double *x = new double[top];
	int i;
	for (i = 0; i < top; i++) x[i] = 1.01*i;
	clock_t ini, fin;	// clock() returns clock_t, not time_t
	double x2 = 0.0;	// results are summed and printed so the loops are not optimised away

	cout << "Test using #define:" << endl; x2 = 0.0;
	ini = clock();
	for (i = 0; i < top; i++) x2 += INLIMIT2(x[i], 10.0, 100.0);
	fin = clock();
	cout << "Time/op=" << 1e9 / top*(1.0*fin - ini) / CLOCKS_PER_SEC << " nanoseconds/operation" << endl;
	cout << "Result=" << x2 << endl;

	cout << "Test using function:" << endl; x2 = 0.0;
	ini = clock();
	for (i = 0; i < top; i++) x2 += inlimit2(x[i], 10.0, 100.0);
	fin = clock();
	cout << "Time/op=" << 1e9 / top*(1.0*fin - ini) / CLOCKS_PER_SEC << " nanoseconds/operation" << endl;
	cout << "Result=" << x2 << endl;

	cout << "Test using inline function:" << endl; x2 = 0.0;
	ini = clock();
	for (i = 0; i < top; i++) x2 += inlimit3(x[i], 10.0, 100.0);
	fin = clock();
	cout << "Time/op=" << 1e9 / top*(1.0*fin - ini) / CLOCKS_PER_SEC << " nanoseconds/operation" << endl;
	cout << "Result=" << x2 << endl;

	delete[] x;	// memory from new[] must be released with delete[]
	cout << "=== END ===" << endl; getchar();
	return 0;	// 0 signals success
}
Updated 6-Jul-17 1:23am

Modern compilers treat the inline keyword only as a hint. They may or may not inline a function declared with that keyword, and they may also inline functions that lack it.
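
If inlining has to be requested more strongly than with the plain keyword, GCC/Clang and MSVC offer non-standard extensions: __attribute__((always_inline)) and __forceinline. The sketch below assumes one of those compilers; the FORCE_INLINE macro and the name inlimit_forced are hypothetical, and even these extensions have documented cases where inlining is not possible.

```cpp
#include <cassert>

// Hypothetical helper macro: map "force inline" to each compiler's extension.
#if defined(_MSC_VER)
#define FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
#define FORCE_INLINE inline __attribute__((always_inline))
#else
#define FORCE_INLINE inline  // fall back to the ordinary hint
#endif

// Same logic as inlimit3 from the question, but with a stronger request
// than the plain inline keyword.
FORCE_INLINE double inlimit_forced(double x, double xmin, double xmax)
{
    assert(xmin <= xmax);  // precondition: non-empty range
    if (x < xmin)
        return xmin;
    else if (x >= xmax)
        return xmax;
    return x;
}
```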

If you really want to know what the compiler is doing, you have to inspect the assembly output (option -S with GCC, /FA with the Microsoft compiler).
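
A small translation unit such as the following (hypothetical file name clamp_asm.cpp) makes that check easy: extern "C" keeps the symbol names unmangled, so you can search the listing for a call to inlimit2 inside sum_clamped (also a hypothetical name). The exact assembly depends on the compiler version, flags and platform.

```cpp
#include <cassert>
#include <cstddef>

// extern "C" keeps symbol names unmangled, so "inlimit2" appears
// literally in the assembly listing.
extern "C" double inlimit2(double x, double xmin, double xmax)
{
    if (x < xmin)
        return xmin;
    else if (x >= xmax)
        return xmax;
    return x;
}

// Build to assembly, for example:
//   g++ -O0 -S clamp_asm.cpp -o clamp_O0.s  (on Linux, expect "call inlimit2" here)
//   g++ -O2 -S clamp_asm.cpp -o clamp_O2.s  (the call inside this loop is likely gone)
//   cl /c /O2 /FA clamp_asm.cpp             (MSVC writes clamp_asm.asm)
extern "C" double sum_clamped(const double* x, std::size_t n)
{
    assert(x != nullptr || n == 0);
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        sum += inlimit2(x[i], 10.0, 100.0);
    return sum;
}
```

Note that the out-of-line definition of inlimit2 stays in the listing in both cases, because it has external linkage; what matters is whether sum_clamped still calls it.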

It also depends on compiler optimisations, which are usually disabled in debug builds. For that reason, and because debug builds insert additional code such as array bounds checks, it usually makes no sense to measure performance with debug builds or to compare their results with those of a release build.

A macro is always expanded in place, so you get inline code. You might therefore use macros instead of functions to force inlining.

But be aware of the drawbacks of macros: they are prone to unwanted side effects, have no type checking, and make code less readable.
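
The side-effect problem can be reproduced with the INLIMIT2 macro from the question: an argument such as i++ is evaluated once for every place it appears in the expansion that actually gets evaluated, while a function evaluates its argument exactly once. The helper names below are hypothetical.

```cpp
#include <cassert>

// Macro from the question: each argument may be evaluated more than once.
#define INLIMIT2(x,xmin,xmax) ( (x)<(xmin) ? (xmin) : ((x)<(xmax) ? (x) : ((xmax))))

// Function equivalent: each argument is evaluated exactly once.
inline int inlimit_int(int x, int xmin, int xmax)
{
    assert(xmin <= xmax);
    if (x < xmin)
        return xmin;
    if (x >= xmax)
        return xmax;
    return x;
}

// Clamps a post-incremented counter with the macro; the counter's final
// value is stored in *after.
int macro_with_increment(int start, int* after)
{
    int i = start;
    int r = INLIMIT2(i++, 0, 10);  // i++ can run up to three times
    *after = i;
    return r;
}

// Same operation through the function.
int function_with_increment(int start, int* after)
{
    int i = start;
    int r = inlimit_int(i++, 0, 10);  // i++ runs exactly once
    *after = i;
    return r;
}
```

Starting from 5, the macro version yields 7 and leaves the counter at 8, whereas the function version yields 5 and leaves it at 6. The behaviour is well defined (the ?: operator sequences its operands), just not what the caller expects.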

When not using macros, you can still make the compiler generate more inline code by optimising for speed instead of size (MS compiler: /O2; GCC: do not use -Os, use -O2 or -O3 instead).

GCC does not inline functions when no optimisation is specified, and inlines only very few (if any) with -O or -O1. Because you did not specify any optimisation option, your Linux build did not inline any functions (the difference between the function and inline function timings may just be measurement uncertainty). I suggest repeating your measurements after building with -O2 or -O3.
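
When repeating the measurement with -O2 or -O3, note that the compiler will probably inline the plain function as well, so all three variants may end up as identical code. To keep measuring a real call, the function can be marked noinline. A sketch under those assumptions, using std::chrono::steady_clock instead of clock(); NOINLINE, inlimit_call and time_clamp are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <vector>

// Hypothetical NOINLINE macro: asks the compiler to keep the function out of
// line even at -O2/-O3, so the "function" case still measures a real call.
#if defined(_MSC_VER)
#define NOINLINE __declspec(noinline)
#elif defined(__GNUC__) || defined(__clang__)
#define NOINLINE __attribute__((noinline))
#else
#define NOINLINE
#endif

NOINLINE double inlimit_call(double x, double xmin, double xmax)
{
    if (x < xmin)
        return xmin;
    else if (x >= xmax)
        return xmax;
    return x;
}

// Runs clamp over every element and returns the sum; as long as the caller
// uses the result, the optimiser cannot discard the loop. The average time
// per element in nanoseconds is written to *ns_per_op.
template <typename Clamp>
double time_clamp(const std::vector<double>& x, Clamp clamp, double* ns_per_op)
{
    assert(!x.empty());
    const auto start = std::chrono::steady_clock::now();
    double sum = 0.0;
    for (double v : x)
        sum += clamp(v, 10.0, 100.0);
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    *ns_per_op = elapsed.count() / static_cast<double>(x.size());
    return sum;
}
```

For example, time_clamp(data, inlimit_call, &ns) times the out-of-line call, while passing a lambda that wraps INLIMIT2 or inlimit3 times the inlined variants. Printing the returned sum keeps the loop from being optimised away.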
 
 
It depends on compiler optimizations, I suppose.
As general advice, write the cleanest code you can and apply hand-crafted optimizations only where a speed-up is actually required (and where they actually deliver one).
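
In that spirit, if C++17 is available (the question compiled with -std=c++11, so this is an assumption), the standard library already provides std::clamp in <algorithm>, which is usually the cleanest choice. inlimit_std is a hypothetical wrapper that only adds a check of std::clamp's precondition.

```cpp
#include <algorithm>
#include <cassert>

// Requires C++17 (e.g. g++ -std=c++17 -O2).
// std::clamp returns xmin if x < xmin, xmax if xmax < x, otherwise x.
inline double inlimit_std(double x, double xmin, double xmax)
{
    assert(!(xmax < xmin));  // std::clamp's behaviour is undefined if xmax < xmin
    return std::clamp(x, xmin, xmax);
}
```

It gives the same results as inlimit2 apart from corner cases such as signed zeros (for x == -0.0 and xmax == 0.0, inlimit2 returns xmax while std::clamp returns x).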
 
 
Comments
Javier Luis Lopez 6-Jul-17 4:11am    
Could it also depend on the CPU or GPU?
Some CPUs can perform operations in parallel, so perhaps some optimizations prevent or enable the use of SSE, SSE2 or other instruction sets.
CPallini 6-Jul-17 4:17am    
Yes.
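
As a sketch of that point: a clamp written with std::min and std::max has no explicit branches, and on x86-64 compilers may turn it into SSE2 minsd/maxsd instructions rather than conditional jumps. Whether they do depends on the compiler, flags and target, so check the -S or /FA output. inlimit_minmax is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>

// Branch-free formulation of the same clamp. Compilers may map std::max and
// std::min on doubles to SSE2 maxsd/minsd; this is not guaranteed.
inline double inlimit_minmax(double x, double xmin, double xmax)
{
    assert(xmin <= xmax);  // with xmin > xmax the result would be xmax
    return std::min(std::max(x, xmin), xmax);
}
```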

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)