
Single Thread vs Dual Thread

9 Oct 2004, 2 min read
An introduction to one of the optimization methods for Intel Dual Xeon HT technology.

Sample Image - PI/PI.jpg


Objective

My objective in posting this article is to share a simple optimisation method. That's it!

Introduction

This article demonstrates how two threads can outperform a single thread, especially on a dual Intel Xeon machine with HT technology. The demo posted here is a simple PI calculation, used as an example of a computationally expensive function. Both PI functions run at 100% CPU utilization.

Requirement

Very simple. To see a significant improvement, you need to run it on a dual-CPU machine.

Code

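The demo code is very simple. The sample below contains both functions.

SingleThreadPI and DualThreadPI do exactly the same thing and produce the same result, but with different timing. The main difference is that DualThreadPI uses data parallelism: the loop iterations are split between two threads, each thread accumulates its own partial sum, and the partial sums are added together at the end.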
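As background that the article does not spell out, the loop in both functions appears to be the midpoint-rule approximation of a standard integral for pi. Writing N for num_steps and h for step (these two symbols are introduced here only for illustration), the sum and its odd/even split between the two threads can be sketched as:

```latex
\[
\pi = \int_0^1 \frac{4}{1+x^2}\,dx
    \;\approx\; h \sum_{i=1}^{N} \frac{4}{1+x_i^2},
\qquad x_i = \left(i - \tfrac{1}{2}\right) h, \quad h = \frac{1}{N}
\]
\[
\sum_{i=1}^{N} \frac{4}{1+x_i^2}
  = \underbrace{\sum_{i\ \mathrm{odd}} \frac{4}{1+x_i^2}}_{\text{thread 1}}
  + \underbrace{\sum_{i\ \mathrm{even}} \frac{4}{1+x_i^2}}_{\text{thread 2}}
\]
```

The two partial sums share no terms, so they can be computed independently and combined once at the end. Because floating-point addition is then performed in a different order, the last few digits of the two results may differ slightly.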

#include "stdafx.h"

#include <windows.h>
#include <stdio.h>
#include <stdlib.h>
#include <conio.h>
#include <string.h>
#include <time.h> 

#define NUM_THREADS 2

HANDLE thread_handles[NUM_THREADS];
CRITICAL_SECTION hUpdateMutex;
static long num_steps = 100000 * 1000;
double step;
double global_sum = 0.0;

int SingleThreadPI();
int DualThreadPI();
DWORD WINAPI ThreadPI(LPVOID arg);

int main(int argc, char* argv[])
{
  // Single Thread
  SingleThreadPI();

  // Dual Thread
  DualThreadPI();

  getchar(); // keep the console window open until a key is pressed
  return 0;
}

int SingleThreadPI()
{
  clock_t start, end;
  int total;
  long i;
  double x, pi, sum = 0.0;

  start = clock();
  step = 1.0/(double) num_steps;

  // accumulate 4/(1+x*x) at the midpoint of every step
  for (i=1; i<=num_steps; i++)
  {
    x = (i-0.5)*step;
    sum = sum + 4.0/(1.0+x*x);
  }

  pi = step * sum;
  printf("\nSingleThreadPI\n");
  printf(" pi is %f \n", pi);
  end = clock();
  total = (int)(end - start); // elapsed clock ticks
  printf("%d\n", total);

  return 0;
}

int DualThreadPI()
{
  clock_t start, end;
  int total;
  double pi;
  int i;
  DWORD threadID;
  int threadArg[NUM_THREADS];

  start = clock();

  // set the shared values once, before any worker thread starts
  step = 1.0/(double) num_steps;
  global_sum = 0.0;

  for (i=0; i<NUM_THREADS; i++)
    threadArg[i] = i+1;

  InitializeCriticalSection(&hUpdateMutex);

  for (i=0; i<NUM_THREADS; i++)
  {
    thread_handles[i] = CreateThread(NULL, 0,
        ThreadPI, &threadArg[i], 0, &threadID);
  }

  // wait until both threads have finished processing
  WaitForMultipleObjects(NUM_THREADS, thread_handles, TRUE, INFINITE);

  // release the thread handles and the critical section
  for (i=0; i<NUM_THREADS; i++)
    CloseHandle(thread_handles[i]);
  DeleteCriticalSection(&hUpdateMutex);

  pi = global_sum * step;
  printf("\nDualThreadPI\n");
  printf(" pi is %f \n", pi);
  end = clock();
  total = (int)(end - start); // elapsed clock ticks
  printf("%d\n", total);

  return 0;
}

// Thread entry point; it must use the signature CreateThread expects.
DWORD WINAPI ThreadPI(LPVOID arg)
{
  long i;
  int start;
  double x, sum = 0.0;
  start = *(int *) arg; // arg points to this thread's 1-based index (i+1 in the loop)

  // step is set by DualThreadPI before the threads start; it is only read here

  // NUM_THREADS in this case is 2
  // so that thread index=1, ThreadPI will compute the odd values of i
  //         thread index=2, ThreadPI will compute the even values of i
  for (i=start; i<=num_steps; i=i+NUM_THREADS)
  {
    x = (i-0.5)*step;
    sum = sum + 4.0/(1.0+x*x);
  }

  // the update of the shared total must be atomic,
  // otherwise the two threads could overwrite each other's result
  EnterCriticalSection(&hUpdateMutex);
  global_sum += sum;
  LeaveCriticalSection(&hUpdateMutex);

  return 0;
}
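The odd/even split used by ThreadPI can be checked without any threads at all. The following is a minimal sketch in portable C, not part of the original sample: the helper names partial_sum, pi_single and pi_interleaved are hypothetical, and the "workers" simply run one after another. It only illustrates that the interleaved partial sums add up to the same estimate as the single loop.

```c
#include <assert.h>

/* Sum 4/(1+x^2) at the midpoints x = (i-0.5)*step for
   i = first, first+stride, first+2*stride, ... up to num_steps. */
double partial_sum(long first, long stride, long num_steps)
{
    double step = 1.0 / (double)num_steps;
    double sum = 0.0;
    long i;

    assert(first >= 1 && stride >= 1 && num_steps >= 1);
    for (i = first; i <= num_steps; i += stride) {
        double x = (i - 0.5) * step;
        sum += 4.0 / (1.0 + x * x);
    }
    return sum;
}

/* Single-pass estimate: one worker covers every value of i. */
double pi_single(long num_steps)
{
    return partial_sum(1, 1, num_steps) / (double)num_steps;
}

/* Interleaved estimate: worker k (1-based) covers i = k, k+workers, ...
   The workers run sequentially here; no threads are created. */
double pi_interleaved(long num_steps, int workers)
{
    double total = 0.0;
    int k;

    assert(workers >= 1);
    for (k = 1; k <= workers; k++)
        total += partial_sum(k, workers, num_steps);
    return total / (double)num_steps;
}
```

Calling pi_single(1000000) and pi_interleaved(1000000, 2) should give estimates that agree to many decimal places. The last digits can differ, because the additions happen in a different order.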

Test Performance

As expected, profiling shows that this type of optimisation gives no significant improvement when run on a uniprocessor machine. On a dual-processor machine, however, you will definitely see a big difference between single thread and dual thread. Here are the live samples...

Machine -> Pentium 4 (2.4GHz, 1GB Memory)

Profile: Function timing, sorted by time
Date:    Sun Oct 10 02:23:09 2004 

Program Statistics
------------------
    Command line at 2004 Oct 10 02:23: "G:\j2\net\code project\article - 
         Optimise For Intel HT Technology\InLocal\PI\Release\PI" 
    Total time: 5342.205 millisecond
    Time outside of functions: 38.445 millisecond
    Call depth: 2
    Total functions: 8
    Total hits: 3
    Function coverage: 37.5%
    Overhead Calculated 4
    Overhead Average 4 
Module Statistics for pi.exe
----------------------------
    Time in module: 5303.760 millisecond
    Percent of time in module: 100.0%
    Functions in module: 8
    Hits in module: 3
    Module function coverage: 37.5% 
        Func          Func+Child           Hit
        Time   %         Time      %      Count  Function
---------------------------------------------------------
    2246.857  42.4     5303.760 100.0        1 _main (pi.obj)
    1620.240  30.5     1620.240  30.5        1 SingleThreadPI(void) (pi.obj)
    1436.662  27.1     1436.662  27.1        1 DualThreadPI(void) (pi.obj) 


Contributed by Nigel
Machine -> Dell Precision 530, Pentium Xeon (2 x 2.0GHz, 512MB Memory)

Profile: Function timing, sorted by time
Date: Sat Oct 09 16:58:43 2004


Program Statistics
------------------
Command line at 2004 Oct 09 16:58: "D:\somewhere\PI_demo\PI\Debug\PI" 
Total time: 10076.987 millisecond
Time outside of functions: 11.837 millisecond
Call depth: 2
Total functions: 7
Total hits: 3
Function coverage: 42.9%
Overhead Calculated 976
Overhead Average 976

Module Statistics for pi.exe
----------------------------
Time in module: 10065.150 millisecond
Percent of time in module: 100.0%
Functions in module: 7
Hits in module: 3
Module function coverage: 42.9%

        Func          Func+Child           Hit
        Time   %         Time      %      Count  Function
---------------------------------------------------------
    7175.142  71.3    10065.150 100.0        1 _main (pi.obj)
    1931.827  19.2     1931.827  19.2        1 SingleThreadPI(void) (pi.obj)
     958.182   9.5      958.182   9.5        1 DualThreadPI(void) (pi.obj)
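The function timings in the two profiles above can be turned into speedup figures with a little arithmetic. This is a small sketch; the helper names speedup and efficiency are hypothetical and not part of the original sample.

```c
#include <assert.h>

/* Ratio of single-thread time to multi-thread time.
   A value above 1.0 means the multi-thread version was faster. */
double speedup(double single_ms, double multi_ms)
{
    assert(single_ms > 0.0 && multi_ms > 0.0);
    return single_ms / multi_ms;
}

/* Speedup divided by the number of threads.
   A value of 1.0 would correspond to ideal scaling. */
double efficiency(double single_ms, double multi_ms, int threads)
{
    assert(threads >= 1);
    return speedup(single_ms, multi_ms) / (double)threads;
}
```

With the figures reported above, speedup(1620.240, 1436.662) is about 1.13 on the Pentium 4, and speedup(1931.827, 958.182) is about 2.02 on the dual Xeon. The two runs are not directly comparable, since the command lines show a Release build on the first machine and a Debug build on the second. A ratio slightly above 2 on two processors is probably measurement noise or profiler overhead rather than a real effect.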

Machine -> Pentium Xeon (2 x 2.8GHz, 512MB Memory)

(Pending... to be continued...)
This machine should show a significant improvement. Would anyone like to volunteer?

Finally

Actually, I have already seen its performance in my office (dual Xeon 2.8GHz, where it doubles the performance). Since I am at home now, I can only provide my own PC's figures, which show a small improvement. The rest I will leave for you to prove... *wink*

Learning is fun =)

Recommendation/Tips

After reading the article, you may want to try this out in your own application. Suppose you put in a lot of effort and, after implementing it, you do not get the result you expected. Please don't jump on me. You may have targeted the wrong part of your application, or you may have tested it on a single-CPU machine, where you will not get much improvement (see Requirement). This article only shows some light in the tunnel; it won't get you far.

Here are some tips. To take advantage of this type of optimisation, my practice is to profile the application, then target frequently called functions, especially computationally expensive ones such as compression, image filtering, and other morphological processing.
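When no profiler is at hand, a candidate function can be timed directly. The sketch below assumes a C11 standard library (for timespec_get); the names elapsed_ms, time_call_ms and candidate_hot_function are hypothetical. It measures wall-clock time on purpose: on POSIX systems clock() reports processor time added up over all threads, which would hide the gain from running two threads in parallel.

```c
#include <assert.h>
#include <time.h>

/* Milliseconds between two timestamps taken with timespec_get. */
double elapsed_ms(struct timespec begin, struct timespec end)
{
    return (double)(end.tv_sec - begin.tv_sec) * 1000.0
         + (double)(end.tv_nsec - begin.tv_nsec) / 1.0e6;
}

/* Run fn once and return how long it took in wall-clock milliseconds. */
double time_call_ms(void (*fn)(void))
{
    struct timespec begin, end;
    int ok;

    assert(fn != NULL);
    ok = timespec_get(&begin, TIME_UTC);   /* returns TIME_UTC on success */
    assert(ok == TIME_UTC);
    fn();
    ok = timespec_get(&end, TIME_UTC);
    assert(ok == TIME_UTC);
    (void)ok;                              /* unused when NDEBUG is defined */
    return elapsed_ms(begin, end);
}

/* Hypothetical stand-in for an expensive routine such as an image filter. */
void candidate_hot_function(void)
{
    volatile double sum = 0.0;             /* volatile keeps the loop from being optimised away */
    long i;

    for (i = 1; i <= 1000000; i++)
        sum += 1.0 / (double)i;
}
```

A call such as time_call_ms(candidate_hot_function) returns the elapsed milliseconds for one run. Single measurements are noisy, so it is usually worth repeating the call several times and comparing the results before deciding which function to parallelise.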

In large-scale server applications, the hardware probably has at least 2-4 CPUs per machine, but I have never tried this there. Most likely I will not try it in the near future either, because I am not in that field. But you may want to try it! If the result is good, please share your experience here. Good luck!

Link

Intel Web Site

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.



Written By
f2
Software Developer (Senior)
Singapore
He started programming in dBase, Pascal, and C, then assembly. He actively works on image processing algorithms and customised vision applications. His major is actually more in control engineering, motion control, machine vision and statistics.
He likes to work on projects that require careful analytical methods.
He can be reached at albertoycc@hotmail.com.

Comments and Discussions

 
Profiling is fairly pointless
Ralph Walden, 11-Oct-04 4:42
Simple logic will dictate that most multiple-thread applications will run faster on a dual-processor machine (whether dual CPU or HyperThreaded). How much faster, though, is heavily dependent on what the threads are actually doing. Just because you have two threads and two processors to run them does NOT mean that your threads will actually be running all the time. Certain operations are going to be serialized -- meaning one thread will be stopped while waiting for the other thread. At the lowest level, the hardware will not allow both processors to write to the exact same memory at the exact same time. At a higher level, if both threads are using the base heap, then all memory allocations/freeing/etc. will be serialized. Unless you specifically call asynchronous APIs, your file i/o is going to be serialized. On top of all that, sometimes you are going to have to lock your own threads. If both threads can read AND alter the state of a shared variable, then you are going to have to put a critical section around access to that variable, which may block your other thread until access is done. Last, but not least, if the threads aren't identical, or they are identical but the data they are processing is not, then one thread will finish sooner than the other. If your main process cannot continue until both threads are done, then as soon as one thread finishes, you are effectively running a single CPU system again.

There are, of course, ways to optimize all of the above to get as much performance as possible out of a dual-processor system. Unfortunately, none of those methods are covered by this article. There are also ways to write your code such that it runs differently on a single processor system than on a dual processor system -- again something not touched on by this article. Realizing that a multi-thread application "can" run faster on a dual processor system is the first step. But the real challenge is how to fully take advantage of that -- something that goes far beyond running a profiler to see if your program is faster or not.