Hello everyone,

I have a computer vision algorithm in which the convolution step must run in parallel on multicore processors using OpenMP. The code has the form shown below.

C++
void convolve(Bitmap &src, Kernel &kernel, Bitmap &dst)
{
   int width = src.width, height = src.height;
   #pragma omp parallel for
   for(int y = 0; y < height; ++y){
      for(int x = 0; x < width; ++x){
          kernel.response(src, x, y, dst);
      }
   }
}


Kernel is an abstract interface:

C++
class Kernel {
public:
    virtual ~Kernel() {}
    virtual void response(Bitmap &src, int x, int y, Bitmap &dst) = 0;
};


The problem is that the concrete Kernel implementations are not known to the compiler at this point, and they can be complex. Can this code still be parallelised with OpenMP?

If I run the code it actually runs slower than the serial version, and it visibly stalls when running real-time image recognition. NOTE: this is not the case when OpenMP is disabled. I'm using Visual Studio Express 2013 with "/openmp" enabled.

EDIT:

The example below shows a Kernel implementation for converting a bitmap to grayscale. The convolution operator is an inherently parallel problem, as are most computer vision algorithms, yet parallelising such algorithms with shared memory is hard, especially because of race conditions. Maybe I should consider using GPUs for hardware acceleration of my algorithms. NOTE: the code runs in real time even on a single core, but I wanted to speed things up for mobile platforms, as the code is to run on mobile devices for Augmented Reality apps, so multicore is the way to go.

Mostly, if I use "omp parallel for shared(dst)" the code runs fast with minimal stalling, but I feel the race conditions are still there. Is there a hardware mechanism for avoiding false sharing and race conditions without using the expensive "critical"? I tried "atomic", but it only supports primitive update operations such as addition. And why is the assignment operator not supported by "atomic"?
I only recently came across OpenMP and was excited about it until these issues came up :-(

C++
// Kernel for changing a bitmap to grayscale
class GrayScale : public Kernel {
public:
    void response(Bitmap &src, int x, int y, Bitmap &dst)
    {
        Pixel &sp = src.pData[x + y*src.width];
        Pixel &dp = dst.pData[x + y*dst.width];

        unsigned char gray = static_cast<unsigned char>(0.3*sp.red + 0.5*sp.green + 0.2*sp.blue);

        #pragma omp critical
        {
            dp.red = dp.green = dp.blue = gray;
        }
    }
};


where

C++
// Pixel data for ARGB_8888 bitmap format
struct Pixel {
    unsigned char red;
    unsigned char green;
    unsigned char blue;
    unsigned char alpha;
};
Posted
Updated 19-Dec-14 11:21am
v4

1 solution

Your for loop is probably too fine-grained.

Try breaking the work up into chunks.

C++
void convolve_inner(Bitmap &src, Kernel &kernel, Bitmap &dst, int low, int high)
{
   // process one stripe or band of image data
   for(int y = low; y < high; ++y)
   {
      for(int x = 0; x < width; ++x)
      {
         kernel.response(src, x, y, dst);
      }
   }
}

void convolve(Bitmap &src, Kernel &kernel, Bitmap &dst)
{
   int chunk = (height + 7) / 8;   // split the rows into 8 bands
   #pragma omp parallel for
   for(int low = 0; low < height; low += chunk)
   {
      int high = low + chunk;
      if (high > height) high = height;
      convolve_inner(src, kernel, dst, low, high);
   }
}
 
Comments
BupeChombaDerrick 19-Dec-14 14:40pm    
Actually the bitmap data is just a 1D array of pixels, so the data is one big strip. How about the race conditions on the bitmaps? The "dst" bitmap in particular needs to be synchronized. I tried using "critical" within the Kernel implementations when writing to the bitmap, but it yielded no benefit. For example:

// Kernel for changing a bitmap to grayscale
class GrayScale : public Kernel {
public:
    void response(Bitmap &src, int x, int y, Bitmap &dst)
    {
        Pixel &sp = src.pData[x + y*src.width];
        Pixel &dp = dst.pData[x + y*dst.width];

        unsigned char gray = static_cast<unsigned char>(0.3*sp.red + 0.5*sp.green + 0.2*sp.blue);

        #pragma omp critical
        {
            dp.red = dp.green = dp.blue = gray;
        }
    }
};

where

// Pixel data for ARGB_8888 bitmap format
struct Pixel {
    unsigned char red;
    unsigned char green;
    unsigned char blue;
    unsigned char alpha;
};
[no name] 19-Dec-14 16:09pm    
You may want to update your question.

Is there a risk of the multiple response function threads reading or writing the same pixel? I did not see any in the sample response code above.

What are typical values of width and height?

You describe the data as one big strip. If the strips are relatively small (a few thousand pixels), there may not be much value in adding MP support. You'll spend more cycles on context switches than actual work.

If the images are relatively large - like a million pixels, working on chunks of 64 k should work well.
BupeChombaDerrick 19-Dec-14 16:55pm    
Typical values are width = 640, height = 480, so about 300K pixels. The Kernel interface takes two bitmaps as arguments: the one designated "dst" is mostly write-only, and "src" is read-only. Bitmap pixels in "dst" are expected to be modified by the threads, so shared arrays of data like this are hard to parallelise, and yet the problem is inherently parallel.
[no name] 22-Dec-14 12:45pm    
Did you try the "chunked" approach above?

If you assign each pixel row to a thread, you'll have 480 context switches for a 640x480 image.

If you operate on chunks (say 48 or 96 rows, for example), you'll only need 5 or 10 context switches. Unless you have a 50- or 100-core CPU, a chunked approach will be better.
BupeChombaDerrick 22-Dec-14 16:26pm    
I'm currently looking at http://staff.city.ac.uk/~sbbh653/publications/OpenMP_SPM.pdf; they seem to use the full image without a "chunked" approach. Only adding "#pragma omp parallel for" should have been sufficient. After adding "shared(src,dst)" it seems to work, but I don't know why.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)
