Hello world! I am trying to develop a good way to convert between a bitmap and a matrix, in both directions. I used LockBits on the bitmap and things were pretty fast. Recently I learned about OpenMP, and it got me thinking whether it can be used on an operation like this. When I tried "omp for" as shown below, I got random colors in the imported/exported images. I understand the theory of parallel computing, but I struggle with it in real code. Can anybody help with a working OpenMP implementation for this case?

(This is a Windows Forms C++ app, currently for 24-bit bitmaps.)

C++
int i,j,pos;
int width=bitmap->Width;
int height=bitmap->Height;

float **mat;
mat=new float*[width];
for(i=0;i<width;i++)
    mat[i]=new float [height];

BitmapData ^bitmapData;
bitmapData=gcnew BitmapData();
System::Drawing::Rectangle a(0,0,width,height);
bitmap->LockBits(a, System::Drawing::Imaging::ImageLockMode::ReadOnly, bitmap->PixelFormat, bitmapData);
unsigned char *ptr=(unsigned char*)bitmapData->Scan0.ToPointer();

int stride=bitmapData->Stride;
//#pragma omp parallel for private(j)
for (int i=0;i<width;i++)
    for(int j=0;j<height;j++)
    {
        pos=(j*stride)+(i*3);
        mat[i][j]=(float)(ptr[pos]+ptr[pos+1]+ptr[pos+2])/3;
    }
bitmap->UnlockBits(bitmapData);
return mat;

You're sharing the single 'pos' variable among all of your threads.

Because 'pos' is declared outside the loop, every thread reads and writes the same memory location, so you get random values. Another thread can overwrite it between the line that computes it and the line that uses it, and even within that line the value can change mid-execution.

Declare it inside the loop body instead, so each thread has its own copy:

C++
int pos=(j*stride)+(i*3);
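
To make the fix concrete, here is a minimal sketch of the question's column-by-column loop with `pos` declared inside the loop body. It works on a plain byte buffer rather than a locked `Bitmap`, and it assumes a positive stride. The function name `toGrayColumns` and the `std::vector` matrix are assumptions for illustration, not part of the original code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: averages the three channels of a 24-bit pixel buffer.
// The result is indexed as mat[x][y], like the question's float** layout.
std::vector<std::vector<float>> toGrayColumns(const unsigned char* ptr,
                                              int width, int height, int stride)
{
    assert(ptr != nullptr && stride >= width * 3);
    std::vector<std::vector<float>> mat(width, std::vector<float>(height));

    #pragma omp parallel for
    for (int i = 0; i < width; i++)
    {
        for (int j = 0; j < height; j++)
        {
            // Declared inside the loop, so every thread has a private copy.
            const int pos = (j * stride) + (i * 3);
            mat[i][j] = (float)(ptr[pos] + ptr[pos + 1] + ptr[pos + 2]) / 3;
        }
    }
    return mat;
}
```

If the compiler is run without OpenMP support, the pragma is ignored and the loop runs serially, which makes it easy to compare the two results.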


You might get some speed improvement trying it like this.

C++
#pragma omp parallel for private(i)
for (int i=0;i<width;i++)
{
    unsigned char * LinePtr = ptr + (j * stride);

    for(int j=0;j<height;j++)
    {
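        // Read the three bytes by index and advance once: several LinePtr++
        // in a single expression would be undefined behavior in C++.
        mat[i][j]=(float)((int) LinePtr[0] + LinePtr[1] + LinePtr[2])/3;
        LinePtr += 3;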
    }
}


There is some overhead in starting and scheduling the threads. So rather than handing out work one pixel at a time, hand it out one line at a time by parallelizing only the outer loop. Each thread then gets far more calculation done per unit of overhead.

By precalculating the line pointer, we remove a multiplication from the inner loop, so the number of multiplications drops by a factor equal to the inner loop's length.
Stepping a pointer also gives the compiler a very simple access pattern to optimize, since the inner loop mostly manipulates 'mat' and 'LinePtr'. A modern optimizer may produce similar code from the indexed version, so measure before relying on this.

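Also notice that I cast the first pixel to an int before adding. Strictly speaking, C++ already promotes unsigned char operands to int before the addition, so the sum of three bytes cannot wrap at the 255 boundary. The cast only makes the intent explicit.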
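
As a quick check of the point about casts: on mainstream platforms, where int is wider than char, `unsigned char` operands undergo integral promotion to `int` before `+` is applied. The sketch below verifies this at compile time. The helper name `sumChannels` is an assumption for illustration.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical helper: sums three channel bytes without any explicit cast.
int sumChannels(unsigned char b, unsigned char g, unsigned char r)
{
    // The operands are promoted to int before the addition,
    // so the expression b + g + r already has type int.
    static_assert(std::is_same<decltype(b + g + r), int>::value,
                  "unsigned char operands are promoted to int");
    const int sum = b + g + r;
    // Sanity check: three bytes always sum to a value in [0, 765].
    assert(sum >= 0 && sum <= 3 * 255);
    return sum;
}
```

Wrapping would only occur if the sum were stored back into an `unsigned char`, which none of the loops in this thread do.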
 
Thanks, you cleared that up very well. I am new to unmanaged code too, but it seems there is a small error in your fix. In case somebody reads this, I believe it should look like the code below. The idea of increments and line pointers is brilliant, thanks a lot.
C++
#pragma omp parallel for private(i)
for (i=0;i<height;i++)
{
    unsigned char * LinePtr = ptr + (i * stride);

    for(int j=0;j<width;j++)
    {
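        // Index the three bytes, then advance once: several LinePtr++
        // in a single expression would be undefined behavior in C++.
        mat[j][i]=(float)((int) LinePtr[0] + (int) LinePtr[1] + (int) LinePtr[2])/3;
        LinePtr += 3;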
    }
}
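
For readers who want to try the row-wise version outside a Windows Forms project, here is a self-contained sketch of the same idea on a plain byte buffer. It assumes a 24-bit pixel format and a positive stride. The function name `toGrayRows` and the `std::vector` matrix are assumptions for illustration. The three channel bytes are read by index and the pointer is advanced once per pixel, because modifying the same pointer several times within one expression is undefined behavior in C++.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: row-wise conversion of a 24-bit pixel buffer.
// The result is indexed as mat[x][y], as in the thread.
std::vector<std::vector<float>> toGrayRows(const unsigned char* ptr,
                                           int width, int height, int stride)
{
    assert(ptr != nullptr && stride >= width * 3);
    std::vector<std::vector<float>> mat(width, std::vector<float>(height));

    #pragma omp parallel for
    for (int y = 0; y < height; y++)
    {
        // Start of this scan line, computed once per row.
        const unsigned char* linePtr = ptr + (y * stride);
        for (int x = 0; x < width; x++)
        {
            // Average the three channel bytes, then step to the next pixel.
            mat[x][y] = (float)(linePtr[0] + linePtr[1] + linePtr[2]) / 3;
            linePtr += 3;
        }
    }
    return mat;
}
```

Each thread writes only to the elements of its own rows, so no two threads touch the same `mat[x][y]` entry.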
 
 
Comments
JackDingler 22-Nov-11 16:58pm    
Good catch. :)
JackDingler 22-Nov-11 17:09pm    
Typecasting each element to (int) is unnecessary. The operands are promoted to int before they are added, so the whole expression inside the parentheses already evaluates as an int.
JackDingler 22-Nov-11 17:57pm    
This little lookup 'mat[j][i] = ' seems a little heavy too, for an inner loop...
Thanks for the typecasting hint. About that dereferencing:

C++
float * arrptr=mat[i];
for(int j=0;j<width;j++)
{
    // Index the three channel bytes, then advance the source pointer once.
    *(arrptr++)=(float)((int) LinePtr[0] + LinePtr[1] + LinePtr[2])/3;
    LinePtr += 3;
}


I was thinking something like that would be a good idea, but in that case I would presumably have to change the internal dimensions of the float ** matrix to [height][width]. Do you have a better suggestion?

I am doing things like Gaussian blur and Laplace filtering on this, so keeping it in a one-dimensional array would really complicate things. But my code has a lot of double and even triple for loops with dereferencing like mat[i][j][u][v], etc. I consider that slow, but do I have a better choice?
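
One possible middle ground, offered only as a sketch and not as something suggested elsewhere in this thread, is to keep a single contiguous allocation but hide the index arithmetic behind a small wrapper, so filter code can still address pixels as (x, y). The class name `Matrix2D` and its members are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical wrapper: one contiguous block of floats, addressed as (x, y).
class Matrix2D
{
public:
    Matrix2D(int width, int height)
        : width_(width), height_(height),
          data_(static_cast<std::size_t>(width) * static_cast<std::size_t>(height))
    {}

    // Bounds-checked (in debug builds) element access.
    float& at(int x, int y)
    {
        assert(x >= 0 && x < width_ && y >= 0 && y < height_);
        return data_[static_cast<std::size_t>(y) * width_ + x];
    }

    // Pointer to the first element of row y, for tight inner loops.
    float* row(int y)
    {
        assert(y >= 0 && y < height_);
        return data_.data() + static_cast<std::size_t>(y) * width_;
    }

    int width() const { return width_; }
    int height() const { return height_; }

private:
    int width_;
    int height_;
    std::vector<float> data_; // row-major, zero-initialized
};
```

With this layout, `row(y)` plays the role of `arrptr` above, and a whole image row sits next to its neighbors in memory, which tends to suit neighborhood filters such as blurs.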
 
 
Comments
JackDingler 22-Nov-11 19:41pm    
You can keep it as a 2D array.
Your change is what I had in mind.
Your main optimizations should be in the innermost loop. After all, the code executing there is happening (height * width) times. The array lookup and the multiply aren't worth optimizing out of the outer loop.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


