Need help parallelizing the C++ code using openmp

Question

1.00/5 (1 vote)

See more:

I am new to openmp. If anyone could suggest some more changes to get this program work faster, it would be very helpful. In this program values.txt is a file containing 60 crore ( 600 millions) lines with each line having a 32 bit permutation of 16 0's and 16 1's.
Explanation-

It would be very tough to explain here but I will try.
I have generated all of the 32 bit permutations of 16 0's and 16 1's line by line in a text file, values.txt.
Eg-
00000000000000001111111111111111
00000000000000010111111111111111
00000000000000011011111111111111
00000000000000011101111111111111
00000000000000011110111111111111
00000000000000011111011111111111
00000000000000011111101111111111
00000000000000011111110111111111
00000000000000011111111011111111
and so on....

Let us consider that each of the line of the text file is a boolean function.
I need to check for the reversibility of this function in a domain.

For this I picked up the first line from the text file and stored it into a column matrix of dimension 32x1, matrix a[][].

inside the nested for loops I am basically generating the domain values in form of a 3x3 matrix for which I need to check for the reversibility of the function.
I created a matrix g[][] of dimension 3x3 that is going to store the binary representation of all no. from 1 to 2^9. eg-
for 0 matrix g would look like-
0 0 0
0 0 0
0 0 0

for 1, matrix g would be-

0 0 0
0 0 0
0 0 1

for 2 matrix g would be

0 0 0
0 0 0
0 1 0

and so on upto 2^9.

for each matrix generated above from 0 to 2^9, I am computing a new matrix u[][] of dimension 3x3 based on my function.
This is done by reading 5 adjacent values to each element of the matrix.

for eg- consider g matrix to be
0 0 0
0 1 1
1 0 0

I pickup the first element,i.e,g[0][0], compute a new value for it using the five adjacent values(top value,left value,element itself,right value,below value) namely g[2][0],g[0][2],g[0][0],g[0][1],g[1][0]. These 5 no. combinely represent a binary no. I calculate its decimal equivalent and the decimal value corresponds to the row no. of matrix a[][] with which I have to update the vale of u[0][0].
I will repeat the above process for every element of g and will finally have a u matrix of 3x3.

this complete process was for one matrix, that it matrix corresponding to 0.
Like this for every g[][] matrix from 0 to 2^9, I will create 2^9 matrices.

At any point of time if for two matrices g[][], matrix u[][] happens to be same I abort the function, reading the second line of text file and again begin the above process, i.e., I am not interested with functions that result in duplicate matrices. If all of the 2^9 matrices happen to be different, I write the value of the corresponding function(line from text file) into another text file.

So therefore,summing up, I need to create a total of 60 crore* 2^9 matrices for the overall computation.

The thing is that for a particular function from the text files,the 2^9 matrices are calculated individually. If somehow I could parallelize them, I would lessen the computation time greatly... and there is where I need help.

I hope you got the method.
Also actually I used three nested loops because this was a sample program. In actual I donot need to calculate 2^9 matrices but a total of 2^128 matrices. My actual g matrix would be of order 16x8 and same will be u matrix. And since there is no 128 bit datatype and openMP only support integer datatype in standard library, I thought of using nested loops.

New program Updated:

C++

#include <algorithm>
#include <fstream>
#include <iostream>
#include <string>
#include <math.h>
using namespace std;
#include <boost/multiprecision/cpp_int.hpp>
using namespace boost::multiprecision;
#include <boost/lexical_cast.hpp>
#include <cctype>
#include <boost/assign/list_of.hpp>
#include <set>
#include <stdint.h>
#include <omp.h>
#define convertToString(x) #x
using namespace boost::assign;

int main()
{
    int xyz=0;
//    omp_set_dynamic(0);
//    omp_set_num_threads(1);
//    Dec2Bin in;

    ifstream infile;
    infile.open("values.txt");
    ofstream outfile;
    outfile.open("haha.txt");
    short a[32][1];
    while(!infile.eof())
    {
        string STRING;
        getline(infile,STRING);
        set<string> SET;
        int count=0;


        for(int i=0;i<32;i++)
        {
                a[i][0]=STRING.at(i)-'0';
        }


        int g[9];
        int u[9];
        char buffer[10];
        buffer[9] = 0;
        uint16_t f = 0;

        int max = (int)pow(2,3);


        for(int r=0;r<max && count!=1;r++)
        {
           for(int s=0;s<max && count!=1;s++)
           {
              for(int t=0;t<max && count!=1;t++)
              {
                for(int i = 0; i < 9; ++i)
                {
                   g[i] = (f & (1 << (8 - i))) != 0;
                }
                ++f;

                u[0]=a[(g[6]*2*2*2*2)+(g[2]*2*2*2)+(g[0]*2*2)+(g[1]*2)+(g[3]*1)][0];
                u[1]=a[(g[7]*2*2*2*2)+(g[0]*2*2*2)+(g[1]*2*2)+(g[2]*2)+(g[4]*1)][0];
                u[2]=a[(g[8]*2*2*2*2)+(g[1]*2*2*2)+(g[2]*2*2)+(g[0]*2)+(g[5]*1)][0];
                u[3]=a[(g[0]*2*2*2*2)+(g[5]*2*2*2)+(g[3]*2*2)+(g[4]*2)+(g[6]*1)][0];
                u[4]=a[(g[1]*2*2*2*2)+(g[3]*2*2*2)+(g[4]*2*2)+(g[5]*2)+(g[7]*1)][0];
                u[5]=a[(g[2]*2*2*2*2)+(g[4]*2*2*2)+(g[5]*2*2)+(g[3]*2)+(g[8]*1)][0];
                u[6]=a[(g[3]*2*2*2*2)+(g[8]*2*2*2)+(g[6]*2*2)+(g[7]*2)+(g[0]*1)][0];
                u[7]=a[(g[4]*2*2*2*2)+(g[6]*2*2*2)+(g[7]*2*2)+(g[8]*2)+(g[1]*1)][0];
                u[8]=a[(g[5]*2*2*2*2)+(g[7]*2*2*2)+(g[8]*2*2)+(g[6]*2)+(g[2]*1)][0];

                
                for(int i = 0; i < 9; ++i)
                {
                   buffer[i] = '0' + u[i];
                }
                if(!SET.insert(::std::string(buffer)).second)
                {
                   count = 1;
                }
             }
          }
        }

        if(count==0)
        {
           /* xyz++;
            if(xyz>3)
            break; */
            outfile<<STRING<<"\n";
            cout<<STRING<<"\n";
        }


    }
        infile.close();
        outfile.close();
        return 0;
    }

What I have tried:

I tried a lot parallelizing this code and ended up with some modifications but I still feel I am not getting the desired timing.

Posted 7-May-16 21:19pm

Member 12509578

Updated 10-May-16 9:35am

v13

Add a Solution

Comments

Patrice T 8-May-16 3:43am

What the goal of the program ? with example
By the way: how much is the gain of multi-threads ?

Member 12509578 8-May-16 8:59am

It would be very tough to explain here but I will try.
I have generated all of the 32 bit permutations of 16 0's and 16 1's line by line in a text file, values.txt.
Eg-
00000000000000001111111111111111
00000000000000010111111111111111
00000000000000011011111111111111
00000000000000011101111111111111
00000000000000011110111111111111
00000000000000011111011111111111
00000000000000011111101111111111
00000000000000011111110111111111
00000000000000011111111011111111
and so on....

Let us consider that each of the line of the text file is a boolean function.
I need to check for the reversibility of this function in a domain.

For this I picked up the first line from the text file and stored it into a column matrix of dimension 32x1, matrix a[][].

inside the nested for loops I am basically generating the domain values in form of a 3x3 matrix for which I need to check for the reversibility of the function.
I created a matrix g[][] of dimension 3x3 that is going to store the binary representation of all no. from 1 to 2^9. eg-
for 0 matrix g would look like-
0 0 0
0 0 0
0 0 0

for 1, matrix g would be-

0 0 0
0 0 0
0 0 1

for 2 matrix g would be

0 0 0
0 0 0
0 1 0

and so on upto 2^9.

for each matrix generated above from 0 to 2^9, I am computing a new matrix u[][] of dimension 3x3 based on my function.
This is done by reading 5 adjacent values to each element of the matrix.

for eg- consider g matrix to be
0 0 0
0 1 1
1 0 0

I pickup the first element,i.e,g[0][0], compute a new value for it using the five adjacent values(top value,left value,element itself,right value,below value) namely g[2][0],g[0][2],g[0][0],g[0][1],g[1][0]. These 5 no. combinely represent a binary no. I calculate its decimal equivalent and the decimal value corresponds to the row no. of matrix a[][] with which I have to update the vale of u[0][0].
I will repeat the above process for every element of g and will finally have a u matrix of 3x3.

this complete process was for one matrix, that it matrix corresponding to 0.
Like this for every g[][] matrix from 0 to 2^9, I will create 2^9 matrices.

At any point of time if for two matrices g[][], matrix u[][] happens to be same I abort the function, reading the second line of text file and again begin the above process, i.e., I am not interested with functions that result in duplicate matrices. If all of the 2^9 matrices happen to be different, I write the value of the corresponding function(line from text file) into another text file.

So therefore,summing up, I need to create a total of 60 crore* 2^9 matrices for the overall computation.

The thing is that for a particular function from the text files,the 2^9 matrices are calculated individually. If somehow I could parallelize them, I would lessen the computation time greatly... and there is where I need help.

I hope you got the method.

Patrice T 8-May-16 9:17am

Use Improve question to update your question.
So that every will pay attention to this ingormation.

1 solution

Add a Solution

Add your solution here

Treat my content as plain text, not as HTML

Preview 0

…

Existing Members

Sign in to your account

...or Join us

Download, Vote, Comment, Publish.

Your Email
Password
Forgot your password?

Your Email
This email is in use. Do you need your password?
Optional Password

I have read and agree to the Terms of Service and Privacy Policy
Please subscribe me to the CodeProject newsletters

When answering a question please:

Read the question carefully.
Understand that English isn't everyone's first language so be lenient of bad spelling and grammar.
If a question is poorly phrased then either ask for clarification, ignore it, or edit the question and fix the problem. Insults are not welcome.
Don't tell someone to read the manual. Chances are they have and don't get it. Provide an answer or move on to the next question.

Let's work to help developers, not make them feel stupid.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

Patrice T · Answer 1 · 2016-05-08T09:38:00

Before using threads, you should start by optimizing your code.

Just a few optimizations:
- Everywhere, you use convertToString(0), replace with '0'. Same for convertToString(1) and '1'.

Wait ...
you have 3 nested loops

C++

for(int r=0;r<(int)pow(2,3);r++)
{
    for(int s=0;s<(int)pow(2,3);s++)
    {

        for(int t=0;t<(int)pow(2,3);t++)
        {

Why do you have this ? the 3 variables r, s and t are not used inside the loops. I can't see any reason for this.

I think a complete rewrite is in order.

By the way: how much is the actual gain of multi-threads ?

[Update]
I think you are on the wrong way.
1) if your project match a known problem, give the name, give it and reference.
2) you should explain the real project and show real code.
3) for now forget about Openmp.
4) get the real code optimized for 1 thread.
5) then you can think of multi-threading.

Expect only a few hints. This kind on optimization for multi-threading is professional job and professionals get paid for this kind of thing (and not peanuts) because it is not even for every professional.
The question by itself is out of the scope of a quick answer, it takes hours to analyze what you have done and device the right optimizations.

[Update]
There is more optimizations that are possible, but you have a bigger problem.
No computer in the word can get track of 2^128 matrix, they don't have enough memory, HDD included, computer farm included, data centers included.

To verify, try to run a full sized program with an entry value that satisfy the conditions. the program will crash with out of memory.