Click here to Skip to main content
15,886,741 members
Please Sign up or sign in to vote.
1.00/5 (1 vote)
I am new to openmp. If anyone could suggest some more changes to get this program work faster, it would be very helpful. In this program values.txt is a file containing 60 crore ( 600 millions) lines with each line having a 32 bit permutation of 16 0's and 16 1's.
Explanation-


It would be very tough to explain here but I will try.
I have generated all of the 32 bit permutations of 16 0's and 16 1's line by line in a text file, values.txt.
Eg-
00000000000000001111111111111111
00000000000000010111111111111111
00000000000000011011111111111111
00000000000000011101111111111111
00000000000000011110111111111111
00000000000000011111011111111111
00000000000000011111101111111111
00000000000000011111110111111111
00000000000000011111111011111111
and so on....

Let us consider that each of the line of the text file is a boolean function.
I need to check for the reversibility of this function in a domain.

For this I picked up the first line from the text file and stored it into a column matrix of dimension 32x1, matrix a[][].

inside the nested for loops I am basically generating the domain values in form of a 3x3 matrix for which I need to check for the reversibility of the function.
I created a matrix g[][] of dimension 3x3 that is going to store the binary representation of all no. from 1 to 2^9. eg-
for 0 matrix g would look like-
0 0 0
0 0 0
0 0 0

for 1, matrix g would be-

0 0 0
0 0 0
0 0 1

for 2 matrix g would be

0 0 0
0 0 0
0 1 0

and so on upto 2^9.

for each matrix generated above from 0 to 2^9, I am computing a new matrix u[][] of dimension 3x3 based on my function.
This is done by reading 5 adjacent values to each element of the matrix.

for eg- consider g matrix to be
0 0 0
0 1 1
1 0 0

I pickup the first element,i.e,g[0][0], compute a new value for it using the five adjacent values(top value,left value,element itself,right value,below value) namely g[2][0],g[0][2],g[0][0],g[0][1],g[1][0]. These 5 no. combinely represent a binary no. I calculate its decimal equivalent and the decimal value corresponds to the row no. of matrix a[][] with which I have to update the vale of u[0][0].
I will repeat the above process for every element of g and will finally have a u matrix of 3x3.

this complete process was for one matrix, that it matrix corresponding to 0.
Like this for every g[][] matrix from 0 to 2^9, I will create 2^9 matrices.

At any point of time if for two matrices g[][], matrix u[][] happens to be same I abort the function, reading the second line of text file and again begin the above process, i.e., I am not interested with functions that result in duplicate matrices. If all of the 2^9 matrices happen to be different, I write the value of the corresponding function(line from text file) into another text file.

So therefore,summing up, I need to create a total of 60 crore* 2^9 matrices for the overall computation.


The thing is that for a particular function from the text files,the 2^9 matrices are calculated individually. If somehow I could parallelize them, I would lessen the computation time greatly... and there is where I need help.

I hope you got the method.
Also actually I used three nested loops because this was a sample program. In actual I donot need to calculate 2^9 matrices but a total of 2^128 matrices. My actual g matrix would be of order 16x8 and same will be u matrix. And since there is no 128 bit datatype and openMP only support integer datatype in standard library, I thought of using nested loops.

New program Updated:
C++
#include <algorithm>
#include <fstream>
#include <iostream>
#include <string>
#include <math.h>
using namespace std;
#include <boost/multiprecision/cpp_int.hpp>
using namespace boost::multiprecision;
#include <boost/lexical_cast.hpp>
#include <cctype>
#include <boost/assign/list_of.hpp>
#include <set>
#include <stdint.h>
#include <omp.h>
#define convertToString(x) #x
using namespace boost::assign;

int main()
{
    int xyz=0;
//    omp_set_dynamic(0);
//    omp_set_num_threads(1);
//    Dec2Bin in;

    ifstream infile;
    infile.open("values.txt");
    ofstream outfile;
    outfile.open("haha.txt");
    short a[32][1];
    while(!infile.eof())
    {
        string STRING;
        getline(infile,STRING);
        set<string> SET;
        int count=0;


        for(int i=0;i<32;i++)
        {
                a[i][0]=STRING.at(i)-'0';
        }


        int g[9];
        int u[9];
        char buffer[10];
        buffer[9] = 0;
        uint16_t f = 0;

        int max = (int)pow(2,3);


        for(int r=0;r<max && count!=1;r++)
        {
           for(int s=0;s<max && count!=1;s++)
           {
              for(int t=0;t<max && count!=1;t++)
              {
                for(int i = 0; i < 9; ++i)
                {
                   g[i] = (f & (1 << (8 - i))) != 0;
                }
                ++f;

                u[0]=a[(g[6]*2*2*2*2)+(g[2]*2*2*2)+(g[0]*2*2)+(g[1]*2)+(g[3]*1)][0];
                u[1]=a[(g[7]*2*2*2*2)+(g[0]*2*2*2)+(g[1]*2*2)+(g[2]*2)+(g[4]*1)][0];
                u[2]=a[(g[8]*2*2*2*2)+(g[1]*2*2*2)+(g[2]*2*2)+(g[0]*2)+(g[5]*1)][0];
                u[3]=a[(g[0]*2*2*2*2)+(g[5]*2*2*2)+(g[3]*2*2)+(g[4]*2)+(g[6]*1)][0];
                u[4]=a[(g[1]*2*2*2*2)+(g[3]*2*2*2)+(g[4]*2*2)+(g[5]*2)+(g[7]*1)][0];
                u[5]=a[(g[2]*2*2*2*2)+(g[4]*2*2*2)+(g[5]*2*2)+(g[3]*2)+(g[8]*1)][0];
                u[6]=a[(g[3]*2*2*2*2)+(g[8]*2*2*2)+(g[6]*2*2)+(g[7]*2)+(g[0]*1)][0];
                u[7]=a[(g[4]*2*2*2*2)+(g[6]*2*2*2)+(g[7]*2*2)+(g[8]*2)+(g[1]*1)][0];
                u[8]=a[(g[5]*2*2*2*2)+(g[7]*2*2*2)+(g[8]*2*2)+(g[6]*2)+(g[2]*1)][0];

                
                for(int i = 0; i < 9; ++i)
                {
                   buffer[i] = '0' + u[i];
                }
                if(!SET.insert(::std::string(buffer)).second)
                {
                   count = 1;
                }
             }
          }
        }

        if(count==0)
        {
           /* xyz++;
            if(xyz>3)
            break; */
            outfile<<STRING<<"\n";
            cout<<STRING<<"\n";
        }


    }
        infile.close();
        outfile.close();
        return 0;
    }


What I have tried:

I tried a lot parallelizing this code and ended up with some modifications but I still feel I am not getting the desired timing.
Posted
Updated 10-May-16 9:35am
v13
Comments
Patrice T 8-May-16 3:43am    
What the goal of the program ? with example
By the way: how much is the gain of multi-threads ?
Member 12509578 8-May-16 8:59am    
It would be very tough to explain here but I will try.
I have generated all of the 32 bit permutations of 16 0's and 16 1's line by line in a text file, values.txt.
Eg-
00000000000000001111111111111111
00000000000000010111111111111111
00000000000000011011111111111111
00000000000000011101111111111111
00000000000000011110111111111111
00000000000000011111011111111111
00000000000000011111101111111111
00000000000000011111110111111111
00000000000000011111111011111111
and so on....

Let us consider that each of the line of the text file is a boolean function.
I need to check for the reversibility of this function in a domain.

For this I picked up the first line from the text file and stored it into a column matrix of dimension 32x1, matrix a[][].

inside the nested for loops I am basically generating the domain values in form of a 3x3 matrix for which I need to check for the reversibility of the function.
I created a matrix g[][] of dimension 3x3 that is going to store the binary representation of all no. from 1 to 2^9. eg-
for 0 matrix g would look like-
0 0 0
0 0 0
0 0 0

for 1, matrix g would be-

0 0 0
0 0 0
0 0 1

for 2 matrix g would be

0 0 0
0 0 0
0 1 0

and so on upto 2^9.

for each matrix generated above from 0 to 2^9, I am computing a new matrix u[][] of dimension 3x3 based on my function.
This is done by reading 5 adjacent values to each element of the matrix.

for eg- consider g matrix to be
0 0 0
0 1 1
1 0 0

I pickup the first element,i.e,g[0][0], compute a new value for it using the five adjacent values(top value,left value,element itself,right value,below value) namely g[2][0],g[0][2],g[0][0],g[0][1],g[1][0]. These 5 no. combinely represent a binary no. I calculate its decimal equivalent and the decimal value corresponds to the row no. of matrix a[][] with which I have to update the vale of u[0][0].
I will repeat the above process for every element of g and will finally have a u matrix of 3x3.

this complete process was for one matrix, that it matrix corresponding to 0.
Like this for every g[][] matrix from 0 to 2^9, I will create 2^9 matrices.

At any point of time if for two matrices g[][], matrix u[][] happens to be same I abort the function, reading the second line of text file and again begin the above process, i.e., I am not interested with functions that result in duplicate matrices. If all of the 2^9 matrices happen to be different, I write the value of the corresponding function(line from text file) into another text file.

So therefore,summing up, I need to create a total of 60 crore* 2^9 matrices for the overall computation.


The thing is that for a particular function from the text files,the 2^9 matrices are calculated individually. If somehow I could parallelize them, I would lessen the computation time greatly... and there is where I need help.

I hope you got the method.

Patrice T 8-May-16 9:17am    
Use Improve question to update your question.
So that every will pay attention to this ingormation.

1 solution

Before using threads, you should start by optimizing your code.

Just a few optimizations:
- Everywhere, you use convertToString(0), replace with '0'. Same for convertToString(1) and '1'.

Wait ...
you have 3 nested loops
C++
for(int r=0;r<(int)pow(2,3);r++)
{
    for(int s=0;s<(int)pow(2,3);s++)
    {

        for(int t=0;t<(int)pow(2,3);t++)
        {

Why do you have this ? the 3 variables r, s and t are not used inside the loops. I can't see any reason for this.

I think a complete rewrite is in order.

By the way: how much is the actual gain of multi-threads ?

[Update]
I think you are on the wrong way.
1) if your project match a known problem, give the name, give it and reference.
2) you should explain the real project and show real code.
3) for now forget about Openmp.
4) get the real code optimized for 1 thread.
5) then you can think of multi-threading.

Expect only a few hints. This kind on optimization for multi-threading is professional job and professionals get paid for this kind of thing (and not peanuts) because it is not even for every professional.
The question by itself is out of the scope of a quick answer, it takes hours to analyze what you have done and device the right optimizations.

[Update]
There is more optimizations that are possible, but you have a bigger problem.
No computer in the word can get track of 2^128 matrix, they don't have enough memory, HDD included, computer farm included, data centers included.

To verify, try to run a full sized program with an entry value that satisfy the conditions. the program will crash with out of memory.
 
Share this answer
 
v4
Comments
Member 12509578 8-May-16 16:43pm    
Actually I used three nested loops because this was a sample program. In actual I donot need to calculate 2^9 matrices but a total of 2^128 matrices. My actual g matrix would be of order 16x8 and same will be u matrix. And since there is no 128 bit datatype and openMP only support integer datatype in standard library, I thought of using nested loops.
and I am not a good programmer in C++, I considered your suggestion of using buffer but was not getting the same output because I need a 9 bit binary representaion and it was returning just 2 bit.
If you could please implement the dec2bin function, I will be grateful.
Member 12509578 9-May-16 20:00pm    
Modified and optimised my code and updated in my question. Hope to get suggestions on parallelisation if no more changes are required.
Patrice T 9-May-16 22:08pm    
There is more optimizations that are possible, but you have a bigger problem.
No computer in the word can get track of 2^128 matrix, they don't have enough memory, HDD included, computer farm included, data centers included.

To verify, try to run a full sized program with an entry value that satisfy the conditions. the program will crash with out of memory.
Member 12509578 10-May-16 6:25am    
I am trying to implement it on a super-computer(HPC) with 100TB hard disks and 512gb ram. Still not possible??
Patrice T 10-May-16 7:04am    
Simply sizing
2^10= 1024 saying 1000
Even 1000 terabytes= 10^15 bytes = 2^50 bytes= 2^53 bits
you want to distinguish 2^128 matrix and 10 times your disk is only 2^53

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



CodeProject, 20 Bay Street, 11th Floor Toronto, Ontario, Canada M5J 2N8 +1 (416) 849-8900