Hi,

The second argument passed to the CUDA kernel function (here B, a float *) contains garbage values. :confused: I need suggestions.

Kernel function:
__global__ void fmultiply(float* A, float *B, float *C)
{
      int idx = blockIdx.x*blockDim.x + threadIdx.x;
      // B[idx] gives a garbage value here.
      C[idx] = A[idx]*B[idx];
}


Main File:


int N = 10; // the arrays hold at most 10 elements
size_t size = N*sizeof(float);
...
cudaMalloc((void**)&a_d, size);
cudaMalloc((void**)&b_d, size);
cudaMalloc((void**)&c_d, size);

...

cudaMemcpy(a_d, a_h, size, cudaMemcpyHostToDevice);
cudaMemcpy(b_d, b_h, size, cudaMemcpyHostToDevice);

int threadsPerBlock = 256;
int noOfBlocks = (N/threadsPerBlock);

// Call the kernel function
fmultiply<<<threadsPerBlock, noOfBlocks>>>(a_d, b_d, c_d);

cudaMemcpy(c_d, c_h, size, cudaMemcpyDeviceToHost);
......

cudaFree(a_d);
cudaFree(b_d);
cudaFree(c_d);

Please also inspect the contents of b_d
before the call to fmultiply(..) :)

You could also guard the function
with a critical section,
but nothing in it modifies B...

...so you will probably need to look for other "changers" in the kernel... :)
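
One way to act on the suggestion above is to copy b_d back into a scratch buffer right before the launch and compare it with b_h. The sketch below is only an illustration: the helper name deviceMatchesHost and its signature are assumptions, not part of the original post.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper (not from the original post): copies n floats back from
// the device pointer dev and compares them with the host array host.
// Returns true only if the copy succeeds and every element matches.
bool deviceMatchesHost(const float* dev, const float* host, int n)
{
    std::vector<float> scratch(n);
    cudaError_t err = cudaMemcpy(scratch.data(), dev, n * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaMemcpy failed: %s\n", cudaGetErrorString(err));
        return false;
    }
    for (int i = 0; i < n; ++i) {
        if (scratch[i] != host[i]) {
            std::fprintf(stderr, "mismatch at %d: device %f, host %f\n",
                         i, scratch[i], host[i]);
            return false;
        }
    }
    return true;
}
```

Calling deviceMatchesHost(b_d, b_h, N) just before fmultiply is launched should tell you whether the upload itself went wrong. If it returns true, the data reached the device intact and the problem is more likely in the launch or in the copy back.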
 
Yes, I tried this:
__global__ void fmultiply(float* A, float *B, float *C)
{
      int idx = blockIdx.x*blockDim.x + threadIdx.x;

      if(idx<N)
          C[idx] = A[idx]*B[idx];
}


But still no luck. :mad:

Is this the correct way to access variables inside a kernel?
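
Judging only from the code as posted, the indexing inside the kernel looks fine, but a few things on the host side seem suspicious. N/threadsPerBlock is integer division, so with N = 10 it evaluates to 0. The launch syntax takes the number of blocks first and the threads per block second, and the posted call has them the other way round. The final cudaMemcpy names c_d as the destination although the direction is cudaMemcpyDeviceToHost. The bounds check also needs N to be visible in device code, and passing it as a kernel parameter is one way to do that. Here is a minimal sketch with those points changed. The wrapper name multiplyOnDevice and the extra parameter n are assumptions made for illustration, and this may well not be the only cause of what you are seeing.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Element-wise product; n is passed in so the kernel can guard its own bounds.
__global__ void fmultiply(const float* A, const float* B, float* C, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)
        C[idx] = A[idx] * B[idx];
}

// Hypothetical host wrapper (name and signature are assumptions).
// Computes c_h[i] = a_h[i] * b_h[i] on the GPU; returns false on any CUDA error.
bool multiplyOnDevice(const float* a_h, const float* b_h, float* c_h, int n)
{
    if (n <= 0) return false;
    size_t size = static_cast<size_t>(n) * sizeof(float);
    float *a_d = nullptr, *b_d = nullptr, *c_d = nullptr;

    bool ok = cudaMalloc((void**)&a_d, size) == cudaSuccess
           && cudaMalloc((void**)&b_d, size) == cudaSuccess
           && cudaMalloc((void**)&c_d, size) == cudaSuccess
           && cudaMemcpy(a_d, a_h, size, cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemcpy(b_d, b_h, size, cudaMemcpyHostToDevice) == cudaSuccess;

    if (ok) {
        int threadsPerBlock = 256;
        // Round up: n = 10 gives 1 block, where n / threadsPerBlock would give 0.
        int noOfBlocks = (n + threadsPerBlock - 1) / threadsPerBlock;

        // Number of blocks first, threads per block second.
        fmultiply<<<noOfBlocks, threadsPerBlock>>>(a_d, b_d, c_d, n);

        // Reports errors from the launch itself, such as an invalid configuration.
        cudaError_t err = cudaGetLastError();
        if (err == cudaSuccess) {
            // Destination first: the host buffer c_h receives the device result c_d.
            err = cudaMemcpy(c_h, c_d, size, cudaMemcpyDeviceToHost);
        }
        if (err != cudaSuccess) {
            std::fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
            ok = false;
        }
    }

    // cudaFree accepts a null pointer, so this is safe after a partial failure.
    cudaFree(a_d);
    cudaFree(b_d);
    cudaFree(c_d);
    return ok;
}
```

With your host arrays this would be called as multiplyOnDevice(a_h, b_h, c_h, N). Checking cudaGetLastError() right after the launch is worth keeping even once it works, because a bad launch configuration otherwise fails silently and leaves C untouched.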
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


