This is often caused by invalid kernel launch parameters. When debugging, print the block and thread counts to check that they are within the limits of your hardware.
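As a rough sketch of such a bounds check, the host-side helper below compares a 1-D launch configuration against a device's limits. `LaunchLimits` and `checkLaunch` are hypothetical names made up for this illustration. In a real program the two limits would come from `cudaGetDeviceProperties()` (the `maxThreadsPerBlock` and `maxGridSize[0]` fields of `cudaDeviceProp`); here they are passed in as plain integers so the check can be tried without a GPU.

```cpp
#include <string>

// Hypothetical plain-data stand-in for the cudaDeviceProp fields used here.
struct LaunchLimits {
    int maxThreadsPerBlock;  // would be cudaDeviceProp::maxThreadsPerBlock
    int maxGridDimX;         // would be cudaDeviceProp::maxGridSize[0]
};

// Returns an empty string if a 1-D launch <<<blocks, threads>>> fits the
// given limits, otherwise a short description of the first problem found.
std::string checkLaunch(long long blocks, long long threads,
                        const LaunchLimits& lim) {
    if (blocks < 1)  return "blocks must be at least 1";
    if (threads < 1) return "threads must be at least 1";
    if (threads > lim.maxThreadsPerBlock) return "too many threads per block";
    if (blocks > lim.maxGridDimX) return "too many blocks in the grid";
    return "";
}
```

For instance, assuming a limit of 1024 threads per block, `checkLaunch(1, 2048, lim)` reports "too many threads per block", while a launch with zero blocks is flagged before the kernel is ever called.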
A second way of debugging is to call cudaGetLastError() right after the kernel launch. It returns a cudaError_t, which you can pass to cudaGetErrorString() to get a readable message.
For example:
cout << "Blocks: " << blocks << ", Threads: " << threads << '\n';
kernel<<< blocks, threads >>>(params);
// An invalid launch configuration is reported here, right after the launch.
cout << "Error: " << cudaGetErrorString(cudaGetLastError()) << '\n';