Click here to Skip to main content
15,867,870 members
Articles / Programming Languages / CUDA
Article

Part 6: Primitive Restart and OpenGL Interoperability

2 Apr 2012CPOL6 min read 64.9K   11   6
This sixth article in a series on portable multithreaded programming using OpenCL™ where Rob Farber discusses how to calculate data in OpenCL™ and render it with OpenGL within the same application.

This article is in the Product Showcase section for our sponsors at CodeProject. These articles are intended to provide you with information on products and services that we consider useful and of value to developers.

Topics covered include configuration, buffer interoperability, and the use of primitive restart to accelerate GPU rendering.  The previous article, part 5, of this series, demonstrated how to tie computation to data in a multi-device, multi-GPU environment.

Introduction

OpenGL is a common graphical programming API (Application Programming Interface) that is standards based, cross-language, and cross-platform. It can be used to create applications that can render 2D and 3D images on most visualization hardware. OpenGL and OpenCL interoperability can greatly accelerate both data generation as well as data visualization. Basically, the OpenCL application maps the OpenGL buffers so they can be modified by massively-parallel kernels running on the GPU. This keeps the data on the GPU and avoids costly PCIe bus transfers.

Primitive restart is a new feature added in the OpenGL 3.1 specification. In short, primitive restart lets the programmer define a numeric value  that acts as a token that tells the OpenGL state machine to restart an OpenGL rendering instruction to begin with the next data item. This has multiple advantages:

  • For the programmer, this means that multiple lines, triangle fans and irregular meshes can be rendered with a single command.
  • The developers of the OpenGL state machine can optimize their code so the test for the primitive restart token happens on the GPU thus eliminating host/GPU communications bottlenecks across the PCIe bus. 
  • Increase rendering performance can be obtained by arranging the data to achieve the highest reuse of the cache in the texture units.
  • Higher quality images can be created by alternating the direction of tessellation as noted in the primitive restart specification and illustrated in the following two figures:

Description: http://i.cmpnet.com/ddj/images/article/2010/1005/100527cuda18_f6.gif

Two triangle strips with artifacts

Description: http://i.cmpnet.com/ddj/images/article/2010/1005/100527cuda18_f7.gif

Data rendered as four triangle fans (the center marked with a filled circle)

This article is not intended as an OpenGL tutorial. There are a number of excellent books and tutorials available on the Internet. An excellent one is the NeHe tutorial series. The following application is intended to demonstrate OpenCL and OpenGL interoperability and the use of primitive restart.

Building the Application

Copy and paste the complete source listing for gltest.cpp into a file by that name. The complete source is provided at the end of this article after the walk through. Similarly, copy and paste the source for the OpenCL kernel, sinewave.cl.

The source can be compiled under Linux with the following command:

g++  -I
$ATISTREAMSDKROOT/include -L $ATISTREAMSDKROOT/lib/x86_64 gltest.cpp -lglut
-lGLEW -lOpenCL -o gltest

The application runs on the GPU by default. The program runs on the CPU when anything is specified on the command-line. This simple application requires that the OpenCL kernel be in the file sinewave.cl in the same directory.

./gltest # running on the GPU
./gltest CPU # This will run the application on the CPU

By default, the application starts by rendering a surface. Use the mouse to rotate and zoom the image. Notice that the colors as well as the shape of the image changes with time to show repeated computation by the OpenCL kernel.

Pressing ‘D’ or ‘d’ on the keyboard cycles the rendering mode from a surface, to a collection of points, to a set of lines, and finally back to a surface again as can be seen in the following three images.

Image 3

Default rendering as a surface

Image 4

The surface rendered as a set of colored points

Image 5

The surface rendered as a set of colored lines

Pressing ‘q’ exits the program.

A Walk Through the gltest.cpp Source Code

The initial section of the code specifies the include files, constants and a set of global variables.

C++
//Code by Rob Farber
#include <iostream>
#include <fstream>
using namespace std;
 
#include <CL/cl.h>
#include <CL/cl_gl.h>
#include <GL/glew.h>
#include <GL/glut.h>
#ifndef _WIN32
#include <GL/glxew.h>
#endif //!_WIN32
 
#define WIDTH  1408
#define HEIGHT 1024
 
// Globals used in the program
const unsigned int      mesh_width = 128, mesh_height = 128;
const unsigned int RestartIndex = 0xffffffff;
 
cl_platform_id          platform;         
cl_device_id            device;
cl_context              context;
cl_command_queue        queue;
cl_program              program;
cl_kernel               kernel;
size_t                  kernelsize;
size_t                  global[] = {mesh_width, mesh_height};

har                    *pathname = NULL;
char                    *source = NULL;

A VBO (Vertex Buffer Object) is an OpenGL memory buffer in the high speed memory of the GPU. The following source code shows that gltest.cpp utilizes two VBOs, one that describes the coordinates of a set of vertices for the surface and another that describes the color associated with each vertex. VBOs can also store information such as normals, texcoords, indicies, and other data.

C++
// Globals associated with the position vbo
const unsigned int p_vbo_size = mesh_width*mesh_height*4*sizeof(float); 
GLuint  p_vbo;
cl_mem  p_vbocl;
 
// Globals associated with the color vbo
const unsigned int c_vbo_size = mesh_width*mesh_height*4*sizeof(unsigned char); 
GLuint  c_vbo;
cl_mem  c_vbocl;

Primitive restart requires that a set of indices be used to point to each location of the data (or to the primitive restart token). The qIndicies vector holds these indices.

C++
// Globals associated with the indices for primitive restart
GLuint* qIndices=NULL;
int qIndices_size = 5*(mesh_height-1)*(mesh_width-1);

This code utilizes the OpenGL Utility Toolkit GLUT, which is a portable toolkit that runs under UNIX, Windows and other operating systems. GLUT provides an easy API to specify callbacks to functions that handle mouse, window, keyboard and other events. The following section provides some global variables for those callbacks as well as forward references to the functions.

C++
float   anim = 0.0;
int drawMode=GL_TRIANGLE_FAN; // the default draw mode
const char* drawStr="fan";
const char* platformString="notset";

// Globals associated with the mouse controls
int mouse_old_x, mouse_old_y;
int mouse_buttons = 0;
float rotate_x = 0.0, rotate_y = 0.0;
float translate_z = -2.5;
 
// Forward references for the GLUT callbacks
void display();
void motion(int x, int y);
void mouse(int button, int state, int x, int y);
void keyboard(unsigned char key, int x, int y);
void initgl(int argc, const char** argv);

For convenience, the title of the window is used to report important characteristics such as the device (CPU or GPU) and drawing mode. This information is specified with the following helper function:

// helper routine to set the window title
void setTitle()
{
  char title[256];
  sprintf(title, "GL Interop Wrapper: mode %s device %s",
         drawStr, platformString);  
  glutSetWindowTitle(title);
}

The first part of main initializes the window and provides a very simple command-line processing to select the CPU or the GPU.

C++
int main(int argc, const char **argv) 
{
  initgl(argc, argv);
  
  clGetPlatformIDs(1, &platform, NULL);
  if(argc > 1) {
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_CPU, 1, &device, NULL);
    platformString = "CPU";
  } else {
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
    platformString = "GPU";
  }

The OpenCL context and queue is created. Note that the context must be tied to the OpenGL context as highlighted  by the #ifdef preprocessor statements. Otherwise, nothing will be displayed.

C++
  // It is necessary to add the gl context to the properties or
  // nothing will display
#ifdef _WIN32
  HGLRC glCtx = wglGetCurrentContext();
#else //!_WIN32
  GLXContext glCtx = glXGetCurrentContext();
#endif //!_WIN32
  
  cl_context_properties props[] = { CL_CONTEXT_PLATFORM, 
                  (cl_context_properties)platform,
#ifdef _WIN32
                  CL_WGL_HDC_KHR, (intptr_t) wglGetCurrentDC(),
#else //!_WIN32
                  CL_GLX_DISPLAY_KHR, (intptr_t) glXGetCurrentDisplay(),
#endif //!_WIN32
                  CL_GL_CONTEXT_KHR, (intptr_t) glCtx, 0};
  
  // Create the context and the queue
  context = clCreateContext(props, 1, &device, NULL, NULL, NULL);
  
  queue = clCreateCommandQueue(context, device, 0, NULL);

Both the position and color VBOs are created. Note that the call to glBindBuffer allocates the space on the device. VBOs are mapped into the OpenCL memory space with a call to clCreateFromGLBuffer, which returns a pointer to global memory that can be passed to the OpenCL kernel.

C++
// create position p_vbo
glGenBuffers(1, &p_vbo);
glBindBuffer(GL_ARRAY_BUFFER, p_vbo);
// initialize buffer object
glBufferData(GL_ARRAY_BUFFER, p_vbo_size, 0, GL_DYNAMIC_DRAW);
// create OpenCL buffer from GL VBO
p_vbocl = clCreateFromGLBuffer(context, CL_MEM_WRITE_ONLY, p_vbo, NULL);

// create color c_vbo (very similar to the position vbo)
glGenBuffers(1, &c_vbo);
glBindBuffer(GL_ARRAY_BUFFER, c_vbo);
glBufferData(GL_ARRAY_BUFFER, c_vbo_size, 0, GL_DYNAMIC_DRAW);
c_vbocl = clCreateFromGLBuffer(context, CL_MEM_WRITE_ONLY, c_vbo, NULL);

Even though this is a C-API OpenCL program, C++ was used to load the program source. The OpenCL program is then built, the kernel arguments are specified, and the indices for primitive restart are defined.

C++
// For convenience use C++ to load the program source into memory
ifstream file("sinewave.cl");
string prog(istreambuf_iterator<char>(file), (istreambuf_iterator<char>()));
file.close();
const char* source = prog.c_str();
const size_t kernelsize = prog.length()+1;
program = clCreateProgramWithSource(context, 1, (const char**) &source,
                               &kernelsize, NULL);

// Build the program executable
int err = clBuildProgram(program, 0, NULL, NULL, NULL, NULL);
if (err != CL_SUCCESS) {
  size_t len;
  char buffer[2048];

  cerr << "Error: Failed to build program executable!" << endl;
  clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG,
                     sizeof(buffer), buffer, &len);
  cerr << buffer << endl;
  exit(1);
}

// Create the compute kernel in the program
kernel = clCreateKernel(program, "sinewave", &err);
if (!kernel || err != CL_SUCCESS) {
  cerr << "Error: Failed to create compute kernel!" << endl;
  exit(1);
}

// Set the kernel arguments. Note argument 3 is set in display
clSetKernelArg(kernel, 0, sizeof(cl_mem), (void*)&p_vbocl);
clSetKernelArg(kernel, 1, sizeof(unsigned int), &mesh_width);
clSetKernelArg(kernel, 2, sizeof(unsigned int), &mesh_height);
clSetKernelArg(kernel, 4, sizeof(cl_mem), (void*)&c_vbocl);

// Generate the indices for primitive restart
// allocate and assign trianglefan indicies
qIndices = (GLuint *) malloc(qIndices_size*sizeof(GLint));
int index=0;
for(int i=1; i < mesh_height; i++) {
  for(int j=1; j < mesh_width; j++) {
    qIndices[index++] = (i)*mesh_width + j;
    qIndices[index++] = (i)*mesh_width + j-1;
    qIndices[index++] = (i-1)*mesh_width + j-1;
    qIndices[index++] = (i-1)*mesh_width + j;
    qIndices[index++] = RestartIndex;
  }
}

The main method completes by setting the window title and calling the GLUT main loop, which does not exit.

C++
  setTitle(); 
  glutMainLoop();
}

The initgl method performs all the work of setting up the window, registering the callbacks, and defining some initial viewing conditions.

C++
// setup the window and assign callbacks
void initgl(int argc, const char** argv) 
{
  glutInit(&argc, (char**)argv);
  glutInitDisplayMode(GLUT_RGBA | GLUT_DOUBLE);
  glutInitWindowPosition (glutGet(GLUT_SCREEN_WIDTH)/2 - WIDTH/2, 
                       glutGet(GLUT_SCREEN_HEIGHT)/2 - HEIGHT/2);
  glutInitWindowSize(WIDTH, HEIGHT);
  glutCreateWindow("");
  
  glutDisplayFunc(display);       // register GLUT callback functions
  glutKeyboardFunc(keyboard);
  glutMouseFunc(mouse);
  glutMotionFunc(motion);
 
  glewInit();
  
  glClearColor(0.0, 0.0, 0.0, 1.0);
  glDisable(GL_DEPTH_TEST);
  
  glViewport(0, 0, WIDTH, HEIGHT);
  glMatrixMode(GL_PROJECTION);
  glLoadIdentity();
  gluPerspective(10000.0, (GLfloat)WIDTH / (GLfloat)HEIGHT, 0.1, 10.0);
  
  glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
  glMatrixMode(GL_MODELVIEW);
  glLoadIdentity();
  return;
}

The display method is where all the work happens. Basically, the OpenGL buffer objects are acquired, and the kernel is queued. The OpenGL buffers are released once the kernel completes.

C++
// This method is called everytime the screen is redisplayed. No
// optimization is performed as it recalculates the kernel every time.
void display() 
{
  anim += 0.01f;
  
  // map OpenGL buffer object for writing from OpenCL
  glFinish();
  clEnqueueAcquireGLObjects(queue, 1, &p_vbocl, 0,0,0);
  clEnqueueAcquireGLObjects(queue, 1, &c_vbocl, 0,0,0);
  
  // Set arg 3 and queue the kernel
  clSetKernelArg(kernel, 3, sizeof(float), &anim);
  clEnqueueNDRangeKernel(queue, kernel, 2, NULL, global, NULL, 0, 0, 0);
  
  // queue unmap buffer object
  clEnqueueReleaseGLObjects(queue, 1, &c_vbocl, 0,0,0);
  clEnqueueReleaseGLObjects(queue, 1, &p_vbocl, 0,0,0);
  clFinish(queue);
  
  // clear graphics
  glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);

The newly computed data is then rendered according to the drawing mode after which display cleans up. Note that primitive restart is enabled with the call to glEnableClientState and that the unique index is specified to the OpenGL state machine with glPrimitiveRestartIndexNV. Only a single glDrawElements call is made to render all the triangle fans.

C++
  // Apply the image transforms
  glMatrixMode(GL_MODELVIEW);
  glLoadIdentity();
  glTranslatef(0.0, 0.0, translate_z);
  glRotatef(rotate_x, 1.0, 0.0, 0.0);
  glRotatef(rotate_y, 0.0, 1.0, 0.0);

  //render from the p_vbo
  glBindBuffer(GL_ARRAY_BUFFER, p_vbo);
  glVertexPointer(4, GL_FLOAT, 0, 0);
  glEnableClientState(GL_VERTEX_ARRAY);

  // enable colors from the c_vbo
  glBindBuffer(GL_ARRAY_BUFFER, c_vbo);
  glColorPointer(4, GL_UNSIGNED_BYTE, 0, 0);
  glEnableClientState(GL_COLOR_ARRAY);

  // draw points, lines or triangles according to the user keyboard input
  switch(drawMode) {
  case GL_LINE_STRIP:
    for(int i=0 ; i < mesh_width*mesh_height; i+= mesh_width)
      glDrawArrays(GL_LINE_STRIP, i, mesh_width);
    break;
  case GL_TRIANGLE_FAN:
    glPrimitiveRestartIndexNV(RestartIndex);
    glEnableClientState(GL_PRIMITIVE_RESTART_NV);
    glDrawElements(GL_TRIANGLE_FAN, qIndices_size, GL_UNSIGNED_INT, qIndices);
    break;
  default:
    glDrawArrays(GL_POINTS, 0, mesh_width * mesh_height);
    break;
  }
  
  // handle housekeeping and redisplay
  glDisableClientState(GL_COLOR_ARRAY);
  glDisableClientState(GL_VERTEX_ARRAY);
  glutSwapBuffers();
  glutPostRedisplay();
}

The keyboard callback handles keyboard input. Pressing ‘D’ or ‘d’ cycles the drawing mode between rendering a surface, to a bunch of points, then to lines and back to a surface. Pressing a ‘q’ or ESC terminates the application.

C++
// Keyboard events handler for GLUT
void keyboard(unsigned char key, int x, int y)
{
  switch(key) {
  case('q') :
  case(27) :
    exit(0);
  break;
  case 'd':
  case 'D':
    switch(drawMode) {
    case GL_POINTS: drawMode = GL_LINE_STRIP; drawStr = "line"; break;
    case GL_LINE_STRIP: drawMode = GL_TRIANGLE_FAN; drawStr = "fan"; break;
    default: drawMode=GL_POINTS; drawStr = "points"; break;
    }
  } 
  setTitle();
  glutPostRedisplay();
}

Following are the mouse handlers.

C++
// Mouse event handler for GLUT
void mouse(int button, int state, int x, int y)
{
  if (state == GLUT_DOWN) {
    mouse_buttons |= 1<<button;
  } else if (state == GLUT_UP) {
    mouse_buttons = 0;
  }
  
  mouse_old_x = x;
  mouse_old_y = y;
  glutPostRedisplay();
}
 
// Motion event handler for GLUT
void motion(int x, int y)
{
  float dx, dy;
  dx = x - mouse_old_x;
  dy = y - mouse_old_y;
  
  if (mouse_buttons & 1) {
    rotate_x += dy * 0.2;
    rotate_y += dx * 0.2;
  } else if (mouse_buttons & 4) {
    translate_z += dy * 0.01;
  }
  
  mouse_old_x = x;
  mouse_old_y = y;
}

The complete source for gltest.cpp follows:

C++
//Code by Rob Farber
#include <iostream>
#include <fstream>
using namespace std;
 
#include <CL/cl.h>
#include <CL/cl_gl.h>
#include <GL/glew.h>
#include <GL/glut.h>
#ifndef _WIN32
#include <GL/glxew.h>
#endif //!_WIN32
 
#define WIDTH  1408
#define HEIGHT 1024
 
// Globals used in the program
const unsigned int      mesh_width = 128, mesh_height = 128;
const unsigned int RestartIndex = 0xffffffff;
 
cl_platform_id          platform;         
cl_device_id            device;
cl_context              context;
cl_command_queue        queue;
cl_program              program;
cl_kernel               kernel;
size_t                  kernelsize;
size_t                  global[] = {mesh_width, mesh_height};
char                    *pathname = NULL;
char                    *source = NULL; 
 
// Globals associated with the position vbo
const unsigned int p_vbo_size = mesh_width*mesh_height*4*sizeof(float); 
GLuint  p_vbo;
cl_mem  p_vbocl;
 
// Globals associated with the color vbo
const unsigned int c_vbo_size = mesh_width*mesh_height*4*sizeof(unsigned char); 
GLuint  c_vbo;
cl_mem  c_vbocl;
 
// Globals associated with the indices for primitive restart
GLuint* qIndices=NULL;
int qIndices_size = 5*(mesh_height-1)*(mesh_width-1);
float   anim = 0.0;
int drawMode=GL_TRIANGLE_FAN; // the default draw mode
const char* drawStr="fan";
const char* platformString="notset";
 
// Globals associated with the mouse controls
int mouse_old_x, mouse_old_y;
int mouse_buttons = 0;
float rotate_x = 0.0, rotate_y = 0.0;
float translate_z = -2.5;
 
// Forward references for the GLUT callbacks
void display();
void motion(int x, int y);
void mouse(int button, int state, int x, int y);
void keyboard(unsigned char key, int x, int y);
void initgl(int argc, const char** argv);
 
// helper routine to set the window title
void setTitle()
{
  char title[256];
  sprintf(title, "GL Interop Wrapper: mode %s device %s",
         drawStr, platformString);  
  glutSetWindowTitle(title);
}
 
int main(int argc, const char **argv) 
{
  initgl(argc, argv);
  
  clGetPlatformIDs(1, &platform, NULL);
  if(argc > 1) {
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_CPU, 1, &device, NULL);
    platformString = "CPU";
  } else {
    clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
    platformString = "GPU";
  }
  
  // It is necessary to add the gl context to the properties or
  // nothing will display
#ifdef _WIN32
  HGLRC glCtx = wglGetCurrentContext();
#else //!_WIN32
  GLXContext glCtx = glXGetCurrentContext();
#endif //!_WIN32
  
  cl_context_properties props[] = { CL_CONTEXT_PLATFORM, 
                  (cl_context_properties)platform,
#ifdef _WIN32
                  CL_WGL_HDC_KHR, (intptr_t) wglGetCurrentDC(),
#else //!_WIN32
                  CL_GLX_DISPLAY_KHR, (intptr_t) glXGetCurrentDisplay(),
#endif //!_WIN32
                  CL_GL_CONTEXT_KHR, (intptr_t) glCtx, 0};
  
  // Create the context and the queue
  context = clCreateContext(props, 1, &device, NULL, NULL, NULL);
  
  queue = clCreateCommandQueue(context, device, 0, NULL); 
  
  // create position p_vbo
  glGenBuffers(1, &p_vbo);
  glBindBuffer(GL_ARRAY_BUFFER, p_vbo);
  // initialize buffer object
  glBufferData(GL_ARRAY_BUFFER, p_vbo_size, 0, GL_DYNAMIC_DRAW);        
  // create OpenCL buffer from GL VBO
  p_vbocl = clCreateFromGLBuffer(context, CL_MEM_WRITE_ONLY, p_vbo, NULL);
 
  // create color c_vbo (very similar to the position vbo)
  glGenBuffers(1, &c_vbo);
  glBindBuffer(GL_ARRAY_BUFFER, c_vbo);
  glBufferData(GL_ARRAY_BUFFER, c_vbo_size, 0, GL_DYNAMIC_DRAW);        
  c_vbocl = clCreateFromGLBuffer(context, CL_MEM_WRITE_ONLY, c_vbo, NULL);
  
  // For convenience use C++ to load the program source into memory
  ifstream file("sinewave.cl");
  string prog(istreambuf_iterator<char>(file), (istreambuf_iterator<char>()));
  file.close();
  const char* source = prog.c_str();
  const size_t kernelsize = prog.length()+1;
  program = clCreateProgramWithSource(context, 1, (const char**) &source,
                                 &kernelsize, NULL);
 
  // Build the program executable
  int err = clBuildProgram(program, 0, NULL, NULL, NULL, NULL);
  if (err != CL_SUCCESS) {
    size_t len;
    char buffer[2048];
    
    cerr << "Error: Failed to build program executable!" << endl;
    clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG,
                       sizeof(buffer), buffer, &len);
    cerr << buffer << endl;
    exit(1);
  }
  
  // Create the compute kernel in the program
  kernel = clCreateKernel(program, "sinewave", &err);
  if (!kernel || err != CL_SUCCESS) {
    cerr << "Error: Failed to create compute kernel!" << endl;
    exit(1);
  }
  
  // Set the kernel arguments. Note argument 3 is set in display
  clSetKernelArg(kernel, 0, sizeof(cl_mem), (void*)&p_vbocl);
  clSetKernelArg(kernel, 1, sizeof(unsigned int), &mesh_width);
  clSetKernelArg(kernel, 2, sizeof(unsigned int), &mesh_height);
  clSetKernelArg(kernel, 4, sizeof(cl_mem), (void*)&c_vbocl);
  
  // Generate the indices for primitive restart
  // allocate and assign trianglefan indicies 
  qIndices = (GLuint *) malloc(qIndices_size*sizeof(GLint));
  int index=0;
  for(int i=1; i < mesh_height; i++) {
    for(int j=1; j < mesh_width; j++) {
      qIndices[index++] = (i)*mesh_width + j; 
      qIndices[index++] = (i)*mesh_width + j-1; 
      qIndices[index++] = (i-1)*mesh_width + j-1; 
      qIndices[index++] = (i-1)*mesh_width + j; 
      qIndices[index++] = RestartIndex;
    }
  }
  setTitle(); 
  glutMainLoop();
}
 
// setup the window and assign callbacks
void initgl(int argc, const char** argv) 
{
  glutInit(&argc, (char**)argv);
  glutInitDisplayMode(GLUT_RGBA | GLUT_DOUBLE);
  glutInitWindowPosition (glutGet(GLUT_SCREEN_WIDTH)/2 - WIDTH/2, 
                       glutGet(GLUT_SCREEN_HEIGHT)/2 - HEIGHT/2);
  glutInitWindowSize(WIDTH, HEIGHT);
  glutCreateWindow("");
  
  glutDisplayFunc(display);       // register GLUT callback functions
  glutKeyboardFunc(keyboard);
  glutMouseFunc(mouse);
  glutMotionFunc(motion);
 
  glewInit();
  
  glClearColor(0.0, 0.0, 0.0, 1.0);
  glDisable(GL_DEPTH_TEST);
  
  glViewport(0, 0, WIDTH, HEIGHT);
  glMatrixMode(GL_PROJECTION);
  glLoadIdentity();
  gluPerspective(10000.0, (GLfloat)WIDTH / (GLfloat)HEIGHT, 0.1, 10.0);
  
  glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
  glMatrixMode(GL_MODELVIEW);
  glLoadIdentity();
  return;
}
 
// This method is called everytime the screen is redisplayed. No
// optimization is performed as it recalculates the kernel every time.
void display() 
{
  anim += 0.01f;
  
  // map OpenGL buffer object for writing from OpenCL
  glFinish();
  clEnqueueAcquireGLObjects(queue, 1, &p_vbocl, 0,0,0);
  clEnqueueAcquireGLObjects(queue, 1, &c_vbocl, 0,0,0);
  
  // Set arg 3 and queue the kernel
  clSetKernelArg(kernel, 3, sizeof(float), &anim);
  clEnqueueNDRangeKernel(queue, kernel, 2, NULL, global, NULL, 0, 0, 0);
  
  // queue unmap buffer object
  clEnqueueReleaseGLObjects(queue, 1, &c_vbocl, 0,0,0);
  clEnqueueReleaseGLObjects(queue, 1, &p_vbocl, 0,0,0);
  clFinish(queue);
  
  // clear graphics
  glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);     
 
  // Apply the image transforms
  glMatrixMode(GL_MODELVIEW);
  glLoadIdentity();
  glTranslatef(0.0, 0.0, translate_z);
  glRotatef(rotate_x, 1.0, 0.0, 0.0);
  glRotatef(rotate_y, 0.0, 1.0, 0.0);
 
  //render from the p_vbo
  glBindBuffer(GL_ARRAY_BUFFER, p_vbo);
  glVertexPointer(4, GL_FLOAT, 0, 0);
  glEnableClientState(GL_VERTEX_ARRAY);
  
  // enable colors from the c_vbo
  glBindBuffer(GL_ARRAY_BUFFER, c_vbo);
  glColorPointer(4, GL_UNSIGNED_BYTE, 0, 0);
  glEnableClientState(GL_COLOR_ARRAY);
  
  // draw points, lines or triangles according to the user keyboard input
  switch(drawMode) {
  case GL_LINE_STRIP:
    for(int i=0 ; i < mesh_width*mesh_height; i+= mesh_width)
      glDrawArrays(GL_LINE_STRIP, i, mesh_width);
    break;
  case GL_TRIANGLE_FAN:
    glPrimitiveRestartIndexNV(RestartIndex);
    glEnableClientState(GL_PRIMITIVE_RESTART_NV);
    glDrawElements(GL_TRIANGLE_FAN, qIndices_size, GL_UNSIGNED_INT, qIndices);
    break;
  default:
    glDrawArrays(GL_POINTS, 0, mesh_width * mesh_height);
    break;
  }
  
  // handle housekeeping and redisplay
  glDisableClientState(GL_COLOR_ARRAY);
  glDisableClientState(GL_VERTEX_ARRAY);
  glutSwapBuffers();
  glutPostRedisplay();
}
 
// Keyboard events handler for GLUT
void keyboard(unsigned char key, int x, int y)
{
  switch(key) {
  case('q') :
  case(27) :
    exit(0);
  break;
  case 'd':
  case 'D':
    switch(drawMode) {
    case GL_POINTS: drawMode = GL_LINE_STRIP; drawStr = "line"; break;
    case GL_LINE_STRIP: drawMode = GL_TRIANGLE_FAN; drawStr = "fan"; break;
    default: drawMode=GL_POINTS; drawStr = "points"; break;
    }
  } 
  setTitle();
  glutPostRedisplay();
}
 
// Mouse event handler for GLUT
void mouse(int button, int state, int x, int y)
{
  if (state == GLUT_DOWN) {
    mouse_buttons |= 1<<button;
  } else if (state == GLUT_UP) {
    mouse_buttons = 0;
  }
  
  mouse_old_x = x;
  mouse_old_y = y;
  glutPostRedisplay();
}
 
// Motion event handler for GLUT
void motion(int x, int y)
{
  float dx, dy;
  dx = x - mouse_old_x;
  dy = y - mouse_old_y;
  
  if (mouse_buttons & 1) {
    rotate_x += dy * 0.2;
    rotate_y += dx * 0.2;
  } else if (mouse_buttons & 4) {
    translate_z += dy * 0.01;
  }
  
  mouse_old_x = x;
  mouse_old_y = y;
}

Following is the complete source code for sinewave.cl. Note that it operates on __global vectors, which also happen to be OpenGL buffers.

C++
__kernel void sinewave(__global float4* pos, unsigned int width, 
                     unsigned int height, float time, __global uchar4* color)
{
  unsigned int x = get_global_id(0);
  unsigned int y = get_global_id(1);
  
  // calculate uv coordinates
  float u = x / (float) width;
  float v = y / (float) height;
  u = u*2.0f - 1.0f;
  v = v*2.0f - 1.0f;
  
  // calculate simple sine wave pattern
  float freq = 4.0f;
  float w = sin(u*freq + time) * cos(v*freq + time) * 0.5f;
  
  // write output vertex
  pos[y*width+x] = (float4)(u, w, v, 1.0f);
  color[y*width+x] = (uchar4) (
                            (uchar) 255.f *0.5f*(1.f+sin(w+x)),
                            (uchar) 255.f *0.5f*(1.f+sin(x)*cos(y)),
                            (uchar) 255.f *0.5f*(1.f+sin(w+time/10.f)), 0 );
}

Summary

GPGPU devices are powerful visualization as well as computational devices. Utilizing the OpenCL graphical interoperability capabilities is a fantastic way to speed visualization applications and utilize the full capabilities of these devices. Comparing the speed of visualization on the CPU vs. the GPU with the example code from this article is one way to see the difference.

Primitive restart is an excellent way to further speed visualization on GPU devices by keeping both command tokens and data on the GPU to avoid PCIe bus transfers. Further, it gives the developer tremendous flexibility in handling irregular meshes and other challenging visualization tasks. In addition to speed, primitive restart can also generate higher quality graphics.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
United States United States
Rob Farber is a senior scientist and research consultant at the Irish Center for High-End Computing in Dublin, Ireland and Chief Scientist for BlackDog Endeavors, LLC in the US.

Rob has been on staff at several US national laboratories including Los Alamos National Laboratory, Lawrence Berkeley National Laboratory, and at Pacific Northwest National Laboratory. He also served as an external faculty member at the Santa Fe Institute, co-founded two successful start-ups, and has been a consultant to Fortune 100 companies. His articles have appeared in Dr. Dobb's Journal and Scientific Computing, among others. He recently completed a book teaching massive parallel computing.

Comments and Discussions

 
QuestionThanks! But where is the part1 ~ part5? Pin
xrg_soft@163.com13-Feb-12 22:48
xrg_soft@163.com13-Feb-12 22:48 
AnswerRe: Thanks! But where is the part1 ~ part5? Pin
manythreads14-Feb-12 6:50
manythreads14-Feb-12 6:50 
GeneralRe: Thanks! But where is the part1 ~ part5? Pin
xrg_soft@163.com14-Feb-12 17:34
xrg_soft@163.com14-Feb-12 17:34 
GeneralRe: Thanks! But where is the part1 ~ part5? Pin
manythreads14-Feb-12 17:42
manythreads14-Feb-12 17:42 
GeneralRe: Thanks! But where is the part1 ~ part5? Pin
Sean Ewington15-Feb-12 5:53
staffSean Ewington15-Feb-12 5:53 
QuestionThanks Pin
Debojyoti Majumder14-Jul-11 19:38
Debojyoti Majumder14-Jul-11 19:38 

General General    News News    Suggestion Suggestion    Question Question    Bug Bug    Answer Answer    Joke Joke    Praise Praise    Rant Rant    Admin Admin   

Use Ctrl+Left/Right to switch messages, Ctrl+Up/Down to switch threads, Ctrl+Shift+Left/Right to switch pages.