Introduction
Similar to this, we implement FT using OpenGL4.3+ via Compute shader. This article hopes to introduce readers to OpenGL compute shader by implementing 1D FT, later versions may have 2D, 3D FT implementations.
Background
Like before, this article will not introduce FT. You will require a Linux Box to run the sample with X server installed (For Windows users, you will have to rewrite OpenGL initialization using WGL).
For Linux users, more details are available here.
For Windows, refer to this link.
Also, your video drivers are required to support OpenGL v4.3+. I have tested the sample on Ubuntu 14.04 with latest AMD catalyst drivers on AMD E350 (1st generation APU).
Using the Code
The code is built using QT5, readers may require qmake
to build the executable in addition to the GNU tool set.
The code given below talks about OpenGL initialization and shut down.
mainw = XCreateSimpleWindow(display, root, 100,1,1300,700,0,0,0); XMapWindow(display, mainw);
ctx = glXCreateContext(display, vi, 0, GL_TRUE);
glXMakeCurrent (display, mainw, ctx);
glXMakeCurrent(display, None, NULL);
glXDestroyContext(display, ctx);
XDestroyWindow(display, mainw);
XCloseDisplay(display);
In most cases, (like in my current role involving OpenSceneGraph), the OpenGL context is ready for you and you may not need the above code.
I have also used smart buffer objects (akin to smart pointers), this is easier for OGL resource management.
struct glBuffer_
{
GLuint m_;
operator GLuint(){return m_;}
glBuffer_(GLuint i):m_(i){}
~glBuffer_(){glDeleteBuffers(1,&m_);}
};
Now comes the best part, this is an FT implementation and not FFT, the below code is written in GLSL and must be compiled using OpenGL v4.3+.
"#version 430\n",
"layout(std430, binding=0) buffer a {vec2 com_in[];};",
"layout(std430, binding=1) buffer b {vec2 com_ou[];};",
"layout (local_size_x = 5) in;",
"\
vec2 exp1(vec2 v)\
{\
vec2 v1;\
v1.x=pow(exp(1.0),v.x)*cos(v.y);\
v1.y=pow(exp(1.0),v.x)*sin(v.y);\
return v1;\
}\
vec2 mul1(vec2 a,vec2 b)\
{\
vec2 r;\
r.x=a.x*b.x-a.y*b.y;\
r.y=a.x*b.y+a.y*b.x;\
return r;\
}\
\
uniform int nSize;\
uniform int iForwardT;\
void main(){\
vec2 ii;\
ii.x=0;\
ii.y=1;\
if(gl_GlobalInvocationID.x>=nSize) return;\
\
float k=gl_GlobalInvocationID.x;\
com_ou[gl_GlobalInvocationID.x].y=com_ou[gl_GlobalInvocationID.x].x=0;\
for(int j=0;j<nSize;++j)\
{\
vec2 t;t.y=0;t.x=k*j*-2*3.141593/nSize;\
if(1==iForwardT) t.x=-t.x; /*for Inverse FT*/\
\
com_ou[gl_GlobalInvocationID.x]=
com_ou[gl_GlobalInvocationID.x]+mul1(com_in[j],exp1(mul1(t,ii)));\
}\
if(1==iForwardT) com_ou[gl_GlobalInvocationID.x]=com_ou[gl_GlobalInvocationID.x]/nSize;\
}"
I have used 2 Shader storage buffers, binding =0 for input, binding =1 for output, iForwardT set to zero results in Forward FT, set to one results in inverse FT.
Also, there is a CPU implementation of the same (a lot slower though).
Points of Interest
This may not be the fastest FT you have seen, but if you would like to see it compared with other libraries, feel free to drop me a message with your speed/accuracy results. Also for large data sets, you may experience larger rounding errors.
History
- 22nd February, 2018: Initial version
- 21/July/2022, added Compute shader for finding out prime numbers