OpenCL

Great Reads

by Nick Kopp
This article builds upon the earlier High Performance Queries: GPU vs. PLINQ vs. LINQ and ports this to also support OpenCL devices and adds benchmarking so you can easily compare performance.
by tugrulGtx
Multi-device OpenCL load balancer and pipeliner for C# in a few lines of code.
by John Michael Hauck
It has never been easier for C# desktop developers to write code that takes advantage of the amazing computing performance of modern graphics cards. In this post I will share some techniques for solving a simple (but still interesting) image analysis problem. Source Code https://www.assembla.com/co
by Matt Scarpino
Using GPU Acceleration to Compute Ray-Triangle Intersection

Latest Articles

by aroman
In this post I explore Lattice Boltzmann methods and build a related project
by tugrulGtx
Header-only C++ tool that supports basic array-like usage pattern and uses multiple graphics cards in system as storage with LRU caching
by tugrulGtx
Accessing VRAM-cached nucleotide sequences in FASTA formatted files (*.fna, *.faa) by index
by Arthur V. Ratz
In this article I will thoroughly discuss several aspects of using the revolutionary new Intel® oneAPI HPC Toolkit to deliver modern code that implements a parallel “stable” sort

All Articles

16 Sep 2013 by Nick Kopp
This article builds upon the earlier High Performance Queries: GPU vs. PLINQ vs. LINQ and ports this to also support OpenCL devices and adds benchmarking so you can easily compare performance.
5 Oct 2017 by tugrulGtx
Multi-device OpenCL load balancer and pipeliner for C# in a few lines of code.
22 May 2013 by John Michael Hauck
It has never been easier for C# desktop developers to write code that takes advantage of the amazing computing performance of modern graphics cards. In this post I will share some techniques for solving a simple (but still interesting) image analysis problem. Source Code https://www.assembla.com/co
12 Sep 2013 by Matt Scarpino
Using GPU Acceleration to Compute Ray-Triangle Intersection
16 Feb 2016 by Max R McCarty
OWASP's #6 most vulnerable security risk has to do with keeping secrets secret.
16 Sep 2013 by Nick Kopp
Ultra high quality frequency domain image rotation on a GPU.
2 Mar 2021 by tugrulGtx
Accessing VRAM-cached nucleotide sequences in FASTA formatted files (*.fna, *.faa) by index
1 Jun 2017 by Intel
This paper introduces Intel software tools recently made available to accelerate deep learning inference in edge devices (such as smart cameras, robotics, autonomous vehicles, etc.) incorporating Intel® Processor Graphics solutions across the spectrum of Intel SOCs.
22 May 2013 by John Michael Hauck
Some ad hoc performance test results for a simple program written in C# as obtained from my current desktop computer: Dell Precision T3600, 16GB RAM, Intel Xeon E5-2665 0 @ 2.40GHz, NVidia GTX Titan.
14 Aug 2014 by Android on Intel
The standard API for 3D graphics on Android is OpenGL ES, which is the most widely used 3D graphics API on all mobile devices today.
28 Aug 2018 by DaveAuld
The pursuit of Serenity, it's new build time!
18 Feb 2019 by Apriorit Inc, ruksovdev
A detailed description of an FPGA-specific framework called ISE Design Suite, and the main steps you need to take in order to create a VGA driver using FPGA
3 Jan 2014 by OriginalGriff
I think to be honest that your teachers are right: as you say CUDA / OpenCL are not easy, and despite tutorials being available, I don't think you will be able to do anything that you would find interesting and that would be acceptable to your teachers in the time you have left to do it....
18 Aug 2017 by Intel
This paper addresses how the Smart Video (SV) system architecture is increasing in complexity and evolving into new industries and use cases.
10 Oct 2017 by Intel
The Face Access Control application is one of a series of IoT reference implementations aimed at instructing users on how to develop a working solution for a particular problem.
25 Jan 2018 by Intel
This tutorial will walk you through the basics of using the Deep Learning Deployment Toolkit's Inference Engine (included in the Intel® Computer Vision SDK).
13 Feb 2018 by Intel
Intel just released Intel® System Studio 2018, an all-in-one, cross-platform, comprehensive tool suite for system and IoT device application development.
21 Jul 2021 by aroman
In this post I explore Lattice Boltzmann methods and build a related project
2 Jan 2014 by Member 10501094
Hello, I am a student in high school (not university) and technically I don't study programming, but I can do my project on programming too. My teachers suggested that I write some useless sorting algorithm programs, but that is not really something I would like to do. I would like to create some...
2 Aug 2014 by Bartlomiej Filipek
How to start optimizing the particle system code.
20 Jan 2015 by Android on Intel
This tutorial will guide you through writing a native “Hello World” Android* app in Visual Studio* through the IDE Integration feature of Intel® INDE 2015.
1 Dec 2015 by Android on Intel
Using OpenCL™ 2.0 Read-Write Images
11 Apr 2017 by Intel
As IoT demand drives increases in data volume, a more powerful processor is required, as well as additional storage.
11 Apr 2017 by Intel
Digital displays and signs are all around you. You may have seen them cropping up at shopping centers and doctors’ offices. From video walls, to AR fitting mirrors, to ordering menus, digital signs are pervasive and are becoming a part of everyday shopping experience.
29 Aug 2017 by Jochen Arndt
The speed doubling in release mode does not come from parallel processing. It comes from the compiler optimising the code in release mode and omitting additional checks that are done in debug builds. You have to explicitly write code for parallel processing. Which method is finally faster...
7 May 2020 by Arthur V. Ratz
In this article I will thoroughly discuss several aspects of using the revolutionary new Intel® oneAPI HPC Toolkit to deliver modern code that implements a parallel “stable” sort
13 Feb 2018 by Intel
The SDK includes components to develop applications: IDE integration, offline compiler, debugger, and other tools.
17 Oct 2021 by Richard MacCutchan
I am not an OpenCL expert but looking at your code I suspect it is because you are passing a local address to a function that expects a global: // your function definition inline float Euclidean_distance(__global int* restrict array_point_A,...
19 Dec 2023 by M Imran Ansari
Creating a flowchart for OpenCL code involves representing the logical flow of the program with steps. Note that OpenCL programming typically involves both host (CPU) and device (GPU) code flow. Check the below link as guide and try to build of...
10 Jan 2024 by OriginalGriff
Um. No. That's not a diagram of how your app should work: it's a mishmash of an app and instructions to produce it - and if your assignment wants you to produce a parallel based app, then you need to examine the assignment carefully and identify...
17 Sep 2010 by manythreads
Curious about GPGPU programming? Read Rob Farber’s Massively Parallel Programming series. Learn how to get more from your CPU, GPU, APU, DSP, and more.
27 Oct 2010 by manythreads
In his second tutorial, GPGPU expert Rob Farber discusses OpenCL™ memory spaces and the OpenCL memory hierarchy, and how to start thinking in terms of work items and work groups. This tutorial also provides a general example to facilitate experimentation with a variety of OpenCL kernels.
6 Jan 2011 by manythreads
In his third tutorial, GPGPU expert Rob Farber will introduce the OpenCL™ execution model and discuss how to coordinate computations among the work items in a work group
10 Mar 2011 by manythreads
Read Rob Farber’s Massively Parallel Programming series. This fourth article in a series on portable multithreaded programming using OpenCL™ will discuss the OpenCL™ runtime and demonstrate how to perform concurrent computations among the work queues of heterogeneous devices.
24 May 2011 by manythreads
In this fifth article in a series on portable multithreaded programming using OpenCL™, Rob Farber discusses OpenCL™ buffers and demonstrates how to tie computation to data in a multi-device, multi-GPU environment.
2 Apr 2012 by manythreads
In this sixth article in a series on portable multithreaded programming using OpenCL™, Rob Farber discusses how to calculate data in OpenCL™ and render it with OpenGL within the same application.
28 Jul 2011 by caglarozbek89
Can anybody give me some direction on writing code that calculates pi using OpenCL? If you have any pi calculator sample code, please share it with me. Thanks for your great interest.
29 Jul 2011 by YvesDaoust
The first question is "How many decimal places?" 1000000, 10000000, 100000000, 1000000000... ? This will influence the choice of an algorithm. Then there is the issue of parallelizing the algorithm. Possibly just by parallelizing the extended-precision arithmetic primitives (+, -, *, /,...
29 Jul 2011 by YvesDaoust
Then parallelizing the extended-precision arithmetic is pointless. I recommend using the Machin formula: 16 arctg(1/5) - 4 arctg(1/239), where the arctg are evaluated using the Maclaurin expansion arctg(x) = Sum (-1)^i x^(2i+1)/(2i+1). (compute the powers of x by...
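The Machin formula quoted in this answer can be tried out in ordinary double precision before worrying about extended-precision arithmetic. The following is a minimal sketch in plain C, not the answerer's code: the function names `arctan_series` and `machin_pi` and the `terms` parameter are hypothetical, and double precision limits the result to roughly 15 significant digits.

```c
#include <assert.h>

/* Maclaurin series for arctan(x): sum over i of (-1)^i * x^(2i+1) / (2i+1).
   Hypothetical helper name; `terms` is the number of series terms summed. */
static double arctan_series(double x, int terms)
{
    assert(terms > 0);
    double sum = 0.0;
    double power = x;            /* holds x^(2i+1), starting at x^1 */
    const double x2 = x * x;
    for (int i = 0; i < terms; ++i) {
        double term = power / (2 * i + 1);
        sum += (i % 2 == 0) ? term : -term;
        power *= x2;             /* next odd power, computed incrementally */
    }
    return sum;
}

/* Machin formula: pi = 16 * arctan(1/5) - 4 * arctan(1/239). */
static double machin_pi(int terms)
{
    return 16.0 * arctan_series(1.0 / 5.0, terms)
         - 4.0 * arctan_series(1.0 / 239.0, terms);
}
```

Calling `machin_pi(20)` already exhausts double precision, because the series in 1/5 and 1/239 converge quickly. Going beyond that would need the extended-precision arithmetic the thread discusses.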
4 Aug 2011 by Ken Domino
Please visit my web site @ http://domemtech.com/?p=669 This page from my blog is a cursory comparison of CUDA vs. OpenCL using just your example, estimating the value of pi. Basically, the solution is via numerical integration using the Composite Simpson's Rule. The solution uses IEEE...
15 Aug 2011 by caglarozbek89
Can anyone give me some hints about writing the Internet checksum algorithm in OpenCL? Or does anyone have the OpenCL code for this algorithm?
15 Aug 2011 by Richard MacCutchan
Is this what you are looking for?
2 Oct 2011 by Richard MacCutchan
Here is a CodeProject article to get you started. You may find that a further search of the articles will yield even more.
29 Jan 2013 by Dilan Shaminda
Hi, I want to know what techniques I can follow to reduce noise in webcam images in low, high, and normal lighting conditions. I am going to extract the features in the image. Any suggestions? Thank you
29 Jan 2013 by Sergey Alexandrovich Kryukov
I can advise a couple of cross-platform C++ libraries which certainly offer noise reduction algorithms: OpenCV. Please see: http://en.wikipedia.org/wiki/OpenCV, http://opencv.org/. CImg. Please see: http://cimg.sourceforge.net/. Both libraries are open source. Good...
18 May 2013 by John Michael Hauck
“Programming Massively Parallel Processors (second edition)” by Kirk and Hwu is a very good second book for those interested in getting started with CUDA.
3 Jan 2014 by CPallini
You might write a useful parallelized sorting algorithm.
9 May 2014 by rasheds
Hey. I am trying to get my OpenCL project to compile in Ubuntu. I have a Core i5 and AMD HD 5660 which are both compatible. When I execute the following code cl_int status; // Retrieve the number of platforms cl_uint numPlatforms = 0; status = clGetPlatformIDs(0, NULL,...
10 May 2014 by rasheds
Hi. So I found the solution. It seems there is an order in which you should set up OpenCL support. Even when you are using an Intel processor, the first thing to do is install AMD APP SDK 2.9. Then restart the PC and run `clinfo` to confirm that the CPU is being detected. Then for the GPU, install...
16 Oct 2014 by Tim_Duncan
In this article, we’ll explore the strides Adobe engineers have made over the last few years to enhance Photoshop using OpenGL* and OpenCL™ to increase hardware utilization.
23 Oct 2014 by pi19404
Dense Motion Estimation based on Polynomial expansion. Introduction: In this article we will look at dense motion estimation based on a polynomial representation of the image. The polynomial basis representation of the image is obtained by approximating the local neighborhood of image us
14 Aug 2014 by Sergey Alexandrovich Kryukov
You misunderstand something very basic in computing. First, OpenCL is not a language, this is a framework: http://en.wikipedia.org/wiki/OpenCL. So, first, the C language remains the C language. Second: high-level languages produce the same code, for kernel or not, does not matter. In...
18 Aug 2014 by kyleK89
I'm trying to make kernel.cl with opencv for (i = 0; i
8 Sep 2014 by Tamas Ruszkai
I use a C# wrapper called CLoo to use the OpenCL API. The OpenCL platform I use is the Intel CPU. When I run the official Intel sample code (a C/C++ application) then in the VS2010 IntelOpenCL plugin windows (Tools/Code Builder-OpenCL Debugger) I can see the command queue, the API call...
8 Sep 2014 by Richard MacCutchan
Your question would probably get answered faster by using the CLoo support links.
6 Nov 2014 by Maxim_Shevtsov
This article is an overview of the OpenCL support provided in System Analyzer and Platform Analyzer on the Windows* OS
6 Nov 2014 by Colleen Culbertson
This article, aimed at developers, will provide a glimpse into this 64-bit, multi-core SOC processor, and gives an overview of the available Intel® technologies, including Intel® HD Graphics 5300.
5 Dec 2014 by TERENCE S
The Intel SDK for OpenCL Applications provides a rich mix of OpenCL extensions and optional features that are designed for developers who want to utilize all resources available on Intel CPUs. This article focuses on device fission, available as a feature in this SDK.
2 Feb 2015 by Android on Intel
This tutorial will guide you through Intel® INDE 2015 installation and demonstrate how to develop native Android* applications that target either x86 based or ARM based processors.
12 Feb 2015 by Optimistic76
Hello, I'm trying to integrate the OpenCascade framework in a Qt application but I'm facing real problems. I tried to embed OpenCascade code in QOpenGLWidget but nothing shows. Does anyone have some examples to show me? Thank you
18 Mar 2015 by Intel
This tutorial demonstrates how to share surfaces between OpenCL™ and DirectX 11 with Intel ® Processor Graphics on Microsoft Windows, using the surface sharing extension in OpenCL.
7 Apr 2015 by Android on Intel
The intention of this guide is to provide quick steps to create, build, debug, and analyze OpenCL™ applications with the OpenCL™ Code Builder, a part of Intel® Integrated Native Development Environment (Intel® INDE)
26 May 2015 by Intel
In this article we are going to demonstrate how to optimize Single precision floating General Matrix Multiply (SGEMM) kernels for the best performance on Intel® Core™ Processors with Intel® Processor Graphics.
16 Sep 2015 by Intel
In this article, we will introduce the components of INDE and show how developers can use them to create new applications and optimize existing applications. To start with Intel® INDE provides support for IDE integration.
1 Oct 2015 by Android on Intel
This article walks through an example Android application that offloads image processing using OpenCL™ and RenderScript programming languages.
22 Nov 2015 by John Michael Hauck
It has never been easier for C# desktop developers to write code that takes advantage of the amazing computing performance of modern graphics cards.
1 Mar 2016 by Android on Intel
In this article we are going to do a walkthrough of how to do CPU-bound offline analysis of the workflow.
21 Mar 2016 by KarstenK
If you really need that installation you must reinstall it. Install the newest SDK which fulfills your needs. Tip 1: You shouldn't touch the registry, but first run the official uninstaller once. Tip 2: Deactivate the UAC or set it at a low level for the installation.
19 Apr 2016 by Android on Intel
In this guide, we will show a variety of tools to use as well as features in the Unity software that can help you enhance the performance of your Unity project.
1 Jun 2016 by Android on Intel
Intel® System Studio 2017 Beta has been released. This is the Beta program page which guides you further on Intel® System Studio 2017 Beta new features and enhanced usability experience.
14 May 2017 by Mahdi Nejadsahebi
Have a good time. I have a problem in OpenCL 1.2. Look, I have a global array in the kernel and the group size is 1000. The problem is that the atomic_add() function doesn't work correctly. My kernel code is: buffer[3] = 100; atomic_add(&buffer[3], 1); if I create 1000...
26 Mar 2017 by Richard MacCutchan
See OpenCL 2.0 Atomics Overview.
14 May 2017 by tugrulGtx
`buffer[3] = 100;` is a race condition. Do it from the CPU host side or in another kernel.
14 Aug 2017 by Intel
Intel is uniquely positioned for AI development: Intel’s AI ecosystem offers solutions for all aspects of AI by providing a unified front end for a variety of backend technologies, from hardware to edge devices.
29 Aug 2017 by Javier Luis Lopez
I have made a VS2013 project to test OpenCL, available on GitHub in the OpenCL dir: GitHub - jlopez2022/cpp_utils: Example of c++ programs. In that example I calculated the differential rms of a big vector (200 mega size); on CPU in debug mode it calculated at 100 Megaops/data. On CPU in release mode the...
31 Aug 2017 by Javier Luis Lopez
To obtain a summary value from results like the following code: int k = get_global_id(0); double result=d[k]*d[k]; reductions must be used, which are very difficult to perform and reduce code clarity, as said in the following link:...
31 Aug 2017 by Javier Luis Lopez
I tried the posted code. My idea was to obtain partial sums of the input data in the array rms, then use barriers (GLOBAL and LOCAL) to wait until all rms[k] are filled, then sum them all to obtain the mean value. I placed some printf calls to warn if there are errors in the calculation. I obtained errors...
31 Aug 2017 by Intel
Intel® GO™ SDK Offers Automotive Solution Developers an Integrated Solutions Environment
4 Sep 2017 by Javier Luis Lopez
I am not very happy with this solution: OpenCL 1.2 does not allow synchronization across all work groups, as I stated, so one must leave the kernel and enter a new one to use data from all work items. If somebody knows how to do it in the new OpenCL 2.x standard I would appreciate it. ...
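The two-pass approach described in these posts (per-group partial sums first, then a separate pass over the partials) can be sketched on the host in plain C. This is only an illustration of the data flow under stated assumptions, not the poster's kernel code: the outer loop over groups stands in for independent work-groups, and the names `partial_sums_of_squares` and `sum_partials` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Stage 1 (stand-in for the first kernel launch): each "work-group" g sums
   the squares of its own slice of d into partial[g]. No group reads another
   group's data, so no cross-group synchronization is needed here.
   Returns the number of groups, which is how many partials were written. */
static size_t partial_sums_of_squares(const double *d, size_t n,
                                      size_t group_size, double *partial)
{
    assert(group_size > 0);
    size_t num_groups = (n + group_size - 1) / group_size;
    for (size_t g = 0; g < num_groups; ++g) {
        size_t begin = g * group_size;
        size_t end = (begin + group_size < n) ? begin + group_size : n;
        double sum = 0.0;
        for (size_t k = begin; k < end; ++k)
            sum += d[k] * d[k];
        partial[g] = sum;
    }
    return num_groups;
}

/* Stage 2 (stand-in for the second kernel launch or a host-side loop):
   runs only after stage 1 has completely finished, then adds the partials. */
static double sum_partials(const double *partial, size_t num_groups)
{
    double total = 0.0;
    for (size_t g = 0; g < num_groups; ++g)
        total += partial[g];
    return total;
}
```

The boundary between the two functions plays the role of the kernel boundary: finishing stage 1 before starting stage 2 is what guarantees every partial sum has been written.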
13 Sep 2017 by Javier Luis Lopez
Is it possible to copy float arrays to float4? I do not know if float4 array elements can be aligned with a float array to copy them. I tried this but it failed to compile: What I have tried: #define WD2 WIDTH/4 __global float A[WIDTH*HEIGHT]; ... __local float4 B[WD2];...
13 Sep 2017 by Javier Luis Lopez
Finally vloadn works, but unfortunately I have to copy to only one vector, not an array of them: __global float* imagen0 ... long pix = get_global_id(0); if (pix==0) { float16 vv=vload16(0,imagen0); printf("===GPU vv: %6v16f \n",vv); } if (pix
18 Sep 2017 by Intel
OpenCL™ Drivers and Runtimes for Intel® Architecture
24 Oct 2017 by Intel
The Intel® Computer Vision SDK is a new software development package for development and optimization of computer vision and image processing pipelines for Intel System-on-Chips (SoCs).
23 Feb 2018 by Member 13680125
Boosting Efficiency and Performance for Automotive, Networking, and Cloud Computing
3 Apr 2018 by Intel
The Retail Workshop: Hands on Learning with Intel®-based Retail Solutions
16 May 2018 by Javier Luis Lopez
It is very hard to use the GPU because the user has to handle memory segmentation and transfer and the use of local memory, and in most applications only a very low performance increase of 10-20x is reached. On the other hand, using multiple threads is easy and fast. It would be better to use 1280 threads in parallel...
16 May 2018 by KarstenK
It depends on what you want to do. Even multithreading isn't optimal when a lot of short threads are running, because multithreading also means overhead in the CPU. Graphical output and low-level computations are best done on the GPU, computations also when the usage of the GPU leads to less...
11 Nov 2018 by Javier Luis Lopez
I program GPUs using OpenCL but I would be happy with an easier system to parallelize the program. Which of them requires less code to be changed to run on the GPU? Do C++ AMP and Thrust allow running several functions sequentially inside the GPU before returning results? What I have tried: I made...
11 Nov 2018 by tugrulGtx
"but a lot of work must be done in addition to parallelizing the algorithm, that is, the system preparation, buffer creation, data transfers" If you are re-writing whole OpenCL stages for every new programming work, then you need to shorten the writing part by using OOP such as creating an object...
10 Feb 2019 by tugrulGtx
OpenCL 2.0 already has reductions for thread groups. The work_group_reduce_add() command adds all participating threads' elements into a single value and broadcasts it to all participating threads. Its execution time will be comparable to an optimized custom algorithm so it can get automatically...
23 Jun 2020 by PontiacGTX
I am trying to compile an OpenCL project where I expect an output buffer to be assigned through a cl_mem object, but when clEnqueueReadBuffer executes, the std::vector items in the array aren't assigned. The source code for the host in C++...
15 Sep 2020 by Member 12087553
Hello, I have been reviewing an OpenCL library in C# to use in my project. But I don't know how to get calculated data from the OpenCL library. public static void RunGPU() { try { EasyCL cl = new EasyCL() { ...
15 Sep 2020 by Richard MacCutchan
You will most likely get a quicker response at OpenCL Overview - The Khronos Group Inc
5 Oct 2020 by Member 12087553
I have been developing a project. It is developed with C#, OpenCL.NET, AForge, and OpenCV. I have a big problem: I can't release memory in OpenCL.NET. I tried release() and dispose() but the memory wasn't released. So I need the method that can...
28 Aug 2021 by Richard MacCutchan
Please try to do your own research: OpenCL coding - Google Search
29 Aug 2021 by OriginalGriff
X_train is a parameter, and is declared as: __global int * restrict X_train __kernel void KNN_classifier(__global int * restrict X_train, __global int * restrict Y_train, __global int * restrict data_point, int k) The method you are...
29 Aug 2021 by Richard MacCutchan
Look at your declaration of Euclidean_distance : inline float Euclidean_distance(int * array_point_A, int * array_point_B) { // both parameters are pointers to arrays And now the call: array_dist[i] = Euclidean_distance(X_train[i], ...
3 Sep 2021 by prother123
I have written the kernel code that will be run on an FPGA device. Currently, I am writing the host code which will be run on the CPU. In the last meeting with my professor, he told me that the data (arrays in my case) in the host should not be located in...