CUDA

Great Reads

CodeProject.AI Server: AI the easy way.

by CodeProject

Version 2.6.2. Our fast, free, self-hosted Artificial Intelligence Server for any platform, any language

Nimbus SDK: A Tiny Framework for C++ Real-Time Volumetric Cloud Rendering, Animation and Morphing

by Carlos Jiménez de Parga

A reusable Visual C++ framework for real-time volumetric cloud rendering, animation and morphing

H.264 CUDA Encoder DirectShow Filter in C#

by Maxim Kartavenkov

This article describes how to make an H.264 video encoder DirectShow filter using the NVIDIA encoder API in C#

CUDA Programming Model on AMD GPUs and Intel CPUs

by Nick Kopp

This article builds upon the earlier High Performance Queries: GPU vs. PLINQ vs. LINQ and ports this to also support OpenCL devices and adds benchmarking so you can easily compare performance.

Latest Articles

CodeProject.AI Server: AI the easy way.

by CodeProject

Version 2.6.2. Our fast, free, self-hosted Artificial Intelligence Server for any platform, any language

How to Move from CUDA Math Library Calls to oneMKL

by Robert Mueller-Albrecht

Using the Intel® oneAPI Math Kernel Library SYCL API

CudaPAD

by Ryan Scott White

CudaPAD is a PTX/SASS viewer for NVIDIA CUDA kernels and provides an on-the-fly view of your CUDA code.

Nimbus SDK: A Tiny Framework for C++ Real-Time Volumetric Cloud Rendering, Animation and Morphing

by Carlos Jiménez de Parga

A reusable Visual C++ framework for real-time volumetric cloud rendering, animation and morphing

All Articles



A Brief Test on the Code Efficiency of CUDA and Thrust

27 Jun 2010 by Wayne Wood

Verify the execution efficiency of a short CUDA program when using the Thrust library

VC9.0

C++

Win7

A Brief Test on the Efficiency of a .NET 4.0 Parallel Code Example

1 Jul 2010 by Wayne Wood

Verify the execution efficiency of a series of short .NET 4.0 parallel programming samples

A C# SMTP server (receiver)

17 Sep 2012 by ObiWan_MCC

A C# SMTP server (receiver).

A Neural Network on GPU

13 Mar 2008 by billconan, kavinguy

This article describes the implementation of a neural network with CUDA.

Accelerating Neural Networks with Binary Arithmetic

3 May 2017 by Intel

In this blog post, we highlight one particular class of low precision networks named binarized neural networks (BNNs), the fundamental concepts underlying this class, and introduce a Neon CPU and GPU implementation.

CUDA

hardware

machine-learning

Avoiding the Trials and Tribulations of CUDA Development on Windows

8 Sep 2010 by Dan Buskirk

Understanding the organization of a Visual Studio project for CUDA development

Base64 Encoding on a GPU

16 Sep 2013 by Nick Kopp

Performing base64 encoding on a graphics processing unit using CUDAfy.NET (CUDA in .NET).

CodeProject.AI Server: AI the easy way.

29 Feb 2024 by CodeProject

Version 2.6.2. Our fast, free, self-hosted Artificial Intelligence Server for any platform, any language

CUDA

artificial-intelligence

Convert Xilinx FPGA/CPLD to C Source

28 May 2011 by grilialex

Flow and tools to convert Xilinx bitstreams to C source code for programming FPGA/CPLD

CUDA Programming Model on AMD GPUs and Intel CPUs

16 Sep 2013 by Nick Kopp

This article builds upon the earlier High Performance Queries: GPU vs. PLINQ vs. LINQ and ports this to also support OpenCL devices and adds benchmarking so you can easily compare performance.

CUDA

nvidia

OpenCL

CudaPAD

28 Mar 2023 by Ryan Scott White

CudaPAD is a PTX/SASS viewer for NVIDIA CUDA kernels and provides an on-the-fly view of your CUDA code.

Deep Learning on Windows: A Getting Started Guide

14 Sep 2016 by Mike Lanzetta

In this post, I'll walk you through how to get one of the most popular toolkits up and running on Windows, and run through and explain some fun examples.

DirectShow Filters Development Part 3: Transform Filters

15 Mar 2011 by Roman Ginzburg

A text overlay filter and a JPEG/JPEG2000 encoder using transform filters.

Distributed Computing in Small and Medium Sized Offices

25 Oct 2010 by hax_

Introduction to the open-source hxGrid library for distributed computing. Main benefits of the library: cluster uses only idle time of Windows 2000/XP/Vista workstation (no dedicated workstations required); easy to use; free.

Extended GMM for Background Subtraction on GPU

10 Jan 2011 by phoaivu

GPU Implementation of Extended Gaussian mixture model for Background Subtraction

Facial biometric authentication on your connected devices

22 Jul 2016 by Afzaal Ahmad Zeeshan

In this post, I am going to walk you through creating your own central hub that lets your connected devices authenticate people using a facial recognition system.

HTML

CUDA

.NET

artificial-intelligence

Fast Image Blurring with CUDA

10 Sep 2009 by ChaoJui

High-performance, good-quality image blurring

Faster JPEG2000 Viewer

6 Jan 2014 by Adam Wojnar

Simple .jp2/.j2k viewer using Kakadu executables demonstration pack for decoding

GCN Assembler for AMD GPUs

17 Apr 2016 by Ryan Scott White

An assembler/compiler for AMD’s GCN (Graphics Core Next) architecture assembly language

Getting Started with Intel® Software Optimization for Theano and Intel® Distribution for Python

4 May 2017 by Intel

Theano is a Python library developed at the LISA lab to define, optimize, and evaluate mathematical expressions, including the ones with multi-dimensional arrays (numpy.ndarray)

GPGPU on Accelerating Wave PDE

13 Oct 2012 by Alesiani Marco

A Wave PDE simulation using GPGPU capabilities

GPGPU Papyrus Demo

22 May 2013 by John Michael Hauck

It has never been easier for C# desktop developers to write code that takes advantage of the amazing computing performance of modern graphics cards. In this post I will share some techniques for solving a simple (but still interesting) image analysis problem. Source Code https://www.assembla.com/co

CUDA

nvidia

OpenCL

GPU Computing Using CUDA, Eclipse, and Java with JCuda

21 Sep 2013 by Mark H Bishop

Tutorial: GPU computing with JCuda and Nsight (Eclipse)

GPU-Quicksort in OpenCL 2.0: Nested Parallelism and Work-Group Scan Functions

20 Jan 2015 by Android on Intel

This tutorial shows how to use two powerful features of OpenCL™ 2.0: enqueue_kernel functions that allow you to enqueue kernels from the device and work_group_scan_exclusive_add and work_group_scan_inclusive_add

H.264 CUDA Encoder DirectShow Filter in C#

16 Jul 2012 by Maxim Kartavenkov

This article describes how to make an H.264 video encoder DirectShow filter using the NVIDIA encoder API in C#

High Performance Queries: GPU vs. PLINQ vs. LINQ

16 Sep 2013 by Nick Kopp

How to get a 30x performance increase for queries by using your Graphics Processing Unit (GPU) instead of LINQ and PLINQ.

High-performance finite elements with C#

25 Jul 2016 by Igor Gribanov

Performing linear static analysis on a tetrahedral mesh with a little bit of help from a third-party solver.

How to Build OpenCV for Windows with CUDA

2 Nov 2018 by Vangos

This post will show you how to build OpenCV for Windows with CUDA.

How to Get Started with TensorFlow

24 Oct 2017 by Packt Publishing

In this section, we'll take our first steps in using the low-level TensorFlow API.

How to Move from CUDA Math Library Calls to oneMKL

27 Jun 2023 by Robert Mueller-Albrecht

Using the Intel® oneAPI Math Kernel Library SYCL API

HyCuda, a Hybrid Framework Code-Generator for CUDA

18 Dec 2013 by Joren Heit

A Hybrid Framework Code-Generator for CUDA

CUDA

C++

templates

Image Filters Using CPU, GPU, and C++ AMP

10 Jul 2012 by Kerem Kat

Process webcam images on the CPU and GPU with OpenCV, CUDA and C++ AMP

Implementing Parallel Scalable Distribution Counting Algorithm (DCA) with CUDA 8.0 Runtime API

9 Dec 2016 by Arthur V. Ratz

In this article, we'll demonstrate an approach that increases the performance (by up to 600%) of code implementing the conventional distribution counting algorithm (DCA), using the NVIDIA CUDA 8.0 Runtime API

Improving the Performance of Mask R-CNN Using TensorRT

10 Dec 2018 by Apriorit Inc, Vadym Zhernovyi

The experience of improving Mask R-CNN performance six to ten times by applying TensorRT

CUDA

artificial-intelligence

Install Cuda and Use Managed Code with VS 2008 Express on Windows 7-x64

24 Dec 2012 by Mark H Bishop

Getting Cuda started on a VS Express budget

Introduction to Keras

22 Jun 2020 by Thomas Daniels

In this article, let’s dive into Keras, a high-level library for neural networks.

CUDA

Python

artificial-intelligence

Large Scale Machine Learning using NVIDIA CUDA

9 Mar 2012 by Adnan Boz

From spam filters to movie recommendation and face detection, machine learning algorithms are now used everywhere to make the machine think for us. But running these algorithms requires high computation power and, in most cases, supercomputers. This is where the 500 core GPUs step in...

Learn How To Do Alphablending with CUDA

1 Sep 2009 by ChaoJui

Image processing with a burst of performance from CUDA

Learning Modern OpenGL

20 Sep 2015 by Bartlomiej Filipek

A little guide about modern OpenGL and why it gives us so much value.

Loop Unrolling over Template Arguments

10 May 2010 by Kevin Drzycimski

Unroll loops at compile time, deduced by a template argument.

Nightmare on (Overwh)Elm Street: The 64-bit Calling Convention

13 May 2017 by CMalcheski

64-bit calling convention

Nimbus SDK: A Tiny Framework for C++ Real-Time Volumetric Cloud Rendering, Animation and Morphing

3 Apr 2022 by Carlos Jiménez de Parga

A reusable Visual C++ framework for real-time volumetric cloud rendering, animation and morphing

odeint v2 - Solving ordinary differential equations in C++

19 Oct 2011 by headmyshoulder

odeint v2 - Solving ordinary differential equations in C++

OWASP #6 Preventing Sensitive Data Exposure in ASP.NET – Part 1

16 Feb 2016 by Max R McCarty

OWASP's #6 most vulnerable security risk has to do with keeping secrets secret.

Parallel Computations in C#

1 Oct 2008 by Andrew Kirillov

This article describes the implementation of parallel computations using plain C#.

Parallel Radix Sort on the GPU using C++ AMP

9 Feb 2013 by Debdatta Basu

Examine the various approaches to implementing Radix sort on the GPU

Parallel Scalable Burrows-Wheeler Transformation (BWT) Using Intel® Threading Building Blocks and OpenMP Libraries

2 May 2017 by Arthur V. Ratz

This article is a practical guide to using the Intel® Threading Building Blocks (TBB) and OpenMP libraries for C++, based on the example of delivering parallel scalable code that implements the Burrows-Wheeler Transformation (BWT) algorithm.

Parallelised Monte Carlo Algorithms #1

26 May 2014 by CatchExAs

How to make best use of current technology for computationally intensive applications?

Part 6: Primitive Restart and OpenGL Interoperability

2 Apr 2012 by manythreads

This is the sixth article in a series on portable multithreaded programming using OpenCL™, in which Rob Farber discusses how to calculate data in OpenCL™ and render it with OpenGL within the same application.

Permutations with CUDA and OpenCL

12 Apr 2016 by Shao Voon Wong

Finding lexicographical permutations on GPU

Port a CUDA App to oneAPI and DPC++ in 5 Minutes

10 Nov 2020 by Jeremy C. Ong

A quick 5-minute introduction to porting a CUDA app to Data Parallel C++ (DPC++)

Pure .NET DirectShow Filters in C#

13 Oct 2012 by Maxim Kartavenkov

This article describes how to make DirectShow filters in .NET; it consists of BaseClasses and a couple of samples

QOR Architecture Aspect

11 Jul 2013 by Matthew Faithfull

Querysoft Open Runtime: Architecture compatibility aspect.

Raytracing From CUDA to SYCL 2020 via DPC++

16 Jan 2021 by Shao Voon Wong

How to convert parallel C++ ray-tracing code to CUDA, then to SYCL 2020 via Intel® DPC++

Setting Up Your Android Development Environment

3 Aug 2014 by Sushil Sh.

How to set up an Android development environment using Eclipse and Android Studio.

CUDA

Android

Mobile

Small class representing DateTime in seconds elapsed since "01 Jan, 0001 00:00:00"

24 Jun 2005 by Philippe Kirsanov

A small class representing DateTime in seconds elapsed since "01 Jan, 0001 00:00:00".

Solving ordinary differential equations with OpenCL in C++

26 Jul 2012 by headmyshoulder, Denis Demidov

This article shows how ordinary differential equations can be solved with OpenCL. In detail it shows how odeint - a C++ library for ordinary differential equations - can be adapted to work with VexCL - a library for OpenCL. The resulting performance is studied on two examples.

The Implementation of Model Constraints in .NET - II

24 Aug 2003 by Alex Mikunov

Runtime MSIL Code Instrumentation and .NET Metadata Extensions

Theano Machine Learning on a GPU on Windows 10

30 Nov 2016 by Dino Konstantopoulos

Running Theano with an Nvidia 1070 GPU on Windows 10, with CUDA 8 and Visual Studio 2015

Thoughts About Threads

18 Sep 2017 by Intel

TotalView includes a set of tools that provide scientific and academic developers with control over processes and thread execution, along with deep visibility into program states and data.

Time Series Analysis in C#.NET: Part II

21 May 2012 by Jeff B. Cromwell

Granger Causality in both R and C#.NET with open source libraries.

CUDA

Windows

artificial-intelligence

Visual-Studio

Design