oneDPL Empowers Your C++ Application for Cross-Device Parallel Programming with SYCL

Robert Mueller-Albrecht


Jun 26, 2023


This article gives you a brief introduction to oneDPL, including a glimpse of its component APIs, and walks you through an introductory code sample showing a practical use of the library.

Heterogeneous computing puts a lot more horsepower at the fingertips of present-day developers. By distributing workloads across different kinds of compute hardware, such as CPUs, GPUs and FPGAs, heterogeneous computing applications can achieve better performance and higher energy efficiency.

The SYCL* programming model allows you to leverage heterogeneous compute for C++ applications based on open industry standards. It provides you with the freedom to target desirable combinations of architectures for new as well as legacy C++ code. While using diverse hardware, you can further enhance the performance of applications by concurrent execution of operations across those devices.

If you are aiming to achieve data parallelism with a blend of C++ and SYCL features, oneDPL, an open source library within the oneAPI framework, is ready to assist. It combines the advantages of standard C++ and SYCL to support parallel programming across accelerated architectures, building on standard C++ constructs such as lambda functions and templates as well as on the SYCL specification.

This blog gives you a brief introduction to oneDPL, including a glimpse of its component APIs, and walks you through an introductory code sample showing a practical use of the library.

What is oneDPL?

Short for Intel® oneAPI DPC++ Library (DPC++ is the oneAPI implementation of SYCL), oneDPL is an extension of the C++ STL for parallel and cross-device programming. It lets you create heterogeneous solutions with accelerated SYCL kernels on CPUs, GPUs, and FPGAs while keeping a familiar, productive programming style. By adding parallelized C++17 algorithms and accelerated SYCL kernels on top of the C++ STL API, it can speed up applications originally written in standard C++.

oneDPL supports cross-architecture programming with custom iterators and algorithm extensions drawn from the Parallel STL (PSTL) and Boost.Compute* parallel-computing libraries, and its integration with the Intel® DPC++ Compatibility Tool eases migration of CUDA* application code to SYCL code.

SYCL code implemented using oneDPL components is compiled by the Intel® oneAPI DPC++/C++ Compiler, a SYCL-compatible compiler based on Clang* that is needed to take full advantage of oneDPL’s offload extensions. (Note, if compilers such as GCC are used, certain limitations and complications are encountered.)

oneDPL Components

oneDPL essentially implements the C++ algorithms library using SYCL. It enables parallel programming and the use of SYCL kernels through two types of component APIs:

Parallel API
SYCL Kernels API

Let us look at these APIs in more detail.

Parallel API

This API provides a set of parallel algorithms together with execution policies from both standard C++ and oneDPL's SYCL extensions. The standard C++ execution policies (seq, unseq, par, par_unseq) run algorithms on the host, whether sequentially, vectorized, or in parallel, whereas oneDPL's device execution policies offload them to SYCL devices.

Learn more about execution policies followed by oneDPL algorithms.

The Parallel API also provides special iterators such as counting_iterator, zip_iterator and permutation_iterator. As an extension of the C++17 blocking parallel algorithms, oneDPL provides some asynchronous algorithms with non-blocking behavior. In support of the C++20 Ranges library, oneDPL provides range factories and range adapters. Additionally, oneDPL supports range-based algorithms and some miscellaneous algorithms.
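To make the idea of a counting iterator concrete, here is a minimal host-only sketch in standard C++. CountingIterator and first_n are hypothetical names invented for this illustration; they are not the oneDPL types, only a rough picture of how an iterator can produce consecutive integers without any backing storage.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Hypothetical stand-in for the idea behind a counting iterator:
// dereferencing yields the current integer, incrementing moves to the next.
struct CountingIterator {
    using iterator_category = std::input_iterator_tag;
    using value_type = int;
    using difference_type = std::ptrdiff_t;
    using pointer = const int*;
    using reference = int;

    int value;

    int operator*() const { return value; }
    CountingIterator& operator++() { ++value; return *this; }
    CountingIterator operator++(int) { CountingIterator old = *this; ++value; return old; }
    friend bool operator==(CountingIterator a, CountingIterator b) { return a.value == b.value; }
    friend bool operator!=(CountingIterator a, CountingIterator b) { return a.value != b.value; }
};

// Fills a vector with 0, 1, ..., n-1 by copying from the counting range.
std::vector<int> first_n(int n) {
    assert(n >= 0);
    std::vector<int> out(static_cast<std::size_t>(n));
    std::copy(CountingIterator{0}, CountingIterator{n}, out.begin());
    return out;
}
```

Calling first_n(5) on the host produces the sequence 0 through 4; the stable sort sample later in this blog fills its values buffer in the same spirit.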

SYCL Kernels API

oneDPL provides a subset of the standard C++ APIs that have been tested for use inside SYCL kernels. Examples include std::swap, std::lower_bound, std::upper_bound, std::log, std::copy and std::transform; the complete list is considerably longer.

Check out the complete list of SYCL kernel APIs.

oneDPL also supports random number generation (RNG) inside SYCL kernels. It provides engines that generate sequences of pseudo-random unsigned integers. To complete its RNG support, it provides random number distributions that map the outputs of the engines to values following a statistical probability density function.
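The engine-plus-distribution split described above follows the same model as the standard <random> header. The sketch below uses only standard C++ on the host, not the oneDPL classes, to show how an engine's integer output is mapped by a distribution; uniform_samples is a hypothetical helper name chosen for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Host-only illustration: the engine produces pseudo-random unsigned integers,
// and the distribution maps them to floats between 0 and 1.
std::vector<float> uniform_samples(std::size_t count, unsigned seed) {
    std::minstd_rand engine(seed);                            // pseudo-random engine
    std::uniform_real_distribution<float> dist(0.0f, 1.0f);   // distribution
    std::vector<float> out(count);
    for (float& x : out) {
        x = dist(engine);
    }
    assert(out.size() == count);
    return out;
}
```

Because the engine is seeded explicitly, two calls with the same seed return the same sequence, which is useful when comparing results across runs.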

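In addition, oneDPL provides classes that define utility function objects such as identity, minimum and maximum. These function objects build on the standard C++ facilities std::identity, std::less and std::greater respectively: identity returns its argument unchanged, while minimum and maximum return the lesser or the greater of their two arguments.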

Explore the details of oneDPL utility function object classes.
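As a rough host-only picture of what such function objects look like, the sketch below defines Minimum and Maximum in standard C++. These are hypothetical stand-ins written for this illustration, not the oneDPL classes, and the tie-breaking behavior shown here is an assumption.

```cpp
#include <cassert>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical stand-in: returns the lesser argument, compared with std::less.
struct Minimum {
    template <typename T>
    const T& operator()(const T& a, const T& b) const {
        return std::less<T>{}(b, a) ? b : a;
    }
};

// Hypothetical stand-in: returns the greater argument, compared with std::greater.
struct Maximum {
    template <typename T>
    const T& operator()(const T& a, const T& b) const {
        return std::greater<T>{}(b, a) ? b : a;
    }
};

// Function objects like these plug into reductions as the binary operation.
int smallest(const std::vector<int>& v) {
    assert(!v.empty());
    return std::accumulate(v.begin() + 1, v.end(), v.front(), Minimum{});
}
```

Passing a function object rather than a lambda keeps the operation reusable across different reductions and scans.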

oneDPL Code Example: Stable Sort

oneDPL functions can be used to implement a variety of algorithms such as gamma correction, convex hull computation, dot product, and random number generation. Some of these applications are readily available for you in the form of GitHub code samples.

Now that you have an idea of the available oneDPL component APIs, let us look at a code sample from the oneDPL GitHub repository that shows how stable sorting by key can be performed using oneDPL API functions. It sorts two sequences, one of keys and one of values: only the keys are compared, but each key and its value are moved together.

The stable sorting code sample uses the special iterators counting_iterator and zip_iterator, which are declared in the <oneapi/dpl/iterator> header. This header and the headers for the standard algorithms and execution policies are included as follows:

#include <oneapi/dpl/algorithm> 
#include <oneapi/dpl/execution> 
#include <oneapi/dpl/iterator>

Two separate buffers, one for the keys and one for the values, are created as follows:

sycl::buffer<int> keys_buf{n};   //keys buffer 
sycl::buffer<int> vals_buf{n};   //values buffer

Here, “n” is an integer constant (1000000 in the sample), large enough to hold a substantial number of key-value pairs. For simplicity, let us consider n=10 for the time being so that you can better understand the sorting inputs and expected outputs.

The oneDPL helper function oneapi::dpl::begin wraps the SYCL buffers of keys and values so that they can be passed to the oneDPL algorithms used later in the sorting process.

auto keys_begin = oneapi::dpl::begin(keys_buf); 
auto vals_begin = oneapi::dpl::begin(vals_buf);

The algorithms run under oneDPL's default device execution policy:

auto policy = oneapi::dpl::execution::dpcpp_default;

Using a counting_iterator and the std::copy function, the values buffer is filled with the numbers from 0 to (n-1), whereas the keys buffer is filled with {n, n, n-2, n-2, n-4, n-4,…,6, 6, 4, 4, 2, 2}.

For n=10, the buffer contents will be as follows:

Keys: {10, 10, 8, 8, 6, 6, 4, 4, 2, 2}

Values: {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}

Stable sorting by key reorders both sequences according to the ascending order of the keys. Because the sort is stable, values that share the same key keep their original relative order, which here is ascending. Thus, the expected sorted output for n=10 will look like:
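The input pattern above can be reproduced on the host with plain standard C++. The sketch below is not the sample's device code; KeyValueInput and make_input are hypothetical names for this illustration, and the key formula n - (i / 2) * 2 is simply one expression that yields the sequence listed above, assuming n is even.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical holder for the two input sequences.
struct KeyValueInput {
    std::vector<int> keys;
    std::vector<int> vals;
};

// Builds keys {n, n, n-2, n-2, ..., 2, 2} and values {0, 1, ..., n-1}.
KeyValueInput make_input(int n) {
    assert(n >= 0 && n % 2 == 0);  // the pattern pairs up keys, so n must be even
    const std::size_t size = static_cast<std::size_t>(n);
    KeyValueInput in{std::vector<int>(size), std::vector<int>(size)};
    std::iota(in.vals.begin(), in.vals.end(), 0);  // values: 0 .. n-1
    for (int i = 0; i < n; ++i) {
        in.keys[static_cast<std::size_t>(i)] = n - (i / 2) * 2;  // each key appears twice
    }
    return in;
}
```

With n=10 this yields exactly the two sequences shown above, which makes it a convenient starting point for checking a sort on the host before moving to a device.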

Keys: {2, 2, 4, 4, 6, 6, 8, 8, 10, 10}

Values: {8, 9, 6, 7, 4, 5, 2, 3, 0, 1}

In general, for any even value of n (say n=1000000, as used in the code sample), the expected output is:

Keys: {2, 2, 4, 4,…, n-2, n-2, n, n}

Values: {n-2, n-1, n-4, n-3,…,2, 3, 0, 1}
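One way to check a sorted result against this general pattern is a small host-side verifier. The function below is a sketch written for this illustration (matches_expected is a hypothetical name, and n is assumed to be even); it derives the expected key and value at each position from the pattern above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns true if keys == {2, 2, 4, 4, ..., n, n} and
// vals == {n-2, n-1, n-4, n-3, ..., 0, 1}, where n = keys.size() (assumed even).
bool matches_expected(const std::vector<int>& keys, const std::vector<int>& vals) {
    assert(keys.size() == vals.size());
    const int n = static_cast<int>(keys.size());
    for (int i = 0; i < n; ++i) {
        const int pair_index = i / 2;                             // which pair of equal keys
        const int expected_key = 2 * (pair_index + 1);            // 2, 2, 4, 4, ...
        const int expected_val = n - 2 - 2 * pair_index + i % 2;  // n-2, n-1, n-4, n-3, ...
        const std::size_t idx = static_cast<std::size_t>(i);
        if (keys[idx] != expected_key || vals[idx] != expected_val) {
            return false;
        }
    }
    return true;
}
```

A check like this scales to n=1000000, where inspecting the output by eye is no longer practical.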

Using the make_zip_iterator function, a zip_iterator is constructed that wraps the iterators for keys and values buffers.

auto zipped_begin = oneapi::dpl::make_zip_iterator(keys_begin, vals_begin);

The std::stable_sort algorithm, called with the oneDPL execution policy, then performs the stable sort by key. It processes the keys_begin and vals_begin iterators together, wrapped in the zipped_begin iterator, as key-value pairs, and the comparison lambda looks only at the key (element 0) of each pair.

std::stable_sort(policy, zipped_begin, zipped_begin+n, [](auto lhs, auto rhs) {return std::get<0>(lhs) < std::get<0>(rhs);});
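For readers without a SYCL toolchain at hand, the same logic can be sketched on the host in standard C++. This is not the oneDPL sample: instead of a zip_iterator over device buffers, it copies keys and values into a vector of tuples, sorts with the same key-only comparator, and copies the results back. stable_sort_by_key is a hypothetical name for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <tuple>
#include <vector>

// Host-only stand-in for sorting zipped key-value pairs by key.
void stable_sort_by_key(std::vector<int>& keys, std::vector<int>& vals) {
    assert(keys.size() == vals.size());

    // "Zip" the two sequences into one sequence of (key, value) tuples.
    std::vector<std::tuple<int, int>> zipped;
    zipped.reserve(keys.size());
    for (std::size_t i = 0; i < keys.size(); ++i) {
        zipped.emplace_back(keys[i], vals[i]);
    }

    // Compare keys only; stability keeps equal-key values in input order.
    std::stable_sort(zipped.begin(), zipped.end(),
                     [](const auto& lhs, const auto& rhs) {
                         return std::get<0>(lhs) < std::get<0>(rhs);
                     });

    // "Unzip" the sorted pairs back into the original containers.
    for (std::size_t i = 0; i < zipped.size(); ++i) {
        keys[i] = std::get<0>(zipped[i]);
        vals[i] = std::get<1>(zipped[i]);
    }
}
```

The explicit copy in and out is the price of staying in plain standard C++; the zip_iterator in the oneDPL sample avoids it by letting the algorithm view the two buffers as one sequence of pairs.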

This is how oneDPL can be used for stable sorting by key. Check out the source code and complete code sample for more information.

Are you interested in trying your hand at more such practical applications of oneDPL? Visit the GitHub code samples and dive deeper into the library!

Next Steps

Get started with oneDPL today and acquaint yourself with its robust capabilities to incorporate both standard C++ and SYCL features. Check out the resources given below and head on to develop cross-architecture SYCL applications with oneDPL!

Useful Resources

Get The Software

You can install a stand-alone version of oneDPL or download it as part of the oneAPI Base Toolkit. You can also try out oneDPL workloads across Intel® CPUs and GPUs on Intel® Developer Cloud.

Acknowledgement

We would like to thank Chandan Damannagari for his contribution to this blog.