
Improve AI Application Performance and Portability with Automatic Device Plugin

23 Jun 2022 · CPOL · 7 min read
New features in the Intel® OpenVINO™ toolkit enable you to easily optimize for throughput or latency, and help you to “write once, deploy anywhere.”

This article is a sponsored article. Articles such as these are intended to provide you with information on products and services that we consider useful and of value to developers.


One of the challenges of AI has been catering to the wide range of compute devices that might be used for inference. The OpenVINO™ toolkit helps to speed up AI applications by providing a runtime that takes advantage of hardware features in the processor, GPU, or Vision Processing Unit (VPU). It abstracts away the complexity of writing for different architectures, while enabling developers to unlock the performance features of their target platform.

Now, with the 2022.1 release of OpenVINO, the Automatic Device Plugin (AUTO) makes it easy to target different devices. It can automatically select the most suitable target device, and configure it appropriately to prioritize either latency or throughput. The new plugin also includes a feature to accelerate first inference latency.

What Is AUTO in the OpenVINO Toolkit?

The Automatic Device Plugin (or AUTO for short) is a new virtual proxy device in OpenVINO that doesn’t bind to a specific type of hardware device.

When you choose AUTO as your target platform, OpenVINO automatically discovers the accelerators and hardware features of the platform, and chooses which one to use to achieve your goals. You provide hints in the configuration API to tell OpenVINO to optimize for latency or throughput, depending on the application.

The benefits of using AUTO include:

  • Faster application development: Using AUTO, there is no need for the application to include the logic for detecting, choosing, or configuring the compute devices.
  • Improved application portability: Because applications do not need to include dedicated code for a particular device, or even select devices from a list, the application is more portable. Not only is it easier to run the application on other platforms today, but it also enables the application to take advantage of new generations of hardware as they become available.
  • Faster application startup: AUTO enables applications to start quickly by using the CPU while other target platforms, such as GPUs, are still loading the AI network. When the network has loaded, AUTO can switch inference over to the GPUs.
  • Uses hints instead of configuration: Using AUTO, there’s no need to provide device-specific configuration. Instead, you express performance hints to prioritize either latency or throughput, and AUTO takes care of choosing the best device and applying an appropriate configuration, for example using multiple cores in parallel or a larger task queue.

How to Configure AUTO in OpenVINO

AUTO is part of OpenVINO’s core functionality. To use it, either choose “AUTO” as the device name, or leave the device name out.

Here’s a C++ example:

C++
// Assumes an ov::Core instance named `core` and a model read with core.read_model().
// Load a network to AUTO using the default list of device candidates.
// The following lines are equivalent:
ov::CompiledModel model0 = core.compile_model(model);
ov::CompiledModel model1 = core.compile_model(model, "AUTO");

Here’s a Python example:

Python
# Assumes a Core instance named `core` and a model read with core.read_model().
compiled_model0 = core.compile_model(model=model)
compiled_model1 = core.compile_model(model=model, device_name="AUTO")

Specify the Devices to Use

AUTO has an option to choose only from your preferred devices. For example, the following code shows the scenario where the CPU and GPU are the only two devices acceptable for network execution.

Here’s a C++ example:

C++
// Specify the devices and the priority to be used by AUTO in its selection process.
// The following lines are equivalent (GPU is 1st priority to be used):

ov::CompiledModel model3 = core.compile_model(model, "AUTO:GPU,CPU");
ov::CompiledModel model4 = core.compile_model(model, "AUTO", ov::device::priorities("GPU,CPU"));

Here’s a Python example:

Python
compiled_model3 = core.compile_model(model=model, device_name="AUTO:GPU,CPU")
compiled_model4 = core.compile_model(model=model, device_name="AUTO", config={"MULTI_DEVICE_PRIORITIES": "GPU,CPU"})

Provide Performance Hints

You can also, optionally, give AUTO a performance hint of either “latency” or “throughput.” AUTO then chooses the best hardware device and configuration to achieve your goal.

Here’s a C++ example:

C++
// Load a network to AUTO with performance hints enabled.

// To use the "throughput" mode:
ov::CompiledModel compiled_model0 = core.compile_model(model, "AUTO:GPU,CPU",
    ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT));

// Or the "latency" mode:
ov::CompiledModel compiled_model1 = core.compile_model(model, "AUTO:GPU,CPU",
    ov::hint::performance_mode(ov::hint::PerformanceMode::LATENCY));

Here’s a Python example:

Python
# To use the "throughput" mode:

compiled_model = core.compile_model(model=model, device_name="AUTO", config={"PERFORMANCE_HINT":"THROUGHPUT"})

# or the "latency" mode:

compiled_model = core.compile_model(model=model, device_name="AUTO", config={"PERFORMANCE_HINT":"LATENCY"})

Using the googlenet-v1 model on an Intel® Core™ i7 processor, we found that using a throughput hint with an integrated GPU delivers twice the frames per second (FPS) compared to a latency hint¹. Conversely, using the latency hint with the GPU delivered more than 10 times lower latency than the throughput hint¹.

Note that the hints do not require device-specific settings, and are also completely portable between compute devices. That means more performance, with fewer code changes, and with less expert knowledge required.

To achieve the throughput results, the device is configured for higher utilization, for example with increased batch size, and more threads and streams. For the latency scenario, the size of the task queue and parallelization are reduced to achieve a faster turn-around time.
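
One way to observe the effect of a hint is to query the compiled model for the number of parallel inference requests its configuration supports. Here is a minimal Python sketch, reusing the `core` and `model` objects from the earlier examples:

Python
# Compile with a throughput hint, then inspect the resulting configuration.
compiled_model = core.compile_model(model=model, device_name="AUTO",
                                    config={"PERFORMANCE_HINT": "THROUGHPUT"})

# A throughput-oriented configuration typically reports a deeper request
# queue than a latency-oriented one.
n_requests = compiled_model.get_property("OPTIMAL_NUMBER_OF_INFER_REQUESTS")
print(f"Optimal number of parallel infer requests: {n_requests}")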

A full Python example of AUTO usage with performance hints is available in the 106-auto-device OpenVINO notebook.

How Does AUTO Choose Devices?

When you use the 2022.1 release of OpenVINO, AUTO selects devices in the order shown in Table 1, depending on whether the device is available and can support the precision of the AI model. The device is selected only once when the AI network is loaded. The CPU is the default fallback device.


Table 1: How AUTO prioritizes compute devices for use with AI networks, depending on the availability of each device and the AI network’s precision

Figure 1 shows how AUTO acts as a proxy device, and selects the most appropriate device for an AI network to run on.


Figure 1: AUTO acts as a proxy device between the applications and the devices.
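
To see which candidate devices AUTO can choose from on a given machine, you can list them through the runtime's device query:

Python
from openvino.runtime import Core

core = Core()
# Lists the inference devices OpenVINO can see on this machine,
# e.g. ['CPU', 'GPU'] on a system with an integrated GPU.
print(core.available_devices)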

Accelerating First Inference Latency

One of the key performance benefits of AUTO is in accelerating first inference latency (FIL).

Compiling the OpenCL graph to GPU-optimized kernels takes a few seconds. This initialization time may not be tolerable for some applications, such as face-based authentication.

Using a CPU would provide the shortest FIL, because the OpenVINO graph representations can be JIT-compiled quickly for the CPU. However, the CPU may not be the best platform to meet the developer’s throughput or latency goals after startup.

To speed up the FIL, AUTO uses the CPU as the first inference device until the GPU is ready (see Figure 2). The FIL with AUTO is close to that of a CPU device, even though the CPU does the inference in addition to the network compilation for the GPU. With AUTO, we have seen FIL reduced by more than 10 times compared to only using a GPU¹.

Note, however, that the throughput on the CPU may be worse than with a GPU. For real-time applications where you need to meet a throughput target, the initial period of slow inference may not be acceptable. It might be preferable to wait for the model to load on the GPU. In many cases, it is recommended to use model/kernel caching for faster model loading.


Figure 2: AUTO cuts first inference latency (FIL) by running inference on the CPU until the GPU is ready.
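
The sketch below illustrates one way to measure FIL yourself and, as suggested above, to enable model caching so subsequent loads are faster. The model path and cache directory are placeholders for this example:

Python
import time
import numpy as np
from openvino.runtime import Core

core = Core()
# Optional: cache compiled kernels so later loads of the same model are faster.
# "./model_cache" is an arbitrary directory chosen for this sketch.
core.set_property({"CACHE_DIR": "./model_cache"})

model = core.read_model("model.xml")  # placeholder model path

start = time.perf_counter()
compiled_model = core.compile_model(model=model, device_name="AUTO")
request = compiled_model.create_infer_request()

# Dummy input with the model's static input shape; a real application
# would pass preprocessed image data here.
dummy_input = np.zeros(model.inputs[0].shape, dtype=np.float32)
request.infer({0: dummy_input})
print(f"First inference latency: {time.perf_counter() - start:.2f} s")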

Debugging Using Information AUTO Exposes

If there are execution problems, AUTO provides information on exceptions and error values. Should the returned data not be enough for debugging purposes, more information may be acquired using ov::log::Level.
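
For example, AUTO accepts a log-level setting at compile time; raising it prints details of the device-selection process. A brief sketch (the model path is a placeholder):

Python
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")  # placeholder model path

# Raise AUTO's log verbosity; supported values include "LOG_INFO" and
# "LOG_DEBUG" (corresponding to ov::log::Level in the C++ API).
compiled_model = core.compile_model(model=model, device_name="AUTO",
                                    config={"LOG_LEVEL": "LOG_DEBUG"})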

All major performance calls of both the runtime and AUTO are instrumented with Instrumentation and Tracing Technology (ITT) APIs. For more information, see the documentation on OpenVINO profiling and the Intel® VTune™ Profiler User Guide.

Future Releases of AUTO

In future releases, AUTO will provide more performance hints and will balance workloads at the system level, for example, by offloading the inferences of one neural network to multiple hardware devices (similar to the Multi-Device Plugin).

Summary

In summary, with the new AUTO device plugin in OpenVINO:

  • Developers don’t need to update their application logic to use the advanced features and capabilities provided by Intel’s new platforms and new versions of OpenVINO.
  • Developers can enjoy optimized performance with faster time to market.

For more information, see the AUTO documentation.

Notices and Disclaimers

Performance varies by use, configuration, and other factors. Learn more at www.intel.com/PerformanceIndex.

Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. See backup for configuration details. No product or component can be absolutely secure.

Intel technologies may require enabled hardware, software or service activation.

The products described may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

¹Test Configuration

Configuration one: Intel® Core™ i7-10710U Processor with DDR4 2x16 GB at 2,666 MHz, Integrated GPU, OS: Windows 10 Enterprise Version 10.0.19042 Build 19042, Microsoft Visual Studio Community 2019 Version 16.11.8, Intel® UHD Graphics driver Version 30.0.101.1191, OpenVINO 2022.1 (zip file download), googlenet-v1 network model. Tested with notebook 106-auto-device.

Configuration two: Intel® Core™ i7-1165G7 Processor with DDR4 2x16 GB at 4,266 MHz, Integrated GPU, Intel® Iris® Xe MAX Graphics, OS: Windows 10 Enterprise Version 10.0.19042 Build 19042, Microsoft Visual Studio Community 2019 Version 16.11.10, Intel® Iris® Xe Graphics driver Version 30.0.101.1003 (Integrated GPU), Intel® Iris® Xe MAX Graphics driver Version 30.0.101.1340 (Discrete GPU), OpenVINO 2022.1 (zip file download), googlenet-v1 network model. Tested with the C++ benchmark_app inside OpenVINO 2022.1.

The tests were conducted by Intel on April 20, 2022.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

