Faster XGBoost, LightGBM, and CatBoost Inference on the CPU

18 Dec 2023 | CPOL | 3 min read
Accelerate Your XGBoost, LightGBM, and CatBoost Inference Workloads with Intel® oneAPI Data Analytics Library (oneDAL)

This article is a sponsored article. Articles such as these are intended to provide you with information on products and services that we consider useful and of value to developers

Despite the recent popularity of generative AI, gradient boosting on decision trees remains one of the strongest methods for tabular data. It often matches or exceeds the accuracy of other techniques, including neural networks. Many people use the XGBoost*, LightGBM, and CatBoost gradient boosting frameworks to solve real-world problems, conduct research, and compete in Kaggle competitions. Although these frameworks perform well out of the box, prediction speed can still be improved. Because prediction is arguably the most important stage of the machine learning workflow, faster inference can be quite beneficial.

The following example shows how to convert your models to get significantly faster predictions with no quality loss (Table 1).

Python
import daal4py as d4p
# xgb_model is an already trained XGBoost model; test_data is the data to score
d4p_model = d4p.mb.convert_model(xgb_model)
d4p_prediction = d4p_model.predict(test_data)

Image 1

Table 1. Comparing daal4py inference performance to stock XGBoost*, LightGBM, and CatBoost. The color coding indicates the best and worst cases in each column, with higher values indicating better performance.
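
One way to check the "no quality loss" claim on your own model is to compare the converted model's predictions with those of the original framework. The sketch below is a minimal, framework-agnostic helper. The function name and the sample numbers are hypothetical, and it assumes both models return flat sequences of the same kind of output (for example, both return regression values).

```python
def max_abs_difference(original_pred, converted_pred):
    """Return the largest absolute element-wise gap between two prediction sequences."""
    original_pred = list(original_pred)
    converted_pred = list(converted_pred)
    if len(original_pred) != len(converted_pred):
        raise ValueError("prediction sequences must have the same length")
    # default=0.0 covers the case of two empty sequences
    return max((abs(a - b) for a, b in zip(original_pred, converted_pred)), default=0.0)


# Hypothetical numbers standing in for the stock and converted model outputs
stock = [0.25, 1.50, -0.75]
converted = [0.25, 1.50, -0.75]
assert max_abs_difference(stock, converted) == 0.0
```

In practice, comparing the result against a small tolerance may be more appropriate than requiring exact equality, since floating-point rounding can differ between implementations.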

Updates to oneDAL

It has been a few years since our last article on accelerated inference (Improving the Performance of XGBoost and LightGBM Inference), so we thought it was time to give an update on changes and improvements since then:

  1. Simplified API and alignment with gradient boosting frameworks
  2. Support for CatBoost models
  3. Support for missing values
  4. Performance improvements

Simplified API and Alignment with Gradient Boosting Frameworks

The API now consists of just two methods: convert_model() and predict(). You can integrate them into existing code with minimal changes:

Image 2

You can also convert the trained model to daal4py:

XGBoost:

Python
d4p_model = d4p.mb.convert_model(xgb_model)

LightGBM:

Python
d4p_model = d4p.mb.convert_model(lgb_model)

CatBoost:

Python
d4p_model = d4p.mb.convert_model(cb_model)
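
Because the three conversions above share a single call, existing code can route predictions through a small wrapper. The following is only a sketch: `accelerated_predict` and its `convert` parameter are hypothetical names introduced for illustration, and the fallback to the framework's own predict() when daal4py is not installed is an assumption about how you might want such a wrapper to behave.

```python
def accelerated_predict(model, data, convert=None):
    """Predict with a converted model, falling back to model.predict() if daal4py is absent.

    `convert` is a callable that turns a trained model into an object with predict().
    When it is None, d4p.mb.convert_model is used if daal4py can be imported.
    """
    if convert is None:
        try:
            import daal4py as d4p  # optional dependency
        except ImportError:
            # daal4py is not installed: use the framework's own predict()
            return model.predict(data)
        convert = d4p.mb.convert_model
    return convert(model).predict(data)


# Usage with any trained XGBoost, LightGBM, or CatBoost model:
# prediction = accelerated_predict(xgb_model, test_data)
```

Making the converter injectable keeps the wrapper easy to test without any of the boosting libraries installed.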

Estimators in the scikit-learn* style are also supported:

Python
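import xgboost as xgb
from daal4py.sklearn.ensemble import GBTDAALRegressor
# X and y are the training features and targets
reg = xgb.XGBRegressor()
reg.fit(X, y)
# Convert the fitted estimator and predict with daal4py
d4p_predt = GBTDAALRegressor.convert_model(reg).predict(X)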

The updated API lets you use XGBoost, LightGBM, and CatBoost models all in one place. You can also use the same predict() method for both classification and regression, just like you do with the main frameworks. You can learn more in the documentation.

Support for CatBoost Models

Adding CatBoost model support in daal4py significantly enhances the library's versatility and efficiency in handling gradient boosting tasks. CatBoost, short for categorical boosting, is renowned for its exceptional speed and performance, and with daal4py acceleration, you can go even faster. (Note: Categorical features are not supported for inference.) With this addition, daal4py covers three of the most popular gradient boosting frameworks.

Support for Missing Values

Missing values are a common occurrence in real-world data sets. This can happen for a number of reasons, such as human error, data collection issues, or even the natural limitations of an observation mechanism. Ignoring or improperly handling missing values can lead to skewed or biased analyses, which ultimately degrade the performance of machine learning models.

Support for missing values was introduced in daal4py 2023.2. You can convert models trained on data containing missing values and pass data with missing values at inference time. This results in more accurate and robust predictions while streamlining the data preparation process for data scientists and engineers.
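
To illustrate what missing-value support means at inference time, the sketch below walks a single decision tree in which each split carries a default direction for missing inputs, the approach XGBoost uses for its learned splits. The dictionary-based node layout and the `predict_tree` name are hypothetical and chosen only for illustration; this is not daal4py's internal representation or implementation.

```python
import math


def predict_tree(node, row):
    """Walk one tree; a missing feature value follows the split's default direction."""
    while "leaf" not in node:
        value = row[node["feature"]]
        if value is None or math.isnan(value):
            # Missing value: take the default branch stored with the split
            go_left = node["default_left"]
        else:
            go_left = value < node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["leaf"]


# Hypothetical one-split tree: missing values are sent to the right branch
tree = {
    "feature": 0,
    "threshold": 2.5,
    "default_left": False,
    "left": {"leaf": -1.0},
    "right": {"leaf": 1.0},
}

print(predict_tree(tree, [1.0]))           # value below the threshold goes left
print(predict_tree(tree, [float("nan")]))  # missing value follows the default branch
```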

Performance Improvements

Many additional optimizations have been added to oneDAL since our last performance comparisons (Table 2).

Image 3

Table 2. Performance improvements in version 2023.2 compared to 2023.1 and stock XGBoost*

All tests were conducted using scikit-learn_bench running on an AWS* EC2 c6i.12xlarge instance (containing Intel® Xeon® Platinum 8375C with 24 cores) with the following software: Python* 3.11, XGBoost 1.7.4, LightGBM 3.3.5, CatBoost 1.2, and daal4py 2023.2.1.
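
For a quick check on your own hardware, a simple timing harness can give a rough speedup figure. This is a minimal sketch and not the methodology of scikit-learn_bench; `best_time` and `speedup` are hypothetical helper names, and the choice of taking the fastest of several runs is an assumption.

```python
import time


def best_time(predict_fn, data, repeats=5):
    """Return the fastest wall-clock time in seconds over several calls of predict_fn(data)."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        predict_fn(data)
        timings.append(time.perf_counter() - start)
    return min(timings)


def speedup(baseline_seconds, candidate_seconds):
    """Return how many times faster the candidate is than the baseline."""
    return baseline_seconds / candidate_seconds


# Usage, passing each model the input type it expects:
# ratio = speedup(best_time(stock_model.predict, stock_input),
#                 best_time(d4p_model.predict, test_data))
```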

Your contributions are welcome to extend coverage for new cases and other improvements.

How to Get daal4py

daal4py is the Python interface to the oneDAL library and is available on PyPI, conda-forge, and the conda main channel. It is a fully open-source project, and we welcome issues, requests, and contributions via the GitHub repository. To install the library from PyPI, simply run:

BAT
pip install daal4py

Or the conda-forge variant:

BAT
conda install -c conda-forge daal4py --override-channels
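
After installing, you may want to confirm that the package is visible to the Python interpreter you actually run. The snippet below is a small sketch using only the standard library; `is_installed` is a hypothetical helper name.

```python
import importlib.util


def is_installed(package_name):
    """Return True if a top-level package can be located, without importing it."""
    return importlib.util.find_spec(package_name) is not None


print("daal4py available:", is_installed("daal4py"))
```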

These results show that by making a simple change in your code, you can make your gradient-boosting data analysis tasks much faster on your current Intel® hardware. These performance improvements can lower computing costs and help you get results more quickly, all without sacrificing quality.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
United States
You may know us for our processors. But we do so much more. Intel invents at the boundaries of technology to make amazing experiences possible for business and society, and for every person on Earth.

Harnessing the capability of the cloud, the ubiquity of the Internet of Things, the latest advances in memory and programmable solutions, and the promise of always-on 5G connectivity, Intel is disrupting industries and solving global challenges. Leading on policy, diversity, inclusion, education and sustainability, we create value for our stockholders, customers and society.