Introduction to pyDAAL

1 Jun 2017 | CPOL | 2 min read
This paper shows how the Python API of the Intel® Data Analytics Acceleration Library (Intel® DAAL) works. First, we explain how to manipulate data using the pyDAAL programming interface, and then show how to integrate it with Python data manipulation and math APIs. Finally, we demonstrate how to use pyDAAL to implement a simple linear regression solution for a prediction problem.

This article is a sponsored article. Articles such as these are intended to provide you with information on products and services that we consider useful and of value to developers.

Data science is a recent field that brings together concepts from many other areas, such as data mining, data analysis, data modeling, data prediction, and data visualization. Performing these tasks as quickly as possible has become a central concern in today's data solutions. With that in mind, Intel DAAL is a highly optimized library whose goal is to provide a full solution for data analytics on today's highly parallel systems, such as Intel® Xeon Phi™ processors.

Intel DAAL delivers solutions for many steps of a data analytics pipeline, such as pre-processing, data transformations, dimensionality reduction, data modeling, prediction, and several drivers for reading and writing in most of the common data formats. A summary of all features inside the library can be seen in Figure 1.

Figure 1. Main algorithms delivered by Intel® Data Analytics Acceleration Library

As can be seen in Figure 1, all APIs are available for C++, Java*, and Python* (a recent addition, available from version 2017 beta). Many of the algorithms implemented in the library can be executed in three main modes:

  • Batch: in this mode, processing occurs serially, e.g., the training algorithm is executed sequentially on a single node;
  • Distributed: as the name suggests, in this processing mode the dataset must be split and distributed among the computing nodes. The algorithm then calculates partial solutions on each node and, in the last step, combines them into a single solution; and
  • Online: in this processing mode, the data is treated as a continuous stream. Processing builds incremental (partial) models and, at the end, builds a full model from them.
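
The difference between the three modes can be illustrated with a small sketch. The code below is plain Python and does not use the pyDAAL API; the names `PartialStats`, `fit_batch`, `fit_distributed`, and `fit_online` are hypothetical and chosen only for this illustration. It assumes a one-feature linear regression whose model can be recovered from a handful of running sums, so that partial results can be merged (distributed) or updated block by block (online).

```python
# Conceptual illustration only: plain Python, NOT the pyDAAL API.
from dataclasses import dataclass


@dataclass
class PartialStats:
    """Running sums sufficient to fit y = slope * x + intercept."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def update(self, xs, ys):
        """Fold one block of observations into the running sums."""
        for x, y in zip(xs, ys):
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxx += x * x
            self.sxy += x * y
        return self

    def merge(self, other):
        """Combine two partial results into a new one."""
        return PartialStats(
            self.n + other.n,
            self.sx + other.sx,
            self.sy + other.sy,
            self.sxx + other.sxx,
            self.sxy + other.sxy,
        )

    def finalize(self):
        """Solve the least-squares normal equations for (slope, intercept)."""
        denom = self.n * self.sxx - self.sx ** 2
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return slope, intercept


def fit_batch(xs, ys):
    """Batch: the whole dataset is processed in one pass on one node."""
    return PartialStats().update(xs, ys).finalize()


def fit_distributed(blocks):
    """Distributed: each block yields a partial result; a final step merges them."""
    partials = [PartialStats().update(bx, by) for bx, by in blocks]
    total = PartialStats()
    for partial in partials:
        total = total.merge(partial)
    return total.finalize()


def fit_online(stream):
    """Online: blocks arrive one at a time and update a single running state."""
    state = PartialStats()
    for bx, by in stream:
        state.update(bx, by)
    return state.finalize()


# Toy data lying on the line y = 2x + 1, split into two blocks.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0 * x + 1.0 for x in xs]
blocks = [(xs[:3], ys[:3]), (xs[3:], ys[3:])]

batch_model = fit_batch(xs, ys)
distributed_model = fit_distributed(blocks)
online_model = fit_online(iter(blocks))
```

On this toy data all three functions recover the same slope and intercept. In a real deployment, the per-block work in `fit_distributed` would run on separate nodes, and `fit_online` would consume blocks as they arrive rather than from a list.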

The processing modes, additional details on data management, and the use of pyDAAL to implement a simple linear regression solution for a prediction problem are covered in more depth in this whitepaper.

Source available on GitHub

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL).


Written By
United States
You may know us for our processors. But we do so much more. Intel invents at the boundaries of technology to make amazing experiences possible for business and society, and for every person on Earth.

Harnessing the capability of the cloud, the ubiquity of the Internet of Things, the latest advances in memory and programmable solutions, and the promise of always-on 5G connectivity, Intel is disrupting industries and solving global challenges. Leading on policy, diversity, inclusion, education and sustainability, we create value for our stockholders, customers and society.