
Azure Arc Machine Learning Part 3: Inference Anywhere with Azure Arc-Enabled Machine Learning

4 Feb 2022 · CPOL · 7 min read
In this article, we learn to deploy our model and enable inference anywhere with Azure Arc-enabled ML. We take our trained model, run it where it makes the most sense for our use case, and walk through two hands-on scenarios to demonstrate that Azure Arc-enabled ML doesn’t force you to choose one environment over the other.

This article is a sponsored article. Articles such as these are intended to provide you with information on products and services that we consider useful and of value to developers.

In the previous article of the series, we learned how to train a machine learning (ML) model on an Arc-enabled Kubernetes cluster. Once the model completes training, we should ideally be able to deploy and run it where it makes the most sense for our use case.

We might run it on-premises, but we might also run it on Azure. One of the significant advantages of using Arc-enabled ML is that it doesn’t force us to choose one over the other. Instead, it allows us to run our model and enable inference anywhere, whether on the edge, on-premises or in Azure.

In this article, we’ll demonstrate deploying a model locally, then in Azure. You can view the full project code on GitHub.

Deploying an ML Model Using a Managed Online Endpoint

Online endpoints enable us to deploy and run our ML models in a scalable and managed way. They work with powerful CPU and GPU-enabled machines in Azure, allowing us to serve, scale, secure, and monitor our models without the overhead of setting up and managing the underlying infrastructure. The endpoints quickly process small requests and provide near real-time responses.

Prerequisites

To deploy our model using managed online endpoints, we must ensure we have:

  • An active Azure subscription. It’s free to sign up and access popular services for a year with $200 credit.
  • Installed and configured the Azure CLI
  • Installed the ML extension to Azure CLI by executing the following command: $ az extension add -n ml -y
  • An Azure resource group with contributor access
  • An Azure Machine Learning (AML) workspace
  • Docker installed on our local system since we need it for local deployment
  • Set the default settings for the Azure CLI

We set the default settings for the Azure CLI with the following commands to avoid passing in the values for our subscription, workspace, and resource group each time:

Azure-CLI
$ az account set --subscription <subscription ID>
$ az configure --defaults workspace=<Azure Machine Learning workspace name> group=<resource group>

Note: You should already have a resource group and AML workspace if you've been following the article series. Use the previously-created resources to set up your system for model deployment.

Once we’ve gone through the above steps, our system is ready for deployment. To deploy our model using online endpoints, we’ll need the following:

  • Trained model files or name and version of the model registered in our workspace
  • A scoring file to score the model
  • A deployment environment where our model runs

Let’s start by creating a model on Azure.

Creating a Model on Azure

There are many ways to create ML resources on Azure. We can train and register our ML model on AML, or register a previously trained model using a CLI command. In the previous article of this series, we trained and registered the model with Azure, but we could also have registered it via the CLI.

To register a model on Azure, we need a YAML file containing the model specifications. We can use the following YAML template file to create a resource-specific YAML file.

YAML
$schema: https://azuremlschemas.azureedge.net/latest/model.schema.json 
name: {model name}
version: 1 
local_path: {path to local model}

The above YAML file starts by specifying a schema for the model. It also specifies attributes for the Azure model such as its name, version, and the local path to the model files. We can change the attributes to match a particular model.
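If we prefer to generate this file programmatically rather than editing the template by hand, a short Python script will do. The following is just a sketch: it assumes the PyYAML package is installed (not something this article otherwise requires) and fills in the template with the same fashion-mnist-tf values we use later in this article.

Python
# A minimal sketch: write model.yml with PyYAML instead of editing the
# template by hand. The name and paths match the fashion-mnist-tf model
# used later in this article.
import yaml

model_spec = {
    "$schema": "https://azuremlschemas.azureedge.net/latest/model.schema.json",
    "name": "fashion-mnist-tf",
    "version": 1,
    "local_path": "./outputs/model/model.h5",
}

with open("model.yml", "w") as f:
    yaml.safe_dump(model_spec, f, sort_keys=False)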

Once we have the YAML files containing the model specifications, we execute the following command to create the model on Azure.

Azure-CLI
$ az ml model create -f {path to YAML file}

After we register our model on Azure, we should be able to see it in AML studio, under Models.
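As a quick sanity check, we can also confirm the registration from Python. This sketch assumes the azureml-core (v1) SDK is installed and that a workspace config.json is available locally; neither is required for the CLI flow above.

Python
# A hedged sketch: look up the registered model with the v1 Python SDK.
# Assumes `pip install azureml-core` and a local workspace config.json.
from azureml.core import Workspace
from azureml.core.model import Model

ws = Workspace.from_config()                # reads config.json
model = Model(ws, name="fashion-mnist-tf")  # latest registered version
print(model.name, model.version)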


Creating a Scoring File

When invoked, the endpoint calls a scoring file to perform the actual scoring and prediction. This scoring file follows a prescribed structure and must define an init function and a run function. The endpoint calls init every time it is created or updated, and run every time it is invoked.

Let’s take a closer look at these functions by creating a score.py file on the local system and entering the following code to score the fashion-mnist-tf model.

Python
import json
import logging
import os

import keras as K
import numpy as np
import tensorflow as tf

# Map each Fashion MNIST class index to its human-readable label.
labels_map = {
    0: 'T-Shirt',
    1: 'Trouser',
    2: 'Pullover',
    3: 'Dress',
    4: 'Coat',
    5: 'Sandal',
    6: 'Shirt',
    7: 'Sneaker',
    8: 'Bag',
    9: 'Ankle Boot',
}

def predict(model: tf.keras.Model, x: np.ndarray) -> tf.Tensor:
    # Run the model in inference mode and return the most likely class index.
    y_prime = model(x, training=False)
    probabilities = tf.nn.softmax(y_prime, axis=1)
    predicted_indices = tf.math.argmax(input=probabilities, axis=1)
    return predicted_indices

def init():
    global model
    print("Executing init() method...")
    # On Azure, AZUREML_MODEL_DIR points to the model's root folder;
    # locally, fall back to the relative path.
    if 'AZUREML_MODEL_DIR' in os.environ:
        model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), './outputs/model/model.h5')
    else:
        model_path = './outputs/model/model.h5'

    model = K.models.load_model(model_path)
    print("loaded model...")

def run(raw_data):
    print("Executing run(raw_data) method...")
    # Parse the request payload and reshape it into a single 28x28x1 image.
    data = np.array(json.loads(raw_data)['data'])
    data = np.reshape(data, (1, 28, 28, 1))
    predicted_index = predict(model, data).numpy()[0]
    predicted_name = labels_map[predicted_index]

    logging.info('Predicted name: %s', predicted_name)
    logging.info('Run completed')
    return predicted_name

In the above code, the predict function takes the data and returns the label index with the highest probability. The init function is responsible for loading the model. The AZUREML_MODEL_DIR environment variable gives us the path to the model’s root folder in Azure. If the environment variable is not available, it loads the model from the local machine.

The run function is responsible for prediction. It takes raw_data as a parameter, containing the data we pass when invoking the endpoint. The run function loads the JSON data, reshapes it, calls the predict function on it, maps the resulting index to one of our defined labels to make it human-readable, and returns the predicted label.
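Before wiring score.py into an endpoint, it can be handy to smoke-test it directly. The sketch below is our own addition, not part of the deployment flow: it imports the file as a module and feeds it a dummy 28x28 image, assuming score.py and the model file sit in the working directory.

Python
# A local smoke test for score.py; no endpoint involved.
import json

import numpy as np

import score  # the scoring file shown above

score.init()  # loads ./outputs/model/model.h5 since AZUREML_MODEL_DIR is unset

# Build a fake request: one 28x28 grayscale image of zeros.
payload = json.dumps({"data": np.zeros((28, 28)).tolist()})
print(score.run(payload))  # prints one of the ten labels, e.g. 'T-Shirt'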

With our scoring file in place, let’s create the environment for deployment.

Creating a Deployment Environment

The AML environment specifies the runtime and other configurations for training and prediction jobs in Azure. Azure provides various options for inference, including pre-built Docker images or base images for custom environments. Here, we use a base image provided by Microsoft and extend it manually with all the packages required for inference on Azure.

Before we create our deployment environment, we must set the endpoint name with the following command:

Azure-CLI
$ export ENDPOINT_NAME="azure-arc-demo"

We must also create a YAML file containing the endpoint specifications, like schema and name. We create an endpoint.yml file as follows:

YAML
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
name: azure-arc-demo
auth_mode: key

The next step is to define our deployment environment dependencies. Since we use a base image, we must extend the image using a Conda YAML file containing all the dependencies. We create a conda.yml file and add the following code to it:

YAML
name: test
channels:
  - conda-forge
dependencies:
  - python
  - numpy
  - pip
  - tensorflow
  - scikit-learn
  - keras
  - pip:
      - azureml-defaults
      - inference-schema[numpy-support]

This file contains all the dependencies required for inference. Note that we also added the azureml-defaults package, which we need for inference on Azure. With the environment files in place, we create a deployment.yml file as follows:

YAML
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: fashion-mnist
type: kubernetes
endpoint_name: azure-arc-demo
app_insights_enabled: true
model:
  name: fashion-mnist-tf
  version: 1
  local_path: "./outputs/model/model.h5"
code_configuration:
  code:
    local_path: .
  scoring_script: score.py
instance_type: defaultinstancetype
environment:
  conda_file: ./conda.yml
  image: mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04:latest

In the above file, we set a few values such as endpoint_name, the model details, the scoring_script, and the conda_file, ensuring that the paths point to the directory containing our files.
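Since a typo in any of these paths only surfaces at deployment time, a small check up front can save a failed roll-out. Here’s an optional sketch (assuming PyYAML is installed) that verifies the files referenced in deployment.yml actually exist:

Python
# An optional sanity check: confirm the paths in deployment.yml resolve.
import os

import yaml

with open("deployment.yml") as f:
    spec = yaml.safe_load(f)

code_cfg = spec["code_configuration"]
paths = [
    spec["model"]["local_path"],
    spec["environment"]["conda_file"],
    os.path.join(code_cfg["code"]["local_path"], code_cfg["scoring_script"]),
]

for path in paths:
    print(path, "->", "ok" if os.path.exists(path) else "MISSING")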

Once we’ve created all the above files, it’s time to deploy the model.

Deploying the Model Locally

Before deploying the model, we first create the endpoint by executing the following command:

Azure-CLI
$ az ml online-endpoint create --local -n azure-arc-demo -f ./fashion-mnist-deployment/endpoint.yml

Next, we deploy the model as follows:

Azure-CLI
$ az ml online-deployment create --local -n fashion-mnist --endpoint azure-arc-demo -f ./fashion-mnist-deployment/deployment.yml

The above command creates a local deployment named fashion-mnist under the azure-arc-demo endpoint. We verify the deployment with this command:

Azure-CLI
$ az ml online-endpoint show -n azure-arc-demo --local


Our deployment is successful, and we can now invoke it for inference.

Invoking the Endpoint for Inference

Once the deployment is successful, we can invoke the endpoint for inference by passing our request. But before that, we must create a file containing the input data for prediction. We can quickly take an image from our test dataset, convert it to JSON, and pass it to the endpoint for prediction. We create a sample request using the following code.

Python
import json

# Fashion MNIST dataset: https://github.com/zalandoresearch/fashion-mnist
from keras.datasets import fashion_mnist
from tensorflow.keras.utils import to_categorical

# Load the data.
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()

# Number of classes.
num_classes = 10

# Normalize the test images and one-hot encode the labels.
x_test = x_test.astype('float32')
x_test /= 255
y_test = to_categorical(y_test, num_classes)

# Save test image 10 as a JSON request payload under the 'data' key.
with open('mydata.json', 'w') as f:
    json.dump({"data": x_test[10].tolist()}, f)

The above code loads a single test image, converts its pixel values to a list, stores that list under the data key of a JSON dictionary, and saves the result to a mydata.json file.
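To know whether the endpoint’s answer is right, it helps to record the ground truth for the same image. The companion snippet below is an addition for verification, independent of the request file; it prints the expected label for test image 10.

Python
# Print the ground-truth label for test image 10 so we can compare it
# with the endpoint's prediction.
from keras.datasets import fashion_mnist

labels_map = {0: 'T-Shirt', 1: 'Trouser', 2: 'Pullover', 3: 'Dress',
              4: 'Coat', 5: 'Sandal', 6: 'Shirt', 7: 'Sneaker',
              8: 'Bag', 9: 'Ankle Boot'}

(_, _), (_, y_test) = fashion_mnist.load_data()
print("Expected label:", labels_map[int(y_test[10])])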

Now that we have our sample request, we invoke the endpoint as follows:

Azure-CLI
$ az ml online-endpoint invoke --local --name azure-arc-demo --request-file ./fashion-mnist-deployment/mydata.json

The endpoint responds with the prediction in near real-time.


Deploying the Online Endpoint to Azure

Deploying our endpoint to Azure is just as easy as deploying it to a local environment. When deploying the model locally, we used the --local flag, which directs the CLI to deploy the endpoints in the local Docker environment. Simply removing this flag allows us to deploy our model on Azure.

We execute the following command to create the endpoint in the cloud:

Azure-CLI
$ az ml online-endpoint create --name azure-arc-demo -f ./fashion-mnist-deployment/endpoint.yml

After creating the endpoint, we can see it in AML studio under Endpoints.


Next, we create a deployment named fashion-mnist under the endpoint.

Azure-CLI
$ az ml online-deployment create --name fashion-mnist --endpoint azure-arc-demo -f ./fashion-mnist-deployment/deployment.yml --all-traffic

The deployment can take some time, depending on the underlying environment. Once the deployment is successful, we can invoke the model for inference as follows:

Azure-CLI
$ az ml online-endpoint invoke --name azure-arc-demo --request-file ./fashion-mnist-deployment/mydata.json

The above command returns the same output as our locally-deployed model.
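The CLI isn’t the only way to call the deployed endpoint: once it’s live, any HTTP client can reach its scoring URI. The following is a hedged sketch using Python’s requests library; the scoring URI and key placeholders must be filled in first, for example from the endpoint’s Consume tab in AML studio or from az ml online-endpoint show.

Python
# A sketch of invoking the Azure endpoint over REST. Fill in the scoring
# URI and key before running; both placeholders are deliberate.
import requests

scoring_uri = "<scoring URI for azure-arc-demo>"  # placeholder
key = "<endpoint key>"                            # placeholder

with open("./fashion-mnist-deployment/mydata.json") as f:
    payload = f.read()

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {key}",
}

response = requests.post(scoring_uri, data=payload, headers=headers)
print(response.json())  # the predicted label, e.g. 'Ankle Boot'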

Summary

In this article, we learned to deploy the model locally and on Azure. The model deployment process would be the same even if we had an Arc-enabled Kubernetes cluster on another cloud provider or at the edge. Arc-enabled machine learning allows us to seamlessly deploy and run our model regardless of hardware and location.

This article series is an introductory guide to getting started with Arc-enabled machine learning. Kubernetes undoubtedly opens many doors for optimizing the machine learning process. At the same time, Arc makes the whole workflow smooth and gives us the freedom to train and run our machine learning models on the hardware of our choice.

To pursue powerful, scalable machine learning, sign up for a free Azure Arc account.

To learn more about how to configure Azure Kubernetes Service (AKS) and Azure Arc-enabled Kubernetes clusters for training and inferencing machine learning workloads, check out Configure Kubernetes clusters for machine learning.

This article is part of the series 'Azure Arc Machine Learning'.

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL).


