Cheetah Optimizer - Python Implementation

Ujwal Watgule

Oct 1, 2023

CPOL

The Cheetah Optimizer is a nature-inspired metaheuristic algorithm designed to tackle complex optimization problems across multiple dimensions.

Introduction

The Cheetah Optimizer has been implemented in MATLAB and is available on the MATLAB Central File Exchange. It offers a range of functions and has been meticulously designed to tackle complex optimization problems across multiple domains.

Recently, while working on an ML project, I wanted to improve the performance of an SVM model. (For the basic concepts of SVM, see my introductory article Understanding SVM, published in DEV Community.) I was looking for a recent metaheuristic algorithm that could help, and I came across the Cheetah Optimizer on MATLAB Central. Since my project is in Python and no Python version was available, I implemented it in Python.

You can refer to my Medium article for the basic concepts of metaheuristic algorithms.

In this article, I'll demonstrate a Cheetah Optimizer implementation in Python. You can use any dataset of your choice; for simplicity, I have used a diabetes dataset.

Background

In optimization and machine learning, various optimization algorithms and techniques are used to find the best solution to a given problem. Common optimization algorithms include gradient descent, genetic algorithms, particle swarm optimization, and many others. These algorithms are applied to tasks such as training machine learning models, finding the optimal configuration of parameters, and solving complex optimization problems in various domains.

You can read more about Cheetah optimizer here.

Using the Code

Below is the stepwise implementation of Cheetah Optimizer in Python.

1. Import Required Libraries

import numpy as np
import pandas as pd
from sklearn.svm import SVC  # for Support Vector Classifier
from sklearn.metrics import accuracy_score

Many of you may already be familiar with the libraries imported above. For those who are not, here is a brief explanation.

import pandas as pd: This line imports the Pandas library and gives it the alias "pd." Pandas is a powerful data manipulation and analysis library in Python, often used for handling structured data in data science and machine learning tasks.

from sklearn.svm import SVC: Here, we are importing the SVC (Support Vector Classifier) class from the Scikit-Learn library. Scikit-Learn is a popular Python library for machine learning and provides various tools for tasks like classification, regression, clustering, etc. SVC is a class that allows you to create and train Support Vector Machine (SVM) models for classification.

import numpy as np: This line imports the NumPy library and gives it the alias "np." NumPy is another fundamental library for numerical and mathematical operations in Python. It provides support for handling arrays and matrices, which are often used in machine learning for data manipulation and mathematical operations.

from sklearn.metrics import accuracy_score: This line imports the accuracy_score function from Scikit-Learn's metrics module. accuracy_score is used to calculate the accuracy of a classification model's predictions by comparing them to the true labels. It's a common metric for evaluating the performance of classification models.

2. Read Dataset

data = pd.read_csv("diabetesdataset.csv") # change dataset path with your local path
X = data.iloc[:, 2:-1].values # Features 
y = data.iloc[:, 1].values # Labels

X = data.iloc[:, 2:-1].values: This line extracts the features from the DataFrame data and stores them in a NumPy array called X. It uses .iloc to select specific columns from the DataFrame. In this case, it selects all rows (denoted by :) and columns starting from the 3rd column (index 2) up to the last column (excluding the last column denoted by -1). These selected columns are considered as features for the machine learning model.

y = data.iloc[:, 1].values: This line extracts the labels from the DataFrame data and stores them in a NumPy array called y. It selects all rows (denoted by :) from the 2nd column (index 1) of the DataFrame. These selected values are considered as the labels or target variable for the machine learning model.
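To make the positional slicing concrete, here is a small sketch on a made-up DataFrame. The column names and values are hypothetical and are not taken from the diabetes dataset.

```python
import pandas as pd

# Hypothetical toy table; column names and values are illustrative only
toy = pd.DataFrame({
    "ID": [1, 2, 3],
    "Gender": [0, 1, 0],
    "AGE": [50, 26, 33],
    "BMI": [24.0, 23.0, 21.0],
    "CLASS": [0, 1, 0],
})

# All rows, columns from index 2 up to (but not including) the last column
X_toy = toy.iloc[:, 2:-1].values
# All rows, only the column at index 1
second_col = toy.iloc[:, 1].values

print(X_toy.shape)       # (3, 2): the AGE and BMI columns
print(second_col.shape)  # (3,)
```

Checking the shapes like this on your own file is a quick way to confirm that the slice picks up the columns you expect.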

3. Set Target Variable

y = data['CLASS']

In machine learning tasks, we typically have a DataFrame with multiple columns, where one column represents the target variable (the thing we want to predict) and the others represent features (the attributes used to make predictions). Here, y is set to the "CLASS" column, replacing the y assigned in step 2, and it is used as the target variable when training and evaluating the model (in our case, an SVC).

4. Split Data

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

from sklearn.model_selection import train_test_split: This line imports the train_test_split function from Scikit-Learn's model_selection module. This function is commonly used to split a dataset into training and testing sets to assess the performance of machine learning models.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42): This line splits the dataset into four subsets. It takes two arrays as input: X, which contains the feature data, and y, which holds the target variable.

The train_test_split function takes these arrays as input and returns four subsets:

X_train: This is the subset of features used for training the machine learning model.
X_test: This is the subset of features used for testing the machine learning model.
y_train: This is the subset of labels corresponding to X_train and is used for training the model.
y_test: This is the subset of labels corresponding to X_test and is used for evaluating the model's performance.

test_size=0.2: This specifies that 20% of the data will be reserved for testing, and the remaining 80% will be used for training.

random_state=42: This sets a random seed for reproducibility. Setting random_state ensures that the same random split is generated each time you run the code. You can use any integer value for random_state.
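As a rough illustration of what a shuffled 80/20 split with a fixed seed does, the following NumPy-only sketch reproduces the idea by hand. It is not scikit-learn's implementation, and the helper name simple_split is hypothetical.

```python
import numpy as np

def simple_split(X, y, test_size=0.2, seed=42):
    """Hypothetical helper: shuffle row indices, then cut off a test portion."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(X))
    n_test = int(np.ceil(len(X) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

X_demo = np.arange(20).reshape(10, 2)   # 10 samples, 2 features
y_demo = np.arange(10) % 2              # alternating 0/1 labels

X_tr, X_te, y_tr, y_te = simple_split(X_demo, y_demo)
print(X_tr.shape, X_te.shape)  # (8, 2) (2, 2)

# The same seed produces the same split on every run
X_tr2, X_te2, _, _ = simple_split(X_demo, y_demo)
print(np.array_equal(X_te, X_te2))  # True
```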

5. Define Objective Function

def objective_function(params):
    # Ensure params contains two values
    if len(params) != 2:
        raise ValueError("params must contain two values: C and gamma")    
    C, gamma = params    

    # Create and fit an SVM model with the parameters
    model = SVC(C=C, gamma=gamma)
    model.fit(X_train, y_train)
    
    # Predict the test labels and compute the accuracy score
    y_pred = model.predict(X_test)
    score = accuracy_score(y_test, y_pred)
    
    # Return the negative score as we want to minimize it
    return -score

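This objective function is used for hyperparameter tuning. In our case, we want to optimize the hyperparameters C and gamma of an SVM model. The goal is to find the values of C and gamma that give the highest accuracy on the test data or, equivalently, the lowest negative accuracy score. Note that because the same test data is used both for tuning and for the final check in step 11, the reported accuracy is likely to be optimistic; a separate validation split avoids this.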
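One possible variation, shown here only as a sketch, scores each candidate on a separate validation split so that the test data stays untouched during tuning. The synthetic data from make_classification and the name objective_on_validation are assumptions for illustration and are not part of the original code.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data (assumption: not the article's diabetes dataset)
X_syn, y_syn = make_classification(n_samples=200, n_features=5, random_state=0)

# First hold out a test set, then carve a validation set out of the remainder
X_tmp, X_te, y_tmp, y_te = train_test_split(X_syn, y_syn, test_size=0.2, random_state=42)
X_tr, X_val, y_tr, y_val = train_test_split(X_tmp, y_tmp, test_size=0.25, random_state=42)

def objective_on_validation(params):
    """Negative validation accuracy of an SVC with the given (C, gamma)."""
    C, gamma = params
    model = SVC(C=C, gamma=gamma).fit(X_tr, y_tr)
    return -accuracy_score(y_val, model.predict(X_val))

value = objective_on_validation([1.0, 0.1])
print(value)  # a number between -1 and 0; lower means higher accuracy
```

With this arrangement the optimizer only ever sees X_val and y_val, and X_te and y_te are kept for a single final evaluation.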

6. Initialize Cheetah

# Define the bounds of the parameters as a numpy array
bounds = np.array([[0.01, 100], [0.0001, 10]])
n_cheetahs = 30 
n_iterations = 50
# Initialize the cheetah positions randomly within the bounds
cheetahs = np.random.uniform(bounds[:, 0], bounds[:, 1], size=(n_cheetahs, len(bounds)))
# Initialize the best position and score
best_position = None 
best_score = -np.inf

cheetahs is a NumPy array of shape (n_cheetahs, len(bounds)). It initializes the positions of the "cheetahs" randomly within the specified parameter bounds.

np.random.uniform(bounds[:, 0], bounds[:, 1], size=(n_cheetahs, len(bounds))) generates random values for each cheetah's position within the specified bounds. Each row in the array corresponds to a cheetah, and each column corresponds to a parameter.

best_position is initially set to None, indicating that the best position found so far is unknown.

best_score is initially set to negative infinity (-np.inf). It is only a placeholder: it is overwritten as soon as the first population has been evaluated, so this starting value never appears in the results.
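The initialization can be checked in isolation. The sketch below uses a seeded np.random.default_rng generator, which is an assumption made for repeatable runs (the code above uses the global np.random.uniform), and confirms that every position falls inside its bounds.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator (assumption, for repeatability)

bounds = np.array([[0.01, 100], [0.0001, 10]])  # rows: C, gamma; columns: low, high
n_cheetahs = 30

# The low/high arrays of shape (2,) broadcast across the 30 rows
cheetahs = rng.uniform(bounds[:, 0], bounds[:, 1], size=(n_cheetahs, len(bounds)))

print(cheetahs.shape)  # (30, 2)
within = np.all((cheetahs >= bounds[:, 0]) & (cheetahs <= bounds[:, 1]))
print(within)  # True
```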

7. Set Constants

# Probability of choosing the searching or sitting-and-waiting strategy
alpha = 0.5 
# Probability of choosing the attacking strategy
beta = 0.5 
# Half-width of the random perturbation used by the going-back-home strategy
# (note: with alpha + beta = 1.0, that strategy is never selected)
delta = 0.5

8. Execute Optimization Function

# Run the optimization loop
for i in range(n_iterations): 
    # Evaluate the objective function for each cheetah 
    scores = np.array([objective_function(cheetah) for cheetah in cheetahs])

This loop repeatedly evaluates the objective function for each "cheetah" in the optimization process for a specified number of iterations (n_iterations). The scores obtained for each "cheetah" at each iteration will be used to guide the optimization algorithm in updating the positions of the "cheetahs" and potentially finding a better solution as the optimization progresses.

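# Update the best position and score if needed (still inside the iteration loop).
# objective_function returns the negative accuracy, so the lowest score is the best.
# best_position is None only on the first pass, so the first evaluation is always accepted.
if best_position is None or np.min(scores) < best_score:
    best_score = np.min(scores)
    best_position = cheetahs[np.argmin(scores)].copy()

Because objective_function returns the negative accuracy, the best "cheetah" in an iteration is the one with the lowest score. This step checks whether the minimum score obtained in the current iteration (np.min(scores)) is lower than the best score found so far (best_score). If it is, or if no best position has been recorded yet, it updates best_score to the new minimum and best_position to the position of the "cheetah" that achieved it.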

# Print the current iteration and best score
print(f'Iteration {i+1}: Best score = {-best_score}')

# Create a new array to store the updated positions (still inside the iteration loop)
new_cheetahs = np.zeros_like(cheetahs)

# Loop through each cheetah
for j in range(n_cheetahs):
    # Generate a random number between 0 and 1
    r = np.random.rand()
    
    # If r < alpha, choose searching or sitting-and-waiting strategy
    if r < alpha:
        # Generate another random number between 0 and 1
        s = np.random.rand()
        
        # If s < 0.5, choose searching strategy
        if s < 0.5:
            # Choose a random function from a pool of functions
            f = np.random.choice([np.sin, np.cos, np.tan])
            
            # Follow the leader of this iteration. Scores are negative
            # accuracies, so the leader is the cheetah with the lowest score.
            leader = cheetahs[np.argmin(scores)]
            new_cheetahs[j] = leader + f((i+1) / n_iterations) * (leader - cheetahs[j])
        
        # Else, choose sitting-and-waiting strategy
        else:
            # Update the position by adding a small random perturbation
            new_cheetahs[j] = (cheetahs[j] +
                               np.random.uniform(-0.01, 0.01, size=len(bounds)))
    
    # Else if r < alpha + beta, choose attacking strategy
    elif r < alpha + beta:
        # Move towards the best position found so far by a random factor
        new_cheetahs[j] = (cheetahs[j] + np.random.rand() *
                           (best_position - cheetahs[j]))
    
    # Else, choose leaving-the-prey-and-going-back-home strategy
    else:
        # Move to the current position of the first cheetah,
        # plus a random perturbation in [-delta, delta]
        new_cheetahs[j] = (cheetahs[0] +
                           np.random.uniform(-delta, delta, size=len(bounds)))

Select a Strategy Based on Random Probability (alpha and beta): It generates a random number r between 0 and 1. Depending on the value of r, it selects one of three strategies:

Searching or Sitting-and-Waiting Strategy: If r < alpha, it further generates a random number s between 0 and 1. If s < 0.5, it chooses a searching strategy and updates the position using a random mathematical function applied to a leader's position. Otherwise, it chooses a sitting-and-waiting strategy.

Attacking Strategy: If alpha <= r < alpha + beta, it chooses an attacking strategy and updates the position by moving towards the best position found so far (best_position) with a random factor.

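Leaving-the-Prey-and-Going-Back-Home Strategy: If r >= alpha + beta, it chooses a strategy of leaving the prey and going back home. It updates the position by moving to the current position of the first "cheetah" (cheetahs[0]) plus a random perturbation in the range [-delta, delta]. Note that r is always below 1, so with alpha = 0.5 and beta = 0.5 this branch is never reached; alpha + beta must be less than 1 for it to be used.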
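The selection logic described above can be simulated on its own to see how often each strategy is picked. In this sketch, pick_strategy is a hypothetical helper that mirrors the if/elif/else chain, and the 0.4 thresholds at the end are illustrative values only.

```python
import numpy as np

def pick_strategy(r, s, alpha=0.5, beta=0.5):
    """Hypothetical helper mirroring the if/elif/else chain of the update loop."""
    if r < alpha:
        return "search" if s < 0.5 else "sit_and_wait"
    elif r < alpha + beta:
        return "attack"
    return "go_home"

rng = np.random.default_rng(1)
counts = {"search": 0, "sit_and_wait": 0, "attack": 0, "go_home": 0}
for _ in range(10_000):
    counts[pick_strategy(rng.random(), rng.random())] += 1
print(counts)

# With alpha + beta = 1.0 the last branch cannot fire, because r is always below 1
print(counts["go_home"])  # 0

# Lower thresholds (illustrative values) leave room for the last branch
print(pick_strategy(0.9, 0.0, alpha=0.4, beta=0.4))  # go_home
```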

# Clip the position to respect the bounds (last statement inside the loop over cheetahs)
new_cheetahs[j] = np.clip(new_cheetahs[j], bounds[:, 0], bounds[:, 1])

After updating the positions, it clips the positions to ensure they remain within the specified parameter bounds (bounds[:, 0] and bounds[:, 1]).
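A quick standalone check of the clipping step, using a made-up candidate that violates both bounds:

```python
import numpy as np

bounds = np.array([[0.01, 100], [0.0001, 10]])
candidate = np.array([150.0, -3.0])  # made-up values: C too large, gamma negative

clipped = np.clip(candidate, bounds[:, 0], bounds[:, 1])
print(clipped)  # C is pulled down to 100, gamma is raised to 0.0001
```

Clipping matters here because SVC rejects non-positive values of C and gamma, so an unclipped step could make the objective function raise an error.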

# Replace the old positions with the new ones
cheetahs = new_cheetahs

Finally, it replaces the old positions of the "cheetahs" with the newly calculated positions stored in the new_cheetahs array.
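Since the loop is shown above in separate fragments, it may help to see the pieces assembled into one runnable function. The following is a sketch under stated assumptions, not the reference Cheetah Optimizer: it treats the objective as something to minimize, uses a seeded generator, and swaps the SVM objective for a cheap toy function so it runs in a fraction of a second. The names run_cheetah_sketch and toy_objective are hypothetical.

```python
import numpy as np

def run_cheetah_sketch(objective, bounds, n_cheetahs=30, n_iterations=50,
                       alpha=0.5, beta=0.5, delta=0.5, seed=0):
    """Minimize `objective` inside `bounds` with the update rules from this article."""
    rng = np.random.default_rng(seed)
    low, high = bounds[:, 0], bounds[:, 1]
    cheetahs = rng.uniform(low, high, size=(n_cheetahs, len(bounds)))
    best_position, best_score = None, np.inf
    funcs = [np.sin, np.cos, np.tan]

    for i in range(n_iterations):
        scores = np.array([objective(c) for c in cheetahs])

        # Lower is better, so track the minimum seen so far
        if scores.min() < best_score:
            best_score = scores.min()
            best_position = cheetahs[scores.argmin()].copy()

        leader = cheetahs[scores.argmin()]
        new_cheetahs = np.zeros_like(cheetahs)
        for j in range(n_cheetahs):
            r = rng.random()
            if r < alpha:
                if rng.random() < 0.5:    # searching
                    f = funcs[rng.integers(len(funcs))]
                    step = f((i + 1) / n_iterations) * (leader - cheetahs[j])
                    new_cheetahs[j] = leader + step
                else:                     # sitting and waiting
                    new_cheetahs[j] = cheetahs[j] + rng.uniform(-0.01, 0.01, size=len(bounds))
            elif r < alpha + beta:        # attacking
                new_cheetahs[j] = cheetahs[j] + rng.random() * (best_position - cheetahs[j])
            else:                         # going back home
                new_cheetahs[j] = cheetahs[0] + rng.uniform(-delta, delta, size=len(bounds))
            new_cheetahs[j] = np.clip(new_cheetahs[j], low, high)
        cheetahs = new_cheetahs

    return best_position, best_score

# Toy objective (assumption): squared distance to the point (3, 2)
def toy_objective(p):
    return float(np.sum((p - np.array([3.0, 2.0])) ** 2))

toy_bounds = np.array([[0.0, 10.0], [0.0, 10.0]])
best_pos, best_val = run_cheetah_sketch(toy_objective, toy_bounds)
print(best_pos, best_val)
```

To use it with the SVM objective from step 5, pass objective_function and the bounds array from step 6 instead of the toy versions; each call then trains one SVC, so expect it to be much slower.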

9. Finding Best Position and Score

# Print the final best position and score
print(f'Final best position: C = {best_position[0]}, gamma = {best_position[1]}')
print(f'Final best score: {-best_score}')

cValue = best_position[0] 
gammaValue = best_position[1]

10. Training SVM Model

# Import the SVC class from sklearn.svm
from sklearn.svm import SVC
# Create an SVM model with the optimal C and gamma values
model = SVC(C = cValue, gamma = gammaValue)

This line creates an instance of the SVC class and configures it with specific hyperparameters:

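C: The regularization parameter, cValue, controls the trade-off between a wide margin and few training errors. A small C tolerates more misclassified training points in exchange for a wider margin, while a large C penalizes misclassification more heavily and yields a narrower margin.

gamma: The kernel coefficient, gammaValue, determines how far the influence of a single training example reaches. A small gamma gives a smoother, simpler decision boundary, while a large gamma lets the boundary follow individual points closely, which makes it more flexible but more prone to overfitting.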
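One way to build intuition for gamma is to evaluate the RBF kernel, which SVC uses by default, for a fixed pair of points. The helper rbf_kernel_value below is a hypothetical name, and the points are made up.

```python
import numpy as np

def rbf_kernel_value(x1, x2, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    return float(np.exp(-gamma * np.sum((x1 - x2) ** 2)))

a = np.array([0.0, 0.0])
b = np.array([1.0, 1.0])   # squared distance to a is 2

k_small = rbf_kernel_value(a, b, gamma=0.01)
k_large = rbf_kernel_value(a, b, gamma=10.0)
print(k_small, k_large)
```

With the small gamma the two points still look very similar to the kernel (a value close to 1), whereas with the large gamma the similarity is practically zero, so each training point only affects its immediate neighborhood.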

# Fit the model on the training data
model.fit(X_train, y_train)

This line trains (fits) the SVM model on the training data. The training data consists of feature vectors stored in X_train and their corresponding labels stored in y_train. The fit method adjusts the model's parameters to learn a decision boundary that best separates the classes in the training data.

11. Check for Model Accuracy

# Import the accuracy_score function from sklearn.metrics
from sklearn.metrics import accuracy_score

# Predict the labels of the test data
y_pred = model.predict(X_test)

# Calculate the accuracy of the model
accuracy = accuracy_score(y_test, y_pred)

# Print the accuracy
print(f"The accuracy of the model is {accuracy:.2f}")
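For reference, accuracy is simply the fraction of predictions that match the true labels. A minimal NumPy sketch with made-up labels:

```python
import numpy as np

y_true_demo = np.array([1, 0, 1, 1, 0])   # made-up labels
y_pred_demo = np.array([1, 0, 0, 1, 0])   # made-up predictions, one mistake

# Fraction of positions where prediction and label agree
manual_accuracy = np.mean(y_true_demo == y_pred_demo)
print(f"{manual_accuracy:.2f}")  # 0.80
```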

Conclusion

Thus we have implemented the Cheetah Optimizer algorithm in Python. Using it, we searched for good, ideally near-optimal, values of the C and gamma parameters, which an SVM model needs in order to make efficient and accurate predictions.

References

You can learn more about Cheetah Optimizer through the below articles:

The Cheetah Optimizer (CO) - https://optim-app.com/projects/co
Cheetah Optimizer - Seyedali Mirjalili (2023). Cheetah Optimizer (https://www.mathworks.com/matlabcentral/fileexchange/130404-cheetah-optimizer), MATLAB Central File Exchange. Retrieved September 30, 2023.

You can download the source code here.

History

30th September, 2023: Initial version