How to Calculate Focal Loss of DL Model in PyTorch?

PyTorch is a framework for building deep learning models based on neural networks. A neural network is organized as layers of interconnected units (neurons), loosely modeled on how biological neurons work. Such models depend heavily on the dataset, because they learn its hidden patterns during training. Metrics such as loss, recall, and accuracy are then used to check how well the model performs on unseen data.

The loss value measures how far the predictions of the model are from the actual data, that is, the values that were really observed. The model makes predictions from the input data, and training works by minimizing this loss. Focal loss is a loss function designed for datasets with imbalanced classes. This guide calculates it with the SigmoidFocalCrossEntropy() method from the TensorFlow Addons library and uses the result to train and evaluate a Keras model.

Quick Outline

This guide explains the following sections:

What is Focal Loss in PyTorch
How to Calculate Focal Loss of DL Model in PyTorch

What is Focal Loss in PyTorch

The focal loss function measures how wrong the predictions of the model are, and it is designed for datasets with imbalanced classes. It reduces the contribution of samples that the model already classifies well, so that training concentrates on the hard samples, which often belong to the minority class. The user can watch the loss value while tuning settings such as the learning rate. The mathematical representation of the focal loss in deep learning models is as follows:

$$FL(D_t) = -(1 - D_t)^{\gamma} \log(D_t)$$

$D_t$: The probability that the model assigns to the correct class of a sample

$(1 - D_t)^{\gamma}$: The modulating factor, which shrinks the loss of well-classified samples ($D_t$ close to 1)

$\gamma$: The focusing parameter, which controls how strongly easy samples are down-weighted ($\gamma = 0$ gives the plain cross-entropy loss)

$\log(D_t)$: The cross-entropy term
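
As a rough numerical illustration of the terms above, the following sketch evaluates the focal loss of a single sample next to the plain cross-entropy loss. It assumes that Dt (written d_t in the code) is the probability that the model assigns to the correct class. The function names and the sample probabilities are chosen here for illustration only, and gamma = 2 is simply a commonly used value:

```python
import math


def cross_entropy(d_t: float) -> float:
    """Plain cross-entropy for one sample (focal loss with gamma = 0)."""
    return -math.log(d_t)


def focal_loss(d_t: float, gamma: float = 2.0) -> float:
    """Focal loss for one sample; d_t is the predicted probability of the true class."""
    return -((1.0 - d_t) ** gamma) * math.log(d_t)


easy = 0.9  # illustrative well-classified sample
hard = 0.1  # illustrative badly misclassified sample

for name, d_t in (("easy", easy), ("hard", hard)):
    print(f"{name}: CE = {cross_entropy(d_t):.4f}, FL = {focal_loss(d_t):.4f}")
```

With gamma = 2, the easy sample keeps only about one hundredth of its cross-entropy loss, while the hard sample keeps about 81 percent of it, so the hard sample dominates the total.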

How to Calculate Focal Loss of DL Model in PyTorch

The focal loss value can serve as the training objective of a deep learning model, which improves over multiple training iterations as the loss is minimized. The following steps build a neural network, train it on an imbalanced dataset, and evaluate it with the focal loss. To learn the process of its implementation, simply go through the following steps:

Note: The Python code used to implement these examples is attached here

Step 1: Access Python Notebook

The first requirement for running the Python code is a notebook environment such as Jupyter or Google Colab. This guide uses a Google Colab notebook, which can be created by clicking on the “New Notebook” button on its official website.

Step 2: Install Modules

Once the notebook is created, use the following code to install the “tensorflow_addons” module to call the focal loss method. Also, install the “prettytable” module to display the results in tabular form using the pip command:

!pip install tensorflow_addons
!pip install prettytable

Step 3: Import Libraries

Next, import the libraries that are used to prepare the data, build the neural network model, and calculate the focal loss:

import torch
import torch.nn as nn
import sklearn
from sklearn.model_selection import train_test_split
import pandas as pd
import numpy as np
import tensorflow_addons as tfa
import keras
import tensorflow as tf
from sklearn.preprocessing import StandardScaler
from torchvision import transforms

Import the torch library with its neural network module under the nn alias. The model in the following steps is built with Keras, so torch and torchvision are not called again in this guide.
Import the train_test_split() function from the model_selection module of sklearn to split the dataset into training and testing data.
Import StandardScaler from the preprocessing module of sklearn to standardize the features before they reach the model.
Import tensorflow_addons under the tfa alias to get the focal loss method, along with numpy, pandas, Keras, and TensorFlow for the remaining steps.

Step 4: Loading Dataset

To train the model, build a dataset or download one from a public source such as Kaggle. The dataset used in this guide to train the neural network model is accessible from here. Click on the “Download” button on the web page and choose a location on the system to store the file.

The downloaded dataset contains credit card transactions made by European customers. It has 284,807 rows in total, of which only 492 are fraud cases, so the classes are heavily imbalanced. The problem statement here is how to find fraudulent transactions using a deep learning model with the focal loss function.

In the Google Colab, click on the folder button from the left panel and then hit the upload button to select the dataset from the computer:

Once the dataset is uploaded to the Colab session, use the read_csv() method with the path of the dataset to load it into a data frame. After that, print the number of rows to check the size of the dataset:

df_raw = pd.read_csv('/content/creditcard.csv')
n_samples = len(df_raw)
print(f'Num of samples: {n_samples}')

Filter the data frame to get the positive (fraud) samples and calculate their share of the whole dataset as a percentage:

df_pos = df_raw[df_raw['Class'] == 1]
n_pos_samples = len(df_pos)
pos_ratio = 100 * n_pos_samples / n_samples
print(f'Num of positive samples: {n_pos_samples} ({pos_ratio:.2f}% of total)')

The above code displays the total number of fraud transactions in the dataset along with their percentage of all transactions.
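
Such a small share of positive samples is the reason why accuracy alone says little about a fraud model. The sketch below uses a made-up label list (10,000 transactions with 17 fraud cases, roughly the same ratio) and a trivial "model" that marks every transaction as legitimate. Both are assumptions for illustration and are not taken from the dataset:

```python
# Hypothetical labels: 1 = fraud, 0 = legitimate (about 0.17% fraud)
labels = [1] * 17 + [0] * 9983

# A trivial "model" that predicts "legitimate" for every transaction
predictions = [0] * len(labels)

correct = sum(1 for p, y in zip(predictions, labels) if p == y)
accuracy = correct / len(labels)

# Number of fraud cases that the trivial model actually catches
caught = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 1)

print(f"accuracy = {accuracy:.4f}, fraud cases caught = {caught}")
```

The accuracy is above 99 percent even though not a single fraud case is found, which is why this guide tracks true positives, false negatives, and recall instead.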

Step 5: Create the Training, Testing, and Validation Data

Before training, prepare the data: drop the Time column and replace the Amount column with its logarithm, which compresses its very wide range of values. Then split the data into training, validation, and test sets:

df_cleaned = df_raw.copy()
df_cleaned.pop('Time')
df_cleaned['log-amount'] = np.log(df_cleaned.pop('Amount') + 0.001)

df_train, df_test = train_test_split(df_cleaned, test_size = 0.2, shuffle = True)
df_train, df_valid = train_test_split(df_train, test_size = 0.2, shuffle = True)
print(f'Size of training data: {len(df_train)}')
print(f'Size of validation data: {len(df_valid)}')
print(f'Size of test data: {len(df_test)}')

Copy the raw data and apply all changes to the copy so that the original dataset stays the same.
Remove the Time column with the pop() method, then pop the Amount column and store its logarithm in a new log-amount column. The small constant 0.001 avoids taking the logarithm of zero.
After that, split the cleaned data with the train_test_split() function so that 20 percent of the rows become the test set.
Split the remaining training data again so that 20 percent of it becomes the validation set, which is used to monitor the model during training.
Print the length of the training, validation, and test sets created by the two splits.
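
Because the second split is applied to the remaining training data and not to the whole dataset, the final proportions are not 60/20/20. A small sketch of the arithmetic, using exact fractions (the variable names are illustrative):

```python
from fractions import Fraction

test_size = Fraction(1, 5)   # test_size = 0.2 in the first split
valid_size = Fraction(1, 5)  # test_size = 0.2 in the second split

test_share = test_size
valid_share = (1 - test_size) * valid_size       # 20% of the remaining 80%
train_share = 1 - test_share - valid_share

for name, share in (("train", train_share),
                    ("validation", valid_share),
                    ("test", test_share)):
    print(f"{name}: {float(share):.0%}")
```

With so few fraud cases, passing the label column to the stratify argument of train_test_split() is one option for keeping the fraud ratio similar in every split. The code in this guide does not do this.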

Step 6: Applying Feature Engineering

Once the datasets are created, separate the labels (the Class column) from the features (all remaining columns) in the training, validation, and test sets:

labels_train = np.array(df_train.pop('Class'))
labels_valid = np.array(df_valid.pop('Class'))
labels_test = np.array(df_test.pop('Class'))
features_train = np.array(df_train)
features_valid = np.array(df_valid)
features_test = np.array(df_test)

Create a StandardScaler() object in the scaler variable and use it to transform the features of all three sets:

scaler = StandardScaler()
features_train = scaler.fit_transform(features_train)
features_valid = scaler.transform(features_valid)
features_test = scaler.transform(features_test)

features_train = np.clip(features_train, -5, 5)
features_valid = np.clip(features_valid, -5, 5)
features_test = np.clip(features_test, -5, 5)
n_features = features_train.shape[-1]

Use StandardScaler() to standardize each feature to zero mean and unit standard deviation.
Call the fit_transform() method with the features_train variable to compute the mean and standard deviation of the training data and scale it in one step.
Call the transform() method on the validation and test features so that they are scaled with the statistics learned from the training data.
Fitting the scaler on the training data only keeps all three sets scaled consistently and keeps information from the validation and test sets out of training.
After that, call the clip() method on all three sets to limit the scaled values to the range from -5 to 5, which bounds the influence of outliers.
Lastly, store the number of feature columns of the training data in the n_features variable.
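
The fit and transform steps can be sketched with plain NumPy to show what happens to the numbers. This is a simplified stand-in for StandardScaler on randomly generated data, not the scikit-learn implementation itself, and the array shapes are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 1,000 training rows and 200 validation rows with 3 features
train = rng.normal(loc=50.0, scale=10.0, size=(1000, 3))
valid = rng.normal(loc=50.0, scale=10.0, size=(200, 3))

# "fit": the statistics come from the training data only
mean = train.mean(axis=0)
std = train.std(axis=0)

# "transform": the same statistics are applied to every split
train_scaled = (train - mean) / std
valid_scaled = (valid - mean) / std

# Limit extreme values to [-5, 5], as the guide does with np.clip()
train_scaled = np.clip(train_scaled, -5, 5)
valid_scaled = np.clip(valid_scaled, -5, 5)

print(train_scaled.mean(axis=0).round(3), train_scaled.std(axis=0).round(3))
```

The scaled training features end up with a mean close to 0 and a standard deviation close to 1, while the validation features are only approximately centered because they reuse the training statistics.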

Step 7: Applying Loss Functions

Use the tensorflow_addons library to use the SigmoidFocalCrossEntropy() method with the values of alpha and gamma as the arguments:

opt = tf.keras.optimizers.Adam(learning_rate = 1e-3)
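metrics = [
    tf.keras.metrics.TruePositives(name = 'tp'),
    tf.keras.metrics.FalseNegatives(name = 'fn'),
    tf.keras.metrics.Recall(name = 'recall')
]
losses = [
    # alpha is the class-balancing weight and is normally chosen between 0 and 1
    # (the library default is 0.25); this guide uses 4
    tfa.losses.SigmoidFocalCrossEntropy(gamma = 2, alpha = 4)
]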
loss_names = [
    'focal loss'
]
logs_loss = []
logs_recall = []

Use the Adam() method from the optimizers module of the Keras library to create the optimizer with a learning rate of 0.001.
Define the metrics that are tracked alongside the loss: true positives, false negatives, and recall.
True positives (TP) are fraud cases in the data that the model also predicts as fraud.
False negatives (FN) are fraud cases in the data that the model wrongly predicts as legitimate.
After that, use the “tfa” alias to call the SigmoidFocalCrossEntropy() method and store it in the losses list.
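
To make these metrics concrete, the following sketch counts them by hand for a tiny set of made-up labels and predictions. The helper name confusion_counts and the sample values are hypothetical, and precision is included only for comparison because the guide does not track it:

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fn, fp, tn) for binary labels given as 0/1 sequences."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fn, fp, tn


# Hypothetical labels and predictions: 1 = fraud, 0 = legitimate
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1, 0, 0, 0, 0]

tp, fn, fp, tn = confusion_counts(y_true, y_pred)
recall = tp / (tp + fn)
precision = tp / (tp + fp)
print(f"tp={tp} fn={fn} fp={fp} tn={tn} recall={recall:.2f} precision={precision:.2f}")

# A model that flags everything as fraud has no false negatives and a recall of 1,
# but its precision collapses, so recall should not be read in isolation.
tp_all, fn_all, fp_all, tn_all = confusion_counts(y_true, [1] * len(y_true))
recall_all = tp_all / (tp_all + fn_all)
precision_all = tp_all / (tp_all + fp_all)
print(f"all-positive: fn={fn_all} recall={recall_all:.2f} precision={precision_all:.2f}")
```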

Step 8: Building the Model

Build the neural network model using the Sequential() method after importing the layers dependency from the Keras library:

import keras.layers

for loss in losses:
    model = keras.Sequential()
    model.add(keras.layers.Dense(16, input_dim = n_features, activation = 'relu',
        kernel_initializer = 'he_uniform'))
    model.add(keras.layers.Dropout(0.5))
    model.add(keras.layers.Dense(1, activation = 'sigmoid'))
    model.compile(optimizer = opt, loss = loss, metrics = metrics)

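Use the for loop on the losses list to create a Sequential() model for each loss function and store it in the model variable.
Add a hidden Dense() layer with 16 neurons and the ReLU activation function, followed by a Dropout() layer that randomly disables half of the hidden units during training to reduce overfitting.
Add the output Dense() layer with one neuron and the sigmoid activation function, which returns the probability of fraud.
Lastly, compile the model using the optimizer, loss, and metrics configured previously.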
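
To get a feel for the size of this network, the sketch below counts its trainable parameters and runs one forward pass in NumPy with random, untrained weights. It assumes 29 input features (the 28 V-columns of the Kaggle file plus log-amount), which is not stated in the guide, and it leaves out dropout because dropout is inactive at prediction time:

```python
import numpy as np


def dense_param_count(n_inputs: int, n_units: int) -> int:
    """Weights plus biases of one fully connected layer."""
    return n_inputs * n_units + n_units


n_features = 29    # assumption: 28 V-columns plus log-amount
hidden_units = 16

total = dense_param_count(n_features, hidden_units) + dense_param_count(hidden_units, 1)
print(f"trainable parameters: {total}")

# Forward pass on a batch of 4 hypothetical samples with random weights
rng = np.random.default_rng(0)
x = rng.normal(size=(4, n_features))
w1, b1 = rng.normal(size=(n_features, hidden_units)), np.zeros(hidden_units)
w2, b2 = rng.normal(size=(hidden_units, 1)), np.zeros(1)

hidden = np.maximum(0.0, x @ w1 + b1)                 # ReLU activation
probs = 1.0 / (1.0 + np.exp(-(hidden @ w2 + b2)))     # sigmoid output
print(probs.shape)
```

The Dropout() layer adds no parameters of its own, so under this assumption the model has 497 trainable values in total.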

Step 9: Training the Model

Once the model is compiled, train it for multiple iterations by setting the value of the epochs argument:

logs = model.fit(features_train, labels_train, validation_data = (features_valid,
        labels_valid), epochs = 5, verbose = 1)
logs_loss.append(logs.history['loss'])
logs_recall.append(logs.history['recall'])
# Evaluate the trained model on the training data and on the held-out test data
eval_train = model.evaluate(features_train, labels_train, verbose = 0)
eval_test = model.evaluate(features_test, labels_test, verbose = 0)

Fit the model on the training features and labels and pass the validation set through the validation_data argument. The history attribute of the returned object stores the loss and recall values of every epoch.
Use the verbose argument with value 1 to display the loss, true positive, false negative, and recall values for each epoch.
Call the evaluate() method to get the final loss and metric values for the table in the next step.

Step 10: Testing the Model

After training the model, display the evaluation values for the training and testing datasets in the form of a table:

import prettytable

table = prettytable.PrettyTable()
table.field_names = ['Data', 'Loss', 'TruePositives', 'FalseNegatives', 'Recall']

for stage, eval_info in zip(('training', 'test'), (eval_train, eval_test)):
    # evaluate() returns the loss followed by the metrics in the order they were compiled
    row = [stage] + [f'{value:.3f}' for value in eval_info]
    table.add_row(row)

# Print the table once, after both rows have been added
print(table)

Import the prettytable library to display the evaluated results of the model as a table on the screen.
Set the field names of the table, then use the for loop to format the evaluated values and add one row for the training data and one row for the test data.
The FalseNegatives column has 0 values, which means that the model did not miss any fraud case. It does not count legitimate transactions that were flagged as fraud (false positives), so it does not show that every prediction is correct.
Recall is the share of actual positive instances that the model predicted correctly, calculated as TP / (TP + FN).

Step 11: Plotting the Loss Values

Finally, plot the loss value of each training epoch as a line graph:

import matplotlib.pyplot as plt

plt.plot(logs_loss[0], label = 'Focal loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.title('Focal Loss')
plt.show()

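The line graph shows the focal loss values recorded by the Sequential model across the training epochs. The loss started just above -42, dropped with each epoch, and finished around -46 after 5 epochs. Focal loss cannot be negative when alpha lies between 0 and 1, so these negative values are a consequence of setting alpha to 4 in Step 7, which gives the legitimate class a negative weight of 1 - alpha = -3.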
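
A minimal sketch of how the alpha value affects the sign of the loss, assuming the alpha-balanced form of focal loss in which fraud samples are weighted by alpha and legitimate samples by 1 - alpha. The function name and the sample probability are illustrative:

```python
import math


def alpha_focal_loss(y_true: int, p: float, alpha: float, gamma: float = 2.0) -> float:
    """Alpha-balanced focal loss for one sample; p is the predicted fraud probability."""
    p_t = p if y_true == 1 else 1.0 - p               # probability given to the true class
    alpha_t = alpha if y_true == 1 else 1.0 - alpha   # class weight
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# A legitimate transaction (y_true = 0) that the model scores as 70% fraud
for alpha in (0.25, 4.0):
    print(f"alpha = {alpha}: loss = {alpha_focal_loss(0, 0.7, alpha):.4f}")
```

With alpha = 0.25 the loss of this misclassified sample is positive, whereas with alpha = 4 it becomes negative, so minimizing the loss would reward misclassifying legitimate transactions.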

That’s all about the process of calculating the focal loss value for the trained deep learning model.
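
The steps in this guide compute the focal loss through TensorFlow Addons and Keras. For readers who want the same calculation in PyTorch itself, the following is a minimal sketch under stated assumptions: the function sigmoid_focal_loss is written here for illustration, it expects raw logits instead of sigmoid outputs, and it uses alpha = 0.25 and gamma = 2 as defaults. The sample tensors are made up:

```python
import torch
import torch.nn.functional as F


def sigmoid_focal_loss(logits: torch.Tensor, targets: torch.Tensor,
                       alpha: float = 0.25, gamma: float = 2.0) -> torch.Tensor:
    """Mean alpha-balanced focal loss for binary targets (0/1) given raw logits."""
    targets = targets.float()
    probs = torch.sigmoid(logits)
    # Per-sample binary cross-entropy, computed from logits for numerical stability
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    # Probability assigned to the true class and the matching class weight
    p_t = probs * targets + (1.0 - probs) * (1.0 - targets)
    alpha_t = alpha * targets + (1.0 - alpha) * (1.0 - targets)
    return (alpha_t * (1.0 - p_t) ** gamma * ce).mean()


# Hypothetical batch: raw model outputs (logits) and 0/1 fraud labels
logits = torch.tensor([2.0, -1.5, 0.3, -3.0], requires_grad=True)
targets = torch.tensor([1, 0, 1, 0])

loss = sigmoid_focal_loss(logits, targets)
loss.backward()  # gradients are now available for an optimizer step
print(loss.item(), logits.grad)
```

In a training loop, this function would take the place of the usual criterion, with the last layer of the model returning logits. The torchvision package also provides a sigmoid_focal_loss function in torchvision.ops, which may be preferable to a hand-written version.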

Conclusion

To calculate the focal loss value for a trained neural network model, import the required libraries and load the dataset. After that, split the data into training, validation, and test sets and standardize the features so that the model can train properly. Then, compile the model with the focal loss function, train it over multiple epochs, and evaluate it. Finish things off by displaying the results of the model in tabular form or with a graphical visualization such as a line graph.