VIDEO FRAME CLASSIFICATION

Md. Wazir Ali
Oct 26, 2021 · 7 min read

TABLE OF CONTENTS:-

1. INTRODUCTION

2. PROBLEM STATEMENT

3. LIBRARIES USED

4. STRUCTURE OF THE GIVEN DATA

5. BUSINESS METRIC

6. APPROACH FOLLOWED

7. VIDEO FRAME EXTRACTION

8. GENERATION OF IMAGE BATCHES

9. MODELS USED

10. PERFORMANCE OF THE MODELS

11. GENERATING THE TEST PREDICTIONS

12. SOURCE CODE

13. REFERENCES

INTRODUCTION

A video is a sequence of frames, each of which can be considered a still image containing some information. Since a video is time based, in many cases its frames can be treated as occurring in a particular order. Extracting information from a video, such as theft identification, classification of the objects present in it, or detection of those objects, finds applications in large supermarkets, at supermarket exit gates, and so on.

PROBLEM STATEMENT

In this problem, we are given a video and its corresponding frames along with human-generated labels. The video contains 13 species of animals in a zoo. The task is to design an ML- or DL-based solution that classifies the species of animal in each video frame.

LIBRARIES USED

The various libraries used for this task are:-

  1. OpenCV
  2. Sklearn
  3. numpy
  4. pandas
  5. tensorflow.keras

STRUCTURE OF THE DATA

The following files are provided as part of the given data:-

  1. Two .mp4 files showing the video of the animals in the zoo, without any sound.
  2. Two .csv files: train, which gives the frame ID and the corresponding human-generated label for the species of animal, and test, which has only the test frame IDs.

The data can be found here.

BUSINESS METRIC

The business objective was to design the solution so as to maximize 100*micro_averaged_f1_score.
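
As a quick illustration, this is how the metric can be computed with sklearn; the labels below are made up for the example:

from sklearn.metrics import f1_score

# Hypothetical ground-truth and predicted species labels for a few frames
y_true = ["lion", "lion", "zebra", "deer", "zebra"]
y_pred = ["lion", "zebra", "zebra", "deer", "zebra"]

# Micro-averaging pools the TP/FP/FN counts over all classes before computing F1
score = 100 * f1_score(y_true, y_pred, average="micro")
print(score)  # 80.0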

APPROACH FOLLOWED

The following steps were followed while designing the solution to the task:-

  1. Extraction of Frames from the video.
  2. Generating the image batches through the flow_from_dataframe method of ImageDataGenerator.
  3. Extracting the visual features from the pretrained VGG16 and VGG19 models.
  4. Feeding these visual features to a Neural Network which classifies the objects into one of the 13 classes. This Neural Network consists of a Convolution Layer followed by Fully Connected Layers.

VIDEO FRAME EXTRACTION

For the task of extracting the video frames, we need to check the following:-

  1. The number of frames available in the video
  2. The duration of the video.
  3. The default Frames per second of the video.
  4. The desired number of frames from the video.

The OpenCV library was used for the task of extracting the video frames. Below is the code snippet.
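
The original snippet is not reproduced here; the following is a minimal sketch of the extraction loop, assuming the training video is named train.mp4 and the frames are written to a train_frames/ folder (both names are illustrative):

import os
import cv2

os.makedirs("train_frames", exist_ok=True)
cap = cv2.VideoCapture("train.mp4")  # hypothetical path to the training video

fps = cap.get(cv2.CAP_PROP_FPS)                    # original frame rate (30 for train, 24 for test)
total_frames = cap.get(cv2.CAP_PROP_FRAME_COUNT)   # number of frames available in the video
duration = total_frames / fps                      # duration of the video in seconds
print(fps, total_frames, duration)

count = 0   # position (in frames) of the next frame to grab
saved = 0   # number of frames written so far
while True:
    # Jump directly to the frame at position `count`
    cap.set(cv2.CAP_PROP_POS_FRAMES, count)
    ret, frame = cap.read()
    if not ret:
        break
    # One frame is saved for every second of video
    cv2.imwrite(f"train_frames/frame_{saved}.jpg", frame)
    saved += 1
    count += int(fps)   # skip ahead by the original FPS (30 for train, 24 for test)

cap.release()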

The extraction rate was specified to be 1 frame per second. The training video had a duration of 407 seconds, so we had to extract one frame for each second of video. The original frame rate of the training video was 30 FPS, so the frame counter was incremented by 30 after each frame was extracted. For the test video, the original frame rate was 24 FPS, so the counter was incremented by 24 instead.

GENERATION OF IMAGE BATCHES

The image batches were generated using the flow_from_dataframe method of the ImageDataGenerator class from the tensorflow.keras.preprocessing.image module.

  1. This method takes a dataframe that holds the paths of the images and their labels.
  2. The column of the dataframe specifying the image paths is passed as x_col, and the column specifying the image labels as y_col.
  3. The path under which the images are stored is passed as the directory argument.
  4. The batch_size argument controls the size of the generated batches.
  5. The shuffle argument specifies whether the images are to be shuffled for every batch of data.

There are other methods, such as flow_from_directory and flow, in the same class. More about them can be found here.

The code snippet below should clarify things further:-
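
This is a reconstruction rather than the original gist; the dataframe, directory and column names are assumed to mirror the test-time code shown later in this post:

import pandas as pd
import tensorflow as tf

# train.csv is assumed to hold a Frame_ID column and a human-generated label column named "labels"
train = pd.read_csv("train.csv")
train["frame_path"] = "train_frames/" + train["Frame_ID"]

# Rescale pixel values to [0, 1]
ImageFlow = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255)

dir_path = "/content/drive/MyDrive/Dravya_Python_ML_DL_Challenge"
Train_ImageGenerator = ImageFlow.flow_from_dataframe(
    train,
    directory=dir_path,       # folder containing the extracted frames
    x_col="frame_path",       # column with the image paths
    y_col="labels",           # column with the class labels
    target_size=(224, 224),   # VGG input size
    batch_size=32,
    shuffle=False,            # keep the order so extracted features stay aligned with the labels
    seed=5,
)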

MODELS USED

We used transfer learning from VGG16 and VGG19 models pretrained on ImageNet data to generate visual features from the layers before the final FC layers of those models. These visual features were then fed into a final neural network consisting of Convolution and Fully Connected Layers, which produced the classification of the various species of animals.
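
A minimal sketch of how the feature extractors can be set up, assuming the convolutional base of each network (include_top=False) is used to produce the bottleneck features from the generator defined above:

import tensorflow as tf

# Convolutional bases of VGG16/VGG19 pretrained on ImageNet, without the final FC layers
pretrained_model_vgg16 = tf.keras.applications.VGG16(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3))
pretrained_model_vgg19 = tf.keras.applications.VGG19(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3))

# The pretrained weights stay frozen; only the custom head defined later is trained
pretrained_model_vgg16.trainable = False
pretrained_model_vgg19.trainable = False

# Bottleneck (visual) features for the training frames, shape (n_frames, 7, 7, 512)
train_bottleneck_vgg16 = pretrained_model_vgg16.predict(Train_ImageGenerator)
train_bottleneck_vgg19 = pretrained_model_vgg19.predict(Train_ImageGenerator)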

VGG16 Model:-

Model Architecture:-

Model with visual features of VGG16
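
The architecture diagram is not reproduced here; as a rough sketch, a head of the kind described above (one Convolution Layer followed by Fully Connected Layers, with illustrative layer sizes) could look like this:

import tensorflow as tf

def build_head(num_classes=13, input_shape=(7, 7, 512)):
    # Convolution layer followed by fully connected layers on top of the bottleneck features
    inputs = tf.keras.Input(shape=input_shape)
    x = tf.keras.layers.Conv2D(128, (3, 3), activation="relu", padding="same")(inputs)
    x = tf.keras.layers.Flatten()(x)
    x = tf.keras.layers.Dense(256, activation="relu")(x)
    x = tf.keras.layers.Dropout(0.5)(x)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)

VGG16_model = build_head()   # trained on the VGG16 bottleneck features
VGG19_model = build_head()   # the same idea is reused for the VGG19 features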

VGG19 Model

The VGG19 Model was also fitted in the above way to the given data.

Model Architecture:-

Model with bottleneck features of VGG19

PERFORMANCE OF THE MODELS

Both models were trained for 15 epochs with a constant learning rate, using the Adam optimizer and the SparseCategoricalCrossentropy loss. A checkpointing callback was used to save the best model according to the metric, and a TensorBoard callback was used to plot the epoch loss and the metric. The performance metric was 100*micro_averaged_f1_score.
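
A sketch of this training setup; the learning rate, validation split, log directory and label array are assumptions, while the checkpoint path mirrors the one used in the test-prediction code below. Accuracy is used as a stand-in metric here, since for single-label multi-class prediction the micro-averaged F1 score equals accuracy:

import tensorflow as tf

VGG19_model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),   # constant learning rate (value assumed)
    loss=tf.keras.losses.SparseCategoricalCrossentropy(),
    metrics=["accuracy"],
)

callbacks = [
    # Keep only the weights of the best model according to the monitored metric
    tf.keras.callbacks.ModelCheckpoint(
        "VGG19_pretrained_mod/best_model.hdf5",
        monitor="val_accuracy",
        save_best_only=True,
        save_weights_only=True,
    ),
    # Log the epoch loss and the metric for TensorBoard
    tf.keras.callbacks.TensorBoard(log_dir="logs/vgg19"),
]

history = VGG19_model.fit(
    train_bottleneck_vgg19,
    train_labels,            # integer-encoded species labels, assumed to be prepared via dict_labels
    validation_split=0.2,    # assumed split
    epochs=15,
    callbacks=callbacks,
)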

Best Model Performance:-

Model Performance

The VGG19 visual feature model performed slightly better than the VGG16 visual feature model.

VGG16 Tensorboard Epoch Loss and 100*F1_Score Plot

VGG19 Tensorboard Epoch Loss and 100*F1_Score Plot

GENERATING THE TEST PREDICTIONS:-

As the test frames had already been extracted and saved from the test.mp4 video, the trained VGG19 model was used to generate the predictions. One limitation is that we do not have class labels for these images; the test.csv file only has the frame names under the Frame_ID column.

To work around this, every image was assigned a dummy class of "1" so that the image batches, and from them the bottleneck features, could be generated and passed to the best VGG19 model for prediction.

Here is the code for that.

import numpy as np
import pandas as pd
import tensorflow as tf

# Reading the test.csv containing the names of the frames
test = pd.read_csv("test.csv")
test.head()

# Assigning the relative paths of the extracted test frames
test["frame_path"] = "test_frames/" + test["Frame_ID"]

# Assigning the dummy label "1" to each of the images
test["labels"] = "1"

# Generating the image batches using the flow_from_dataframe method
ImageFlow = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1./255)
dir_path = "/content/drive/MyDrive/Dravya_Python_ML_DL_Challenge"
Test_ImageGenerator = ImageFlow.flow_from_dataframe(
    test, directory=dir_path, x_col="frame_path", y_col="labels",
    subset="training", target_size=(224, 224), shuffle=False,
    seed=5, batch_size=32)

# Generating the bottleneck features for the test data
test_bottleneck_vgg19 = pretrained_model_vgg19.predict(Test_ImageGenerator)

# Loading the best VGG19 model
VGG19_model.load_weights("VGG19_pretrained_mod/best_model.hdf5")

# Generating the predictions on the test visual features
predictions_vgg19 = VGG19_model.predict(test_bottleneck_vgg19)

# Taking the index with the maximum predicted probability
true_predictions_vgg19 = np.argmax(predictions_vgg19, axis=1)

# Decoding the indices back to the species of the animals
fin_pred_vgg19 = []
for prediction in true_predictions_vgg19:
    for k in dict_labels.keys():
        if dict_labels.get(k) == str(prediction):
            fin_pred_vgg19.append(k)

# Preparing the final submission dataframe
fin_submission_vgg19 = pd.DataFrame(
    {"Frame_ID": list(test.Frame_ID.values), "labels": fin_pred_vgg19})
fin_submission_vgg19.head()

# Writing the final submission file
fin_submission_vgg19.to_csv("vgg19_submission.csv", index=None)

SOURCE CODE:-

The source code can be found in my GitHub repository. Please refer to the following link:-

REFERENCES:-
