PERFORMANCE METRICS OF OBJECT DETECTION

Md. Wazir Ali
Jun 30, 2021

INTRODUCTION

The problem of object detection can be split into two parts: detecting where an object is in a particular image, and thereafter classifying that object as belonging to a particular category.

IOU (Intersection Over Union)

The metric on which the performance of the algorithm is judged is based on the concept of Intersection Over Union, also denoted as IOU. It is a measure of the overlap between two sets A and B, given by:

IOU(A, B) = Area of overlap / Area of union = |A ∩ B| / |A ∪ B|

where A = the area of the box in which the algorithm detects the presence of the object, and

B = the area of the box drawn around the object by humans, also known as the ground truth.

This is very similar to set theory, where we take the intersection and the union of two sets and use their ratio as the IOU measure. The picture below clarifies the concept.

The diagram above shows the same set-theoretic idea, but with two boxes: one representing the actual place where the object is present (the ground truth) and one representing the place where the algorithm predicts it to be.
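
As a concrete illustration, here is a minimal sketch of the IOU computation for two axis-aligned boxes, assuming the common (x1, y1, x2, y2) corner format; the function name and box format are illustrative, not tied to any particular library.

```python
# A minimal sketch of IoU for two axis-aligned boxes in (x1, y1, x2, y2) format.

def iou(box_a, box_b):
    """Return the Intersection over Union of two boxes (x1, y1, x2, y2)."""
    # Coordinates of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])

    # Width and height clamp to zero when the boxes do not overlap at all.
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])

    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 100, 100), (50, 50, 150, 150)))  # 2500 / 17500 ≈ 0.1429
```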

Validity of IOU

The prediction of an object of class C is considered correct if the IOU score is greater than or equal to 0.5; otherwise it is considered incorrect.

Mean Average Precision (mAP)

An image may contain multiple objects of the same class and/or of different classes. So, there is a need for a final metric that judges the performance of an algorithm across all classes in all the images.

To get that single metric, we use mAP, or Mean Average Precision, which is the mean of the average precision achieved for each of the classes.

Average Precision

Average Precision is calculated for each class as the area under the Precision-Recall curve for that class, computed over all the training images for various values of the thresholds or probability scores.

Thresholds or Probability Scores

After being trained on the training images, the algorithm gives the probability that an object of a particular class C lies within a bounding box. This value is termed the probability (or confidence) score, and the cutoffs applied to it are the thresholds.
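
As a small illustration, here is a sketch of applying a confidence threshold to a list of detections; the scores and the 0.6 cutoff below are hypothetical.

```python
# Keep only detections whose probability (confidence) score clears a
# chosen threshold. The scores and the 0.6 threshold are made up.
detections = [
    {"class": "dog", "score": 0.92},
    {"class": "dog", "score": 0.71},
    {"class": "dog", "score": 0.34},
]

threshold = 0.6
kept = [d for d in detections if d["score"] >= threshold]
print(kept)  # keeps the two detections with scores 0.92 and 0.71
```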

Precision and Recall

Precision :- A value between 0 and 1 which indicates how many of the predictions made by the algorithm are actually of the right class, out of the total number of predictions made.

Mathematically, Precision is the ratio of True Positives to the sum of True Positives and False Positives: Precision = TP / (TP + FP).

Recall :- A value between 0 and 1 which denotes how many of the positive instances actually present in the data the algorithm has predicted correctly.

Mathematically, Recall is the ratio of True Positives to the sum of True Positives and False Negatives: Recall = TP / (TP + FN).
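
The two formulas can be illustrated with a few made-up counts; the numbers below are purely hypothetical.

```python
# Quick illustration of the two formulas with hypothetical counts.
tp = 8   # true positives:  correct detections (IoU >= 0.5)
fp = 2   # false positives: detections that match no ground-truth box well enough
fn = 4   # false negatives: ground-truth boxes the detector missed

precision = tp / (tp + fp)   # 8 / 10 = 0.80
recall = tp / (tp + fn)      # 8 / 12 ≈ 0.67

print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```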

The precision and recall can also be understood better in the given image below:

Predictions

The predictions by the object detection algorithm are classified as True Positives, False Positives and False Negatives.

True Positives:-

The prediction hits the bull’s eye, i.e. the box predicted around the object overlaps the manually labelled box well enough to give an IOU of at least 0.5.

A True Positive Prediction

There’s a dog in the picture, and the red box, which is the prediction by the algorithm, overlaps the ground truth (the green box) with an IOU of more than 0.5.

False Positives:-

The predicted box around the object and the actual (ground-truth) box do not overlap sufficiently, giving an IOU of less than 0.5.

A False Positive Prediction

Consider the above image containing two dogs. In the encircled region, the prediction is shown in red and the ground-truth box in green. These two boxes do not have an overlap of 0.5 (50%), i.e. the IOU evaluates to less than 0.5. This is the case of a False Positive prediction.

False Negatives:-

A False Negative prediction occurs when the ground truth, i.e. the manually labelled object, is completely missed by the algorithm: the algorithm does not even predict that there is an object present in that region.

A False Negative Prediction

True Negatives:-

In the case of object detection there is no concept of a True Negative, since there is no way to specify what the negative class or region is with respect to a category of classes, and hence no way to learn predictions for it.
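
Putting the three outcomes together, here is a rough sketch of how predictions for one class in one image could be labelled as True Positives, False Positives and False Negatives, assuming the iou() helper sketched earlier and an IOU threshold of 0.5. The greedy matching shown here is one common convention, not the only possible one.

```python
# Label detections as TP or FP for one class in one image, assuming the
# iou() helper defined earlier and an IoU threshold of 0.5. Each
# ground-truth box may be matched at most once; unmatched ground truths
# count as false negatives.

def label_detections(pred_boxes, gt_boxes, iou_threshold=0.5):
    matched_gt = set()
    labels = []                      # "TP" or "FP" for each prediction
    for pred in pred_boxes:          # assumed sorted by confidence, highest first
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(gt_boxes):
            if idx in matched_gt:
                continue
            overlap = iou(pred, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_iou >= iou_threshold:
            labels.append("TP")
            matched_gt.add(best_idx)
        else:
            labels.append("FP")
    false_negatives = len(gt_boxes) - len(matched_gt)
    return labels, false_negatives
```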

Precision — Recall Curve

As the name suggests, this curve is a plot of the various values of precision against recall, derived from the data at the various thresholds or probability scores that the algorithm assigns to detections of a particular class.

Whenever the algorithm detects an object by drawing a bounding box around it, it also gives the probability that an object lies within that bounding box and the probability that the object belongs to a particular class.
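
As an illustration, a single detection might be represented like the record below; the exact fields vary between detectors, so treat this layout and its values as an assumption.

```python
# One hypothetical detection record: a box, the probability that any
# object lies inside it, and the probability of the predicted class.
detection = {
    "box": (48, 30, 210, 180),   # (x1, y1, x2, y2) in pixels
    "objectness": 0.88,          # probability that the box contains an object
    "class": "dog",
    "class_prob": 0.93,          # probability of the class, given an object
}
```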

Example Toy Data of Images

Images of class dog

The predictions are sorted in descending order of their probability scores.

After sorting the predictions by these scores, we get a table like this:

Confidence scores sorted in descending order

In the next step, we calculate the precision and recall at every confidence (probability) score and put them in a table:

Precision and Recall

The Precision is calculated from the cumulative number of True Positives and False Positives encountered at probability scores greater than or equal to the current one, where a prediction counts as a True Positive according to an IOU threshold of at least 0.5 (or higher, depending on the business requirement).

The Recall is calculated for every prediction and changes only when the prediction is classified as a True Positive, because that increases the cumulative number of True Positives encountered so far. The total number of actual positives does not change, as it is simply the number of manually labelled bounding boxes, so the denominator in the recall calculation stays fixed.
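
Here is a small sketch of that running calculation, using hypothetical TP/FP labels and a hypothetical count of ground-truth boxes in place of the table above.

```python
# Running precision and recall down the sorted list of detections.
# The labels and the total number of ground-truth boxes are toy values.
labels = ["TP", "TP", "FP", "TP", "FP", "TP", "FP"]  # sorted by confidence, high to low
total_ground_truths = 6                              # recall denominator never changes

tp_cum = fp_cum = 0
for rank, label in enumerate(labels, start=1):
    if label == "TP":
        tp_cum += 1
    else:
        fp_cum += 1
    precision = tp_cum / (tp_cum + fp_cum)
    recall = tp_cum / total_ground_truths
    print(f"rank {rank}: precision = {precision:.2f}, recall = {recall:.2f}")
```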

PR — Curve

Below is the Precision-Recall curve plotted over the various thresholds for a particular class of object (dog in this case); the area under the curve gives the average precision for the class dog.

Precision — Recall Curve

We compute the average precision for each class of objects in the training data and then calculate mAP (Mean Average Precision) by taking the mean of these per-class average precisions.
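
Here is a rough sketch of both steps, using hypothetical precision-recall points for two classes. Note that real benchmarks such as Pascal VOC and COCO layer specific interpolation rules on top of this basic area-under-the-curve idea.

```python
import numpy as np

# Average precision as the area under the precision-recall curve, and
# mAP as the mean of the per-class APs. The PR points are hypothetical.

def average_precision(recalls, precisions):
    # Apply the usual monotone precision envelope, then integrate over recall.
    recalls = np.concatenate(([0.0], recalls, [1.0]))
    precisions = np.concatenate(([0.0], precisions, [0.0]))
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    idx = np.where(recalls[1:] != recalls[:-1])[0]
    return float(np.sum((recalls[idx + 1] - recalls[idx]) * precisions[idx + 1]))

# Hypothetical per-class PR points (e.g. "dog" and "cat").
ap_dog = average_precision(np.array([0.17, 0.33, 0.50, 0.67]),
                           np.array([1.00, 1.00, 0.75, 0.67]))
ap_cat = average_precision(np.array([0.25, 0.50, 0.75]),
                           np.array([1.00, 0.67, 0.60]))

mAP = np.mean([ap_dog, ap_cat])
print(f"AP(dog) = {ap_dog:.3f}, AP(cat) = {ap_cat:.3f}, mAP = {mAP:.3f}")
```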

Please note that mAP can differ depending on the IOU threshold chosen for the application being designed. So, different mAPs are reported for different IOU thresholds, for example mAP@0.5, or the average over thresholds from 0.5 to 0.95 in steps of 0.05 (written mAP@0.5:0.95), and so on.
