Metrics for object detection

This project is motivated by the lack of consensus among different works and implementations concerning the evaluation metrics for object detection. Although online competitions use their own metrics to evaluate the object detection task, only some of them offer reference code snippets to calculate the accuracy of the detected objects.
Researchers who want to evaluate their work using datasets other than those offered by the competitions need to implement their own version of the metrics. A wrong or divergent implementation can produce different and biased results. Ideally, in order to have trustworthy benchmarking among different approaches, a flexible implementation is needed that can be used by everyone regardless of the dataset.

This project aims to provide easy-to-use functions implementing the same metrics used by the most popular object detection competitions. Our implementation does not require you to adapt your detection model to complicated input formats, avoiding conversions to XML or JSON files. We simplified the input data (ground truth bounding boxes and detected bounding boxes) and gathered the main metrics used by academia and challenges in a single project. Our implementation was carefully compared against the official implementations, and our results are exactly the same.

In the topics below you can find an overview of the most popular metrics and competitions, as well as samples showing how to use our code.

Different competitions, different metrics

  • PASCAL VOC challenge offers a Matlab script to evaluate the quality of the detected objects. Participants can use the provided script to measure the accuracy of their detections before submitting their results. Documentation explaining their criteria for object detection metrics can be accessed here. The metrics currently used by the PASCAL VOC object detection challenge are the Precision/Recall curve and Average Precision.
    The PASCAL VOC Matlab evaluation code reads the ground truth bounding boxes from XML files, requiring changes in the code if you want to apply it to other datasets or to your specific cases. Even though projects such as Faster-RCNN implement the PASCAL VOC evaluation metrics, it is also necessary to convert the detected bounding boxes into their specific format. The Tensorflow framework also has its own PASCAL VOC metrics implementation.

  • COCO challenge uses different metrics to evaluate the accuracy of object detection of different algorithms. Here you can find documentation explaining the 12 metrics used to characterize the performance of an object detector on COCO. This competition offers Python and Matlab code so users can verify their scores before submitting the results. It is also necessary to convert the results to the format required by the competition.

  • Google Open Images Dataset V4 competition also uses mean Average Precision (mAP) over the 500 classes to evaluate the object detection task.

  • ImageNet Object Localization Challenge defines an error for each image considering the class and the overlapping region between ground truth and detected boxes. The total error is computed as the average of the minimum errors over all test images. Here are more details about their evaluation method.

Important definitions

Intersection Over Union (IOU)

Intersection Over Union (IOU) is a measure based on the Jaccard Index that evaluates the overlap between two bounding boxes. It requires a ground truth bounding box and a predicted bounding box. By applying the IOU we can tell whether a detection is valid (True Positive) or not (False Positive).
IOU is given by the overlapping area between the predicted bounding box and the ground truth bounding box divided by the area of the union between them:

IOU = area(Bp ∩ Bgt) / area(Bp ∪ Bgt)

where Bp is the predicted bounding box and Bgt is the ground truth bounding box.

The image below shows the IOU between a ground truth bounding box (in green) and a detected bounding box (in red).
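For reference, a minimal IOU computation in plain Python could look like the sketch below. It assumes boxes given as (x1, y1, x2, y2) absolute coordinates and is only an illustration, not the exact code used by this project:

def iou(boxA, boxB):
    # Boxes are (x1, y1, x2, y2) with x2 > x1 and y2 > y1.
    # Corners of the intersection rectangle.
    xA = max(boxA[0], boxB[0])
    yA = max(boxA[1], boxB[1])
    xB = min(boxA[2], boxB[2])
    yB = min(boxA[3], boxB[3])
    # Intersection area (zero if the boxes do not overlap).
    interArea = max(0, xB - xA) * max(0, yB - yA)
    # Union = sum of both areas minus the intersection.
    areaA = (boxA[2] - boxA[0]) * (boxA[3] - boxA[1])
    areaB = (boxB[2] - boxB[0]) * (boxB[3] - boxB[1])
    unionArea = areaA + areaB - interArea
    return interArea / unionArea if unionArea > 0 else 0.0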

True Positive, False Positive, False Negative and True Negative

Some basic concepts used by the metrics:

  • True Positive (TP): A correct detection. Detection with IOU ≥ threshold
  • False Positive (FP): A wrong detection. Detection with IOU < threshold
  • False Negative (FN): A ground truth not detected
  • True Negative (TN): Does not apply. It would represent a correctly rejected misdetection. In the object detection task there are many possible bounding boxes that should not be detected within an image. Thus, TN would be all the possible bounding boxes that were correctly not detected (countless possible boxes within an image). That's why it is not used by the metrics.

threshold: depending on the metric, it is usually set to 50%, 75% or 95%.

Precision

Precision is the ability of a model to identify only the relevant objects. It is the percentage of positive predictions that are correct and is given by:

Precision = TP / (TP + FP) = TP / (all detections)

Recall

Recall is the ability of a model to find all the relevant cases (all ground truth bounding boxes). It is the percentage of true positives detected among all relevant ground truths and is given by:

Recall = TP / (TP + FN) = TP / (all ground truths)
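As a quick illustration (not part of the project's API), precision and recall can be computed directly from the TP, FP and FN counts:

def precision_recall(tp, fp, fn):
    # Precision = TP / (TP + FP): fraction of detections that are correct.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    # Recall = TP / (TP + FN): fraction of ground truths that were detected.
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall

# Arbitrary example: 7 TP, 3 FP and 8 missed ground truths
# -> precision = 0.7, recall ≈ 0.4667
print(precision_recall(7, 3, 8))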

Metrics

Precision x Recall curve

The Precision x Recall curve is a good way to evaluate the performance of an object detector as the confidence threshold is changed. There is a curve for each object class. An object detector of a particular class is considered good if its precision stays high as recall increases, which means that if you vary the confidence threshold, the precision and recall will still be high. Another way to identify a good object detector is to look for a detector that identifies only relevant objects (0 False Positives = high precision) while finding all ground truth objects (0 False Negatives = high recall). A poor object detector needs to increase the number of detected objects (increasing False Positives = lower precision) in order to retrieve all ground truth objects (high recall). That's why the Precision x Recall curve usually starts with high precision values and decreases as recall increases. This kind of curve is used by the PASCAL VOC 2012 challenge and is available in our implementation.

Average Precision

Another way to compare the performance of object detectors is to calculate the area under the curve (AUC) of the Precision x Recall curve. As Precision x Recall curves are often zigzag curves going up and down, comparing different curves (different detectors) in the same plot is usually not easy, because the curves tend to cross each other frequently. That's why Average Precision (AP), a single numerical metric, can also help us compare different detectors. In practice, AP is the precision averaged across all recall values between 0 and 1.

The PASCAL VOC 2012 challenge uses the interpolated average precision. It summarizes the shape of the Precision x Recall curve by averaging the precision at a set of eleven equally spaced recall levels [0, 0.1, 0.2, ..., 1]:

AP = (1/11) * Σ p_interp(r), with the sum taken over r ∈ {0, 0.1, ..., 1}

with

p_interp(r) = max p(r̃) over all r̃ ≥ r

where p(r̃) is the measured precision at recall r̃.

Instead of using the precision observed at each point, the AP is obtained by interpolating the precision at each recall level r, taking the maximum precision whose recall value is greater than or equal to r.
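A minimal sketch of the 11-point interpolation is shown below. It assumes the accumulated precision and recall values have already been computed and ordered by decreasing confidence; the project's actual implementation may differ in details:

def eleven_point_ap(recalls, precisions):
    # recalls/precisions: parallel lists of accumulated recall and precision
    # values, ordered by decreasing detection confidence.
    ap = 0.0
    for r in [i / 10 for i in range(11)]:  # recall levels 0, 0.1, ..., 1.0
        # Interpolated precision: max precision among points whose recall >= r.
        candidates = [p for rec, p in zip(recalls, precisions) if rec >= r]
        p_interp = max(candidates) if candidates else 0.0
        ap += p_interp / 11.0
    return ap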

Illustrated example

An example helps us better understand the concept of the interpolated average precision. Look at the detections below:

There are 7 images with 15 ground truth objects represented by the green bounding boxes and 24 detected objects represented by the red bounding boxes. Each detected object has a confidence level and is identified by a letter (A,B,...,Y).
The following table shows the bounding boxes with their corresponding confidences. The last column identifies the detections as TP or FP. In this example a detection is considered a TP if IOU ≥ 30%, otherwise it is a FP. By looking at the images above we can roughly tell whether the detections are TP or FP.

In some images there is more than one detection overlapping a ground truth. For those cases the detection with the highest IOU is taken and the other detections are discarded. This rule is applied by the PASCAL VOC 2012 metric: "e.g. 5 detections of a single object is counted as 1 correct detection and 4 false detections".
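A possible way to apply this matching rule is sketched below for a single image and a single class, using the iou() helper sketched earlier. Detections are processed in decreasing order of confidence and each ground truth can be matched at most once; this is an illustration, not the project's exact code:

def classify_detections(detections, ground_truths, iou_threshold=0.3):
    # detections: list of (confidence, box) tuples for one image and class.
    # ground_truths: list of ground truth boxes for the same image and class.
    # Returns a 'TP'/'FP' flag per detection, in confidence order.
    flags = []
    matched = set()
    for conf, det_box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_iou, best_gt = 0.0, None
        for i, gt_box in enumerate(ground_truths):
            overlap = iou(det_box, gt_box)
            if overlap > best_iou:
                best_iou, best_gt = overlap, i
        # TP only if the overlap passes the threshold and the ground truth
        # has not already been claimed by a higher-confidence detection.
        if best_iou >= iou_threshold and best_gt not in matched:
            flags.append('TP')
            matched.add(best_gt)
        else:
            flags.append('FP')
    return flags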

The Precision x Recall curve is plotted by calculating the precision and recall values of the accumulated TP or FP detections. For this, first we need to order the detections by their confidences, then we calculate the precision and recall for each accumulated detection as shown in the table below:
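The accumulated values in that table can be computed with a short sketch like the one below (illustrative only), where the TP/FP flags are already sorted by decreasing confidence:

def accumulated_precision_recall(flags, total_ground_truths):
    # flags: list of 'TP'/'FP' labels sorted by decreasing confidence.
    # total_ground_truths: number of ground truth boxes for this class.
    precisions, recalls = [], []
    acc_tp = acc_fp = 0
    for flag in flags:
        if flag == 'TP':
            acc_tp += 1
        else:
            acc_fp += 1
        precisions.append(acc_tp / (acc_tp + acc_fp))
        recalls.append(acc_tp / total_ground_truths)
    return precisions, recalls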

Plotting the precision and recall values we have the following Precision x Recall curve:

As seen before, the idea of the interpolated average precision is to average the precisions at a set of 11 recall levels (0, 0.1, ..., 1). The interpolated precision values are obtained by taking the maximum precision whose recall value is greater than or equal to the current recall value. We can visually obtain those values by looking at the recalls starting from the highest (0.4666) to the lowest (0.0666) and, as we decrease the recall, annotating the highest precision values, as shown in the image below:

The Average Precision (AP) is the AUC obtained by the interpolated precision. The intention is to reduce the impact of the wiggles in the Precision x Recall curve. We divide the AUC into 3 areas (A1, A2 and A3) as shown below:

Calculating the total area, we have the AP:



If you want to reproduce these results, see the Sample 1 source code.

How to use this project

This project is very easy to use. If you want to evaluate your algorithm with the most used object detection metrics, you are in the right place.

First, you need to import the Evaluator module and create an Evaluator() object:

from Evaluator import *

# Create an evaluator object in order to obtain the metrics
evaluator = Evaluator()

Don't forget to put the content of the lib folder in the same folder as your code. You could also put it in a different folder and add it to your project as done by the _init_paths.py file in the sample code.

With the evaluator object, you will have access to methods that retrieve the metrics:

Method: GetPascalVOCMetrics
Description: Gets the metrics used by the PASCAL VOC 2012 challenge.
Parameters:
  • boundingboxes: object of the class BoundingBoxes representing the ground truth and detected bounding boxes;
  • IOUThreshold: IOU threshold indicating which detections will be considered TP or FP (default value = 0.5).
Returns: a list of dictionaries, one per class, each containing information and metrics for that class. The keys of each dictionary are:
  • dict['class']: class represented by the dictionary;
  • dict['precision']: array with the precision values;
  • dict['recall']: array with the recall values;
  • dict['AP']: average precision;
  • dict['interpolated precision']: interpolated precision values;
  • dict['interpolated recall']: interpolated recall values;
  • dict['total positives']: total number of ground truth positives;
  • dict['total TP']: total number of True Positive detections;
  • dict['total FP']: total number of False Positive detections.

Method: PlotPrecisionRecallCurve
Description: Plots the Precision x Recall curve for a given class.
Parameters:
  • classId: the class to be plotted;
  • boundingBoxes: object of the class BoundingBoxes representing the ground truth and detected bounding boxes;
  • IOUThreshold: IOU threshold indicating which detections will be considered TP or FP (default value = 0.5);
  • showAP: if True, the average precision value will be shown in the title of the graph (default = False);
  • showInterpolatedPrecision: if True, the interpolated precision will be shown in the plot (default = False);
  • savePath: if informed, the plot will be saved as an image in this path (e.g. /home/mywork/ap.png) (default = None);
  • showGraphic: if True, the plot will be shown (default = True).
Returns: the dictionary containing information and metrics for the class, with the same keys as described for GetPascalVOCMetrics.

All methods that retrieve metrics require you to provide the bounding boxes (ground truth and detected). Those bounding boxes are represented by an object of the class BoundingBoxes. Each bounding box is defined by the class BoundingBox. The snippet below shows the creation of the bounding boxes of two images (img_0001 and img_0002). In this example there are 6 ground truth bounding boxes (4 belonging to img_0001 and 2 belonging to img_0002) and 3 detections (2 belonging to img_0001 and 1 belonging to img_0002). The img_0001 ground truths contain bounding boxes of 3 classes (classes 0, 1 and 2). The img_0002 ground truths contain bounding boxes of 2 classes (classes 0 and 1):

# Defining bounding boxes
# Arguments: image name, class id, x, y (top-left), width, height
# (or x2, y2 when format=BBFormat.XYX2Y2)
# Ground truth bounding boxes of img_0001.jpg
gt_boundingBox_1 = BoundingBox('img_0001', 0, 25, 16, 38, 56, bbType=BBType.GroundTruth, format=BBFormat.XYWH)
gt_boundingBox_2 = BoundingBox('img_0001', 0, 129, 123, 41, 62, bbType=BBType.GroundTruth, format=BBFormat.XYWH)
gt_boundingBox_3 = BoundingBox('img_0001', 1, 30, 48, 40, 38, bbType=BBType.GroundTruth, format=BBFormat.XYWH)
gt_boundingBox_4 = BoundingBox('img_0001', 2, 15, 10, 56, 70, bbType=BBType.GroundTruth, format=BBFormat.XYWH)
# Ground truth bounding boxes of img_0002.jpg
gt_boundingBox_5 = BoundingBox('img_0002', 0, 25, 16, 38, 56, bbType=BBType.GroundTruth, format=BBFormat.XYWH)
gt_boundingBox_6 = BoundingBox('img_0002', 1, 15, 10, 56, 70, bbType=BBType.GroundTruth, format=BBFormat.XYWH)
# Detected bounding boxes of img_0001.jpg
detected_boundingBox_1 = BoundingBox('img_0001', 0, 90, 78, 101, 58, bbType=BBType.Detected, format=BBFormat.XYWH)
detected_boundingBox_2 = BoundingBox('img_0001', 1, 85, 17, 49, 60, bbType=BBType.Detected, format=BBFormat.XYWH)
# Detected bounding boxes of img_0002.jpg
detected_boundingBox_3 = BoundingBox('img_0002', 1, 27, 18, 45, 60, bbType=BBType.Detected, format=BBFormat.XYWH)

# Creating the object of the class BoundingBoxes 
myBoundingBoxes = BoundingBoxes()
# Add all bounding boxes to the BoundingBoxes object:
myBoundingBoxes.add(gt_boundingBox_1)
myBoundingBoxes.add(gt_boundingBox_2)
myBoundingBoxes.add(gt_boundingBox_3)
myBoundingBoxes.add(gt_boundingBox_4)
myBoundingBoxes.add(gt_boundingBox_5)
myBoundingBoxes.add(gt_boundingBox_6)
myBoundingBoxes.add(detected_boundingBox_1)
myBoundingBoxes.add(detected_boundingBox_2)
myBoundingBoxes.add(detected_boundingBox_3)

Some important points:

  • Create your bounding boxes using the constructor of the BoundingBox class. The 3rd and 4th parameters represent the top-left x and y coordinates of the bounding box. The 5th and 6th parameters can be either the bottom-right x and y coordinates or the width and height of the bounding box. If your bounding box is identified by its x1, y1, x2, y2 coordinates, you need to pass format=BBFormat.XYX2Y2. If you identify it by x, y, width, height, you need to pass format=BBFormat.XYWH (see the short conversion sketch after this list).
  • Use the tag bbType=BBType.GroundTruth to identify your bounding box as being ground truth. If it is a detection, use bbType=BBType.Detected.
  • Be consistent with the imageName parameter. For example: bounding boxes with imageName='img_0001' and imageName='img0001' are from two different images.
  • The code is fully commented. Here you can see all the parameters needed by the constructor of the BoundingBox class.
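The two coordinate conventions differ only in how the second corner is expressed; the hypothetical helpers below (not part of the project) show the conversion:

# Converting between the two coordinate conventions mentioned above
# (illustrative helpers only).
def xywh_to_xyx2y2(x, y, w, h):
    # (top-left x, top-left y, width, height) -> (x1, y1, x2, y2)
    return x, y, x + w, y + h

def xyx2y2_to_xywh(x1, y1, x2, y2):
    # (x1, y1, x2, y2) -> (top-left x, top-left y, width, height)
    return x1, y1, x2 - x1, y2 - y1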

Of course you won't build your bounding boxes one by one as done in this example. You should read your detections within a loop and create your bounding boxes inside it. sample_1.py reads detections from 2 different folders, one containing .txt files with ground truths and the other containing .txt files with detections. Check this sample code as a reference.
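Putting the pieces together, the metrics can then be retrieved with the methods listed in the table above. The parameter names below are taken from that table; treat this as a sketch and check sample_1.py for the exact calls:

# Retrieving the metrics with the BoundingBoxes object built above
metricsPerClass = evaluator.GetPascalVOCMetrics(myBoundingBoxes, IOUThreshold=0.5)
for mc in metricsPerClass:
    # Each entry is a per-class dictionary with the keys described above
    print('Class %s AP: %s (total TP: %s, total FP: %s)' %
          (mc['class'], mc['AP'], mc['total TP'], mc['total FP']))

# Plotting the Precision x Recall curve for class 0
evaluator.PlotPrecisionRecallCurve(
    classId=0,
    boundingBoxes=myBoundingBoxes,
    IOUThreshold=0.5,
    showAP=True,
    showInterpolatedPrecision=True,
    showGraphic=True)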
