
Commit

Expose API for setting labels, update README
GeekAlexis committed Aug 13, 2021
1 parent 6120c29 commit e8db04f
Showing 10 changed files with 120 additions and 82 deletions.
120 changes: 62 additions & 58 deletions README.md
@@ -7,7 +7,7 @@
- (2021.7.4) Support yolov4-p5 and yolov4-p6
- (2021.2.13) Support Scaled-YOLOv4 (i.e. yolov4-csp and yolov4x-mish)
- (2021.1.3) Add DIoU-NMS for postprocessing
- (2020.11.28) Docker container provided for Ubuntu
- (2020.11.28) Docker container provided for x86 Ubuntu

## Description
FastMOT is a custom multiple object tracker that implements:
@@ -17,11 +17,11 @@ FastMOT is a custom multiple object tracker that implements:
- KLT tracker
- Camera motion compensation

Deep SORT requires running detection and feature extraction sequentially, which often becomes a bottleneck for real-time applications. FastMOT significantly speeds up the entire system to run in **real-time** even on Jetson. Motion compensation improves tracking for non-stationary camera where Deep SORT/FairMOT usually fail.
Deep SORT is a two-stage tracker that runs detection and feature extraction sequentially, which often becomes a bottleneck for real-time applications. FastMOT significantly speeds up the entire system to run in **real-time** even on Jetson devices. Motion compensation improves tracking for non-stationary cameras, where Deep SORT/FairMOT usually fail.

To achieve faster processing, FastMOT only runs the detector and feature extractor every N frames, while KLT fills in the gaps efficiently. FastMOT also re-identifies objects that moved out of frame and keeps the same IDs.

YOLOv4 was trained on CrowdHuman (82% mAP@0.5) while SSD's are pretrained COCO models from TensorFlow. Both detection and feature extraction use the **TensorRT** backend and perform asynchronous inference. In addition, most algorithms, including KLT, Kalman filter, and data association, are optimized using Numba.
YOLOv4 was trained on CrowdHuman (82% mAP@0.5) and SSDs are pretrained COCO models from TensorFlow. Both detection and feature extraction use the **TensorRT** backend and perform asynchronous inference. In addition, most algorithms, including KLT, Kalman filter, and data association, are optimized using Numba.

## Performance
### Results on MOT20 train set
Expand All @@ -33,13 +33,13 @@ YOLOv4 was trained on CrowdHuman (82% [email protected]) while SSD's are pretrained COCO m
### FPS on MOT17 sequences
| Sequence | Density | FPS |
|:-------|:-------:|:-------:|
| MOT17-13 | 5 - 30 | 38 |
| MOT17-04 | 30 - 50 | 22 |
| MOT17-03 | 50 - 80 | 15 |
| MOT17-13 | 5 - 30 | 42 |
| MOT17-04 | 30 - 50 | 26 |
| MOT17-03 | 50 - 80 | 18 |

Performance is evaluated with YOLOv4 using [TrackEval](https://github.com/JonathonLuiten/TrackEval). Note that neither YOLOv4 nor OSNet was trained or finetuned on the MOT20 dataset, so train set results should generalize well. FPS results are obtained on Jetson Xavier NX.
Performance is evaluated with YOLOv4 using [TrackEval](https://github.com/JonathonLuiten/TrackEval). Note that neither YOLOv4 nor OSNet was trained or finetuned on the MOT20 dataset, so train set results should generalize well. FPS results are obtained on Jetson Xavier NX (20W 2core mode).

FastMOT has MOTA scores close to **state-of-the-art** trackers from the MOT Challenge. Increasing N shows small impact on MOTA. Tracking speed can reach up to **38 FPS** depending on the number of objects. Lighter models (e.g. YOLOv4-tiny) are recommended for a more constrained device like Jetson Nano. FPS is expected to be in the range of **50 - 150** on desktop CPU/GPU.
FastMOT has MOTA scores close to **state-of-the-art** trackers from the MOT Challenge. Increasing N has only a small impact on MOTA. Tracking speed can reach up to **42 FPS** depending on the number of objects. Lighter models (e.g. YOLOv4-tiny) are recommended for a more constrained device like Jetson Nano. FPS is expected to be in the range of **50 - 150** on desktop CPU/GPU.

## Requirements
- CUDA >= 10
@@ -58,13 +58,13 @@ Make sure to have [nvidia-docker](https://docs.nvidia.com/datacenter/cloud-nativ
# Add --build-arg TRT_IMAGE_VERSION=21.05 for Ubuntu 20.04
docker build -t fastmot:latest .

# Run xhost local:root first for issues with display
# Run xhost local:root first if you cannot visualize inside the container
docker run --gpus all --rm -it -v $(pwd):/usr/src/app/FastMOT -v /tmp/.X11-unix:/tmp/.X11-unix -e DISPLAY=unix$DISPLAY -e TZ=$(cat /etc/timezone) fastmot:latest
```
### Install for Jetson Nano/TX2/Xavier NX/Xavier
Make sure to have [JetPack >= 4.4](https://developer.nvidia.com/embedded/jetpack) installed and run the script:
```bash
./scripts/install_jetson.sh 4.5
./scripts/install_jetson.sh <Jetpack Version>
```
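For example, on JetPack 4.5:
```bash
./scripts/install_jetson.sh 4.5
```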
### Download models
This includes pretrained OSNet and SSD models, as well as my custom YOLOv4 ONNX model
@@ -83,49 +83,35 @@ Only required for SSD (not supported on Ubuntu 20.04)
```

## Usage
- USB webcam:
```bash
python3 app.py --input_uri /dev/video0 --mot
```
- MIPI CSI camera:
```bash
python3 app.py --input_uri csi://0 --mot
```
- RTSP stream:
```bash
python3 app.py --input_uri rtsp://<user>:<password>@<ip>:<port>/<path> --mot
```
- HTTP stream:
```bash
python3 app.py --input_uri http://<user>:<password>@<ip>:<port>/<path> --mot
```
- Image sequence:
```bash
python3 app.py --input_uri %06d.jpg --mot
```
- Video file:
```bash
python3 app.py --input_uri video.mp4 --mot
```
- Use `--gui` to visualize, `--output_uri` to save output, and `--log` for MOT compliant results
- To disable the GStreamer backend, set `WITH_GSTREAMER = False` [here](https://github.com/GeekAlexis/FastMOT/blob/3a4cad87743c226cf603a70b3f15961b9baf6873/fastmot/videoio.py#L11)
```bash
python3 app.py --input_uri ... --mot
```
- Image sequence: `--input_uri %06d.jpg`
- Video file: `--input_uri file.mp4`
- USB webcam: `--input_uri /dev/video0`
- MIPI CSI camera: `--input_uri csi://0`
- RTSP stream: `--input_uri rtsp://<user>:<password>@<ip>:<port>/<path>`
- HTTP stream: `--input_uri http://<user>:<password>@<ip>:<port>/<path>`
- Use `--gui` to visualize, `--output_uri` to save the output video, and `--txt` to save MOT-compliant results (see the combined example below)
- Use `-h` to show help message for all options
- Note that the first run will be slow due to Numba compilation
- To disable the GStreamer backend on x86, set `WITH_GSTREAMER = False` [here](https://github.com/GeekAlexis/FastMOT/blob/3a4cad87743c226cf603a70b3f15961b9baf6873/fastmot/videoio.py#L11)
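
For example, a typical invocation that tracks a local video file, shows the visualization, and saves MOT-format results (file paths here are illustrative) might look like:
```bash
python3 app.py --input_uri video.mp4 --mot --gui --txt results/video.txt
```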
<details>
<summary> More options can be configured in cfg/mot.json </summary>

- Set `resolution` and `frame_rate` that corresponds to the source data or camera configuration (optional). They are required for image sequence, camera sources, and MOT Challenge evaluation. List all configurations for your USB/CSI camera:
- Set `resolution` and `frame_rate` to match the source data or camera configuration (optional). They are required for image sequences, camera sources, and saving txt results. List all configurations for a USB/CSI camera:
```bash
v4l2-ctl -d /dev/video0 --list-formats-ext
```
- To swap model, modify `model` under a detector. For example, you can choose from `SSDInceptionV2`, `SSDMobileNetV1`, or `SSDMobileNetV2` for SSD.
- If more accuracy is desired and processing power is not an issue, lower `detector_frame_skip`. Similarly, raise `detector_frame_skip` to speed up tracking at the cost of accuracy. You may also want to change `max_age` such that `max_age` × `detector_frame_skip` ≈ 30
- To swap the network, modify `model` under a detector. For example, you can choose from `SSDInceptionV2`, `SSDMobileNetV1`, or `SSDMobileNetV2` for SSD.
- If more accuracy is desired and FPS is not an issue, lower `detector_frame_skip`. Similarly, raise `detector_frame_skip` to speed up tracking at the cost of accuracy. You may also want to change `max_age` such that `max_age` × `detector_frame_skip` ≈ 30 (e.g. `max_age` = 6 when `detector_frame_skip` = 5)
- Modify `visualizer_cfg` to toggle drawing options.
- All parameters are documented in the API.

</details>

## Track custom classes
FastMOT supports multi-class tracking and can be easily extended to custom classes (e.g. vehicle). You need to train both YOLO and a ReID model on your object classes. Check [Darknet](https://github.com/AlexeyAB/darknet) for training YOLO and [fast-reid](https://github.com/JDAI-CV/fast-reid) for training ReID. After training, convert the model to ONNX format and place it in fastmot/models. The TensorRT plugin adapted from [tensorrt_demos](https://github.com/jkjung-avt/tensorrt_demos/) is only compatible with Darknet.
FastMOT can be easily extended to a custom class (e.g. vehicle). You need to train both YOLO and a ReID model on your object class. Check [Darknet](https://github.com/AlexeyAB/darknet) for training YOLO and [fast-reid](https://github.com/JDAI-CV/fast-reid) for training ReID. After training, convert the models to ONNX format. The TensorRT plugin adapted from [tensorrt_demos](https://github.com/jkjung-avt/tensorrt_demos/) is only compatible with Darknet. FastMOT also supports multi-class tracking. It is recommended to train a ReID network for each class. You should also implement a new MOT class that extracts features separately for each class and concatenates them before feeding them into the tracker.
### Convert YOLO to ONNX
1. Install ONNX version 1.4.1 (not the latest version)
```bash
@@ -138,30 +124,48 @@ FastMOT supports multi-class tracking and can be easily extended to custom class
### Add custom YOLOv3/v4
1. Subclass `fastmot.models.YOLO` like here: https://github.com/GeekAlexis/FastMOT/blob/32c217a7d289f15a3bb0c1820982df947c82a650/fastmot/models/yolo.py#L100-L109
```
ENGINE_PATH: path to TensorRT engine (converted at runtime)
MODEL_PATH: path to ONNX model
NUM_CLASSES: total number of classes
LETTERBOX: keep aspect ratio when resizing
NEW_COORDS: new_coords parameter for each yolo layer
INPUT_SHAPE: input size in the format "(channel, height, width)"
LAYER_FACTORS: scale factors with respect to the input size for each yolo layer
SCALES: scale_x_y parameter for each yolo layer
ANCHORS: anchors grouped by each yolo layer
ENGINE_PATH : Path
Path to TensorRT engine.
If not found, TensorRT engine will be converted from ONNX weights
at runtime and cached for later use.
MODEL_PATH : Path
Path to ONNX weights.
NUM_CLASSES : int
Total number of trained classes.
LETTERBOX : bool
Keep aspect ratio when resizing.
NEW_COORDS : bool
new_coords Darknet parameter for each yolo layer.
INPUT_SHAPE : tuple
Input size in the format `(channel, height, width)`.
LAYER_FACTORS : List[int]
Scale factors with respect to the input size for each yolo layer.
SCALES : List[float]
scale_x_y Darknet parameter for each yolo layer.
ANCHORS : List[List[int]]
Anchors grouped by each yolo layer.
```
Note that anchors may not follow the same order in the Darknet cfg file. You need to mask out the anchors for each yolo layer using the indices in `mask` in Darknet cfg.
Note anchors may not follow the same order in the Darknet cfg file. You need to mask out the anchors for each yolo layer using the indices in `mask` in Darknet cfg.
Unlike YOLOv4, the anchors are usually listed in reverse order for YOLOv3 and YOLOv3/v4-tiny. A minimal subclass sketch is shown after this list.
2. Change class labels [here](https://github.com/GeekAlexis/FastMOT/blob/master/fastmot/models/label.py) to your object classes
3. Modify cfg/mot.json: set `model` in `yolo_detector_cfg` to the added Python class name and set `class_ids` of interest. You may want to play with `conf_thresh` based on the accuracy of your model
2. Set class labels to your object classes with `fastmot.models.set_label_map`
3. Modify cfg/mot.json: set `model` in `yolo_detector_cfg` to the added Python class name and set `class_ids` of interest. You may want to play with `conf_thresh` based on model performance
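
To make step 1 concrete, here is a minimal sketch of a custom YOLOv4 subclass, assumed to live in fastmot/models/yolo.py next to the existing models. The class name, file names, input size, scales, and anchors are illustrative placeholders, not values from this repo — take the real numbers from your own Darknet cfg:
```python
# Hypothetical example added to fastmot/models/yolo.py — all values are placeholders.
from pathlib import Path


class YOLOv4Custom(YOLO):
    ENGINE_PATH = Path(__file__).parent / 'yolov4_custom.trt'   # converted from ONNX and cached at runtime
    MODEL_PATH = Path(__file__).parent / 'yolov4_custom.onnx'   # your exported ONNX weights
    NUM_CLASSES = 1                    # total number of trained classes
    LETTERBOX = False                  # keep aspect ratio when resizing
    NEW_COORDS = False                 # new_coords Darknet parameter for each yolo layer
    INPUT_SHAPE = (3, 512, 512)        # (channel, height, width)
    LAYER_FACTORS = [8, 16, 32]        # scale factors w.r.t. input size for each yolo layer
    SCALES = [1.2, 1.1, 1.05]          # scale_x_y for each yolo layer
    ANCHORS = [[11, 22, 24, 60, 37, 58],          # anchors grouped by yolo layer,
               [54, 186, 69, 161, 89, 260],       # masked out with the `mask` indices
               [128, 154, 227, 320, 366, 693]]    # from the Darknet cfg
```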
### Add custom ReID
1. Subclass `fastmot.models.ReID` like here: https://github.com/GeekAlexis/FastMOT/blob/32c217a7d289f15a3bb0c1820982df947c82a650/fastmot/models/reid.py#L50-L55
```
ENGINE_PATH: path to TensorRT engine (converted at runtime)
MODEL_PATH: path to ONNX model
INPUT_SHAPE: input size in the format "(channel, height, width)"
OUTPUT_LAYOUT: feature dimension output by the model (e.g. 512)
METRIC: distance metric used to match features ('euclidean' or 'cosine')
ENGINE_PATH : Path
Path to TensorRT engine.
If not found, TensorRT engine will be converted from ONNX weights
at runtime and cached for later use.
MODEL_PATH : Path
Path to ONNX weights.
INPUT_SHAPE : tuple
Input size in the format `(channel, height, width)`.
OUTPUT_LAYOUT : int
Feature dimension output by the model.
METRIC : {'euclidean', 'cosine'}
Distance metric used to match features.
```
2. Modify cfg/mot.json: set `model` in `feature_extractor_cfg` to the added Python class name. You may want to play with `max_assoc_cost` and `max_reid_cost` based on the accuracy of your model
2. Modify cfg/mot.json: set `model` in `feature_extractor_cfg` to the added Python class name. You may want to play with `max_assoc_cost` and `max_reid_cost` based on model performance
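
Analogously to the YOLO case, a minimal custom ReID subclass sketch, assumed to live in fastmot/models/reid.py. Names and values are illustrative (e.g. a typical 256×128 person-ReID input and a 512-D embedding) — match them to your trained fast-reid model:
```python
# Hypothetical example added to fastmot/models/reid.py — all values are placeholders.
from pathlib import Path


class ReIDCustom(ReID):
    ENGINE_PATH = Path(__file__).parent / 'reid_custom.trt'    # converted from ONNX and cached at runtime
    MODEL_PATH = Path(__file__).parent / 'reid_custom.onnx'    # your exported ONNX weights
    INPUT_SHAPE = (3, 256, 128)       # (channel, height, width)
    OUTPUT_LAYOUT = 512               # feature dimension output by the model
    METRIC = 'euclidean'              # or 'cosine'
```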
## Citation
If you find this repo useful in your project or research, please star and consider citing it:
34 changes: 22 additions & 12 deletions app.py
@@ -8,6 +8,7 @@
import cv2

import fastmot
import fastmot.models
from fastmot.utils import ConfigDecoder, Profiler


@@ -16,18 +17,20 @@ def main():
parser.add_argument('-i', '--input_uri', metavar="URI", required=True, help=
'URI to input stream\n'
'1) image sequence (e.g. %%06d.jpg)\n'
'2) video file (e.g. video.mp4)\n'
'2) video file (e.g. file.mp4)\n'
'3) MIPI CSI camera (e.g. csi://0)\n'
'4) USB camera (e.g. /dev/video0)\n'
'5) RTSP stream (e.g. rtsp://<user>:<password>@<ip>:<port>/<path>)\n'
'6) HTTP stream (e.g. http://<user>:<password>@<ip>:<port>/<path>)\n')
parser.add_argument('-c', '--config', metavar="FILE",
default=Path(__file__).parent / 'cfg' / 'mot.json',
help='path to JSON configuration file')
parser.add_argument('-l', '--labels', metavar="FILE",
help='path to label names (e.g. coco.names)')
parser.add_argument('-o', '--output_uri', metavar="URI",
help='URI to output video (e.g. output.mp4)')
parser.add_argument('-l', '--log', metavar="FILE",
help='output a MOT Challenge format log (e.g. eval/results/mot17-04.txt)')
help='URI to output video file')
parser.add_argument('-t', '--txt', metavar="FILE",
help='output MOT Challenge txt results (e.g. eval/results/MOT20-01.txt)')
parser.add_argument('-m', '--mot', action='store_true', help='run multiple object tracker')
parser.add_argument('-g', '--gui', action='store_true', help='enable display')
parser.add_argument('-v', '--verbose', action='store_true', help='verbose output for debugging')
Expand All @@ -42,17 +45,24 @@ def main():
with open(args.config) as cfg_file:
config = json.load(cfg_file, cls=ConfigDecoder, object_hook=lambda d: SimpleNamespace(**d))

# load labels if given
if args.labels is not None:
with open(args.labels) as label_file:
label_map = label_file.read().splitlines()
fastmot.models.set_label_map(label_map)

stream = fastmot.VideoIO(config.resize_to, args.input_uri, args.output_uri, **vars(config.stream_cfg))

mot = None
log = None
txt = None
if args.mot:
draw = args.gui or args.output_uri is not None
mot = fastmot.MOT(config.resize_to, **vars(config.mot_cfg), draw=draw)
mot.reset(stream.cap_dt)
if args.log is not None:
Path(args.log).parent.mkdir(parents=True, exist_ok=True)
log = open(args.log, 'w')
if args.txt is not None:
assert Path(args.txt).suffix == '.txt'
Path(args.txt).parent.mkdir(parents=True, exist_ok=True)
txt = open(args.txt, 'w')
if args.gui:
cv2.namedWindow('Video', cv2.WINDOW_AUTOSIZE)

@@ -67,12 +77,12 @@

if args.mot:
mot.step(frame)
if log is not None:
if txt is not None:
for track in mot.visible_tracks():
tl = track.tlbr[:2] / config.resize_to * stream.resolution
br = track.tlbr[2:] / config.resize_to * stream.resolution
w, h = br - tl + 1
log.write(f'{mot.frame_count},{track.trk_id},{tl[0]:.6f},{tl[1]:.6f},'
txt.write(f'{mot.frame_count},{track.trk_id},{tl[0]:.6f},{tl[1]:.6f},'
f'{w:.6f},{h:.6f},-1,-1,-1\n')

if args.gui:
@@ -83,8 +93,8 @@
stream.write(frame)
finally:
# clean up resources
if log is not None:
log.close()
if txt is not None:
txt.close()
stream.release()
cv2.destroyAllWindows()

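With the `--labels` flag added above, class names for a custom model can now be supplied at the command line instead of editing the source. A hypothetical invocation (file names are illustrative; the label file lists one name per line, in class-ID order):
```bash
python3 app.py --input_uri video.mp4 --mot --labels vehicle.names --txt results/video.txt
```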
5 changes: 3 additions & 2 deletions fastmot/detector.py
@@ -60,7 +60,7 @@ def __init__(self, size,
SSD model to use.
Must be the name of a class that inherits `models.SSD`.
class_ids : tuple, optional
Class IDs to detect.
Class IDs to detect. Note that class IDs start at zero.
tile_overlap : float, optional
Ratio of overlap to width and height of each tile.
tiling_grid : tuple, optional
@@ -228,7 +228,7 @@ def __init__(self, size,
YOLO model to use.
Must be the name of a class that inherits `models.YOLO`.
class_ids : tuple, optional
Class IDs to detect.
Class IDs to detect. Note that class IDs start at zero.
conf_thresh : float, optional
Detection confidence threshold.
nms_thresh : float, optional
@@ -264,6 +264,7 @@ def postprocess(self):
"""Synchronizes, applies postprocessing, and returns a record array
of detections (DET_DTYPE).
This function should be called after `detect_async`.
Detections with the same label have consecutive indices.
"""
det_out = self.backend.synchronize()
det_out = np.concatenate(det_out).reshape(-1, 7)
2 changes: 1 addition & 1 deletion fastmot/models/__init__.py
@@ -1,4 +1,4 @@
from .ssd import SSD
from .yolo import YOLO
from .reid import ReID
from .label import LABEL_MAP
from .label import get_label_name, set_label_map
26 changes: 24 additions & 2 deletions fastmot/models/label.py
@@ -1,10 +1,12 @@
from collections.abc import Sequence


"""
91-class COCO labels
`unlabeled` (id = 0) is replaced with `head` for CrowdHuman
These are different from the default 80-class COCO labels used by YOLO
"""

LABEL_MAP = (
_label_map = (
'head',
'person',
'bicycle',
@@ -97,3 +99,23 @@
'hair drier',
'toothbrush',
)


def get_label_name(class_id):
"""Look up label name given a class ID."""
return _label_map[class_id]


def set_label_map(label_map):
"""Set label name mapping from class IDs.
Parameters
----------
label_map : sequence
A sequence of label names.
The index of each label determines its class ID.
"""
assert isinstance(label_map, Sequence)
assert len(label_map) > 0
global _label_map
_label_map = tuple(label_map)
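
A minimal sketch of the new label API (label names here are illustrative):
```python
import fastmot.models

# The index of each name determines its class ID (IDs start at zero).
fastmot.models.set_label_map(['car', 'truck'])

assert fastmot.models.get_label_name(0) == 'car'
assert fastmot.models.get_label_name(1) == 'truck'
```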