SLAM (Simultaneous Localization and Mapping) enables autonomous robots to navigate unknown environments by building a map of their surroundings while simultaneously tracking their own location within that map.
SVO, or Semi-Direct Visual Odometry, was first introduced in 2014 by Forster et al. It combines feature-based and direct methods to estimate camera motion (visual odometry, VO) and scene structure simultaneously. SVO is designed to be lightweight enough for embedded systems, making it well suited to drones and mobile robots.
- 윤성호: Researcher at KAIST, with past experience in LG Electronics, focused on SLAM and machine learning.
- Presentation Date: February 23, 2020
- Transcript Link: SLAM DUNK 2020 | 윤성호, 이재민 발표 - YouTube
- Hybrid Approach: Combines feature-based and direct methods.
- Efficient Processing: Approximately 2.5 ms per frame, which is suitable for real-time applications.
- Application: Effective for embedded systems, with adaptability to multi-camera setups, fisheye lenses, and more.
- Direct Methods: Minimize the photometric error, i.e., differences in pixel intensities between images. They work well even in environments where traditional feature extraction is challenging.
- Feature-Based Methods: Minimize the reprojection error of matched keypoints, which makes them robust for loop closures and wide-baseline matching (a small code sketch contrasting the two residuals follows the pros/cons list below).
- Direct Methods:
- Pros: Accurate in low-feature environments.
- Cons: Difficult to obtain an accurate covariance for the estimate; harder to tightly integrate with inertial sensors.
- Feature-Based Methods:
- Pros: Strong for wide-baseline matching and feature-rich scenes.
- Cons: Slower, because feature extraction and matching are computationally expensive.
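To make the contrast concrete, here is a minimal NumPy sketch of the two residual types. The `project` helper, the pinhole intrinsics `K`, and the pose convention are assumptions made for illustration, not SVO's implementation.

```python
import numpy as np

def project(K, T, p):
    """Project a 3D point p (world coordinates) through a camera with
    pose T (4x4, world -> camera) and pinhole intrinsics K (3x3)."""
    p_cam = T[:3, :3] @ p + T[:3, 3]
    uv = K @ (p_cam / p_cam[2])
    return uv[:2]

def reprojection_residual(K, T, p, uv_measured):
    """Feature-based residual: pixel distance between the measured keypoint
    and the predicted projection of its 3D point."""
    return uv_measured - project(K, T, p)

def photometric_residual(img_ref, img_cur, K, T_cur_ref, p_ref, uv_ref):
    """Direct residual: intensity difference between a pixel in the reference
    image and its predicted location in the current image (nearest-neighbour
    lookup here; real systems interpolate and compare whole patches)."""
    uv_cur = project(K, T_cur_ref, p_ref)
    u0, v0 = np.round(uv_ref).astype(int)
    u1, v1 = np.round(uv_cur).astype(int)
    return float(img_ref[v0, u0]) - float(img_cur[v1, u1])
```

Direct methods stack many photometric residuals over pixels or patches, while feature-based methods stack reprojection residuals over matched keypoints; SVO uses both kinds at different stages of its pipeline.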
The SVO system is divided into a Motion Estimation thread and a Mapping thread (a structural sketch follows the list below).
- Sparse Image Alignment: Aligns the current frame with the previous one, calculating relative poses.
- Relaxation: Aligns frames to keyframes instead of only the previous frame, reducing drift.
- Refinement: Uses local bundle adjustment for optimization.
- Probabilistic Depth Filtering: Initializes and refines depth filters using a recursive Bayesian update. Depth values are continually adjusted until they converge.
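As a structural sketch only, the two threads can be pictured as below. All function names are placeholders chosen for this illustration (not the actual SVO API), and the step bodies are intentionally left empty.

```python
# Structural sketch of the two SVO threads; names are placeholders and the
# individual steps are left unimplemented.

def sparse_image_alignment(new_frame, last_frame): ...        # photometric frame-to-frame alignment
def feature_alignment(new_frame, keyframes): ...              # relaxation: re-align features to keyframes
def pose_and_structure_refinement(new_frame, local_map): ...  # local bundle adjustment

def motion_estimation_step(new_frame, last_frame, keyframes, local_map):
    """Runs once per frame and estimates the camera pose of new_frame."""
    sparse_image_alignment(new_frame, last_frame)         # 1. initial relative pose
    feature_alignment(new_frame, keyframes)               # 2. relaxation to keyframes
    pose_and_structure_refinement(new_frame, local_map)   # 3. local refinement

def mapping_step(new_frame, depth_seeds, local_map):
    """Runs in parallel: turns 2D features into 3D points once depth converges."""
    for seed in depth_seeds:
        seed.update(new_frame)                    # recursive Bayesian depth update
        if seed.has_converged():
            local_map.add_point(seed.to_point())  # converged estimate becomes a map point
```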
- Transformation Representations: Camera poses are elements of SE(3), the group of rigid-body 3D transformations; incremental updates are expressed as twist coordinates in the associated Lie algebra se(3) and mapped back to SE(3) through the exponential map (a sketch follows this list).
- Optimization: SVO formulates pose estimation as a least-squares problem, typically solved with iterative Gauss-Newton optimization, to obtain the relative pose T_{k,k-1} between consecutive frames.
- Inverse Compositional Algorithm: The inverse compositional formulation allows the Jacobians of the reference patches to be precomputed once, which speeds up the iterative image alignment.
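For reference, here is a minimal NumPy sketch of the se(3) exponential map used in such pose updates. The (v, w) ordering of the twist and the left-multiplicative update at the end are convention choices of this sketch.

```python
import numpy as np

def hat(w):
    """Skew-symmetric matrix of a 3-vector (so(3) hat operator)."""
    return np.array([[0, -w[2], w[1]],
                     [w[2], 0, -w[0]],
                     [-w[1], w[0], 0]])

def se3_exp(xi):
    """Exponential map from a twist xi = (v, w) in se(3) to a 4x4 transform in SE(3).
    v is the translational part, w the rotational part (axis-angle)."""
    v, w = xi[:3], xi[3:]
    theta = np.linalg.norm(w)
    W = hat(w)
    if theta < 1e-10:
        R = np.eye(3) + W            # first-order approximation near the identity
        V = np.eye(3) + 0.5 * W
    else:
        R = (np.eye(3) + np.sin(theta) / theta * W
             + (1 - np.cos(theta)) / theta**2 * W @ W)        # Rodrigues' formula
        V = (np.eye(3) + (1 - np.cos(theta)) / theta**2 * W
             + (theta - np.sin(theta)) / theta**3 * W @ W)    # left Jacobian of SO(3)
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = V @ v
    return T

# A small twist composed onto a pose, as in an iterative update T <- exp(xi) * T.
T_prev = np.eye(4)
xi = np.array([0.01, 0.0, 0.02, 0.0, 0.001, 0.0])   # (v, w)
T_new = se3_exp(xi) @ T_prev
```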
Sparse image alignment registers the current frame against the previous one to compute an initial relative pose; this frame-to-frame alignment is solved with an inverse compositional Lucas-Kanade-style algorithm (illustrated in the sketch below).
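The inverse compositional idea can be shown on the simplest possible case, aligning a single patch under a pure 2D translation: the Jacobian and Hessian are built once on the reference patch and reused in every iteration. SVO applies the same trick over the 6-DoF pose and many patches; the function below, with its names and parameters, is only a sketch of the mechanics.

```python
import numpy as np

def inverse_compositional_translation(img_ref, img_cur, x0, y0, size=4, iters=10):
    """Align a (2*size+1)^2 reference patch centred at (x0, y0) to img_cur,
    estimating a 2D translation. The patch must lie inside both images; the
    lookup is nearest-neighbour for brevity (real systems interpolate)."""
    ys, xs = np.mgrid[y0 - size:y0 + size + 1, x0 - size:x0 + size + 1]
    ref = img_ref[ys, xs].astype(float)
    gy, gx = np.gradient(ref)                        # gradients of the *reference* patch
    J = np.stack([gx.ravel(), gy.ravel()], axis=1)   # N x 2 Jacobian, precomputed once
    H = J.T @ J                                      # 2 x 2 Gauss-Newton Hessian, precomputed once
    t = np.zeros(2)                                  # current translation estimate
    for _ in range(iters):
        xi = np.clip(np.round(xs + t[0]).astype(int), 0, img_cur.shape[1] - 1)
        yi = np.clip(np.round(ys + t[1]).astype(int), 0, img_cur.shape[0] - 1)
        r = (img_cur[yi, xi].astype(float) - ref).ravel()   # photometric residual
        delta = np.linalg.solve(H, J.T @ r)                  # Gauss-Newton step
        t -= delta                                           # inverse composition of the increment
        if np.linalg.norm(delta) < 1e-3:
            break
    return t
```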
Relaxation then aligns the features against the keyframes in which they were first observed, rather than only the previous frame, which reduces accumulated drift. Because each feature's 2D position is optimized individually, this step violates the epipolar constraints, but it improves accuracy.
Refinement finally adjusts both the camera pose and the 3D points by minimizing reprojection errors, which amounts to a local bundle adjustment.
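As a toy illustration of this refinement step (not SVO's implementation; the intrinsics, point count, and noise levels below are made up), one camera pose and its visible 3D points can be jointly refined with a generic least-squares solver:

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])  # assumed pinhole intrinsics

def project(pose, points):
    """Project points with a pose given as axis-angle (3) + translation (3)."""
    p_cam = Rotation.from_rotvec(pose[:3]).apply(points) + pose[3:]
    uv = (K @ p_cam.T).T
    return uv[:, :2] / uv[:, 2:3]

def residuals(x, n_points, observations):
    """Stacked reprojection errors for one pose and n_points 3D points."""
    pose, pts = x[:6], x[6:].reshape(n_points, 3)
    return (project(pose, pts) - observations).ravel()

# Synthetic data: known pose/points, noisy 2D observations, perturbed initial guess.
rng = np.random.default_rng(0)
pts_true = rng.uniform([-1, -1, 4], [1, 1, 8], size=(20, 3))
pose_true = np.array([0.05, -0.02, 0.01, 0.1, -0.1, 0.2])
obs = project(pose_true, pts_true) + rng.normal(0, 0.5, (20, 2))

x0 = np.concatenate([pose_true + 0.02, (pts_true + 0.05).ravel()])
sol = least_squares(residuals, x0, args=(20, obs))
print("mean reprojection error [px]:", np.sqrt((sol.fun**2).reshape(-1, 2).sum(1)).mean())
```

In SVO this refinement is restricted to a local window of recent frames and structure, which is what keeps the "local" bundle adjustment compatible with real-time operation.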
SVO integrates a depth-filtering approach for continuous 2D-to-3D mapping. This depth filtering is accomplished by:
- Patch-Based Matching: Searches along the epipolar line for corresponding points to estimate depth.
- Gaussian-Uniform Mixture Model: Models depth uncertainty, distinguishing between inliers and outliers.
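A sketch of one such recursive update is shown below, following the parametric Gaussian × Beta approximation (Vogiatzis and Hernández) that SVO's depth filter builds on. Parameter names and default values here are illustrative, the exact form may differ slightly from the implementation, and SVO applies the filter to inverse depth.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class DepthSeed:
    mu: float             # mean of the (inverse) depth estimate
    sigma2: float         # variance of the estimate
    a: float = 10.0       # Beta parameters modelling the inlier ratio
    b: float = 10.0
    z_range: float = 1.0  # width of the uniform outlier distribution

def normal_pdf(x, mean, var):
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

def update_seed(seed, x, tau2):
    """One recursive Bayesian update with measurement x (from epipolar patch
    matching) and measurement variance tau2."""
    s2 = 1.0 / (1.0 / seed.sigma2 + 1.0 / tau2)
    m = s2 * (seed.mu / seed.sigma2 + x / tau2)
    # Responsibilities of the inlier (Gaussian) and outlier (uniform) components.
    C1 = seed.a / (seed.a + seed.b) * normal_pdf(x, seed.mu, seed.sigma2 + tau2)
    C2 = seed.b / (seed.a + seed.b) / seed.z_range
    C1, C2 = C1 / (C1 + C2), C2 / (C1 + C2)
    # Moment matching of the posterior back onto the Gaussian x Beta form.
    f = C1 * (seed.a + 1) / (seed.a + seed.b + 1) + C2 * seed.a / (seed.a + seed.b + 1)
    e = (C1 * (seed.a + 1) * (seed.a + 2) / ((seed.a + seed.b + 1) * (seed.a + seed.b + 2))
         + C2 * seed.a * (seed.a + 1) / ((seed.a + seed.b + 1) * (seed.a + seed.b + 2)))
    mu_new = C1 * m + C2 * seed.mu
    seed.sigma2 = C1 * (s2 + m * m) + C2 * (seed.sigma2 + seed.mu ** 2) - mu_new ** 2
    seed.mu = mu_new
    seed.a = (e - f) / (f - e / f)
    seed.b = seed.a * (1 - f) / f
    return seed

seed = DepthSeed(mu=2.0, sigma2=1.0, z_range=4.0)
for x in [1.9, 2.1, 6.0, 2.0]:           # 6.0 plays the role of an outlier measurement
    update_seed(seed, x, tau2=0.05)
print(seed.mu, seed.sigma2, seed.a / (seed.a + seed.b))  # depth mean, variance, inlier ratio
```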
The mapping thread uses FAST corners as feature points, distributing them evenly across the cells of an image grid. Each depth filter is initialized with high uncertainty and refined through multiple observations.
Outliers are handled by the Gaussian-Uniform mixture model on the depth estimate, which is gradually refined as more observations are collected. The system employs an inverse depth representation for large scenes, improving stability for distant structure.
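For the feature side, here is a minimal OpenCV sketch of grid-bucketed FAST detection. The cell size, threshold, and strongest-response selection rule are arbitrary choices of this sketch; SVO's own per-cell selection criterion may differ.

```python
import cv2
import numpy as np

def grid_fast(image, cell=40, threshold=20):
    """Detect FAST corners and keep at most one (the strongest) per grid cell,
    so that tracked points are spread evenly over the image."""
    detector = cv2.FastFeatureDetector_create(threshold=threshold)
    keypoints = detector.detect(image, None)
    best = {}
    for kp in keypoints:
        key = (int(kp.pt[0] // cell), int(kp.pt[1] // cell))
        if key not in best or kp.response > best[key].response:
            best[key] = kp
    return list(best.values())

# Example usage on a random test image (replace with a real grayscale frame).
img = (np.random.rand(480, 640) * 255).astype(np.uint8)
corners = grid_fast(img)
print(f"{len(corners)} corners after grid filtering")
```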
SVO has been successfully adapted to:
- Embedded Systems: Particularly useful in drones where computational resources are limited.
- Wide Field of View Cameras: SVO can handle fisheye lenses and other wide FoV setups by tracking edges and using multiple cameras.
- CNN-SVO: A variant presented at ICRA 2019 that uses CNN-based single-image depth prediction to initialize the depth filters, improving accuracy and robustness.
- Drift Accumulation: Although SVO includes drift mitigation, longer sequences may still experience drift.
- Reliance on Keyframes: The accuracy depends on selecting keyframes effectively, especially in dynamic scenes.
- Need for Initialization: SVO requires an initial bootstrapping phase, which may not be feasible in every scenario.
SVO is an effective visual odometry method that balances feature-based and direct approaches. This hybrid design offers flexibility across varied conditions, making it well suited to applications in embedded robotics and drones.