Visual SLAM and Active Matching: a brief introduction

MonoSLAM snapshot

The task of estimating sensor motion from measurements of a continuously expanding set of self-mapped features is known as Simultaneous Localisation and Mapping (SLAM). It has recently been shown that SLAM techniques can be successfully applied to estimate the ego-motion of a single agile camera in real-time by building a sparse map of visual features on the fly. My work is focused on applying principles of Information Theory to improve the robustness and efficiency of SLAM techniques.


What is SLAM?

Simultaneous Localisation And Mapping (SLAM) is perhaps the fundamental engineering problem currently challenging the mobile robotics research community:

How can a body navigate in a previously unknown environment while constantly building and updating a map of its workspace using on-board sensors only?

Solutions to SLAM will give mobile robots the ability to operate with real autonomy, enabling a wide range of applications. Until now, autonomous navigation without installed infrastructure (e.g. beacons) has only been possible with costly localisation tools such as GPS. The use of such instruments is limited, however, as they cannot operate indoors or underwater. SLAM can potentially provide a versatile solution to navigation, so research effort is focused on producing efficient algorithms that run on affordable hardware.

Augmented Reality: virtual decoration with MonoSLAM
Consumer Robotics: autonomous vacuum cleaner
Wearable Computing: remote assistant
Advanced Robotics: humanoid


Single-Camera SLAM

Since my work is focused on Davison's MonoSLAM, this section gives a simplified description of the procedure followed in single-camera SLAM.
MonoSLAM snapshot

1. A probabilistic map stores uncertain current estimates of the location of the camera and a set of point features. The belief about the position of each feature is represented by a 'bubble' in space, centered at the feature location estimate.

2. A motion model representing the dynamics is used to predict the motion of the camera in the "blind" interval between frames. A realistic motion model can give predictions closer to ground truth, protecting the system from localization errors.

3. Efficient active measurement of selected features in each new image (arriving at 30 Hz) updates the map. Instead of searching the whole image exhaustively for the features predicted to be visible, each landmark is searched for only in its corresponding area of high likelihood (an elliptical region).

4. By constantly tracking features and updating the system state, the system recovers the camera trajectory in real time and without drift, while building a stable map of features.
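
The prediction in step 2 can be illustrated with a minimal sketch. It assumes a planar constant-velocity model with state [px, py, vx, vy] and an unknown acceleration acting as process noise; the function name and all numbers are hypothetical, and the real system works with a full 3D camera pose rather than this simplified state.

```python
import numpy as np

def predict_constant_velocity(x, P, dt, accel_std):
    """Kalman prediction for state x = [px, py, vx, vy] with covariance P."""
    # State transition: position advances by velocity * dt, velocity is kept.
    F = np.eye(4)
    F[0, 2] = dt
    F[1, 3] = dt
    # An unknown acceleration during dt perturbs both position and velocity.
    G = np.array([[0.5 * dt**2, 0.0],
                  [0.0, 0.5 * dt**2],
                  [dt, 0.0],
                  [0.0, dt]])
    Q = (accel_std ** 2) * (G @ G.T)
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    return x_pred, P_pred

# Hypothetical values: one "blind" interval between frames at 30 Hz.
x = np.array([0.0, 0.0, 1.0, 0.5])
P = np.eye(4) * 0.01
x_pred, P_pred = predict_constant_velocity(x, P, dt=1.0 / 30.0, accel_std=2.0)
```

The predicted covariance is larger than the prior one, which is why every prediction has to be followed by measurements (step 3) to keep the uncertainty bounded.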


Loop Closing and Data Association

Tracking using a single camera can be a very challenging task. Despite the considerable maturity that SLAM methods have reached, preserving consistency in the map is still a major hurdle to overcome. We aim to tackle this problem by improving current feature matching techniques, addressing the following closely related problems:

> Data Association: How do observations relate to the elements stored in the map?
> Loop Closure: Has the camera returned to a previously mapped area?

Data Association & Loop-close

This diagram shows how the uncertainty in the belief about the position of each feature evolves. As the camera moves from position 1 onwards, the uncertainty in the camera position gradually grows, and the estimated locations of the visual landmarks picked up along the way become correspondingly more uncertain. Position 5 is critical for preserving consistency: the camera re-observes old features, closing a loop in its trajectory. If the system manages to recognise them, the uncertainty in the system immediately shrinks to almost its initial level. This is how drift-free performance can be achieved. However, as the elliptical search region for each feature grows, it becomes more likely that multiple candidate matches fall within that feature's ellipse. This is where we need a way to handle data association.
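
The growth and collapse of uncertainty described above can be reproduced with a toy scalar Kalman filter. This is only a sketch under strong assumptions: the camera position is one-dimensional, correlations between camera and landmarks are ignored, and the function name and noise values are hypothetical.

```python
def simulate_loop(steps=4, motion_var=0.04, meas_var=0.01, initial_var=0.001):
    """Return the camera position variance after each stage of a loop."""
    var = initial_var
    history = [var]
    # Exploration: every move adds motion noise and no known feature is seen.
    for _ in range(steps):
        var += motion_var
        history.append(var)
    # Loop closure: a landmark mapped at the start (uncertainty of about
    # initial_var) is re-observed with measurement noise meas_var.
    obs_var = initial_var + meas_var
    gain = var / (var + obs_var)
    var = (1.0 - gain) * var
    history.append(var)
    return history

history = simulate_loop()
```

In `history`, the variance rises at every exploration step and the final entry, after the loop closure, falls back below the combined landmark and measurement uncertainty.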


Probabilistic Prediction and Active Search

In continuous tracking problems, there are strong priors available on the absolute and/or relative image locations of features. Moreover, the predictions of locations of different features in the same image are often highly correlated (as discussed in the previous section), and this correlation can be harnessed to further increase efficiency.

In MonoSLAM, the image position of each feature is predicted, together with a sub-region of the image where the feature is most likely to lie. Therefore, instead of scanning the feature template across the whole image, it is only tested for a match within the corresponding search region.
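
A search region of this kind can be sketched as a gate on the Mahalanobis distance from the predicted image position. The sketch assumes a linearised measurement model with Jacobian `H`, state covariance `P` and pixel noise `R`, so that the innovation covariance is S = H P H^T + R; the helper names, the 3-sigma gate and the numbers are illustrative assumptions rather than the values used in MonoSLAM.

```python
import numpy as np

def innovation_covariance(H, P, R):
    """Covariance of the predicted image position of one feature."""
    return H @ P @ H.T + R

def in_search_region(pixel, predicted, S, n_sigma=3.0):
    """True if the pixel lies inside the n-sigma ellipse around the prediction."""
    d = np.asarray(pixel, dtype=float) - np.asarray(predicted, dtype=float)
    return float(d @ np.linalg.solve(S, d)) <= n_sigma ** 2

def ellipse_area(S, n_sigma=3.0):
    """Area in pixels of the n-sigma search ellipse."""
    return np.pi * n_sigma ** 2 * np.sqrt(np.linalg.det(S))

# Hypothetical numbers: the state is reduced to the 2D image position itself.
H = np.eye(2)
P = np.diag([16.0, 4.0])   # predicted position variance in pixels^2
R = np.eye(2)              # measurement noise of one pixel standard deviation
S = innovation_covariance(H, P, R)
predicted = (160.0, 120.0)
inside = in_search_region((170.0, 121.0), predicted, S)
outside = in_search_region((160.0, 130.0), predicted, S)
fraction = ellipse_area(S) / (320 * 240)
```

With these values the ellipse covers well under one percent of a 320x240 image, which is where the saving over an exhaustive template search comes from.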

But is this really the best we can do in active search?


Mixture of Gaussians

The successful or failed measurement of a feature within a frame can provide extra information on the positions of the rest of the features. Our approach to exploiting this property is to use a dynamic Mixture of Gaussians (MOG) representation, which grows as necessary to represent the discrete multiple hypotheses arising during active search.
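
As a rough illustration of how such a mixture can grow, the sketch below splits one Gaussian hypothesis into two when a candidate match is found: one child assumes the match is a true positive and applies a Kalman update, the other assumes a false positive and keeps the prior. This is a toy version and not the published algorithm: `split_on_match` is a hypothetical helper and the fixed probability `p_true` is an assumption standing in for a properly computed likelihood.

```python
import numpy as np

def split_on_match(weight, mean, cov, H, z, R, p_true=0.9):
    """Split a weighted Gaussian into true-positive and false-positive children."""
    S = H @ cov @ H.T + R
    K = cov @ H.T @ np.linalg.inv(S)
    # True positive: fuse the measurement, which also tightens correlated features.
    tp_mean = mean + K @ (z - H @ mean)
    tp_cov = (np.eye(len(mean)) - K @ H) @ cov
    true_pos = (weight * p_true, tp_mean, tp_cov)
    # False positive: the match carries no information, so the prior is kept.
    false_pos = (weight * (1.0 - p_true), mean.copy(), cov.copy())
    return [true_pos, false_pos]

# Hypothetical joint belief over the image x-coordinates of two correlated features.
mean = np.array([100.0, 200.0])
cov = np.array([[25.0, 20.0],
                [20.0, 25.0]])
H = np.array([[1.0, 0.0]])   # only the first feature is measured
z = np.array([104.0])
R = np.array([[1.0]])
children = split_on_match(1.0, mean, cov, H, z, R)
```

In the true-positive child the variance of the second, unmeasured feature also drops, because the two predictions are correlated; this is the extra information that a single measurement provides about the remaining features.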

Below is a demonstration of our algorithm guiding matching in the context of a sequential SLAM system, where we see a dramatic reduction in the number of image processing operations required to pin down the corners of a black square.

Active Search with MOG (1)
Initial belief: single Gaussian. F0 is the feature to measure next.
F0 yields a match. The belief is split into two Gaussians: blue (the match was a false positive) and red (the match was a true positive).
F1 yields a match, so a new Gaussian is born.

Active Search with MOG (1)
F2 yields a match. The new Gaussian has much smaller uncertainty.
F3 yields a match.
The object is pinned down successfully, searching a much smaller area than that of the initial belief.

The order of feature measurements is determined by the expected information gain (number of bits per pixel searched). The yellow regions are the areas selected for measurement of a particular feature. At every step, red denotes the most probable hypothesis and tones of blue denote the lower-probability hypotheses (the darker the colour, the smaller the probability).
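
For a single Gaussian belief, the bits-per-pixel criterion can be sketched as follows. The sketch assumes a linear-Gaussian measurement, for which the mutual information between state and measurement is 0.5 * log2(det S / det R) bits, and it takes the search cost to be the area of the 3-sigma ellipse. The function names and candidate covariances are hypothetical, and the mixture case needs a more involved calculation than this.

```python
import numpy as np

def bits_per_pixel(S, R, n_sigma=3.0):
    """Expected information gain divided by the search ellipse area in pixels."""
    mi_bits = 0.5 * np.log2(np.linalg.det(S) / np.linalg.det(R))
    area = np.pi * n_sigma ** 2 * np.sqrt(np.linalg.det(S))
    return mi_bits / area

def next_feature(candidates, R):
    """Pick the candidate with the highest information gain per pixel searched.

    candidates maps a feature name to its innovation covariance S.
    """
    return max(candidates, key=lambda name: bits_per_pixel(candidates[name], R))

# Hypothetical innovation covariances in pixels^2 for three candidate features.
R = np.eye(2)
candidates = {
    "F0": np.diag([17.0, 5.0]),
    "F1": np.diag([101.0, 101.0]),
    "F2": np.diag([2.0, 2.0]),
}
best = next_feature(candidates, R)
```

Note that the most uncertain feature (F1 here) offers the most bits in total but is expensive to search, so a per-pixel criterion can prefer a cheaper measurement first.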

We have dubbed this approach to feature matching Active Matching. More details about this work can be found here.
