Loop closure detection (LCD) is critical for reducing drift and maintaining map consistency in SLAM systems, yet image-retrieval–based methods struggle with perceptual aliasing, viewpoint and appearance changes, and limited scalability. We reformulate LCD as clustering in a learned latent space rather than database retrieval. A convolutional autoencoder (CAE) is first pre-trained on environment imagery to produce compact, structure-aware embeddings. We then build globally aware descriptors with a new model. During operation, keyframes are encoded and compared in latent space against a growing memory set. If an embedding lies beyond a threshold distance, it is considered a potential loop closure and added to the clustering structure. To enforce spatially meaningful structure, we apply a triplet loss: the immediately preceding frame serves as the positive (temporal proximity), while other keyframes act as negatives, encouraging embeddings from the same place to cluster and embeddings from distinct places to separate. This design improves robustness to aliasing and appearance variation and reduces computational cost by avoiding exhaustive database search. Experiments on multiple place-recognition and navigation datasets show competitive or superior precision–recall performance compared to NetVLAD, DBoW2, and AP-GeM, along with more temporally consistent loop-closure clusters. Overall, the results indicate that latent-space clustering with globally aware descriptors provides a scalable and robust alternative to conventional retrieval-based LCD.
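The triplet construction described in the abstract can be illustrated with a minimal sketch. It assumes Euclidean distance and the standard margin-based triplet loss; the abstract does not state the distance metric, the margin, or how negatives are sampled, so the names `triplet_loss`, `temporal_triplets`, and `nearest_in_memory` and the default `margin=1.0` are hypothetical choices made for illustration, not the authors' implementation.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard margin-based triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The margin value is an assumption; the abstract does not specify it.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def temporal_triplets(num_keyframes):
    """Yield (anchor, positive, negative) keyframe indices.

    Following the abstract, the immediately preceding keyframe is the positive
    and every other keyframe is a candidate negative. Exhaustive enumeration is
    an assumption; a real system would likely sample negatives.
    """
    for i in range(1, num_keyframes):
        for j in range(num_keyframes):
            if j not in (i, i - 1):
                yield i, i - 1, j


def nearest_in_memory(embedding, memory):
    """Return (index, distance) of the closest stored embedding.

    The caller compares the distance with a threshold; the decision rule itself
    is left out because the abstract gives no numeric threshold.
    """
    distances = [euclidean(embedding, m) for m in memory]
    best = min(range(len(memory)), key=distances.__getitem__)
    return best, distances[best]
```

For three keyframes, `list(temporal_triplets(3))` enumerates the triples `(1, 0, 2)` and `(2, 1, 0)`: each anchor is paired with its predecessor as the positive and with the one remaining keyframe as the negative.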
Details
Title
Loop Closure Detection Revisited: A Clustering Perspective