Deep Feature and Graph-Based Correspondence Learning for Monocular Visual Odometry
Abstract
Visual Odometry (VO) plays a central role in monocular camera-based localization, yet its performance depends heavily on the reliability of feature extraction and correspondence estimation under challenging imaging conditions. In this work, we introduce a deep learning–driven monocular VO architecture that leverages learned feature representations and graph-based correspondence reasoning to improve motion estimation accuracy. The proposed framework replaces handcrafted feature pipelines with a fully learnable front-end: a convolutional neural network (CNN) performs joint keypoint detection and descriptor generation. To improve feature association, a graph neural network (GNN)–based matching module models contextual relationships among feature points, yielding more discriminative and globally consistent correspondences. This deep feature matching strategy allows the system to remain robust to illumination changes, scale variations, and viewpoint transformations. By incorporating the state-of-the-art SuperGlue matching mechanism into the visual odometry pipeline, we construct an end-to-end deep VO framework that exploits high-level semantic and geometric cues extracted from image sequences. Experimental evaluation demonstrates that the proposed system achieves improved trajectory estimation accuracy and stability compared with both conventional handcrafted and existing learning-based VO approaches.
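To make the matching idea concrete, the sketch below (illustrative only, not the authors' implementation) shows the optimal-transport step at the core of SuperGlue-style matchers: descriptor pairs are scored by cosine similarity, a Sinkhorn normalization turns the score matrix into a soft assignment, and mutual best matches are kept. All function names and parameters here are assumptions for illustration.

```python
import numpy as np

def sinkhorn(scores, n_iters=20):
    """Alternately normalize rows and columns in log space so that
    exp(log_p) approximates a doubly stochastic assignment matrix."""
    log_p = scores.copy()
    for _ in range(n_iters):
        log_p -= np.log(np.exp(log_p).sum(axis=1, keepdims=True))  # row normalization
        log_p -= np.log(np.exp(log_p).sum(axis=0, keepdims=True))  # column normalization
    return np.exp(log_p)

def match_descriptors(desc_a, desc_b):
    """Match L2-normalized descriptors via similarity scoring plus
    Sinkhorn, keeping only mutual best correspondences."""
    scores = desc_a @ desc_b.T            # cosine similarity matrix
    p = sinkhorn(scores)                  # soft assignment
    row_best = p.argmax(axis=1)
    col_best = p.argmax(axis=0)
    return [(i, int(j)) for i, j in enumerate(row_best) if col_best[j] == i]

# Toy example: the second image's descriptors are a permutation of the first's.
rng = np.random.default_rng(0)
d = rng.normal(size=(3, 32))
d /= np.linalg.norm(d, axis=1, keepdims=True)
matches = match_descriptors(d, d[[2, 0, 1]])
```

In the full SuperGlue formulation, the similarity scores are first refined by a GNN with self- and cross-attention over both keypoint sets, and a learned "dustbin" row and column absorbs unmatched points; the Sinkhorn assignment step above is the common core.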