1. Introduction
Simultaneous localization and mapping (SLAM) is a technology by which a sensor-equipped mobile platform localizes itself and builds a map of its surroundings in an unknown environment (Cadena et al., 2016; Huang and Dissanayake, 2016). As the technology has matured, SLAM has become increasingly important to research fields such as underwater survey, augmented reality and automotive engineering. In autonomous driving in particular, SLAM can build a comprehensive road database for vehicle navigation systems.
Unlike laser-based SLAM, which must actively emit laser signals to perceive the environment, Visual SLAM performs localization and map construction through dense visual perception of the surroundings. Visual SLAM also has the significant advantage of preserving semantic information in the map, and cameras are generally less expensive than light detection and ranging (LiDAR) sensors. These factors have made Visual SLAM a popular research direction in the SLAM field (Fuentes-Pacheco et al., 2015; Younes et al., 2017). However, in real-world scenes that contain many dynamic objects, such as vehicles and pedestrians, traditional methods often degrade or fail altogether. Research in Visual SLAM has therefore shifted from static to dynamic environments, producing a variety of Visual SLAM techniques designed specifically for dynamic environments (Saputra et al., 2018; Wang et al., 2017).
In real-world environments, dynamic objects pose a significant challenge to Visual SLAM because they can cause incorrect feature matching or loss of tracking in the camera images. Handling dynamic objects in the incoming images effectively is therefore a key problem in dynamic Visual SLAM research. The mainstream approach identifies dynamic objects and removes them during image preprocessing, retaining only the static parts of each image for SLAM. However, when dynamic objects occupy too much of the image, little static content remains, and the accuracy of Visual SLAM decreases or the system fails entirely.
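The exclusion step described above can be illustrated with a minimal sketch. It assumes that an upstream detector (not specified here) has already produced a per-pixel boolean mask marking dynamic regions, and that features are given as (x, y) pixel coordinates. The function names and the `min_count` threshold are hypothetical and chosen only for illustration; they are not taken from any particular system.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) pixel coordinates of a feature


def filter_static_keypoints(
    keypoints: Sequence[Point],
    dynamic_mask: Sequence[Sequence[bool]],
) -> List[Point]:
    """Keep only keypoints that fall on pixels not marked as dynamic.

    dynamic_mask[row][col] is True where a dynamic object was detected.
    Keypoints that fall outside the image are discarded.
    """
    height = len(dynamic_mask)
    width = len(dynamic_mask[0]) if height else 0
    static: List[Point] = []
    for x, y in keypoints:
        col, row = int(round(x)), int(round(y))
        if not (0 <= row < height and 0 <= col < width):
            continue  # outside the image bounds
        if not dynamic_mask[row][col]:
            static.append((x, y))
    return static


def has_enough_static_features(
    static_keypoints: Sequence[Point],
    min_count: int = 3,  # hypothetical threshold, for illustration only
) -> bool:
    """Flag the degenerate case where too few static features remain."""
    return len(static_keypoints) >= min_count


# Example: a 4x6 image whose columns 2 and 3 are covered by a dynamic object.
mask = [[False, False, True, True, False, False] for _ in range(4)]
features = [(0.0, 0.0), (2.0, 1.0), (5.0, 3.0), (3.0, 2.0)]
static_features = filter_static_keypoints(features, mask)
enough = has_enough_static_features(static_features)
```

In this toy case two of the four features lie on the dynamic region and are dropped, so the check against the illustrative threshold fails. This mirrors the limitation noted above: the more of the image the dynamic objects cover, the fewer static features remain for pose estimation.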
The paper begins by examining the standard process of Visual SLAM and proceeds to introduce benchmark data sets used for evaluating the performance of Visual SLAM systems. It further provides a detailed account of...





