© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

In multimodal cloud detection, variation in the number of input modalities complicates network architecture design and lowers computational efficiency. To address this, this paper proposes M2Cloud, an efficient and unified multimodal cloud detection model that can process data from any number of modalities. The core innovation of M2Cloud is its multimodal data fusion method, which requires no architectural changes when new modalities are added, thereby reducing incremental computing costs and improving overall efficiency. The fusion module also generalizes well and can be integrated into other network architectures in a plug-and-play manner, which makes it practical and flexible. To address the challenge of unified multimodal feature extraction, we adopt two key strategies: (1) constructing feature extraction modules with shared but independent weights for each modality, to preserve the inherent features of each modality; (2) using cosine similarity to adaptively learn complementary features between different modalities, thereby reducing redundant information. Experimental results show that M2Cloud matches or surpasses state-of-the-art (SOTA) performance on the public multimodal datasets WHUS2-CD and WHUS2-CD+, verifying its effectiveness in the unified multimodal cloud detection task. The research offers new insights and technical support for multimodal data fusion and cloud detection, and is of both theoretical and practical value.
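
The abstract does not give the fusion formula, so the following is only a minimal sketch of the general idea behind strategy (2): using cosine similarity to down-weight features that are redundant across modalities. The function names `cosine_similarity` and `fuse_modalities`, the representation of each modality as a flat feature vector, and the weighting rule `(1 - similarity) / 2` are all illustrative assumptions, not the authors' M2Cloud implementation.

```python
import math


def cosine_similarity(u, v, eps=1e-8):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def fuse_modalities(features):
    """Fuse any number of per-modality feature vectors into one vector.

    Assumed rule (hypothetical, not from the paper): start from the first
    modality, then add each further modality scaled by (1 - similarity) / 2.
    A modality that is nearly identical to the running fusion (similarity
    close to 1) contributes almost nothing; a dissimilar one contributes more.
    """
    fused = list(features[0])
    for feat in features[1:]:
        sim = cosine_similarity(fused, feat)
        weight = (1.0 - sim) / 2.0  # lies in [0, 1]
        fused = [f + weight * x for f, x in zip(fused, feat)]
    return fused


# Two orthogonal modalities: similarity is 0, so the second gets weight 0.5.
print(fuse_modalities([[1.0, 0.0], [0.0, 1.0]]))  # [1.0, 0.5]
```

Because the loop accepts a list of arbitrary length, adding a modality changes only the input, not the function, which mirrors the abstract's claim that new modalities need no architectural change. In a trained network the weighting would be learned rather than fixed as it is here.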

Details

Title
Enhanced Cloud Detection Using a Unified Multimodal Data Fusion Approach in Remote Images
Author
Mo, Yan 1; Chen Puhui 2; Zhou Wanting 3; Chen, Wei 4

1 College of Aeronautics Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China; School of Information Engineering, Nanchang Hangkong University, Nanchang 330063, China; [email protected]
2 State Key Laboratory of Mechanics and Control of Mechanical Structures, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China; [email protected]
3 School of Information Engineering, Nanchang Hangkong University, Nanchang 330063, China; [email protected]
4 College of Geoscience and Surveying Engineering, China University of Mining & Technology, Beijing 100083, China; [email protected]
First page
2684
Publication year
2025
Publication date
2025
Publisher
MDPI AG
e-ISSN
14248220
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
3203224748