Abstract

Hybrid CNN–Transformer networks seek to combine the local feature extraction capability of CNNs with the long-range dependency modeling of Transformers, so that both local details and global context can be captured. However, many existing studies combine CNNs and Transformers through a straightforward fusion of encoder features, which does not promote effective interaction between the two branches and thus limits the benefits of each architecture. To address this limitation, this study introduces a novel medical image segmentation (MIS) network, DEFI-Net, based on dual-encoder interactive fusion. The network improves segmentation performance by fostering interactive learning and feature fusion between the CNN and Transformer encoders. Specifically, during encoding, DEFI-Net runs the CNN and Transformer in parallel to extract local and global features from the input images. The global–local interaction learning (GLIL) module then enables the Transformer and the CNN to assimilate global semantics and local details from each other, fully exploiting the strengths of the two encoders. In the feature fusion phase, the global–local feature fusion (GLFF) module integrates the features from both encoders, using global and local information jointly to produce a more precise and comprehensive feature representation. Extensive experiments on multiple public datasets, including multi-organ, cardiac, and colon polyp datasets, show that DEFI-Net surpasses several existing methods in segmentation accuracy, highlighting its effectiveness and robustness in MIS tasks.
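
To make the dual-encoder idea concrete, the PyTorch sketch below shows one plausible way an interaction module and a fusion module of this kind could be wired. It is a minimal illustration only: the class names (GLILBlock, GLFFBlock), the cross-attention formulation, and the tensor shapes are assumptions made for this sketch and do not reflect the authors' actual DEFI-Net implementation.

```python
import torch
import torch.nn as nn

class GLILBlock(nn.Module):
    """Illustrative global-local interaction: each branch cross-attends to the other.
    (Assumed design; not the paper's exact GLIL module.)"""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.local_to_global = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.global_to_local = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, local_feat, global_feat):
        # local_feat, global_feat: (B, N, C) token sequences from the CNN and
        # Transformer branches (the CNN feature map flattened to tokens).
        enriched_global, _ = self.local_to_global(global_feat, local_feat, local_feat)
        enriched_local, _ = self.global_to_local(local_feat, global_feat, global_feat)
        # Residual updates let each encoder keep its own features while
        # absorbing information from the other branch.
        return local_feat + enriched_local, global_feat + enriched_global

class GLFFBlock(nn.Module):
    """Illustrative global-local fusion: concatenate both branches and project back."""
    def __init__(self, dim):
        super().__init__()
        self.fuse = nn.Sequential(nn.Linear(2 * dim, dim), nn.GELU())

    def forward(self, local_feat, global_feat):
        return self.fuse(torch.cat([local_feat, global_feat], dim=-1))

if __name__ == "__main__":
    B, N, C = 2, 196, 64
    local_feat = torch.randn(B, N, C)   # CNN branch tokens
    global_feat = torch.randn(B, N, C)  # Transformer branch tokens
    l, g = GLILBlock(C)(local_feat, global_feat)
    fused = GLFFBlock(C)(l, g)          # (B, N, C) fused features for the decoder
    print(fused.shape)
```

Under these assumptions, the interaction step (GLILBlock) exchanges information between the two encoders before fusion, while the fusion step (GLFFBlock) merges the enriched features into a single representation, mirroring the two-phase interactive-learning-then-fusion scheme described in the abstract.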

Details

Title
Medical Image Segmentation Network Based on Dual-Encoder Interactive Fusion
Author
Yang, Hong; Fan, Yong; Yang, Ping
First page
3785
Publication year
2025
Publication date
2025
Publisher
MDPI AG
e-ISSN
2076-3417
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
3188779125
Copyright
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.