The proposed YOLO11-LiB achieves a high drowning class mean average precision (DmAP50) of 94.1% while being extremely lightweight (2.02 M parameters, 4.25 MB size). Key innovations include the LGCBlock for efficient downsampling, the C2PSAiSCSA module for enhanced spatial–channel feature attention, and the BiFF-Net for improved multi-scale feature fusion.
YOLO11-LiB addresses three critical limitations in real-time drowning detection: poor edge deployment efficiency, limited robustness in complex water environments, and multi-scale object challenges. It provides a high-performance, computationally efficient solution enabling practical real-time surveillance in swimming pool scenarios.
Drowning constitutes the leading cause of injury-related fatalities among adolescents. In swimming pool environments, traditional manual surveillance has limitations, while existing technological alternatives are hampered by the poor adaptability of wearable devices. Vision models based on YOLO still face challenges in edge deployment efficiency, robustness in complex water conditions, and multi-scale object detection. To address these issues, we propose YOLO11-LiB, a drowning object detection model based on YOLO11n, featuring three key enhancements. First, we design the Lightweight Feature Extraction Module (LGCBlock), which integrates the Lightweight Attention Encoding Block (LAE) and combines Ghost Convolution (GhostConv) with dynamic convolution (DynamicConv). The LGCBlock optimizes the downsampling structure and the C3k2 module in the YOLO11n backbone network, significantly reducing model parameters and computational complexity. Second, we introduce the Cross-Channel Position-aware Spatial Attention Inverted Residual with Spatial–Channel Separate Attention module (C2PSAiSCSA) into the backbone. This module embeds the Spatial–Channel Separate Attention (SCSA) mechanism within the Inverted Residual Mobile Block (iRMB) framework, enabling more comprehensive and efficient feature extraction. Finally, we redesign the neck structure as the Bidirectional Feature Fusion Network (BiFF-Net), which integrates the Bidirectional Feature Pyramid Network (BiFPN) and Frequency-Aware Feature Fusion (FreqFusion). The enhanced YOLO11-LiB model was validated against mainstream algorithms through comparative experiments, and ablation studies were conducted.
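To make the parameter-reduction claim for Ghost convolution more concrete, the following minimal sketch counts the weights of a standard convolution and of a Ghost convolution with the same input and output channels. It is an illustration only: the split ratio of 2, the 5 x 5 depthwise "cheap" filter, and the function names `conv_params` and `ghost_conv_params` are assumptions chosen for this example, not settings reported for the LGCBlock, and bias and normalization parameters are ignored.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k


def ghost_conv_params(c_in: int, c_out: int, k: int,
                      ratio: int = 2, cheap_k: int = 5) -> int:
    """Weights in a Ghost convolution (bias omitted).

    A primary k x k convolution produces c_out / ratio "intrinsic" channels.
    The remaining channels are generated from the intrinsic ones by cheap
    depthwise cheap_k x cheap_k filters (one filter per generated channel).
    ratio and cheap_k are assumed values, not taken from the paper.
    """
    if c_out % ratio != 0:
        raise ValueError("c_out must be divisible by ratio")
    intrinsic = c_out // ratio
    primary = c_in * intrinsic * k * k
    cheap = intrinsic * (ratio - 1) * cheap_k * cheap_k
    return primary + cheap


if __name__ == "__main__":
    standard = conv_params(64, 128, 3)
    ghost = ghost_conv_params(64, 128, 3)
    print(f"standard conv weights: {standard}")
    print(f"ghost conv weights:    {ghost}")
    print(f"reduction factor:      {standard / ghost:.2f}x")
```

Under these assumptions, the Ghost variant needs roughly half the weights of the standard layer, because the expensive dense convolution only has to produce half of the output channels.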
Experimental results demonstrate that YOLO11-LiB achieves a drowning class mean average precision (DmAP50) of 94.1%, with merely 2.02 M parameters and a model size of 4.25 MB. This represents an effective balance between accuracy and efficiency, providing a high-performance solution for real-time drowning detection in swimming pool scenarios.
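As a rough sanity check on how the reported parameter count relates to the reported model size, the sketch below estimates raw weight storage at different numeric precisions. The precision options and the helper name `weight_storage_mb` are assumptions for illustration; the source does not state the precision or file format in which the 4.25 MB model was saved.

```python
# Bytes needed to store one parameter at each (assumed) precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_storage_mb(num_params: int, dtype: str = "fp16") -> float:
    """Raw weight storage in megabytes (10**6 bytes), excluding file metadata."""
    return num_params * BYTES_PER_PARAM[dtype] / 10**6


if __name__ == "__main__":
    params = 2_020_000  # 2.02 M parameters, as reported for YOLO11-LiB
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: {weight_storage_mb(params, dtype):.2f} MB")
```

At 16-bit precision, 2.02 M parameters occupy about 4.04 MB, so one plausible reading is that the reported 4.25 MB corresponds to half-precision weights plus some file overhead. This is an inference from the arithmetic, not something the source confirms.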
Chen, Lu 1; Shi, Jianchun 2
1 School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, China; [email protected]
2 Jiangsu Zhaoming Information Technology Co., Ltd., Nantong 213000, China; [email protected]