Abstract
The development of detection and identification technologies for biofouling organisms on marine aquaculture cages is of paramount importance for automating and adding intelligence to cleaning operations performed by Autonomous Underwater Vehicles (AUVs). This study proposes a methodology for detecting fouling shellfish on marine aquaculture cages based on an improved symmetric Faster R-CNN: the original Visual Geometry Group 16-layer (VGG16) backbone is replaced with a 50-layer Residual Network with Aggregated Transformations (ResNeXt50) incorporating a Convolutional Block Attention Module (CBAM) to strengthen feature extraction; the anchor box dimensions and the Intersection over Union (IoU) threshold are jointly optimised to adapt to the scale of the targets; and the Multi-Scale Retinex with Single Scale Component and Color Restoration (MSRCR) algorithm is applied for image enhancement. Experiments demonstrate that the enhanced model attains an average precision of 94.27%, a 10.31 percentage point improvement over the original model, while requiring only about one-fifth of the original model's weight file size. At an IoU threshold of 0.5, the model attains a mean average precision (mAP) of 93.14%, surpassing numerous mainstream detection models. Furthermore, training the detection model on the image-enhanced dataset yields an average precision 11.72 percentage points higher than training on the original dataset. In summary, the technical approach proposed in this paper enables accurate and efficient detection and identification of fouling shellfish on marine aquaculture cages.
1. Introduction
China has been vigorously promoting cage aquaculture of marine species (e.g., fish and shellfish) because of its advantages in economic efficiency, intensive production, and ecological sustainability. However, cages submerged in water for extended periods accumulate substantial fouling organisms on their netting, referred to here as biofouling organisms on marine aquaculture cages; this category includes algae, sea anemones, and oysters. The accumulation of biofouling impedes water exchange and hinders the growth of farmed organisms, so cage netting must be cleaned regularly. This task has traditionally relied on manual labour, which is time-consuming and labour-intensive. The mechanisation, automation, and intelligent operation of marine cage cleaning has therefore become a key focus of research on China's marine engineering equipment technology [1,2,3,4,5,6], and the development of Autonomous Underwater Vehicles (AUVs) to replace manual cage cleaning is of crucial importance. Within this effort, detection and identification technology for biofouling organisms on marine aquaculture cages is both a manifestation of AUV intelligence and a core technical means for assessing cage cleaning effectiveness [7,8,9]. However, the inherent characteristics of the underwater environment (low visibility, strong absorption, and strong scattering) frequently degrade images, causing colour distortion, reduced contrast, and blurred details. Precise detection of underwater fouling therefore remains challenging.
Convolutional Neural Networks (CNNs) are a specialised deep learning architecture designed for processing image data. A CNN is primarily composed of multiple convolutional layers, pooling layers, and fully connected layers, which together extract features from images efficiently, and CNNs are widely applied in object detection, image recognition, and classification tasks. Compared with single-stage detection algorithms such as YOLO and SSD, two-stage detection algorithms, typified by Faster R-CNN, possess superior feature extraction capabilities and end-to-end training advantages. Consequently, the latter achieve higher detection accuracy and are better suited to complex applications such as the detection and identification of biofouling organisms on marine aquaculture cages.
The primary contributions of this paper are as follows: (1) Focusing on the characteristics of fouling shellfish on marine aquaculture cages, RGB images of shellfish fouling growing on marine aquaculture cages were acquired. (2) To address underexposure, low contrast, and colour bias in underwater images, the Gray World Assumption theory is combined with the Multi-Scale Retinex with Single Scale Component and Color Restoration (MSRCR) algorithm; this improves the image quality of marine cage biofouling images used to construct the dataset. (3) The backbone network of Faster R-CNN is replaced, changing from the VGG16 model to a ResNeXt50 model integrated with the CBAM. The structurally symmetric ResNeXt50 model, built from residual connections and many parallel branches, not only alleviates the vanishing gradient problem caused by increased network depth but also reduces the number of model parameters, making it more suitable for deploying the model on AUVs with constrained computational resources. Concurrently, the dimensionally symmetric CBAM employs channel attention and spatial attention mechanisms, helping the detection model focus on both the spatial location and the feature textures of targets and thereby substantially improving detection accuracy. (4) Finally, the size and aspect ratio of the anchor boxes are adjusted and the Intersection over Union (IoU) threshold for non-maximum suppression (NMS) is modified, to accommodate the size and number of fouling shellfish on marine aquaculture cages and further improve the detection model's accuracy.
2. Related Works
With the advancement of machine learning methodologies, researchers have begun to extract features such as colour, texture, and shape from underwater images and to employ methods such as Support Vector Machines (SVM), K-Means clustering, and Random Forests (RF) for object detection [10,11,12]. Li et al. [13] developed a model combining random forest and partial least squares (PLS) regression that integrates spectral data from critical growth stages of potatoes with empirical measurements of leaf chlorophyll content; its primary objective is to identify the most suitable spectral bands, particularly the Opt-NDVI, within the centre bands of 408 nm and 552 nm. Xia et al. [14] employed support vector machines (SVMs) and maximum likelihood classification to automatically recognise cotton plants in images, and additionally developed a method that leverages the morphological characteristics of cotton plants to count overlapping seedlings accurately, achieving a 91.13% success rate. Alderdice et al. [15] proposed a decision tree model that combines water quality sensor data and fish behavioural characteristics to diagnose common diseases in salmonids with 82% accuracy. However, traditional machine learning algorithms are only suitable for detection tasks in simple scenarios because of their poor generalisation capabilities. For biofouling organisms on marine aquaculture cages, which exhibit complex and variable shapes, textures, and colours, traditional machine learning algorithms are significantly less accurate.
In recent years, deep learning has been widely applied to image recognition and classification in the aquaculture industry [16]. For the automated detection of farmed fish, Banno et al. [17] employed the YOLOv4 object detection algorithm and constructed a DeepSORT multi-target tracking framework for the automatic identification of behavioural patterns exhibited by Norwegian farmed salmon, including foraging, evasive swimming, and other behaviours; the study achieved a behavioural classification accuracy of 94.5% and a real-time processing speed of 30 frames per second (FPS). The resulting real-time behaviour recognition system facilitates timely detection of anomalous behaviour or critical events, enabling suitable measures to be taken promptly. Hong et al. [18] introduced a multi-feature extraction (MFE) module by improving the MobileViTv3 model for automatic identification of the feeding behaviour of largemouth bass, achieving an accuracy of 96.7% on the feeding intensity classification task. Pang et al. [19] developed a deep learning model that integrates the Transformer architecture with attention mechanisms for the automatic monitoring and prediction of harmful algal blooms (HABs); the model accurately recognises the spatial and temporal distribution of blooms and thereby supports early warning. Liao et al. [20] developed an automated system for detecting breakage in underwater farmed nets, employing an optimised MobileNet-SSD model together with a key-frame extraction technique and yielding an average precision (AP) of 88.55% and a detection speed of 50 FPS; this work automates the detection of damage to offshore fish cages, reducing manual intervention and improving detection efficiency. Yari et al. [21] achieved accurate biomass estimation of farmed Atlantic salmon by combining Attention Gates with an improved U-Net++, obtaining a Dice coefficient of 0.92. De Verdal et al. [22] developed a deep learning model based on an enhanced Mask R-CNN for automatically measuring the size of farmed abalone, achieving a segmentation mAP of 94%.
In summary, single-stage deep learning detection algorithms are increasingly applied to underwater object detection. Lightweight single-stage detectors perform well on underwater objects with relatively simple backgrounds and clear texture features while keeping the model small. Nevertheless, challenges remain for objects such as biofouling organisms on marine aquaculture cages, where the detection environment is complex and texture and colour information are unclear. The improved symmetric Faster R-CNN detection model proposed in this paper, combined with image enhancement algorithms, therefore enables high-precision detection of fouling shellfish on marine aquaculture cages without compromising efficiency.
3. Materials and Methods
3.1. Overall Scheme
The research in this paper is divided into three parts: acquisition of RGB images, construction of the dataset, and development of a neural network detection model for fouling shellfish on marine aquaculture cages, as shown in Figure 1; a comparison experiment was then conducted to analyse the model's performance. RGB images are acquired by an underwater drone equipped with searchlights, which takes close-up photographs of aquaculture cages underwater to obtain RGB images of the organisms attached to the cages. The dataset, used to train the deep learning model, is built by applying image enhancement, image flipping, and data annotation to the collected images. The constructed dataset is then used to train an improved symmetric Faster R-CNN visual detection model to obtain detection results for fouling shellfish on marine aquaculture cages. Through continuous optimisation of the neural network model, precise detection of fouling shellfish on marine aquaculture cages is achieved.
3.2. Image Data Acquisition
The present study was conducted to validate the proposed model, focusing on marine aquaculture cage biofouling (primarily mussels and oysters; hereinafter referred to collectively as fouling shellfish on marine aquaculture cages). The site selected for collecting RGB image data was the Xiangshan Xing-Yu Aquaculture Cooperative (121.86917° E, 29.47758° N), located west of North Bridge at Sanmenkou, Xiangshan County, Ningbo City, Zhejiang Province, China. The region's climate is subtropical monsoon, characterised by distinct seasonal variations and an average annual temperature of approximately 16 °C, and the moderating influence of the ocean provides conditions conducive to marine aquaculture. Furthermore, the natural environment of Xiangshan Port and its surrounding, relatively turbid coastal waters is conducive to the habitation, reproduction, and growth of fish. The region is particularly suited to the cultivation of marine fish species, including yellow croaker and sea bass, and also fosters favourable conditions for the proliferation of biofouling on underwater net cages.
To obtain RGB images of shellfish attached to underwater cages, the ROVMAKER Edge underwater drone from Changzhou Yisu Underwater Robotics Technology Co., Ltd. was selected. The drone is equipped with a high-sensitivity USB camera that can tilt through a 90° range, as well as two adjustable 1500-lumen underwater lights, as shown in Figure 2a. Given that the attachments grow primarily on the netting within 3 m of the water surface, the drone was operated within 3 m of the surface. The distance between the drone's lens and the netting was kept in the range of 10–15 cm, with planned movements up, down, left, and right, as shown in Figure 2b. A total of 800 RGB images were captured, each with a resolution of 768 × 432 pixels.
3.3. Underwater Image Enhancement
Because of the distinct characteristics of the underwater environment, light entering the water is extensively absorbed and scattered by the medium, so underwater images frequently exhibit underexposure, low contrast, and colour casts. This makes it harder to discern the edges and texture characteristics of objects in the images and complicates their use for training deep learning models. For photographs of underwater cage netting with attached shellfish, the primary focus of image enhancement is therefore balancing exposure and correcting colour. In this study, we propose an image enhancement algorithm, GWA-MSRCR [23], to address these limitations: the Gray World Assumption theory [24] is employed for colour correction, and the Multi-Scale Retinex with Single Scale Component and Color Restoration (MSRCR) algorithm [25] is used to enhance image brightness. The enhanced images are subsequently used to construct the dataset.
When light of different colours enters the water, the blue and green components, having shorter wavelengths, propagate farther in water than the red component. Consequently, captured underwater images tend to be dominated by blue and green, because more pixels fall in the blue and green components; this is referred to as the colour bias problem, as shown in Figure 3a. The Gray World Assumption posits that the average luminance of the red, green, and blue colour channels of any given image is ideally equal, i.e., the mean values of the three colour channels should be relatively close to each other, and the pixels of the three channels should be distributed across the histogram range [0, 255]. Therefore, a colour correction method based on the Gray World Assumption is employed to address the colour bias of underwater images. First, the mean and variance of the three colour channels of the image are obtained. Then, a new normalised range is designed according to the mean and variance of each channel. Finally, the histogram distribution is stretched to cover the entire [0, 255] interval, as illustrated in Figure 3b. The procedure is given in Equations (1)–(4):
Calculate the mean and variance of the red, green and blue channels of the underwater image, respectively, and design a new normalized range based on the mean and variance of each channel:
$V_c^{\max} = \mu_c + \lambda\,\sigma_c$ (1)

$V_c^{\min} = \mu_c - \lambda\,\sigma_c$ (2)

where $c \in \{R, G, B\}$; $\mu_c$ denotes the mean of channel c; $\sigma_c$ denotes the variance of channel c; and λ is a parameter controlling the dynamic range of the image, generally taken as 2. For each channel c, the normalized channel is calculated as:
$\tilde{I}_c(x, y) = \dfrac{I_c(x, y) - V_c^{\min}}{V_c^{\max} - V_c^{\min}}$ (3)

where $I_c$ denotes the original channel value and $\tilde{I}_c$ denotes the normalized channel value. The normalized channel values are then scaled to fit the range [0, 255]:
$I'_c(x, y) = 255 \times \tilde{I}_c(x, y)$ (4)

where $I'_c$ denotes the normalized channel value after scaling.

After colour correction, the low exposure of underwater images must be addressed. The MSRCR algorithm is therefore employed to adjust the lighting of the underwater image. The MSRCR algorithm is based on the Retinex theory, which posits that the value of each pixel in an image can be decomposed into two independent components: the reflectance component, representing the inherent colour of the object surface, and the illumination component, representing the change in light. First, the luminance of each pixel of the underwater image is decomposed using the MSRCR algorithm to separate the reflection and illumination components. Second, Gaussian blurring is used to estimate and adjust the illumination component of the underwater image. Finally, gamma correction is applied to the reflection component to equalise the exposure and contrast of the underwater image; the result is shown in Figure 3c. The procedure is given in Equations (5)–(7):
Convert the RGB color space of the color-corrected underwater image to HSV color space, and use the MSRCR algorithm to decompose the luminance of the Value layer in HSV space:
$I_{HSV}(x, y) = \mathrm{RGB2HSV}\big(I_{RGB}(x, y)\big), \qquad V(x, y) = L(x, y) \cdot R(x, y)$ (5)

where $I_{RGB}$ is the input RGB image; $I_{HSV}$ is the converted HSV image; $V$ is the luminance (Value) component extracted from the HSV image; $L$ is the illumination component; $R$ is the reflection component; and $(x, y)$ denotes a pixel location. Gaussian blur is used to estimate the illumination component, adjust the luminance of the underwater image, and calculate the reflection component; gamma correction is then applied to the reflection component at each scale to enhance the contrast; finally, the HSV colour space is converted back to the RGB colour space:
$R_s(x, y) = \log V(x, y) - \log\big(G_s(x, y) * V(x, y)\big)$ (6)

$R'(x, y) = \sum_{s} w_s \big(R_s(x, y)\big)^{\gamma}$ (7)

where $G_s$ is the Gaussian kernel of scale s used to estimate the illumination component; γ is the gamma correction parameter, usually less than 1 and taken as 0.4 here; $w_s$ is the weight parameter of scale s; and $R'$ is the final reflection component.

The histograms of the pixel distribution in the RGB channels of the image in the first row of Figure 3, before and after enhancement, are shown in Figure 4. As shown in Figure 4a, the pixel distributions across the three primary colour channels of the original image are narrow and concentrated in low-value regions; the red channel in particular has the narrowest distribution and the fewest pixels. This is consistent with the established characteristics of underwater imagery: low contrast and pronounced colour bias. After colour correction with the Gray World algorithm (Figure 4b), the pixel distribution of all three channels extends towards both extremes, the peak pixel count decreases, and the distribution becomes smoother; this addresses the colour bias, enhances contrast, and markedly improves overall image quality. After brightness enhancement with the MSRCR algorithm (Figure 4c), the pixel peaks in the low-value regions of the three channels shift to the right and the number of low-value pixels decreases, increasing brightness and improving the visual clarity of the underwater image. In summary, the GWA-MSRCR image enhancement algorithm effectively improves the image quality of fouling shellfish on marine aquaculture cages, providing high-quality image data for subsequent detection model training.
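For illustration, the two-stage enhancement described above can be sketched in Python with OpenCV and NumPy. The snippet below is a minimal approximation of the GWA-MSRCR pipeline under stated assumptions (the Gaussian scales, the input file name, and the normalisation details are illustrative, not the authors' exact implementation).

```python
import cv2
import numpy as np

def gray_world_correction(img, lam=2.0):
    """Stretch each colour channel to [0, 255] using mean +/- lam * std (Eqs. (1)-(4))."""
    img = img.astype(np.float32)
    out = np.zeros_like(img)
    for c in range(3):
        mu, sigma = img[..., c].mean(), img[..., c].std()
        lo, hi = mu - lam * sigma, mu + lam * sigma
        norm = (img[..., c] - lo) / max(hi - lo, 1e-6)   # Eq. (3)
        out[..., c] = np.clip(norm, 0, 1) * 255          # Eq. (4)
    return out.astype(np.uint8)

def retinex_brightness(img, sigmas=(15, 80, 250), gamma=0.4):
    """Enhance the V channel in HSV space with a multi-scale Retinex step (Eqs. (5)-(7))."""
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV).astype(np.float32)
    v = hsv[..., 2] / 255.0 + 1e-6
    refl = np.zeros_like(v)
    for s in sigmas:
        illum = cv2.GaussianBlur(v, (0, 0), s)           # illumination estimate, Eq. (6)
        r = np.log(v) - np.log(illum + 1e-6)
        r = (r - r.min()) / (r.max() - r.min() + 1e-6)   # rescale before gamma correction
        refl += (r ** gamma) / len(sigmas)               # weighted sum over scales, Eq. (7)
    hsv[..., 2] = np.clip(refl * 255, 0, 255)
    return cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2BGR)

# "cage.jpg" is a placeholder file name for one captured underwater image.
enhanced = retinex_brightness(gray_world_correction(cv2.imread("cage.jpg")))
cv2.imwrite("cage_enhanced.jpg", enhanced)
```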
3.4. Data Set Construction
The construction of a dataset is the foundation of model learning: neural network models learn to recognise and detect objects by analysing the pixel features of objects in the dataset and their corresponding labels, and the dataset also serves as the basis for quantifying model precision, recall, and other metrics. First, the 2500 images enhanced by the GWA-MSRCR algorithm are preprocessed. Python scripts (PyCharm 2023) are used mainly for resizing and random flipping (horizontal, vertical, and 90° rotation) of images to achieve data augmentation; after this step, the number of images increases to 4000, each 400 × 400 pixels. To guarantee annotation quality, all data annotation was carried out by a single individual, with bounding boxes aligned tightly to object edges so that no objects are omitted while superfluous background is minimised. For minute objects, images were enlarged before annotation. Finally, all shellfish attachments with occlusion rates below 50% were annotated. The images and annotation files were prepared in the PASCAL VOC (Visual Object Classes) dataset format, and the annotated dataset was randomly partitioned into a training set and a testing set at a ratio of 4:1 for model training. A minimal sketch of the augmentation step is given below.
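The following PIL-based sketch illustrates the resizing and flipping augmentation; the folder names are placeholders, and the corresponding PASCAL VOC bounding boxes would of course have to be transformed consistently (not shown).

```python
import random
from pathlib import Path
from PIL import Image, ImageOps

AUGMENTATIONS = {
    "hflip": ImageOps.mirror,                        # horizontal flip
    "vflip": ImageOps.flip,                          # vertical flip
    "rot90": lambda im: im.rotate(90, expand=True),  # 90-degree rotation
}

def augment_folder(src_dir, dst_dir, size=(400, 400)):
    """Resize every image to 400 x 400 and save one randomly chosen flipped/rotated copy."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for path in sorted(Path(src_dir).glob("*.jpg")):
        im = Image.open(path).convert("RGB").resize(size)
        im.save(dst / path.name)
        name, fn = random.choice(list(AUGMENTATIONS.items()))
        fn(im).save(dst / f"{path.stem}_{name}.jpg")

# Placeholder directories for the enhanced and augmented image sets.
augment_folder("enhanced_images", "augmented_images")
```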
4. Detection Model Improvements
4.1. Model Structure for Detecting Fouling Shellfish on Marine Aquaculture Cages
The Faster R-CNN model is a prototypical two-stage detection algorithm that first achieved end-to-end training. The model consists of two primary components: the Region Proposal Network (RPN), which generates candidate regions by placing reference boxes of varying sizes and aspect ratios at each position of the feature map, and the main detection network, based on Fast R-CNN, which incorporates the VGG16 backbone network, ROI Pooling, and classification and regression heads. The RPN generates candidate regions from the feature maps extracted by the backbone network and passes them to the second stage for classification and bounding box regression. To better extract detection targets with complex backgrounds and small features and to alleviate problems such as vanishing gradients, the backbone network is replaced with a ResNeXt50 model fused with CBAM, as shown in Figure 5a. Finally, the overall performance of the detection model for biofouling organisms on marine aquaculture cages is enhanced by modifying the ratio and size of the anchor boxes and adjusting the intersection over union (IoU) threshold for non-maximum suppression.
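The overall assembly can be sketched with torchvision's detection API. The snippet below is a simplified, hypothetical configuration (a plain ResNeXt50 feature extractor without the CBAM insertions, and illustrative anchor pixel sizes), intended only to show how the backbone, anchor generator, and detection head fit together, not to reproduce the authors' implementation.

```python
import torch
import torchvision
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator

# ResNeXt50 feature extractor: drop the global-average-pool and fc layers.
# (Pretrained ImageNet weights can be loaded via the weights argument if desired.)
resnext = torchvision.models.resnext50_32x4d()
backbone = torch.nn.Sequential(*list(resnext.children())[:-2])
backbone.out_channels = 2048  # channel count of the last feature map, required by FasterRCNN

# Four sizes x four aspect ratios = 16 anchors per location (pixel sizes are assumptions).
anchor_generator = AnchorGenerator(
    sizes=((16, 32, 64, 128),),
    aspect_ratios=((0.5, 1.0, 2.0, 3.0),),
)

# Two classes: background + fouling shellfish.
model = FasterRCNN(backbone, num_classes=2, rpn_anchor_generator=anchor_generator)
model.eval()
with torch.no_grad():
    predictions = model([torch.rand(3, 400, 400)])  # list of dicts with boxes, labels, scores
```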
4.2. Feature Extraction Using the ResNeXt50 Algorithmic Model
Biofouling organisms on marine aquaculture cages exhibit a wide range of morphologies and sizes. VGG16, the original backbone network of Faster R-CNN, is a simple stack of 13 convolutional layers and 3 fully connected layers containing approximately 138 million parameters. As network depth increases, the vanishing gradient problem becomes pronounced, hindering the detection of small targets with complex features; in addition, the large parameter count makes VGG16 poorly suited for deployment on mobile devices. The ResNeXt50 model [26] was therefore adopted as the replacement backbone network. ResNeXt is an enhanced version of the ResNet architecture that introduces mechanisms such as channel grouping and multi-scale feature extraction to build a parallelised, symmetric basic unit, enabling more efficient image feature extraction [27]. Compared with VGG16, the ResNeXt50 model performs better in processing complex underwater cage environments and detecting minute attached organisms, capturing richer details and semantic information.
The ResNeXt50 model introduces grouped convolutions and aggregated residual transformations on top of ResNet50, as shown in Figure 5b. Within its residual modules, ResNeXt50 performs grouped convolutions organised into 32 groups (cardinality = 32), each containing a residual bottleneck that performs parallel, independent convolutional operations; the outputs of the branches are then summed and fused, as shown in Figure 6a. In the ResNeXt50 module architecture, the input feature map is divided into 32 topologically identical branches, each with the same convolution kernel size and channel count. These branches process the input in parallel, performing identical and independent operations; this extensive repetition of identical structures directly embodies structural symmetry. The approach significantly reduces computational load while increasing network width to enhance feature extraction capability. Consequently, the model can capture richer textures, edges, and shapes of fouling shellfish on marine aquaculture cages, improving detection accuracy.
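The grouped-convolution bottleneck underlying this design can be expressed compactly in PyTorch. The following sketch mirrors the standard ResNeXt block with cardinality 32 (channel widths are the usual defaults and are assumptions here, not values taken from the paper).

```python
import torch
import torch.nn as nn

class ResNeXtBottleneck(nn.Module):
    """Standard ResNeXt bottleneck: 1x1 reduce -> 3x3 grouped conv (32 groups) -> 1x1 expand."""
    def __init__(self, in_ch, mid_ch, out_ch, cardinality=32, stride=1):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, 1, bias=False), nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, mid_ch, 3, stride=stride, padding=1,
                      groups=cardinality, bias=False),            # 32 parallel branches
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, out_ch, 1, bias=False), nn.BatchNorm2d(out_ch),
        )
        # Projection shortcut when the shape changes, identity otherwise.
        self.shortcut = (nn.Sequential(nn.Conv2d(in_ch, out_ch, 1, stride=stride, bias=False),
                                       nn.BatchNorm2d(out_ch))
                         if (in_ch != out_ch or stride != 1) else nn.Identity())
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.block(x) + self.shortcut(x))  # residual addition

block = ResNeXtBottleneck(in_ch=256, mid_ch=128, out_ch=256)
print(block(torch.rand(1, 256, 50, 50)).shape)  # torch.Size([1, 256, 50, 50])
```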
4.3. Integration of CBAM
Given the intricate background of marine aquaculture cages and the diverse morphologies and sizes of the biofouling organisms growing on them, which can occlude one another, a Convolutional Block Attention Module (CBAM) is incorporated into the backbone network to strengthen the detection model's focus on the designated targets. The aim is to enhance the feature extraction capability of the convolutional neural network through channel attention and spatial attention [28], which is also a key manifestation of dimensional symmetry, as shown in Figure 6a,b. In keeping with dimensional symmetry, CBAM attends to the two fundamental dimensions of a feature map, the channel dimension and the spatial dimension. This symmetrical treatment of feature information ensures that the model neither overlooks the 'what' (channel dimension) nor neglects the 'where' (spatial dimension), allowing the detection model to account for both the spatial location of detected objects and their feature textures and thereby make more accurate detection decisions.
To optimise the ResNeXt50 backbone network so that it focuses more effectively on small and overlapping fouling shellfish on marine aquaculture cages, accelerates model convergence, and improves overall accuracy and robustness, this study adds a CBAM module after each bottleneck block in the four stages of the network (as shown in Figure 6b). This fusion approach does not significantly increase the model's parameter count and remains compatible with lightweight deployment, as shown in Equations (8)–(11).
The calculation process for the CBAM module is as follows:
$M_c(F) = \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\big)$ (8)

$F' = M_c(F) \otimes F$ (9)

where F denotes the input feature map; AvgPool(F) and MaxPool(F) denote the average-pooled and max-pooled features, respectively; σ denotes the sigmoid function; MLP denotes the shared multi-layer perceptron; and ⊗ denotes element-wise multiplication.

$M_s(F') = \sigma\big(f^{7\times 7}\big([\mathrm{AvgPool}(F');\ \mathrm{MaxPool}(F')]\big)\big)$ (10)

$F'' = M_s(F') \otimes F'$ (11)

where $F'$ is the feature data after channel attention processing; $F''$ is the feature data after spatial attention processing; AvgPool($F'$) and MaxPool($F'$) denote the average-pooled and max-pooled features along the channel dimension, respectively; and $f^{7\times 7}$ denotes a convolution operation with a kernel size of 7 × 7.
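A minimal PyTorch realisation of Equations (8)–(11) is sketched below; the reduction ratio and spatial kernel size follow common CBAM defaults and are assumptions rather than values reported in this paper.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Channel attention (Eqs. 8-9) followed by spatial attention (Eqs. 10-11)."""
    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        # Shared MLP applied to average- and max-pooled channel descriptors.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, f):
        # Channel attention: M_c(F) = sigmoid(MLP(AvgPool(F)) + MLP(MaxPool(F)))
        avg = self.mlp(torch.mean(f, dim=(2, 3), keepdim=True))
        mx = self.mlp(torch.amax(f, dim=(2, 3), keepdim=True))
        f = f * self.sigmoid(avg + mx)                      # Eq. (9): F' = M_c(F) * F
        # Spatial attention: pool along the channel axis, then a 7x7 convolution.
        pooled = torch.cat([f.mean(dim=1, keepdim=True), f.amax(dim=1, keepdim=True)], dim=1)
        return f * self.sigmoid(self.spatial_conv(pooled))  # Eq. (11): F'' = M_s(F') * F'

cbam = CBAM(channels=256)
print(cbam(torch.rand(1, 256, 50, 50)).shape)  # torch.Size([1, 256, 50, 50])
```

In the fused backbone, one such module would be appended after each bottleneck block, so the feature maps entering the next stage are already reweighted along both the channel and spatial dimensions.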
4.4. Adjustment of the Anchor Boxes

The RPN employs a small sliding window (a 3 × 3 convolution kernel) to scan the feature maps output by the backbone network. The position of the kernel's centre on the feature map is termed the anchor point, and the region covered by the kernel is associated with anchor boxes [29]. As shown in Figure 7, for each anchor point on the feature map, the network presets k baseline anchor boxes with varying sizes and aspect ratios to accommodate the diverse shapes and sizes of fouling shellfish on marine aquaculture cages. The feature information from the region covered by each sliding window is fed into a 256-dimensional fully connected layer for encoding, generating rich intermediate features. The features are then passed to two parallel output layers: the classification layer (cls layer) generates 2k scores for each anchor box, determining whether the box contains the target object (foreground) or only background, and the regression layer (reg layer) outputs 4k coordinate parameters, which refine the position and dimensions of the baseline anchor boxes so that they align more precisely with the actual contour of the attachment. This mechanism allows the model to generate a large number of high-quality candidate regions efficiently, establishing a robust foundation for subsequent precise identification and localisation.
Anchor boxes of differing sizes and proportions therefore have a significant impact on the overall detection performance of the model. For instance, Muhammad et al. [30] found that the default anchor box dimensions and aspect ratios of the Faster R-CNN model were inadequate for weed detection tasks: the default anchor boxes may not represent the true dimensions and shapes of weeds, leading to suboptimal detection performance. Extensive comparative experiments showed that enlarging the size to 64 × 64 and adding the ratios 1:3 and 3:1 substantially improved the average precision (AP) of weed detection; specifically, the AP for Chinese apple weeds increased by 24.95%, and the mean average precision (mAP) rose by 2.58%. Departing from the conventional approach of manually adjusting anchor box parameters, Yan et al. [31] introduced the K-means clustering algorithm to determine the size and ratio of anchor boxes; their enhanced Faster R-CNN model substantially improved target detection accuracy in radar images and, compared with conventional methods, discerned small targets with greater precision. In summary, judicious selection of anchor box size and ratio improves the precision of the detection algorithm. The original Faster R-CNN model sets the anchor box width-to-height ratios to (1:2, 1:1, 2:1) and the scales to (8, 16, 32). To better match the proportions and sizes of shellfish attached to marine aquaculture cages, the pixel aspect ratios of the shellfish in the training set images were estimated and, combined with extensive training experiments, the aspect ratios of the anchor boxes were modified to (1:2, 1:1, 2:1, 3:1) and the scales to (2, 4, 8, 16) to align with the dimensions and proportions of the target objects. The model trained with this configuration exhibited the highest detection accuracy.
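To make the modified anchor configuration concrete, the sketch below enumerates the 16 base anchor shapes produced per feature-map location by combining the four scales and four aspect ratios; the 16-pixel feature stride used to turn scales into pixel sizes is an assumption for illustration only.

```python
import numpy as np

def base_anchor_shapes(scales=(2, 4, 8, 16), ratios=(0.5, 1.0, 2.0, 3.0), stride=16):
    """Return (width, height) of every base anchor for one feature-map location."""
    shapes = []
    for s in scales:
        area = (s * stride) ** 2          # anchor area implied by the scale
        for r in ratios:                  # r = height / width, i.e. 1:2, 1:1, 2:1, 3:1
            w = np.sqrt(area / r)
            h = w * r
            shapes.append((round(w), round(h)))
    return shapes

for w, h in base_anchor_shapes():
    print(f"{w:4d} x {h:4d}")             # 16 anchors: 4 scales x 4 aspect ratios
```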
4.5. Adjusting the IoU Threshold for NMS
In the RPN, the generated anchor boxes must be matched against the ground-truth bounding boxes, i.e., the image annotations, and the IoU threshold quantifies the extent of overlap between an anchor box and a ground-truth box. Anchor boxes whose overlap exceeds the upper IoU threshold (e.g., 0.7) are designated positive (foreground) samples; those whose overlap falls below the lower threshold (e.g., 0.3) are designated negative samples (background); and those whose overlap falls between the two thresholds (0.3–0.7) are designated "ignored samples" and are typically disregarded during training.
By modifying the aspect ratios and pixel scales of the anchor boxes, 16 anchor boxes of different scales and shapes are generated for each anchor point on the feature map (four aspect ratios by four scales). A 1 × 1 convolution then performs binary classification on the anchor boxes to determine whether they contain the target, outputting the probability that each anchor box contains the target, while a second 1 × 1 convolution predicts offsets from the ground-truth bounding boxes, fine-tuning the anchor box positions to approximate the real boxes. All generated anchor boxes are then filtered to retain the 2000 with the highest scores, and these 2000 boxes are subjected to non-maximum suppression to remove boxes with excessive overlap: each remaining box is compared with the current highest-scoring box and is suppressed if their intersection over union (IoU) exceeds a set threshold. Finally, the 300 highest-scoring anchor boxes are retained for subsequent classification and regression. Selecting a reasonable IoU threshold for non-maximum suppression (NMS) therefore effectively removes redundant anchor boxes and improves the accuracy and efficiency of target detection, as shown in Equations (12) and (13).
The formula for IoU is as follows:
$\mathrm{IoU} = \dfrac{A \cap B}{A \cup B}$ (12)

where A and B denote the regions of two overlapping anchor boxes; $A \cap B$ denotes the intersection of the two anchor boxes; and $A \cup B$ denotes their union.

The NMS calculation formula is as follows:

$s_i = \begin{cases} s_i, & \mathrm{IoU}(A, B_i) < N_t \\ 0, & \mathrm{IoU}(A, B_i) \ge N_t \end{cases}$ (13)

where $s_i$ is the confidence score of the i-th detection box; A is the detection box with the highest confidence in the region of interest; $B_i$ is the i-th detection box that overlaps with A; and $N_t$ is the overlap (IoU) threshold.
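For reference, Equations (12) and (13) can be realised directly; the following NumPy sketch implements box IoU and the greedy score-based suppression described above (box coordinates, format, and the threshold value are illustrative).

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, all in (x1, y1, x2, y2) format (Eq. 12)."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter + 1e-9)

def nms(boxes, scores, iou_thr=0.4):
    """Greedy NMS (Eq. 13): keep the highest-scoring box, drop overlaps above the threshold."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(best)
        rest = order[1:]
        order = rest[iou(boxes[best], boxes[rest]) < iou_thr]
    return keep

boxes = np.array([[10, 10, 60, 60], [12, 12, 58, 62], [100, 100, 150, 160]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2]: the second box overlaps the first too much and is suppressed
```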
5. Results
5.1. Model Performance Evaluation Metrics
The performance of the improved detection model for fouling shellfish on marine aquaculture cages is measured by several key metrics: precision, recall, F1 score, average precision (AP), mean average precision (mAP), and frames per second (FPS). Precision is the ratio of correctly detected shellfish targets to the total number of detections (both correct and incorrect); it measures the model's accuracy on predicted positive samples. Recall is the ratio of correctly detected shellfish targets to the total number of shellfish targets actually present in the dataset, including those that were missed; it measures the model's coverage of positive samples. The F1 score considers both precision and recall, providing a balanced evaluation; it is high only when both precision and recall are high. The average precision (AP) of the model across varying recall rates is the area under the precision-recall curve (PRC). The mean average precision (mAP) is the average of the AP values across multiple categories and is a critical metric for evaluating multi-class object detection models; with a single detection category, mAP is numerically equal to AP. FPS (frames per second) quantifies the number of images or video frames the model processes per unit time and indicates its real-time performance and efficiency. The metrics are defined in Equations (14)–(19):
$\mathrm{Precision} = \dfrac{TP}{TP + FP}$ (14)

$\mathrm{Recall} = \dfrac{TP}{TP + FN}$ (15)

$F1 = \dfrac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$ (16)

$AP = \int_0^1 P(R)\, dR$ (17)

$mAP = \dfrac{1}{N}\sum_{i=1}^{N} AP_i$ (18)

$FPS = \dfrac{N_{\mathrm{frames}}}{T_{\mathrm{total}}}$ (19)
Among them:
TP (True Positives): the number of fouling shellfish targets on underwater cage netting that were correctly detected;
FP (False Positives): the number of other attachments that were incorrectly detected as fouling shellfish targets;
FN (False Negatives): the number of fouling shellfish targets that were present but not detected;
N is the number of labelled categories, and $AP_i$ denotes the AP value of the i-th category.
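Given per-image counts of true positives, false positives, and false negatives at a chosen confidence threshold, Equations (14)–(16) reduce to a few lines; the counts used in the example call below are purely illustrative.

```python
def detection_metrics(tp, fp, fn):
    """Precision, recall, and F1 score from detection counts (Eqs. 14-16)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1

# Illustrative counts for one validation pass (not values from this study).
p, r, f1 = detection_metrics(tp=420, fp=56, fn=35)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```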
5.2. Training Platform and Parameters Setting
The proposed detection model is trained on the dataset described in Section 3.4 and its performance is evaluated. The experimental training platform comprises an NVIDIA GeForce RTX 4080 GPU, an Intel(R) Core(TM) i5-14600KF CPU, and 16 GB of RAM. The programming environment was PyCharm 2023, and the deep learning framework was PyTorch 1.12.1. During training, images are uniformly resized to 400 × 400 pixels. The learning rate, weight decay, and momentum are set to 0.0001, 0, and 0.9, respectively, the optimizer is Adam (Adaptive Moment Estimation), and a total of 250 epochs are trained.
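A hedged sketch of this training configuration is shown below. It assumes a torchvision-style detection model object named `model` and a data loader named `train_loader`, both of which are placeholders; note also that Adam has no explicit momentum parameter, so the stated momentum of 0.9 is expressed through its first moment coefficient beta1.

```python
import torch

# Hyperparameters reported in the paper.
LEARNING_RATE = 1e-4
WEIGHT_DECAY = 0.0
EPOCHS = 250

optimizer = torch.optim.Adam(
    model.parameters(),          # `model` is the improved Faster R-CNN (assumed defined elsewhere)
    lr=LEARNING_RATE,
    betas=(0.9, 0.999),          # beta1 = 0.9 plays the role of the stated momentum
    weight_decay=WEIGHT_DECAY,
)

for epoch in range(EPOCHS):
    model.train()
    for images, targets in train_loader:      # `train_loader` yields (images, targets) batches
        loss_dict = model(images, targets)    # torchvision detection models return a loss dict
        loss = sum(loss_dict.values())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```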
5.3. Performance Comparison Before and After Model Improvement
To demonstrate that the enhanced detection model generalises well and does not overfit during training, K-fold cross-validation was employed to partition the dataset [32]. The 4000 underwater images and their corresponding label files were divided into five subsets (K = 5); each subset was used once as the validation set, with the remaining subsets used for training, so the enhanced symmetric Faster R-CNN model was trained five times. The average precision for each of the five folds is shown in Figure 8. The mean average precision across the folds was 92.76% (95% CI [91.12%, 95.16%]), meaning that the true average precision has a 95% probability of lying within the interval [91.12%, 95.16%]. Within the permissible error range, these results indicate that the enhanced detection model generalises well, the dataset is representative, the training process is stable, the hyperparameter selection is reasonable, and the data volume is sufficient; together, these factors support effective performance on unseen data.
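The 5-fold partitioning itself can be reproduced with scikit-learn's KFold; the sketch below only shows how the image list would be split and where training and evaluation would plug in (the synthetic file names and the commented-out helper calls are placeholders, not the actual pipeline).

```python
import numpy as np
from sklearn.model_selection import KFold

# Stand-in for the 4000 annotated images; replace with the real file list.
image_paths = np.array([f"img_{i:04d}.jpg" for i in range(4000)])
kfold = KFold(n_splits=5, shuffle=True, random_state=42)

for fold, (train_idx, val_idx) in enumerate(kfold.split(image_paths)):
    train_files, val_files = image_paths[train_idx], image_paths[val_idx]
    # train_model(...) and evaluate_ap(...) would be the actual training and evaluation steps.
    print(f"fold {fold}: {len(train_files)} training images / {len(val_files)} validation images")
```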
Using the constructed dataset of fouling shellfish on marine aquaculture cages, the performance of the improved symmetric Faster R-CNN detection model was evaluated through a series of ablation experiments, with results compared across several evaluation metrics. As shown in Table 1, replacing the backbone network from the original VGG16 with ResNeXt50 improved the average precision by 5.89%, demonstrating that residual networks are better suited as the backbone for detecting fouling shellfish on marine aquaculture cages. Integrating the CBAM module into the ResNeXt50 backbone improved the average precision by a further 2.19%, addressing the challenge posed by the large variations in the morphology and size of the biofouling. Finally, modifying the proportions and dimensions of the anchor boxes yielded an additional improvement of 1.10% in average precision. Notably, the weight file of the optimised detection model is approximately one-fifth the size of the original model's. As shown in Figure 9, the line charts depict average precision (AP) and validation loss against the number of training epochs for the different model modifications. The AP curves of all improved models plateau in the later stages (Figure 9a), and the final enhanced model converges to the highest AP value and outperforms the other models throughout most of the training cycles, demonstrating its superior overall performance. Meanwhile, the loss curves of all models stabilise in the later training stages without significant rebound, indicating that the models did not overfit and generalise well (Figure 9b). This also improves the efficiency of detecting fouling shellfish on marine aquaculture cages. In summary, the optimised detection model presented in this paper achieves high detection accuracy with a reduced model weight, making it suitable for detecting biofouling organisms on marine aquaculture cages.
The utilisation of the improved symmetric Faster R-CNN detection model and the original Faster R-CNN detection model to detect and identify marine aquaculture cage shellfish biofouling in captured images has yielded the results illustrated in Figure 10. As demonstrated in the accompanying figure, due to the low visibility and blurriness of underwater images, the improved symmetric Faster R-CNN detection model is capable of detecting smaller and more concealed instances of shellfish fouling in comparison to the original Faster R-CNN detection model. Furthermore, the dimensions and aspect ratio of the detection boxes generated by the improved symmetric Faster R-CNN detection model are more appropriate for this detection task. In summary, the improved symmetric Faster R-CNN detection model demonstrates higher detection accuracy than the original Faster R-CNN model. Consequently, this model is deemed to be more suitable for detecting fouling shellfish on marine aquaculture cages.
5.4. Impact of IoU Threshold on Detection Performance
Applying an IoU threshold in non-maximum suppression removes highly overlapping, low-confidence anchor boxes and retains the high-confidence ones, so selecting an appropriate IoU threshold helps improve the model's accuracy. To this end, five IoU thresholds (0.3, 0.4, 0.5, 0.6, and 0.7) were used to train the enhanced model; the results are presented in Table 2 and Figure 11. The IoU thresholds of 0.4, 0.5, and 0.6 perform substantially better overall than 0.3 and 0.7. Notably, with a threshold of 0.4 the F1 score reaches its maximum of 86.71%, whereas with a threshold of 0.6 the average precision peaks at 94.27%. A lower IoU threshold requires less overlap between the predicted box and the ground-truth box for a detection to be counted, making it easier to obtain both high precision and high recall and hence a higher F1 score, as indicated by Equation (16). Conversely, with an IoU threshold of 0.6 the required overlap between the predicted and ground-truth boxes increases, which reduces precision to some extent while keeping recall high, enabling a higher average precision, as indicated by Equation (17). Given that marine aquaculture cage detection requires both high precision and high recall, the IoU threshold of 0.4 is deemed more suitable for this detection task.
5.5. Comparison Between Different Detection Models
To validate the superiority of the improved symmetric Faster R-CNN model, several state-of-the-art one-stage object detection algorithms were analysed, namely YOLOv8, YOLOv5, YOLOv11, and SSD512, together with two classic two-stage detection algorithms, Mask R-CNN and Fast R-CNN, a lightweight Transformer-based underwater detection model, HTDet, and the original Faster R-CNN model; these were compared against the improved symmetric Faster R-CNN model (Table 3). As shown in Table 3: (1) The improved symmetric Faster R-CNN model achieves a substantially higher mAP (IoU = 0.5) than the single-stage detection models, exceeding the SSD512 model by 13.51%. This indicates that the feature extraction of single-stage detectors is relatively coarse, making them prone to missed detections of small, densely packed objects against complex backgrounds and thus poorly suited to detecting biological fouling on marine cages. (2) The improved symmetric Faster R-CNN model outperforms the two-stage Mask R-CNN by 1.17% and the lightweight Transformer-based underwater detection model HTDet by 2.8%. This indicates that using ResNeXt50 with the integrated CBAM module as the backbone consistently yields better feature extraction than detection models using ResNet50 as the backbone, and that the CBAM module improves feature extraction for small and overlapping objects, making the model better suited to this detection task. Although the Transformer architecture excels at global modelling, it is prone to overfitting in underwater domains with limited data and incurs high computational cost, making it ill-suited for complex underwater detection tasks. (3) The weight file of the improved symmetric Faster R-CNN model is larger only than those of YOLOv5s (17.71 MB) and YOLOv11s (20.36 MB), and is approximately one-fifth the size of the original Faster R-CNN model's, with an inference speed of 12.68 f/s. Although this inference speed is not especially high, in the application scenario of surveying the distribution of fouling shellfish on marine aquaculture cages to plan cleaning routes for an AUV, a slower pace is acceptable, whereas missed detections and false positives must be minimised; detection accuracy is therefore more critical than inference speed.
Overall, the optimised model delivers superior detection accuracy together with acceptable inference efficiency compared with current mainstream object detection models, making it well suited to the task of detecting fouling shellfish on marine aquaculture cages.
Table 3. Performance comparison of the improved Faster R-CNN detection model with other detection models.
| Models | Backbone | mAP/% (IoU = 0.5) | Weight Size/MB | FPS/(f/s) |
|---|---|---|---|---|
| YOLOv8n | CSPDarknet | 67.77 | 130.46 | 4.57 |
| YOLOv5s | CSPDarknet | 64.28 | 17.71 | 6.97 |
| YOLOv11s | CSPDarknet | 70.36 | 20.36 | 8.69 |
| SSD512 | VGG16 | 79.63 | 356.13 | 15.67 |
| HTDet | Transformer | 90.34 | 125.06 | 10.61 |
| Fast R-CNN | VGG16 | 80.54 | 415.03 | 16.68 |
| Mask R-CNN | ResNet-50(FPN) | 91.97 | 168.16 | 10.98 |
| Original Faster R-CNN | VGG16 | 83.96 | 494.43 | 19.51 |
| Improved symmetric Faster R-CNN | ResNeXt50(CBAM) | 93.14 | 115.99 | 12.68 |
5.6. Impact on Detection Model Performance Before and After Image Enhancement
To demonstrate the impact of image enhancement on detection model performance and the efficacy of the GWA-MSRCR image enhancement algorithm presented in this paper, a comparative experiment was conducted with four widely applied traditional algorithms for underwater image enhancement: the Single Scale Retinex (SSR) algorithm, the Multi-Scale Retinex (MSR) algorithm, the Multi-Scale Retinex with Color Restoration (MSRCR) algorithm, and HSV histogram equalization. Their enhancement effects on underwater images are shown in Figure 12. The primary comparison metrics comprise detection model performance indicators (precision, recall, and AP) and image quality indicators such as the peak signal-to-noise ratio (PSNR). PSNR is a full-reference (FR) metric: image quality is assessed by comparing the difference between the processed image and the original image, and a larger value indicates better enhancement quality. Since all images in the dataset must be evaluated, the average PSNR over the entire dataset is calculated, as shown in Equations (20)–(22):
$MSE = \dfrac{1}{mn}\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\big[I(i, j) - K(i, j)\big]^2$ (20)

$PSNR = 10 \cdot \log_{10}\!\left(\dfrac{MAX^2}{MSE}\right)$ (21)

$\mathrm{Average\ PSNR} = \dfrac{1}{N}\sum_{k=1}^{N} PSNR_k$ (22)

where MSE is the mean square error; MAX is the maximum value of the image data, generally 255; I is the original image data; K is the enhanced image data; m and n are the numbers of rows and columns of the image; Average PSNR is the average peak signal-to-noise ratio; and N is the total number of images in the dataset.

As demonstrated in Table 4, the PSNR of the underwater images processed using the GWA-MSRCR algorithm is 2.47 dB, 6.08 dB, 3.68 dB, and 1.33 dB higher than that of the other algorithms, respectively, substantiating the superior performance of the image enhancement algorithm proposed in this paper. The average precision of the detection model trained on the dataset enhanced with the GWA-MSRCR algorithm is 8.58%, 9.98%, 7.99%, and 5.52% higher than that of models trained on datasets enhanced with the other algorithms, respectively, and 11.72% higher than that of the model trained on the original dataset, proving the effectiveness of image enhancement in improving the performance of detection algorithms.
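Equations (20)–(22) correspond to the standard full-reference PSNR computation used in this comparison; a small NumPy sketch is given below (the random arrays merely stand in for a real original/enhanced image pair).

```python
import numpy as np

def psnr(original, enhanced, max_val=255.0):
    """Peak signal-to-noise ratio between two images of equal size (Eqs. 20-21)."""
    mse = np.mean((original.astype(np.float64) - enhanced.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)

def average_psnr(pairs):
    """Average PSNR over a list of (original, enhanced) image pairs (Eq. 22)."""
    return sum(psnr(o, e) for o, e in pairs) / len(pairs)

# Illustrative call with synthetic data standing in for real image pairs.
rng = np.random.default_rng(0)
orig = rng.integers(0, 256, (432, 768, 3), dtype=np.uint8)
enh = np.clip(orig.astype(int) + rng.integers(-10, 10, orig.shape), 0, 255).astype(np.uint8)
print(f"PSNR = {psnr(orig, enh):.2f} dB")
```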
6. Conclusions
This study combines AUV remote sensing with an improved Faster R-CNN deep learning detection model and a newly developed traditional image enhancement algorithm, GWA-MSRCR, for the detection and recognition of marine fouling organisms in offshore aquaculture. The original Faster R-CNN detection model was enhanced through four distinct modifications, yielding the following primary conclusions: (1) The improved symmetric Faster R-CNN detection model substantially improves the detection of fouling shellfish on marine aquaculture cages compared with the original Faster R-CNN detection model. Its average precision is 9.18% higher than that of the original model, and its training weight file is approximately one-fifth the size, making it suitable for high-precision detection of fouling shellfish on marine aquaculture cages in real-world environments. (2) After training with five IoU thresholds (0.3, 0.4, 0.5, 0.6, and 0.7), the threshold of 0.6 yielded the highest average precision of 94.27%, while the threshold of 0.4 yielded the highest F1 score of 86.71%. These findings suggest that an IoU threshold of 0.4 provides a better balance between precision and recall for the detection model, making it a suitable choice for this detection task. (3) The improved symmetric Faster R-CNN detection model was employed to identify fouling shellfish on marine aquaculture cages, achieving a mean average precision of 93.14% at an IoU threshold of 0.5, which is 25.37%, 28.86%, 22.78%, 13.51%, 2.8%, 12.6%, and 1.17% higher than the mainstream YOLOv8, YOLOv5, YOLOv11, SSD512, HTDet, Fast R-CNN, and Mask R-CNN detection models, respectively. This result indicates that the proposed method clearly has advantages in detecting fouling shellfish on marine aquaculture cages. (4) Enhancing the dataset with the GWA-MSRCR image algorithm yields a peak signal-to-noise ratio 2.47 dB higher than that of the MSRCR image enhancement algorithm, and the detection model trained on the enhanced dataset achieves an average precision 8.58% higher than with the MSRCR algorithm and 11.72% higher than with the original dataset. These results indicate that preliminary image enhancement of the underwater dataset is an effective strategy for improving the performance of the detection model.
Future Work
This study demonstrates that the improved symmetric Faster R-CNN convolutional neural network model achieves high accuracy and robustness in detecting fouling shellfish on marine aquaculture cages. However, water turbidity significantly affects detection performance. Furthermore, because underwater images are difficult to acquire and biofouling growth is closely related to temperature and climate conditions, the present research focused solely on the detection of shellfish biofouling. In subsequent research, multi-sensor fusion technology will be applied to underwater detection using optical and acoustic fusion schemes: optical sensors (underwater cameras) provide colour and texture information of detection targets at close range, while acoustic sensors (underwater sonar) capture edge contours and distance information at long range. Fusing these two data sources at the data level can generate RGB-D images with depth information, thereby improving detection accuracy. Finally, additional datasets of biofouling organisms on marine aquaculture cages will be collected and used to train underwater net detection models, and the deployment of this algorithmic model for detecting and identifying biofouling organisms on marine aquaculture cages will facilitate broader applications in aquaculture.
P.Z.: Software, Investigation, Data curation, Writing—original draft preparation, Visualization, Validation; H.L.: Conceptualization, Formal analysis, Resources, Writing—review and editing, Project administration; J.C.: Methodology, Writing—review and editing, Supervision, Funding acquisition; C.G.: Writing—review and editing, Supervision. All authors have read and agreed to the published version of the manuscript.
The data presented in this study are available on request from the corresponding author. The data are not publicly available due to the privacy policy of the organization.
The authors declare no conflicts of interest.
| Abbreviation | Full term |
|---|---|
| AUV | Autonomous Underwater Vehicle |
| CNN | Convolutional Neural Network |
| MSRCR | Multi-Scale Retinex with Single Scale Component and Color Restoration |
| GWA | Grayscale World Assumption |
| VGG16 | Visual Geometry Group 16-layer |
| ResNeXt50 | Residual Networks with Aggregated Transformations 50-layer |
| RPN | Region Proposal Network |
| CBAM | Convolutional Block Attention Module |
| IoU | Intersection over Union |
| NMS | Non-Maximum Suppression |
| AP | Average Precision |
| mAP | mean Average Precision |
| FPS | Frames Per Second |
| K-fold | K-Fold Cross-Validation |
| PSNR | Peak Signal-to-Noise Ratio |
| FR | Full-Reference |
Figure 1 General flowchart of marine fouling organism detection.
Figure 2 Data acquisition platform. (a) AUV structure diagram, (b) schematic diagram of attachment picture acquisition.
Figure 3 Comparison of underwater image enhancement. (a) Original image, (b) Color correction image, (c) Brightness enhanced image.
Figure 4 Comparison of histogram distributions for RGB channel pixels in enhanced images. (a) Original image histogram, (b) Color correction histogram, (c) Brightness enhanced histogram. Note: the vertical axis (Frequency) represents the number of pixels with a given pixel value.
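For readers reproducing plots of the kind shown in Figure 4, per-channel histograms can be obtained as follows (a minimal OpenCV sketch; the file name is a placeholder):

```python
import cv2

img = cv2.imread("enhanced_frame.png")            # placeholder path; OpenCV loads BGR
for i, channel in enumerate(("B", "G", "R")):
    hist = cv2.calcHist([img], [i], None, [256], [0, 256])
    # hist[v] is the number of pixels whose value in this channel equals v,
    # i.e. the "Frequency" plotted on the vertical axis of Figure 4.
    print(channel, "peak at value", int(hist.argmax()))
```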
Figure 5 (a) Faster R-CNN detection model, (b) ResNeXt50 backbone network model. Note: Fully connected 1 (cls): The first fully connected layer, used for classification (cls), outputting category probabilities; Fully connected 2 (box): The second fully connected layer, used for bounding box regression (box), outputting the coordinates of the bounding box; Max Pool 3 × 3: Followed by a 3 × 3 max pooling layer; Global avg pool: Global average pooling, which compresses each channel of the feature map into a single value.
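The note to Figure 5 describes two parallel fully connected heads, one for classification and one for bounding-box regression. A minimal PyTorch sketch of such a two-branch head is given below; the feature dimension and class count are illustrative assumptions, not values taken from the paper:

```python
import torch.nn as nn

class DetectionHead(nn.Module):
    """Sketch of the 'Fully connected 1 (cls)' / 'Fully connected 2 (box)' branches
    described in the note to Figure 5 (sizes are illustrative)."""

    def __init__(self, in_features=2048, num_classes=2):
        super().__init__()
        self.cls_fc = nn.Linear(in_features, num_classes)      # category scores (softmax applied later)
        self.box_fc = nn.Linear(in_features, num_classes * 4)  # per-class bounding-box offsets

    def forward(self, roi_feats):            # roi_feats: (num_rois, in_features)
        return self.cls_fc(roi_feats), self.box_fc(roi_feats)
```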
Figure 6 (a) Structure of residual network fused with CBAM, (b) Structure of CBAM. Note: Avg Pool: Average pooling operation, used to capture the average features across channels; Share MLP: Shared multi-layer perceptron (MLP), used to learn the correlations between channels; Hadamard product: Hadamard product (element-wise product), used to apply channel attention to the input feature map.
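Based on the components named in the note to Figure 6 (average pooling, a shared MLP, and a Hadamard product), the channel-attention branch of CBAM [28] can be sketched roughly as follows; the reduction ratio and the inclusion of a max-pooled descriptor follow the original CBAM paper and are assumptions here:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Rough sketch of the CBAM channel-attention branch outlined in Figure 6."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.shared_mlp = nn.Sequential(            # "Share MLP" in Figure 6
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):                           # x: (N, C, H, W)
        avg = torch.mean(x, dim=(2, 3))             # "Avg Pool": average over spatial dimensions
        mx = torch.amax(x, dim=(2, 3))              # max-pooled descriptor (as in the original CBAM)
        attn = torch.sigmoid(self.shared_mlp(avg) + self.shared_mlp(mx))
        return x * attn[:, :, None, None]           # "Hadamard product" with the input feature map
```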
Figure 7 Schematic diagram of anchor boxes in RPN.
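The schematic in Figure 7 concerns anchor boxes generated by the RPN at each feature-map location. The generic construction is sketched below; the base size, scales and aspect ratios are illustrative defaults, not the optimised values adopted in this study:

```python
import numpy as np

def make_anchors(base_size=16, scales=(8, 16, 32), ratios=(0.5, 1.0, 2.0)):
    """Generate (x1, y1, x2, y2) anchors centred at the origin for one location."""
    anchors = []
    for s in scales:
        for r in ratios:
            w = base_size * s / np.sqrt(r)   # width shrinks as the aspect ratio grows
            h = base_size * s * np.sqrt(r)   # height grows with the aspect ratio
            anchors.append([-w / 2, -h / 2, w / 2, h / 2])
    return np.array(anchors)

print(make_anchors().shape)  # (9, 4): 3 scales x 3 aspect ratios per location
```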
Figure 8 Comparison of AP curves for different folds in cross-validation.
Figure 9 Comparison of performance before and after model improvement. (a) Average Precision of Ablation Study Line Chart, (b) Line Chart Comparing Loss Rates in Ablation Study.
Figure 10 Comparison of video detection results between the improved detection model and the original model. (a,b) Original captured images, (c,d) images detected by the original detection model, (e,f) images detected by the improved detection model. Red solid rectangular boxes indicate targets detected by the model; black dashed oval boxes indicate targets missed by the model.
Figure 11 Comparison of AP curves for different IoU thresholds.
Figure 12 Comparison of underwater image effects enhanced by different algorithms.
Performance comparison of different modifications of the Faster R-CNN detection model.
| Method | Precision/% | Recall/% | AP/% | Weight size/MB |
|---|---|---|---|---|
| Original Faster R-CNN | 55.39 | 97.44 | 83.96 | 494.43 |
| Faster R-CNN + ResNeXt50 | 80.27 | 87.90 | 89.85 | 106.22 |
| Faster R-CNN + ResNeXt50 + CBAM | 78.86 | 87.49 | 92.04 | 115.84 |
| Faster R-CNN + ResNeXt50 + CBAM + Anchor | 88.17 | 82.46 | 93.14 | 115.97 |
Comparison of detection model performance for different IoU thresholds.
| IoU | Precision/% | Recall/% | AP/% | F1-Score/% |
|---|---|---|---|---|
| 0.3 | 88.66 | 82.56 | 87.33 | 85.50 |
| 0.4 | 90.44 | 83.28 | 93.33 | 86.71 |
| 0.5 | 88.17 | 82.46 | 93.14 | 85.21 |
| 0.6 | 80.20 | 88.40 | 94.27 | 84.09 |
| 0.7 | 68.80 | 81.95 | 85.23 | 74.79 |
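The F1 scores in the table above are the harmonic mean of precision and recall; for example, the IoU = 0.4 row can be checked as follows:

```python
p, r = 0.9044, 0.8328                 # precision and recall at an IoU threshold of 0.4
f1 = 2 * p * r / (p + r)
print(round(100 * f1, 2))             # 86.71, matching the table
```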
Comparison of detection model performance before and after underwater image enhancement.
| Method | Average PSNR/dB | Precision/% | Recall/% | AP/% |
|---|---|---|---|---|
| Original Image | 0.00 | 82.58 | 76.82 | 82.55 |
| MSRCR Image | 62.09 | 80.59 | 80.31 | 85.69 |
| SSR Image | 58.48 | 77.08 | 81.56 | 84.29 |
| MSR Image | 60.88 | 80.13 | 80.78 | 86.28 |
| HSV Image | 63.23 | 80.06 | 84.36 | 88.75 |
| GWA-MSRCR Image | 64.56 | 80.20 | 88.40 | 94.27 |
Note: "Original Image" denotes an image that has undergone no enhancement; "HSV Image" refers to an image enhanced using HSV Histogram Equalization; "GWA-MSRCR Image" refers to an image obtained by color correction using an algorithm based on the grayscale world assumption followed by brightness enhancement using the MSRCR algorithm, which is the enhancement approach adopted in this paper.
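The note above summarises GWA-MSRCR as gray-world color correction followed by MSRCR brightness enhancement. The color-correction step can be sketched as below; this is a generic gray-world implementation under stated assumptions (8-bit BGR input, placeholder file name), not the exact algorithm used in the paper:

```python
import cv2
import numpy as np

def gray_world_correction(img_bgr):
    """Gray-world assumption: scale each channel so its mean matches the global mean."""
    img = img_bgr.astype(np.float32)
    channel_means = img.reshape(-1, 3).mean(axis=0)          # mean of the B, G and R channels
    gains = channel_means.mean() / (channel_means + 1e-6)    # per-channel correction gains
    return np.clip(img * gains, 0, 255).astype(np.uint8)

original = cv2.imread("underwater_frame.png")                # placeholder path
corrected = gray_world_correction(original)
# The corrected image would then be passed to MSRCR for brightness enhancement;
# PSNR between two images can be computed with cv2.PSNR(img_a, img_b).
```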
1. Rossi, L.; Zoli, M.; Capoccioni, F.; Pulcini, D.; Martini, A.; Bacenetti, J. Insights into different marine aquaculture infrastructures from a life cycle perspective. Aquac. Eng.; 2024; 107, 102462. [DOI: https://dx.doi.org/10.1016/j.aquaeng.2024.102462]
2. Li, P.; Gong, F.; Qin, H.; Yu, S. Numerical and experimental study of two common types of fouled net panels in the ocean. Aquac. Eng.; 2024; 106, 102394. [DOI: https://dx.doi.org/10.1016/j.aquaeng.2024.102394]
3. Xing, W.; Huang, X.H.; Li, G.; Pang, G.; Yuan, T. Development of a dynamic simulation platform for deep-sea cage cleaning robot based on Gazebo. South China Fish. Sci.; 2024; 20, pp. 1-10.
4. Yuan, T.-P.; Huang, X.-H.; Hu, Y.; Wang, S.-M.; Tao, Q.-Y.; Pang, G.-L. Aquaculture net cleaning with cavitation improves biofouling removal. Ocean. Eng.; 2023; 285, 115241. [DOI: https://dx.doi.org/10.1016/j.oceaneng.2023.115241]
5. Liu, S.Y.; Huang, L.Q.; Fan, G.B. Research status and application of cage cleaning technology. Clean. World; 2021; 37, pp. 28–29+32.
6. Fitridge, I.; Dempster, T.; Guenther, J.; de Nys, R. The impact and control of biofouling in marine aquaculture: A review. Biofouling; 2012; 28, pp. 649-669. [DOI: https://dx.doi.org/10.1080/08927014.2012.700478]
7. Ramkumar, G.; Ayyadurai, M. An Effectual Underwater Image Enhancement using Deep Learning Algorithm. Proceedings of the 2021 5th International Conference on Intelligent Computing and Control Systems (ICICCS); Madurai, India, 6–8 May 2021; pp. 1507-1511. [DOI: https://dx.doi.org/10.1109/ICICCS51141.2021.9432116]
8. Biazi, V.; Marques, C. Industry 4.0-based smart systems in aquaculture: A comprehensive review. Aquac. Eng.; 2023; 103, 102360. [DOI: https://dx.doi.org/10.1016/j.aquaeng.2023.102360]
9. Wang, L.; Xu, X.; An, S.; Han, B.; Guo, Y. CodeUNet: Autonomous underwater vehicle real visual enhancement via underwater codebook priors. ISPRS J. Photogramm. Remote Sens.; 2024; 215, pp. 99-111. [DOI: https://dx.doi.org/10.1016/j.isprsjprs.2024.06.009]
10. Xiao, F.; Wang, H.; Li, Y.; Cao, Y.; Lv, X.; Xu, G. Object Detection and Recognition Techniques Based on Digital Image Processing and Traditional Machine Learning for Fruit and Vegetable Harvesting Robots: An Overview and Review. Agronomy; 2023; 13, 639. [DOI: https://dx.doi.org/10.3390/agronomy13030639]
11. Islam, J.; Xia, Y.; Sattar, J. Fast Underwater Image Enhancement for Improved Visual Perception. IEEE Robot. Autom. Lett.; 2020; 5, pp. 3227-3234. [DOI: https://dx.doi.org/10.1109/LRA.2020.2974710]
12. Li, R.; Tian, T.; Ji, F.F.; Wang, D.; He, Q.; Zhang, Q. Application and prospect of image processing technology in tomato growth monitoring. J. Chin. Agric. Mech.; 2025; 46, pp. 74-82. [DOI: https://dx.doi.org/10.13733/j.jcam.issn.2095-5553.2025.11.011]
13. Li, C.; Chen, P.; Ma, C.; Feng, H.; Wei, F.; Wang, Y.; Shi, J.; Cui, Y. Estimation of potato chlorophyll content using composite hyperspectral index parameters collected by an unmanned aerial vehicle. Int. J. Remote Sens.; 2020; 41, pp. 8176-8197. [DOI: https://dx.doi.org/10.1080/01431161.2020.1757779]
14. Xia, L.; Zhang, R.; Chen, L.; Huang, Y.; Xu, G.; Wen, Y.; Yi, T. Monitor cotton budding using SVM and UAV images. Appl. Sci.; 2019; 9, 4312. [DOI: https://dx.doi.org/10.3390/app9204312]
15. Alderdice, D.F.; Velsen, F.P.J. Relation between temperature and incubation time for eggs of Chinook salmon (Oncorhynchus tshawytscha). J. Fish. Res. Board Can.; 1978; 35, pp. 69-75. [DOI: https://dx.doi.org/10.1139/f78-010]
16. Wang, X.; Wu, Y.; Xiao, M.; Shi, Y. Research progress on intelligent recognition technology in aquaculture. J. South. China Agric. Univ.; 2023; 44, pp. 24-33. [DOI: https://dx.doi.org/10.7671/j.issn.1001-411X.202204013]
17. Banno, K.; Kaland, H.; Crescitelli, A.M.; Tuene, S.A.; Aas, G.H.; Gansel, L.C. A novel approach for wild fish monitoring at aquaculture sites: Wild fish presence analysis using computer vision. Aquacult Environ. Interact.; 2022; 14, pp. 97-112. [DOI: https://dx.doi.org/10.3354/aei00432]
18. Wan, G.; Yao, L. LMFRNet: A Lightweight Convolutional Neural Network Model for Image Analysis. Electronics; 2024; 13, 129. [DOI: https://dx.doi.org/10.3390/electronics13010129]
19. Qian, J.; Qian, L.; Pu, N. An Intelligent Early Warning System for Harmful Algal Blooms: Harnessing the Power of Big Data and Deep Learning. Environ. Sci. Technol.; 2024; 58, pp. 15607-15618. [DOI: https://dx.doi.org/10.1021/acs.est.3c03906]
20. Liao, W.; Zhang, S.; Wu, Y.; An, D.; Wei, Y. Research on intelligent damage detection of far-sea cage based on machine vision and deep learning. Aquac. Eng.; 2022; 96, 102219. [DOI: https://dx.doi.org/10.1016/j.aquaeng.2021.102219]
21. Yari, Y.; Næve, I.; Hammerdal, A.; Bergtun, P.H.; Måsøy, S.-E.; Voormolen, M.M.; Lovstakken, L. Automated Measurement of Ovary Development in Atlantic Salmon Using Deep Learning. Ultrasound Med. Biol.; 2024; 50, pp. 364-373. [DOI: https://dx.doi.org/10.1016/j.ultrasmedbio.2023.11.008]
22. Amer, S.A.; Rahman, A.N.A.; ElHady, M.; Osman, A.; Younis, E.M.; Abdel-Warith, A.-W.A.; Moustafa, A.A.; Khamis, T.; Davies, S.J.; Ibrahim, R.E. Use of moringa protein hydrolysate as a fishmeal replacer in diet of Oreochromis niloticus: Effects on growth, digestive enzymes, protein transporters and immune status. Aquaculture; 2024; 579, 740202. [DOI: https://dx.doi.org/10.1016/j.aquaculture.2023.740202]
23. Zhang, W.; Li, G.; Ying, Z. A new underwater image enhancing method via color correction and illumination adjustment. Proceedings of the 2017 IEEE Visual Communications and Image Processing (VCIP); St. Petersburg, FL, USA, 10–13 December 2017; pp. 1-4. [DOI: https://dx.doi.org/10.1109/VCIP.2017.8305027]
24. van de Weijer, J.; Gevers, T.; Gijsenij, A. Edge-Based Color Constancy. IEEE Trans. Image Process.; 2007; 16, pp. 2207-2214. [DOI: https://dx.doi.org/10.1109/TIP.2007.901808]
25. Panetta, K.; Gao, C.; Agaian, S. Human-visual-system-inspired underwater image quality measures. IEEE J. Ocean. Eng.; 2016; 41, pp. 541-551. [DOI: https://dx.doi.org/10.1109/JOE.2015.2469915]
26. Cheng, J.Y.; Chen, M.J.; Li, T.; Sun, Q.N.; Zhang, X.B.; Zhao, Y.Y.; Zhu, Y.H.; Gu, Q. Detection of peach trees in UAV remote sensing images based on improved Faster-R-CNN network. Acta Agric. Zhejiangensis; 2024; 36, pp. 1909-1919. [DOI: https://dx.doi.org/10.3969/j.issn.1004-1524.20230912]
27. Xie, S.; Girshick, R.; Dollár, P.; Tu, Z.; He, K. Aggregated residual transformations for deep neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR); Honolulu, HI, USA, 21–26 July 2017; pp. 1430-1438. [DOI: https://dx.doi.org/10.1109/CVPR.2017.634]
28. Woo, S.; Park, J.; Lee, J.Y.; Kwon, I. CBAM: Convolutional block attention module. Proceedings of the European Conference on Computer Vision (ECCV); Munich, Germany, 8–14 September 2018; pp. 3-19. [DOI: https://dx.doi.org/10.48550/arXiv.1807.06521]
29. Pan, Y.; Zhu, N.; Ding, L.; Li, X.; Goh, H.-H.; Han, C.; Zhang, M. Identification and Counting of Sugarcane Seedlings in the Field Using Improved Faster R-CNN. Remote Sens.; 2022; 14, 5846. [DOI: https://dx.doi.org/10.3390/rs14225846]
30. Saleem, M.H.; Potgieter, J.; Arif, K.M. Weed Detection by Faster RCNN Model: An Enhanced Anchor Box Approach. Agronomy; 2022; 12, 1580. [DOI: https://dx.doi.org/10.3390/agronomy12071580]
31. Yan, H.; Chen, C.; Jin, G.; Zhang, J.; Wang, X.; Zhu, D. Implementation of a Modified Faster R-CNN for Target Detection Technology of Coastal Defense Radar. Remote Sens.; 2021; 13, 1703. [DOI: https://dx.doi.org/10.3390/rs13091703]
32. Leinonen, T.; Wong, D.; Vasankari, A.; Wahab, A.; Nadarajah, R.; Kaisti, M.; Airola, A. Empirical investigation of multi-source cross-validation in clinical ECG classification. Comput. Biol. Med.; 2024; 183, 109271. [DOI: https://dx.doi.org/10.1016/j.compbiomed.2024.109271]
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).