Introduction
The potato is a widely grown food and cash crop, valued for its drought resistance, versatility, and long industrial chain, and is considered one of the most promising cash crops of the twenty-first century. Potatoes are currently the fourth largest food crop in China. Although China's potato planting area and annual output rank first globally, the adoption of potato combine harvesters faces significant challenges. A major contributing factor is the severe skin damage that harvesters inflict on potatoes, which directly impairs subsequent transport, storage, and marketing and leads to substantial economic losses [1]. Early detection of potato damage during harvesting is therefore crucial for making timely adjustments that minimize losses.
Traditional machine learning was initially applied to agricultural inspection. Early studies used feature extraction algorithms based on color, texture, and geometric shape to extract crop information, which was then fed into classifiers for further processing [2]. However, these methods required significant adaptation when applied to different detection problems [3], and variations in environmental factors such as illumination during harvesting further limited their applicability [4]. To address these problems, object detection methods based on deep learning have been widely adopted. Compared with traditional machine learning, deep learning can automatically learn features from large amounts of data, achieving higher detection accuracy and faster detection speeds [5]. In recent years, a variety of deep learning-based algorithms have emerged for crop object detection. Among them, YOLO has developed rapidly owing to its efficiency and accuracy, and has become the first choice of many researchers and applications [6].
Bazame et al. [7] used the YOLOv3-tiny network to detect coffee ripeness during harvesting, achieving a mAP of 84%; however, the detection accuracy for overripe coffee beans was only 80%. Nasirahmadi et al. [8] used YOLOv4 to assess mechanical damage in beets during harvesting, reaching a detection accuracy of 94%, but the system's frame rate of 29 frames per second fell short of the real-time requirements of harvesting. Wu et al. [9] applied YOLOv5 to identify and locate grapevine stems, achieving an accuracy of 90.2%, although the frame rate was limited to just 7.7 frames per second. In a comparative study, Li et al. [10]...
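The mAP values reported in these studies are built on the intersection-over-union (IoU) between predicted and ground-truth bounding boxes: a detection counts as correct only if its IoU with a ground-truth box exceeds a threshold. As a minimal sketch (assuming the common (x1, y1, x2, y2) corner format for boxes; none of the cited works specify their box representation), the IoU computation looks like:

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes in (x1, y1, x2, y2) format."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Two 10x10 boxes overlapping in a 5x5 region: IoU = 25 / 175
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))
```

Averaging precision over recall levels at a fixed IoU threshold gives the per-class AP, and the mean over classes gives the mAP figures quoted above.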