1. Introduction
With the development of technology, many labor-intensive orchard tasks traditionally performed by humans can be taken over by agricultural robots [1–8]. However, developing such robots to operate safely and reliably in a real orchard is still a challenge. One major challenge for an autonomous agricultural robot in an orchard is row following. Recently, vision sensors have been widely used in agricultural robot navigation because of their low cost, high efficiency, and ability to provide rich information [9–17]. In our previous study, a row-following system based on traditional machine vision was designed for an apple orchard, in which navigation was divided into multiple subtasks, such as image binarization, boundary detection, guidance path generation, coordinate transformation, and low-level motor control [18]. Each subtask was processed independently, and their outputs were integrated into the final control decision. Our previous tests showed that this traditional design makes each module easy to adjust, optimize, and troubleshoot. However, system complexity also grew as more and more modules were added to improve navigation. In addition, once the environment changed, for example, in light intensity or through the appearance of deep shadows, the traditional navigation system had to be readjusted to pick out the important features consistently. By contrast, deep learning has the potential to learn many of the complex perception tasks of mobile robot navigation [19, 20]. This study therefore developed a deep network that maps directly from pixels to actuation based on the end-to-end learning scheme. As Figure 1 shows, the traditional navigation subtasks were replaced by a specially designed deep network, which reduces manual programming and simplifies the system.
[figure omitted; refer to PDF]
The ALVINN (autonomous land vehicle in a neural network) study was the first to demonstrate that end-to-end learning is feasible for autonomous driving. However, limited by the computing power of the time, the ALVINN system used a single-hidden-layer backpropagation network and a small number of training samples, which prevented it from being applied in more complex environments [21]. Recently, with the development of deep learning hardware and theory, many breakthroughs have been made in autonomous driving based on end-to-end learning. NVIDIA developed a self-driving car system based on the DAVE-2 network. The network was trained on human driving; samples were captured simultaneously by three cameras at different angles. Test results showed that, with less than a hundred hours of training data, the car could drive autonomously on highways, local roads, and in residential neighborhoods under sunny, cloudy, and rainy conditions [22, 23]. In contrast to the NVIDIA system, Mobileye divided the self-driving task into three parts: perception, high-precision mapping, and driving decision, each designed as an independent network. Supervised learning and direct optimization were applied in a recurrent neural network to solve the long-term planning problem. Results showed that, by incorporating adversarial elements into the environment, the designed system could learn robust policies [24]. Some studies also incorporated large and diverse datasets of driving information, such as steering, braking, and speed, into the end-to-end learning component to achieve an optimal driving strategy. Such a system was developed by Comma.ai, with the expectation that both speed and steering direction could be predicted intelligently under different driving conditions [25].
Most current research on self-driving systems using deep learning focuses on road vehicles [26–33], while studies on outdoor robot navigation remain relatively few. Muller et al. developed a vision-based obstacle avoidance system using deep networks for an outdoor robot. During training, the robot was driven by remote control over different terrains, obstacles, and lighting conditions. Test results showed that the robot exhibited an excellent ability to detect obstacles and navigate around them at speeds of 2 m/s [34]. Hwu et al. applied the IBM NS1e board, which contains the IBM Neurosynaptic System (IBM TrueNorth chip), to a robotic platform to speed up CNN (convolutional neural network) processing. Results showed that the designed CNN-based self-driving system enabled the robot to follow a mountain path with low power consumption [35]. Orchard environments are much more complicated than road environments: they have uneven ground surfaces, a diversity of trees, and so on, which makes developing an autonomous navigation system based on deep learning challenging. However, studies in this field have not adequately addressed the issue. Bell et al. developed a monocular vision-based row-following system for pergola-structured orchards. A fully convolutional network was used to perform semantic segmentation of color images into an abstract class called "traversable space." Test results indicated that the designed system executed row following better than an existing 3D lidar navigation system. However, it still required traditional subtasks such as boundary detection and centerline fitting to generate the final steering decision [36].
This study focuses mainly on row following and the detection of row ends. A CNN for the row-following system was developed, consisting of five convolutional layers and one fully connected layer. Experiments were carried out, and the results were analyzed. The main novelties of this study are as follows:
(1) A tree row-following system that maps directly from pixels to actuation based on the end-to-end learning scheme was developed, which saves much hand programming and simplifies the system compared with traditional methods. In addition, the deep learning-based system has the potential to mitigate problems such as fluctuating light intensity and shadows in complex environments.
(2) A sample collection method for network training was also proposed, by which the robot could automatically drive and collect data without an operator or remote control. No hand labeling of training samples is required.
(3) Methods such as batch normalization, dropout, data augmentation, and 10-fold cross-validation were adopted to improve the network's generalization ability, and visualization analysis was carried out to give a clear understanding of the features learned by the network.
The remainder of this paper is divided as follows. Section 2 contains an overview of the vision-based navigation system, the design of the training data collection method, and the CNN network architecture. Section 3 details the results of the simulation and row-following test and discussion. Finally, Section 4 states the conclusion of this study.
2. Materials and Methods
2.1. System Overview
A crawler-type robot platform was used in this research. To keep the navigation system simple and relatively low-cost, a monocular camera (Imaging Source DFK 21AU04) with a frame rate of 30 Hz and an image resolution of 640 × 480 pixels was used for image acquisition. An industrial computer executed the high-level algorithms, and a microcontroller handled the low-level control operations. A dual-antenna GNSS (Global Navigation Satellite System) with a Trimble BD982 receiver was used to measure the driving trajectories. Figure 1 shows the schematic diagram of the tree row-following platform.
2.2. Sample Collection
Given a series of N images, each paired with the corresponding steering-command label, the network can be trained to learn a direct mapping from image to command.
For navigation based on deep learning, however, collecting a large number of training samples is a great challenge, because a mobile robot sometimes needs to be steered or controlled by an operator for several days or even weeks. To reduce this workload, a collection method based on GNSS path tracking was developed in this study. During the initial stage, to focus on the performance of the designed method and to reduce losses if the robot were to run into the trees, an artificial apple orchard environment was set up. A reference path was defined along the center of the tree rows, and a path tracking controller was designed; the details are presented in [37]. The sample collection process was carried out as follows. First, the robot executed straight-line path tracking at a speed of 0.3 m/s, and the image sequences were saved and categorized as "move forward." Second, the yaw angle of the camera was adjusted until only the left row could be seen in the camera's field of view; straight-line path tracking was executed again, and the image sequences were categorized as "turn right." "Turn left" samples were obtained using a similar process. Finally, images from the "move forward," "turn right," and "turn left" runs that showed the last row were extracted and categorized as "stop." Some samples are shown in Figure 2.
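Because the steering label is fixed by the camera configuration for an entire run, every frame captured during that run can be stored under the same label automatically. The following Python sketch illustrates one way such a recorder could be written, assuming an OpenCV-accessible camera; the folder layout, frame rate, and function name are illustrative choices, not the authors' implementation.

```python
import os
import time
import cv2  # assumes the camera is accessible through OpenCV


def collect_run(label, out_dir="dataset", duration_s=60, camera_index=0, fps=5):
    """Record one straight-line path-tracking run and store every frame under a
    folder named after the steering label ("move_forward", "turn_right", ...).
    The label is fixed for the whole run, so no per-image hand annotation is needed."""
    os.makedirs(os.path.join(out_dir, label), exist_ok=True)
    cap = cv2.VideoCapture(camera_index)
    t_end = time.time() + duration_s
    frame_id = 0
    while time.time() < t_end:
        ok, frame = cap.read()
        if not ok:
            continue
        path = os.path.join(out_dir, label, f"{label}_{frame_id:06d}.jpg")
        cv2.imwrite(path, frame)
        frame_id += 1
        time.sleep(1.0 / fps)  # throttle to the desired sampling rate
    cap.release()


# One call per camera configuration while the GNSS controller tracks the row center:
# collect_run("move_forward"); collect_run("turn_right"); collect_run("turn_left")
```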
[figures omitted; refer to PDF]
2.3. Network Configuration and Training
2.3.1. Network Architecture
The designed CNN consists of five convolutional layers and one fully connected layer. To speed up training and real-time navigation, each input image is first resized to 48 × 36. The output of the network is the expected steering command. As shown in Figure 3, the network mainly consists of convolutional layers, activation functions, pooling layers, a fully connected layer, and a SoftMax layer. The five convolutional layers perform feature extraction; each kernel of a convolutional layer corresponds to a feature map, which is further compressed by the pooling layers to extract key features. Finally, all features are combined in the fully connected layer, and the result is passed to the classifier to predict the steering command.
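For illustration, a minimal PyTorch sketch of a network with this shape is given below: five convolutional layers followed by a single fully connected layer over a 48 × 36 RGB input, with batch normalization and dropout as mentioned in Section 1. The channel counts, kernel sizes, and pooling positions are assumptions made for the sketch; the actual configuration is defined in Figure 3.

```python
import torch
import torch.nn as nn


class RowFollowNet(nn.Module):
    """Illustrative five-conv / one-FC network for a 48 x 36 RGB input.
    Channel counts, kernel sizes, and pooling positions are assumptions;
    the actual configuration is given in Figure 3 of the paper."""

    def __init__(self, num_classes=4):  # move forward, turn left, turn right, stop
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.MaxPool2d(2),   # 36x48 -> 18x24
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.MaxPool2d(2),   # 18x24 -> 9x12
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.MaxPool2d(2),   # 9x12 -> 4x6
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(p=0.5),
            nn.Linear(64 * 4 * 6, num_classes),  # the single fully connected layer
        )

    def forward(self, x):
        # x has shape (batch, 3, 36, 48): an RGB frame resized to 48 x 36 (width x height)
        return self.classifier(self.features(x))  # raw logits; softmax applied by the loss
```

A four-class output matches the steering commands used here ("move forward," "turn left," "turn right," and "stop"); during training, torch.nn.CrossEntropyLoss applies the softmax internally, so the forward pass returns raw logits.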
[figure omitted; refer to PDF]
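Ten-fold cross-validation was one of the techniques adopted to improve generalization. The sketch below shows one way it could be wired around a network of the shape above, using scikit-learn's KFold to generate the fold indices; the optimizer, learning rate, and epoch count are illustrative assumptions rather than the values used in the study.

```python
import numpy as np
import torch
from sklearn.model_selection import KFold
from torch.utils.data import DataLoader, Subset


def cross_validate(dataset, build_model, k=10, epochs=20, batch_size=64, lr=1e-3):
    """10-fold cross-validation wrapper; `dataset` yields (image_tensor, label) pairs
    and `build_model` returns a fresh network (e.g., the RowFollowNet sketched above)."""
    kfold = KFold(n_splits=k, shuffle=True, random_state=0)
    accuracies = []
    for train_idx, val_idx in kfold.split(np.arange(len(dataset))):
        model = build_model()
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        criterion = torch.nn.CrossEntropyLoss()
        train_loader = DataLoader(Subset(dataset, train_idx), batch_size=batch_size, shuffle=True)
        val_loader = DataLoader(Subset(dataset, val_idx), batch_size=batch_size)
        for _ in range(epochs):
            model.train()
            for x, y in train_loader:
                optimizer.zero_grad()
                loss = criterion(model(x), y)
                loss.backward()
                optimizer.step()
        # evaluate on the held-out fold
        model.eval()
        correct = total = 0
        with torch.no_grad():
            for x, y in val_loader:
                correct += (model(x).argmax(dim=1) == y).sum().item()
                total += y.numel()
        accuracies.append(correct / total)
    return sum(accuracies) / len(accuracies)  # mean validation accuracy over the folds
```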
As shown in Figure 10, the average lateral error was 0.14 m and the average heading error was 0.8° for traditional navigation, while the average lateral error was 0.29 m and the average heading error was 1.8° for end-to-end navigation. In general, traditional navigation performed better. However, when the robot drove along trees with large gaps, or when the trees swayed in the wind, traditional navigation became unstable, because the tree rows could not be fully extracted or were overextracted owing to fluctuations in light intensity. By contrast, the end-to-end row-following system largely avoided these issues, since different light intensities had been simulated by image augmentation during network training. Both the lateral and heading errors of the end-to-end method kept decreasing after each run, which means that performance improved after each retraining of the network. Figure 12 shows a case in which the end-to-end navigation system performed poorly: when the real-time captured image differed significantly from the training samples, prediction accuracy dropped. In future work, this problem can be mitigated by appropriately adding more steering types, such as a large turn or a series of small turns. Increasing the amount of training data would also likely increase prediction accuracy. Moreover, since ideal steering control is a continuous control problem, a regression model would describe the robot's movement more accurately. Other methods such as regularization, early stopping, and network pretraining could further improve generalization and should be tested in the future. Overall, even with a certain initial deviation, the robot could still return to the center of the row after a while.
[figures omitted; refer to PDF]
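The light-intensity augmentation mentioned above, together with the random shifting and rotation used in the study to mimic camera vibration, can be expressed compactly with torchvision transforms. The ranges below are illustrative assumptions, not the values used in the study, and the pipeline would be applied to training images only.

```python
from torchvision import transforms

# Assumed augmentation pipeline: brightness/contrast jitter to mimic changing light,
# small random shifts and rotations to mimic camera vibration, then resizing to the
# 48 x 36 network input. Validation images would only be resized and converted.
train_transform = transforms.Compose([
    transforms.ColorJitter(brightness=0.4, contrast=0.3),
    transforms.RandomAffine(degrees=5, translate=(0.05, 0.05)),
    transforms.Resize((36, 48)),   # (height, width)
    transforms.ToTensor(),
])
```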
4. Conclusions
A visual tree row-following system based on end-to-end learning for an agricultural robot in an apple orchard environment was developed in this paper. The input image was directly mapped to steering commands by the designed CNN. A data collection method without human driving or remote control was also proposed. The CNN network consisted of five convolutional layers and one fully connected layer. To improve the network generalization ability, techniques such as batch normalization, dropout, data augmentation, and 10-fold cross-validation were adopted in the study. Two types of row-following tests were carried out. Test results showed that the robot could adjust its posture according to different situations and drive through the tree row.
The tree row-following tests were carried out in a simplified landscape with obvious color contrast and shape structure as a preliminary study. With implements installed on the mobile robot, this research could be extended to different agricultural tasks such as planting, spraying, fertilizing, cultivating, harvesting, thinning, weeding, and inspection. In future work, more realistic elements of orchard navigation will be added:
(1) Leaves and weeds in a real apple orchard appear as noise in the captured images, which affects the accuracy of visual navigation. In this study, the input image was resized to a low resolution, which reduced noise while preserving the main regions of the tree rows. In future studies, additional complications will be introduced; e.g., samples under different weather conditions, with different trunk sizes and colors, canopy shadows, and trunks with branches may be collected and added to the training process to enhance the generalization of the designed model. Image preprocessing such as noise reduction should also be considered to increase the robustness of the navigation system.
(2) Keeping the camera steady during data collection is a great challenge for visual navigation. To simulate this effect, samples were randomly shifted and rotated during training to mimic camera vibration. In future studies, the camera pose, such as the yaw, pitch, and roll angles measured by an IMU (Inertial Measurement Unit), could be used to calibrate the captured images and further reduce the effect of camera vibration.
(3) A tracked mobile robot was adopted in this study because its tracks and multiple points of contact with the ground give it better maneuverability on rough terrain and higher friction in turns. Furthermore, sliding effects can be incorporated into an extended kinematic model to adapt the robot to different terrain conditions.
(4) In a real orchard, some apple trees, or even an entire tree row, may be missing because of the planting plan. This situation should also be considered when executing the sample collection process in future research.
(5) Adopting high-performance GPUs and embedded development platforms such as the NVIDIA Jetson TX2, which can process image and video information efficiently, will also be considered. With further development, the system can be combined with longer-term strategies, such as headland turning planning and control.
Acknowledgments
This work was funded by the Science and Technology R & D Projects in Key Fields of Guangdong Province (2019B020223003), the Guangdong Agricultural Technology Research and Development Project (2018LM2167), and the Guangdong Province Modern Agricultural Industrial Technology System Innovation Team Project (Guangdong Agricultural Letter (2019) no. 1019). The authors are grateful to the students Mo Dongyan, Zhao Yunxia, and Lin Guchen for their support during the experiments.
[1] J. Li, Y. Tang, X. Zou, G. Lin, H. Wang, "Detection of fruit-bearing branches and localization of litchi clusters for vision-based harvesting robots," IEEE Access, vol. 8, pp. 117746-117758, DOI: 10.1109/access.2020.3005386, 2020.
[2] H. Kang, H. Zhou, X. Wang, C. Chen, "Real-time fruit recognition and grasping estimation for robotic apple harvesting," Sensors, vol. 20 no. 19,DOI: 10.3390/s20195670, 2020.
[3] W. Zhang, L. Gong, S. Chen, W. Wang, Z. Miao, C. Liu, "Autonomous identification and positioning of trucks during collaborative forage harvesting," Sensors, vol. 21 no. 4,DOI: 10.3390/s21041166, 2021.
[4] J. Chen, Q. Hu, J. Wu, "Navigation path extraction for greenhouse cucumber-picking robots using the prediction-point Hough transform," Computers and Electronics in Agriculture, vol. 180,DOI: 10.1016/j.compag.2020.105911, 2021.
[5] L. Gong, X. Du, K. Zhu, "Pixel level segmentation of early-stage in-bag rice root for its architecture analysis," Computers and Electronics in Agriculture, vol. 186,DOI: 10.1016/j.compag.2021.106197, 2021.
[6] Y. Majeed, M. Karkee, Q. Zhang, "Development and performance evaluation of a machine vision system and an integrated prototype for automated green shoot thinning in vineyards," Journal of Field Robotics, vol. 38,DOI: 10.1002/rob.22013, 2021.
[7] Z. Song, Z. Zhou, W. Wang, "Canopy segmentation and wire reconstruction for kiwifruit robotic harvesting," Computers and Electronics in Agriculture, vol. 181,DOI: 10.1016/j.compag.2020.105933, 2021.
[8] R. Urban, M. Štroner, I. Kuric, "The use of onboard UAV GNSS navigation data for area and volume calculation," Acta Montanistica Slovaca, vol. 25, pp. 361-374, DOI: 10.46544/ams.v25i3.9, 2020.
[9] C. Wang, Y. Tang, X. Zou, L. Luo, X. Chen, "Recognition and matching of clustered mature litchi fruits using binocular charge-coupled device (CCD) color cameras," Sensors, vol. 17 no. 11,DOI: 10.3390/s17112564, 2017.
[10] M. Chen, Y. Tang, X. Zou, K. Huang, L. Li, Y. He, "High-accuracy multi-camera reconstruction enhanced by adaptive point cloud correction algorithm," Optics and Lasers in Engineering, vol. 122, pp. 170-183, DOI: 10.1016/j.optlaseng.2019.06.011, 2019.
[11] G. Lin, Y. Tang, X. Zou, J. Cheng, J. Xiong, "Fruit detection in natural environment using partial shape matching and probabilistic hough transform," Precision Agriculture, vol. 21 no. 1, pp. 160-177, DOI: 10.1007/s11119-019-09662-w, 2020.
[12] H. Wang, L. Dong, H. Zhou, "YOLOv3-Litchi detection method of densely distributed litchi in large vision scenes," Mathematical Problems in Engineering, vol. 2021,DOI: 10.1155/2021/8883015, 2021.
[13] G. Lin, Y. Tang, X. Zou, "Three-dimensional reconstruction of guava fruits and branches using instance segmentation and geometry analysis," Computers and Electronics in Agriculture, vol. 184,DOI: 10.1016/j.compag.2021.106107, 2021.
[14] Z. Yang, L. Gong, C. Liu, "Efficient TCP calibration method for vision guided robots based on inherent constraints of target object," IEEE Access, vol. 9, pp. 8902-8911, DOI: 10.1109/access.2021.3049964, 2021.
[15] Y. Tang, M. Chen, C. Wang, L. Luo, J. Li, G. Lian, X. Zou, "Recognition and localization methods for vision-based fruit picking robots: a review," Frontiers of Plant Science, vol. 11,DOI: 10.3389/fpls.2020.00510, 2020.
[16] M. Chen, Y. Tang, X. Zou, "3D global mapping of large-scale unstructured orchard integrating eye-in-hand stereo vision and SLAM," Computers and Electronics in Agriculture, vol. 187,DOI: 10.1016/j.compag.2021.106237, 2021.
[17] M. Sága, V. Bulej, N. Čuboňova, I. Kuric, "Case study: performance analysis and development of robotized screwing application with integrated vision sensing system for automotive industry," International Journal of Advanced Robotic Systems, vol. 17,DOI: 10.1177/1729881420923997, 2020.
[18] P. Huang, Z. Zhang, X. Luo, "Monocular visual navigation based on scene model of differential-drive robot in corridor-like orchard environments," International Agricultural Engineering Journal, vol. 28 no. 1, pp. 310-316, 2019.
[19] G. E. Hinton, S. Osindero, "A fast learning algorithm for deep belief nets," Neural Computation, vol. 18 no. 7, pp. 1527-1554, DOI: 10.1162/neco.2006.18.7.1527, 2006.
[20] Y. LeCun, Y. Bengio, G. Hinton, "Deep learning," Nature, vol. 521 no. 7553, pp. 436-444, DOI: 10.1038/nature14539, 2015.
[21] D. A. Pomerleau, "Alvinn: an autonomous land vehicle in a neural network," Advances in Neural Information Processing Systems, pp. 305-313, 1989.
[22] M. Bojarski, D. Del Testa, D. Dworakowski, "End to end learning for self-driving cars," 2016. https://arxiv.org/abs/1604.07316
[23] M. Bojarski, P. Yeres, A. Choromanska, K. Choromanski, "Explaining how a deep neural network trained with end-to-end learning steers a car," 2017. https://arxiv.org/abs/1704.07911
[24] S. Shalev-Shwartz, N. Ben-Zrihem, A. Cohen, "Long-term planning by short-term prediction," 2016. https://arxiv.org/abs/1602.01580
[25] E. Santana, G. Hotz, "Learning a driving simulator," 2016. https://arxiv.org/abs/1608.01230
[26] S. Grigorescu, B. Trasnea, T. Cocias, "A survey of deep learning techniques for autonomous driving," Journal of Field Robotics, vol. 37 no. 3, pp. 362-386, DOI: 10.1002/rob.21918, 2020.
[27] S. Chen, Y. Leng, S. Labi, "A deep learning algorithm for simulating autonomous driving considering prior knowledge and temporal information," Computer-Aided Civil and Infrastructure Engineering, vol. 35 no. 4, pp. 305-321, DOI: 10.1111/mice.12495, 2020.
[28] J. Hu, X. Zhang, S. Maybank, "Abnormal driving detection with normalized driving behavior data: a deep learning approach," IEEE Transactions on Vehicular Technology, vol. 69 no. 7, pp. 6943-6951, DOI: 10.1109/tvt.2020.2993247, 2020.
[29] M. Gjoreski, M. Ž Gams, M. Luštrek, "Machine learning and end-to-end deep learning for monitoring driver distractions from physiological and visual signals," IEEE Access, vol. 8, pp. 70590-70603, DOI: 10.1109/access.2020.2986810, 2020.
[30] J. Ni, Y. Chen, Y. Chen, "A survey on theories and applications for self-driving cars based on deep learning methods," Applied Sciences, vol. 10 no. 8,DOI: 10.3390/app10082749, 2020.
[31] D. Shin, H. Kim, K. Park, "Development of deep learning based human-centered threat assessment for application to automated driving vehicle," Applied Sciences, vol. 10 no. 1,DOI: 10.3390/app10062138, 2020.
[32] Z. Guo, Y. Huang, X. Hu, "A survey on deep learning based approaches for scene understanding in autonomous driving," Electronics, vol. 10 no. 4,DOI: 10.3390/electronics10040471, 2021.
[33] G. Li, Y. Yang, X. Qu, "A deep learning based image enhancement approach for autonomous driving at night," Knowledge-Based Systems, vol. 213,DOI: 10.1016/j.knosys.2020.106617, 2021.
[34] U. Muller, J. Ben, E. Cosatto, "Off-road obstacle avoidance through end-to-end learning," Advances in Neural Information Processing Systems 18, Proceedings of the 2005 Conference, pp. 739-746.
[35] T. Hwu, J. Isbell, N. Oros, "A self-driving robot using deep convolutional neural networks on neuromorphic hardware," 2016. https://arxiv.org/abs/1611.01235
[36] J. Bell, B. A. MacDonald, H. S. Ahn, "Row following in pergola structured orchards by a monocular camera using a fully convolutional neural network," Proceedings of the 2016 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 640-645.
[37] P. Huang, L. Zhu, Z. Zhang, "Row end detection and headland turning control for an autonomous banana-picking robot," Machines, vol. 9 no. 5,DOI: 10.3390/machines9050103, 2021.
[38] L. Ran, Y. Zhang, Q. Zhang, "Convolutional neural network-based robot navigation using uncalibrated spherical images," Sensors, vol. 17 no. 6, 2017.
[39] S. Ioffe, C. Szegedy, "Batch normalization: accelerating deep network training by reducing internal covariate shift," 2015. https://arxiv.org/abs/1502.03167
[40] N. Srivastava, G. Hinton, A. Krizhevsky, "Dropout: a simple way to prevent neural networks from overfitting," Journal of Machine Learning Research, vol. 15 no. 1, pp. 1929-1958, 2014.
Copyright © 2021 Peichen Huang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. https://creativecommons.org/licenses/by/4.0/
Abstract
A row-following system based on end-to-end learning for an agricultural robot in an apple orchard was developed in this study. Instead of dividing the navigation into multiple traditional subtasks, the designed end-to-end learning method maps images from the camera directly to driving commands, which reduces the complexity of the navigation system. A sample collection method for network training was also proposed, by which the robot could automatically drive and collect data without an operator or remote control. No hand labeling of training samples is required. To improve the network generalization, methods such as batch normalization, dropout, data augmentation, and 10-fold cross-validation were adopted. In addition, internal representations of the network were analyzed, and row-following tests were carried out. Test results showed that the visual navigation system based on end-to-end learning could guide the robot to adjust its posture according to different scenarios and pass successfully through the tree rows.
1 College of Automation, Zhongkai University of Agriculture and Engineering, Guangzhou 510225, China
2 College of Electro-mechanical Engineering, Zhongkai University of Agriculture and Engineering, Guangzhou 510225, China
3 Key Laboratory of Key Technology on Agricultural Machine and Equipment, Ministry of Education, South China Agricultural University, Guangzhou 510642, China