Rice is a staple food crop worldwide, and the stability of rice production strongly affects daily life as well as national food security (Yang et al., 2013). Rice yield measurements can be used to analyze the effects of different fertilizers (Chen et al., 2013; Pu et al., 2011) and of planting and cultivation patterns on rice growth (Guilong & Zhang, 2018). Rice yield is usually characterized by the number of rice panicles. A rice panicle is composed of grains and branches, wherein each grain consists of the chaff and the rice kernel. Rice yield traits include the number of grains per panicle, the number of plump grains per panicle, the panicle length, and the seed setting rate (the ratio of the number of plump grains to the number of grains per panicle). The number of grains per panicle and the seed setting rate directly reflect rice yield (Gong et al., 2018) and are thus generally considered the two most important rice yield traits (Oosterom & Hammer, 2008).
The number of grains per panicle was originally counted manually. This process is tedious, time-consuming, inefficient, and error-prone for large numbers of samples. In addition, the structure of rice panicles may be damaged during sampling, so accurate characterization is not guaranteed. Machine vision can be used to analyze rice panicle shapes (Duan, Yang, Bi, et al., 2011; Duan et al., 2011), and automatic counting is more efficient and convenient than manual counting. The Panicle Trait Phenotyping tool (P-TRAP) can be used to process and quantify several traits related to the rice panicle structure, detect and count grains, and determine grain shape parameters (Al-Tam et al., 2013). Crowell et al. (2014) developed an open-source phenotyping platform that enables the simultaneous measurement of phenotypes with multiple structures and branches from an image. However, visual inspection requires that each grain on the panicle be manually spread out to prevent overlap. Moreover, vision imaging captures only the grain surface and cannot identify blighted grains, precluding calculation of the seed setting rate.
The continuous development of nondestructive testing technology has led to X-ray computed tomography (CT) imaging (Yang et al., 2020). Automated, nondestructive phenotyping of rice tiller traits at high spatial resolution and high throughput is urgently needed for the large-scale assessment of rice accessions (Di et al., 2019). Hu et al. proposed an image analysis method to extract 3-D grain traits based on X-ray CT (Weijuan et al., 2020), and Su et al. used X-ray CT to characterize the morphology of rice panicles. Yang et al. used a CT system to directly scan potted rice plants on an industrial conveyor belt; a filtered back projection algorithm was used to reconstruct cross-sectional images of the rice stalks, and image segmentation was used to automatically extract the number of tillers (Yang et al., 2011). Subsequently, a CT reconstruction method with a reduced computational load was developed for rice tiller measurement (Jiang et al., 2012). Hughes et al. used micro-CT to acquire wheat CT images, which were then morphologically processed to accurately extract and characterize traits (Hughes et al., 2017). Although current applications of X-ray CT for characterizing rice panicle traits are mature, a quantitative analysis of the rice panicle seed setting rate requires the segmentation and three-dimensional (3-D) recognition of the acquired rice panicle CT images.
Existing CT image segmentation methods (e.g., threshold-, region-, and edge-based methods) (Jung & Kim, 2010; Jung et al., 2010; Plissiti et al., 2010a, 2010b, 2011) are ineffective for rice grain CT images because of various problems, such as grain occlusion, nonseparation of grains from a panicle (by connecting branches), and similarities between the external features of plump and blighted grains. Thus, it is very difficult to distinguish plump grains from blighted grains in 3-D, which is a prerequisite for determining the seed setting rate. However, the subtle shape difference between the two grain types can be identified using deep-learning-based segmentation and classification, which has already been extensively applied to CT image segmentation. Liu et al. proposed a region-based convolutional neural network (R-CNN) to directly identify arterial calcification, thereby obviating artery extraction (Liu & Yao, 2017). Wu et al. proposed an accurate threshold-based segmentation method in conjunction with Faster R-CNN to detect defects in industrial CT images (Xiaoyuan et al., 2019). Armato et al. pioneered the use of deep learning to detect vertebral compression fractures in chest and abdominal CT scans (Bar et al., 2017). Tang et al. used neural networks to extract features from diagnostic CT images to analyze bone states (Tang et al., 2019). In plant phenotyping, Wu et al. used deep learning to develop a high-throughput micro-CT image analysis pipeline (Di et al., 2021).
In this study, we applied deep-learning-based semantic segmentation to CT images of rice panicles. The segmented and masked image sequence was stacked into 3-D data, and a 3-D recognition algorithm was used to count plump and blighted grains separately. The results were used to calculate the rice panicle seed setting rate. This paper consists of three parts: (a) segmentation of rice grains in rice panicle CT images by using the Mask R-CNN algorithm to obtain masks with different values; (b) development of a 3-D recognition algorithm for the segmented 3-D data to count plump and blighted grains; and (c) experimental verification and analysis.
The X-ray CT system used in this study, manufactured by YXLON, is shown in Figure 1. CT scanning and reconstruction were used to generate a rice panicle image sequence. Figure 2 shows images from various layers of the sequence.
FIGURE 1. System architecture diagram: the yellow box on the left shows the detector, the yellow box on the right shows the X-ray source, and the red box in the center shows the rice panicle, which can be rotated through 360° during image data acquisition
Figure 2 shows grains attached to the panicle by branches, adhesion and overlap between grains, and blighted grains. An accurate determination of the number of grains per panicle and the seed setting rate requires the extraction of individual rice grains, followed by 3-D counting. However, the grains in a rice CT image overlap each other substantially; therefore, segmentation and recognition are first performed on each single CT image, and target identification across the CT sequence is then used to distinguish the different grains.
Grain identification based on deep learning requires a sufficient amount of CT sequence data. Therefore, multiple rice plants were scanned with the X-ray CT system (Figure 1) to obtain CT sequences, and plump and blighted grains were marked in the CT images (Figure 2). As shown in Figure 2, grains appear in two forms: high-brightness regions and regions with a low-gray contour. During the marking process, plump grains were therefore defined as the highlighted gray areas and blighted grains as the areas with a low-gray contour. From the image sequences reconstructed from the CT scan projections, 300 unprocessed CT images from 10 strains of rice were selected as the data set, with an image size of 512 × 141 pixels. According to the gray-level characteristics of plump and blighted grains, every image was marked manually to establish the classification labels and the corresponding segmentation mask labels. Of this data set, 80% was used as the training set and 20% as the validation set.
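As a minimal illustration of this split, the following Python sketch divides the annotated slice files into 80% training and 20% validation subsets; the folder name and file layout are assumptions for illustration, not details taken from the paper.

```python
import random
from pathlib import Path

# Hypothetical layout: one PNG slice (plus its annotation) per CT layer.
image_dir = Path("ct_slices")              # assumed folder name
images = sorted(image_dir.glob("*.png"))   # 300 annotated slices in this study

random.seed(42)                            # fixed seed for a reproducible split
random.shuffle(images)

split = int(0.8 * len(images))             # 80% training, 20% validation
train_set, val_set = images[:split], images[split:]
print(f"training: {len(train_set)} images, validation: {len(val_set)} images")
```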
The image annotation tool Labelme was used to annotate the CT image sequences and generate mask images of the rice grains. As shown in Figure 3, these masked CT images are used to compute the loss during model training and to optimize the model parameters. The plump and blighted grain regions of each CT image are labeled, and the remaining regions default to the background.
FIGURE 3. Masked CT images for deep learning: (left) CT image; (middle) masked CT image; (right) image labels
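Labelme stores each annotated image as a JSON file containing labeled polygons. The following sketch, provided as an illustration rather than the exact preprocessing pipeline of the paper, rasterizes those polygons into per-instance binary masks; the class names and the 512 × 141 pixel image size are assumptions based on the description above.

```python
import json
import numpy as np
from PIL import Image, ImageDraw

def labelme_to_masks(json_path, width=512, height=141):
    """Rasterize Labelme polygon annotations into per-instance binary masks.

    Returns an (N, H, W) boolean array (one mask per grain) and the class name
    of each instance (e.g., "plump" or "blighted"; the names are assumed here).
    """
    with open(json_path) as f:
        annotation = json.load(f)

    masks, classes = [], []
    for shape in annotation["shapes"]:           # one polygon per grain
        canvas = Image.new("L", (width, height), 0)
        polygon = [tuple(point) for point in shape["points"]]
        ImageDraw.Draw(canvas).polygon(polygon, outline=1, fill=1)
        masks.append(np.array(canvas, dtype=bool))
        classes.append(shape["label"])
    return np.stack(masks), classes
```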
Mask R-CNN adds a mask branch to the Faster R-CNN model (Ren et al., 2015) to create a mask for each region of interest. The mask branch is parallel to the existing classification and bounding-box regression branches; that is, a small fully convolutional network is applied to each region of interest. The model framework is shown in Figure 4. During network implementation, feature extraction in the proposed Mask R-CNN is designed according to the characteristics of the rice CT images.
Because the rice CT data constitute a small training set, model training is carried out by fine-tuning a model pre-trained on the COCO data set.
A typical rice CT image contains grains with different sizes and traits (and therefore features), as well as random and complex grain overlapping. The features extracted from both the shallow and deep layers are combined using a feature pyramid network (FPN), wherein feature mapping is employed to improve recognition. Additionally, ResNet‐50 is used as the backbone network in conjunction with the FPN network for feature extraction.
For the loss function, the training loss of Mask R-CNN is mainly composed of two parts: the training loss of the region proposal network (RPN) and the training loss of the multibranch prediction network. The total training loss $L_{\mathrm{final}}$ is calculated as
$$L_{\mathrm{final}} = L_{\mathrm{RPN}} + L_{\mathrm{Mul\text{-}Branch}},\tag{1}$$
where $L_{\mathrm{RPN}}$ comprises the anchor classification loss (softmax loss) and the rectangular-box regression loss (smooth L1 loss) and is calculated as
$$L_{\mathrm{RPN}} = \frac{1}{N_{\mathrm{cls}}}\sum_{i} L_{\mathrm{cls}}(p_i, p_i^{*}) + \lambda_{1}\frac{1}{N_{\mathrm{reg}}}\sum_{i} p_i^{*} L_{\mathrm{reg}}(t_i, t_i^{*}),\tag{2}$$
and $L_{\mathrm{Mul\text{-}Branch}}$ is the sum of the three branch losses (softmax loss, smooth L1 loss, and mask loss) of the multitask prediction network:
$$L_{\mathrm{Mul\text{-}Branch}} = \frac{1}{N_{\mathrm{cls}}}\sum_{i} L_{\mathrm{cls}}(p_i, p_i^{*}) + \lambda_{2}\frac{1}{N_{\mathrm{reg}}}\sum_{i} p_i^{*} L_{\mathrm{reg}}(t_i, t_i^{*}) + \gamma\frac{1}{N_{\mathrm{mask}}}\sum_{i} L_{\mathrm{mask}}(s_i, s_i^{*}).\tag{3}$$
In Equations (2) and (3), the constants $N_{\mathrm{cls}}$, $N_{\mathrm{reg}}$, and $N_{\mathrm{mask}}$ represent the numbers of corresponding anchors or rectangular boxes, and the hyperparameters $\lambda_{1}$, $\lambda_{2}$, and $\gamma$ balance the rectangular-box regression loss and the mask loss. The classification loss $L_{\mathrm{cls}}$, regression loss $L_{\mathrm{reg}}$, and mask loss $L_{\mathrm{mask}}$ are derived from the following formulas:
$$L_{\mathrm{cls}}(p_i, p_i^{*}) = -\left[p_i^{*}\log p_i + (1 - p_i^{*})\log(1 - p_i)\right],\tag{4}$$
$$L_{\mathrm{reg}}(t_i, t_i^{*}) = \mathrm{smooth}_{L1}(t_i - t_i^{*}),\qquad \mathrm{smooth}_{L1}(x) = \begin{cases}0.5x^{2}, & |x| < 1\\ |x| - 0.5, & \text{otherwise},\end{cases}\tag{5}$$
$$L_{\mathrm{mask}}(s_i, s_i^{*}) = -\frac{1}{m^{2}}\sum_{1 \le u,v \le m}\left[s_{i,uv}^{*}\log s_{i,uv} + (1 - s_{i,uv}^{*})\log(1 - s_{i,uv})\right],\tag{6}$$
where $p_i$ represents the classification probability of anchor $i$, $p_i^{*}$ represents the ground-truth label of anchor $i$, and $m \times m$ is the resolution of the predicted mask. The vector $t_i$ represents the difference between the predicted rectangular box and the ground-truth box using four parameters: the horizontal and vertical coordinates of the box center and the width and height of the box. Similarly, $t_i^{*}$ represents the difference between the ground-truth box and the positive anchor, and $s_i$ and $s_i^{*}$ represent the binary mask matrices from the prediction and the ground-truth label, respectively.
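To make the loss terms concrete, the following NumPy sketch evaluates the components of Equations (4)–(6) on toy values; it illustrates the formulas only and is not the training code, and the example numbers are arbitrary.

```python
import numpy as np

def smooth_l1(t, t_star):
    """Smooth L1 loss of Equation (5), summed over the four box parameters."""
    d = np.abs(np.asarray(t, float) - np.asarray(t_star, float))
    return float(np.sum(np.where(d < 1.0, 0.5 * d ** 2, d - 0.5)))

def binary_ce(p, p_star, eps=1e-7):
    """Binary cross-entropy used for L_cls (Equation 4) and per pixel in L_mask."""
    p = np.clip(p, eps, 1.0 - eps)
    return -(p_star * np.log(p) + (1.0 - p_star) * np.log(1.0 - p))

# Toy example for one positive anchor (p_i* = 1) with predicted probability 0.9,
# predicted box offsets t_i versus ground-truth offsets t_i*, and a 28 x 28 mask.
l_cls = float(binary_ce(0.9, 1.0))
l_reg = smooth_l1([0.10, -0.20, 0.05, 0.30], [0.0, 0.0, 0.0, 0.0])
l_mask = float(binary_ce(np.full((28, 28), 0.8), np.ones((28, 28))).mean())
print(l_cls, l_reg, l_mask)
```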
Using Mask R-CNN, all rice grains are segmented layer by layer in the CT sequence, and mask correlation between layers is then used to achieve 3-D grain recognition. However, because the rice grains are stacked, adjacent mask areas may adhere to one another (Figure 5), which complicates 3-D grain identification.
The 3-D structural information of an intact grain spans multiple CT layers. Therefore, we propose a 3-D grain recognition method based on the minimum Euclidean distance between mask centers in adjacent CT layers. In the segmentation mask sequence, each mask area is labeled starting from the first layer, and its center coordinates are calculated. For each mask area in the next layer, the minimum Euclidean distance is used to determine whether the masks of the two adjacent layers belong to the same grain. The center coordinates of the ith mask area of the kth layer are denoted by $(x_i^{k}, y_i^{k})$, and the center coordinates of the jth mask area of the (k + 1)th layer are denoted by $(x_j^{(k+1)}, y_j^{(k+1)})$. The minimum-Euclidean-distance model is written as follows:
$$d_{\min} = \min_{j}\sqrt{\left(x_i^{k} - x_j^{(k+1)}\right)^{2} + \left(y_i^{k} - y_j^{(k+1)}\right)^{2}}.\tag{7}$$
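A minimal NumPy sketch of Equation (7) is given below; for one mask center in layer k, it returns the index and distance of the closest mask center in layer k + 1. The function and variable names are illustrative only.

```python
import numpy as np

def nearest_mask(center, next_layer_centers):
    """Minimizer of Equation (7): the next-layer mask center closest to `center`.

    `center` is the (x, y) center of one mask area in layer k, and
    `next_layer_centers` is an (M, 2) array of mask centers in layer k + 1.
    Returns the index j of the closest center and the corresponding distance.
    """
    centers = np.asarray(next_layer_centers, dtype=float)
    distances = np.sqrt(((centers - np.asarray(center, dtype=float)) ** 2).sum(axis=1))
    j = int(np.argmin(distances))
    return j, float(distances[j])
```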
The minimum obtained from Equation (7) indicates that the two mask areas belong to the same grain. As each layer is processed in turn, every complete grain is labeled. Of course, a single grain cannot occupy the entire CT sequence, and the number of labeled pixels for a grain varies from layer to layer. A complete grain is therefore determined from the change in the number of pixels between the labeled layers of the same grain. Based on Equation (7), the number of pixels in the matched mask areas of adjacent layers is counted. According to the segmentation and mask sequences, one grain occupies approximately 5–300 pixels per layer. Taking the minimum pixel count as a reference, if the pixel count of the current layer determined by Equation (7) falls to 10% of that of the previous layer, the labeled grain is considered complete. A new grain label is then started in the next layer, and the procedure is repeated until the entire CT sequence has been labeled. The process is detailed in Figure 6.
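The following sketch combines the Equation (7) matching step with the pixel-count completion rule to track grains through a stack of per-layer masks, using scikit-image to measure region centers and areas. The distance threshold, the exact form of the mask stack, and the handling of unmatched regions are assumptions made for illustration; the 10% area-drop rule follows the description above.

```python
import numpy as np
from skimage.measure import label, regionprops

def track_grains(mask_stack, area_drop=0.10, max_dist=20.0):
    """Greedy layer-by-layer grain tracking over a stack of 2-D mask images.

    `mask_stack` is a list of 2-D arrays (one per CT layer) in which nonzero
    pixels belong to segmented grains. Each returned track is a list of
    (layer_index, region) pairs belonging to one grain. `area_drop` and
    `max_dist` are assumed thresholds, not values reported in the paper.
    """
    open_tracks, closed_tracks = [], []
    for k, layer in enumerate(mask_stack):
        regions = regionprops(label(np.asarray(layer) > 0))
        unmatched = set(range(len(regions)))
        for track in list(open_tracks):
            _, prev = track[-1]
            # Equation (7): distance from the previous mask center to each
            # candidate mask center in the current layer.
            dists = [np.hypot(prev.centroid[0] - r.centroid[0],
                              prev.centroid[1] - r.centroid[1]) for r in regions]
            if not dists:
                closed_tracks.append(track); open_tracks.remove(track); continue
            j = int(np.argmin(dists))
            # Completion rule: close the grain when nothing is nearby or when the
            # matched area falls below 10% of the previous layer's area.
            if dists[j] > max_dist or regions[j].area < area_drop * prev.area:
                closed_tracks.append(track); open_tracks.remove(track)
            elif j in unmatched:
                track.append((k, regions[j])); unmatched.discard(j)
        for j in unmatched:                  # each leftover region starts a new grain
            open_tracks.append([(k, regions[j])])
    return closed_tracks + open_tracks       # number of tracks = number of grains
```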
The Mask R-CNN classification results label plump and blighted grains, and the 3-D recognition algorithm is then used to count each grain. Thus, the numbers of plump and blighted grains per panicle can be determined, from which the seed setting rate, that is, the proportion of plump grains among the total number of grains, is calculated.
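Once the two counts are available, the seed setting rate itself is a one-line calculation; the usage example below uses the counts of sample 1 in Table 2 (70 grains, 3 of them blighted).

```python
def seed_setting_rate(num_plump, num_blighted):
    """Seed setting rate = plump grains / total grains per panicle."""
    total = num_plump + num_blighted
    return num_plump / total if total else 0.0

print(f"{seed_setting_rate(67, 3):.1%}")   # sample 1 of Table 2 -> 95.7%
```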
Rice panicles are composed of grains and panicle branches, wherein the grains consist of chaff and rice kernels. To hold them in place, the rice panicles were enclosed in plastic tubes. For CT imaging, the high-resolution microfocus CT scanner (YXLON FF20) shown in Figure 1 was used to obtain the CT images of the rice panicles. The scanning parameters are listed in Table 1.
TABLE 1. Rice panicle CT imaging scan parameters
Projection size | 1,536 × 1,920 pixels
CT image size | 512 × 512 × 512 pixels
Physical distance between rice sample and X-ray source | 567 mm
Physical distance between X-ray source and detector | 781 mm
Source voltage | 80 kV
Source current | 50 μA
Field of view | 19.5 cm × 24.4 cm
Detector resolution | 127 μm
Spatial resolution of CT system | 10 μm
No. of projection acquisitions | 360
Projection acquisition interval | 1°
Scanning time per panicle | 126 s
Reconstruction time per panicle | 17 s
According to Table 1, a 3-D CT image (512 × 512 × 512 pixels) is obtained for every rice panicle. However, the grain segmentation network takes 2-D CT sequences as input, so the CT sequences of the longitudinal sections of the rice panicle are obtained by slicing the 3-D CT image longitudinally (Figure 7).
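A minimal sketch of this slicing step is shown below, assuming the reconstructed volume is already available as a NumPy array; the file name and the choice of slicing axis are assumptions for illustration.

```python
import numpy as np

# Assumed input: the reconstructed 512 x 512 x 512 CT volume stored as a NumPy file.
volume = np.load("rice_panicle_ct.npy")            # hypothetical file name

# Slice along one lateral axis to obtain longitudinal 2-D sections; each section
# is then cropped to the grain region and fed to the 2-D segmentation network.
longitudinal_slices = [volume[:, y, :] for y in range(volume.shape[1])]
print(longitudinal_slices[0].shape)                # (512, 512) per slice before cropping
```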
These CT sequences were used as training data. Here, the training data were obtained from 10 strains of rice with the CT imaging system. After the background was removed from the CT sequences, a total of 300 CT images (512 × 141 pixels each) remained as training data. For these sequences, as in Figure 3, every grain area was labeled as a plump grain (highlighted gray) or a blighted grain (low-gray contour), and the corresponding areas were masked with different colors. Thus, every image was annotated with the Labelme program to establish the classification labels and the corresponding mask labels for the training data.
The Keras deep learning platform was used to input the training data and the corresponding labels to the Mask R-CNN model. The network was trained at a learning rate of 0.001, with a weight decay of 0.0001 and a momentum of 0.9. Features were extracted using ResNet-50 in conjunction with the FPN. The RPN scanned the feature map of the previous layer directly in a sliding manner, such that the extracted features could be effectively reused. The position and size of each anchor were fine-tuned to obtain the final candidate boxes, which were then mapped to the feature map using RoIAlign and sent to the three branches. The loss function was used to compare the predictions with the labels, and the minibatch gradient descent algorithm was used to train the network with the objective of minimizing the loss.
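The paper does not name the specific Keras implementation used; the sketch below assumes the widely used matterport Mask_RCNN package and reproduces the reported hyperparameters (learning rate, weight decay, momentum, ResNet-50 backbone), while the batch size, number of epochs, steps per epoch, and file names are placeholders.

```python
from mrcnn.config import Config
from mrcnn import model as modellib

class RiceGrainConfig(Config):
    """Training configuration; values not reported in the paper are assumed."""
    NAME = "rice_grain"
    NUM_CLASSES = 1 + 2            # background + plump grain + blighted grain
    BACKBONE = "resnet50"          # ResNet-50 with an FPN feature extractor
    LEARNING_RATE = 0.001
    LEARNING_MOMENTUM = 0.9
    WEIGHT_DECAY = 0.0001
    IMAGES_PER_GPU = 2             # assumed; limited by GPU memory
    STEPS_PER_EPOCH = 120          # assumed; roughly training-set size / batch size

def train_rice_model(dataset_train, dataset_val,
                     coco_weights="mask_rcnn_coco.h5", log_dir="./logs"):
    """Fine-tune a COCO-pretrained Mask R-CNN on the annotated CT slices.

    `dataset_train` and `dataset_val` are mrcnn.utils.Dataset subclasses built
    from the Labelme annotations (their construction is not shown here).
    """
    config = RiceGrainConfig()
    model = modellib.MaskRCNN(mode="training", config=config, model_dir=log_dir)
    # Transfer learning: skip the head layers whose shapes depend on the class count.
    model.load_weights(coco_weights, by_name=True,
                       exclude=["mrcnn_class_logits", "mrcnn_bbox_fc",
                                "mrcnn_bbox", "mrcnn_mask"])
    model.train(dataset_train, dataset_val,
                learning_rate=config.LEARNING_RATE, epochs=30, layers="heads")
    return model
```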
After the training was completed, a set of rice panicle CT image sequences from one rice plant was input to the trained model as a test data set. The result is shown in Figure 8b, wherein the mask colors represent different numerical masks assigned to different instances. Each instance is identified by a bounding box, where the label on the box indicates the class of the instance ("full" denotes a plump grain, and "blighted" denotes a blighted grain), and the number is the recognition confidence, which is high in all cases. Then, 3-D recognition of the grains was performed by changing the output of the Mask R-CNN model to the mask output (Figure 8c). The image background value was set to 0, and the rice grain masks were assigned integers starting from 1. Identical mask values correspond to the same grain, whereas different grains have different mask values.
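The conversion from the detection output to an integer-valued mask image can be sketched as follows, again assuming the matterport API, whose detect() call returns a dictionary with 'masks' (an H × W × N boolean array), 'class_ids', and 'scores' for each input image.

```python
import numpy as np

def instance_label_image(model, image):
    """Run the trained Mask R-CNN on one CT slice and collapse its per-instance
    masks into a single integer label image (0 = background, 1..N = grains).
    """
    result = model.detect([image], verbose=0)[0]
    masks = result["masks"]                       # (H, W, N) boolean array
    label_img = np.zeros(image.shape[:2], dtype=np.int32)
    for n in range(masks.shape[-1]):
        label_img[masks[:, :, n]] = n + 1         # a distinct integer per grain
    return label_img, result["class_ids"], result["scores"]
```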
FIGURE 8. Mask R-CNN segmentation results: (a) original reconstructed rice panicle image; (b) segmentation identification result; and (c) mask image of the segmentation example
The 3‐D recognition of blighted grains was implemented by setting the output recognition result to the blighted grain mask and performing a 3‐D visualization. The result is shown in Figure 9, wherein plump and blighted grains are presented in green and red, respectively.
FIGURE 9. Results of 3-D recognition of blighted grains: (a) original CT image; (b) recognition map; (c) mask output; and (d) visualization of the 3-D recognition result
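One straightforward way to produce such a rendering is to scatter-plot the voxels of each labeled grain in 3-D with matplotlib, colored by class. The sketch below assumes a stacked integer mask volume and a grain-to-class mapping produced by the previous steps; the subsampling factor is arbitrary and only reduces the number of plotted points.

```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (needed on older matplotlib)

def show_grains_3d(label_volume, grain_classes):
    """Visualize recognized grains in 3-D: plump in green, blighted in red.

    `label_volume` is the stacked integer mask volume (0 = background) and
    `grain_classes` maps each grain label to "plump" or "blighted".
    """
    fig = plt.figure()
    ax = fig.add_subplot(111, projection="3d")
    for grain_id, grain_class in grain_classes.items():
        z, y, x = np.nonzero(label_volume == grain_id)
        ax.scatter(x[::20], y[::20], z[::20],                # subsample voxels
                   s=1, c="green" if grain_class == "plump" else "red")
    ax.set_xlabel("x"); ax.set_ylabel("y"); ax.set_zlabel("z")
    plt.show()
```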
The accuracy of the proposed method was then verified. The CT image sequences of eight rice panicles were selected, and each sequence was sent to the trained Mask R‐CNN model for testing. The method described in Section 2.3 was used to perform 3‐D visualization of the recognized plump and blighted grains. The results are shown in Figure 10, wherein plump and blighted grains are presented in green and red, respectively.
FIGURE 10. Verification results for eight rice panicle samples obtained using the method described in Section 4: (a–h) images of the rice panicle samples and (i–p) 3-D visualization results of blighted grain recognition
The 3-D visualization results in Figure 10 show that all eight rice panicles were completely reconstructed. In addition, after segmentation and 3-D identification, the plump and blighted grains are well distinguished by color. These results can then be used to determine the number of grains per panicle and the seed setting rate. Table 2 compares the results of the proposed method with the manually obtained results.
TABLE 2. Characterization of rice panicle traits by the proposed method
Sample label | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
No. of grains per panicle determined manually | 70 | 79 | 117 | 90 | 78 | 61 | 61 | 72 |
No. of grains per panicle determined using proposed algorithm | 70 | 79 | 114 | 89 | 77 | 61 | 61 | 71 |
No. of blighted grains per panicle determined manually | 3 | 10 | 44 | 42 | 5 | 1 | 1 | 8 |
No. of blighted grains per panicle determined using proposed algorithm | 3 | 10 | 41 | 41 | 6 | 1 | 1 | 8 |
Seed setting rate determined manually, % | 95.7 | 87.3 | 62.4 | 53.4 | 93.5 | 98.3 | 98.3 | 88.8 |
Seed setting rate determined using the proposed algorithm, % | 95.7 | 87.3 | 64.0 | 53.9 | 92.2 | 98.3 | 98.3 | 88.7 |
Accuracy of the grain number per panicle, % | 100 | 100 | 97.4 | 98.9 | 98.7 | 100 | 100 | 98.6 |
Accuracy of the seed setting rate, % | 100 | 100 | 97.4 | 99.1 | 98.6 | 100 | 100 | 99.9 |
From Table 2, the average accuracy is 99.2% for the grain number per panicle and 99.4% for the seed setting rate, indicating the high accuracy and stability of the proposed method. The deviations in a few cases can be attributed to identification errors for some small blighted grains during testing of the Mask R-CNN model. Unlike partially blighted grains, empty blighted grains may be recognized as panicle branches and are therefore difficult to identify. This issue will be addressed in a future study by increasing the proportion of blighted grains in the training data, thereby improving model training and testing accuracy.
In this study, the rice panicle seed setting rate was determined by developing a feature extraction method based on Mask R-CNN together with a 3-D recognition method. Microfocus X-ray CT scanning was used to obtain CT sequences of rice panicles. Then, 3-D feature extraction was performed by using Mask R-CNN to segment the sequence and generate masks, and the Euclidean distance between the mask centers of adjacent layers was minimized to associate the masks belonging to the same grain. The Mask R-CNN classification result was used to identify plump and blighted grains, and the seed setting rate of the rice panicles was subsequently calculated. The proposed method was experimentally verified using eight sets of different rice panicles. The results showed that the proposed method achieves an accuracy of not less than 99% for the seed setting rate. This nondestructive and efficient method provides reliable data for the morphological characterization of rice panicles. In future collaborations with rice research institutions, we will apply the proposed method to CT data acquired from growing or mature pot-planted rice plants to analyze rice panicle shapes.
© 2021. This work is published under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Abstract
The rice panicle seed setting rate is extremely important for calculating rice yield and performing genetic analysis. Unlike machine vision, X‐ray computed tomography (CT) imaging is a nondestructive technique that provides direct information on the internal and external structure of rice panicles. However, occlusion and adhesion of panicles and grains in a CT image sequence make these objects difficult to identify, which in turn hinders accurate determination of the seed setting rate of rice panicles. Therefore, this paper proposes a method based on a mask region convolutional neural network (Mask R‐CNN) for feature extraction and three‐dimensional (3‐D) recognition of CT images of rice panicles. X‐ray CT feature characterization was combined with the Mask R‐CNN algorithm to perform feature extraction and classification of a panicle and grains in each layer of the CT sequence. The Euclidean distance between adjacent layers was minimized to extract the features of a 3‐D panicle and grains. The results were used to calculate the rice panicle seed setting rate. The proposed method was experimentally verified using eight sets of different rice panicles. The results showed that the proposed method can efficiently identify and count plump grains and blighted grains to achieve an accuracy above 99% for the seed setting rate.