Banana (Musa L. spp.) is one of the most commonly cultivated crops worldwide and is the top agricultural commodity in many countries (Yeturu et al., 2016; Food and Agriculture Organization of the United Nations, 2017). However, banana production is severely affected by black leaf streak disease (BLSD, also known as black Sigatoka disease), which is caused by the fungal pathogen Pseudocercospora fijiensis. BLSD is considered the most widespread and damaging disease affecting bananas worldwide, causing plant necrosis in six symptomatic stages (Bakache et al., 2019).
BLSD is characterized by a biotrophic phase followed by a necrotrophic phase with visible symptoms. The disease affects the photosynthetic tissues of banana leaves and decreases chlorophyll production (Chaerle et al., 2007), resulting in changes to the structure of the leaves. The first symptoms of the disease are small dark spots on the underside of the leaf that develop to form fine brown lines 2–3 mm long, which are also visible on the adaxial surface of the infected leaves. As the disease progresses, the stripes join together and gradually turn black, showing the first signs of necrosis. The dead zones of the leaves then dry out, causing defoliation and the early maturation of the fruit. The presymptomatic biotrophic phase can last for several weeks, and by the time symptoms are visible the banana plants are irreversibly affected and the disease has already spread (Marin et al., 2003), potentially inducing production losses of up to 85% (Luna‐Moreno et al., 2019). In the initial stages of BLSD (i.e., presymptomatic infected leaves and those in stages 1 and 2; see Table 1), the physical changes in the plant are minimal, making the visual identification of leaf damage difficult. Furthermore, asexual and sexual spores develop from stage 2 of the disease onward. Conidial (asexual) spores are waterborne over short distances, whereas ascospores (sexual spores) can be carried over long distances and are responsible for the spread of the disease; therefore, the early detection of BLSD and the timely application of fungicides is crucial for controlling a P. fijiensis infestation. Early treatments reduce production costs and improve the health of crops while using shorter treatment times.
Table 1. Severity scale of black leaf streak disease (BLSD).
Stage | Symptoms |
1 | Yellowish spots <1 mm in diameter on the abaxial leaf surface |
2 | Red or brown streaks from 1 to 5 mm |
3 | Similar to stage 2, but streaks are >5 mm |
4 | Brown elliptical streaks on the abaxial leaf surface, black streaks on the adaxial leaf surface |
5 | The streak is totally black and has spread to the abaxial leaf surface. The streak is surrounded by a yellow halo. |
6 | The center of the streak is light gray surrounded by a black ring and a yellow halo. |
Currently, the detection of BLSD relies on the visual observation of symptoms or the use of destructive analyses, such as those based on DNA or immunological assays (Luna‐Moreno et al., 2019). To the best of our knowledge, hyperspectral image–based non‐destructive methods have not yet been evaluated for the early detection of BLSD.
The optical properties of a leaf can be characterized by (1) the light transmission through the leaf, (2) the light absorbed by chemicals within the leaf (e.g., pigments, water, sugars, lignin, and amino acids), and (3) the light reflected by the leaf surface or internal structures. Light reflectance levels depend on complex biophysical and biochemical interactions within the leaves. Changes in the photosynthetic pigments are evident in the visible region (VIS, 400–750 nm wavelength), changes in leaf structure and the scattering process affect reflectance in the near‐infrared region (NIR, 750–1350 nm), and the water content influences the reflectance in the mid‐infrared region (MIR, 1350–2500 nm) (Hunt and Rock, 1989). Many reflectance‐related changes have been observed in diseased plants (Mahlein et al., 2012). In banana plants, P. fijiensis destroys photosynthetic tissue, which induces an increase in fluorescence and heat emission in the leaf blade and modifies the transport pattern of the photoassimilates, affecting the production of chlorophyll (Hidalgo et al., 2006). The resulting necrotic and chlorotic lesions cause variations of reflectance in the VIS and NIR regions of the spectrum.
Structural and chemical changes occurring in leaves during pathogenesis have enabled disease detection using hyperspectral image analyses (Siche et al., 2016). Previous works on sugar beet (Beta vulgaris L. subsp. vulgaris) (Mahlein, 2011), wheat (Triticum aestivum L.) (Ashourloo et al., 2014), tomato (Solanum lycopersicum L.), and lettuce (Lactuca sativa L.) (Lara et al., 2013) have provided some insights into the relationship between pathogen infections and spectral variations in leaves. In banana plants, hyperspectral image research has mostly focused on fruit measurements (Hu et al., 2015), quality control (Intaravanne et al., 2012), and the differentiation of black Sigatoka from yellow Sigatoka disease (Bendini et al., 2015); however, the use of hyperspectral images for early detection of BLSD has not been evaluated.
Data from hyperspectral leaf images usually present both spatial and spectral dimensions showing (1) high collinearity in the adjacent bands, (2) variability of hyperspectral signatures, and (3) high dimensionality due to the increased sensitivity and resolution of hyperspectral sensors (Lu and Fei, 2014). It is therefore necessary to apply multivariate data processing methods to be able to correlate hyperspectral fingerprints to plant infections.
Partial least squares (PLS) is a dimension reduction technique that is useful when the number of variables is greater than the number of observations. This tool maximizes the covariance between the dependent and independent (predictor) variables, extracting from the predictors a set of orthogonal latent factors that are linear combinations of the original variables with the best predictive power (Abdi, 2010). When the response variable is categorical, PLS discriminant analysis (PLS‐DA) has been used by other researchers (Brereton and Lloyd, 2014). PLS‐DA is a linear classifier whose objective is to find a linear boundary that separates the classes. In this paper, we use an alternative method based on the algorithm proposed by Bastien et al. (2005), which takes into account the binary nature of the response using logistic regression rather than linear regression. This method is complemented with a hyperspectral biplot (HS biplot) that generalizes the proposal of Oyedele and Lubbe (2015), whose method was based on linear PLS components, whereas our study uses logistic PLS components. Additionally, penalized logistic regressions (PLR) have been introduced into the main algorithm to avoid the separation problem, which occurs when the predictors separate the positive group from the negative group perfectly or almost perfectly, so that the maximum likelihood estimators cannot be found (Albert and Anderson, 1984). This PLS–PLR model has not previously been used for the early detection of BLSD from hyperspectral images.
The objective of this study was to assess the use of hyperspectral imaging for the early detection of BLSD. The PLS–PLR technique was used to classify banana leaves, and was complemented by an HS biplot representation that facilitates the observation of the influence of the disease‐associated changes in the reflectance of artificial light irradiated on a banana leaf.
Banana plants (Musa acuminata, AAA Group, Cavendish subgroup, cultivar ‘Williams’) were obtained from commercial banana propagation facilities (Sociedad Ecuatoriana de Biotecnología C.A. [SEBIOCA], Guayaquil, Ecuador). A total of 100 plants that had been established for 3–4 months in a greenhouse were transported to our greenhouse located at the Centro de Investigaciones Biotecnológicas del Ecuador (Escuela Superior Politécnica del Litoral [ESPOL]) and grown at 28°C in 70% relative humidity and 12 hours of natural light, and were watered every 48 hours. Sixteen plants were randomly selected from this population, of which 10 were inoculated with P. fijiensis and six were mock‐inoculated for use as a control. The inoculation was carried out as reported by Gbongue et al. (2019). Briefly, isolates of P. fijiensis were inoculated onto potato dextrose agar and incubated for two weeks at 30°C. The mycelium was then ground in 10 mL of sterile water and filtered to separate the mycelium from the conidia. The conidial suspension was concentrated by centrifugation at 3000 × g for 10 min at 4°C. Banana leaves were spray‐inoculated with the concentrated conidia suspension using an aerograph atomizer (Gerensa, Guayaquil, Ecuador), and disease symptoms were monitored using the severity scale shown in Table 1, as suggested by Fouré (1986). The visual symptoms at each disease stage are shown in Fig. 1. The control plants were mock inoculated with autoclaved distilled water.
Figure 1. Stages of black leaf streak disease (BLSD): (A) non‐infected, (B) stage 1, (C) stage 2, (D) stage 3, (E) stage 4, and (F) stage 5. Pen marks highlight the affected areas.
The imaging system used in our experiments (Appendix S1) was exactly as described by Ochoa et al. (2016) and included a spectrometer (ImSpector V10E, Specim, Oulu, Finland) connected to a 12 bits per pixel camera (1500M‐GE; Thorlabs Inc., Newton, New Jersey, USA) with high sensitivity in the NIR region. The camera was mounted on a slider in a push‐broom configuration and controlled by a computer with a storage capacity of 1 TB, an Intel Core i5 3.1‐GHz processor, and 16 GB of RAM. The system operated in a spectral range between 386–1019 nm, with a spectral resolution of 4.55 nm and a spatial resolution of 1040 (rows) by 1392 (columns).
The system allowed the nondestructive scanning of individual leaves from intact plants. Each hyperspectral image was composed of three dimensions, one spectral and two spatial, resulting in a hyperspectral cube. The spatial dimensions consisted of the position of each pixel on the image (M × N), whereas the spectral dimension consisted of the wavelengths of the reflected light (J). The system generated a hyperspectral cube for each leaf, with a resolution of 205 rows (M), 198 columns (N), and 520 wavelengths (J). To obtain the width of a hyperspectral cube (N), we set the camera’s x‐axis pixel binning to 7, reducing 1392 pixels to 198 pixels. To obtain the number of wavelengths in a hyperspectral cube (J), we set the camera’s y‐axis pixel binning to 2, reducing 1040 pixels to 520 pixels. Finally, the height of a hyperspectral cube (M) was determined by the acquisition frame rate of the camera needed to scan the entire leaf holder, resulting in 205 pixels.
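The cube dimensions quoted above follow from simple arithmetic on the sensor size and the binning factors. The sketch below reproduces them; it assumes that binning discards any incomplete trailing bin (floor division), which is not stated in the text, and the function name is illustrative only.

```python
# Sensor size reported for the camera (rows x columns).
SENSOR_ROWS, SENSOR_COLS = 1040, 1392


def binned_size(pixels: int, binning: int) -> int:
    """Number of whole bins when `binning` adjacent pixels are merged.

    Assumption: an incomplete trailing bin is discarded (floor division).
    """
    return pixels // binning


n_columns = binned_size(SENSOR_COLS, 7)  # spatial width N of the cube
n_bands = binned_size(SENSOR_ROWS, 2)    # number of wavelengths J
m_rows = 205                             # scan lines M, fixed by the frame rate (from the text)

cube_shape = (m_rows, n_columns, n_bands)
values_per_cube = m_rows * n_columns * n_bands
print(cube_shape, values_per_cube)
```

Under this assumption each leaf yields roughly 21 million reflectance values before any dimensionality reduction.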
The plants were placed in the imaging system and the selected leaves were tagged and scanned. Three leaves were selected from each of the six control plants, but two were damaged due to manipulation, resulting in 16 images of non‐infected leaves. Two leaves were selected from each of the 10 inoculated plants. The leaves were scanned every three days for three months. At every scan, the infection symptoms were visually assessed by an expert using the symptoms scale, as detailed in Table 1. During this period, the disease progression in the leaves was unequal. The level 1 symptoms appeared between seven and 31 days after the inoculation and increased irregularly, reaching higher severity levels in different time periods. In some leaves, the disease reached a severity level of 5. Due to manipulation, several leaves were damaged and were therefore discarded during the experiment.
From the images scanned and tagged by experts, we selected those that belonged to infected leaves at the presymptomatic stage and stages 1 and 2 of BLSD. The presymptomatic images (16) were obtained six days before the leaf presented symptoms at severity level 1. The following images were taken in intervals of six days, during the progression of severity level 1 (54) and severity level 2 (18).
The final data set consisted of 104 images (16 non‐infected, 16 presymptomatic, 54 severity level 1, and 18 severity level 2). The severity level reported corresponded to the highest disease stage found in the leaf.
Standard image calibration methods were applied. First, spectral calibration was applied using mercury (Hg), argon (Ar), helium (He), and hydrogen (H) light sources to estimate the wavelength corresponding to each charge‐coupled device (CCD) line. Second, radiometric calibration was applied to reduce the influence of light intensity variation and the noise of the CCD sensor. For this purpose, white and dark references were imaged before each scanning session and the raw spectral image was normalized. In addition, the spatial calibration of the slider was performed to avoid overlap between leaf regions when the images were acquired.
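The radiometric normalization with white and dark references is commonly written as (raw − dark) / (white − dark). The text does not give the exact formula used, so the sketch below is an assumption based on that common flat‐field form; the function name, the `eps` guard, and the toy values are illustrative only.

```python
import numpy as np


def radiometric_calibration(raw, white, dark, eps=1e-9):
    """Relative reflectance from raw counts and white/dark reference images.

    Assumed form: (raw - dark) / (white - dark). `eps` guards against
    division by zero where the two references coincide.
    """
    raw = np.asarray(raw, dtype=float)
    white = np.asarray(white, dtype=float)
    dark = np.asarray(dark, dtype=float)
    return (raw - dark) / np.maximum(white - dark, eps)


# Toy 2 x 3 frame with 12-bit style counts (hypothetical values).
dark = np.full((2, 3), 100.0)
white = np.full((2, 3), 4000.0)
raw = np.array([[100.0, 2050.0, 4000.0],
                [1075.0, 3025.0, 2050.0]])
reflectance = radiometric_calibration(raw, white, dark)
print(reflectance)
```

A pixel equal to the dark reference maps to 0 and a pixel equal to the white reference maps to 1, which removes most of the session‐to‐session variation in illumination.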
To retain only the leaf pixels of each hyperspectral cube, we generated a mask using the image at a wavelength of 700 nm, and setting the leaf holder and background pixels to zero. Next, each hyperspectral cube was normalized using the standard normal variate technique (Barnes et al., 1989). This step is necessary to compensate for reflectance variation caused by the relative orientation of the leaf surface and the sensor. Finally, the dimensionality reduction was achieved by calculating the average of the reflectance values measured at each wavelength of the hyperspectral cube, resulting in a matrix of I = 104 rows (one per image) and J = 520 columns (Appendix S2). A preliminary analysis of infected regions showed differences in the spectral patterns of the disease stages. Figure 2 shows the reflectance patterns for severity level 2 and non‐infected regions.
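The three preprocessing steps (masking at 700 nm, standard normal variate normalization, and averaging over leaf pixels) can be sketched as follows. This is a minimal illustration, not the authors' code: the threshold value and its direction, the evenly spaced wavelength axis, and all function names are assumptions.

```python
import numpy as np


def band_index(wavelengths, target_nm):
    """Index of the band whose wavelength is closest to `target_nm`."""
    return int(np.argmin(np.abs(wavelengths - target_nm)))


def leaf_mask(cube, wavelengths, target_nm=700.0, threshold=0.2):
    """Boolean (M, N) mask of leaf pixels from the band nearest 700 nm.

    The threshold (and the assumption that leaf pixels are brighter than
    the background at this band) is hypothetical.
    """
    return cube[:, :, band_index(wavelengths, target_nm)] > threshold


def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row) separately."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def mean_leaf_spectrum(cube, wavelengths):
    """One normalized spectrum of length J per leaf image."""
    leaf_pixels = cube[leaf_mask(cube, wavelengths)]  # shape (n_leaf_pixels, J)
    return snv(leaf_pixels).mean(axis=0)


# Toy cube: dim background with a brighter 2 x 3 "leaf" patch.
wavelengths = np.linspace(386.0, 1019.0, 520)  # assumed evenly spaced bands
rng = np.random.default_rng(0)
cube = rng.uniform(0.0, 0.1, size=(4, 5, 520))
cube[1:3, 1:4, :] += 0.5
spectrum = mean_leaf_spectrum(cube, wavelengths)
print(spectrum.shape)
```

Stacking one such spectrum per image gives the I × J matrix described above (104 × 520 for the training data set).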
Figure 2. Normalized reflectance curves of labeled regions in non‐infected and infected banana (Musa acuminata) leaves. The curves were normalized using the standard normal variate technique. Infected leaves correspond to severity level 2.
The data were arranged into one vector and one matrix. The Y vector consisted of the binary (infected and non‐infected) response variable, whereas the X matrix (X1, … , Xp) consisted of a set of predictors that correspond to the reflectance intensity at each wavelength.
The general logistic PLS model and its associated biplot were included in the MultBiplotR package (Vicente‐Villardón, 2017). The package implements the algorithm that estimates PLS components and applies a logistic regression on the PLS components and the response vector Y. To prevent the separation problem reported in previous studies (Albert and Anderson, 1984; Santner and Duffy, 1986) and detected in our initial tests, the package applies a ridge penalty (Le Cessie and Van Houwelingen, 1992), which is calculated by the sum of the squares of the coefficients (L2 norm) multiplied by a penalty parameter λ. The parameter λ can have a value between 0 and 1, and can be adjusted by cross‐validation. For the purpose of finding the model that best describes the data, we tested values of λ in the [0.1–0.9] range using incremental steps of 0.1 and calculated, for each value of λ, the following goodness‐of‐fit measures: difference of deviance (DiffDeviance), Cox and Snell’s R2, Nagelkerke’s R2, and McFadden’s R2 (Allison, 2014; Walker and Smith, 2016).
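For reference, the ridge‐penalized criterion described above can be written as follows. This is a sketch of the usual form: $\pi_i$ denotes the fitted probability of infection for leaf $i$, $c_0$ and $c_h$ are the constant and the coefficients on the $m$ PLS components (notation of Appendix 1), and leaving the constant unpenalized is an assumption that follows common practice rather than something stated in the text.

$$\ell_\lambda(c_0, \mathbf{c}) = \sum_{i=1}^{n} \left[ y_i \log \pi_i + (1 - y_i) \log (1 - \pi_i) \right] - \lambda \sum_{h=1}^{m} c_h^2$$

Because the penalty grows with the squared coefficients, the maximizer of $\ell_\lambda$ stays finite even when the two groups are perfectly separated.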
The biplot that represents the labeled leaves, the wavelengths, and the boundary of the prediction regions (i.e., the HS biplot) allows for the visual inspection of the PLS–PLR model. The wavelengths were represented by lines colored according to the spectral band to which they belong. The statistical procedures applied in PLS–PLR and HS biplot are detailed in Appendix 1.
The PLS–PLR model yielded estimated values between 0 and 1. Leaves with model values above or below 0.5 were considered to be predicted‐infected or predicted‐healthy, respectively. The goodness‐of‐fit of the logistic regression was calculated using the pseudo R2 measures described above.
First, the penalty parameter of the PLS–PLR model was selected from different λ coefficients. The goodness‐of‐fit values obtained after each iteration are shown in Table 2.
Table 2. Goodness‐of‐fit measures for the PLS–PLR model using incremental penalty values (λ).
λ | DiffDeviance^a | Cox and Snell’s R2^b | Nagelkerke’s R2^b | McFadden’s R2^b |
0.1 | 88.488 | 0.573 | 0.994 | 0.991 |
0.2 | 88.005 | 0.571 | 0.991 | 0.986 |
0.3 | 87.668 | 0.570 | 0.988 | 0.982 |
0.4 | 87.405 | 0.568 | 0.986 | 0.979 |
0.5 | 87.187 | 0.568 | 0.985 | 0.976 |
0.6 | 86.999 | 0.567 | 0.984 | 0.974 |
0.7 | 86.832 | 0.566 | 0.982 | 0.972 |
0.8 | 86.682 | 0.565 | 0.981 | 0.971 |
0.9 | 86.543 | 0.565 | 0.980 | 0.969 |
^a Difference of deviance (Hosmer et al., 1997).
^b Pseudo R2 indices for binary logistic regression models (Allison, 2014; Walker and Smith, 2016).
We obtained the best goodness‐of‐fit measures (highest DiffDeviance value of 88.488) using a λ value of 0.1, indicating that this model achieved the largest reduction in deviance relative to the null model. Furthermore, the P value of 6.097 × 10−20 shows a significant association between the latent variables and the response variable. The pseudo R2 measures were also estimated; McFadden’s R2 (0.991) indicated a high explanatory power of the model, while the Cox and Snell’s R2 (0.573) and Nagelkerke’s R2 (0.994) also indicated a high goodness‐of‐fit.
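The three pseudo R2 indices can be recovered from the difference of deviance and the class counts alone. The sketch below uses the standard definitions and assumes that the null model is the constant‐only logistic model fitted to the 88 infected and 16 non‐infected training leaves; the function names are illustrative.

```python
import math


def null_log_likelihood(n_pos: int, n_neg: int) -> float:
    """Log likelihood of the constant-only logistic model."""
    n = n_pos + n_neg
    return n_pos * math.log(n_pos / n) + n_neg * math.log(n_neg / n)


def pseudo_r2(diff_deviance: float, n_pos: int, n_neg: int) -> dict:
    """Cox and Snell, Nagelkerke, and McFadden indices from DiffDeviance."""
    n = n_pos + n_neg
    null_deviance = -2.0 * null_log_likelihood(n_pos, n_neg)
    cox_snell = 1.0 - math.exp(-diff_deviance / n)
    max_cox_snell = 1.0 - math.exp(-null_deviance / n)  # upper bound of Cox and Snell
    return {
        "cox_snell": cox_snell,
        "nagelkerke": cox_snell / max_cox_snell,
        "mcfadden": diff_deviance / null_deviance,
    }


# DiffDeviance for lambda = 0.1 from Table 2; class counts from the training set.
r2 = pseudo_r2(diff_deviance=88.488, n_pos=88, n_neg=16)
print({name: round(value, 3) for name, value in r2.items()})
# Matches the lambda = 0.1 row of Table 2: 0.573, 0.994, 0.991
```

The low Cox and Snell value is expected: with 88 and 16 leaves in the two classes its upper bound is only about 0.576, which is why Nagelkerke's rescaled index is close to 1.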
The penalty λ = 0.1 was applied in each iteration for calculating the coefficients, providing a stable value that maximized the likelihood function while controlling the error. Figure 3 shows the logistic response surface fitted on the space spanned by the first two PLS components. The model is:[Image Omitted. See PDF]where Py is the infected leaf probability, t1 is the first PLS–PLR component, and t2 is the second PLS–PLR component.
Figure 3. Response surface for the PLS–PLR model in BLSD detection. The red line corresponds to the probability equal to 0.5. The control leaves are shown as green points and infected leaves are shown as blue points.
The HS biplot of the training data set is shown in Fig. 4. The first two PLS components accounted for 77% of the observed variability.
Figure 4. HS biplot of the training data set. Banana leaves are represented by points. Each wavelength is represented by a straight line colored according to the colors of the electromagnetic spectrum. The diagonal dotted line separates the healthy and infected leaves. The blue ellipse encloses healthy leaves and the red ellipses contain infected leaves. PLS component 1 explains 50.99% of the variance, while PLS component 2 explains 26.08% of the variance. The numbered points correspond to banana leaves with a low‐severity infection (Table 3).
The HS biplot indicates three main groupings: a group of non‐infected leaves (blue ellipse) and two other groups of infected leaves (red ellipses). The clustering of non‐infected and infected plants was mostly observed in component 1. The wavelengths that contributed the most to the grouping of the non‐infected samples ranged from 577 to 651 nm (yellow to red range). The first of the two groups of infected leaves was located near the non‐infected group (numbered points inside the red ellipse in Fig. 4) in the HS biplot, mainly influenced by a low density of disease symptoms in the leaves (Table 3). The second group of infected leaves was composed of both presymptomatic and symptomatic leaves. The wavelengths that contributed the most to the grouping of the presymptomatic leaves (turquoise), as well as several leaves at severity levels 1 (blue) and 2 (red), were in the NIR range of the spectrum. However, the grouping of the other leaves at symptomatic levels 1 and 2 was determined by their reflectance in the 577 to 651 nm (yellow to red) range. This was also observed in the external validation data set (Fig. 5). The HS biplot of the test data set showed that the healthy (blue ellipse) and infected (red ellipse) leaves had all been correctly classified, with the exception of two samples, one healthy and one infected.
Table 3. HS biplot description of banana leaves with a low‐severity infection.
Leaf sample no.^a | Severity^b | Observations^c |
14 | 1 | Presents two pixels (severity 1) |
15 | 0 | Without symptoms |
16 | 1 | Presents seven pixels (severity 1) |
17 | 0 | Without symptoms |
18 | 1 | No visible infected pixels |
19 | 1 | Presents 10 pixels (severity 1) |
20 | 1 | Presents six pixels (severity 1) |
21 | 1 | Presents 19 pixels (severity 1) |
^a Numbers correspond to the numbers displayed on Fig. 4. On the HS biplot, these leaves were plotted near the healthy leaves due to their less severe symptoms.
^b Severity levels correspond to the scale presented in Table 1.
^c Number of pixels in the infected area.
Figure 5. HS biplot of the external validation data set. The diagonal dotted line separates predicted healthy (blue ellipse) and infected leaves (red ellipse). Each wavelength is represented by a straight line colored according to the electromagnetic spectrum colors. PLS component 1 explains 50.68% of the variance, while PLS component 2 explains 26.21% of the variance.
The model was validated using the leave‐one‐out cross‐validation (LOOCV) method, as shown in Appendix S3. During the cross‐validation process, 102 leaves were properly classified as non‐infected or infected, which represented an overall classification accuracy of 98% (Table 4). Only two non‐infected samples were classified as being infected, with a probability of 0.738 and 0.816, respectively. All infected leaves at the different disease stages were correctly classified. The positive predictive value was 98%, while the sensitivity or recall value was 100% (Table 4), indicating that all infected leaves were correctly identified. The estimated global HS biplot goodness‐of‐fit (the squared correlation coefficient between the adjusted and observed values) was 77.07%.
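The leave‐one‐out procedure refits the classifier once per leaf, each time predicting the single leaf that was held out. The sketch below shows only the loop; the nearest‐centroid classifier is a deliberately simple stand‐in, not the PLS–PLR model, and all names and toy values are hypothetical.

```python
def leave_one_out(X, y, fit, predict):
    """Predict every sample with a model fitted on all the other samples."""
    predictions = []
    for i in range(len(X)):
        X_train = [row for k, row in enumerate(X) if k != i]
        y_train = [label for k, label in enumerate(y) if k != i]
        model = fit(X_train, y_train)
        predictions.append(predict(model, X[i]))
    return predictions


def fit_centroids(X, y):
    """Stand-in classifier: mean feature vector of each class."""
    centroids = {}
    for label in set(y):
        members = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def predict_centroid(centroids, row):
    """Label of the closest class centroid (squared Euclidean distance)."""
    def sq_dist(centre):
        return sum((a - b) ** 2 for a, b in zip(row, centre))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


# Toy data: one feature, two well-separated classes.
X = [[0.1], [0.2], [0.3], [0.9], [1.0], [1.1]]
y = [0, 0, 0, 1, 1, 1]
predictions = leave_one_out(X, y, fit_centroids, predict_centroid)
accuracy = sum(p == t for p, t in zip(predictions, y)) / len(y)
print(predictions, accuracy)
```

For an unbiased estimate, every data‐dependent step (centering, component extraction, and the penalized regression) would need to be repeated inside `fit` for each held‐out leaf.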
Table 4. Confusion matrix for the classification accuracy assessment of the PLS–PLR model.
 | True infected leaf | False non‐infected leaf | Prediction metrics |
Test result | True positive | False positive | Positive predictive value |
 | 88 | 2 | 0.98 |
 | False negative | True negative | Negative predictive value |
 | 0 | 14 | 1 |
Prediction metrics | Sensitivity | Specificity | Accuracy |
 | 1 | 0.88 | 0.98 |
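The prediction metrics in Table 4 follow directly from the four cell counts. A short worked check, using the standard definitions (the function name is illustrative):

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard metrics derived from a 2 x 2 confusion matrix."""
    return {
        "positive_predictive_value": tp / (tp + fp),
        "negative_predictive_value": tn / (tn + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Cell counts from the leave-one-out cross-validation (Table 4).
metrics = classification_metrics(tp=88, fp=2, fn=0, tn=14)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The specificity of 14/16 = 0.875 rests on only 16 non‐infected leaves, so it is the least precisely estimated of the five values.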
The PLS–PLR model fitted to the initial training data set was used to predict the presence of the disease in new leaves. A new data set with images of 16 non‐infected and 21 infected leaves was used to evaluate the efficacy of the model. The prediction accuracy of the new samples was 95% (35 successful identifications and two errors). Appendix S4 indicates the predicted probability for the external validation test.
Agricultural crops are constantly threatened by pests and plant diseases that reduce production and quality. New agricultural strategies and techniques to identify plant diseases are based on non‐destructive methods that detect the disease in early stages, allowing timely management to prevent the spread of the disease and minimize the effect of fungicides in the environment. Remote sensing allows early detection of plant diseases using methods based on reflectance in the VIS and NIR regions. Disease detection involves steps such as image acquisition, image pre‐processing, image segmentation, feature extraction, and classification.
In this study, hyperspectral images were used to classify BLSD‐infected and healthy banana plants based on the marked differences in the VIS and NIR spectra of both groups. Our results are consistent with previous reports showing that disease symptoms in plants can change spectral reflectance in the VIS (400–700 nm) and NIR (700–1100 nm) ranges (Ayala‐Silva and Beyl, 2005). General changes in reflectance occurring during plant–pathogen interactions have been associated with impairments in the leaf structure and chemical composition of the tissue during pathogenesis, which can be observed by the succession from chlorotic to necrotic tissue (Mahlein, 2016).
PLS was originally developed for continuous response variables. In the case of binary responses, a linear regression cannot guarantee that the fitted values fall between 0 and 1. In this study, we used logistic rather than linear regressions to correlate the response variable with the PLS components. Previous studies on plant disease detection have reported relatively high accuracies and sensitivities similar to PLS models, but the model interpretability is low, and most studies did not include leaves in presymptomatic stages (Rumpf et al., 2010; Mahlein, 2011; Zhu et al., 2017). In our study, the application of PLS–PLR showed a prediction accuracy of 98% in the presymptomatic and early stages of BLSD. The goodness‐of‐fit measures obtained for the PLS–PLR model were similar to those reported in the literature (Huang et al., 2007; Bock et al., 2010; Cevallos‐Cevallos et al., 2018).
As indicated by the HS biplot, which shows leaves as points and wavelengths as lines, the non‐infected leaves were characterized by the prominent presence of wavelengths in the VIS spectrum, whereas infected leaves were associated with wavelengths in both the VIS and NIR ranges, depending on the stage of the disease or the proportion of symptomatic to healthy tissue. These results are in agreement with previous studies, as disease symptoms in plants have been reported to increase spectral reflectance in both the VIS (400–700 nm) and NIR (700–1100 nm) ranges (Ayala‐Silva and Beyl, 2005). General changes in reflectance occurring during plant–pathogen interactions have been associated with impairments in the leaf structure and changes in the chemical composition of the tissue during pathogenesis, which can be observed by the succession of chlorotic and necrotic tissue (Mahlein, 2016).
The clustering of non‐infected and infected plants was mostly observed within PLS component 1. The wavelengths that contributed the most to PLS component 1 ranged from 577 to 651 nm (yellow to red). Changes in the yellow range of the spectrum suggest that detection of plant chlorosis occurs in the initial stages of BLSD. Chlorosis is caused by insufficient chlorophyll accumulated in the plants, leading to the yellowing of the leaf, which is usually measured with yellowness indexes (Adams et al., 1999). Similarly, changes in the orange–red range of the spectrum suggest the succession of chlorotic to necrotic tissue, as observed in the red or brown streaks that appear in stage 2 of BLSD (Fouré, 1986). Interestingly, presymptomatic plants showing neither chlorosis nor necrosis were clustered apart from the non‐infected ones, suggesting changes in the leaf surface of these infected plants despite the lack of visible symptoms. Biotrophic presymptomatic stages do not usually cause observable changes in leaves, but some fungal pathogens can produce structures on the leaf surface that can influence the optical properties of the plant (Mahlein, 2016). Non‐infected or infected banana plants were scattered in two groups in the HS biplot, which is in agreement with previous reports showing two groups of non‐infected and dwarf banana plants due to the natural biological diversity observed in agricultural conditions (Cevallos‐Cevallos et al., 2018). These results confirm the high prediction capacity of the PLS–PLR model and the efficiency of the HS biplot to represent the relationships between the variables and the groups of individuals with non‐infected and infected leaves.
Hyperspectral imaging provides a non‐destructive method for analyzing plants infected with BLSD and possibly other pathogens. The development of technologies offering larger data storage capacities, faster computers, more sensitive detectors, and different analytical techniques for hyperspectral images, combined with suitable statistical techniques, makes it possible to detect plant diseases even at early stages, and enables the capture and modeling of physiological changes of leaves using close‐range hyperspectral data. The early detection of infectious diseases plays a crucial role in both treatment and prevention strategies.
PLS–PLR and HS biplot visual representation are promising techniques for the analysis of hyperspectral data, even after considering the high reduction of the dimensionality after preprocessing of the raw data. The PLS–PLR model provides excellent predictive power, which is complemented by the high level of visual interpretation offered by the HS biplot.
The system presented here combines hyperspectral technology with advanced data analysis and statistical methods to accurately predict plant disease by measuring the reflectance differences resulting from the biophysical and biochemical characteristic changes following infection. It is currently configured as a stationary imaging system and could be used in a laboratory for the recognition of foliar diseases in other plants and for food quality evaluations. In the future, however, the system could be transformed, with few changes, into an airborne imaging system for scanning images of crop fields.
Future research should concentrate on improving the detection of different severity levels of the disease and on the more detailed analysis of the wavelengths that have greater influence, taking into account other factors that can cause spectral changes.
This research was supported by the VLIR‐UOS grant “VLIR Network Ecuador.”
D.O.D. and J.C.C. formulated the research problem. D.O.D., J.C.C., J.L.V.V., and J.U.F. designed the approaches. D.O.D., R.C.B., and O.B.A. collected the data. J.L.V.V., D.O.D., J.C.C., M.M.Z., R.C.B., O.B.A., and J.U.F. developed the processing workflow. J.L.V.V. and J.U.F. performed the data analysis. All authors contributed to the writing and development of the manuscript. All authors read and approved the final manuscript.
The two data sets used in this study (i.e., Training data set [data_banana.txt] and Validation data set [data_banana_test.txt]) are available at:
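Let $y$ be the binary response variable and $x_1, \dots, x_p$ a set of predictors. For a sample of size $n$, the data can be organized in a response vector $\mathbf{y} = (y_1, \dots, y_n)^T$ and a matrix of predictors $\mathbf{X} = (x_{ij})$ of size $n \times p$, where $y_i$ is either 1 or 0 for the presence or absence of the main characteristic and $x_{ij}$ is the value of the $i$th individual on the $j$th predictor. The columns of $\mathbf{X}$ are supposed to be centered and possibly standardized.
We are searching for components that are linear combinations of the predictors and that best explain the response (in a logistic regression manner). Let $\mathbf{t}_h$ be the vector containing the scores of each individual on one of those components; then $\mathbf{t}_h = \mathbf{X}\mathbf{w}_h$, with $\mathbf{w}_h$ being the vector of coefficients. Normally we will use $m$ of those components, which are mutually orthogonal.
The logistic PLS regression model is written as
$$\log\left(\frac{\pi_i}{1 - \pi_i}\right) = c_0 + \sum_{h=1}^{m} c_h t_{ih}$$
or
$$\pi_i = \frac{\exp\left(c_0 + \sum_{h=1}^{m} c_h t_{ih}\right)}{1 + \exp\left(c_0 + \sum_{h=1}^{m} c_h t_{ih}\right)}$$
Or in matrix form $\operatorname{logit}(\boldsymbol{\pi}) = c_0 \mathbf{1}_n + \mathbf{T}\mathbf{c}$, where $\boldsymbol{\pi} = (\pi_1, \dots, \pi_n)^T$ is the vector of fitted probabilities of the presence of the disease in each individual, $\mathbf{T} = (\mathbf{t}_1, \dots, \mathbf{t}_m)$, and $\mathbf{c} = (c_1, \dots, c_m)^T$ are the coefficients of the regression on the components. The model is a standard logistic regression on the PLS components. The constant $c_0$ must be kept because the binary variable cannot be centered. In terms of the original variables,
$$\operatorname{logit}(\boldsymbol{\pi}) = c_0 \mathbf{1}_n + \mathbf{X}\mathbf{W}\mathbf{c} = c_0 \mathbf{1}_n + \mathbf{X}\mathbf{b}$$
where $\mathbf{W} = (\mathbf{w}_1, \dots, \mathbf{w}_m)$, and $c_0$ and $\mathbf{b} = \mathbf{W}\mathbf{c}$ are the coefficients on the observed variables. The problem now is to estimate $\mathbf{T}$, $\mathbf{W}$, $\mathbf{c}$, $c_0$, and $\mathbf{b}$.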
Here, we use the algorithm developed by Bastien et al. (2005), with slight modifications.
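1. Calculation of $\mathbf{t}_1$, the first PLS component.
   a. For each predictor $\mathbf{x}_j$ ($j = 1, \dots, p$), compute the regression coefficient $a_{1j}$ of $\mathbf{x}_j$ in the logistic regression of $\mathbf{y}$ on $\mathbf{x}_j$, to obtain $\mathbf{a}_1 = (a_{11}, \dots, a_{1p})^T$.
   b. Normalize the vector: $\mathbf{w}_1 = \mathbf{a}_1 / \lVert \mathbf{a}_1 \rVert$.
   c. Compute the component scores $\mathbf{t}_1 = \mathbf{X}\mathbf{w}_1$.
2. Calculation of $\mathbf{t}_h$, the $h$th PLS component. The components $\mathbf{t}_1, \dots, \mathbf{t}_{h-1}$ have been already obtained.
   a. For each predictor $\mathbf{x}_j$ ($j = 1, \dots, p$), compute the regression coefficient $a_{hj}$ of $\mathbf{x}_j$ in the logistic regression of $\mathbf{y}$ on $\mathbf{t}_1, \dots, \mathbf{t}_{h-1}$ and $\mathbf{x}_j$, to obtain $\mathbf{a}_h = (a_{h1}, \dots, a_{hp})^T$.
   b. Normalize the vector: $\mathbf{w}_h = \mathbf{a}_h / \lVert \mathbf{a}_h \rVert$.
   c. Compute the residual matrix $\mathbf{X}_{h-1}$ of the linear regression of $\mathbf{X}$ on $\mathbf{t}_1, \dots, \mathbf{t}_{h-1}$.
   d. Compute the component scores $\mathbf{t}_h = \mathbf{X}_{h-1}\mathbf{w}_h$.
   e. $\mathbf{X}$ is factorized as $\mathbf{X} = \mathbf{T}\mathbf{P}^T + \mathbf{E}$, with $\mathbf{T} = (\mathbf{t}_1, \dots, \mathbf{t}_h)$, $\mathbf{P}$ the loadings of the predictors on the components, and $\mathbf{E}$ the residuals.
3. Logistic regression of $\mathbf{y}$ on the retained PLS components: $\operatorname{logit}(\boldsymbol{\pi}) = c_0 \mathbf{1}_n + \mathbf{T}\mathbf{c}$.
4. Expression of the model in terms of the original predictors: $\operatorname{logit}(\boldsymbol{\pi}) = c_0 \mathbf{1}_n + \mathbf{X}\mathbf{b}$.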
- Although the original algorithm is not explicit on this point, we consider that all the logistic models in steps 1a, 2a, and 3 must include a constant.
- When the model is good, i.e., when it is able to discriminate almost perfectly between presences and absences, the maximum likelihood method for logistic regression does not converge and the estimates tend to infinity. This is known as the separation problem (Albert and Anderson, 1984; Santner and Duffy, 1986), and it is easily handled by adding a penalty. Here, we use the ridge penalty (Le Cessie and Van Houwelingen, 1992) because of its simplicity. Additionally, when many variables influence the response, as is the case in this work, the ridge penalty offers better results, whereas the lasso penalty (Fu, 1998) is more efficient when fewer variables influence the result. The ridge penalty is the sum of the squares of the coefficients (the squared L2 norm) multiplied by the penalty parameter $\lambda$. The parameter $\lambda$ may take a value between 0 and 1 and can be tuned by cross‐validation.
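The steps above can be sketched in Python with NumPy. This is only a minimal illustration under stated assumptions, not the authors' implementation: the function names `ridge_logit` and `pls_logistic` are hypothetical, the ridge-penalized logistic fits use a damped Newton iteration with an unpenalized constant, the data are simulated, and the sketch stops at the regression on the components (it does not compute the coefficients b on the original predictors).

```python
import numpy as np

def ridge_logit(Z, y, lam=0.1, n_iter=100, tol=1e-8):
    """Logistic regression with a constant and a ridge penalty lam * sum(beta^2).

    Assumption of this sketch: the constant (first coefficient) is not penalized.
    """
    A = np.column_stack([np.ones(len(y)), Z])
    pen = lam * np.ones(A.shape[1])
    pen[0] = 0.0

    def objective(b):
        eta = A @ b
        # penalized log likelihood of the Bernoulli model
        return np.sum(y * eta - np.logaddexp(0.0, eta)) - np.sum(pen * b**2)

    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(A @ beta)))
        grad = A.T @ (y - p) - 2.0 * pen * beta
        hess = (A.T * (p * (1.0 - p))) @ A + np.diag(2.0 * pen)
        step = np.linalg.solve(hess, grad)
        scale = 1.0
        # halve the Newton step until the penalized likelihood does not decrease
        while objective(beta + scale * step) < objective(beta) and scale > 1e-8:
            scale /= 2.0
        beta = beta + scale * step
        if np.max(np.abs(scale * step)) < tol:
            break
    return beta

def pls_logistic(X, y, m=2, lam=0.1):
    """Return scores T, weights W, loadings P, and the final fit (c0, c)."""
    n, p = X.shape
    T = np.zeros((n, m))
    W = np.zeros((p, m))
    Xh = X.copy()
    for h in range(m):
        a = np.empty(p)
        for j in range(p):
            # steps 1a / 2a: regress y on the previous components and x_j
            Z = np.column_stack([T[:, :h], X[:, j]])
            a[j] = ridge_logit(Z, y, lam)[-1]
        W[:, h] = a / np.linalg.norm(a)            # steps 1b / 2b
        if h > 0:                                  # step 2c: residuals of X on t_1..t_{h-1}
            Th = T[:, :h]
            Xh = X - Th @ np.linalg.lstsq(Th, X, rcond=None)[0]
        T[:, h] = Xh @ W[:, h]                     # steps 1c / 2d
    P = np.linalg.lstsq(T, X, rcond=None)[0].T     # loadings, X = T P^T + E
    coef = ridge_logit(T, y, lam)                  # step 3
    return T, W, P, coef[0], coef[1:]

# Simulated (hypothetical) data: 40 individuals, 6 centered predictors
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6))
X -= X.mean(axis=0)
y = (X[:, 0] + 0.5 * X[:, 1] + 0.3 * rng.normal(size=40) > 0).astype(float)

T, W, P, c0, c = pls_logistic(X, y, m=2, lam=0.1)
prob = 1.0 / (1.0 + np.exp(-(c0 + T @ c)))         # fitted probabilities
acc = np.mean((prob > 0.5) == (y == 1))            # training accuracy
```

Because each new component is computed from the residuals of X on the previous components, the columns of T come out mutually orthogonal, which is the property required of the components in the model description.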
The PLS–PLR technique was selected and implemented to address the problems of bias, overfitting, data separation, and multicollinearity. Dimensionality reduction by PLS eliminates perfect or quasi‐perfect correlation between predictors. Furthermore, it makes use of the output variable, which reduces the bias and therefore avoids underfitting. A ridge penalty in the logistic regression limits the growth of the regression coefficients, which reduces variance, avoids overfitting, and controls the effects of data separation.
The goodness‐of‐fit measures used are described below.
The deviance difference (DiffDeviance) is interpreted as a measure of the variation of the data explained by the model with predictors relative to the model without predictors (only the constant):
$$\text{DiffDeviance} = -2\,(LL_0 - LL_M) = 2\,(LL_M - LL_0),$$
where $LL_M$ is the log likelihood of the model and $LL_0$ is the log likelihood of the null model. This statistic has a chi‐square distribution with degrees of freedom equal to the difference in the number of parameters of the two models. Thus, the null hypothesis is rejected at the significance level $\alpha$ when DiffDeviance exceeds the corresponding critical value of the $\chi^2$ distribution, which is equivalent to the P value of the contrast being less than the fixed level $\alpha$ (Hosmer et al., 1997).
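As a small worked illustration, the deviance difference and its chi‐square P value can be computed as follows. This is a sketch under assumptions: the function names, the six observations, and their fitted probabilities are hypothetical, the null model is fitted without a penalty (so its fitted probability is the observed proportion of presences), and the chi‐square tail probability is obtained from the standard recurrence of the regularized incomplete gamma function so that only the Python standard library is needed.

```python
import math

def log_likelihood(y, p):
    """Bernoulli log likelihood of 0/1 responses y given fitted probabilities p."""
    return sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
               for yi, pi in zip(y, p))

def chi2_sf(stat, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    x = stat / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-x)                 # Q(1, x)
    else:
        a, q = 0.5, math.erfc(math.sqrt(x))      # Q(1/2, x)
    while a < df / 2.0:
        # recurrence Q(a + 1, x) = Q(a, x) + x^a e^(-x) / Gamma(a + 1)
        q += x ** a * math.exp(-x) / math.gamma(a + 1.0)
        a += 1.0
    return q

def deviance_difference(y, p_model, n_params_model):
    """Return (DiffDeviance, degrees of freedom, P value) against the null model."""
    p0 = sum(y) / len(y)                         # intercept-only fitted probability
    ll_null = log_likelihood(y, [p0] * len(y))
    ll_model = log_likelihood(y, p_model)
    diff = 2.0 * (ll_model - ll_null)
    df = n_params_model - 1                      # the null model has one parameter
    return diff, df, chi2_sf(diff, df)

# Hypothetical example: 6 leaves, model with a constant and one component
y = [0, 0, 0, 1, 1, 1]
p_model = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
diff, df, p_value = deviance_difference(y, p_model, n_params_model=2)
```

With a ridge penalty the chi‐square reference distribution is only approximate, so the P value is best read as a descriptive indicator.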
Pseudo $R^2$ measures indicate how well the model can explain or predict the dependent variable based on the independent variables. Several different values were calculated, as detailed below:
- McFadden’s $R^2$ is defined as one minus the ratio of the log likelihood of the model to the log likelihood of the intercept‐only model (null model), with a theoretical range of $0 \le R^2_{\text{McF}} \le 1$. A value between 0.2 and 0.4 is usually considered to indicate a good quality of fit, and higher values indicate an excellent fit.
$$R^2_{\text{McF}} = 1 - \frac{LL_M}{LL_0},$$
where $LL_M$ is the log likelihood of the model and $LL_0$ is the log likelihood of the null model.
- Cox and Snell’s $R^2$ is a goodness‐of‐fit measure that generalizes the $R^2$ of linear regression. It is based on the comparison of the likelihood of the model ($L_M$) with the likelihood of the null model ($L_0$). Its range of values is between 0 and $1 - L_0^{2/n}$.
$$R^2_{\text{CS}} = 1 - \left(\frac{L_0}{L_M}\right)^{2/n},$$
where $L_M$ is the likelihood of the model and $L_0$ is the likelihood of the null model.
- Nagelkerke’s $R^2$ is the value of Cox and Snell’s $R^2$ standardized by the maximum value it could take. The maximum value of this pseudo $R^2$ is therefore 1 (Allison, 2014; Walker and Smith, 2016).
$$R^2_{\text{N}} = \frac{1 - \left(L_0 / L_M\right)^{2/n}}{1 - L_0^{2/n}},$$
where $L_M$ is the likelihood of the model and $L_0$ is the likelihood of the null model.
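The three pseudo R2 measures can be computed directly from the two log likelihoods. The following is a minimal sketch; the function name `pseudo_r2` and the numerical example (six observations, three presences, fitted probabilities 0.1, 0.2, 0.3, 0.7, 0.8, and 0.9) are hypothetical. Working on the log scale avoids underflow of the likelihoods when n is large.

```python
import math

def pseudo_r2(ll_model, ll_null, n):
    """McFadden, Cox and Snell, and Nagelkerke pseudo R2 from log likelihoods."""
    mcfadden = 1.0 - ll_model / ll_null
    # (L0 / LM)^(2/n) written on the log scale
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    # maximum attainable value of Cox and Snell's measure: 1 - L0^(2/n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)
    nagelkerke = cox_snell / max_cox_snell
    return {"mcfadden": mcfadden, "cox_snell": cox_snell, "nagelkerke": nagelkerke}

# Hypothetical example
n = 6
ll_null = n * math.log(0.5)   # null model predicts 0.5 for every observation
ll_model = 2.0 * (math.log(0.9) + math.log(0.8) + math.log(0.7))
r2 = pseudo_r2(ll_model, ll_null, n)
```

In this example the maximum of Cox and Snell's measure is 0.75, so Nagelkerke's value is the Cox and Snell value divided by 0.75.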
Technically, a biplot is a decomposition of a matrix $\mathbf{X}$ into the product of two low‐rank matrices (usually of rank two or three) plus an error matrix,
$$\mathbf{X} = \mathbf{T}\mathbf{P}^T + \mathbf{E},$$
in such a way that rows and columns can be jointly represented on a scatter diagram using $\mathbf{T}$ and $\mathbf{P}$ as markers, respectively. In this case, we use the factorization obtained from the logistic PLS regression as a biplot representation. The biplot shows the directions of the space spanned by the columns of $\mathbf{X}$ that best separate the presences and absences for the dependent variable.
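We therefore have a low‐rank approximation $\hat{\mathbf{X}} = \mathbf{T}\mathbf{P}^T$ of the matrix $\mathbf{X}$ that captures the part that best explains the response. The percentage of variability captured by the approximation is
$$\rho = 100 \times \frac{\operatorname{tr}(\hat{\mathbf{X}}^T\hat{\mathbf{X}})}{\operatorname{tr}(\mathbf{X}^T\mathbf{X})}.$$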
It is possible to identify the variables related to the PLS components by calculating the amount of variance of each column captured by the approximation as follows:
$$\rho_j = 100 \times \frac{\hat{\mathbf{x}}_j^T\hat{\mathbf{x}}_j}{\mathbf{x}_j^T\mathbf{x}_j},$$
where $\hat{\mathbf{x}}_j$ and $\mathbf{x}_j$ are the $j$th columns of the fitted and original matrices, respectively. Only the columns with high percentages are related to the response. These quantities are called contributions of the components to the variables, or predictiveness.
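The two quantities can be illustrated with NumPy. This is a sketch under explicit assumptions: the captured variability is taken to be the ratio of the sum of squares of the fitted matrix to that of the original centered matrix (overall and column by column), the fitted matrix is the least‐squares fit of X on the scores, the function name `captured_variability` is hypothetical, and the data are simulated. For the example only, the first two columns of X stand in for the PLS scores.

```python
import numpy as np

def captured_variability(X, T):
    """Percent of the variability of X reproduced by its fit on the scores T."""
    P = np.linalg.lstsq(T, X, rcond=None)[0].T     # loadings, X_hat = T P^T
    X_hat = T @ P.T
    total = 100.0 * np.sum(X_hat**2) / np.sum(X**2)                    # whole matrix
    per_column = 100.0 * np.sum(X_hat**2, axis=0) / np.sum(X**2, axis=0)
    return total, per_column

# Simulated (hypothetical) centered data: 30 individuals, 5 predictors
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 5))
X -= X.mean(axis=0)

# Stand-in scores for illustration only (not PLS components)
T = X[:, :2].copy()
total, per_column = captured_variability(X, T)
```

Here the first two columns are reproduced completely by construction, whereas the remaining columns are captured only to the extent that they are correlated with the stand‐in scores; in the biplot, columns with low percentages would be read as unrelated to the response.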
The row scores are also used to predict the binary response in the logistic regression on the retained components; the binary variable can therefore also be projected on the biplot using an external logistic biplot, proposed by Demey et al. (2008) based on the proposal of Vicente‐Villardón et al. (2006). The main difference is that, in the original proposal, the scores for the individuals are obtained from the principal coordinates, whereas here they are obtained from the logistic PLS regression. The vector $\mathbf{c}$ of logistic regression coefficients defines the direction in the space spanned by the columns of $\mathbf{T}$ that best separates the presences and absences, and the expected probabilities of presence of the characteristic are
$$\pi_i = \frac{\exp\left(c_0 + \mathbf{t}_i^T\mathbf{c}\right)}{1 + \exp\left(c_0 + \mathbf{t}_i^T\mathbf{c}\right)},$$
where $\mathbf{t}_i^T$ is the $i$th row of $\mathbf{T}$. The expected probability is obtained by projecting the point onto the vector $\mathbf{c}$. The point on that direction that predicts an expected probability of 0.5 in a two‐dimensional biplot has the coordinates
$$(x, y) = \left(\frac{-c_0 c_1}{c_1^2 + c_2^2},\; \frac{-c_0 c_2}{c_1^2 + c_2^2}\right).$$
If we predict presence when the expected probability is greater than 0.5, the direction of $\mathbf{c}$ divides the representation into two regions, one predicting presence and the other predicting absence. The boundary between the two regions is a straight line perpendicular to $\mathbf{c}$ and passing through the point $(x, y)$. For more details, see Demey et al. (2008) or Vicente‐Villardón et al. (2006).
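The geometry of the prediction regions can be checked numerically. The sketch below assumes a two‐dimensional biplot with logistic coefficients c0, c1, and c2; the function names and the coefficient values are hypothetical. The point with expected probability 0.5 is the point on the direction of c at which the linear predictor c0 + c1 x + c2 y equals zero.

```python
import math

def boundary_point(c0, c1, c2):
    """Point on the direction of c = (c1, c2) with expected probability 0.5."""
    norm2 = c1 * c1 + c2 * c2
    return (-c0 * c1 / norm2, -c0 * c2 / norm2)

def expected_probability(t1, t2, c0, c1, c2):
    """Expected probability of presence for a point (t1, t2) of the biplot."""
    return 1.0 / (1.0 + math.exp(-(c0 + c1 * t1 + c2 * t2)))

# Hypothetical coefficients
c0, c1, c2 = -1.0, 2.0, 1.0
x, y = boundary_point(c0, c1, c2)
p_boundary = expected_probability(x, y, c0, c1, c2)

# Moving perpendicular to c stays on the boundary line (probability 0.5)
p_shifted = expected_probability(x - c2, y + c1, c0, c1, c2)

# Moving along c enters the region where presence is predicted
p_presence = expected_probability(x + c1, y + c2, c0, c1, c2)
```

Any individual whose marker lies on the side of the boundary toward which c points is predicted as a presence, and any individual on the opposite side as an absence.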
The goodness of fit of the logistic regression is measured using the pseudo R2 measures or the deviance. Although those measures are not completely adequate under the ridge penalization, they can still be used as descriptive indicators equivalent to the contributions of the continuous biplot.
© 2020. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
Premise
Black Sigatoka is one of the most severe banana (Musa spp.) diseases worldwide, but no methods for the rapid early detection of this disease have been reported. This paper assesses the use of hyperspectral images for the development of a partial‐least‐squares penalized‐logistic‐regression (PLS–PLR) model and a hyperspectral biplot (HS biplot) as a visual tool for detecting the early stages of black Sigatoka disease.
Methods
Young (three‐month‐old) banana plants were inoculated with a conidia suspension of the black Sigatoka fungus (Pseudocercospora fijiensis). Selected infected and control plants were evaluated using a hyperspectral imaging system at wavelengths in the range of 386–1019 nm. PLS–PLR models were run on the hyperspectral data set. The prediction power was assessed using leave‐one‐out cross‐validation as well as external validation.
Results
The PLS–PLR model was able to predict the presence of the disease with 98% accuracy. The wavelengths with the highest contribution to the classification ranged from 577 to 651 nm and from 700 to 1019 nm.
Discussion
PLS–PLR and HS biplot effectively estimated the presence of black Sigatoka disease at the early stages and can be used to graphically represent the relationship between groups of leaves and both visible and near‐infrared wavelengths.
1 Facultad de Ciencias Naturales y Matemáticas (FCNM), Escuela Superior Politécnica del Litoral (ESPOL), Guayaquil, Ecuador
2 Facultad de Ingeniería Eléctrica y Computación (FIEC), Escuela Superior Politécnica del Litoral (ESPOL), Guayaquil, Ecuador
3 Centro de Investigaciones Biotecnológicas del Ecuador (CIBE), Escuela Superior Politécnica del Litoral (ESPOL), Guayaquil, Ecuador; Facultad de Ciencias de la Vida (FCV), Escuela Superior Politécnica del Litoral (ESPOL), Guayaquil, Ecuador
4 Centro de Investigaciones Biotecnológicas del Ecuador (CIBE), Escuela Superior Politécnica del Litoral (ESPOL), Guayaquil, Ecuador
5 Department of Statistics, Salamanca University (USAL), Salamanca, Spain