1. Summary
One of the most challenging problems currently facing the global citrus industry is the Huanglongbing (HLB) disease, also known as citrus greening [1]. The disease is associated with three different variants of the Candidatus Liberibacter (Clas) [2,3,4]. Mexico is the fifth largest producer of oranges globally, with the region of San Luis Potosí representing the third most significant contributor within the country [5,6].
Infected trees display a range of leaf characteristics, including the formation of blotchy mottles, hardening of the leaves, growth in the form of rabbit ears, development of nutrient deficiency, and the formation of veins. These changes can result in fruit that is 1 cm smaller, lopsided, lighter in color, and exhibiting an inversion of color [2,4,7].
To the best of our knowledge, a limited number of publicly available datasets contain HLB-infected and healthy leaves, as shown in Table 1. Two of the most popular datasets are the Citrus Diseases image gallery [8] and the Plant Village Dataset [9]. There are other similar databases; however, those are not public or are focused on other diseases, thus are not included in the table [10,11].
The Citrus Diseases gallery has a total of 127 images, including 103 images of leaves affected by various citrus diseases and nutritional deficiencies such as canker, scab, citrus chlorotic dwarf virus, citrus stubborn disease, Tristeza virus, Mg-deficiency, N-deficiency, Zn-deficiency, and boron deficiency, and only 21 HLB images. This dataset does not include any healthy leaf images. The Plant Village Dataset has 5507 orange leaf images. Although extensive, it includes only images of HLB-infected leaves and does not contain healthy leaf images.
Rauf et al. [12] published a dataset of 609 leaf images, including 58 healthy and 551 infected with diseases such as Black spots, Canker, Scab, Greening, and Melanose; however, this dataset does not include HLB-infected leaves. Gómez-Flores et al. [13] published a 953-image dataset, including 100 healthy leaf images, 810 images from twelve different nutritional deficiencies, and only 43 images of HLB-infected leaves. Additionally, two datasets are available on Kaggle. One dataset [14] includes images of 184 healthy leaves and 190 HLB-infected leaves. The other dataset (Roboflow repository) [15] has 646 healthy, 2069 black spot, 56 canker, and 285 HLB-infected leaf images. However, the Kaggle datasets do not provide details about the acquisition methodology or the characteristics of the images. Compared with the available datasets, our dataset contains a large number of images of both healthy leaves and leaves infected with HLB, together with detailed specifications of the image acquisition methodology. To the best of our knowledge, this is the most complete dataset acquired in Latin America. We believe this dataset will become a valuable resource for developing new machine-learning applications for plant disease detection.
2. Data Description
The dataset includes 649 orange leaves, which can be classified into two categories: HLB and Control. The first category (HLB) includes 270 images of leaves with symptoms of HLB.
The second category (Control) comprises 379 images of leaves without symptoms of HLB from healthy trees. These images serve as a control group for the dataset.
The dataset is presented in two formats: (1) raw data with a white background and (2) processed data, in which image standardization is used to minimize the influence of the background and ensure a consistent frame of reference for comparison. Figure 1 shows sample images from the database.
3. Methods
This study was conducted in San Luis Potosí state, Mexico (Ciudad Fernández and Rioverde), where orange growing, the most important crop in the region, accounts for 4.7% of the state's production value [10].
With the assistance of the technical experts of the Plant Health Committee, the orchards were delineated to collect a sample of the orange leaves. A sample was obtained from each orchard, with one sample acquired for each hectare of orchard. Before this, the category to which each tree belonged was identified by utilizing the information recorded in the orchard and the data previously obtained from producers and the same committee. Special attention was paid to ensuring that the trees were free of any other diseases that could affect them.
Once a tree was selected, the leaf was cut and stored in a zip lock bag to preserve it in good condition. The leaf was then photographed in a controlled environment, with the camera positioned in front of a white sheet of paper to limit external elements’ influence, as shown in Figure 2. The photographs were acquired using cameras on several mobile phones, as described in Table 2.
Image standardization
To minimize the influence of the background in the photos and ensure a consistent frame of reference for comparison, we standardized the images using the methodology outlined in Figure 3, which consists of seven main steps:
Step 1 (convert RGB to HSV): The RGB color model combines red, green, and blue components to define a color. However, using this representation directly for leaf segmentation can be challenging because of how these components interact, especially under varying lighting conditions. In contrast, in the HSV color model, the hue channel represents the color itself (Figure 3: Step 1 Hue Channel), while the value channel represents brightness (Figure 3: Step 1 Value Channel). This separation makes it easier to distinguish objects based on color, regardless of lighting variations.
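The conversion in Step 1 can be sketched as follows. This is a minimal illustration using Python's standard `colorsys` module on a tiny hypothetical image stored as nested lists; the software actually used by the authors and the numeric scale of their Hue channel are not stated in the text, so reporting hue in degrees is an assumption made here.

```python
import colorsys

def rgb_image_to_hsv(image):
    """Convert rows of (R, G, B) tuples in 0-255 into rows of
    (hue_degrees, saturation, value) tuples."""
    hsv_image = []
    for row in image:
        hsv_row = []
        for r, g, b in row:
            # colorsys expects and returns floats in the range 0-1
            h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            hsv_row.append((h * 360.0, s, v))  # hue in degrees (assumed scale)
        hsv_image.append(hsv_row)
    return hsv_image

# Hypothetical 1 x 3 image: a green pixel, a yellow pixel, a pure white pixel
toy_image = [[(0, 128, 0), (255, 255, 0), (255, 255, 255)]]
toy_hsv = rgb_image_to_hsv(toy_image)

for rgb, hsv in zip(toy_image[0], toy_hsv[0]):
    print(rgb, "->", tuple(round(component, 3) for component in hsv))
```

Note that a pure white pixel has no defined hue and `colorsys` reports it as 0, whereas photographed paper is rarely pure white.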
Step 2 (color-based segmentation): From visual inspection, the Hue and Value (Brightness) channels are effective for background removal, as the background typically exhibits higher Hue and Value values than the leaf. This is due to the leaf’s pigmentation (yellow and green) corresponding to Hue values in the range [30,130], while the background is generally brighter. Based on this observation, an Otsu thresholding method [11] was applied to automatically determine segmentation thresholds for each channel. The segmentation is defined as
where Hue and Value are the respective channels, and T_hue and T_value are the thresholds obtained via Otsu's method. Figure 3 (Step 2) illustrates an example of this segmentation, demonstrating effective background removal. However, in some cases, small background structures remain detectable, or the thresholds cause under-segmentation, resulting in missing portions of the leaf. To address these issues, the thresholds were adjusted manually whenever segmentation errors occurred, incorporating the Saturation channel to enhance segmentation control. Finally, the largest connected component of the raw segmentation is selected as the leaf.
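The automatic part of Step 2 can be illustrated with the sketch below. It assumes integer channels in the range 0-255, combines the two threshold tests with a logical AND, and uses 4-connectivity for the connected components; none of these choices is confirmed by the text, and the manual adjustment with the Saturation channel is not covered. The function names and toy channels are hypothetical.

```python
from collections import deque

def otsu_threshold(values, levels=256):
    """Return the integer threshold t that maximises the between-class
    variance when splitting values into {v <= t} and {v > t}."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so this split is not valid
        mean0, mean1 = sum0 / w0, (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t

def segment_leaf(hue, value):
    """Label a pixel as leaf (1) when both channels are at or below their
    Otsu thresholds. Combining the tests with AND is an assumption."""
    t_hue = otsu_threshold([v for row in hue for v in row])
    t_value = otsu_threshold([v for row in value for v in row])
    return [[1 if h <= t_hue and v <= t_value else 0
             for h, v in zip(hue_row, value_row)]
            for hue_row, value_row in zip(hue, value)]

def largest_component(mask):
    """Keep only the largest 4-connected group of leaf pixels."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    best = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            seen[r][c] = True
            queue, component = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(component) > len(best):
                best = component
    cleaned = [[0] * cols for _ in range(rows)]
    for y, x in best:
        cleaned[y][x] = 1
    return cleaned

# Hypothetical 3 x 5 channels: a dark leaf in columns 1-2, bright background
# elsewhere, and one isolated leaf-like pixel in the top-right corner
hue = [[200, 45, 42, 205, 48], [210, 40, 50, 220, 215], [205, 44, 46, 210, 200]]
value = [[250, 120, 110, 245, 125], [240, 100, 130, 255, 248], [245, 115, 105, 252, 250]]
raw_mask = segment_leaf(hue, value)
leaf_mask = largest_component(raw_mask)
```

In this toy case the isolated corner pixel passes both thresholds but is discarded when only the largest component is kept.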
Step 3 (fill holes): The raw segmentation may contain holes due to variability in color and brightness. To address this, an algorithm is applied to fill the holes, ensuring the segmentation encompasses the entire leaf. Figure 3 (Step 3) illustrates an example where the holes in the raw segmentation are corrected, resulting in a complete leaf segmentation.
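The text does not name the hole-filling algorithm. One common approach, sketched here as an assumption, is to flood-fill the background starting from the image border and to treat every background pixel that cannot be reached as a hole inside the leaf. The function name and the toy mask are hypothetical.

```python
from collections import deque

def fill_holes(mask):
    """Set to 1 every background pixel that is not connected to the border."""
    rows, cols = len(mask), len(mask[0])
    outside = [[False] * cols for _ in range(rows)]
    queue = deque()
    # Seed the flood fill with all background pixels on the image border
    for r in range(rows):
        for c in range(cols):
            on_border = r in (0, rows - 1) or c in (0, cols - 1)
            if on_border and mask[r][c] == 0:
                outside[r][c] = True
                queue.append((r, c))
    # Spread through 4-connected background pixels
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if (0 <= ny < rows and 0 <= nx < cols
                    and mask[ny][nx] == 0 and not outside[ny][nx]):
                outside[ny][nx] = True
                queue.append((ny, nx))
    # Everything that is not outside background belongs to the leaf
    return [[0 if outside[r][c] else 1 for c in range(cols)] for r in range(rows)]

# Hypothetical raw segmentation with a one-pixel hole in the middle
raw = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
filled = fill_holes(raw)
```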
Step 4 (Leaf direction): Principal Component Analysis (PCA) is a statistical technique used to identify orthogonal vectors that capture most of the variance (information) in the data. It is commonly applied to reduce data dimensionality while preserving variance. In this study, PCA is used to determine the direction of maximum variability of the leaf, which corresponds to its orientation. To achieve this, the coordinates (x,y) of each pixel within the segmentation are extracted, and PCA is applied. The first principal component represents the primary direction of the leaf's variability, which is taken as its main orientation. Additionally, the mean of the coordinates in each dimension is taken as the center of the leaf. Figure 3 (Step 4) illustrates the detected center of the leaf. The blue arrow represents the direction of the first principal component (main direction), while the second arrow corresponds to the second principal component. This approach effectively and accurately identifies the leaf's orientation.
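For two-dimensional pixel coordinates, the first principal component can be obtained in closed form from the 2 × 2 covariance matrix, which gives a compact way to illustrate Step 4. The sketch below is an assumption about one possible implementation, not the authors' code; it returns the center as the coordinate mean and the orientation as an angle measured from the x-axis in image coordinates (y grows downward).

```python
import math

def leaf_orientation(mask):
    """Return ((cx, cy), theta): the mean pixel coordinate of the mask and
    the angle in radians of the first principal component."""
    xs, ys = [], []
    for row_index, row in enumerate(mask):
        for col_index, is_leaf in enumerate(row):
            if is_leaf:
                xs.append(col_index)   # x is the column index
                ys.append(row_index)   # y is the row index
    n = len(xs)
    cx, cy = sum(xs) / n, sum(ys) / n
    # Entries of the 2 x 2 covariance matrix of the coordinates
    sxx = sum((x - cx) ** 2 for x in xs) / n
    syy = sum((y - cy) ** 2 for y in ys) / n
    sxy = sum((x - cx) * (y - cy) for x, y in zip(xs, ys)) / n
    # Direction of the eigenvector with the largest eigenvalue
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return (cx, cy), theta

# Hypothetical mask whose leaf pixels lie on the main diagonal
diagonal_mask = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]
center, theta = leaf_orientation(diagonal_mask)
```

For this diagonal toy mask the center is (1.5, 1.5) and the orientation is 45 degrees.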
Step 5 (RGB background removal): To remove the background from the RGB image, the segmentation mask obtained in Step 3 (with intensity values of 1 for the leaf and 0 for the background) is multiplied elementwise with the original RGB image. This operation retains only the leaf region while setting the background to zero. Figure 3 (Step 5) shows the result, where the background has been successfully removed from the original RGB image.
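The elementwise multiplication in Step 5 can be written in a few lines. The sketch assumes the image is stored as rows of (R, G, B) tuples and the mask as rows of 0/1 integers; the function name and toy data are hypothetical.

```python
def remove_background(rgb_image, mask):
    """Multiply every color channel by the mask value (1 = leaf, 0 = background)."""
    return [
        [tuple(channel * m for channel in pixel)
         for pixel, m in zip(image_row, mask_row)]
        for image_row, mask_row in zip(rgb_image, mask)
    ]

# Hypothetical 1 x 2 image: one leaf pixel and one background pixel
toy_rgb = [[(10, 200, 30), (250, 250, 250)]]
toy_mask = [[1, 0]]
leaf_only = remove_background(toy_rgb, toy_mask)
```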
Step 6 (Align principal component with the x-axis): The primary principal component (blue arrow) forms an angle with respect to the x-axis. To align it, the RGB image (with the background removed) is rotated clockwise by this angle around the center determined in Step 4. The result is an image where the principal component (indicated by a blue arrow) aligns with the x-axis, as shown in Figure 3 (Step 6).
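The geometry of Step 6 can be checked on pixel coordinates alone. The sketch below rotates points about the center by the negative of the orientation angle, so that the principal direction maps onto the x-axis. Resampling a full image (interpolation, canvas size) is left to an image library and is not shown; the function name and the angle convention are assumptions.

```python
import math

def align_with_x_axis(points, center, theta):
    """Rotate (x, y) points about center so that the direction
    (cos(theta), sin(theta)) ends up along the x-axis."""
    cx, cy = center
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    aligned = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        aligned.append((cos_t * dx + sin_t * dy, -sin_t * dx + cos_t * dy))
    return aligned

# Hypothetical leaf pixels on a 45-degree diagonal, centered at (1.5, 1.5)
diagonal = [(0, 0), (1, 1), (2, 2), (3, 3)]
aligned = align_with_x_axis(diagonal, (1.5, 1.5), math.pi / 4)
```

After the rotation, all points have a y-coordinate of approximately zero, that is, they lie along the x-axis through the center.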
Step 7 (Final image): The final step involves detecting the bounding box that encloses only the leaf. In Figure 3 (Step 6), the bounding box is represented by a yellow rectangle. The final cropped image is presented in Figure 3 (Step 7).
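A minimal sketch of the bounding-box crop in Step 7 is given below. It assumes the mask contains at least one leaf pixel; the function names and the toy mask are hypothetical.

```python
def bounding_box(mask):
    """Return (top, bottom, left, right), inclusive, of the leaf pixels."""
    rows_with_leaf = [r for r, row in enumerate(mask) if any(row)]
    cols_with_leaf = [c for c in range(len(mask[0]))
                      if any(row[c] for row in mask)]
    return (rows_with_leaf[0], rows_with_leaf[-1],
            cols_with_leaf[0], cols_with_leaf[-1])

def crop(image, box):
    """Cut the image (any list of rows) down to the bounding box."""
    top, bottom, left, right = box
    return [row[left:right + 1] for row in image[top:bottom + 1]]

# Hypothetical aligned mask with the leaf in rows 1-2 and columns 1-3
aligned_mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
box = bounding_box(aligned_mask)
cropped = crop(aligned_mask, box)
```

The same `crop` call can be applied to the background-removed RGB rows, since it only slices rows and columns.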
4. Conclusions
The dataset described in this paper includes 649 images of orange leaves divided into HLB-infected (n = 270) and Healthy (n = 379). This dataset addresses a gap in HLB research by providing standardized, background-removed images of orange leaves from Mexico. It is the second largest public database for HLB detection that includes images of healthy leaves. This dataset can contribute to training new machine learning or deep learning classifiers for HLB detection, particularly in the early stages. It also enables comparative studies of HLB across geographic regions. Image standardization can also contribute to the preprocessing pipeline for future HLB image analysis.
Although our dataset includes a small number of HLB images compared with other public datasets, the fact that it includes healthy images, balanced classes, a controlled acquisition process, and a preprocessing stage increases its reliability for the development of classifier models. The variability introduced by acquiring images with multiple smartphone cameras, together with a higher resolution than other similar databases, enhances its potential for real-world applications. In addition to classification tasks, this dataset can be used for applications such as segmentation for leaf morphology analysis, detection of anomalies associated with HLB symptoms or nutritional deficiency, and disease progression monitoring to complement diagnosis. Moreover, our dataset will be expanded as new images become available.
The authors recognize that the most accurate method for determining the presence of HLB is quantitative real-time polymerase chain reaction (qPCR). However, the cost of synthesizing probes for 649 distinct tree samples is prohibitively high, limiting the feasibility of this approach. For this reason, we collaborated with the Plant Health Committee, whose technical experts monitor pests and diseases in the region daily and know the areas where HLB is present and the areas where the trees are healthy. Most plants have been diagnosed by the government secretariat focused on disease control. For those plants that were not diagnosed by qPCR, the experts were confident in indicating which ones had symptoms of HLB and which were healthy. A further limitation of this study is that the images could not be acquired in situ because external factors such as sunlight, shadows, and brightness affected their quality.
Future work will include expanding the dataset with in-field images and qPCR-validated samples, integrating multispectral or hyperspectral imaging for multimodal analysis, and developing classifier models optimized for mobile environments toward a smartphone-based acquisition methodology.
In summary, our dataset addresses an important gap in HLB disease research. With more than 250 standardized, high-resolution images of each class, it offers a foundation for both academic and applied model innovations in agricultural diagnostics.
Conceptualization: J.C.T.-G. and M.G.R.-E.; Data curation: J.C.T.-G., J.A.O., X.G.Á.C., L.M.C.I. and P.M.M.O.; Formal analysis: J.C.T.-G. and P.H.H.; Funding acquisition: J.C.T.-G. and M.G.R.-E.; Investigation: J.C.T.-G., M.G.R.-E. and P.H.H.; Methodology: J.C.T.-G. and P.H.H.; Project administration: J.C.T.-G.; Resources: J.C.T.-G., J.A.O., X.G.Á.C., L.M.C.I. and P.M.M.O.; Software: J.C.T.-G. and P.H.H.; Supervision: J.C.T.-G. and M.G.R.-E.; Validation: P.H.H.; Visualization: J.C.T.-G., M.G.R.-E. and P.H.H.; Writing—original draft: J.C.T.-G.; Writing—review and editing: J.C.T.-G., P.H.H., V.A.-G., M.G.R.-E., E.G., J.S.M., E.R.A.-S. and A.A. All authors have read and agreed to the published version of the manuscript.
Not applicable.
Not applicable.
Repository name: Orange Leaves Images Dataset for the Detection of Huanglongbing. Data identification number: DOI:
The authors would like to express their gratitude to “Secretaría de Desarrollo Agropecuario y Recursos Hidráulicos (SEDARH)”, especially to Noel Isaí Pérez Robles, and “Comité Estatal de Sanidad Vegetal of San Luis Potosí” for their invaluable support. This work has been made possible partially by grant number 2023-329644 from the Chan Zuckerberg Initiative DAF, an advised fund of the Silicon Valley Community Foundation. JCTG also extends acknowledgments to “Consejo Nacional de Humanidades, Ciencias y Tecnologías” (CONAHCYT) for postdoctoral fellowship 4630373 and for the National System of Researchers (SNII) under 346243.
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
The following abbreviations are used in this manuscript:
HLB | Huanglongbing |
ML | Machine Learning |
DL | Deep Learning |
Clas | Candidatus Liberibacter |
RGB | Red Green Blue |
HSV | Hue Saturation Value |
PCA | Principal Component Analysis |
qPCR | quantitative real-time polymerase chain reaction |
Footnotes
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
Figure 1 Images from the database. The images of the top row correspond to leaves with symptoms of HLB, while the images of the bottom row are from healthy leaves of orange trees. The images of panels (a,c,e,g) are taken with the camera of different smartphones, while panels (b,d,f,h) are from the processed data with a standardization that minimizes the influence of the background.
Figure 2 Flow chart of the database acquisition with leaves of orange trees showing symptoms of HLB and those that are healthy.
Figure 3 Workflow illustrating the seven main steps for image standardization: (1) convert the image from the RGB color model to HSV, (2) segment the leaf using a thresholding method applied to each HSV channel, (3) fill holes to ensure the entire leaf is detected, (4) identify the principal direction of the leaf, (5) remove the background from the RGB image using the segmentation, (6) align the leaf with the x- and y-axes based on its principal direction, and (7) crop the image to a bounding box that tightly encloses the leaf.
Table 1 Public databases for HLB detection. Only the numbers of images in the HLB and healthy classes of each dataset are shown, although some datasets contain additional categories.
Database Name/Author | Reference | HLB Images | Healthy Images | Total Images | Minimum Resolution | Maximum Resolution |
---|---|---|---|---|---|---|
Citrus Diseases Image Gallery | [8] | 21 | 0 | 127 | 496 × 397 | 1882 × 1201 |
Plant Village Dataset | [9] | 5507 | 0 | 5507 | 256 × 256 | 256 × 256 |
Rauf et al. | [12] | 204 | 58 | 262 | 256 × 256 | 256 × 256 |
Gómez-Flores et al. | [13] | 43 | 100 | 143 | 4128 × 3096 | 4128 × 3096 |
Citrus leaves images divided in Huanglongbing (HLB) infected and healthy | [14] | 190 | 184 | 374 | 720 × 1280 | 4032 × 2268 |
Roboflow repository | [15] | 285 | 646 | 931 | 640 × 640 | 640 × 640 |
Our dataset | This work | 270 | 379 | 649 | 1800 × 4000 | 4624 × 3468 |
Table 2 Smartphone, images, and camera features. The table lists the characteristics of each cellphone used to capture the leaf images and the number of images acquired by each camera, along with the average and standard deviation of the contrast and the percentage of luminescence.
Cellphone Used | Camera Features | Healthy Images | HLB Images | Total Images | Image Resolution | Contrast | Luminescence (%) |
---|---|---|---|---|---|---|---|
iPhone 13 | 12 MP, ƒ/1.6 aperture | 4 | 50 | 54 | 1800 × 4000 | | 69.26 ± 0.25 |
Motorola Edge 40 Neo | 50 MP wide-angle, ƒ/1.8 aperture | 203 | 32 | 235 | 4624 × 3468 | 65.0 ± 5.42 | |
Xiaomi Poco C65 | 50 MP, ƒ/1.8 aperture | 21 | 77 | 98 | 4624 × 3468 | | 68.71 ± 3.91 |
Samsung Galaxy A32 | 64 MP, ƒ/1.8 | 100 | 61 | 161 | 2084 × 4624 | | 67.58 ± 5.06 |
Samsung Galaxy A52 | 64 MP, ƒ/1.8 | 51 | 50 | 101 | 3468 × 4624 | | 71.87 ± 4.81 |
Total | | 379 | 270 | 649 | --- | --- | --- |
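As a quick consistency check, the per-phone counts reported in the table can be summed and compared with the class totals stated in the Data Description. The snippet below only re-adds the numbers from the table; the variable names are illustrative.

```python
# (healthy, HLB) image counts per cellphone, copied from the table
counts = {
    "iPhone 13": (4, 50),
    "Motorola Edge 40 Neo": (203, 32),
    "Xiaomi Poco C65": (21, 77),
    "Samsung Galaxy A32": (100, 61),
    "Samsung Galaxy A52": (51, 50),
}

healthy_total = sum(healthy for healthy, _ in counts.values())
hlb_total = sum(hlb for _, hlb in counts.values())
total = healthy_total + hlb_total
hlb_share = hlb_total / total  # fraction of the dataset labeled HLB

print(healthy_total, hlb_total, total, round(hlb_share, 3))
```

The sums reproduce the reported 379 healthy and 270 HLB images (649 in total), so roughly 42% of the images belong to the HLB class.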
1. Silva, J.R.D.; Boaretto, R.M.; Lavorenti, J.A.L.; dos Santos, B.C.F.; Coletta-Filho, H.D.; Mattos, D. Effects of Deficit Irrigation and Huanglongbing on Sweet Orange Trees. Front. Plant Sci.; 2021; 12, 731314. [DOI: https://dx.doi.org/10.3389/fpls.2021.731314] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/34721459]
2. EPPO Global Database. Available online: https://gd.eppo.int/ (accessed on 21 June 2024).
3. Home. Available online: https://www.cabi.org/ (accessed on 21 June 2024).
4. Bové, J.M. Huanglongbing: A Destructive, Newly-Emerging, Century-Old Disease of Citrus. J. Plant Pathol.; 2006; 88, pp. 7-37.
5. Producción de Cítricos en México. Available online: http://www.gob.mx/publicaciones/articulos/produccion-de-citricos-en-mexico?idiom=es (accessed on 2 July 2024).
6. Secretaría de Agricultura y Desarrollo Rural. México, Quinto Productor Mundial de Naranja. Available online: http://www.gob.mx/agricultura/articulos/mexico-quinto-productor-mundial-de-naranja (accessed on 3 November 2024).
7. Floyd, J.; Krass, C. New Pest Response Guidelines: Citrus Greening Disease; Animal and Plant Health Inspection Service: Riverdale, MD, USA, 2008.
8. Citrus Diseases Image Gallery. Available online: https://idtools.org/citrus_diseases/ (accessed on 20 June 2024).
9. PlantVillage Dataset. Available online: https://www.kaggle.com/datasets/emmarex/plantdisease (accessed on 5 July 2024).
10. Syed-Ab-Rahman, S.F.; Hesamian, M.H.; Prasad, M. Citrus disease detection and classification using end-to-end anchor-based deep learning model. Appl. Intell.; 2022; 52, pp. 927-938. [DOI: https://dx.doi.org/10.1007/s10489-021-02452-w]
11. Qiu, R.-Z.; Chen, S.-P.; Chi, M.-X.; Wang, R.-B.; Huang, T.; Fan, G.-C.; Zhao, J.; Weng, Q.-Y. An automatic identification system for citrus greening disease (Huanglongbing) using a YOLO convolutional neural network. Front. Plant Sci.; 2022; 13, 1002606. [DOI: https://dx.doi.org/10.3389/fpls.2022.1002606] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/36605957]
12. Rauf, H.T.; Saleem, B.A.; Lali, M.I.U.; Khan, M.A.; Sharif, M.; Bukhari, S.A.C. A citrus fruits and leaves dataset for detection and classification of citrus diseases through machine learning. Data Brief; 2019; 26, 104340. [DOI: https://dx.doi.org/10.1016/j.dib.2019.104340] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/31516936]
13. Gómez-Flores, W.; Garza-Saldaña, J.J.; Varela-Fuentes, S.E. CitrusUAT: A dataset of orange Citrus sinensis leaves for abnormality detection using image analysis techniques. Data Brief; 2024; 52, 109908. [DOI: https://dx.doi.org/10.1016/j.dib.2023.109908] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/38093853]
14. Citrus Leaves Images Divided in Huanglongbing (HLB) Infected and Healthy. Available online: https://www.kaggle.com/datasets/oarcanjomiguel/citrus-greening (accessed on 5 July 2024).
15. Dimitra Citrus Dataset Dataset > Overview. Available online: https://universe.roboflow.com/dimitra-el1gk/citrus-dataset-hvhep (accessed on 5 July 2024).
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
In agriculture, the use of machine learning (ML) and deep learning (DL) has increased significantly in the last few years. The use of ML and DL for image classification in plant disease has generated significant interest because of their cost, automation, scalability, and capacity for early detection. However, high-quality image datasets are required to train robust classifier models for plant disease detection. In this work, we have created an image dataset of 649 orange leaves divided into two groups: control (n = 379) and huanglongbing (HLB) disease (n = 270). The images were acquired with several high-resolution smartphone cameras and processed to remove the background. The dataset enriches the information on characteristics and symptoms of citrus leaves with HLB and healthy leaves. This enhancement makes the dataset potentially valuable for disease identification through leaf segmentation and abnormality detection, particularly when applying ML and DL models.
1 Facultad de Ciencias, Universidad Autónoma de San Luis Potosí, Av. Chapultepec 1570, Privadas del Pedregal, San Luis Potosí 78295, Mexico (J.C.T.-G.)
2 Comité Estatal de Sanidad Vegetal de San Luis Potosí, Rioverde 796133, Mexico
3 Facultad de Ciencias, Universidad Autónoma de San Luis Potosí, Av. Chapultepec 1570, Privadas del Pedregal, San Luis Potosí 78295, Mexico (J.C.T.-G.); Laboratorio Nacional-Centro de Investigación, Instrumentación e Imagenología Médica, Facultad de Ciencias, Universidad Autónoma de San Luis Potosí, Av. Chapultepec 1570, Privadas del Pedregal, San Luis Potosí 78295, Mexico