Societal Impact Statement
Diverse gene pools are fundamental to crop improvement, biodiversity maintenance and environmental management. The UKCropDiversity-HPC high-performance computing resource enables seven UK institutes to perform plant and conservation research with increased efficiency, cost-effectiveness and environmental sustainability. It supports research across numerous areas, including bioinformatics, genetics, phenomics and conservation, as well as Artificial Intelligence approaches. Its utilisation supports many United Nations Sustainable Development Goals, including Goals 2 (Zero Hunger), 13 (Climate Action), 15 (Life on Land), 9 (Industry, Innovation and Infrastructure) and 4 (Quality Education). Accordingly, UKCropDiversity-HPC helps maximise the societal impact of research undertaken at our seven institutes, driving positive change for future generations.
INTRODUCTION
In recent years, the scientific community has benefited from the development of more accurate, affordable and scalable technologies for biological assay, leading to the accumulation of an ever-increasing volume of data (Chen et al., 2021; Kelling et al., 2009; Marx, 2013; Vitorino, 2023). This includes data generated directly from high-throughput sequencing platforms (e.g. Walkowiak et al., 2020), large-scale imaging technologies (e.g. Shen et al., 2024) and real-time environmental sensors (Shakoor et al., 2019), as well as the intermediary files generated during their analysis and interpretation. These data provide a wealth of scientific opportunity but also bring considerable challenges (Kelling et al., 2009; Marks et al., 2021; Sahu et al., 2023; Wordsworth et al., 2018), including: (i) effective data management, analysis and interpretation; (ii) access to compute resources capable of storing and processing these data; and (iii) acquiring and training the expertise needed to perform these tasks. UKCropDiversity-HPC (high-performance computing) was established in 2019 as a response to the growing demand for the computational infrastructure necessary to address such needs. UKCropDiversity-HPC serves seven United Kingdom (UK)-based research institutes (the James Hutton Institute, Niab, Natural History Museum, Royal Botanic Gardens Kew, Scotland's Rural College, Royal Botanic Garden Edinburgh and the University of St Andrews) united by their focus on agricultural and environmental research and conservation. In addition, these institutes provide training in and use of novel machine learning models. Collectively the research is focused on improving our understanding of the diversity underpinning natural and managed environments (e.g. Bachman et al., 2024; Gagnon et al., 2023), crop biology (e.g. Graham et al., 2022; Tsang et al., 2024) and accelerating pre-breeding and breeding technologies (e.g. Schreiber et al., 2024; Wang et al., 2023).
The establishment of collective computing capabilities serving synergistic research areas comes with numerous benefits, including the efficient sharing of resources, the provision of joint training and reduced individual cost and environmental impact. Specifically, UKCropDiversity-HPC sets out to address four main overarching requirements:
- To meet the increasing demand for high-throughput computational capacity to support genomics, phenotyping, imaging and the application of artificial intelligence (AI) for plant and conservation research, in line with United Nations Sustainable Development Goals (SDGs) 2 (Zero Hunger), 13 (Climate Action) and 15 (Life on Land) (United Nations, 2015).
- To provide training of scientists in the best use/practice of the facilities, programming languages, bioinformatics, data analysis, computer vision and machine learning, supporting SDG 4 (Quality Education).
- To further refine and grow an extensive network of research collaborations across the institutes and their collaborators, including industry partners, in alignment with SDG 9 (Industry, Innovation and Infrastructure).
- To minimise financial cost and energy consumption associated with individual computing infrastructures, thereby reducing our environmental footprint, thus supporting SDG 7 (Affordable and Clean Energy) and SDG 12 (Responsible Consumption and Production).
Despite such benefits, scientific reporting on the implementation of shared compute resources is uncommon. To address this gap, here we describe the recent establishment and utilisation of the shared HPC infrastructure ‘UKCropDiversity-HPC’, including details of the hardware and software ecosystem, and how it maximises the capacity to perform data analysis in the fields of plant science and biodiversity across seven institutes based in the UK. This paper presents UKCropDiversity-HPC as a model for the establishment of similar collaborative frameworks in other research settings. To this end, we consider the role UKCropDiversity-HPC has played in addressing our escalating collective demand for computational resources. We describe the technical implementation of the resource, our simple collaborative framework, how it contributes to transformative research across our institutes, and detail key lessons learned. Furthermore, we highlight the environmental sustainability and cost efficiency of this resource, underscoring its positive societal impact in relation to the United Nations SDGs. By lowering costs, promoting partnerships, enhancing research productivity and removing the requirement for pay-per-use access for users, UKCropDiversity-HPC illustrates the value of shared scientific resources. Accordingly, it serves as a recent successful example to help advocate for the wider establishment of such resources in the scientific research and development sector.
IMPLEMENTATION
UKCropDiversity-HPC is designed to enable the processing and storage of large volumes of data generated by cutting-edge ‘omics’, imaging and sensing technologies. In addition, the resource enables the application of AI, deep learning and computer vision approaches by providing access to significant central and graphics processing unit (CPU and GPU) capabilities, including high-memory nodes. The management of these resources is coordinated by a steering committee that agrees and implements policies on use and access. The committee meets quarterly via conference calls. The guiding principle is to support the development of our research staff, as they are key to the delivery of our science. Users actively engage with each other and collaborate through a dedicated online collaborative platform, where they can ask and respond to questions, share experiences and raise issues. This real-time communication between users, administrators and technical staff helps to improve the collaborative network and promotes interaction between researchers from different institutes.
The UKCropDiversity-HPC platform incorporates the following key features:
Computational power
With a combined total of >6 thousand CPU cores and >490 thousand GPU cores, the system can process information at a maximum theoretical peak performance of 2.17 petaflops (1 petaflop = 10^15 floating-point operations per second). Forty-five terabytes of memory are accessible in the cluster, with specific individual nodes able to access up to 4 TB of memory when needed. Approximately 85% of the cluster's performance comes from dedicated GPU accelerator cards. These are essential for the handling of large-scale data processing and complex model training and contribute to a favourable balance between performance and energy use.
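To make the figures above concrete, the short Python sketch below splits the quoted theoretical peak between GPU and CPU hardware and derives an average memory-per-core ceiling. It uses only the numbers stated in the text; because the core count is given as a lower bound (">6 thousand"), the memory-per-core value is an upper bound on the cluster-wide average, not a per-node specification.

```python
# Figures quoted in the text; CPU_CORES is a lower bound (">6 thousand").
PEAK_PFLOPS = 2.17        # theoretical peak performance, petaflops
GPU_SHARE = 0.85          # approximate fraction delivered by GPU accelerator cards
TOTAL_MEMORY_TB = 45      # memory accessible across the cluster
CPU_CORES = 6000


def split_peak(peak_pflops, gpu_share):
    """Return (gpu_pflops, cpu_pflops) for a given peak and GPU fraction."""
    gpu = peak_pflops * gpu_share
    return gpu, peak_pflops - gpu


gpu_pflops, cpu_pflops = split_peak(PEAK_PFLOPS, GPU_SHARE)

# Decimal units: 1 TB = 1000 GB. More cores than the lower bound would
# reduce this average, so it is a ceiling rather than an exact value.
memory_per_core_gb = TOTAL_MEMORY_TB * 1000 / CPU_CORES

print(f"GPU contribution: ~{gpu_pflops:.2f} petaflops")
print(f"CPU contribution: ~{cpu_pflops:.2f} petaflops")
print(f"Average memory per CPU core: at most ~{memory_per_core_gb:.1f} GB")
```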
Storage and network infrastructure
The platform offers 8 PB of primary storage, necessary for accommodating the large volumes of data being generated. The storage system utilises the BeeGFS filesystem to deliver scalable, high-performance primary and secondary storage, with networking speeds of up to 100 Gbps. Additionally, 128 TB of all-flash storage is allocated for accelerating file operations during active job processing.
Software ecosystem
The platform utilises a range of open-source solutions, including Rocky Linux for the operating system, BeeGFS for storage and Slurm (Yoo et al., 2003) for job scheduling. Conda and Mamba (Grüning et al., 2018) are used to simplify the installation of dependencies and manage software packages and environments. Apptainer (formerly Singularity) and Docker (Dykstra, 2022; Merkel, 2014) are used to create and manage container environments. Ansible is used for configuration management, and Prometheus (Rabenstein & Volz, 2015) for system monitoring. Bespoke tools and pipelines have been developed and deployed across the platform to benefit both local users and public-facing services, ensuring optimal performance and user experience (Figure 1). The system's help documentation is also an open, collaborative resource.
[IMAGE OMITTED. SEE PDF]
By combining these foundational elements with well-engineered workflow managers such as Snakemake (Mölder et al., 2021) and Nextflow (Di Tommaso et al., 2017), robust, comprehensive and user-friendly access to bioinformatics and data processing tools is being established for all of our users, including those with limited expertise.
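As an illustration of how the scheduler and environment manager described above fit together, the following minimal Python sketch assembles a Slurm batch script that runs a command inside a Conda environment. The job name, environment name, resource values and command are hypothetical placeholders and not settings used on UKCropDiversity-HPC; the `#SBATCH` options shown and `conda run` are standard Slurm and Conda features.

```python
def make_sbatch_script(job_name, cpus, mem_gb, time_limit, conda_env, command):
    """Build the text of a Slurm batch script as a single string.

    All argument values are supplied by the caller; nothing here is
    specific to any particular cluster.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --cpus-per-task={cpus}",
        f"#SBATCH --mem={mem_gb}G",
        f"#SBATCH --time={time_limit}",
        # %x expands to the job name and %j to the job ID.
        "#SBATCH --output=%x_%j.log",
        "",
        # Run the command inside the named Conda environment.
        f"conda run -n {conda_env} {command}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical example values, for illustration only.
script = make_sbatch_script(
    job_name="align_reads",
    cpus=8,
    mem_gb=32,
    time_limit="04:00:00",
    conda_env="mapping",
    command="python align.py --threads 8",
)
print(script)
```

Requesting only the cores, memory and time a job actually needs keeps more of the shared cluster available to other users; the resulting text could be saved to a file and submitted with `sbatch`.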
Security
Data is stored within restricted-access data centres. Primary storage servers are protected by multiple levels of physical disk redundancy and mirroring, and backups are taken daily. Daily snapshots are maintained for 30 days, with additional monthly snapshots kept for 18 months. Backups are kept in physically separate buildings. Data held in the archive is kept for a minimum of ten years and duplicated across two geographically separate sites.
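The snapshot retention rules described above can be expressed as a small decision function. The Python sketch below is only an illustration under stated assumptions: it treats the snapshot taken on the first day of each month as the "monthly" snapshot and counts the 18-month window in whole calendar months, neither of which is specified in the text.

```python
from datetime import date

DAILY_RETENTION_DAYS = 30       # from the text: daily snapshots kept for 30 days
MONTHLY_RETENTION_MONTHS = 18   # from the text: monthly snapshots kept for 18 months


def months_between(earlier, later):
    """Whole calendar months elapsed from `earlier` to `later`."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1
    return months


def is_retained(snapshot_day, today):
    """Return True if a snapshot taken on `snapshot_day` is still kept on `today`."""
    age_days = (today - snapshot_day).days
    if age_days < 0:
        raise ValueError("snapshot date is in the future")
    if age_days <= DAILY_RETENTION_DAYS:
        return True
    # Assumption: the monthly snapshot is the one taken on the 1st of the month.
    is_monthly = snapshot_day.day == 1
    return is_monthly and months_between(snapshot_day, today) <= MONTHLY_RETENTION_MONTHS


today = date(2025, 6, 15)
for snapshot in (date(2025, 6, 1), date(2025, 4, 20), date(2024, 3, 1), date(2023, 10, 1)):
    print(snapshot.isoformat(), "kept" if is_retained(snapshot, today) else "expired")
```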
COLLABORATION
Collaboration and efficient data sharing in science have been historically important for fostering innovation, sharing resources and tackling complex challenges. Established in 2019, our UKCropDiversity-HPC currently has >550 users and has seen >21 million submitted jobs using >11,500 CPU years of processing time. It enables research in crop and plant bioinformatics, genomics, conservation genetics, agroforestry, taxonomy and plant pathogens, and was also utilised during the Scottish COVID-19 response (Pooley et al., 2022). By enabling shared access to large datasets, UKCropDiversity-HPC not only facilitates collaboration but also allows for more comprehensive analyses and insights across different scientific disciplines. Promoting such ways of working will be critical in addressing many of the world's current grand challenges, such as the effects of climate change (SDG 13: Climate Action), sustainable food production (SDG 2: Zero Hunger) and loss of biodiversity (SDG 15: Life on Land).
RESEARCH OUTCOMES
The UKCropDiversity-HPC has enabled multiple significant research outcomes, including 132 publications and active engagement with researchers from >40 collaborating research organisations (Figure 2). These collaborations have contributed to advancements in plant science and biodiversity research, with a focus on crop genetics, genomics, phenomics and biodiversity conservation.
- Genomic Insights: UKCropDiversity-HPC is being used for research in disease resistance, physiology and developmental biology, evolution, classification, risk of extinction and the development of biological resources and methods (SDG 2: Zero Hunger, SDG 9: Industry, Innovation and Infrastructure, SDG 13: Climate Action, and SDG 15: Life on Land). Recent work has identified genetic loci linked to fruit development stages in red raspberries and described candidate genes within these loci (Graham et al., 2022), while Tsang et al. (2024) have shed light on genetic components influencing wheat root hair growth and its impact on plant performance. In the field of crop disease resistance, significant breakthroughs have been made. SMRT–AgRenSeq-d, a method to identify candidate genes for nematode resistance in potato, has been developed (Wang et al., 2023), while Kaur et al. (2024) have provided insights into the molecular interactions between potato immune receptors and the oomycete pathogen that causes potato blight (Phytophthora infestans; the disease triggered the Irish Potato Famine, which led to ~1 million deaths and social unrest, and it remains a major threat to the potato crop to this day). Our understanding of evolution and adaptation has also been enhanced. Researchers have investigated how fruit flies rapidly adapt to changes in sexual selection pressures (Barata et al., 2023), while Louis et al. (2023) have shown, using paleogenomics, that dolphins have repeatedly and rapidly adapted to new coastal habitats. In taxonomy, UKCropDiversity-HPC has supported detailed classifications and evolutionary histories of various species. Genomic data were used to elucidate the phylogeny, biogeography and ecological diversification of New Caledonian palms (Pérez-Calle et al., 2024), while Rees et al. (2023) combined genetic and morphological data to refine the classification of Brazil's national tree (Paubrasilia echinata, endemic to the Atlantic Forest that once covered much of Brazil's eastern seaboard). UKCropDiversity-HPC has also been utilised in conservation efforts. For instance, Speak et al. (2024) used samples from captive-bred Pink Pigeons to demonstrate how advancements in bioinformatics allow for the analysis of genetic load in zoo populations of threatened species, using mutation-impact scores to identify optimal mate pairings to maximise the fitness of offspring. UKCropDiversity-HPC has also assisted doctoral students, with Marsh et al. (2023) assessing the performance of biological relatedness inference software specifically designed for degraded genetic material, a topic of significance for reintroduction programmes and when monitoring the expansion and origin of invasive species. Finally, UKCropDiversity-HPC has driven the development of robust methods for analysing complex biological data with a high degree of accuracy. For instance, Cock et al. (2023) introduced ‘THAPBI PICT’, a fast, cautious and accurate metabarcoding analysis pipeline that processes Illumina paired-end amplicon reads to generate user-friendly species classification reports, and Smith et al. (2024) have released ‘Resistify’, a tool that rapidly and accurately annotates plant immune receptor genes in genomic DNA sequence. These genomic studies not only enhance our understanding of biology but also support global food and biodiversity security efforts.
- Accelerated Breeding Programmes: Advanced computational tools for genetic and genomic analysis are increasingly being employed to propel breeding efforts forward (SDG 2: Zero Hunger). To this end, research enabled via the use of UKCropDiversity-HPC is contributing on multiple fronts. Notable recent examples include the use of high-throughput workflows, powered by Snakemake, to streamline plant disease resistance gene discovery and improve analytical reproducibility and accessibility (Adams et al., 2023), and the release of comprehensive barley genomic resources that facilitate genome-wide association studies and gene expression analyses, empowering breeders with valuable insights for targeted breeding efforts (Schreiber et al., 2024). Genetic loci have been discovered for key agronomic traits using a genetically diverse nested association mapping (NAM) population derived from synthetic hexaploid wheat (specifically generated wheat germplasm that re-creates the speciation event that occurred in Neolithic farmers' fields when a progenitor of pasta wheat hybridised with a wild wheat species to form modern bread wheat) (Wright et al., 2024). This useful resource for identifying beneficial genetic loci from synthetic hexaploid wheat is now being used by breeders to improve modern UK elite wheat cultivars. The refinement of the barley reference transcriptome through BaRTv2.18 is enabling greater precision in RNA-seq quantification, in turn facilitating deeper investigations into barley breeding (Coulter et al., 2022). UKCropDiversity-HPC has also been applied in animal breeding to study the benefits of imputing genotypes across dairy cattle breeds rather than simply within breeds, which is expected to encourage an increase in the number of farmers practising cross-breeding and a consequential increase in genotyping of livestock. In addition to these genotyping efforts, James et al. (2024) have used whole-genome and RNA sequence data combined with deep phenotyping for ~2,100 Holstein cattle to identify candidate genes for feed efficiency, providing insights for breeding and genotyping chip design. These research projects highlight the importance of using advanced computational genomics tools and analyses to accelerate breeding programmes and provide breeders with the cutting-edge resources needed to traverse the complex landscape of crop and livestock improvement.
- Biodiversity Genomics: UKCropDiversity-HPC has helped catalyse advancements in biodiversity genomics and conservation efforts. Research in these areas is crucial to help address biodiversity decline in natural and managed environments, which impacts all orders of life including plants (SDGs 13: Climate Action, and 15: Life on Land). Work in this research sector ranges from generating barcodes and reference genomes for all UK species through the Darwin Tree of Life project to empower environmental DNA monitoring and population genetics (Darwin Tree of Life Project Consortium, 2022) to studies into the drivers of diversity in tropical systems (Forrister et al., 2023) and crop wild relatives (Gagnon et al., 2023). Resources for genetic studies in tropical plants have been produced (Campos-Dominguez et al., 2022), allowing the analysis of nuclear, chloroplast and mitochondrial phylogenetic patterns across megadiverse genera (Ardi et al., 2022). The low cost to the user and supportive environment provided via UKCropDiversity-HPC have enabled students from the Global South to produce detailed studies of diversity patterns in their own region's flora (Phang et al., 2023). Similarly, UK-based students have made use of the facility for biodiversity research, for example, to investigate the evolutionary history of the antelope genus Hippotragus using degraded museum specimens (Plaxton et al., 2023). The resource has enabled the large-scale (and thus computationally demanding) analysis of flowering plants as a whole: for example, in a phylogenetic study which provided new insights into their evolutionary history (Zuntini et al., 2024); and in the automated prediction of extinction risk for every flowering plant species (Bachman et al., 2024).
- AI-powered trait analysis: The use of AI technologies is emerging as an important tool for the plant sciences, for example enabling scientists to gain a deeper understanding of complex trait interactions (SDG 9: Industry, Innovation, and Infrastructure). Cutting-edge AI and computer vision techniques empowered by UKCropDiversity-HPC's CPU and GPU clusters are being used to develop multi-scale plant phenotyping and scalable trait analysis algorithms by the joint UK-China data science laboratories at Niab and Nanjing Agricultural University (NAU). These include the assessment of growth-related traits for genetic mapping in rice (Sun et al., 2022), drone-based detection of rice panicles under field conditions (Teng et al., 2023) and the establishment of dynamic phenotyping platforms to study nitrogen responsiveness in wheat (Ding et al., 2023). Moreover, UKCropDiversity-HPC enables the establishment and training of powerful AI models with complicated learning architectures and large-scale annotated datasets, revolutionising our approach to extracting meaningful information from large phenotyping datasets for the evaluation of key agronomic traits in global crop improvement programmes.
[IMAGE OMITTED. SEE PDF]
For a comprehensive list of UKCropDiversity-HPC publications, see
Transformative research
UKCropDiversity-HPC running costs are split equally between our seven partner institutes, regardless of use by any individual user, project or institute. In contrast to a commercial cloud compute model (for which access is generally “pay for use”), this enables researchers such as MSc and PhD students with limited resources to undertake ambitious projects rather than being confined to smaller, less resource-intensive endeavours (SDG 4: Quality Education).
We anticipate that this HPC infrastructure will have an even more pronounced impact as research continues to move into the era of AI approaches - as access to very large compute resources is crucial for the development and training of such models. To address this, recent further investment in UKCropDiversity-HPC GPU accelerators has significantly enhanced our ability to facilitate cutting-edge research in crop genetics, phenomics and biodiversity conservation.
TRAINING
Skills development for plant science and biodiversity research is at the heart of our community HPC resource (SDG 4: Quality Education). Training programs are delivered that provide users with the knowledge and expertise necessary to maximise the use of the resource. We focus on delivering group training that will equip researchers with the knowledge needed to install software, curate data and analyse datasets (Table 1). In addition, skills in specific topic areas, such as transcriptomics, population genetics, machine learning, image analysis and statistical modelling have been shared one-to-one by experienced bioinformaticians and data scientists.
TABLE 1 Group training courses overview: topics covered and course descriptions.
| Training | Description |
| --- | --- |
| Scheduling jobs using Slurm | Covers the fundamentals of using Slurm, a popular open-source job scheduler for HPC environments. Participants learn how to efficiently submit, monitor and manage computational jobs. |
| Using Linux | Often taught alongside the general use of the HPC and scheduling jobs with Slurm, this course introduces participants to the Linux operating system and teaches the essential commands and techniques needed for effective navigation and file manipulation. |
| Python for biologists | Aimed at biologists, this course provides an introduction to the Python programming language. Participants learn Python's syntax, data types, loops, functions and libraries. |
| Installing software with Conda and Mamba | Teaches participants how to set up and manage software environments using Conda. It also covers the use of Mamba, a faster alternative. Topics include creating isolated environments, installing, updating and removing software packages and importing and exporting environments. |
| Reproducible research | An extension to the above, this module covers creating and using containers within the HPC environment, concentrating specifically on Docker and Apptainer, both of which can be deployed by end users. |
| GPU-accelerated processing | Compares and contrasts CPUs and GPUs, looking at how massively parallel jobs can be optimised, before giving an overview of topics such as deep learning, machine learning and AI, as well as the frameworks and libraries to assist in their use. |
Training courses include using Slurm for job management in HPC environments, using the Linux operating system and managing software environments with Conda and Mamba. Other courses focus on reproducible research with containerisation tools like Docker and Apptainer, and comparing GPUs to CPUs to highlight the benefits of parallel processing in AI applications.
Our written training materials for UKCropDiversity-HPC are accessible at: and . These materials serve as a valuable reference, providing detailed guidance on various topics. Combined with the group and one-to-one training offered, this package of training strategies plays a crucial role in developing the skill sets of our collective researchers.
ENVIRONMENTAL IMPACT AND COST EFFICIENCY
While there will always be a cost in running compute resources, the UKCropDiversity-HPC helps each institute with its drive to net zero (SDG 7: Affordable and Clean Energy; SDG 12: Responsible Consumption and Production). The resource has essentially rationalised compute provision and in doing so reduced environmental impact. The James Hutton Institute physically hosts the resource in a modern, purpose-built and energy-efficient data centre co-located within the Institute's International Barley Hub facility. This is powered by a combination of renewable energy from a nearby solar meadow and electricity backed by Renewable Energy Guarantees of Origin certificates, while the new facility provides ample room for future expansion. We champion ‘Green Computing’ (Marra et al., 2011) to lower the environmental footprint of our infrastructure and provide training to maximise efficient use, in addition to employing automated solutions to further reduce power consumption (e.g. shutting down idle nodes). Hardware is prioritised on a performance-per-watt basis: instead of selecting processors with fewer very fast cores, we select those with a greater number of cores that run more efficiently, a good fit for many bioinformatics tasks where parallelising jobs can yield large cost and performance benefits. With our recent increase in GPU capacity, we will be able to migrate some common compute tasks from CPUs where viable, yielding both performance and efficiency gains. This will require new coding and testing but is work that can be shared amongst the collaborating institutions.
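The performance-per-watt reasoning above can be sketched as a simple comparison. In the Python example below, both processor profiles and all of their numbers are hypothetical, and aggregate core-GHz is used only as a crude stand-in for throughput on well-parallelised workloads; a real procurement decision would rely on measured benchmarks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Processor:
    name: str
    cores: int
    ghz_per_core: float
    watts: float

    @property
    def throughput(self):
        # Crude proxy for parallel throughput: aggregate core-GHz.
        return self.cores * self.ghz_per_core

    @property
    def perf_per_watt(self):
        return self.throughput / self.watts


# Hypothetical candidates, for illustration only.
candidates = [
    Processor("few-fast-cores", cores=16, ghz_per_core=4.0, watts=200.0),
    Processor("many-efficient-cores", cores=64, ghz_per_core=2.5, watts=250.0),
]

best = max(candidates, key=lambda p: p.perf_per_watt)

for p in candidates:
    print(f"{p.name}: {p.perf_per_watt:.2f} core-GHz per watt")
print(f"Preferred on a performance-per-watt basis: {best.name}")
```

Under these assumed numbers the many-core part delivers twice the throughput per watt, which mirrors the selection logic described in the text for workloads that parallelise well.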
LESSONS LEARNED
Since its establishment, UKCropDiversity-HPC has reduced HPC costs for the collaborating institutes (SDG 7: Affordable and Clean Energy; SDG 12: Responsible Consumption and Production). We estimate that each institute saves approximately £190K each year compared with operating and administering its own smaller HPC system, with additional savings to be made by those relying on cloud-based options. Whilst small clusters are sufficient for sporadic usage, our users are accessing thousands of CPU cores (often on large datasets requiring both high disk and memory use) constantly, and the costs of doing this via third-party cloud services would be exorbitant, especially when analysis of novel datasets requires empirical determination of correct parameters (i.e. analysis reruns with different settings). However, as with any large-scale project, difficulties have arisen, and lessons have been learned (see also Dataset S1). An example is the potential for inefficiencies in resource use due to users being able to submit large numbers of compute jobs without properly evaluating their resource needs. This can then impact other users by reducing the available compute resources. To address this, training programs are provided to equip users with skills to maximise their resource efficiency. Additionally, weekly feedback is provided in the form of usage reports for each user with their usage ranking. We find these reports to be a “light-touch mechanism” which encourages subsequent self-monitoring, users' own identification of inefficiencies, and the motivation to run their jobs more efficiently, while avoiding any tension with the system administration team. These efforts have resulted in improved system performance, greater overall efficiency and more skilled users (staff and students), and benefit users by avoiding pay-per-use requirements.
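A weekly usage report of the kind described above could, in principle, be derived from per-job accounting records. The Python sketch below is a hypothetical illustration, not the reporting tool used on UKCropDiversity-HPC: the record fields, user names and numbers are invented, and CPU efficiency is defined here as CPU-hours actually used divided by CPU-hours allocated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobRecord:
    user: str
    cores_allocated: int
    elapsed_hours: float
    cpu_hours_used: float


def weekly_report(jobs):
    """Return rows of (rank, user, allocated_cpu_hours, efficiency).

    Users are ranked by allocated CPU-hours, largest first.
    """
    totals = {}
    for job in jobs:
        allocated, used = totals.get(job.user, (0.0, 0.0))
        totals[job.user] = (
            allocated + job.cores_allocated * job.elapsed_hours,
            used + job.cpu_hours_used,
        )
    rows = [
        (user, allocated, used / allocated if allocated else 0.0)
        for user, (allocated, used) in totals.items()
    ]
    rows.sort(key=lambda row: row[1], reverse=True)
    return [(rank, user, allocated, eff)
            for rank, (user, allocated, eff) in enumerate(rows, start=1)]


# Hypothetical accounting records, for illustration only.
jobs = [
    JobRecord("alice", cores_allocated=32, elapsed_hours=10.0, cpu_hours_used=300.0),
    JobRecord("bob", cores_allocated=64, elapsed_hours=20.0, cpu_hours_used=320.0),
    JobRecord("alice", cores_allocated=8, elapsed_hours=10.0, cpu_hours_used=20.0),
]

report = weekly_report(jobs)
for rank, user, allocated, eff in report:
    print(f"{rank}. {user}: {allocated:.0f} CPU-hours allocated, {eff:.0%} efficiency")
```

A low efficiency value alongside a high ranking is the kind of signal that can prompt a user to request fewer cores next time, without any intervention from administrators.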
CONCLUSIONS
Here, the UKCropDiversity-HPC has been used as an example to highlight the utility of shared compute resources and to promote the establishment of such resources elsewhere. Primarily, UKCropDiversity-HPC has helped transform the way in which research is conducted at a collective of seven UK institutes: access to UKCropDiversity-HPC has enabled researchers to use innovative computational tools and methodologies on big data while avoiding pay-per-use access requirements. This has yielded significant research results (SDG 9: Industry, Innovation and Infrastructure) across the participating institutes' remit, ranging from genomic insights into crop diversity and accelerated breeding programs to biodiversity conservation genomics (SDGs 2: Zero Hunger, 13: Climate Action and 15: Life on Land). Collaborations across different institutions have been fostered and facilitated by the ease with which large datasets and analyses can be shared, and less well-funded researchers, such as students, have been able to perform complex and computationally demanding analyses on big data. The resource also plays a key role in providing training opportunities in the skills and knowledge necessary to analyse complex research data (SDG 4: Quality Education). In addition to driving scientific advancements, environmental sustainability and cost-effectiveness are maximised through the sharing of computing resources. As research continues to become increasingly data-rich, we hope that the concepts, benefits and lessons learned from the joint compute resource presented here can be used as a model for the establishment of similar resources to provide efficient support for future scientific research and development.
AUTHOR CONTRIBUTIONS
M.C., I.M., M.D.C., C.A.K. and P.J.K. conceived the idea of creating the shared HPC resource ‘UKCropDiversity-HPC’, and I.M. built the HPC infrastructure. L.P-A., J.C., M.D.C. and I.M. drafted the first outline of the manuscript; L.P-A., I.B., M.D.C., J.C., M.P.C., S.J., P.J.K., C.A.K., C.K., B.L., W.A.M., J.Z., M.C. and I.M. subsequently contributed additional text, edited and approved the manuscript.
ACKNOWLEDGEMENTS
UKCropDiversity-HPC is supported by the Biotechnology and Biological Sciences Research Council (BBSRC) Advanced Life Sciences Research Technology Initiative (ALERT) grants BB/S019669/1 and BB/X019683/1, and The Department of Business, Energy and Industrial Strategy Public Sector Research Establishment Infrastructure Fund.
CONFLICT OF INTEREST STATEMENT
The authors declare that they have no conflicts of interest.
DATA AVAILABILITY STATEMENT
The data that support this article are available from the corresponding author upon reasonable request.
REFERENCES
Adams, T. M., Smith, M., Wang, Y., Brown, L. H., Bayer, M. M., & Hein, I. (2023). HISS: Snakemake‐based workflows for performing SMRT‐RenSeq assembly, AgRenSeq and dRenSeq for the discovery of novel plant disease resistance genes. BMC Bioinformatics, 24(1), 204. https://doi.org/10.1186/s12859-023-05335-8
Ardi, W., Campos‐Dominguez, L., Chung, K. F., Dong, W. K., Drinkwater, E., Fuller, D., Gagul, J., Garnett, G., Girmansyah, D., Goodall‐Copestake, W., & Hughes, M. (2022). Resolving phylogenetic and taxonomic conflict in Begonia. Edinburgh Journal of Botany, 79, 1–28. https://doi.org/10.24823/ejb.2022.1928
Bachman, S. P., Brown, M. J. M., Leão, T. C. C., Nic Lughadha, E., & Walker, B. E. (2024). Extinction risk predictions for the world's flowering plants to support their conservation. New Phytologist, 242(2), 797–808. https://doi.org/10.1111/nph.19592
Barata, C., Snook, R. R., Ritchie, M. G., & Kosiol, C. (2023). Selection on the Fly: Short‐term adaptation to an altered sexual selection regime in Drosophila pseudoobscura. Genome Biology and Evolution, 15(7), evad113. https://doi.org/10.1093/gbe/evad113
Campos‐Dominguez, L., Pellicer, J., Matthews, A., Leitch, I. J., & Kidner, C. A. (2022). Evolutionary patterns of genome size and chromosome number variation in Begoniaceae. Edinburgh Journal of Botany, Begonia Special Issue, 79, 1–28. https://doi.org/10.24823/ejb.2022.1876
Chen, T., Chen, X., Zhang, S., Zhu, J., Tang, B., Wang, A., Dong, L., Zhang, Z., Yu, C., Sun, Y., Chi, L., Chen, H., Zhai, S., Sun, Y., Lan, L., Zhang, X., Xiao, J., Bao, Y., Wang, Y., … Zhao, W. (2021). The genome sequence archive family: Toward explosive data growth and diverse data types. Genomics, Proteomics & Bioinformatics, 19(4), 578–583. https://doi.org/10.1016/j.gpb.2021.08.001
Cock, P. J. A., Cooke, D. E. L., Thorpe, P., & Pritchard, L. (2023). THAPBI PICT: A fast, cautious, and accurate metabarcoding analysis pipeline. PeerJ, 11, e15648. https://doi.org/10.7717/peerj.15648
Coulter, M., Entizne, J. C., Guo, W., Bayer, M., Wonneberger, R., Milne, L., Schreiber, M., Haaning, A., Muehlbauer, G. J., McCallum, N., Fuller, J., Simpson, C., Stein, N., Brown, J. W. S., Waugh, R., & Zhang, R. (2022). BaRTv2: A highly resolved barley reference transcriptome for accurate transcript‐specific RNA‐seq quantification. The Plant Journal, 111(4), 1183–1202. https://doi.org/10.1111/tpj.15871
Darwin Tree of Life Project Consortium. (2022). Sequence locally, think globally: The Darwin Tree of Life Project. Proceedings of the National Academy of Sciences of the United States of America, 119(4), e2115642118. https://doi.org/10.1073/pnas.2115642118
Di Tommaso, P., Chatzou, M., Floden, E. W., Barja, P. P., Palumbo, E., & Notredame, C. (2017). Nextflow enables reproducible computational workflows. Nature Biotechnology, 35(4), 316–319. https://doi.org/10.1038/nbt.3820
Ding, G., Shen, L., Dai, J., Jackson, R., Liu, S., Ali, M., Sun, L., Wen, M., Xiao, J., Deakin, G., Jiang, D., Wang, X. E., & Zhou, J. (2023). The dissection of nitrogen response traits using drone phenotyping and dynamic phenotypic analysis to explore N responsiveness and associated genetic loci in wheat. Plant Phenomics, 5, 1–18. https://doi.org/10.34133/plantphenomics.0128
Dykstra, D. (2022). Apptainer Without Setuid. Preprint available at arXiv. https://doi.org/10.48550/arXiv.2208.12106
Forrister, D. L., Endara, M. J., Soule, A. J., Younkin, G. C., Mills, A. G., Lokvam, J., Dexter, K. G., Pennington, R. T., Kidner, C. A., Nicholls, J. A., Loiseau, O., Kursar, T. A., & Coley, P. D. (2023). Diversity and divergence: Evolution of secondary metabolism in the tropical tree genus Inga. New Phytologist, 237(2), 631–642. https://doi.org/10.1111/nph.18554
Gagnon, E., Baldaszti, L., Moonlight, P., Knapp, S., Lehmann, C. E. R., & Särkinen, T. (2023). Functional and ecological diversification of underground organs in Solanum. Frontiers in Genetics, 14, 1231413. https://doi.org/10.3389/fgene.2023.1231413
Graham, J., Smith, K., MacKenzie, K., Milne, L., Jennings, N., Mateos, B., & Hackett, C. (2022). Developmental QTL in a red raspberry primocane × biennial raspberry population that exhibit primocane fruiting. Journal of Horticulture, 9, 308.
Grüning, B., Dale, R., Sjödin, A., Chapman, B. A., Rowe, J., Tomkins‐Tinch, C. H., Valieris, R., Köster, J., & Bioconda Team. (2018). Bioconda: Sustainable and comprehensive software distribution for the life sciences. Nature Methods, 15(7), 475–476. https://doi.org/10.1038/s41592-018-0046-7
James, C., Fang, L., Wall, E., Coffey, M., & Li, B. (2024). Integrating deep phenotyping and whole‐genome sequencing to decipher the genetic basis of feed efficiency in dairy cattle. Proceedings for the 7th International Conference of Quantitative Genetics (ICQG), Vienna, 22–26 July, 2024.
Kaur, A., Singh, V., Byrne, S., Armstrong, M., Adams, T., Harrower, B., Gilroy, E., Mullins, E., & Hein, I. (2024). Transcriptional profiling during infection of potato NLRs and Phytophthora infestans effectors using cDNA enrichment sequencing. The Crop Journal. Advance online publication. https://doi.org/10.1016/j.cj.2024.09.013
Kelling, S., Hochachka, W. M., Fink, D., Riedewald, M., Caruana, R., Ballard, G., & Hooker, G. (2009). Data‐intensive science: A new paradigm for biodiversity studies. Bioscience, 59(7), 613–620. https://doi.org/10.1525/bio.2009.59.7.12
Louis, M., Korlević, P., Nykänen, M., Archer, F., Berrow, S., Brownlow, A., Lorenzen, E. D., O'Brien, J., Post, K., Racimo, F., Rogan, E., Rosel, P. E., Sinding, M. S., van der Es, H., Wales, N., Fontaine, M. C., Gaggiotti, O. E., & Foote, A. D. (2023). Ancient dolphin genomes reveal rapid repeated adaptation to coastal waters. Nature Communications, 14(1), 4020. https://doi.org/10.1038/s41467-023-39532-z
Marks, R. A., Hotaling, S., Frandsen, P. B., & VanBuren, R. (2021). Representation and participation across 20 years of plant genome sequencing. Nature Plants, 7(12), 1571–1578. https://doi.org/10.1038/s41477-021-01031-8
Marra, O., Mirto, M., Cafaro, M., & Giovanni, A. (2011). Green computing and power saving in HPC data centers. CMCC Research Paper, 121. https://doi.org/10.2139/ssrn.2194478
Marsh, W. A., Brace, S., & Barnes, I. (2023). Inferring biological kinship in ancient datasets: Comparing the response of ancient DNA‐specific software packages to low coverage data. BMC Genomics, 24(1), 111. https://doi.org/10.1186/s12864-023-09198-4
Marx, V. (2013). The big challenges of big data. Nature, 498, 255–260. https://doi.org/10.1038/498255a
Merkel, D. (2014). Docker: Lightweight Linux containers for consistent development and deployment. Linux Journal, 2014(239), 2.
Mölder, F., Jablonski, K. P., Letcher, B., Hall, M. B., Tomkins‐Tinch, C. H., Sochat, V., Forster, J., Lee, S., Twardziok, S. O., Kanitz, A., Wilm, A., Holtgrewe, M., Rahmann, S., Nahnsen, S., & Köster, J. (2021). Sustainable data analysis with Snakemake. F1000Research, 10, 33. https://doi.org/10.12688/f1000research.29032.2
Pérez‐Calle, V., Bellot, S., Kuhnhäuser, B. G., Pillon, Y., Forest, F., Leitch, I. J., & Baker, W. J. (2024). Phylogeny, biogeography and ecological diversification of New Caledonian palms (Arecaceae). Annals of Botany, 134, 85–100. https://doi.org/10.1093/aob/mcae043
Phang, A., Pezzini, F. F., Burslem, D. F., Khew, G. S., Middleton, D. J., Ruhsam, M., & Wilkie, P. (2023). Target capture sequencing for phylogenomic and population studies in the Southeast Asian genus Palaquium (Sapotaceae). Botanical Journal of the Linnean Society, 203(2), 134–147. https://doi.org/10.1093/botlinnean/boad022
Plaxton, L., Hempel, E., Marsh, W. A., Portela Miguez, R., Waurick, I., Kitchener, A. C., Hofreiter, M., Lister, A. M., Zachos, F. E., & Brace, S. (2023). Assessing the identity of rare historical museum specimens of the extinct blue antelope (Hippotragus leucophaeus) using an ancient DNA approach. Mammalian Biology, 103, 549–560. https://doi.org/10.1007/s42991-023-00373-4
Pooley, C. M., Doeschl‐Wilson, A. B., & Marion, G. (2022). Estimation of age‐stratified contact rates during the COVID‐19 pandemic using a novel inference algorithm. Philosophical Transactions. Series A, Mathematical, Physical, and Engineering Sciences, 380(2233), 20210298. https://doi.org/10.1098/rsta.2021.0298
Rabenstein, B., & Volz, J. (2015). Prometheus: A next‐generation monitoring system (talk). USENIX Association.
Rees, M., Neaves, L. E., Lewis, G. P., de Lima, H. C., & Gagnon, E. (2023). Phylogenomic and morphological data reveal hidden patterns of diversity in the national tree of Brazil, Paubrasilia echinata. American Journal of Botany, 110(11), e16241. https://doi.org/10.1002/ajb2.16241
Sahu, S. K., Waseem, M., & Aslam, M. M. (2023). Editorial: Bioinformatics, big data and agriculture: A challenge for the future. Frontiers in Plant Science, 14, 1271305. https://doi.org/10.3389/fpls.2023.1271305
Schreiber, M., Wonneberger, R., Haaning, A. M., Coulter, M., Russell, J., Himmelbach, A., Fiebig, A., Muehlbauer, G. J., Stein, N., & Waugh, R. (2024). Genomic resources for a historical collection of cultivated two‐row European spring barley genotypes. Scientific Data, 11(1), 66. https://doi.org/10.1038/s41597-023-02850-4
Shakoor, N., Northrup, D., Murray, S., & Mockler, T. C. (2019). Big data‐driven agriculture: Big data analytics in plant breeding, genomics, and the use of remote sensing technologies to advance crop productivity. The Plant Phenome Journal, 2(1), 1–8. https://doi.org/10.2135/tppj2018.12.0009
Shen, L., Ding, G., Jackson, R., Ali, M., Liu, S., Mitchell, A., Shi, Y., Lu, X., Dai, J., Deakin, G., Freles, K., Cen, H., Ge, Y.‐F., & Zhou, J. (2024). GSP‐AI: An AI‐powered platform for identifying key growth stages and the vegetative‐to‐reproductive transition in wheat using trilateral drone imagery and meteorological data. Plant Phenomics, 6, 0255. https://doi.org/10.34133/plantphenomics.0255
Smith, M., Jones, J. T., & Hein, I. (2024). Resistify – a rapid and accurate annotation tool to identify NLRs and study their genomic organisation. Preprint available at bioRxiv. https://doi.org/10.1101/2024.02.14.580321
Speak, S. A., Birley, T., Bortoluzzi, C., Clark, M. D., Percival‐Alwyn, L., Morales, H. E., & van Oosterhout, C. (2024). Genomics‐informed captive breeding can reduce inbreeding depression and the genetic load in zoo populations. Molecular Ecology Resources, 24, e13967. https://doi.org/10.1111/1755-0998.13967
Sun, G., Lu, H., Zhao, Y., Zhou, J., Jackson, R., Wang, Y., Xu, L. X., Wang, A., Colmer, J., Ober, E., Zhao, Q., Han, B., & Zhou, J. (2022). AirMeasurer: Open‐source software to quantify static and dynamic traits derived from multiseason aerial phenotyping to empower genetic mapping studies in rice. New Phytologist, 236(4), 1584–1604. https://doi.org/10.1111/nph.18314
Teng, Z., Chen, J., Wang, J., Wu, S., Chen, R., Lin, Y., Shen, L., Jackson, R., Zhou, J., & Yang, C. (2023). Panicle‐cloud: An open and AI‐powered cloud computing platform for quantifying rice panicles from drone‐collected imagery to enable the classification of yield production in rice. Plant Phenomics, 5, 0105. https://doi.org/10.34133/plantphenomics.0105
Tsang, I., Thomelin, P., Ober, E., Rawsthorne, S., Atkinson, J. A., Wells, D. M., Percival‐Alwyn, L., Leigh, F. J., & Cockram, J. (2024). A novel root hair mutant, srh1, affects root hair elongation and reactive oxygen species levels in wheat. Frontiers in Plant Science, 15, 1490502. https://doi.org/10.3389/fpls.2024.1490502
United Nations. (2015). General Assembly Resolution A/RES/70/1. Transforming Our World, the 2030 Agenda for Sustainable Development. Available from: https://sdgs.un.org/2030agenda
Vitorino, R. (2023). Special issue: "Bioinformatics and omics tools". International Journal of Molecular Sciences, 24(14), 11625. https://doi.org/10.3390/ijms241411625
Walkowiak, S., Gao, L., Monat, C., Haberer, G., Kassa, M. T., Brinton, J., Ramirez‐Gonzalez, R. H., Kolodziej, M. C., Delorean, E., Thambugala, D., Klymiuk, V., Byrns, B., Gundlach, H., Bandi, V., Siri, J. N., Nilsen, K., Aquino, C., Himmelbach, A., Copetti, D., … Pozniak, C. (2020). Multiple wheat genomes reveal global variation in modern breeding. Nature, 588(7837), 277–283. https://doi.org/10.1038/s41586-020-2961-x
Wang, Y., Brown, L. H., Adams, T. M., Cheung, Y. W., Li, J., Young, V., Todd, D. T., Armstrong, M. R., Neugebauer, K., Kaur, A., Harrower, B., Oome, S., Wang, X., Bayer, M., & Hein, I. (2023). SMRT‐AgRenSeq‐d in potato (Solanum tuberosum) as a method to identify candidates for the nematode resistance Gpa5. Horticulture Research, 10(11), uhad211. https://doi.org/10.1093/hr/uhad211
Wordsworth, S., Doble, B., Payne, K., Buchanan, J., Marshall, D. A., McCabe, C., & Regier, D. A. (2018). Using "big data" in the cost‐effectiveness analysis of next‐generation sequencing technologies: Challenges and potential solutions. Value in Health, 21(9), 1048–1053. https://doi.org/10.1016/j.jval.2018.06.016
Wright, T. I. C., Horsnell, R., Love, B., Burridge, A. J., Gardner, K. A., Jackson, R., Leigh, F. J., Ligeza, A., Heuer, S., Bentley, A. R., & Howell, P. (2024). A new winter wheat genetic resource harbors untapped diversity from synthetic hexaploid wheat. Theoretical and Applied Genetics, 137(3), 73. https://doi.org/10.1007/s00122-024-04577-1
Yoo, A. B., Jette, M. A., & Grondona, M. (2003). SLURM: Simple Linux utility for resource management. Lecture Notes in Computer Science, 2862, 44–60. https://doi.org/10.1007/10968987_3
Zuntini, A. R., Carruthers, T., Maurin, O., Bailey, P. C., Leempoel, K., Brewer, G. E., Epitawalage, N., Françoso, E., Gallego‐Paramo, B., McGinnie, C., Negrão, R., Roy, S. R., Simpson, L., Toledo Romero, E., Barber, V. M. A., Botigué, L., Clarkson, J. J., Cowan, R. S., Dodsworth, S., … Baker, W. J. (2024). Phylogenomics and the rise of the angiosperms. Nature, 629, 843–850. https://doi.org/10.1038/s41586-024-07324-0
© 2025. This work is published under the Creative Commons Attribution 4.0 licence (http://creativecommons.org/licenses/by/4.0/).