Premise
Biodiversity researchers often need to answer the question: “Which species of taxon X have been documented in (or near) spatial polygon Y?” Online databases with billions of occurrence records, including vouchered specimens and citizen science records, can provide the answer; however, quick spatial processing of huge biodiversity datasets can be difficult, and many general‐purpose tools are constrained by dataset size.
Methods and Results
infinitylists is a Shiny application and R package that allows users to generate species checklists for a user‐specified taxon and area. It downloads taxon–country datasets (e.g., Madagascan geckos) from biodiversity data providers and uses an open source, column‐oriented data file for fast retrieval and visualization. Available as a mobile‐friendly web tool with preloaded data, it can also be run locally in R for very flexible applications.
Conclusions
infinitylists is an easy‐to‐use tool with applications including supplementing survey data, planning collecting expeditions, and informing gap‐filling. infinitylists is a complementary tool to existing databases to help field ecologists and naturalists globally.
We are in an era of unprecedented volumes of biodiversity data (Farley et al., 2018; Kays et al., 2020). As of December 2023, the Global Biodiversity Information Facility (GBIF) held ~2.6 billion occurrence records, an increase of 1 billion since March 2021 (Feng et al., 2022). Historically, most biodiversity occurrence records have been vouchered specimens stored in museums and herbaria, with natural history collections worldwide holding billions of specimens. In recent years, however, the overwhelming majority of records contributed to databases such as GBIF or the Atlas of Living Australia (ALA) are “born digital” data, including observational records and digital vouchers such as photographs or sound recordings (Kays et al., 2020). Although this disparity is partially driven by the challenge of digitizing the vast quantities of vouchered specimens globally (Ball-Damerow et al., 2019), it also reflects the incredible amounts of data being generated by citizen science. The most popular global citizen science (also widely referred to as community science) platforms such as iNaturalist and eBird now generate tens of millions and hundreds of millions of records in a single year, respectively (Di Cecco et al., 2021; Rosenblatt et al., 2022). While physical vouchers remain the gold standard for biodiversity data (Funk et al., 2018), these citizen science data are increasingly used in research and conservation around the globe, and it is clear that the integration of these two complementary data streams (specimen-based records and citizen science records) is of high value for understanding contemporary species distributions (Spear et al., 2017; Soroye et al., 2018; Dimson et al., 2023; Ackerfield et al., 2024; Wenk et al., 2024a).
Given the many threats faced by biodiversity globally, including habitat destruction, climate change, and invasive species (Brondízio et al., 2019; Bellard et al., 2022), these data are more important than ever for informing research, conservation, and management practice and policy. However, the increasingly large size of these datasets can make downloading and processing occurrence records a time-consuming and resource-intensive task, even for relatively simple requests such as place-based regional checklists (Saran et al., 2022). Although there are now powerful R packages available for downloading and cleaning species occurrence data from platforms such as GBIF and the ALA, including rgbif (Chamberlain and Boettiger, 2017), bdc (Ribeiro et al., 2022), and galah (Westgate et al., 2022), the size and complexity of the databases mean that spatial queries may be slow, computationally intensive either in the cloud or for the client, and/or require coding or data skills. GBIF currently contains over 3 billion occurrence records, and that number will only grow with time; users need more solutions and tools to leverage these data.
We identify a specific query that is common for ecologists, conservationists and land managers, citizen scientists, and herbarium and museum staff: “Which species of taxon X have been documented in (or near) spatial polygon Y?” Rather than build another general-purpose tool, we focused on making the execution of this explicit question as easy for the user as possible.
Here, we present infinitylists, an interactive online tool for rapidly generating place-based regional species checklists. Our approach to the 3 billion occurrence records problem is to download a specific taxon–country combination of data (e.g., butterflies and moths from Peru, or plants from Australia) from data providers like GBIF to match the scope of an individual naturalist or field ecologist. Next, by leveraging the speed of the Apache Parquet data format, infinitylists quickly subsets, visualizes, and summarizes the relevant records for any given query. This two-step process—initial download followed by fast local exploration—provides an efficient, user-friendly way to investigate biodiversity data at a user-defined scale. We provide a web implementation with specific taxon–country combinations already downloaded. A slightly more proficient user can also quickly run this locally in R with any desired taxon–country combination.
METHODS AND RESULTS
Overview
infinitylists is a Shiny application that can generate local species lists for any taxon and location in the world. An online version allows mobile users to instantly generate a species checklist for butterflies and moths from Peru, cephalopods from the Philippines, geckos from Madagascar, and any location in Australia for one of five taxa: plants, marsupials, cicadas, butterflies, and odonates (dragonflies and damselflies). As a standalone R package, it further allows users to generate species checklists for any taxon–country combination in the world. infinitylists retrieves species occurrence records from either the ALA (Australia's national biodiversity database and the Australian node of GBIF; Belbin et al., 2021; Roger et al., 2023) or GBIF's global node, and generates four outputs: a text statement, a map, a table, and a downloadable CSV file (Figure 1). It was developed using R (R Core Team, 2022) and the shiny package in R (Chang et al., 2023). The online version of infinitylists is available on the web (see Data Availability Statement). The R package version of infinitylists can be run natively on a user's computer and is downloadable from GitHub (repository traitecoevo/infinitylists), with all data and code for release 2.0.5 available at Zenodo (10.5281/zenodo.14998402; Cornwell et al., 2025). It can also be installed directly in R using remotes::install_github("traitecoevo/infinitylists"). When running infinitylists via the R package, users can download Australian records for any taxon of interest with download_ala_obs("taxon_name"), or records for other countries with download_gbif_obs("taxon_name", "country_code"), and then use the same infinitylists interface and functionality.
[IMAGE OMITTED. SEE PDF]
Conceptualization and key uses
Regional species checklists are a valuable resource (Denelle et al., 2023), including for monitoring declines in pollinators (Potts et al., 2016), assessing beta-diversity (König et al., 2017), understanding the relationship between native and alien floras (Bach et al., 2022), and documenting local extinctions (Finn et al., 2023). As noted by Sikes et al. (2016), however, “because taxonomy organizes data by taxon rather than region, it is easier to determine where a species occurs than to determine how many and which species occur in a region.” While data repositories such as GBIF contain incredible quantities of biodiversity data, they effectively present species-focused visualization tools rather than place-focused extraction functionality; generating regional checklists usually involves additional steps such as advanced search tools, or data filtering and processing using other programs. These data are also often more difficult, or impossible, to access from mobile devices such as smartphones, potentially limiting their use in the field. To help address barriers to data accessibility, we conceptualized infinitylists as a place-based taxonomic tool, operable on a desktop or mobile device, that allows users to instantly generate regional species checklists by taxon and region simultaneously.
infinitylists can be used during desktop surveys as a useful starting point, or post-survey to supplement checklists with species that may have been overlooked or unrecorded by surveyors. It is a powerful planning tool for museum or herbarium collecting expeditions, providing data on which species to expect in the focus area, informing collectors of which sites and habitats should be targeted, and providing location data for difficult-to-find taxa. Of particular value is the ability to use infinitylists in the field to assess and relocate the most recent record for each species. infinitylists also has high value as a tool to inform and inspire citizen scientists. This includes using the application to relocate previously recorded species and as a gap-filling tool to target unrecorded species to help build local checklists.
Approach to data filtering
Given that the primary aim of infinitylists is to generate place-based species checklists at a regional scale, we implemented data filters to minimize spatial uncertainty and ensure included records are verifiable and reflective of likely current diversity. We acknowledge that some of the records removed by these filters may be useful for other applications or project requirements. We do not provide the functionality to turn these filters off within the Shiny app, because the raw downloads are directly available from the ALA or GBIF to address that use case. Because our code is open source, any user can also run infinitylists via the R package and alter our filters to include additional records for their particular use cases. Broadly, we see the function of infinitylists as providing access to a subset of the data specifically curated for one important use case. Accordingly, we removed five types of occurrence records.
First, we removed all records with a coordinate uncertainty of >1000 m. Given the place-based focus of infinitylists, high spatial uncertainty is especially problematic for areas such as small reserves, as it reduces confidence in whether a record occurred within the region of interest. We chose 1000 m as a cutoff as this is the approximate minimum generalization value used to mask the locations of sensitive species records in the ALA. This exclusion setting therefore removes all records of species with sensitive or otherwise obscured locations from infinitylists. These data can instead be accessed through specialized portals such as the national Restricted Access Species Data Service in the case of Australian records or similar national sensitive species data providers for other regions of the world. This cutoff also disproportionately removes older vouchered specimens, especially those collected in the late 1800s and early 1900s; many of these records are associated with large uncertainty values (e.g., 10,000 m or 25,000 m) because imprecise locality names had to be converted to an approximate set of coordinates during digitization (Wenk et al., 2024a).
While some legitimate records are removed by this filter, especially for areas such as large national parks, false positive records are generally more impactful than false negatives and thus more important to remove from checklists (Molinari-Jobin et al., 2012; Groom and Whild, 2017). If a species’ absence from a checklist is suspected to be a false negative, more survey effort can be invested in the area; however, false positives are more difficult to disprove and can linger in checklists indefinitely (Groom and Whild, 2017).
Second, we removed all unvouchered records. We only include records associated with a physical voucher (specimen) stored in a museum or herbarium or with a digital voucher (photograph[s] or audio file[s]) uploaded to iNaturalist (i.e., verifiable records). This allows users to inspect any record retrieved by infinitylists and assess whether it is correctly identified, misidentified, or if there is insufficient evidence for a species identification. While many survey-based, non-vouchered occurrence records are accurate, observer errors in biodiversity surveys (e.g., misidentifications) are nonetheless ubiquitous (Groom and Whild, 2017; Morrison, 2021). Any occurrence record without an associated physical specimen, photograph(s), or sound recording is inherently impossible to verify.
Third, we removed all records pre-dating 1923. If a species has not been collected or photographed at a location for more than 100 years, we assume an increased likelihood it is no longer present. Given one of the primary uses of infinitylists is for compiling checklists of current diversity, we apply a 100-year cut-off starting from 2023 when the idea for infinitylists was conceived.
Fourth, we removed any records considered to have spatial issues by the ALA or GBIF. Records for which the supplied coordinates are zero, the interpreted occurrence coordinates do not match the indicated country, the coordinate values cannot be interpreted, or the coordinates are outside the possible range of values are excluded from infinitylists.
Finally, we removed records identified to a rank coarser than species. Because most checklists are interested in taxa at a species level, we omit records identified to any coarser rank. infinitylists includes records identified to infraspecific taxa, but these are displayed within the application as the species.
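The five filters above can be sketched as a single predicate. The example below is a minimal illustration in Python rather than the package's own R code; the record layout, field names, and rank labels are hypothetical, and the choice to keep records with no stated coordinate uncertainty is an assumption that the text does not address.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical record layout: field names are illustrative and are not
# the ALA or GBIF schema.
@dataclass
class Record:
    name: str                             # scientific name as identified
    rank: str                             # e.g. "species", "subspecies", "genus"
    year: int                             # year collected or observed
    coord_uncertainty_m: Optional[float]  # None when not stated
    has_voucher: bool                     # specimen, photograph, or audio
    has_spatial_issue: bool               # flagged by the data provider

MAX_UNCERTAINTY_M = 1000   # filter 1: uncertainty above this is removed
MIN_YEAR = 1923            # filter 3: records pre-dating this are removed
# Hypothetical rank labels for species-level and infraspecific records.
KEPT_RANKS = {"species", "subspecies", "variety", "form"}

def keep(record: Record) -> bool:
    """Return True when a record passes all five filters."""
    # Assumption: a missing uncertainty value is kept, not removed.
    if (record.coord_uncertainty_m is not None
            and record.coord_uncertainty_m > MAX_UNCERTAINTY_M):
        return False
    if not record.has_voucher:          # filter 2: unvouchered
        return False
    if record.year < MIN_YEAR:          # filter 3: too old
        return False
    if record.has_spatial_issue:        # filter 4: provider-flagged
        return False
    return record.rank in KEPT_RANKS    # filter 5: coarser than species

def to_species(name: str) -> str:
    """Report an infraspecific name as its species (first two words)."""
    return " ".join(name.split()[:2])
```

Applying `keep` to each downloaded record and then `to_species` to the survivors gives the set of names that would feed a checklist under these assumptions.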
Data selection process
Users can select from four spatial filters: Preloaded Place, Upload KML, Use current location, and Choose a lat/long (Figure 2).
[IMAGE OMITTED. SEE PDF]
We offer 17 Preloaded Places from across Australia, Peru, the Philippines, and Madagascar, including national parks, nature reserves, and conservation properties, to demonstrate the functionality of infinitylists. Users can also select the Upload KML filter to import a KML file from anywhere in these four countries (or from other countries if using infinitylists via the R package), although the file is not retained if the application is refreshed in the browser.
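Whichever way the polygon is supplied, the underlying question is whether each record's coordinates fall inside it. The sketch below shows one standard way to answer that, a planar ray-casting test written in Python. It is an illustration only: the text does not state which spatial routine infinitylists uses, the function name is hypothetical, and treating longitude and latitude as planar coordinates is an approximation that suits small areas away from the poles and the antimeridian.

```python
def point_in_polygon(lon, lat, ring):
    """Ray-casting test for a point against one polygon ring.

    ring is a list of (lon, lat) vertices, the same order that KML
    uses for coordinates. The closing vertex may be repeated or not.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside  # one more crossing to the east
    return inside
```

An odd number of crossings to the east of the point means the point is inside the ring. Polygons with holes or multiple parts would need this test applied per ring.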
The Use current location filter allows users to choose one of seven radius values between 100 m and 50 km and then filters records to the user's current location. Before using this filter, users must allow location access on their mobile device or desktop for the browser they intend to use; otherwise, the application will not work. The “Coords” tab on the output screen indicates the retrieved coordinates and their positional accuracy. The Choose a lat/long filter applies the same filtering as for Use current location, but users manually enter a set of coordinates.
Users have the additional choice of adding a buffer zone to their target area, with the same seven radius values as for the Use current location and Choose a lat/long filters. This zone replicates the shape of the target area at low radius values, but approaches a circle at the highest values. Applying a buffer will retrieve records for species that have been recorded in the buffer zone but not in the target area. The addition of a buffer zone in conjunction with the Use current location or Choose a lat/long filters therefore facilitates queries for areas up to a maximum total radius of 100 km.
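For the two point-based filters, the radius and buffer logic reduces to a distance test. The following Python sketch is an illustration under stated assumptions, not the package's implementation: it uses the haversine great-circle distance on a spherical Earth, the function names are hypothetical, and the labels returned mirror the "In target area" values described for the outputs.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius, spherical approximation

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def classify(lat, lon, centre_lat, centre_lon, radius_m, buffer_m=0):
    """Label a record relative to a circular target area and optional buffer."""
    d = haversine_m(lat, lon, centre_lat, centre_lon)
    if d <= radius_m:
        return "in target"
    if d <= radius_m + buffer_m:
        return "only in buffer"
    return None  # outside both: not retrieved
```

With the largest radius (50 km) and the largest buffer (50 km), the outer limit is 100 km from the chosen point, which matches the maximum total radius stated above. Buffers around uploaded polygons follow the polygon's shape and are not covered by this sketch.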
Users can choose one of eight available taxa within the Shiny application—Plantae (plants), Cicadoidea (cicadas), Marsupialia (marsupials), Odonata (odonates; dragonflies and damselflies), or Papilionoidea (butterflies) for Australia; Lepidoptera (butterflies and moths) for Peru; Cephalopoda (octopuses, squids, and relatives) for the Philippines; Gekkonidae (geckos) for Madagascar—and then either select a single family or genus within that taxon or retain all families and genera. A specific family or genus can also be selected for any other taxa when using the R package. We chose these default taxa because (1) they enabled us to demonstrate the ability of infinitylists to handle both photographic and audio-based records (i.e., sound recordings of cicadas); (2) these taxa are of strong interest to citizen scientists (see, e.g., Mesaglio et al., 2021), and thus their inclusion will assist in maximizing their engagement with infinitylists; and (3) for the five Australian groups, we had comprehensive data on establishment means (i.e., native versus introduced).
Data outputs
Four outputs are generated by infinitylists: a text statement, a map, a table, and a downloadable CSV file.
The text statement summarizes the number of species recorded from the target area (including the buffer, if applied), the number of genera and families, and how many of these species are native. Separate species totals are also reported for collection-based and citizen science records.
Within the map, the target area is delineated in red and the buffer, if applied, in orange. Each record is represented by a blue pin on the map. Spatially clustered records are aggregated into circular markers colored based on record density, with a number indicating the total records at that marker. Zooming in resolves these markers into separate pins; markers can also be clicked to force pin resolution. Only the most recent record for each species per voucher type—collection (physical voucher), photograph (digital voucher), or audio (digital voucher)—is displayed on the map.
The table provides a detailed summary of all records that appear on the map, i.e., the most recent record for each species per voucher type. The output table has ten data columns (those marked “Both” or “Table” in Table 1), including a hyperlink to the original record for each occurrence. The downloadable CSV file contains all species occurrence records for the selected area and taxon, not just the most recent record(s) for each species.
Table 1 Data columns provided in output table and downloadable CSV file.
| Column name | Description | Output table or CSV |
| --- | --- | --- |
| Species | Provides the species name for the record. Any record identified to an infraspecific rank (e.g., subspecies) is reported as the species only. | Both |
| Genus | Provides the genus name for the record. | CSV |
| Family | Provides the family name for the record. | CSV |
| Establishment means | Indicates whether the species is native or non-native to Australia. Species that are native in some regions of the country and non-native in others (e.g., Acacia baileyana F. Muell., Pittosporum undulatum Vent.) are annotated as native. Note that non-Australian taxa and non-vascular plant species return a value of unknown. | Both |
| Voucher type | Indicates which voucher type is attached to the record. Returned values are collection (physical voucher), photograph (digital voucher), or audio (digital voucher). In the output table, one row is generated for each species per voucher type. | Both |
| In target area | Indicates whether the record is within the selected place (in target), or only in buffer. | Both |
| N | Indicates the total number of records for the species and voucher type. | Table |
| Most recent obs. | Indicates the date of the most recent record for the species and voucher type. | Table |
| Collection date | Indicates the date of the record. This column is only available in the CSV as it includes all records for each species, not just the most recent record. | CSV |
| Lat | Latitude of the record in decimal degrees. | Both |
| Long | Longitude of the record in decimal degrees. | Both |
| Repository | Indicates where the voucher associated with the record is stored. For all photographic and audio-based records, the repository is iNaturalist (iNat). For collection-based records, the repository is indicated by a museum or herbarium code (e.g., AM, UNSW). In the output table, the text in each row of this column is hyperlinked; clicking this link will redirect the user to the original record in either iNaturalist (photographic and audio-based records), the Atlas of Living Australia, or GBIF (collection-based records). In the CSV, this hyperlink is provided in a separate column, ‘Link’. | Both |
| Recorded by | Indicates the name of the record collector. | Both |
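The relationship between the CSV (every record) and the output table (one row per species per voucher type) can be sketched as a group-and-reduce step. The Python example below is illustrative only: the input keys are hypothetical, the output keys borrow the column names from Table 1, and only a subset of the columns is shown.

```python
from collections import defaultdict
from datetime import date

def summarise(records):
    """Collapse all records to one row per (species, voucher type).

    records: iterable of dicts with hypothetical keys "species",
    "voucher_type", "collection_date" (a datetime.date), "lat", "long".
    """
    groups = defaultdict(list)
    for r in records:
        groups[(r["species"], r["voucher_type"])].append(r)

    rows = []
    for species, voucher in sorted(groups):
        recs = groups[(species, voucher)]
        # The most recent record supplies the date and location shown.
        latest = max(recs, key=lambda r: r["collection_date"])
        rows.append({
            "Species": species,
            "Voucher type": voucher,
            "N": len(recs),
            "Most recent obs.": latest["collection_date"],
            "Lat": latest["lat"],
            "Long": latest["long"],
        })
    return rows
```

A species with both a herbarium collection and a photograph therefore yields two rows, each carrying its own record count and most recent date.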
System design and features
1. Handling big biodiversity data: In the past, Shiny applications that stored or processed large amounts of data were typically limited in their responsiveness. infinitylists bypasses this bottleneck by storing processed data as Apache Parquet using the arrow package in R (Richardson et al., 2024). Parquet is a columnar storage format for fast reading and compressed storage of big data. A key feature of Parquet files is that they contain metadata that allows users to narrow down to the relevant parts of the data without loading entire datasets into memory, a feature often called “lazy loading” (Ripley, 2004). There are nonetheless still constraints to infinitylists regarding data size, especially as downloads or selected polygons become larger, that we expect will relate to the available computer memory (RAM) of the user's device. Given the wide variety of computing environments in the world, it is difficult to provide specific guidelines on when these limits will be reached. However, using Parquet files and selective loading into memory should reduce the cases where these limits are reached relative to conventional data science approaches in R or Python that load the entire file into RAM.
When a user submits a query, infinitylists performs four very fast operations: (1) find the right Parquet file, (2) load only the required part of the Parquet file into memory based on the user's taxonomic and spatial selections, (3) summarize this part of the data for the table, and (4) plot the locations on the map. The speed of these operations comes from a combination of the arrow (Richardson et al., 2024), data.table (Barrett et al., 2024), and leaflet (Cheng et al., 2023) packages in R.
For display to the user, we make two important simplifications. The first is to subset the columns to those deemed most useful in the field (Table 1). The second is that, for graphical and tabular displays, we subset all species occurrences to only the most recent record of each species for each voucher type. This is based on the premise that, generally, the location of the most recent observation for any given species is the most likely location to re-find it.
2. Preserving links to vouchers: All species occurrences retrieved by infinitylists are linked to the original record in the ALA, GBIF, or iNaturalist. This is important for data validation and error flagging, allowing users to assess the quality of all records themselves.
3. Offline, local execution to allow additional flexibility and long-term viability: Users can clone our GitHub repository and run infinitylists locally in R to allow for both additional taxa–country combinations and up-to-date species occurrence data. All code is open, archived with a DOI, and has been tested using continuous integration, maintaining long-term viability as the cloud computing environment changes. Additional taxa can be downloaded via the galah package, which interfaces with both the ALA and GBIF global nodes (Westgate et al., 2022). Download speed is fast for taxa with a small number of observations and slower for taxa with millions of observations. Users may leverage RStudio's background job functionality or high-performance computing if necessary.
4. Sorting native versus introduced taxa: For Australian records downloaded via the ALA, infinitylists relies on native versus introduced assessments from the Australian Plant Census, accessed via the software interface APCalign (Wenk et al., 2024b).
5. Approach to identifying likely but as yet unobserved taxa in the area of interest: While many platforms, such as Map of Life, approach the identification of likely but currently unobserved taxa by integrating expert range maps with species distribution models (Jetz et al., 2012; Merow et al., 2017; Mainali et al., 2020), we instead implement a buffer tool (see Young et al., 2021 for a similar approach), although we see our approach as complementary. Our key aim in using a buffer as a predictive tool is to report only verifiable species records with no extrapolation, which should minimize false positives introduced by range map estimation. This approach may be more prone to false negatives in areas that have historically received less scientific attention. In these cases, we recommend using a larger buffer or larger polygon to search for possible species.
6. Testing: infinitylists uses both internal and external testing to ensure the application and R package run smoothly. Our first line of internal testing is standard R package testing protocols (R CMD check) to verify the installation of our package across multiple operating systems (Windows, macOS, and Ubuntu) and R versions (latest and previous release, development version). Next, we included a series of unit tests using the testthat package in R (Wickham, 2011) to verify the outputs generated by our functions. These unit tests ensure future updates and maintenance to the software do not break previous capabilities. We enlisted GitHub Actions, a continuous integration and deployment platform, to trigger our internal testing pipelines each time a source code change is made. For external testing, we conducted a series of beta tests where one of the authors (T.M.) and invited users intensively interacted with the software to uncover as many issues as possible and provide suggestions for useful new features.
7. Generalizable framework: infinitylists relies on the GBIF network for occurrence record data. Using the global GBIF application programming interface, users from around the world can download data and leverage the infinitylists design and interface for their own use cases. The download_gbif_obs("taxon_name", "country_code") function allows the user to specify the country from which they want to request occurrence data. As for Australia, users can also download data for other taxa in addition to the pre-loaded groups. A detailed tutorial is available on the GitHub page and via our R package vignette browseVignettes("infinitylists").
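The "load only the required part" idea in item 1 rests on two properties of the Parquet format: data are stored by column, and the file metadata records summary statistics (such as minimum and maximum values) for each chunk of rows. The toy model below, in Python with no Parquet library, illustrates how such statistics let a reader skip whole chunks and read only the requested columns. It is a conceptual sketch: the function names and dictionary layout are hypothetical, and it does not reproduce how arrow or infinitylists actually plan their reads.

```python
def make_row_group(rows):
    """Bundle rows with min/max latitude, mimicking chunk-level metadata."""
    lats = [r["lat"] for r in rows]
    return {"rows": rows, "lat_min": min(lats), "lat_max": max(lats)}

def query(row_groups, lat_lo, lat_hi, columns):
    """Return matching rows (selected columns only) and the chunks scanned."""
    scanned = 0
    out = []
    for group in row_groups:
        # Metadata check: skip the chunk without touching its rows
        # when its latitude range cannot overlap the query range.
        if group["lat_max"] < lat_lo or group["lat_min"] > lat_hi:
            continue
        scanned += 1
        for r in group["rows"]:
            if lat_lo <= r["lat"] <= lat_hi:
                # Column selection: keep only what the caller asked for.
                out.append({c: r[c] for c in columns})
    return out, scanned
```

The fewer chunks overlap the query, the less data has to be held in memory, which is the behavior the text credits for the responsiveness of the application.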
Case studies
National Herbarium of New South Wales collecting expedition
Located in the Northern Beaches area of Sydney, North Head Sanctuary is of high ecological significance, containing the largest extant fragment (~69 ha) of Eastern Suburbs Banksia Scrub (Lambert and Lambert, 2015). The North Head Sanctuary flora had historically been poorly collected, in part because the Sanctuary is part of Commonwealth Land, requiring a different type of collecting permit than the general scientific license used by many researchers and organizations across New South Wales. The Sydney Harbour Federation Trust engaged the National Herbarium of New South Wales to conduct a botanical collecting expedition in 2023 at North Head Sanctuary, providing an invaluable opportunity to test infinitylists in the field.
First, we used infinitylists to download all plant records for North Head Sanctuary on 6 September 2023 and rapidly generate a preliminary species checklist. The use of infinitylists to incorporate iNaturalist observations into the checklist was especially valuable given the dearth of herbarium collections. On 7 September 2023, T.M., H.S., and three other botanists conducted a pre-expedition scoping trip to locate and photograph new plant species in preparation for collection. The preliminary checklist greatly assisted this exercise, informing the recording of an additional 49 species for the area.
We used infinitylists again to download all plant records for North Head Sanctuary on 11 September 2023 and supplemented this dataset with floristic data from pre- and post-fire quadrats and observational studies conducted in the area (Perkins et al., 2012; Lambert and Lambert, 2015; Hammill, 2021) to create an updated checklist.
On 28 September 2023, the National Herbarium of New South Wales conducted a collecting expedition at North Head Sanctuary that aimed to collect species that had been photographed but never physically vouchered, as well as species entirely unrecorded for the site. Each collecting team used infinitylists throughout the expedition to relocate species that were as yet uncollected but that had been previously recorded in the area on iNaturalist. This proved especially useful for rare and easily overlooked species. For example, Boronia parviflora Sm., a small subshrub in Rutaceae, and Patersonia fragilis (Labill.) Asch. & Graebn., a small herb in Iridaceae, were both only known from a single location in the sanctuary, with photographic vouchers uploaded to iNaturalist during the scoping trip. The team assigned to that zone used infinitylists to easily find the coordinates of both plants and relocate them for collection. Given the collecting expedition was limited to just eight hours, the ability to rapidly relocate species of interest was of great benefit for maximizing total vouchers, with 231 collections made that day representing at least 132 distinct plant species.
Individual tests
T.M. also conducted a series of opportunistic field trials of infinitylists, demonstrating its value across four different use cases:
1. Facilitate opportunistic collections: After observing an uncommon invasive species in Sydney, T.M. selected the Use current location modality; 5 km and 10 km radius searches revealed no vouchers, and a 50 km radius search yielded a single collection from 25 years ago, prompting T.M. to voucher a specimen.
2. Inform targets for documentation of new species occurrences: Before visiting a nature reserve in Western Sydney, T.M. uploaded a KML file for the reserve to infinitylists and downloaded a species checklist. The list was used to cross-reference each species encountered in the reserve to ensure newly observed species were photographed and recorded, resulting in photographic vouchers for 49 species that had never been recorded from the reserve.
3. Confirm continued presence of a species: During a bushwalk in the Blue Mountains west of Sydney, it was evident the trail had been significantly burnt during the Black Summer bushfires of 2019–2020. T.M. used infinitylists to find species that had been collected from the area before the fires occurred, but which had not been recorded since. One species had last been vouchered in 2006; after focused searching informed by infinitylists, T.M. found the species and confirmed its continued presence in the area.
4. Discover new species occurrences from similar habitats: While botanizing in northern New South Wales, T.M. used the Choose a lat/long modality to find all plant species with physical vouchers within a 5 km radius, set a 5 km buffer, and searched for species only in the buffer zone. Among the results was a 1980 collection of a rare aquatic plant species from almost 10 km away. Focused searching of similar habitat in the area resulted in finding a new population of the species.
CONCLUSIONS
Species checklists are a valuable tool for identifying possible local extinctions, supplementing biodiversity surveys, and informing collecting expeditions and gap-filling exercises. The most robust checklists are generated by integrating vouchered specimen records with citizen science records, especially given the exponential increase in the latter over the past decade. We designed infinitylists as a complementary tool to existing databases such as GBIF and the ALA, allowing users to easily generate place-based species checklists on both desktop and mobile devices, with fast performance driven by our storage of data as Apache Parquet. We anticipate broad use of infinitylists by researchers, land managers, herbarium and museum staff, and citizen scientists and encourage the development of similar tools for other regions of the world.
AUTHOR CONTRIBUTIONS
T.M., F.K., H.S., and W.K.C. conceived the ideas and designed methodology; F.K. acquired the financial support for the project leading to this publication; F.K. and W.K.C. wrote the code and led the software development; and T.M. led the writing of the manuscript. All authors contributed critically to the drafts and gave final approval for publication.
ACKNOWLEDGMENTS
The authors thank Geoff Lambert, Judy Lambert, Peter Jensen, and the Botanic Gardens of Sydney staff, especially the collection trip organizers Russell Barrett, Marie-B Foyard, Hannah McPherson, and Peter Jobson for assistance with the North Head expedition; Dorothy Luther, Nick Lambert, Peter Crowcroft, Daniel Falster, and Guy Taseski for testing infinitylists and making valuable suggestions; and two anonymous reviewers whose comments improved this manuscript. F.K. is supported by a UNSW Research Infrastructure Scheme grant which funded the production and deployment of infinitylists. Open access publishing facilitated by University of New South Wales, as part of the Wiley - University of New South Wales agreement via the Council of Australian University Librarians.
DATA AVAILABILITY STATEMENT
The online version of infinitylists is available at . The R package version of infinitylists can be run natively on a user's computer and is downloadable from GitHub at , with all data and code for release 2.0.5 available at Zenodo (10.5281/zenodo.14998402; Cornwell et al., 2025).
REFERENCES
Ackerfield, J., B. M. Boom, E. Gandy, and L. Paradiso. 2024. EcoFloras elucidate insights from biodiversity data: Evaluating the strengths and limitations of iNaturalist observations and herbarium specimens. Preprints.org [Preprint]. https://doi.org/10.20944/preprints202410.0017.v1 [posted 1 October 2024; accessed 8 May 2025].
Bach, W., H. Kreft, D. Craven, C. König, J. Schrader, A. Taylor, W. Dawson, et al. 2022. Phylogenetic composition of native island floras influences naturalized alien species richness. Ecography 2022(11): e06227.
Ball‐Damerow, J. E., L. Brenskelle, N. Barve, P. S. Soltis, P. Sierwald, R. Bieler, R. LaFrance, et al. 2019. Research applications of primary biodiversity databases in the digital age. PLoS ONE 14(9): e0215794.
Barrett, T., M. Dowle, A. Srinivasan, J. Gorecki, M. Chirico, and T. Hocking. 2024. data.table: Extension of ‘data.frame’. R package version 1.15.0. Available at https://CRAN.R-project.org/package=data.table [accessed 12 May 2025].
Belbin, L., E. Wallis, D. Hobern, and A. Zerger. 2021. The Atlas of Living Australia: History, current state and future directions. Biodiversity Data Journal 9: e65023.
Bellard, C., C. Marino, and F. Courchamp. 2022. Ranking threats to biodiversity and why it doesn't matter. Nature Communications 13(1): 2616.
Brondízio, E. S., J. Settele, S. Díaz, and H. T. Ngo [eds.]. 2019. Global assessment report of the Intergovernmental Science‐Policy Platform on Biodiversity and Ecosystem Services. IPBES, Bonn, Germany.
Chamberlain, S., and C. Boettiger. 2017. R Python, and Ruby clients for GBIF species occurrence data. PeerJ PrePrints 5: e3304v1. https://doi.org/10.7287/peerj.preprints.3304v1
Chang, W., J. Cheng, J. Allaire, C. Sievert, B. Schloerke, Y. Xie, J. Allen, et al. 2023. shiny: Web application framework for R. R package version 1.7.4.1. Available at https://CRAN.R-project.org/package=shiny [accessed 12 May 2025].
Cheng, J., B. Schloerke, B. Karambelkar, and Y. Xie. 2023. leaflet: Create interactive web maps with the JavaScript ‘Leaflet’ library. R package version 2.2.1. Available at https://CRAN.R-project.org/package=leaflet [accessed 12 May 2025].
Cornwell, W., F. Kar, and T. Mesaglio. 2025. Traitecoevo/infinitylists: March 2025 v3 release. Available at Zenodo repository: https://doi.org/10.5281/zenodo.14998402 [posted 10 March 2025; accessed 8 May 2025].
Denelle, P., P. Weigelt, and H. Kreft. 2023. GIFT—An R package to access the Global Inventory of Floras and Traits. Methods in Ecology and Evolution 14(11): 2738–2748.
Di Cecco, G. J., V. Barve, M. W. Belitz, B. J. Stucky, R. P. Guralnick, and A. H. Hurlbert. 2021. Observing the observers: How participants contribute data to iNaturalist and implications for biodiversity science. BioScience 71(11): 1179–1188.
Dimson, M., L. Berio Fortini, M. W. Tingley, and T. W. Gillespie. 2023. Citizen science can complement professional invasive plant surveys and improve estimates of suitable habitat. Diversity and Distributions 29(9): 1141–1156.
Farley, S. S., A. Dawson, S. J. Goring, and J. W. Williams. 2018. Situating ecology as a big‐data science: Current advances, challenges, and solutions. BioScience 68(8): 563–576.
Feng, X., B. J. Enquist, D. S. Park, B. Boyle, D. D. Breshears, R. V. Gallagher, A. Lien, et al. 2022. A review of the heterogeneous landscape of biodiversity databases: Opportunities and challenges for a synthesized biodiversity knowledge base. Global Ecology and Biogeography 31(7): 1242–1260.
Finn, C., F. Grattarola, and D. Pincheira‐Donoso. 2023. More losers than winners: Investigating Anthropocene defaunation through the diversity of population trends. Biological Reviews 98: 1732–1748.
Funk, V. A., R. Edwards, and S. Keeley. 2018. The problem with(out) vouchers. Taxon 67(1): 3–5.
Groom, Q. J., and S. J. Whild. 2017. Characterisation of false‐positive observations in botanical surveys. PeerJ 5: e3324.
Hammill, K. 2021. Eastern Suburbs Banksia Scrub restoration project, North Head Sanctuary, Manly. Data analysis report on the vegetation monitoring of the 2018 burn block North Fort Road, North Head. Prepared for Sydney Harbour Federation Trust, Sydney, New South Wales, Australia.
Jetz, W., J. M. McPherson, and R. P. Guralnick. 2012. Integrating biodiversity distribution knowledge: Toward a global map of life. Trends in Ecology & Evolution 27(3): 151–159.
Kays, R., W. J. McShea, and M. Wikelski. 2020. Born‐digital biodiversity data: Millions and billions. Diversity and Distributions 26(5): 644–648.
König, C., P. Weigelt, and H. Kreft. 2017. Dissecting global turnover in vascular plants. Global Ecology and Biogeography 26(2): 228–242.
Lambert, G., and J. Lambert. 2015. Progress with restoration and management of Eastern Suburbs Banksia Scrub on North Head, Sydney. Ecological Management & Restoration 16: 95–105.
Mainali, K., T. Hefley, L. Ries, and W. F. Fagan. 2020. Matching expert range maps with species distribution model predictions. Conservation Biology 34(5): 1292–1304.
Merow, C., A. M. Wilson, and W. Jetz. 2017. Integrating occurrence data and expert maps for improved species range predictions. Global Ecology and Biogeography 26(2): 243–258.
Mesaglio, T., A. Soh, S. Kurniawidjaja, and C. Sexton. 2021. ‘First Known Photographs of Living Specimens’: The power of iNaturalist for recording rare tropical butterflies. Journal of Insect Conservation 25: 905–911.
Molinari‐Jobin, A., M. Kéry, E. Marboutin, P. Molinari, I. Koren, C. Fuxjäger, C. Breitenmoser‐Würsten, et al. 2012. Monitoring in the presence of species misidentification: The case of the Eurasian lynx in the Alps. Animal Conservation 15(3): 266–273.
Morrison, L. W. 2021. Nonsampling error in vegetation surveys: Understanding error types and recommendations for reducing their occurrence. Plant Ecology 222(5): 577–586.
Perkins, I., J. Diamond, G. SanRoque, L. Raffan, B. Digby, P. Jensen, and D. Hirschfeld. 2012. Eastern Suburbs Banksia Scrub: Rescuing an endangered ecological community. Ecological Management & Restoration 13(3): 224–237.
Potts, S. G., V. Imperatriz‐Fonseca, H. T. Ngo, J. C. Biesmeijer, T. D. Breeze, L. V. Dicks, L. A. Garibaldi, et al. 2016. The assessment report on pollinators, pollination and food production: Summary for policymakers. Secretariat of the Intergovernmental Science‐Policy Platform on Biodiversity and Ecosystem Services, Bonn, Germany.
R Core Team. 2022. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. Website https://www.R-project.org/ [accessed 8 May 2025].
Ribeiro, B. R., S. J. E. Velazco, K. Guidoni‐Martins, G. Tessarolo, L. Jardim, S. P. Bachman, and R. Loyola. 2022. bdc: A toolkit for standardizing, integrating and cleaning biodiversity data. Methods in Ecology and Evolution 13(7): 1421–1428.
Richardson, N., I. Cook, N. Crane, D. Dunnington, R. François, J. Keane, D. Moldovan‐Grünfeld, et al. 2024. arrow: Integration to ‘Apache’ ‘Arrow’. R package version 16.1.0. Available at https://arrow.apache.org/docs/r/ and https://github.com/apache/arrow/ [accessed 12 May 2025].
Ripley, B. D. 2004. Lazy loading and packages in R 2.0.0. R News 4(2): 2–4.
Roger, E., C. Slayter, D. J. Kellie, P. Brenton, E. Wallis, O. Torresan, and A. Zerger. 2023. Open access research infrastructures are critical for improving the accessibility and utility of citizen science: A case study of Australia's national biodiversity infrastructure, the Atlas of Living Australia (ALA). Citizen Science Theory and Practice 8(1): 1–15.
Rosenblatt, C. J., A. A. Dayer, J. N. Duberstein, T. B. Phillips, H. W. Harshaw, D. C. Fulton, N. W. Cole, et al. 2022. Highly specialized recreationists contribute the most to the citizen science project eBird. Ornithological Applications 124(2): duac008.
Saran, S., S. K. Chaudhary, P. Singh, A. Tiwari, and V. Kumar. 2022. A comprehensive review on biodiversity information portals. Biodiversity and Conservation 31(5–6): 1445–1468.
Sikes, D. S., K. Copas, T. Hirsch, J. T. Longino, and D. Schigel. 2016. On natural history collections, digitized and not: A response to Ferro and Flick. ZooKeys 618: 145–158.
Soroye, P., N. Ahmed, and J. T. Kerr. 2018. Opportunistic citizen science data transform understanding of species distributions, phenology, and diversity gradients for global change research. Global Change Biology 24(11): 5281–5291.
Spear, D. M., G. B. Pauly, and K. Kaiser. 2017. Citizen science as a tool for augmenting museum collection data from urban areas. Frontiers in Ecology and Evolution 5: 86.
Wenk, E., T. Mesaglio, D. Keith, and W. Cornwell. 2024a. Curating protected area‐level species lists in an era of diverse and dynamic data sources. Ecological Informatics 84: 102921.
Wenk, E. H., W. K. Cornwell, A. Fuchs, F. Kar, A. M. Monro, H. Sauquet, R. E. Stephens, and D. S. Falster. 2024b. APCalign: An R package workflow and app for aligning and updating flora names to the Australian Plant Census. Australian Journal of Botany 72(4): BT24014.
Westgate, M., M. Stevenson, D. Kellie, and P. Newman. 2022. galah: Atlas of Living Australia (ALA) data and resources in R. R package version 2.0.2. Available at https://CRAN.R-project.org/package=galah [accessed 12 May 2025].
Wickham, H. 2011. testthat: Get started with testing. The R Journal 3: 5–10.
Young, B. E., M. T. Lee, M. Frey, K. Barnes, and P. Hopkins. 2021. Using citizen science observations to develop managed area watch lists. Natural Areas Journal 41(4): 307–314.
© 2025. This work is published under http://creativecommons.org/licenses/by-nc/4.0/.