Digitization protocol for scoring reproductive

The digitization of herbarium specimens and their associated data has advanced our ability to understand complex and changing biological systems (Johnson, ; Pearse et al., ; Willis et al., ). Digitizing herbarium records (capturing the taxon name, date of collection, location, and/or a digital image) has improved our ability to track changes in the distributions of organisms (Lavoie, ), but herbarium specimens also hold rich information on plant health, reproductive condition, and morphology that current digitization workflows generally do not capture (Nelson et al., ). Because specimens are being used for research at an accelerating rate, it is essential that we structure digital data collection in ways that promote the longevity of the data and their integration across sources.

Of particular interest is the enormous potential of herbarium specimens as a source of information on plant phenology (the timing of seasonal events such as flowering or fruiting). Plant phenology has complex, cascading effects on multiple levels of biological organization, from individuals to ecosystems (Bertin, ). Temporal mismatches between plants and pollinators can quickly drive populations extinct, cause rapid evolutionary shifts, and result in billions of dollars of agricultural losses (Visser and Both, ; Both et al., ; Körner and Basler, ; Miller‐Rushing et al., ; Ozgul et al., ; Struttmann et al., ). Phenology has also been used to study the impact of climate change across a range of organisms and vegetation types (Bowers, ; Houle, ; Anderson et al., ; Lavoie, ). Consequently, maximizing the use of herbarium specimens for phenological research is important not only for improving our understanding of evolutionary change but also, as a matter of great practical concern, for addressing environmental problems.

Recent studies have demonstrated the potential of herbarium specimens to be used in evaluating temporal and spatial variation in plant phenology (see Willis et al., for a review of these studies) despite known biases of herbarium records (Meyer et al., ; Pearse et al., ; Daru et al., ). These studies have provided three valuable outcomes. First, for several species, we now have a quantitative historical understanding of their phenological change over time (Rivera and Borchert, ; Primack et al., ; Miller‐Rushing et al., ; Gallagher et al., ; Robbirt et al., ; Park, ; Davis et al., ). Second, for some species, relationships between temporal or spatial variation in phenology and climate (e.g., local temperature and/or precipitation) have been detected; these relationships, in turn, provide a basis for forecasting the effects of ongoing climate change on the seasonal cycles of these taxa (Badeck et al., ; Franks et al., ; Matthews and Mazer, ; Prevéy et al., ). Third, we have an improved understanding of the specific advantages of herbarium specimens for phenological research, such as filling gaps in long‐term or observational data sets, either for a period of time (Panchen et al., ) or for underrepresented regions (Gallagher et al., ; Panchen and Gorelick, ).

Given the ecological importance of phenology, the demonstrated value of herbarium specimens for phenological research, and the potential of digitization to make herbarium records more useful, robust standards are needed for how phenological data are captured during or after the digitization process. There are currently two principal limitations to accessing and using phenological data from herbarium specimens: (1) the paucity of high‐quality images accompanying digitized specimen records and (2) the lack of a standardized methodology for capturing specimens’ reproductive traits and sharing the resulting data. When phenological information is present on a label or visible on a specimen, it is parsed in numerous, often arbitrary, ways during digitization. For example, a label might read “on a south facing slope in full flower,” but this information might be captured in the ‘habitat,’ ‘notes,’ ‘plant description,’ or some other field of a local database.

Even if a local database does contain a field explicitly for phenological characters, each institution independently decides how to record the states present on the sheet. For example, in the SEINet collaborative (http://swbiodiversity.org/seinet/), which consists of 251 U.S. collections and 11.8 million records, 2.6 million (22%) records have text in a database field called ‘reproductiveCondition.’ Most terms in this field specify flowering, fruiting, sterile, spores, and/or cones; however, these terms are expressed in over 4000 unique text strings (Table ). Some collections specify “flowering,” while others state “flws.” Some entries are ambiguous (12,000 are coded as merely “u,” presumably meaning “unknown” or “unrecorded”). The lack of a controlled vocabulary for this field makes aggregating these data for research purposes onerous. Local databases often share their data with data aggregators such as iDigBio (https://www.idigbio.org/) or directly with users as a suite of Darwin Core Archive files, an exchange standard described more fully below (Wieczorek et al., ; http://rs.tdwg.org/dwc/). However, phenological information is just as scattered across Darwin Core fields, with most phenological traits placed in ‘occurrenceRemarks,’ ‘organismRemarks,’ ‘dynamicProperties,’ ‘reproductiveCondition,’ or ‘fieldNotes.’

Most frequently found text strings in the Darwin Core field ‘reproductiveCondition’ and the number of specimens in SEINet with that exact string. Nearly all of these can be quickly scored according to the protocol proposed here using the Attribute Mining Tool in Symbiota‐based databases

‘reproductiveCondition’ text samples	Specimen count
Flowering¹	434,637
Flowering and fruiting	285,865
flower¹	174,751
Fruiting	160,853
fl¹	136,544
fr	97,098
flowering = early¹	90,578
fl‐fr	90,132
flowers¹	86,372
fruit	85,260
flowering + fruiting = mid	75,128
vegetative	47,785
flr¹	44,529
fertile	39,309
Flo¹	36,867
veg	36,258
fl,fr	35,032
Flr & Frt	33,199
spores	31,824
sterile	30,478
Flower: Y Fruit: N Vegetation: Y Bud: N¹	27,508
Flower | Fruit	27,254
flowers & fruit	26,301
fru	19,833

¹Records that refer only to open flowers or flowering; these can be scored simultaneously with the Attribute Mining Tool, resulting in over 1 million new phenological records from a single scoring effort.
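The batch scoring described in the footnote can be illustrated, and its "over 1 million" figure checked, with a small Python sketch. It assumes a hypothetical curator‐built dictionary that maps exact strings to protocol terms, which is one way to picture selecting strings and applying a score to all matching records at once; it is not Symbiota's Attribute Mining Tool, whose internals are not described here. Counts are copied from the table, except the count for “u,” which is the approximate figure given in the text.

```python
from collections import Counter

REPRO = "reproductive structures present"
OPEN = "open flowers present"
FRUIT = "fruits present"
STERILE = "no reproductive structures"

# Record counts for some 'reproductiveCondition' strings from the table above.
STRING_COUNTS = {
    "Flowering": 434_637,
    "Flowering and fruiting": 285_865,
    "flower": 174_751,
    "Fruiting": 160_853,
    "fl": 136_544,
    "fr": 97_098,
    "flowering = early": 90_578,
    "flowers": 86_372,
    "vegetative": 47_785,
    "flr": 44_529,
    "fertile": 39_309,
    "Flo": 36_867,
    "sterile": 30_478,
    "Flower: Y Fruit: N Vegetation: Y Bud: N": 27_508,
    "u": 12_000,  # approximate count quoted in the text
}

# Hypothetical curator-built mapping from exact strings to protocol terms.
# Ambiguous strings (here "u") are deliberately left unmapped.
STRING_TO_TERMS = {
    "Flowering": {OPEN},
    "Flowering and fruiting": {OPEN, FRUIT},
    "flower": {OPEN},
    "Fruiting": {FRUIT},
    "fl": {OPEN},
    "fr": {FRUIT},
    "flowering = early": {OPEN},
    "flowers": {OPEN},
    "vegetative": {STERILE},
    "flr": {OPEN},
    "fertile": {REPRO},  # reproductive, but structures unspecified
    "Flo": {OPEN},
    "sterile": {STERILE},
    "Flower: Y Fruit: N Vegetation: Y Bud: N": {OPEN},
}


def tally(counts, mapping):
    """Sum record counts per protocol term; also return unmapped strings."""
    totals = Counter()
    unmapped = {}
    for raw, n in counts.items():
        terms = mapping.get(raw)
        if terms is None:
            unmapped[raw] = n  # left for manual review
            continue
        for term in terms:
            totals[term] += n
    return totals, unmapped


totals, unmapped = tally(STRING_COUNTS, STRING_TO_TERMS)
# Records whose string refers only to open flowers (footnote 1 rows).
open_only = sum(n for raw, n in STRING_COUNTS.items()
                if STRING_TO_TERMS.get(raw) == {OPEN})
print(open_only)  # 1031786
```

Because each string is scored once and the score applies to every record carrying it, eight selections in this sketch account for just over one million records.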

Herbarium specimens clearly have great potential as a source of phenological data (Willis et al., ). Here we propose a method to (1) broaden the scope and longevity of digitization efforts through a standardized methodology for scoring reproductive characters from herbarium specimens and (2) provide a means of sharing the resulting data in a Darwin Core format. The protocol is intended to unlock the potential of herbaria for phenological research by facilitating comparability among herbaria, research groups, and other methods used to collect phenological data (e.g., citizen science observations, satellite imagery, and stationary camera images).

METHODS AND RESULTS

To make progress toward developing standards that reflect community‐wide goals and feasible implementation, iDigBio sponsored a working group called Coding Phenological Data from Herbarium Sheets in March 2016 at the University of California, Berkeley (https://www.idigbio.org/wiki/index.php/Coding_Phenological_Data_from_Herbarium_Sheets).

This workshop brought together 37 participants from the United States, Scotland, England, Sweden, Canada, Germany, and Australia. The group represented a range of phenological interests, including phenological researchers (those who use the downstream data products obtained from specimens), herbarium collection personnel (those responsible for preparing and curating the specimens and their data), in situ phenological observers who record the phenological status of living plants, and data standards experts. A participant list can be found on the iDigBio wiki page (https://www.idigbio.org/wiki/index.php/Coding_Phenological_Data_from_Herbarium_Sheets) and in Appendix S1.

Prior to the workshop, we developed a survey to assess the needs of the phenological community and herbarium data users and to review how phenological data were currently being captured. We received 76 responses, and respondents identified themselves as working in collections, monitoring, or research (categories that were not mutually exclusive). With this survey and input from workshop participants, we reviewed the ways in which herbaria currently capture phenological traits. The two traits most often scored from specimens are the presence of open flowers and the presence of fruit (14 million specimens were represented in this survey). Most respondents also considered open flowers and fruits the most important traits to score on a specimen, and workshop participants echoed this view. We also reviewed previous phenological research based on data derived from herbarium specimens to identify the types of raw data necessary and sufficient to achieve a variety of research goals; these findings are summarized in Willis et al. ().

When developing a scoring protocol, we considered the challenges and limitations of scoring specimens (both de novo and as part of regular digitization workflows) and potential solutions to them. We considered hard‐to‐see floral parts, trained vs. untrained scorers, the limited resources of most herbaria, and the likelihood of community‐wide adoption. We also weighed the costs and benefits of recording qualitative data (e.g., “open flowers present/absent”) vs. quantitative data (e.g., counts or proportions of unopened flowers, open flowers, and fruits).

A primary concern was that any data resulting from efforts to score phenological traits should be shareable in Darwin Core–formatted files, to help ensure the usefulness and longevity of these data. Representatives of the data standards community, Biodiversity Information Standards (TDWG), including contributors to Darwin Core and Apple Core, provided input on representing phenological stages using current biodiversity standards.

Finally, to ensure that phenological traits from specimens can be integrated with other sources, participants included members of the USA National Phenology Network, the California Phenology Network, the National Ecological Observatory Network, the Royal Botanic Garden Edinburgh, the Pan‐European Phenology Network, and the Plant Phenology Ontology.

We propose that reproductive traits for specimens of seed plants be scored according to the following hierarchical categories/questions (Tables and ). Our protocol uses terminology from the Plant Ontology to represent plant parts (e.g., flower, fruit) (http://www.obofoundry.org/ontology/po.html) and traits that correspond to plant phenological traits in the Plant Phenology Ontology (PPO; http://htmlpreview.github.io/?https://github.com/PlantPhenoOntology/ppo/blob/master/documentation/ppo.html) (e.g., reproductive structures present, unopened flowers present, open flowers present) (Walls et al., ; Stucky et al., ). Because this vocabulary maps directly to these ontologies, data collected with this method can be easily ingested into data stores that use them and thereby integrated with other sources of phenological data, such as direct observations in situ or remote sensing (www.plantphenology.org).

Proposed phenological scoring protocol for angiosperms. Terms in single quotes are defined in the Plant Ontology for plant parts (e.g., flower, fruit) and in the Plant Phenology Ontology for traits that correspond to plant phenological traits (e.g., reproductive structures present, unopened flowers present, open flowers present). Second‐order questions are not mutually exclusive and are answered in the affirmative or left blank. Third‐order questions define a specimen's position in the phenological cycle. Third‐order categories are mutually exclusive with one another

First‐order	Second‐order	Third‐order
Are ‘reproductive structures’ present?	Are ‘unopened flowers’ present?	Mostly ‘unopened flowers’? (or counts)
(yes/no/not scorable)	Are ‘open flowers’ present?	Mostly ‘open flowers’? (or counts)
		Mostly ‘post‐mature flowers’? (or counts)
	Are ‘fruits’ present?	Mostly ‘immature fruits’? (or counts)
		Mostly ‘mature fruits’? (or counts)
		Mostly ‘post‐mature fruits’? (or counts)

Proposed phenological scoring protocol for gymnosperms. Words in single quotes are defined in the Plant Ontology for plant parts (e.g., pollen cone) and in the Plant Phenology Ontology for traits that correspond to plant phenological traits (e.g., reproductive structures present, mature seed cones). Second‐order questions are not mutually exclusive and are answered in the affirmative or left blank. Third‐order questions define a specimen's position in the phenological cycle. Unlike in the angiosperm protocol, third‐order categories for pollen cones are not mutually exclusive with those for seed cones, because pollen and seed cones develop independently

First‐order	Second‐order	Third‐order
Are ‘reproductive structures’ present?	Are ‘pollen cones’ present?	Mostly ‘immature pollen cones’? (or counts)
(yes/no/not scorable)		Mostly ‘mature pollen cones’? (or counts)
		Mostly ‘post‐mature pollen cones’? (or counts)
	Are ‘seed cones’ present?	Mostly ‘immature seed cones’? (or counts)
		Mostly ‘mature seed cones’? (or counts)
		Mostly ‘post‐mature seed cones’? (or counts)
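The two tables can also be encoded so that software checks a specimen's third‐order categories for consistency. The Python sketch below is a hypothetical encoding, not part of the protocol itself: category names are paraphrased from the tables, all six angiosperm categories are treated as one mutually exclusive set (as the angiosperm caption states), and, as an assumption, gymnosperm categories are treated as exclusive only within a cone type.

```python
# Hypothetical encoding of the third-order categories in the two protocol
# tables, grouped into sets whose members are mutually exclusive.
EXCLUSIVE_GROUPS = {
    # Angiosperms: all six third-order categories form one exclusive set.
    "angiosperm": [
        {
            "mostly unopened flowers",
            "mostly open flowers",
            "mostly post-mature flowers",
            "mostly immature fruits",
            "mostly mature fruits",
            "mostly post-mature fruits",
        }
    ],
    # Gymnosperms (assumption): exclusivity applies only within a cone
    # type, because pollen and seed cones develop independently.
    "gymnosperm": [
        {
            "mostly immature pollen cones",
            "mostly mature pollen cones",
            "mostly post-mature pollen cones",
        },
        {
            "mostly immature seed cones",
            "mostly mature seed cones",
            "mostly post-mature seed cones",
        },
    ],
}


def check_third_order(group, chosen):
    """Return a list of problems with a specimen's third-order categories.

    `group` is "angiosperm" or "gymnosperm"; `chosen` is a set of category
    names. An empty list means the combination is consistent.
    """
    groups = EXCLUSIVE_GROUPS[group]
    known = set().union(*groups)
    problems = [f"unknown category: {c}" for c in sorted(chosen - known)]
    for exclusive in groups:
        overlap = sorted(chosen & exclusive)
        if len(overlap) > 1:
            problems.append("mutually exclusive: " + ", ".join(overlap))
    return problems


print(check_third_order("gymnosperm",
                        {"mostly mature pollen cones", "mostly immature seed cones"}))
# []
print(check_third_order("angiosperm",
                        {"mostly open flowers", "mostly mature fruits"}))
# ['mutually exclusive: mostly mature fruits, mostly open flowers']
```

Representing exclusivity as explicit groups keeps the angiosperm and gymnosperm rules in data rather than in code, so a herbarium adopting different third‐order categories would only need to edit the dictionary.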

First‐order scoring

The question “Are ‘reproductive structures’ present? (yes/no/not scorable),” although the broadest, still has value for scoring specimen records. Having this information allows researchers to filter millions of records quickly to find those that can contribute to phenological research. It is also relatively easy for people with different levels of botanical training (e.g., curators, volunteers, and citizen scientists) to score. A “yes” means that reproductive structures of some kind (e.g., flowers, fruits, or cones) are present. A “no” means that the specimen is sterile and strictly vegetative. Notably, first‐order scoring can be applied to all taxonomic groups, even beyond seed plants. Some groups have specialized structures that make the question harder for non‐experts to answer (e.g., vegetative propagules that look like fruits), but we anticipate that such cases will be few. At a minimum, first‐order scoring allows records to be filtered and then scored in more detail.
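Because the first‐order answer takes one of three values, filtering on it is simple. The following Python sketch assumes records are plain dictionaries with a hypothetical `first_order` key holding “yes,” “no,” or “not scorable”; unscored records are skipped, and any other value raises a `ValueError`, which keeps misspelled answers from slipping through silently.

```python
from enum import Enum


class FirstOrder(Enum):
    """First-order answer: are reproductive structures present?"""
    YES = "yes"
    NO = "no"
    NOT_SCORABLE = "not scorable"


def reproductive_records(records):
    """Return the records scored 'yes' for reproductive structures.

    `records` is an iterable of dicts with a hypothetical 'first_order'
    key; records without that key (not yet scored) are skipped.
    FirstOrder(...) raises ValueError for any unrecognized answer.
    """
    selected = []
    for rec in records:
        answer = rec.get("first_order")
        if answer is None:
            continue  # not yet scored; do not guess
        if FirstOrder(answer) is FirstOrder.YES:
            selected.append(rec)
    return selected


records = [
    {"catalog": "A1", "first_order": "yes"},
    {"catalog": "A2", "first_order": "no"},
    {"catalog": "A3", "first_order": "not scorable"},
    {"catalog": "A4"},
]
print([r["catalog"] for r in reproductive_records(records)])  # ['A1']
```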

Second‐order scoring

For specimens scored as having reproductive structures present, it is valuable to characterize which structures are present. Most research thus far has used specimens with open flowers. For flowering plants, we propose the following second‐order, non‐mutually exclusive questions: “Are ‘unopened flowers’ present?,” “Are ‘open flowers’ present?,” and “Are ‘fruits’ present?” For gymnosperms, the questions are “Are ‘pollen cones’ present?” and “Are ‘seed cones’ present?” (Tables and ). Because the term “bud/s” can confuse floral buds with leaf buds, the PPO and this protocol refer only to unopened flowers.

The second‐order questions are not mutually exclusive: if unopened flowers, open flowers, and fruits are all present on a sheet, all three questions are answered in the affirmative. These data allow researchers to quickly identify the records relevant to their individual research questions. Second‐order scoring requires more training than first‐order scoring, because personnel must be able to distinguish unopened flowers, open flowers, and fruits accurately. For many taxa (e.g., grasses, sedges, rushes), floral structures are small and making these distinctions can be challenging. Scorers must also be trained to distinguish leaf buds from flower buds (which contain unopened flowers). As training materials are developed for various plant groups, they should be shared widely across the community.

Third‐order scoring

Third‐order scoring further subdivides the second‐order categories. While second‐order scores determine which specimens should be included in phenological research, it is often valuable to know a specimen's specific phenophase: analyses can be more precise if specimens in full flower can be distinguished from those at the beginning or end of the flowering cycle. Third‐order scores are intended to place individual specimens at a specific point in phenological development. As such, these subcategories and the units used to report them may vary with the institutional or research priorities that generate them. We do not specify exactly what the third‐order categories should be, as these will be determined by research priorities and staff time; rather, we explain how these questions are most commonly expressed, or could be expressed, within our proposed framework. We do, however, strongly recommend that researchers clearly define their categories and make these definitions broadly accessible, along with pertinent metadata. For example, the Simple Knowledge Organization System (SKOS) provides a framework for representing controlled vocabularies that is well suited to sharing online. The New England Vascular Plant (NEVP) project has devised a vocabulary following these guidelines and published it using SKOS (http://purl.org/nevp/vocabulary/reproductive-phenology_1.2#00) (Table ). Furthermore, regardless of the nature of the third‐order categories, we strongly recommend that researchers share these data via the Darwin Core extension explained below. For quantitative phenological data, some research groups require count data from a specimen, such as the numbers of unopened flowers, open flowers, and fruits on each sheet; such counts are considered third‐order scoring.
For some analyses, raw counts may be transformed into the proportions of reproductive organs represented by unopened flowers, open flowers, and fruits, thereby distinguishing specimens that represent early‐, peak‐, or late‐flowering individuals. Even if counts are not precisely recorded from a sheet, the degree of flowering can be binned into categories representing early, peak, and late flowering (see the NEVP project's third‐order categories in Table ). Note that in the example shown in Table , the third‐order scores, if categorical, are mutually exclusive: a sheet cannot simultaneously be ‘mostly unopened flowers’ and ‘mostly open flowers.’ Finally, to integrate third‐order scores with other sources of phenological data, we recommend use of the PPO. Counts, binned data, or other data representations can be mapped to the PPO, thereby ensuring interoperability with field monitoring data.
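The transformation from counts to proportions, and from proportions to early, peak, or late bins, can be sketched in Python as follows. The “more than half” threshold borrows the wording of the NEVP “mostly” definitions, and the requirement of at least one open flower is an added assumption; both would need to be adjusted to a research group's own published category definitions.

```python
from fractions import Fraction

# Stage indicated by a majority of each organ type (following the text:
# early-, peak-, and late-flowering individuals).
STAGE = {"unopened flowers": "early", "open flowers": "peak", "fruits": "late"}


def proportions(unopened, open_, fruits):
    """Convert raw counts into exact proportions; None if nothing was counted."""
    total = unopened + open_ + fruits
    if total == 0:
        return None
    return {
        "unopened flowers": Fraction(unopened, total),
        "open flowers": Fraction(open_, total),
        "fruits": Fraction(fruits, total),
    }


def flowering_stage(unopened, open_, fruits):
    """Bin counts into 'early', 'peak', or 'late' flowering, or None.

    Assumptions: a stage is assigned only when one organ type makes up
    more than half of the counted organs, and only if at least one
    flower is open (the specimen must be flowering at all).
    """
    props = proportions(unopened, open_, fruits)
    if props is None or open_ == 0:
        return None
    for organ, share in props.items():
        if share > Fraction(1, 2):
            return STAGE[organ]
    return None  # no single majority: leave for manual review


print(flowering_stage(8, 2, 0), flowering_stage(1, 6, 1), flowering_stage(0, 1, 5))
# early peak late
```

Exact fractions avoid floating‐point edge cases at the one‐half boundary, so a sheet with exactly half open flowers is consistently left unbinned.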

An example of the proposed scoring protocol applied by the New England Vascular Plant (NEVP) thematic collections network. Definitions of reproductive phenological terms for flowering plants are displayed (draft version 1.2)

Scoring	NEVP definition	URI identifier
First‐order
Is the material on the sheet sterile?	No reproductive structures present (no unopened flowers, open flowers, or fruits)	http://purl.org/nevp/vocabulary/reproductive-phenology_1.2#02
Is there reproductive material present on the sheet?	At least one reproductive structure of any kind is present (unopened flowers, open flowers, or fruits)	http://purl.org/nevp/vocabulary/reproductive-phenology_1.2#03
Not scorable	Not possible to score reproductive condition using material present	http://purl.org/nevp/vocabulary/reproductive-phenology_1.2#16
Second‐order
Unopened flowers present?	At least one unopened flower is present	http://purl.org/nevp/vocabulary/reproductive-phenology_1.2#04
Open flowers present?	At least one open flower is present	http://purl.org/nevp/vocabulary/reproductive-phenology_1.2#06
Fruits present?	At least one immature or mature fruit is present	http://purl.org/nevp/vocabulary/reproductive-phenology_1.2#11
Third‐order
Mostly unopened flowers	Mostly unopened flowers (less than half open) but with at least one open flower; this category is mutually exclusive with mostly open and mostly old flowering stages and with mostly young, mostly mature, and past maturity fruiting stages	http://purl.org/nevp/vocabulary/reproductive-phenology_1.2#08
Mostly open flowers	Mostly open flowers (more than half open) with few unopened flowers or old flowers that have lost their petals; this category is mutually exclusive with mostly unopened and mostly old flowering stages and with mostly young, mostly mature, and past maturity fruiting stages	http://purl.org/nevp/vocabulary/reproductive-phenology_1.2#09
Mostly old flowers	Mostly old flowers (less than half open) that have lost their petals, but at least one flower still open; this category is mutually exclusive with mostly unopened and mostly open flowering stages and with mostly young, mostly mature, and past maturity fruiting stages	http://purl.org/nevp/vocabulary/reproductive-phenology_1.2#10
Mostly young fruit	Mostly immature fruits present (less than half mature) but at least one mature fruit present, mutually exclusive with mostly mature and past maturity fruiting stages and with mostly unopened, mostly open, and mostly old flowering stages	http://purl.org/nevp/vocabulary/reproductive-phenology_1.2#13
Mostly mature fruit	Mostly mature fruits present (more than half mature), mutually exclusive with mostly young and past maturity fruiting stages and with mostly unopened, mostly open, and mostly old flowering stages	http://purl.org/nevp/vocabulary/reproductive-phenology_1.2#14
Mostly past mature fruit	Fruits have fallen from stalks, withered, or dehisced and lacking seeds (less than half mature) but at least one mature fruit present, mutually exclusive with mostly young and mostly mature fruiting stages and with mostly unopened, mostly open, and mostly old flowering stages	http://purl.org/nevp/vocabulary/reproductive-phenology_1.2#15
Number of open flowers present	Number of open flowers present on specimen: 0 is an acceptable value	http://purl.org/nevp/vocabulary/reproductive-phenology_1.2#07
Number of fruits present	Number of mature fruits, 0 is an acceptable value	http://purl.org/nevp/vocabulary/reproductive-phenology_1.2#12
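Publishing third‐order definitions in SKOS, as the NEVP project has done, amounts to describing each category as a `skos:Concept` with a label and a definition. The Python sketch below writes such descriptions in Turtle using only the standard library; it uses the NEVP namespace and two definitions from the table above, but the structure of NEVP's actual published file may differ, and the helper names are hypothetical.

```python
NEVP = "http://purl.org/nevp/vocabulary/reproductive-phenology_1.2#"


def turtle_literal(text):
    """Quote text as an English Turtle string literal."""
    escaped = (text.replace("\\", "\\\\")
                   .replace('"', '\\"')
                   .replace("\n", "\\n"))
    return f'"{escaped}"@en'


def skos_concepts(concepts):
    """Render (fragment, label, definition) tuples as SKOS concepts in Turtle."""
    lines = ["@prefix skos: <http://www.w3.org/2004/02/skos/core#> .", ""]
    for fragment, label, definition in concepts:
        lines.append(f"<{NEVP}{fragment}> a skos:Concept ;")
        lines.append(f"    skos:prefLabel {turtle_literal(label)} ;")
        lines.append(f"    skos:definition {turtle_literal(definition)} .")
    return "\n".join(lines)


print(skos_concepts([
    ("06", "Open flowers present?", "At least one open flower is present"),
    ("11", "Fruits present?", "At least one immature or mature fruit is present"),
]))
```

Giving each category a resolvable URI, as NEVP does, is what allows eMoF rows to cite a definition rather than repeat it.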

Some taxonomic groups have floral structures so small that scoring them either takes an onerous amount of time or requires expertise to determine which organs are actually present on the sheet. Members of Poaceae, Cyperaceae, and Juncaceae, as well as certain Asteraceae, are examples for which second‐order scoring may be more challenging. However, first‐order scores should be relatively easy to apply to these groups, which would greatly increase the utility of these specimens for phenological research. Our protocol does not address the presence/absence or abundance of male vs. female flowers, nor does it distinguish between perfect and imperfect flowers in gynodioecious, gynomonoecious, or andromonoecious species, largely because these categories have seldom been included in phenological research.

The timing of reproduction is not the only phenological event of interest in plants. Leaf bud break and leaf‐out are important phenomena in deciduous forests, as is autumn senescence. These vegetative events are often tracked via satellite imagery and in situ monitoring. Scoring phenological leaf traits on herbarium specimens is rare (but see Zohner and Renner, ; Gallinat et al., ), but such traits provide valuable insights into the effects of climate change (Chmielewski and Rötzer, ; Everill et al., ). We recommend a similar scoring protocol for foliar structures, although we do not specify one here.

Sharing of phenological scorings

Darwin Core has emerged as a key standard for describing species occurrences. Online documentation, including definitions and examples, is provided for each Darwin Core term (TDWG website: http://rs.tdwg.org/dwc/terms/). Any attempt to share phenological or other trait data from specimens should use Darwin Core fields to ensure that the basic specimen occurrence information is standardized. However, Darwin Core does not include structured terms for phenological traits (the ‘reproductiveCondition’ term holds a single text value, for which Darwin Core only recommends a controlled vocabulary), so other mechanisms are needed to support narrower or broader data‐sharing approaches.

We propose sharing phenological data using the Darwin Core Extended MeasurementOrFact (eMoF) extension (Table ) (https://tools.gbif.org/dwca-validator/extension.do?id=http://rs.iobis.org/obis/terms/ExtendedMeasurementOrFact). This extension allows many measurements or facts to be shared for each specimen record in a Darwin Core Archive, together with the metadata associated with each phenological (or other trait) scoring. For example, when evaluating data quality, it can be useful to know when, how, and by whom scorings were recorded; accuracy may be affected by whether the specimen was scored from a web‐based image or from the physical specimen. An eMoF record can contain a definition of the type of measurement, the value and units of the measurement, the method of measurement, and by whom and when it was measured.

Phenological scorings³ in the extended MeasurementOrFact Darwin Core extension: an example from Arizona State University Fabaceae specimen records

coreid	measurementType	measurementTypeID	measurementValue	measurementValueID	measurementUnit	measurementDeterminedDate	measurementDeterminedBy	measurementRemarks
652438	Phenology (ver 1.2)	http://purl.org/nevp/vocabulary/reproductive-phenology_1.2#01	Reproductive	http://purl.org/nevp/vocabulary/reproductive-phenology_1.2#03		2017-04-15T01:05:07Z	egbot
652438	Phenology: reproductive	http://purl.org/nevp/vocabulary/reproductive-phenology_1.2#03	Open Flowers	http://purl.org/nevp/vocabulary/reproductive-phenology_1.2#06		2017-04-15T01:05:08Z	egbot
652439	Phenology (ver 1.2)	http://purl.org/nevp/vocabulary/reproductive-phenology_1.2#01	Reproductive	http://purl.org/nevp/vocabulary/reproductive-phenology_1.2#03		2017-04-15T01:10:54Z	egbot
652439	Phenology: reproductive	http://purl.org/nevp/vocabulary/reproductive-phenology_1.2#03	Open Flowers	http://purl.org/nevp/vocabulary/reproductive-phenology_1.2#06		2017-04-15T01:10:55Z	egbot
652439	Phenology: reproductive	http://purl.org/nevp/vocabulary/reproductive-phenology_1.2#03	Fruiting	http://purl.org/nevp/vocabulary/reproductive-phenology_1.2#11		2017-04-15T01:10:56Z	egbot

Note

coreid = the identifier linking each row to its specimen record in the core file of the Darwin Core Archive; measurementType = the nature of the measurement, fact, characteristic, or assertion; measurementTypeID = an identifier for the measurementType; measurementValue = the value of the measurement, fact, characteristic, or assertion; measurementValueID = an identifier for facts stored in the column measurementValue; measurementUnit = the units associated with the measurementValue, not used in this example; measurementDeterminedDate = the date and time at which the scoring was applied; measurementDeterminedBy = the person or agent who applied the scoring; measurementRemarks = not used in this example, but could contain comments or notes accompanying the MeasurementOrFact.

³In the near future, these phenological mappings will be aggregated by iDigBio, GBIF, and other public repositories through Symbiota's Darwin Core Archive publishing services. These services are available within all Symbiota portals and are the central mechanism by which iDigBio harvests, and keeps up to date, specimen data published from a Symbiota portal instance.

Table shows an example Darwin Core Archive file of the eMoF extension for two herbarium sheets scored using our protocol (measurementType called Phenology [ver 1.2]). The first record (coreid 652438) was scored with measurementValue = ‘Reproductive’ and additionally with measurementValue = ‘Open flowers.’ The second record (coreid 652439) is scored ‘Reproductive,’ ‘Open flowers,’ and ‘Fruiting.’
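Rows like those in the example table can be generated from checkbox scores. This Python sketch uses the standard `csv` module to write a tab‐separated eMoF file; the column names and NEVP identifiers come from the table, while the function names and the value‐to‐identifier lookup are hypothetical and cover only the values shown there.

```python
import csv
import io

NEVP = "http://purl.org/nevp/vocabulary/reproductive-phenology_1.2#"
COLUMNS = [
    "coreid", "measurementType", "measurementTypeID", "measurementValue",
    "measurementValueID", "measurementUnit", "measurementDeterminedDate",
    "measurementDeterminedBy", "measurementRemarks",
]
# Second-order values used in the example table and their NEVP fragments.
VALUE_IDS = {"Open Flowers": "06", "Fruiting": "11"}


def emof_rows(coreid, values, determined_by, timestamp):
    """Build eMoF rows for one specimen scored as reproductive.

    Simplification: every row shares one timestamp, whereas the example
    table records a separate time for each scoring.
    """
    common = {
        "coreid": coreid,
        "measurementDeterminedDate": timestamp,
        "measurementDeterminedBy": determined_by,
    }
    rows = [dict(common,
                 measurementType="Phenology (ver 1.2)",
                 measurementTypeID=NEVP + "01",
                 measurementValue="Reproductive",
                 measurementValueID=NEVP + "03")]
    for value in values:
        rows.append(dict(common,
                         measurementType="Phenology: reproductive",
                         measurementTypeID=NEVP + "03",
                         measurementValue=value,
                         measurementValueID=NEVP + VALUE_IDS[value]))
    return rows


def to_tsv(rows):
    """Serialize rows as tab-separated text; absent columns are left empty."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, delimiter="\t",
                            lineterminator="\n", restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


print(to_tsv(emof_rows("652439", ["Open Flowers", "Fruiting"],
                       "egbot", "2017-04-15T01:10:54Z")))
```

Emitting one row per assertion, all sharing a coreid, is what lets a single specimen carry first‐ and second‐order scores side by side.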

Using the eMoF extension has a potential disadvantage: it does not allow a measurement to be rigorously tied to a particular aspect of the core record. Any user can therefore define a new, non‐standard ‘measurementType’ and ‘measurementValue’ (e.g., “Flowering Time” and “having flowers” or “Flws”), which could make compiling data difficult. Unless measurementTypes and measurementValues are rigorously defined, an excessive number of unique text strings could be generated. To address this, we are working toward defining these terms within Apple Core, a set of best‐practice guidelines for publishing herbarium specimen information. A goal of Apple Core is to offset the generality of Darwin Core by providing detailed guidance on publishing botanical specimen information in Darwin Core, including recommended terms, specific definitions, multiple examples, common issues, and, where appropriate, controlled vocabularies specific to herbarium specimens. Apple Core is a community‐curated resource that is still being refined, and interaction with phenological researchers will help to strengthen it. Finally, this approach is complementary to broader sharing initiatives that use ontologies, such as the Plant Phenology Ontology.
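One way a publisher could guard against proliferating strings is to check every eMoF row against an allow‐list of measurementType and measurementValue pairs before export. The allow‐list in this Python sketch is hypothetical and contains only the terms from the example table; in practice it would be drawn from a published vocabulary such as the one being developed in Apple Core.

```python
NEVP = "http://purl.org/nevp/vocabulary/reproductive-phenology_1.2#"

# Hypothetical allow-list: (measurementType, measurementTypeID) ->
# permitted measurementValue strings and their measurementValueIDs.
ALLOWED = {
    ("Phenology (ver 1.2)", NEVP + "01"): {
        "Reproductive": NEVP + "03",
    },
    ("Phenology: reproductive", NEVP + "03"): {
        "Open Flowers": NEVP + "06",
        "Fruiting": NEVP + "11",
    },
}


def validate_emof(row):
    """Return a list of problems for one eMoF row (a dict); empty if valid."""
    key = (row.get("measurementType"), row.get("measurementTypeID"))
    values = ALLOWED.get(key)
    if values is None:
        return [f"unrecognized measurementType: {key[0]!r}"]
    value = row.get("measurementValue")
    if value not in values:
        return [f"unrecognized measurementValue: {value!r}"]
    if row.get("measurementValueID") != values[value]:
        return [f"measurementValueID does not match {value!r}"]
    return []


print(validate_emof({"measurementType": "Flowering Time",
                     "measurementTypeID": "",
                     "measurementValue": "Flws"}))
# ["unrecognized measurementType: 'Flowering Time'"]
```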

In the near future, using the eMoF extension will allow for phenological scorings to be published in iDigBio, the Global Biodiversity Information Facility (GBIF), and other public repositories. Darwin Core Archive publishing services are available within all Symbiota portals (Gries et al., ; http://symbiota.org) and form the basis from which iDigBio harvests specimen data from these portals (http://swbiodiversity.org/seinet/collections/datasets/datapublisher.php and http://sernecportal.org/portal/collections/datasets/datapublisher.php). Adherence to our protocol at local institutions will facilitate the search functions provided and developed by large aggregators such as iDigBio and GBIF.

Conclusions

Advantages of the proposed protocol

The questions presented here provide important data for researchers while requiring minimal effort from herbarium curators. Phenological questions are easily integrated into standard label digitization workflows or can be scored later from images. Because the questions are nested, a specimen can be scored first at the third order, with the corresponding second‐ and first‐order answers populated automatically. For example, a report of “fruits present” on a specimen would automatically score a “yes” for the first‐order question, indicating that reproductive structures are present. To answer the first‐order question, the person performing the initial data entry for a specimen need only look at the sheet and check a box indicating whether reproductive structures of any kind are present. For databases that lack the infrastructure to accommodate this type of scoring, a few alternatives are presented below.
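The automatic population of broader answers amounts to following “implies” links upward until no new answers appear. In this Python sketch, the link from each third‐order category to a second‐order question follows the layout of the angiosperm table and is an assumption; the category names are paraphrased and the dictionary is hypothetical.

```python
# Hypothetical "implies" links for the angiosperm protocol: each score
# points to the broader score it entails. Third-order links follow the
# row layout of the angiosperm table (an assumption).
PARENT = {
    "unopened flowers present": "reproductive structures present",
    "open flowers present": "reproductive structures present",
    "fruits present": "reproductive structures present",
    "mostly unopened flowers": "unopened flowers present",
    "mostly open flowers": "open flowers present",
    "mostly post-mature flowers": "open flowers present",
    "mostly immature fruits": "fruits present",
    "mostly mature fruits": "fruits present",
    "mostly post-mature fruits": "fruits present",
}


def propagate(scores):
    """Return `scores` plus every broader score they imply."""
    implied = set(scores)
    pending = list(scores)
    while pending:
        parent = PARENT.get(pending.pop())
        if parent is not None and parent not in implied:
            implied.add(parent)
            pending.append(parent)
    return implied


print(sorted(propagate({"fruits present"})))
# ['fruits present', 'reproductive structures present']
```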

Workflow implementation

Phenological scores can be recorded at several steps in a digitization workflow. In an object‐to‐data workflow, scores can be made directly from the sheet as label data are captured. In an image‐based workflow, specimens can be scored by visual inspection of their images. The latter approach also allows images to be made available online, where members of the public (e.g., citizen scientists using Notes from Nature, CrowdCurio, or other platforms) can record phenological observations. Machine learning approaches are likely to facilitate scoring images at scale in the near future. Local databases may need new or modified fields to accommodate the proposed structure. Controlled vocabularies can be implemented with drop‐down menus or pick‐lists (see Figs. and for an example from a Symbiota portal); however, providing such functionality might require changes to database management software. Fortunately, a number of tools (described below) have been developed for scoring the phenological status of specimens.

Example of Symbiota's Attribute Mining Tool. Here, a local database's text field ‘Reproductive Condition’ was searched for all text strings containing “fl” in the Fabaceae. Highlighted references are text strings referring to both flowers and fruits. These were selected, and second‐order scorings of “open flowers present” and “fruit present” were then applied to all specimens simultaneously.

Example of Symbiota's Image Scoring Tool. Images of Fabaceae specimens were searched. The user can apply the desired level of scoring to each image that appears.

For curators who do not have a database with Symbiota‐type functionality that provides phenological checkboxes corresponding to our proposed protocol, we suggest that users enter phenological information into an appropriate text field within their existing database with the expectation that new tools will enable users to search these text fields and score the specimens appropriately (see Tools that facilitate scoring: Symbiota's “Attribute Mining Tool” below). Ideally, every institution's home database will include a text field dedicated exclusively to information pertaining to phenology. However, including phenological information as text anywhere within a given specimen's label data is better than not capturing any phenological traits. To choose the best text field within a local database, it is important to know how the specimen data appear when shared using a Darwin Core Archive. If, for example, one's local database conforms to Darwin Core, reproductive traits should be included in the ‘reproductiveCondition’ field. The words entered into the text field should be unambiguous and should correspond to the protocol above (e.g., “unopened flowers,” “open flowers,” “fruit”). This is an action that all curators can immediately integrate into their current digitization workflows.
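For curators entering phenology as free text, a small helper can keep entries unambiguous. It accepts only agreed terms before writing them to a Darwin Core `reproductiveCondition` value. This is a sketch under stated assumptions, and several elements are hypothetical:

- the function names and the record layout;
- the term list, which combines the terms quoted above with the bare word "reproductive" recommended below for minimal capture;
- the " | " separator, which is borrowed from the Darwin Core convention for list-valued terms.

```python
from typing import Dict, Iterable

# Agreed vocabulary (assumed). Order here fixes the order of terms in the output.
ALLOWED_TERMS = ("reproductive", "unopened flowers", "open flowers", "fruit")
SEPARATOR = " | "  # assumed list separator, following the Darwin Core list convention


def reproductive_condition(terms: Iterable[str]) -> str:
    """Build a reproductiveCondition value from controlled terms, rejecting anything else."""
    chosen = {t.strip().lower() for t in terms}
    unknown = sorted(chosen.difference(ALLOWED_TERMS))
    if unknown:
        raise ValueError(f"terms not in vocabulary: {unknown}")
    # Emit terms in vocabulary order so the same observations always yield the same string.
    return SEPARATOR.join(t for t in ALLOWED_TERMS if t in chosen)


def with_phenology(record: Dict[str, str], terms: Iterable[str]) -> Dict[str, str]:
    """Return a copy of an occurrence record with reproductiveCondition filled in."""
    updated = dict(record)
    updated["reproductiveCondition"] = reproductive_condition(terms)
    return updated


# Hypothetical record; an entry such as "flwr" would be rejected rather than stored.
example = with_phenology({"catalogNumber": "HYPOTHETICAL-0001"}, ["fruit", "Open flowers"])
```

Rejecting unknown words at entry time is the design choice here: abbreviations such as "flwr" never reach the database, so later text mining has less to untangle.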

Conclusions for curators

Those managing or implementing digitization workflows should consider incorporating the scoring of phenological data into their workflows. At the very least, first‐ or second‐order phenological data (as described above) should be considered for capture. Doing so will facilitate future scoring of the specimens. If time does not permit training herbarium personnel to record challenging second‐order scorings, then simply adding the word “reproductive” somewhere in a relevant database field will aid future work and research use.

Tools that facilitate scoring

Although it is not the primary focus of this paper, we think it is useful to briefly describe how our protocol could be implemented and which tools are available for doing so.

Symbiota's “Attribute Mining Tool”

Part of the NEVP project was the development of a tool to score phenological traits using digitized label text (Fig. ). This tool allows a user to search for specific words in database fields and map these to the proposed vocabulary. For example, using this tool to search the field ‘reproductiveCondition’ within SEINet returned over 4000 unique text strings (Table ). The Attribute Mining Tool allows one to select all records containing text that refers solely to a single scoring category. For example, if a user were scoring “open flowers present” only, the user could select all the highlighted rows in Table and click “Open flowers present.” In the example from SEINet presented in Table , this single scoring event would result in the selection and scoring of 1,031,786 records. In a separate scoring event, the user could select all records that make reference to both open flowers and fruits and then select “open flowers present” and “fruits present.” Because a curator is responsible for mapping free‐text strings from the database to a controlled vocabulary, this method does not rely on computerized inference. Applying phenological scores in this way to any specimen within a Symbiota portal is highly efficient, and similar tools should be developed for other database platforms.
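The search-then-map workflow described above can be illustrated with a short Python sketch. It does not reproduce Symbiota's own code. The function names, the in-memory record list, and the example strings are hypothetical. As in the tool, the curator, not the code, decides which strings map to which scoring categories.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def matching_strings(records: Iterable[dict], field: str, needle: str) -> Counter:
    """Tally the distinct values of `field` that contain `needle` (case-insensitive)."""
    needle = needle.lower()
    return Counter(
        r[field] for r in records
        if r.get(field) and needle in r[field].lower()
    )


def apply_mapping(records: List[dict], field: str, mapping: Dict[str, Set[str]]) -> int:
    """Score every record whose text the curator mapped; return how many were scored."""
    scored = 0
    for r in records:
        categories = mapping.get(r.get(field))
        if categories:
            r.setdefault("phenology", set()).update(categories)
            scored += 1
    return scored


# Hypothetical label data.
records = [
    {"id": 1, "reproductiveCondition": "in flower"},
    {"id": 2, "reproductiveCondition": "fl & fr"},
    {"id": 3, "reproductiveCondition": "in flower"},
    {"id": 4, "reproductiveCondition": "sterile"},
]
# Step 1: the curator reviews the distinct strings containing "fl".
candidates = matching_strings(records, "reproductiveCondition", "fl")
# Step 2: the curator maps selected strings to controlled terms; one call scores all matches.
curator_mapping = {
    "in flower": {"open flowers present"},
    "fl & fr": {"open flowers present", "fruits present"},
}
n_scored = apply_mapping(records, "reproductiveCondition", curator_mapping)
```

Because scoring is applied per distinct string rather than per record, one curator decision covers every record sharing that string. This is what makes a single scoring event reach very large numbers of records.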

Scoring images

Many platforms have been developed for remotely scoring images of specimens, and we review them below. It is vital that future scoring platforms conform as closely as possible to the proposed protocol to facilitate data integration. Furthermore, it is vital that specimen trait data, even when scored outside of the local database, remain associated with the original specimen record. This will allow trait data and occurrence data to travel together through the data aggregation process, preventing duplicated scoring efforts.

Symbiota's “Image Scoring Tool”

The new Image Scoring Tool, developed as part of the NEVP project, allows Symbiota network users to filter images and apply a phenological score to them (Fig. ). This approach has facilitated the scoring of over 240,000 images of New England specimens to date. Phenological scorings are being shared with end users through the Consortium of Northeastern Herbaria portal via the Darwin Core Extended MeasurementOrFact extension and Darwin Core Archives, as outlined above. This functionality will soon be available to all Symbiota‐based databases.
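To make the MeasurementOrFact route concrete, the following sketch writes presence/absence scores for one specimen as tab-delimited extension rows. Each row is linked to the occurrence core by its identifier. The column names are Extended MeasurementOrFact terms. The `measurementType` and `measurementValue` strings, the function names, and the identifiers are placeholders. A complete Darwin Core Archive also needs a `meta.xml` descriptor and an occurrence core file, which are not shown.

```python
import csv
import io
from typing import Dict, Iterable, Iterator

# A subset of Extended MeasurementOrFact terms. The first column serves as the
# core id linking each row to its occurrence record (declared as coreid in meta.xml).
EMOF_COLUMNS = [
    "id",
    "measurementType",
    "measurementValue",
    "measurementDeterminedBy",
    "measurementDeterminedDate",
    "measurementMethod",
]


def emof_rows(occurrence_id: str, scores: Dict[str, bool], determined_by: str,
              determined_date: str, method: str) -> Iterator[Dict[str, str]]:
    """Yield one extension row per scored trait (placeholder vocabulary)."""
    for trait, present in sorted(scores.items()):
        yield {
            "id": occurrence_id,
            "measurementType": trait,
            "measurementValue": "present" if present else "absent",
            "measurementDeterminedBy": determined_by,
            "measurementDeterminedDate": determined_date,  # ISO 8601 date string
            "measurementMethod": method,
        }


def to_tsv(rows: Iterable[Dict[str, str]]) -> str:
    """Serialize rows as a tab-delimited extension file with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=EMOF_COLUMNS, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Storing each trait as its own row, rather than as extra columns on the occurrence record, keeps the scores attached to the specimen identifier as the data pass through aggregators.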

Notes from Nature

Notes from Nature is an online citizen science platform (Hill et al., ) originally developed to support the transcription of specimen labels, but it has expanded to include phenological classifications. Notes from Nature extends the Zooniverse (https://www.zooniverse.org/) model by providing a simple way for curators or researchers to bundle and upload images, set targets for transcriptions or scoring, launch new expeditions, and engage volunteers (Fig. ). Notes from Nature addresses data quality by requiring a minimum of three replicated classifications (provided by different participants) for each imaged specimen. When expeditions are completed, a suite of tools is available for reporting the outcomes of these efforts, and automated reconciliation produces a “best classification.” This includes data for phenological categories and for counts of reproductive structures.
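The reconciliation step can be approximated as follows. This is a simplified stand-in, not the Notes from Nature reconciliation code. It assumes a plurality vote for categorical answers and a median for counts, and it treats ties as unresolved. Keying answers by volunteer enforces the requirement that replicates come from different participants.

```python
from collections import Counter
from statistics import median
from typing import Mapping, Optional

MIN_CLASSIFICATIONS = 3  # minimum replicated classifications per image, as stated in the text


def reconcile_category(by_volunteer: Mapping[str, str]) -> Optional[str]:
    """Plurality answer across distinct volunteers; None if too few answers or a tie."""
    if len(by_volunteer) < MIN_CLASSIFICATIONS:
        return None
    ranked = Counter(by_volunteer.values()).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tied vote: leave for expert review
    return ranked[0][0]


def reconcile_count(by_volunteer: Mapping[str, int]) -> Optional[float]:
    """Median count across distinct volunteers (robust to a single miscount)."""
    if len(by_volunteer) < MIN_CLASSIFICATIONS:
        return None
    return median(by_volunteer.values())
```

A median is used for counts because one volunteer who overcounts badly shifts a mean but not a median. Whether this matches the platform's own rules is an assumption.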

(A) A typical specimen image presented as part of a phenology expedition on Notes from Nature. (B) Classification task requesting that volunteers record the number of fruits that are visible on the herbarium specimen. (C) Classification task requesting that volunteers record the number of open flowers visible on the herbarium specimen. Note that parts B and C display tools that help volunteers to complete the task (e.g., pan, zoom, rotate, tutorial, and help).

Notes from Nature phenology expeditions have so far solicited reports of flowering and fruiting as well as counts of reproductive structures for Quercus L., Coreopsis L., and Cakile Mill. These expeditions have ranged from simple annotations of whether open flowers or fruits are present to more complex tasks in which users are asked to count unopened flowers, open flowers, and fruits. Expeditions asking users to report first‐ and second‐order scorings generated large volumes of accurate phenological data. In contrast, expeditions asking for more complex scorings, such as counts, had lower participation from the community of citizen science annotators and took much longer to complete.

CrowdCurio

CrowdCurio is a new online platform designed to give researchers the ability to design and implement crowdsourcing projects tailored to their specific interests and data sources (Willis et al., ; https://www.crowdcurio.com/). Most recently, a CrowdCurio project, titled “Thoreau's Field Notes,” demonstrated that the platform was an effective tool for crowdsourcing the collection of phenological data from digitized herbarium specimens (Willis et al., ). Participants are presented with an image of a herbarium specimen and asked to annotate the image by clicking on each visible unopened flower, open flower, and fruit. These annotations are then transformed into counts that can be used to approximate the phenological stage of a given individual specimen.
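The step from click annotations to counts is simple to express in code. The sketch below assumes a hypothetical annotation format (an x/y point plus a label) rather than CrowdCurio's actual export. The fraction of counted structures that are fruits is offered only as one possible summary of phenological stage, not as the measure used by Willis et al.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple, Optional

LABELS = ("unopened flower", "open flower", "fruit")


class Click(NamedTuple):
    """One annotation: a point on the specimen image and the structure type chosen."""
    x: float
    y: float
    label: str


def counts_from_clicks(clicks: Iterable[Click]) -> Dict[str, int]:
    """Turn point annotations into per-category counts (unknown labels are ignored)."""
    tally = Counter(c.label for c in clicks if c.label in LABELS)
    return {label: tally[label] for label in LABELS}


def fruit_fraction(counts: Dict[str, int]) -> Optional[float]:
    """Share of counted reproductive structures that are fruits; None if nothing was counted."""
    total = sum(counts.values())
    return counts["fruit"] / total if total else None
```

Non-zero counts can also be translated directly into the second-order presence scores of the protocol, so the same annotations serve both levels.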

In a preliminary study of the efficiency and quality of CrowdCurio data collection, Willis et al. () compared data collected by expert (herbarium curators) and non‐expert (anonymous Amazon Mechanical Turk [MTurk] workers) participants for two common New England species: greater celandine (Chelidonium majus L.) and lowbush blueberry (Vaccinium angustifolium Aiton). They found that non‐expert counts were similar to expert counts, but that non‐experts recorded nearly twice as much data, at lower cost, over the same amount of time.

Data collected via crowdsourcing, however, are not without limitations. Although Willis et al. () found no difference in average counts between experts and non‐experts, non‐expert counts tended to be more variable per specimen. This variability depended in part on the specimen being assessed: specimens with more objects to count had higher error rates. As with any crowdsourcing project, care should be taken when choosing which specimens and taxa to include (e.g., are the flowers easy to identify?). Additionally, CrowdCurio is in the process of implementing additional features to improve data quality, such as filtering users based on their ability to repeat the same task. The phenological data generated within CrowdCurio can be expressed according to the protocol outlined in this paper and shared via Darwin Core Archives.
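Two quality signals are mentioned above: per-specimen variability among participants, and consistency when a participant repeats a task. Both can be computed from raw counts. This is a hypothetical sketch of such a filter, not CrowdCurio's implementation. The function names are invented, and the tolerance of one structure is an arbitrary illustrative default.

```python
from statistics import pstdev
from typing import Dict, List, Sequence, Set, Tuple


def count_spread(counts: Sequence[int]) -> float:
    """Population standard deviation of one specimen's counts across participants."""
    return pstdev(counts) if len(counts) > 1 else 0.0


def repeatable_participants(repeats: Dict[str, List[Tuple[int, int]]],
                            tolerance: int = 1) -> Set[str]:
    """Keep participants whose two counts of the same specimen never differ by more than `tolerance`.

    `repeats` maps a participant to (first count, repeat count) pairs for specimens
    they were shown twice; participants with no repeats are not retained.
    """
    return {
        who for who, pairs in repeats.items()
        if pairs and all(abs(first - second) <= tolerance for first, second in pairs)
    }
```

High spread flags specimens that may be intrinsically hard to count. A failed repeat test, by contrast, flags individual participants. The two checks therefore address different sources of error.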

Integration with other data sources

One ultimate goal is to combine herbarium specimen data with other sources of phenological data to make possible the detection of phenological changes across geographic, temporal, and taxonomic scales. The PPO provides an opportunity for herbarium data to be combined with disparate data sources, such as in situ phenological monitoring or satellite imagery. The PPO is a common vocabulary for describing plant phenological traits and was designed to provide a means to support global‐scale integration of phenological data. Ontologies provide highly structured, controlled vocabularies for data annotation and are particularly useful for standardization, because they not only establish a common terminology but also formalize logical relationships between terms such that they can be analyzed using machine reasoning. For example, logical term relationships in the PPO specify that any plant with “expanding leaves” must necessarily also have “non‐senescing leaves.” This logical structure means that data can be integrated at different levels of detail and software can be used to establish new facts about the data that were not expressed in the original data sets. This structure in turn enables large‐scale integration among a wide range of study types, including: (i) studies addressing similar phenophases but using different methodologies, (ii) studies involving different phenophases, and (iii) studies not specifically addressing phenology but producing other types of data (e.g., trait or climatic data). Thus, the PPO empowers researchers to aggregate larger data sets, at the global scale, and to address broader questions involving the interplay of phenology and other factors. Accordingly, the PPO is already being used to integrate data resources such as those from the USA National Phenology Network (Denny et al., ), the Pan‐European Phenology Network (http://www.pep725.eu/), and herbarium digitization efforts. 
The PPO and associated integration tools are compatible with the Darwin Core and Apple Core standards and associated data‐sharing tools discussed above.
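The kind of reasoning the PPO enables can be illustrated with a toy implication table and its transitive closure. Real PPO reasoning operates on OWL axioms with a description-logic reasoner; this Python sketch only mimics the effect. The leaf implication is the one stated in the text. The flower and fruit implications are illustrative assumptions, not quotations of PPO term definitions.

```python
from collections import deque
from typing import Dict, Iterable, Set

# Toy implication table: observing the key term implies each term in its value set.
IMPLIES: Dict[str, Set[str]] = {
    "expanding leaves": {"non-senescing leaves"},  # relationship stated in the text
    "open flowers": {"flowers"},                   # illustrative assumption
    "flowers": {"reproductive structures"},        # illustrative assumption
    "fruits": {"reproductive structures"},         # illustrative assumption
}


def infer(observed: Iterable[str]) -> Set[str]:
    """Return the observed terms plus everything they imply, directly or indirectly."""
    closure = set(observed)
    queue = deque(closure)
    while queue:
        term = queue.popleft()
        for implied in IMPLIES.get(term, ()):
            if implied not in closure:
                closure.add(implied)
                queue.append(implied)
    return closure
```

With this table, a herbarium record scored only as "open flowers" can be combined with a monitoring record scored at the coarser level of "flowers," because both now share the inferred term. This is the sense in which data recorded at different levels of detail can be integrated.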

Never before has an understanding of phenology been so important to humans. We are in a time of massive environmental change, and the organisms upon which we depend will have to adapt or migrate if they are to avoid local or global extinction. Herbarium specimens are critical to understanding and mitigating those changes. We need phenological data from specimens now more than ever, and researchers are ready and eager to analyze high‐quality data sets, particularly those comprising high taxonomic diversity, temporal depth, and a broad geographic range. With minimal additional effort during or after digitization, specimens can be scored quickly and easily, contributing to our understanding of our changing planet and the flora that sustains it.

Acknowledgments

This work was supported by the National Science Foundation (grants DBI‐1547229 [P.S.S.], DBI‐0735191 and DBI‐1265383 [R.L.W.], DBI‐1458550 [R.P.G.], DBI‐1410087 [A.B.M.], DBI‐EF1208835 [C.C.D.], DEB‐1556768 [S.J.M.], DBI‐1458264 [J.R.C.], and DBI‐1209149 [P.W.S.]). Additional support was provided by the Andrew W. Mellon Foundation, the Sibbald Trust, and the Scottish Government. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

Author contributions

All authors contributed equally to the development of the protocol. J.M.Y. managed the writing of the paper with contributions from all authors. C.G.W., P.W.S., E.G., and M.W.D. provided figures and text from related projects.

© 2018. This work is published under http://creativecommons.org/licenses/by-nc/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Premise of the Study

Herbarium specimens provide a robust record of historical plant phenology (the timing of seasonal events such as flowering or fruiting). However, the difficulty of aggregating phenological data from specimens arises from a lack of standardized scoring methods and definitions for phenological states across the collections community.

Methods and Results

To address this problem, we report on a consensus reached by an iDigBio working group of curators, researchers, and data standards experts regarding an efficient scoring protocol and a data‐sharing protocol for reproductive traits available from herbarium specimens of seed plants. The phenological data sets generated can be shared via Darwin Core Archives using the Extended MeasurementOrFact extension.

Conclusions

Our hope is that curators and others interested in collecting phenological trait data from specimens will use the recommendations presented here in current and future scoring efforts. New tools for scoring specimens are reviewed.

Details

Title

Digitization protocol for scoring reproductive phenology from herbarium specimens of seed plants

Author

Yost, Jennifer M¹; Sweeney, Patrick W²; Gilbert, Ed³; Nelson, Gil⁴; Guralnick, Robert⁵; Gallinat, Amanda S⁶; Ellwood, Elizabeth R⁷; Rossington, Natalie⁸; Willis, Charles G⁹; Blum, Stanley D¹⁰; Walls, Ramona L¹¹; Haston, Elspeth M¹²; Denslow, Michael W¹³; Zohner, Constantin M¹⁴; Morris, Ashley B¹⁵; Stucky, Brian J⁵; Carter, J Richard¹⁶; Baxter, David G¹⁷; Bolmgren, Kjell¹⁸; Denny, Ellen G¹⁹; Dean, Ellen²⁰; Pearson, Katelin D²¹; Davis, Charles C²²; Mishler, Brent D²³; Soltis, Pamela S⁵; Mazer, Susan J⁸

¹ Department of Biological Sciences, California Polytechnic State University, San Luis Obispo, California, USA
² Division of Botany, Peabody Museum of Natural History, Yale University, New Haven, Connecticut, USA
³ Arizona State University, School of Life Sciences, Tempe, Arizona, USA
⁴ iDigBio, College of Communication and Information, Florida State University, Tallahassee, Florida, USA
⁵ Florida Museum of Natural History and Biodiversity Institute, University of Florida, Gainesville, Florida, USA
⁶ Boston University, Department of Biology, Boston, Massachusetts, USA
⁷ La Brea Tar Pits and Museum, Los Angeles, California, USA
⁸ Department of Ecology, Evolution and Marine Biology, University of California, Santa Barbara, California, USA
⁹ Department of Organismic and Evolutionary Biology, Harvard University Herbaria, Cambridge, Massachusetts, USA; University of Minnesota, Department of Biology Teaching and Learning, Minneapolis, Minnesota, USA
¹⁰ Biodiversity Information Standards (TDWG), San Francisco, California, USA
¹¹ CyVerse, University of Arizona, Tucson, Arizona, USA
¹² Royal Botanic Garden Edinburgh, Edinburgh, United Kingdom
¹³ Florida Museum of Natural History and Biodiversity Institute, University of Florida, Gainesville, Florida, USA; Department of Biology, Appalachian State University, Boone, North Carolina, USA
¹⁴ Systematic Botany and Mycology, Department of Biology, Munich University (LMU), Munich, Germany
¹⁵ Department of Biology, Middle Tennessee State University, Murfreesboro, Tennessee, USA
¹⁶ Biology Department, Valdosta State University, Valdosta, Georgia, USA
¹⁷ University and Jepson Herbaria, University of California Berkeley, Berkeley, California, USA
¹⁸ Swedish University of Agricultural Sciences, Unit for Field‐based Forest Research, Lammhult, Sweden
¹⁹ USA National Phenology Network, University of Arizona, Tucson, Arizona, USA
²⁰ UC Davis Center for Plant Diversity, Davis, California, USA
²¹ Department of Biological Science, Florida State University, Tallahassee, Florida, USA
²² Department of Organismic and Evolutionary Biology, Harvard University Herbaria, Cambridge, Massachusetts, USA
²³ University and Jepson Herbaria, University of California Berkeley, Berkeley, California, USA; Department of Integrative Biology, University of California, Berkeley, California, USA

Section

Protocol Notes

Publication year

2018

Publication date

Feb 2018

Publisher

John Wiley & Sons, Inc.

e-ISSN

21680450

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.1002/aps3.1022

ProQuest document ID

2265772230
