Abstract

Societal Impact Statement

Investigation of farmers', consumers', and other stakeholders' trait preferences is vital for the adoption and impact of improved crop varieties. While qualitative research methods are known to increase the depth and scope of information from respondents, only 5% of previous trait preference studies used qualitative data in their analyses. We show that AI‐based natural language processing, particularly GPTs, is both a time‐ and cost‐effective mechanism for accurately analyzing open‐ended trait preference data. This will contribute to the selection and prioritization of breeding targets to better meet end‐user needs, with implications for food security and health outcomes globally.

Crop trait preference research is critical for the development of improved crop varieties, guiding breeding programs in setting trait priorities and targets that represent farmers' and consumers' needs. However, methodological harmonization is lacking in trait preference studies, leading to high heterogeneity in collected data and analysis frameworks, which constrains comparability between studies. Qualitative research tools using open‐ended questions are among the most common methods used to elucidate crop trait preferences, but only a fraction of these data are used in analysis. The rise of AI tools in data analysis provides an opportunity to make fuller use of data from open‐ended questions. We use natural language processing (NLP) techniques, including generative pretrained transformer (GPT) models, to extract labels from open‐ended question responses and perform multilabel text classification. We compare these labels to pre‐codes from close‐ended questions, as well as to existing crop trait ontology terms. We find that analyzing responses to open‐ended questions using NLP leads to information gain, including an increase in the diversity of traits and insight into their social functions. We conclude that NLP‐based approaches would allow breeding teams to extract trait terms from open‐ended question responses efficiently and to compare these to both existing ontology terms and close‐ended survey data. Our findings reveal the importance of using open‐ended questions to inform survey codes in mixed methods research design for trait preference studies.

Details

Title
Applying large language models to extract information from crop trait prioritization studies
Author
Farmer, Erin E. 1; Brown, David 1; Gore, Michael A. 1; Tufan, Hale A. 1

1 Plant Breeding and Genetics Section, School of Integrative Plant Science, Cornell University, Ithaca, New York, USA
Publication title
Volume
8
Issue
1
Pages
176-184
Number of pages
10
Publication year
2026
Publication date
Jan 1, 2026
Section
METHODS AND TECHNIQUES
Publisher
John Wiley & Sons, Inc.
Place of publication
Lancaster
Country of publication
United States
Publication subject
e-ISSN
25722611
Source type
Scholarly Journal
Language of publication
English
Document type
Journal Article
Publication history
Online publication date
2025-08-06
Milestone dates
2024-12-20 (manuscriptReceived); 2025-05-10 (manuscriptRevised); 2025-06-04 (manuscriptAccepted); 2025-08-06 (publishedOnlineEarlyUnpaginated); 2025-12-11 (publishedOnlineFinalForm)
First posting date
06 Aug 2025
ProQuest document ID
3281235932
Document URL
https://www.proquest.com/scholarly-journals/applying-large-language-models-extract/docview/3281235932/se-2?accountid=208611
Copyright
© 2026. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the "License"). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2025-12-11
Database
ProQuest One Academic