Earth observation data are a fundamental resource in Earth system science. Rapid advances in remote sensing and in situ measurement technologies have generated massive volumes of data, accompanied by a growing body of geographic textual information. Classifying and managing these geographic texts efficiently and accurately has become a critical challenge in the field. However, traditional classification approaches are hindered by data sparsity, class imbalance, semantic ambiguity, and the prevalence of domain-specific terminology. To address these limitations and enable the intelligent management of geographic information, this study proposes an efficient geographic text classification framework based on large language models (LLMs), tailored to the unique semantic and structural characteristics of geographic data. Specifically, LLM-based data augmentation strategies are employed to mitigate the scarcity of labeled data and class imbalance. A semantic vector database is used to filter the label space prior to inference, enhancing the model’s adaptability to diverse geographic terms. Furthermore, few-shot prompt learning guides LLMs in understanding domain-specific language, while an output alignment mechanism improves classification stability for complex descriptions. This approach offers a scalable solution for the automated semantic classification of geographic text, helping to unlock the potential of ever-expanding geospatial big data and advancing intelligent information processing and knowledge discovery in the geospatial domain.
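The inference-time steps named in the abstract (filtering the label space, few-shot prompting, and output alignment) can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words `embed` function is a toy stand-in for a semantic vector database, the function names `filter_labels`, `build_prompt`, and `align_output` are hypothetical, the string-similarity alignment is only one possible alignment mechanism, and the LLM call and the data augmentation step are omitted.

```python
import difflib
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a real semantic embedding)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def filter_labels(text, label_descriptions, top_k=2):
    """Keep the top_k labels whose descriptions are most similar to the text."""
    query = embed(text)
    ranked = sorted(
        label_descriptions,
        key=lambda label: cosine(query, embed(label_descriptions[label])),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(text, candidates, examples):
    """Assemble a few-shot prompt restricted to the filtered candidate labels."""
    parts = ["Classify the geographic text into one of: " + ", ".join(candidates) + "."]
    for example_text, example_label in examples:
        parts.append(f"Text: {example_text}\nLabel: {example_label}")
    parts.append(f"Text: {text}\nLabel:")
    return "\n\n".join(parts)


def align_output(raw_output, candidates):
    """Map free-form model output onto the closest candidate label, or None."""
    if not candidates:
        return None
    by_lower = {candidate.lower(): candidate for candidate in candidates}
    cleaned = raw_output.strip().lower()
    match = difflib.get_close_matches(cleaned, list(by_lower), n=1, cutoff=0.0)
    return by_lower[match[0]]
```

In this sketch, the prompt returned by `build_prompt` would be sent to an LLM of choice, and the raw reply passed through `align_output` so that the final prediction is always one of the filtered candidate labels.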
Details
Text categorization;
Accuracy;
Labels;
Data processing;
Datasets;
Classification;
Big Data;
Domain specific languages;
Remote sensing;
In situ measurement;
Large language models;
Terminology;
Data augmentation;
Semantics;
Information processing;
Natural language processing;
Annotations;
Information retrieval
1 Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China; College of Computer Science and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China