Unstructured scientific text plays a critical role in preserving, transferring, and developing research knowledge. Valuable outputs are often recorded in forms such as patents, research articles, and project reports. Unlike generic text, scientific literature usually follows specialized formats and terminology. This difference creates both challenges and opportunities for natural language processing (NLP) researchers. This dissertation addresses these challenges by applying NLP methods to automate the extraction and structuring of materials science knowledge from unstructured text.
Through three case studies, this dissertation explores the use of deep learning, large language models (LLMs), and prompt-based techniques to extract critical materials synthesis knowledge from scientific texts. Building on these efforts, the dissertation introduces an end-to-end, cost-effective framework designed for large-scale knowledge extraction with domain experts in the loop. The framework demonstrates how combining automated methods with light human guidance enables scalable, accurate, and efficient processing of materials science literature. Together, these contributions aim to mitigate key bottlenecks in scientific knowledge extraction and support the development of AI-ready materials data.