
Abstract

The Construction Management Systems (CMS) domain increasingly depends on unstructured text (inspection reports, technical documents, incident logs), creating an opportunity for domain-aware Natural Language Processing (NLP) with Large Language Models (LLMs). Yet general-domain pre-training often misses domain-specific terminology and context, limiting precision and accuracy for tasks in the CMS domain. This gap motivates domain-specific LLMs along two tracks: (i) a discriminative, encoder-based Transformer system for classification and regression on agency documents to support risk assessment, resource allocation, and cost estimation (Chapter 2); and (ii) a generative, decoder-based question-answering system that grounds answers in project documents while leveraging model priors (Chapter 3). The retrieval-augmented generation (RAG) and prompt engineering pipeline developed in Chapter 3 is then adapted for various data-mining and analysis tasks across the domain, as shown in Chapters 4–5.

Chapter 2 develops the first dedicated CMS corpus and an end-to-end pipeline for pre-training language models on domain text. After domain-specific pre-training and fine-tuning, these models outperform general models on two representative tasks, structural condition assessment and building compliance checking, with F1 improvements of 5.9% and 8.5%, respectively, underscoring the value of domain-specific pre-training.
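As a minimal illustration of the metric behind these comparisons (a generic sketch, not the thesis code; the labels and predictions below are invented):

```python
# Minimal F1 computation for a binary classification task, as used to
# compare a domain-pre-trained model against a general-domain baseline.
# All labels and predictions here are illustrative only.

def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for one class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Hypothetical predictions from a general model and a domain model
# on the same gold labels; the gain is the F1 improvement.
labels  = [1, 1, 0, 1, 0, 1, 0, 0]
general = [1, 0, 0, 1, 1, 0, 0, 0]
domain  = [1, 1, 0, 1, 1, 1, 0, 0]
gain = f1_score(labels, domain) - f1_score(labels, general)
```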

Chapter 3 builds an agency-specific project-authoring advisor using RAG with prompt engineering (persona, format template, chain-of-thought, few-shot learning). During evaluation grounded in agency documentation, GPT-4 with RAG and optimized prompts scores 88.9/100, versus 75.7 with RAG only and 53.4 without RAG, and significantly surpasses conventional search methods.
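A minimal sketch of how such a RAG-plus-prompt-engineering pipeline fits together (the documents, persona wording, and few-shot example are all hypothetical; a real system would use embedding-based retrieval and an LLM API rather than bag-of-words cosine similarity):

```python
# Toy RAG pipeline: retrieve the most relevant agency snippets with
# bag-of-words cosine similarity, then assemble a prompt that layers
# persona, format template, few-shot example, and a chain-of-thought cue.
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = Counter(query.lower().split())
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(d.lower().split())), reverse=True)
    return ranked[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Stack persona, format, few-shot, chain-of-thought, and retrieved context."""
    persona = "You are an agency project-authoring advisor."
    fmt = "Answer in numbered steps, citing the context passages you used."
    few_shot = "Q: What triggers a design review?\nA: 1. A scope change (per Context)."
    cot = "Think through the relevant requirements step by step before answering."
    ctx = "\n".join(f"Context: {c}" for c in context)
    return "\n".join([persona, fmt, few_shot, cot, ctx, f"Q: {query}\nA:"])

docs = [
    "Bridge inspection reports must be filed within 30 days of the site visit.",
    "Cost estimates require sign-off from the regional program manager.",
    "Incident logs are archived after two years.",
]
prompt = build_prompt("When must inspection reports be filed?",
                      retrieve("inspection reports filed", docs))
```

The key design point the chapter's evaluation isolates is exactly this layering: retrieval grounds the answer in agency text, while the prompt scaffolding shapes how the model reasons over it.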

Chapter 4 releases a public, metadata-rich dataset of 1,100 CMS publications with annual citation counts, then compares topic extraction via Latent Dirichlet Allocation (LDA) against an automated LLM-RAG pipeline, both evaluated on expert-labeled topics. The LLM-RAG approach achieves far higher agreement with the expert labels (85.94× on Jaccard similarity, 8.14× on BLEU, and 32.11× on ROUGE) and reveals research trends by analyzing topics alongside citations.
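The agreement metrics above compare extracted topic phrases against expert labels; Jaccard overlap, for instance, can be computed on word sets as below (the topic strings are invented examples, not taken from the dataset):

```python
# Jaccard agreement between an extracted topic phrase and an expert label,
# computed on lowercase word sets. Example topics are illustrative only.

def jaccard(a: str, b: str) -> float:
    """Intersection-over-union of the two phrases' word sets."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

expert = "construction safety risk assessment"
llm_topic = "risk assessment for construction safety"
lda_topic = "project cost schedule model"

llm_score = jaccard(expert, llm_topic)  # shares 4 of 5 unique words
lda_score = jaccard(expert, lda_topic)  # shares no words
```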

Chapter 5 adapts the “Sleeping Beauty” framework to perform the first systematic analysis of papers with delayed recognition in CMS, showing delayed recognition is more prevalent than assumed and cautioning against short-horizon citation metrics.

Collectively, this thesis demonstrates that combining domain-specific datasets and pre-training with RAG and specialized prompt engineering delivers accurate, auditable decision support, advancing evidence-based planning, regulatory compliance, and operational efficiency across the CMS sector.

Details

1010268
Business indexing term
Title
Leveraging Large Language Models for Intelligent Construction Management Systems
Number of pages
190
Publication year
2025
Degree date
2025
School code
0779
Source
DAI-B 87/5(E), Dissertation Abstracts International
ISBN
9798265439529
Committee member
Kim, Daeho; Golparvar-Fard, Mani; Colic, Sinisa
University/institution
University of Toronto (Canada)
Department
Civil Engineering
University location
Canada -- Ontario, CA
Degree
Ph.D.
Source type
Dissertation or Thesis
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
32236538
ProQuest document ID
3276241657
Document URL
https://www.proquest.com/dissertations-theses/leveraging-large-language-models-intelligent/docview/3276241657/se-2?accountid=208611
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Database
ProQuest One Academic