
Abstract

The proliferation of video content across digital platforms demands automated methods for content segmentation. The need is greatest for long-form broadcasts, where traditional visual approaches fail to capture subtle topical transitions. This thesis investigates audio transcript segmentation through supervised topic modeling, comparing clustering-based and transformer-based architectures adapted for boundary detection.

This research develops an end-to-end pipeline that transforms raw broadcast transcripts into topically coherent segments. It also introduces a novel methodology for generating synthetic datasets, which addresses the scarcity of ground-truth annotations. The study implements and evaluates two distinct classification paradigms: BERTopic, which combines contextual embeddings with clustering algorithms, and a fine-tuned RoBERTa model, which leverages deep transformer representations. Topical boundaries are detected with a paragraph-level sliding window approach.
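The abstract does not specify how the paragraph-level sliding window converts classifier outputs into boundaries. The following is one plausible reading under stated assumptions, not the thesis's implementation. It assumes each paragraph already has a topic-probability vector, produced by either classifier. For every gap between paragraphs, it compares the mean distribution of the `window` paragraphs before the gap with the mean of the `window` paragraphs after it, using total variation distance. A boundary is kept where that score exceeds `threshold` and is a local maximum. All function names and the parameters `window` and `threshold` are hypothetical.

```python
from typing import Dict, List, Sequence


def window_mean(dists: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise average of a non-empty list of topic-probability vectors."""
    n = len(dists)
    return [sum(col) / n for col in zip(*dists)]


def total_variation(p: Sequence[float], q: Sequence[float]) -> float:
    """Total variation distance (half the L1 distance); lies in [0, 1] for distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def detect_boundaries(
    dists: Sequence[Sequence[float]], window: int = 2, threshold: float = 0.5
) -> List[int]:
    """Return indices i such that a topic boundary is placed just before paragraph i.

    Hypothetical sketch: `dists[i]` is the topic distribution of paragraph i.
    """
    scores: Dict[int, float] = {}
    for i in range(1, len(dists)):
        left = dists[max(0, i - window):i]   # paragraphs before the gap
        right = dists[i:i + window]          # paragraphs after the gap
        scores[i] = total_variation(window_mean(left), window_mean(right))

    boundaries: List[int] = []
    for i, score in scores.items():
        if score < threshold:
            continue
        # Keep only local maxima so that one transition yields one boundary.
        if score >= scores.get(i - 1, 0.0) and score >= scores.get(i + 1, 0.0):
            boundaries.append(i)
    return boundaries


if __name__ == "__main__":
    # Toy input: three paragraphs dominated by topic 0, then three by topic 1.
    paragraphs = [[0.9, 0.1], [0.8, 0.2], [0.9, 0.1],
                  [0.1, 0.9], [0.2, 0.8], [0.1, 0.9]]
    print(detect_boundaries(paragraphs))  # [3]
```

On this toy input the score peaks at the gap before paragraph 3, so exactly one boundary is returned. The local-maximum rule suppresses the neighbouring gaps, which also straddle the transition. Runs of exactly equal scores would still produce adjacent boundaries.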

Experiments on a corpus derived from broadcast news transcripts reveal a counterintuitive result regarding model transferability. Transformer-based models perform better at document-level topic classification, but clustering-based approaches are more sensitive to local discourse transitions and therefore detect boundaries more accurately. This performance inversion challenges the common assumption that higher classification accuracy translates into more effective segmentation.
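This abstract does not name the metric behind "more accurate boundary detection". One measure widely used for text segmentation is Pk, sketched here purely as an illustration. The thesis may use Pk, WindowDiff, or something else. Pk slides a probe of width k across the paragraphs and counts how often the reference and the hypothesis disagree on whether the two paragraphs at the probe's ends belong to the same segment. Lower is better. The sketch assumes both segmentations are given as per-paragraph labels, where a label change marks a boundary. By the usual convention, k defaults to half the mean reference segment length. The function names are hypothetical.

```python
from typing import Hashable, List, Optional, Sequence


def segment_ids(labels: Sequence[Hashable]) -> List[int]:
    """Map per-paragraph labels to contiguous segment indices (0, 0, 1, 1, 2, ...)."""
    ids: List[int] = []
    current = 0
    for i, label in enumerate(labels):
        if i > 0 and label != labels[i - 1]:
            current += 1  # a label change marks a segment boundary
        ids.append(current)
    return ids


def pk(
    reference: Sequence[Hashable],
    hypothesis: Sequence[Hashable],
    k: Optional[int] = None,
) -> float:
    """Pk error: share of paragraph pairs k apart whose same-segment status differs."""
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have equal length")
    ref, hyp = segment_ids(reference), segment_ids(hypothesis)
    n = len(ref)
    if n < 2:
        raise ValueError("need at least two paragraphs")
    if k is None:
        num_segments = ref[-1] + 1
        k = max(1, round(n / num_segments / 2))  # half the mean segment length
    if k >= n:
        raise ValueError("k must be smaller than the number of paragraphs")

    errors = sum(
        (ref[i] == ref[i + k]) != (hyp[i] == hyp[i + k])
        for i in range(n - k)
    )
    return errors / (n - k)


if __name__ == "__main__":
    reference = [0, 0, 0, 0, 1, 1, 1, 1]
    print(pk(reference, reference))  # 0.0
    print(pk(reference, [0] * 8))    # 0.3333333333333333
```

A hypothesis that places no boundaries at all is penalised only on the probes that straddle the true boundary. For this reason, Pk scores are usually compared against such trivial baselines.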

The resulting system identifies topical shifts in broadcast content, with practical implications for news media, educational platforms, and streaming services. Integrating the segmentation pipeline into existing content management systems could improve searchability, support automated summarization, and ease user navigation. The findings establish empirical baselines for transcript-based segmentation. They also provide methodological insights for developing multimodal video analysis systems that balance global topical coherence with sensitivity to local transitions.

Details

Title: Audio Transcript Segmentation Via Supervised Topic Modeling
Number of pages: 67
Publication year: 2025
Degree date: 2025
School code: 5896
Source: MAI 87/5(E), Masters Abstracts International
ISBN: 9798265424792
University/institution: Universidade do Porto (Portugal)
University location: Portugal
Degree: M.Eng.
Source type: Dissertation or Thesis
Language: English
Document type: Dissertation/Thesis
Dissertation/thesis number: 32306457
ProQuest document ID: 3275478367
Document URL: https://www.proquest.com/dissertations-theses/audio-transcript-segmentation-via-supervised/docview/3275478367/se-2?accountid=208611
Copyright: Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Database: ProQuest One Academic