Abstract

In today’s data-driven world, a significant challenge is extracting meaningful insights from large volumes of unstructured data. Central to this challenge is handling data that is inherently sequential, such as text, time series, and event sequences.

Sequential machine learning techniques have proven powerful because they capture dependencies and patterns, especially in temporal and textual data. However, the scarcity of labeled data, class imbalance, and the complexity of high-dimensional data remain open challenges.

This thesis examines these challenges, focusing primarily on Natural Language Processing and using Sentiment Classification as a case study. Although opinions about countless topics and products are posted online, much of the value of this data goes untapped because it is unstructured and unlabeled. Labeling it manually is untenable given its scale and complexity. To address this, our work designs and deploys Cross-Domain and Multi-Domain Sentiment Classification models. Transfer learning (specifically, unsupervised and supervised language model pre-training) and active learning show promising results in tackling the data labeling problem. The thesis proposes models such as CRD-SentEnse, MUTUAL, and REFORMIST, which achieve robust results with minimal labeled data. To provide more comprehensive sentiment analysis, this work also incorporates hate speech detection, treating it as a specialized subset of negative sentiment. This integration helps platforms promptly identify and act upon harmful content. Accordingly, a framework for hate speech detection is proposed to investigate the efficiency of pre-trained models and to deal with extremely negative and polarized sentiment and imbalanced classes.

Moving from textual to time series data, we confront challenges such as high dimensionality and temporal patterns like seasonality and trends. Because it makes fewer inherent assumptions, sequential machine learning is often considered more adaptable than traditional machine learning and statistical models. Contrasting sequential machine learning with traditional methods, we present a comprehensive comparative study that highlights the strengths and limitations of each. Applications such as thermal comfort prediction and network traffic forecasting serve as our experimental foundation.

Lastly, we address event sequence data with our proposed model, WhatsNextApp, which tackles data scarcity and the user cold-start problem and outperforms state-of-the-art models.

In conclusion, the methodologies presented in this thesis improve both our understanding and the performance of sequential machine learning on textual, time series, and event log data, setting the stage for future research.

Details

1010268
Title: Sequential Machine Learning For Textual and Time-Series Data
Number of pages: 212
Publication year: 2025
Degree date: 2025
School code: 1119
Source: DAI-B 87/1(E), Dissertation Abstracts International
ISBN: 9798290615073
University/institution: Technische Universitaet Berlin (Germany)
University location: Germany
Degree: Ph.D.
Source type: Dissertation or Thesis
Language: English
Document type: Dissertation/Thesis
Dissertation/thesis number: 32150883
ProQuest document ID: 3256050516
Document URL: https://www.proquest.com/dissertations-theses/sequential-machine-learning-textual-time-series/docview/3256050516/se-2?accountid=208611
Copyright: Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Database: ProQuest One Academic