
Abstract

Understanding human emotion and attention during visual behavior offers deep insights into internal cognitive states. Grounded in the action-perception loop, we study how humans process, interpret, and act upon visual information, and how these responses reflect underlying affective and cognitive mechanisms. This thesis focuses on two key challenges: detecting and interpreting emotion in long, naturalistic videos, and modeling gaze behavior in goal-directed visual tasks.

1. Emotion Understanding. Emotion analysis in video presents several challenges, including subtle and transient expressions, overlapping affective signals, and the difficulty of obtaining high-quality annotations. Moreover, spotting and recognizing expressions are often handled in separate stages, which can introduce inefficiencies and hinder performance. To address these issues, we developed a lightweight spotting framework that captures fine-grained motion using phase-based features, enabling robust and efficient detection of micro-expressions. We further proposed a unified end-to-end model that jointly performs expression spotting and recognition, improving accuracy and reducing the need for handcrafted preprocessing. Additionally, we introduced a transformer-based regression approach that models temporal dynamics to estimate emotional intensity directly from raw video frames.
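As a concrete illustration of the phase-based spotting idea, the sketch below computes an amplitude-weighted phase change between consecutive grayscale face frames using a quadrature pair of Gabor filters. This is a minimal sketch of the general technique, not the thesis's implementation: the function names, filter parameters, and four-orientation averaging are all illustrative assumptions.

```python
import numpy as np
import cv2

def gabor_phase(frame, ksize=15, sigma=4.0, theta=0.0, lambd=8.0):
    # Quadrature Gabor pair: the even (cosine) and odd (sine) filters form a
    # local analytic signal, whose angle is the local phase.
    even = cv2.getGaborKernel((ksize, ksize), sigma, theta, lambd, gamma=1.0, psi=0.0)
    odd = cv2.getGaborKernel((ksize, ksize), sigma, theta, lambd, gamma=1.0, psi=np.pi / 2)
    re = cv2.filter2D(frame.astype(np.float32), -1, even)
    im = cv2.filter2D(frame.astype(np.float32), -1, odd)
    return np.arctan2(im, re), np.hypot(re, im)  # phase, amplitude

def phase_motion_score(prev_gray, curr_gray):
    # Amplitude-weighted mean phase change between consecutive frames;
    # peaks in this per-frame signal mark candidate micro-expression intervals.
    scores = []
    for theta in (0.0, np.pi / 4, np.pi / 2, 3 * np.pi / 4):  # four orientations
        p0, a0 = gabor_phase(prev_gray, theta=theta)
        p1, a1 = gabor_phase(curr_gray, theta=theta)
        dphi = np.angle(np.exp(1j * (p1 - p0)))  # wrap phase difference to [-pi, pi]
        w = np.minimum(a0, a1)                   # trust phase only where amplitude is high
        scores.append((np.abs(dphi) * w).sum() / (w.sum() + 1e-8))
    return float(np.mean(scores))
```

Scanning this score across a clip and thresholding it against a local baseline yields candidate onset-apex-offset intervals, i.e., the spotting step that a unified model can then couple with recognition.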

2. Gaze Behavior Modeling. Traditional gaze modeling has largely focused on low-level, pixel-based fixations, which often overlook semantic object structure and task-driven intentions. This limits the interpretability and applicability of such models in real-world settings. To overcome this, we designed an object-level scanpath prediction framework that models gaze as a sequence of attentional shifts over meaningful objects. By incorporating semantic object information, spatial priors, and target representations, the framework more accurately reflects human behavior in structured search tasks.
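The abstract does not specify the architecture, so the following PyTorch sketch shows one plausible way to realize object-level scanpath prediction: each detected object is scored as the next fixation target from its appearance feature, its box geometry (a spatial prior), a search-target embedding, and the fixation history. All module names, dimensions, and the GRU-plus-bilinear fusion are assumptions for illustration, not the thesis's model.

```python
import torch
import torch.nn as nn

class ObjectScanpathPredictor(nn.Module):
    # Hypothetical next-fixated-object scorer for goal-directed visual search.
    def __init__(self, obj_dim=256, tgt_dim=256, hid=256):
        super().__init__()
        self.obj_proj = nn.Linear(obj_dim + 4, hid)  # appearance + (x, y, w, h) box
        self.tgt_proj = nn.Linear(tgt_dim, hid)      # search-target representation
        self.history = nn.GRU(hid, hid, batch_first=True)
        self.score = nn.Bilinear(hid, hid, 1)        # context-vs-object compatibility

    def forward(self, obj_feats, obj_boxes, tgt_feat, fix_seq):
        # obj_feats: (B, N, obj_dim)   obj_boxes: (B, N, 4)
        # tgt_feat:  (B, tgt_dim)      fix_seq:   (B, T) indices of fixated objects
        objs = self.obj_proj(torch.cat([obj_feats, obj_boxes], dim=-1))  # (B, N, hid)
        tgt = self.tgt_proj(tgt_feat)                                    # (B, hid)
        # Fixation history = sequence of embeddings of the objects fixated so far.
        idx = fix_seq.unsqueeze(-1).expand(-1, -1, objs.size(-1))
        _, h = self.history(torch.gather(objs, 1, idx))                  # h: (1, B, hid)
        ctx = h[-1] + tgt                         # fuse history state with target
        B, N, H = objs.shape
        logits = self.score(ctx.unsqueeze(1).expand(B, N, H).reshape(-1, H),
                            objs.reshape(-1, H)).view(B, N)
        return logits  # next fixation ~ softmax over detected objects
```

Decoding greedily, by appending the argmax object to `fix_seq` and re-scoring, produces an object-level scanpath; training against human fixation sequences with cross-entropy is the natural fit for this formulation.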

These contributions advance both the understanding and the modeling of facial expressions and gaze during visual behavior, offering efficient and interpretable methods tailored to naturalistic settings. They lay the groundwork for cognitively informed behavior modeling and open new directions for incorporating psychological constraints, explainable mechanisms, and adaptive human-in-the-loop learning.

Details

Title: Understanding Emotion and Gaze During Visual Behavior
Author:
Number of pages: 123
Publication year: 2025
Degree date: 2025
School code: 1223
Source: DAI-B 87/5(E), Dissertation Abstracts International
ISBN: 9798263313029
Committee members: Shi, Ling; Xie, Zhiyao; Xu, Dan; Kawahara, Tatsuya
University/institution: Hong Kong University of Science and Technology (Hong Kong)
University location: Hong Kong
Degree: Ph.D.
Source type: Dissertation or Thesis
Language: English
Document type: Dissertation/Thesis
Dissertation/thesis number: 32407330
ProQuest document ID: 3273632607
Document URL: https://www.proquest.com/dissertations-theses/understanding-emotion-gaze-during-visual-behavior/docview/3273632607/se-2?accountid=208611
Copyright: Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Database: ProQuest One Academic