Abstract

Image segmentation, the task of delineating meaningful regions in visual data, remains a central problem in computer vision. While recent advances in deep learning and transformer-based architectures have improved segmentation accuracy, current systems are still limited in their ability to adapt to diverse user interactions and to represent the hierarchical, context-dependent nature of scenes. In practice, a single pixel may belong to an object, a part, or a subpart depending on the task or user intent; yet most segmentation models operate at a fixed level of abstraction and rely on rigid input modalities.

This dissertation introduces segmentation methods that are hierarchical, interaction-aware, and user-centric. Motivated by practical research experiences in data annotation, human-computer interaction (HCI), and creative tools, the work addresses two key threads: (1) enabling flexible, multimodal user interaction in hybrid human-machine partnerships, and (2) modeling hierarchical relationships in natural images. 

For the first thread (supporting human-machine partnerships), the dissertation presents a new dataset and model that support varied input types (e.g., clicks, scribbles, shapes), enabling more intuitive interactions without requiring explicit user annotations. It also presents a weakly-supervised fine-tuning framework for interactive segmentation that improves segmentation consistency across user inputs, reducing cognitive load in creative workflows. For the second thread (modeling hierarchical relationships), the dissertation introduces the first hierarchical semantic segmentation dataset with annotations at object, part, and subpart levels. Building on this dataset, it proposes the first model that leverages specialized tokens within a large language model to capture "is-part-of" relationships in a single inference pass.
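To make the multimodal-prompt and "is-part-of" ideas concrete, the sketch below (Python) shows one plausible way such data could be represented: prompts of different kinds and region labels linked across object, part, and subpart levels. This illustration is not taken from the dissertation; every class, field, and example name here is hypothetical.

from dataclasses import dataclass, field
from typing import List, Literal, Optional, Tuple

# Hypothetical prompt kinds a multimodal interactive segmenter might accept.
PromptKind = Literal["click", "scribble", "shape"]

@dataclass
class Prompt:
    kind: PromptKind
    points: List[Tuple[int, int]]  # pixel coordinates describing the interaction

@dataclass
class RegionLabel:
    """One labeled region at a single level of the hierarchy."""
    name: str                                 # e.g. "person", "arm", "hand"
    level: Literal["object", "part", "subpart"]
    mask_id: int                               # index into a mask tensor stored elsewhere
    parent: Optional["RegionLabel"] = None     # "is-part-of" link to the enclosing region
    children: List["RegionLabel"] = field(default_factory=list)

    def attach_to(self, parent: "RegionLabel") -> None:
        """Record that this region is part of `parent`."""
        self.parent = parent
        parent.children.append(self)

# Example: a subpart nested inside a part nested inside an object,
# plus a single click prompt selecting a pixel in the image.
person = RegionLabel("person", "object", mask_id=0)
arm = RegionLabel("arm", "part", mask_id=1)
hand = RegionLabel("hand", "subpart", mask_id=2)
arm.attach_to(person)
hand.attach_to(arm)
click = Prompt("click", [(120, 84)])

Under this hypothetical representation, the same pixel can resolve to "person", "arm", or "hand" depending on which level of the hierarchy the user or task requests.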

Together, these contributions aim to reframe segmentation as a collaborative, context-aware process that better aligns with human perception and real-world needs.


Details

Title: The Many Hats of Pixels: Supporting Human Interaction and Hierarchical Understanding in Segmentation
Number of pages: 162
Publication year: 2025
Degree date: 2025
School code: 0051
Source: DAI-B 87/2(E), Dissertation Abstracts International
ISBN: 9798291574058
Committee member: Yeh, Tom; Roncone, Alessandro; Bovik, Alan; Price, Brian
University/institution: University of Colorado at Boulder
Department: Computer Science
University location: United States -- Colorado
Degree: Ph.D.
Source type: Dissertation or Thesis
Language: English
Document type: Dissertation/Thesis
Dissertation/thesis number: 32170305
ProQuest document ID: 3244235408
Document URL: https://www.proquest.com/dissertations-theses/many-hats-pixels-supporting-human-interaction/docview/3244235408/se-2?accountid=208611
Copyright: Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Database: ProQuest One Academic