Abstract

Image segmentation, the task of delineating meaningful regions in visual data, remains a central problem in computer vision. While recent advances in deep learning and transformer-based architectures have improved segmentation accuracy, current systems are still limited in their ability to adapt to diverse user interactions and to represent the hierarchical, context-dependent nature of scenes. In practice, a single pixel may belong to an object, a part, or a subpart depending on the task or user intent; yet most segmentation models operate at a fixed level of abstraction and rely on rigid input modalities.

This dissertation introduces segmentation methods that are hierarchical, interaction-aware, and user-centric. Motivated by practical research experiences in data annotation, human-computer interaction (HCI), and creative tools, the work addresses two key threads: (1) enabling flexible, multimodal user interaction in hybrid human-machine partnerships, and (2) modeling hierarchical relationships in natural images. 

For the first thread (supporting human-machine partnerships), the dissertation presents a new dataset and model that support varied input types (e.g., clicks, scribbles, shapes), enabling more intuitive interactions without requiring explicit user annotations. It also presents a weakly-supervised fine-tuning framework for interactive segmentation that improves segmentation consistency across user inputs, reducing cognitive load in creative workflows. For the second thread (modeling hierarchical relationships), the dissertation introduces the first hierarchical semantic segmentation dataset with annotations at object, part, and subpart levels. Building on this dataset, it proposes the first model that leverages specialized tokens within a large language model to capture "is-part-of" relationships in a single inference pass.
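To make the multimodal-prompt and "is-part-of" ideas concrete, the sketch below (Python) shows one plausible way such data could be represented: prompts of different kinds and region labels linked across object, part, and subpart levels. This illustration is not taken from the dissertation; every class, field, and example name here is hypothetical.

from dataclasses import dataclass, field
from typing import List, Literal, Optional, Tuple

# Hypothetical prompt kinds a multimodal interactive segmenter might accept.
PromptKind = Literal["click", "scribble", "shape"]

@dataclass
class Prompt:
    kind: PromptKind
    points: List[Tuple[int, int]]  # pixel coordinates describing the interaction

@dataclass
class RegionLabel:
    """One labeled region at a single level of the hierarchy."""
    name: str                                 # e.g. "person", "arm", "hand"
    level: Literal["object", "part", "subpart"]
    mask_id: int                               # index into a mask tensor stored elsewhere
    parent: Optional["RegionLabel"] = None     # "is-part-of" link to the enclosing region
    children: List["RegionLabel"] = field(default_factory=list)

    def attach_to(self, parent: "RegionLabel") -> None:
        """Record that this region is part of `parent`."""
        self.parent = parent
        parent.children.append(self)

# Example: a subpart nested inside a part nested inside an object,
# plus a single click prompt selecting a pixel in the image.
person = RegionLabel("person", "object", mask_id=0)
arm = RegionLabel("arm", "part", mask_id=1)
hand = RegionLabel("hand", "subpart", mask_id=2)
arm.attach_to(person)
hand.attach_to(arm)
click = Prompt("click", [(120, 84)])

Under this hypothetical representation, the same pixel can resolve to "person", "arm", or "hand" depending on which level of the hierarchy the user or task requests.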

Together, these contributions aim to reframe segmentation as a collaborative, context-aware process that better aligns with human perception and real-world needs.


Details

Title: The Many Hats of Pixels: Supporting Human Interaction and Hierarchical Understanding in Segmentation
Number of pages: 162
Publication year: 2025
Degree date: 2025
School code: 0051
Source: DAI-B 87/2(E), Dissertation Abstracts International
ISBN: 9798291574058
Committee member: Yeh, Tom; Roncone, Alessandro; Bovik, Alan; Price, Brian
University/institution: University of Colorado at Boulder
Department: Computer Science
University location: United States -- Colorado
Degree: Ph.D.
Source type: Dissertation or Thesis
Language: English
Document type: Dissertation/Thesis
Dissertation/thesis number: 32170305
ProQuest document ID: 3244235408
Document URL: https://www.proquest.com/dissertations-theses/many-hats-pixels-supporting-human-interaction/docview/3244235408/se-2?accountid=208611
Copyright: Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Database: ProQuest One Academic