Abstract

This dissertation presents research findings that significantly advance computer vision and artificial intelligence, specifically in automatic dietary assessment. The work introduces novel methodologies across three key areas: robust image classification, a generative model for image-to-image translation, and a foundational analysis of generative adversarial networks (GANs).

The dissertation proposes a novel GAN-based architecture specifically designed for shape-preserving image-to-image translation of food images. This architecture, inspired by recent advancements, ensures that generated food images not only appear visually realistic but also maintain the essential shape and structure of the original food items. By integrating a specialized shape preservation module, it enables the synthesis of diverse food images while retaining the original forms. This enriches training data sets and improves downstream tasks such as food recognition and volume estimation in automated dietary assessment systems.
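The abstract does not give the exact formulation, but a shape-preserving objective of this kind is often expressed as an adversarial term plus a silhouette penalty. The following is a minimal sketch under that assumption; the function name, the L1 mask penalty, and the weight `lam` are illustrative, not the dissertation's actual method.

```python
import math

def shape_preserving_gen_loss(adv_score, input_mask, output_mask, lam=10.0):
    """Hypothetical combined generator objective: a non-saturating
    adversarial term plus a shape-preservation penalty (L1 distance
    between binary food-silhouette masks)."""
    # Adversarial term: push the discriminator's score on the fake toward 1.
    adv_loss = -math.log(max(adv_score, 1e-8))
    # Shape term: penalize pixels where the generated silhouette departs
    # from the source silhouette, preserving the original food shape.
    shape_loss = sum(abs(a - b) for a, b in zip(input_mask, output_mask)) / len(input_mask)
    return adv_loss + lam * shape_loss
```

With `lam` large, a generated image that fools the discriminator but distorts the food's outline is still heavily penalized, which is the trade-off a shape preservation module is meant to enforce.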

This dissertation also provides a detailed analysis that clarifies the fundamental mechanisms by which a vanilla GAN can effectively perform image-to-image translation tasks when paired with appropriate loss functions. This investigation aligns with insights into the inherent relationship between GANs and autoencoders. It demonstrates how the adversarial training process compels the generator to learn mappings that preserve common structural and content features between the input and target domains. When properly configured with a sufficiently capable discriminator, the training process can succeed without complex additional penalty terms. This analysis highlights the powerful, often understated, role of core GAN components in facilitating image-to-image translation.
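One common way this pressure arises (an assumption here, not a detail stated in the abstract) is a conditional discriminator that scores the input and output jointly, so realism alone cannot fool it. A toy sketch of that idea, with an illustrative linear-plus-sigmoid discriminator:

```python
import math

def paired_disc_score(weights, input_img, output_img):
    """Hypothetical conditional (paired) discriminator: it scores the
    (input, output) pair jointly, so a realistic-looking output that
    ignores its input is still classified as fake -- this is the
    pressure that pushes the generator to preserve shared structure."""
    pair = list(input_img) + list(output_img)   # concatenate the pair
    logit = sum(w * v for w, v in zip(weights, pair))
    return 1.0 / (1.0 + math.exp(-logit))       # sigmoid real/fake probability
```

Because the discriminator sees real (input, target) pairs during training, the generator's only way to raise this score is to produce outputs consistent with their inputs, echoing the autoencoder-like behavior the analysis describes.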

Finally, a novel image classification algorithm is developed for real-world dietary assessment, addressing the critical challenge of accurately identifying food items in complex images. This work targets images captured in low- and middle-income countries (LMICs). Building on existing work, the algorithm leverages a composite machine learning approach, combining deep neural networks (DNNs) with shallow learning networks (SLNs) via a probabilistic interface. This hybrid architecture effectively handles variations in illumination, resolution, and diverse food presentations, and significantly reduces the manual burden of data review by filtering out non-food content.
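The abstract does not specify how the probabilistic interface combines the two models; a simple, commonly used assumption is a weighted average of class posteriors. The sketch below illustrates that idea only; the function name, the weight `w`, and the class labels are hypothetical.

```python
def fuse_predictions(dnn_probs, sln_probs, w=0.7):
    """Hypothetical probabilistic interface: fuse class posteriors from a
    deep network and a shallow learner by weighted averaging, then
    renormalize. The fused non-food probability can then drive the
    filtering of non-food images before manual review."""
    classes = set(dnn_probs) | set(sln_probs)
    fused = {c: w * dnn_probs.get(c, 0.0) + (1 - w) * sln_probs.get(c, 0.0)
             for c in classes}
    total = sum(fused.values())
    return {c: p / total for c, p in fused.items()}
```

A higher `w` trusts the DNN more; thresholding the fused "non-food" posterior is one plausible way to realize the filtering step the abstract describes.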

Collectively, this dissertation contributes significantly to enhancing the accuracy, efficiency, and fundamental understanding of AI-driven solutions for automatic dietary assessment.

Details

Title
Shape-Constrained Food Image Generation and Mechanistic Insights Into GAN-Based Image-to-Image Translation
Author
Number of pages
123
Publication year
2025
Degree date
2025
School code
0178
Source
DAI-B 87/3(E), Dissertation Abstracts International
ISBN
9798293808908
Committee member
Can-Cimino, Azime; Dallal, Ahmed; Zeng, Bo; Zhan, Liang
University/institution
University of Pittsburgh
Department
Electrical and Computer Engineering
University location
United States -- Pennsylvania
Degree
Ph.D.
Source type
Dissertation or Thesis
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
32121520
ProQuest document ID
3246931161
Document URL
https://www.proquest.com/dissertations-theses/shape-constrained-food-image-generation/docview/3246931161/se-2?accountid=208611
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Database
ProQuest One Academic