Abstract

In this thesis, we develop three Visible-to-Thermal (V2T) facial translation algorithms based on Generative Adversarial Networks (GANs) that, given a visible image, translate it into its thermal pair. In particular, the Visible-to-Thermal Facial GAN (VTF-GAN) operates in No-, Low-, and Hard-Light visible settings by learning a Fourier Transform loss. We also offer the first V2T Facial Diffusion Model (VTF-Diff), which achieves promising results competitive with the VTF-GAN. However, the generation of a thermal face is meaningless if it misconstrues the individual's facial identity. This occurs when VT pairs are misaligned, a common occurrence during data collection when practitioners capture images using two different cameras (e.g., visible and thermal cameras). As a result, we develop an unsupervised VT image registration algorithm called Vista Morph that incorporates generative flows to learn a deformation field between cross-spectral pairs. Our work beats the state of the art and offers the first VT facial application of image registration. We demonstrate, through biometric thermal vessel extraction, that V2T translation with Vista Morph preserves subject identity better than translation without it. Further, Vista Morph generalizes to automated-driving street-scene data and is robust to geometric warps and erasure.
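The abstract does not reproduce the Fourier Transform loss definition. A minimal sketch of one common formulation is shown below, assuming the loss compares the 2D FFT amplitude and phase spectra of the generated thermal image against its ground-truth pair; the function name, equal weighting of the two terms, and single-channel inputs are illustrative assumptions, not the thesis's exact formulation.

```python
import numpy as np

def fourier_transform_loss(generated: np.ndarray, target: np.ndarray) -> float:
    """Illustrative frequency-domain loss: L1 distance between the
    amplitude and phase spectra of a generated image and its target."""
    fft_gen = np.fft.fft2(generated)
    fft_tgt = np.fft.fft2(target)
    # Amplitude term penalizes differences in spectral energy.
    amp_loss = np.mean(np.abs(np.abs(fft_gen) - np.abs(fft_tgt)))
    # Phase term penalizes differences in spatial structure.
    phase_loss = np.mean(np.abs(np.angle(fft_gen) - np.angle(fft_tgt)))
    return float(amp_loss + phase_loss)
```

Such a term would typically be added to the standard adversarial and reconstruction losses so that the generator matches the target's frequency content as well as its pixel values.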

The generative works of VTF-GAN and Vista Morph culminate in their application to a real-life medical dataset called Intelligent Sight & Sound (ISS), a clinical trial of cancer patient pain. In collaboration with the U.S. National Institutes of Health (NIH), we trained our models on 29,500 VT facial images of cancer patients, demonstrating that our approaches succeed under spontaneous settings, challenging head poses, poor resolution, and weak lighting conditions. To augment this work, we also conducted a deep dive into the NIH ISS dataset, introducing it as the first of its kind. We demonstrated its utility by developing several multimodal pain detection models to predict chronic cancer pain, a far more challenging scenario than the conventional acute-pain detection that exists today.

Details

Title
Multimodal Deep Generative Models for Cross-Spectral Image Analysis
Author
Ordun, Catherine Y.
Publication year
2023
Publisher
ProQuest Dissertations & Theses
ISBN
9798381158588
Source type
Dissertation or Thesis
Language of publication
English
ProQuest document ID
2901360946
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.