Abstract

In this thesis, we develop three Visible-to-Thermal (V2T) facial translation algorithms based on Generative Adversarial Networks (GANs) that, given a visible image, translate it into its thermal pair. In particular, the Visible-to-Thermal Facial GAN (VTF-GAN) operates in No-, Low-, and Hard-Light visible settings by learning a Fourier Transform loss. We also offer the first V2T Facial Diffusion Model (VTF-Diff), which achieves promising results competitive with the VTF-GAN. However, the generation of a thermal face is meaningless if it misconstrues the individual's facial identity. This occurs when VT pairs are misaligned, a common occurrence during data collection when practitioners capture images using two different cameras (e.g., visible and thermal cameras). As a result, we develop an unsupervised VT image registration algorithm called Vista Morph that incorporates generative flows to learn a deformation field between cross-spectral pairs. Our work beats the state of the art and offers the first VT facial application of image registration. We demonstrate, through biometric thermal vessel extraction, that V2T translation with Vista Morph preserves subject identity better than translation without it. Further, Vista Morph generalizes to automated-driving street-scene data and is robust to geometric warps and erasure.
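The abstract does not reproduce the Fourier Transform loss definition. A minimal sketch of one common formulation is shown below, assuming the loss compares the 2D FFT amplitude and phase spectra of the generated thermal image against its ground-truth pair; the function name, equal weighting of the two terms, and single-channel inputs are illustrative assumptions, not the thesis's exact formulation.

```python
import numpy as np

def fourier_transform_loss(generated: np.ndarray, target: np.ndarray) -> float:
    """Illustrative frequency-domain loss: L1 distance between the
    amplitude and phase spectra of a generated image and its target."""
    fft_gen = np.fft.fft2(generated)
    fft_tgt = np.fft.fft2(target)
    # Amplitude term penalizes differences in spectral energy.
    amp_loss = np.mean(np.abs(np.abs(fft_gen) - np.abs(fft_tgt)))
    # Phase term penalizes differences in spatial structure.
    phase_loss = np.mean(np.abs(np.angle(fft_gen) - np.angle(fft_tgt)))
    return float(amp_loss + phase_loss)
```

Such a term would typically be added to the standard adversarial and reconstruction losses so that the generator matches the target's frequency content as well as its pixel values.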

The generative works of VTF-GAN and Vista Morph culminate in their application to a real-life medical dataset called Intelligent Sight & Sound (ISS), a clinical trial of cancer patient pain. In collaboration with the U.S. National Institutes of Health (NIH), we trained our models on 29,500 VT facial images of cancer patients, demonstrating that our approaches succeed under spontaneous settings, challenging head poses, poor resolution, and weak lighting conditions. To augment this work, we also conducted a deep dive into the NIH ISS dataset, introducing it as the first of its kind. We demonstrated its utility by developing several multimodal pain detection models to predict chronic cancer pain, a far more challenging scenario than the conventional acute-pain detection that exists today.

Details

Title
Multimodal Deep Generative Models for Cross-Spectral Image Analysis
Author
Ordun, Catherine Y.
Publication year
2023
Publisher
ProQuest Dissertations & Theses
ISBN
9798381158588
Source type
Dissertation or Thesis
Language of publication
English
ProQuest document ID
2901360946
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.