
Abstract

Self-similarity within and across video frames is used in today’s best methods for image and video restoration and generation, in the form of attention and warping modules within deep neural networks (DNNs). Self-similarity is a microcosm of the greater mission to understand images by first understanding the components within them. While this design philosophy mirrors how generic DNNs build features, DNNs are at the mercy of many practical limitations, such as the available training data, the specified loss function, network size, and training time. Even when properly scaled up, DNNs remain limited to the correlations they observe. Robust generalization requires that DNNs learn underlying scientific principles, but today’s DNNs have not demonstrated this understanding. In short, DNNs should be learning scientific relationships, but they do not, because the science is not in the data. This limitation inspires a larger research goal: to shift DNNs from data-only learning to knowledge-guided, data-driven learning by designing architectures that explicitly incorporate assumptions about our data as inductive bias. This thesis contributes new ideas to principally improve DNNs for images and videos by incorporating our knowledge of self-similarity into modules.

In Chapter 2, we hypothesize that the attention operator within DNNs for video denoising acts as an optimal denoiser rather than a complicated or abstract transformation of features. Using this connection, we design a new search module, the shifted neighborhood search, to improve the space-time attention module. This search step is a method to identify self-similar regions. We show this simple grid search is of higher quality than existing DNN alternatives. Our particular implementation of the grid search is computationally efficient, and it is accompanied by a user-friendly Python+PyTorch package. Our findings suggest a perhaps obvious greater lesson: explicitly computing a desired quantity is better than learning it from data. The success of attention modules suggests that sparse, data-dependent memory access is important, but rather than learning how to run this search, we can use simple assumptions about images (self-similarity) to improve it.
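To make the search step concrete, the following is a minimal, illustrative sketch of a local grid search over patches in Python+PyTorch. It is not the thesis’s shifted neighborhood search implementation: it searches a plain (unshifted) window in a single target frame, and the function name and the patch_size, window, and top_k parameters are assumptions for illustration only.

```python
import torch
import torch.nn.functional as F

def local_patch_search(query_frame, target_frame, qy, qx,
                       patch_size=7, window=15, top_k=10):
    """Hypothetical helper: return the top_k patch locations in `target_frame`
    closest (in L2 distance) to the patch of `query_frame` whose top-left
    corner is (qy, qx). Frames are (C, H, W) float tensors."""
    p, r = patch_size, window // 2
    # Flatten the query patch into a (1, C*p*p) vector.
    q = query_frame[:, qy:qy + p, qx:qx + p].reshape(1, -1)
    # Unfold the target frame into all overlapping patches: (num_patches, C*p*p).
    patches = F.unfold(target_frame.unsqueeze(0), kernel_size=p).squeeze(0).t()
    H, W = target_frame.shape[1:]
    nW = W - p + 1  # number of valid patch columns
    # Restrict candidates to patch locations inside the search window around (qy, qx).
    ys = torch.arange(max(qy - r, 0), min(qy + r, H - p) + 1)
    xs = torch.arange(max(qx - r, 0), min(qx + r, W - p) + 1)
    yy, xx = torch.meshgrid(ys, xs, indexing="ij")
    idx = (yy * nW + xx).reshape(-1)
    # Exhaustive grid search: distances to every candidate, then keep the best k.
    dists = torch.cdist(q, patches[idx]).squeeze(0)
    best = torch.topk(dists, k=min(top_k, dists.numel()), largest=False).indices
    return dists[best], (yy.reshape(-1)[best], xx.reshape(-1)[best])
```

In a space-time attention module, the returned locations would play the role of the keys gathered for each query; the shifted search described in the thesis additionally offsets the window across frames to follow motion.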

As Chapter 2 demonstrates the importance of selecting the best neighbors, Chapter 3 presents a new way to use these selected neighbors. Ordinarily, an attention module re-weights each point according to the similarity between the query point and a grid of key points. However, in the presence of noise, this similarity is unreliable. The impact of noise can be mitigated by first clustering pixels into deformably shaped, self-similar regions. This chapter proposes a re-weighting step that augments neighborhood attention using these clusters of pixels, called superpixels. By viewing superpixel similarities as part of the image formation process, we show this re-weighted attention operator corresponds to an optimal denoiser that is a re-weighted variation of the naive one.
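A hedged sketch of the re-weighting idea is given below. The superpixel co-membership probabilities `sp_prob` are an assumed input (e.g., produced by a separate superpixel estimate), and the exact form of the operator in the thesis may differ.

```python
import torch

def superpixel_reweighted_attention(q, k, v, sp_prob):
    """Illustrative only. q: (N, d) queries; k, v: (N, M, d) per-query
    neighborhoods of keys/values; sp_prob: (N, M) probabilities that the
    query pixel and each neighbor belong to the same superpixel."""
    d = q.shape[-1]
    # Plain neighborhood-attention weights between each query and its M neighbors.
    logits = torch.einsum("nd,nmd->nm", q, k) / d ** 0.5
    attn = torch.softmax(logits, dim=-1)
    # Re-weight by superpixel affinity, then renormalize so weights sum to one.
    w = attn * sp_prob
    w = w / w.sum(dim=-1, keepdim=True).clamp_min(1e-8)
    return torch.einsum("nm,nmd->nd", w, v)
```

The intuition is that when noise corrupts the query-key similarities, the superpixel term supplies a second, region-level vote about which neighbors are truly self-similar.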

With the hope of extending our single-image superpixel method to space-time, we noticed that the fastest space-time superpixel methods execute at only about two frames per second. So in Chapter 4, we present a new method for space-time superpixels that runs at nearly 60 frames per second. We estimate space-time superpixels by hill-climbing to a local mode of a Dirichlet-Process Gaussian Mixture Model (DP-GMM) conditioned on the previous frame’s superpixel information. The DP-GMM allows for principled splitting and merging of superpixels, which can explain disocclusions due to motion and allows the number of superpixels to adapt to the image’s content. While alternative methods confine each superpixel to a particular square grid within the image, our space-time superpixels are not restricted in this way. Hence our space-time superpixels are “off the grid.”
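As a rough illustration of the estimation loop, the sketch below performs a single hill-climbing sweep toward a local mode of a Gaussian mixture over joint position-color features. It is an assumed simplification: the Dirichlet-Process prior, the split/merge moves, and the conditioning on the previous frame’s superpixels are all omitted, and the isotropic shared variance `var` is illustrative.

```python
import torch

def hill_climb_step(features, labels, num_sp, var=1.0):
    """Illustrative only. features: (N, D) per-pixel vectors (e.g., [y, x, L, a, b]);
    labels: (N,) current superpixel assignments; num_sp: number of superpixels."""
    N, D = features.shape
    # Re-estimate each superpixel's mean feature from its current members.
    counts = torch.zeros(num_sp).scatter_add_(0, labels, torch.ones(N))
    means = torch.zeros(num_sp, D).scatter_add_(
        0, labels.unsqueeze(1).expand(-1, D), features)
    means = means / counts.clamp_min(1.0).unsqueeze(1)
    # Hill-climbing reassignment: each pixel moves to its most likely superpixel
    # under an isotropic Gaussian likelihood with shared variance `var`.
    neg_log_lik = torch.cdist(features, means) ** 2 / (2 * var)
    return neg_log_lik.argmin(dim=1)
```

Iterating such sweeps per frame, warm-started from the previous frame’s labels, is what makes a near-real-time implementation plausible.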

Details

1010268
Business indexing term
Title
Self-Similarity in Deep Neural Network Modules for Images and Videos
Number of pages
130
Publication year
2025
Degree date
2025
School code
0183
Source
DAI-B 87/1(E), Dissertation Abstracts International
ISBN
9798290635477
Committee member
Inouye, David; Buzzard, Greg
University/institution
Purdue University
University location
United States -- Indiana
Degree
Ph.D.
Source type
Dissertation or Thesis
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
32124072
ProQuest document ID
3256494190
Document URL
https://www.proquest.com/dissertations-theses/self-similarity-deep-neural-network-modules/docview/3256494190/se-2?accountid=208611
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Database
ProQuest One Academic