Content area
The rapid evolution of generative models has revolutionized image and video processing, paving the way for innovative approaches in creation and optimization. This thesis explores the intersection of generative modeling and neural compression, emphasizing their roles in addressing the growing demands of high-quality media processing. Building upon three key works, we synthesize insights to present a unified perspective.
First, the foundational study on neural video compression through generative modeling demonstrates the utility of variational autoencoders and autoregressive flows for efficient rate-distortion tradeoffs. By introducing structured priors and temporal dependencies, this work establishes the viability of generative models in surpassing traditional codecs, marking a significant leap in neural video coding. Second, leveraging the strengths of diffusion probabilistic models for video generation, we address the challenge of forecasting high-dimensional video dynamics. This approach integrates deterministic and stochastic components, refining frame predictions with diffusion-based residual modeling to enhance perceptual quality and probabilistic accuracy. Third, the innovative application of conditional diffusion models in lossy image compression underscores a paradigm shift from deterministic decoders to diffusion-based architectures. By encoding semantic content as latent variables and reconstructing texture via diffusion, this framework achieves remarkable perceptual gains while maintaining competitive distortion metrics.
Collectively, these contributions illustrate the convergence of generative techniques in optimizing rate-distortion performance, enabling high-fidelity reconstructions, and fostering perceptually appealing outputs. This work situates itself within the broader research landscape by addressing the dual objectives of efficiency and realism in media compression and generation. Applications span diverse domains, including video conferencing, autonomous systems, and content delivery networks, underscoring the transformative potential of generative modeling in a data-driven era. Through the synthesis of creation and optimization, this thesis lays the groundwork for novel multimedia processing systems that are both generative and compression-ready.