In this thesis, we address key limitations of conventional convolutional neural networks by introducing tunable Universal Wavelet Units (UwUs), a family of wavelet-based downsampling modules designed to enhance computer vision models across tasks such as image classification, object detection, anomaly detection, and instance segmentation. These units employ learnable filter banks that integrate seamlessly into downstream components of modern architectures. We first demonstrate that combining max pooling with high-frequency components from wavelet decomposition improves regression performance on retinal data, underscoring the importance of preserving fine structural detail. We then present OrthLatt-UwU, which relaxes perfect reconstruction constraints through an orthogonal lattice structure and achieves competitive results on CIFAR10, ImageNet1K, DTD, and MVTecAD. Building on this, we introduce BiorthLatt-UwU and LS-BiorthLatt-UwU, which offer greater modeling flexibility using biorthogonal filter banks implemented via lattice and lifting schemes, respectively. Additionally, we propose a stopband-energy constraint to suppress undesirable frequency responses while preserving salient features, further enhancing spatial fidelity. In medical imaging, UwU-integrated networks yield clinically meaningful improvements in three applications: estimating retinal sensitivity maps for patients with Retinitis Pigmentosa, predicting the success of Internal Limiting Membrane (ILM) removal following Epiretinal Membrane (ERM) surgery, and improving 3D retinal layer segmentation from OCT volumes by replacing max pooling in MGU-Net. These results position UwUs as a versatile, high-precision framework for advancing both general-purpose vision systems and specialized biomedical applications.
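
For readers unfamiliar with lattice-structured filter banks, the following is a minimal 1D NumPy sketch of a two-channel orthogonal lattice used as a downsampling step. It is an illustration under stated assumptions, not the implementation used in this thesis: the function names `rotation` and `lattice_analysis` are hypothetical, the rotation angles are fixed numbers rather than learned parameters, the delay is circular, and the sketch shows the standard perfect-reconstruction lattice without modeling the relaxation that OrthLatt-UwU applies.

```python
import numpy as np

def rotation(theta):
    """2x2 rotation matrix; each lattice stage is one such orthogonal block."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, s], [-s, c]])

def lattice_analysis(x, thetas):
    """Two-channel orthogonal lattice analysis of a 1D signal (hypothetical helper).

    x      : 1D array of even length.
    thetas : sequence of rotation angles, one per lattice stage
             (these would be the tunable parameters in a learnable unit).
    Returns (low, high), each of half the input length.
    """
    x = np.asarray(x, dtype=float)
    # Polyphase split: even and odd samples form the two channels.
    chans = np.stack([x[0::2], x[1::2]])
    chans = rotation(thetas[0]) @ chans
    for theta in thetas[1:]:
        # One-sample delay on the second channel (circular, an assumption
        # made here so that signal energy is preserved exactly).
        chans[1] = np.roll(chans[1], 1)
        chans = rotation(theta) @ chans
    return chans[0], chans[1]
```

With a single stage and an angle of pi/4, this reduces to the Haar filter pair up to sign: a constant input yields a zero high-frequency channel. Because every stage is a rotation or a circular shift, the total energy of the two output channels equals that of the input for any choice of angles.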