Abstract

Text-to-Thangka generation requires preserving both semantic accuracy and textural detail. Current methods struggle with fine-grained feature extraction, multi-level feature integration, and discriminator overfitting caused by limited Thangka data. We present HST-GAN, a novel framework that combines parallel hybrid attention with differentiable symmetric augmentation. The architecture features a Parallel Spatial-Channel Attention (PSCA) module for precise localization of deity facial features and ritual-object textures, together with a Hierarchical Feature Fusion Network (HLFN) for multi-scale feature alignment. The framework’s Differentiable Symmetric Augmentation (DiffAugment) dynamically transforms discriminator inputs to prevent overfitting while improving generalization. On the T2IThangka dataset, HST-GAN achieves an Inception Score of 2.08 and reduces the Fréchet Inception Distance to 87.91; it also outperforms baseline methods on the Oxford-102 benchmark.
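The core idea behind the symmetric augmentation described above (following DiffAugment, Zhao et al., 2020) is that the discriminator must see real and generated images transformed by the *same* randomly sampled augmentation, so that augmentation artifacts cannot be used to tell the two apart. The following is a minimal NumPy sketch of that policy only; the augmentation ops, shapes, and function names are illustrative assumptions, and the differentiability that lets gradients flow back to the generator is handled by an autograd framework in practice, not shown here.

```python
import numpy as np

def rand_brightness(batch, rng):
    # Shift the whole batch by one shared random brightness offset.
    return batch + rng.uniform(-0.5, 0.5)

def rand_translation(batch, rng, max_shift=2):
    # Roll every image in the batch by one shared random (dy, dx) offset.
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    return np.roll(batch, shift=(dy, dx), axis=(1, 2))

def diffaugment(real, fake, rng):
    """Apply an identical augmentation chain to real and fake batches.

    Sketch of the symmetric-augmentation policy: the RNG state is saved
    and restored around each op so both batches receive the exact same
    random transform.
    """
    for aug in (rand_brightness, rand_translation):
        state = rng.bit_generator.state   # remember RNG state
        real = aug(real, rng)
        rng.bit_generator.state = state   # replay the same randomness
        fake = aug(fake, rng)
    return real, fake

rng = np.random.default_rng(0)
real = rng.standard_normal((4, 8, 8))   # toy batch: 4 images of 8x8
fake = rng.standard_normal((4, 8, 8))
real_aug, fake_aug = diffaugment(real, fake, rng)
assert real_aug.shape == real.shape and fake_aug.shape == fake.shape
```

Because the same transform is replayed for both batches, feeding identical inputs through `diffaugment` yields identical outputs; this symmetry is what keeps the discriminator from exploiting the augmentation itself as a real-vs-fake cue.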

© The Author(s) 2025. This work is published under the Creative Commons BY-NC-ND 4.0 license (http://creativecommons.org/licenses/by-nc-nd/4.0/).