DHA: Learning Decoupled-Head Attention from Transformer Checkpoints via Adaptive Heads Fusion

Chen, Yilong; Zhang, Linhao; Shang, Junyuan; Zhang, Zhenyu; Liu, Tingwen; et al. arXiv.org, Dec 7, 2024.