MixKD: Towards Efficient Distillation of Large-scale Language Models

Liang, Kevin J; Hao, Weituo; Shen, Dinghan; Zhou, Yufan; Chen, Weizhu; et al. arXiv.org, Mar 17, 2021.
