- Code release: https://github.com/microsoft/torchscale
- Sep 2022: accepted by NeurIPS 2022
- April 2022: released the preprint of X-MoE, "On the Representation Collapse of Sparse Mixture of Experts"
title={On the Representation Collapse of Sparse Mixture of Experts},
author={Zewen Chi and Li Dong and Shaohan Huang and Damai Dai and Shuming Ma and Barun Patra and Saksham Singhal and Payal Bajaj and Xia Song and Xian-Ling Mao and Heyan Huang and Furu Wei},
booktitle={Advances in Neural Information Processing Systems},