
Fix typo in MoE note. #27

Merged: 2 commits into jax-ml:main on Feb 7, 2025
Conversation

fedelebron (Collaborator):

Also TeX'd some variables.

transformers.md (Outdated)
@@ -251,9 +251,9 @@ So the takeaway is that **dot-product attention FLOPs only become dominant durin

### Sparsity and Mixture-of-Experts

- We'd be remiss not to briefly discuss Mixture of Experts (MoE) models<d-cite key="moe"></d-cite>, which replace the single dense MLP blocks in a standard Transformer with a set of independent MLPs that can be dynamically routed between. To a first approximation, **an MoE is a dense model with E MLP blocks per layer**, instead of just one. Each token activates k of these experts, typically k=2. This increases the parameter count by O(E), while keeping the total number of activated parameters roughly the same as the dense model.
+ We'd be remiss not to briefly discuss Mixture of Experts (MoE) models<d-cite key="moe"></d-cite>, which replace the single dense MLP blocks in a standard Transformer with a set of independent MLPs that can be dynamically routed between. To a first approximation, **an MoE is a dense model with E MLP blocks per layer**, instead of just one. Each token activates $k$ of these experts, typically $k=2$. This increases the parameter count by $O(E)$, while multipling the total number of activated parameters per inference by $k$, compared with the dense version.
Collaborator:

multiplying

Collaborator:

also "activated parameters per token"

fedelebron merged commit 9b1f5ea into jax-ml:main on Feb 7, 2025.
1 check passed.
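
For concreteness, here is a minimal back-of-the-envelope sketch of the relationship described in the edited sentence, with the review corrections applied ("multiplying", "per token"): an MoE layer holds roughly $E$ times the MLP parameters of the dense model it replaces, but only activates $k$ experts' worth of parameters per token. The widths `d_model` and `d_ff`, the expert count `E`, and the top-$k$ value below are illustrative assumptions, not numbers from transformers.md.

```python
# Rough parameter-count sketch (illustrative assumptions, not values from the note).
d_model, d_ff = 4096, 16384   # hidden width and MLP width (assumed)
E, k = 8, 2                   # number of experts; experts activated per token (assumed)

dense_mlp_params = 2 * d_model * d_ff         # W_in and W_out of one dense MLP block
moe_total_params = E * dense_mlp_params       # total parameter count grows by O(E)
moe_activated_params = k * dense_mlp_params   # activated parameters per token grow by k

print(f"dense MLP block:           {dense_mlp_params:>14,}")
print(f"MoE block, total:          {moe_total_params:>14,}")
print(f"MoE block, active / token: {moe_activated_params:>14,}")
```

Under these assumed values, the MoE block stores 8x the parameters of the dense block while activating only 2x as many per token, which is the first-approximation tradeoff the note describes.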