SimPPL

xjdr on X: "# Why Training MoEs is So Hard recently, i have found myself wanting a small, research focused training r...

xjdr on why training MoEs under 20B params is hard: FLOP efficiency, load balancing and router stability, and data quality and quantity.

Source
https://x.com/_xjdr/status/1997459906719547535?s=12
Tags
training, twitter

Permalink: simppl.org/library/item/xjdr-on-x-why-training-moes-is-so-hard-recently-i-have-found-myself-wa-37b1486c

This is a SimPPL canonical link to a reading shared in our newsletter. Browse the rest at simppl.org/library.