Junxian He on X: "We replicated the DeepSeek-R1-Zero and DeepSeek-R1 training on 7B model with only 8K examples, the...
This is a SimPPL canonical link to a reading shared in our newsletter. Browse the rest at simppl.org/library.