SimPPL

GitHub - leoheuler/flashtensors

flashtensors: run 100 large models on a single GPU with minimal time-to-first-token impact via tensor swap.

Source
https://github.com/leoheuler/flashtensors
Tags
infrastructure, llms, github

Permalink: simppl.org/library/item/github-leoheuler-flashtensors-github-ccf2d599

This is a SimPPL canonical link to a reading shared in our newsletter. Browse the rest at simppl.org/library.