Hey guys! Continuing the series of episodes about PEFT, in this episode I talk about inference optimization techniques for LLMs.
I talk about layer pruning, where we prune consecutive layers of the LLM with almost no loss in model performance.
I also talk about Mixture of Depths, a...
Published 04/18/24