Life with AI - Filipe Lauar

Technology

In this podcast I explain some hard concepts of AI in a way that anyone can understand. I also show how AI is influencing our lives without us even realizing it.
#80- Layer pruning and Mixture of Depths.
Hey guys, continuing the series of episodes about PEFT, in this episode I talk about inference optimization techniques for LLMs.
I talk about layer pruning, where we prune consecutive layers of the LLM with almost no loss in model performance.
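As a minimal sketch of the idea (my own illustration, not the paper's code): drop a contiguous block of transformer layers and keep the rest. In practice the block is chosen so that its input and output representations are most similar, and the pruned model is briefly fine-tuned afterwards to recover quality.

```python
# Hypothetical helper, not the paper's released code: drop a contiguous
# block of transformer layers from a stack stored as an nn.ModuleList.
import torch.nn as nn

def prune_consecutive_layers(layers: nn.ModuleList, start: int, n: int) -> nn.ModuleList:
    # Remove layers [start, start + n) and keep everything else.
    kept = [layer for i, layer in enumerate(layers) if not (start <= i < start + n)]
    return nn.ModuleList(kept)

# Toy usage: a 12-layer stack pruned down to 8 layers.
blocks = nn.ModuleList(nn.Linear(8, 8) for _ in range(12))
print(len(prune_consecutive_layers(blocks, start=8, n=4)))  # 8
```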
I also talk about Mixture of Depths, a technique similar to Mixture of Experts, where a router chooses which tokens will be processed by which layers of the LLM.
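Here is a toy, single-layer sketch of that routing idea (an illustration under simplifying assumptions, not the paper's implementation): a scalar router scores every token, only the top-k tokens go through the expensive block, and the rest skip it via the residual stream.

```python
# Toy Mixture-of-Depths block (illustrative sketch, not the paper's code).
import torch
import torch.nn as nn

class MoDBlock(nn.Module):
    def __init__(self, d_model: int, capacity: float = 0.5):
        super().__init__()
        self.router = nn.Linear(d_model, 1)            # scalar score per token
        self.layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.capacity = capacity                       # fraction of tokens processed

    def forward(self, x):                              # x: (batch, seq, d_model)
        scores = self.router(x).squeeze(-1)            # (batch, seq)
        k = max(1, int(self.capacity * x.size(1)))
        topk = scores.topk(k, dim=1).indices           # tokens routed into the layer
        out = x.clone()                                # skipped tokens pass through
        for b in range(x.size(0)):
            sel = topk[b].sort().values                # keep original token order
            out[b, sel] = self.layer(x[b : b + 1, sel])[0]
        return out

x = torch.randn(2, 16, 64)
print(MoDBlock(64)(x).shape)  # torch.Size([2, 16, 64])
```

With capacity 0.5, half the tokens bypass the block entirely, which is where the inference savings come from.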
MoD paper: https://arxiv.org/pdf/2404.02258.pdf
Layer pruning paper: https://arxiv.org/pdf/2403.17887v1.pdf
Instagram of the podcast: https://www.instagram.com/podcast.lifewithai
Linkedin of the podcast: https://www.linkedin.com/company/life-with-ai
#79- LoRA and QLoRA.
Hey guys, this is the first episode in a series about PEFT (Parameter-Efficient Fine-Tuning). In this episode I talk about LoRA and QLoRA, two widely used methods that allow us to fine-tune LLMs much faster and on a single GPU without losing performance.
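As a rough sketch of what LoRA does (an illustration, not the reference implementation): the pretrained weight matrix is frozen, and only a low-rank update B·A is trained. QLoRA keeps the same adapters but additionally stores the frozen base weights in 4-bit precision.

```python
# Minimal LoRA-style linear layer (illustrative sketch, not the official code).
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, d_in: int, d_out: int, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = nn.Linear(d_in, d_out)             # stands in for a pretrained layer
        for p in self.base.parameters():
            p.requires_grad_(False)                    # frozen: never updated
        self.A = nn.Parameter(torch.randn(r, d_in) * 0.01)  # low-rank factor
        self.B = nn.Parameter(torch.zeros(d_out, r))   # zero init: starts as a no-op
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(768, 768, r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 12288 trainable values vs ~590k in the full layer
```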
QLoRA video: https://www.youtube.com/watch?v=6l8GZDPbFn8
LoRA paper: https://arxiv.org/pdf/2106.09685.pdf
QLoRA paper: https://arxiv.org/pdf/2305.14314.pdf
Instagram of the podcast: https://www.instagram.com/podcast.lifewithai
Linkedin of the podcast: https://www.linkedin.com/company/life-with-ai
#78- RAFT: Why just use RAG if you can also fine-tune?
Hello, in this episode I talk about Retrieval Augmented Fine Tuning (RAFT), a paper that proposes a new technique combining domain-specific fine-tuning with RAG to improve the retrieval capabilities of LLMs.
In the episode I also talk about another paper that is also called RAFT, this time Reward rAnked FineTuning, which proposes a new technique to perform RLHF without the convergence problems of Reinforcement Learning.
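To make the first RAFT concrete, here is a hedged sketch (my own illustration, not the paper's released code) of how a training example could be assembled: each question is paired with distractor documents and, most of the time, the golden document, so the model learns to answer from relevant context and to ignore irrelevant retrievals.

```python
# Hypothetical RAFT-style data preparation (illustrative; names are my own).
import random

def make_raft_example(question, golden_doc, distractors, answer, p_golden=0.8):
    docs = random.sample(distractors, k=min(3, len(distractors)))
    if random.random() < p_golden:   # sometimes withhold the golden doc so the
        docs.append(golden_doc)      # model also learns to handle bad retrievals
    random.shuffle(docs)
    context = "\n\n".join(f"[doc {i}] {d}" for i, d in enumerate(docs))
    return {"prompt": f"{context}\n\nQuestion: {question}\nAnswer:",
            "completion": answer}

example = make_raft_example(
    "What does LoRA freeze during fine-tuning?",
    "LoRA freezes the pretrained weights and trains low-rank adapters.",
    ["QLoRA quantizes the base model to 4 bits.",
     "Ring Attention shards the sequence across devices."],
    "LoRA freezes the pretrained weights.")
print(example["prompt"])
```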
Retrieval Augmented Fine Tuning paper: https://arxiv.org/abs/2403.10131v1
Reward rAnked FineTuning paper: https://arxiv.org/pdf/2304.06767.pdf
Instagram of the podcast: https://www.instagram.com/podcast.lifewithai
Linkedin of the podcast: https://www.linkedin.com/company/life-with-ai
#77- Ring Attention and 1M context window, is RAG dead?
Hello guys, in this episode I explain how we can scale the context window of an LLM to more than 1M tokens using Ring Attention. I also discuss whether RAG is dead, given these advances in context window size.
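The core trick is blockwise attention with an online softmax: each device holds one block of queries while key/value blocks rotate around a ring of devices, so no device ever materializes the full attention matrix. Below is a toy single-process sketch of that accumulation (an illustration of the math, not the distributed implementation):

```python
# Toy blockwise attention with online softmax, the math behind Ring Attention
# (single-process illustration; the real method shards blocks across devices
# and overlaps the ring communication with compute).
import torch

def ring_attention(q, k, v, n_blocks=4):
    qs = q.chunk(n_blocks, dim=0)                  # one query block per "device"
    kvs = list(zip(k.chunk(n_blocks, dim=0), v.chunk(n_blocks, dim=0)))
    d = q.size(1)
    outs = []
    for qb in qs:
        m = torch.full((qb.size(0), 1), float("-inf"))  # running row max
        num = torch.zeros(qb.size(0), v.size(1))        # running weighted sum
        den = torch.zeros(qb.size(0), 1)                # running normalizer
        for kb, vb in kvs:                              # KV blocks circulate the ring
            s = qb @ kb.T / d ** 0.5
            m_new = torch.maximum(m, s.max(dim=1, keepdim=True).values)
            p = torch.exp(s - m_new)
            rescale = torch.exp(m - m_new)              # fold in the old running max
            num = num * rescale + p @ vb
            den = den * rescale + p.sum(dim=1, keepdim=True)
            m = m_new
        outs.append(num / den)
    return torch.cat(outs)

q = k = v = torch.randn(16, 8)
torch.testing.assert_close(ring_attention(q, k, v),
                           torch.softmax(q @ k.T / 8 ** 0.5, dim=-1) @ v)
print("matches full attention")
```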
Lost in the Middle paper: https://arxiv.org/pdf/2307.03172.pdf
Gemini technical report: https://storage.googleapis.com/deepmind-media/gemini/gemini_v1_5_report.pdf
Ring Attention paper: https://arxiv.org/pdf/2310.01889.pdf
Instagram of the podcast: https://www.instagram.com/podcast.lifewithai
Linkedin of the podcast: https://www.linkedin.com/company/life-with-ai
#76- Solving problems using AI.
Hey guys, in the Brazilian version of the podcast I interviewed Andre, an AI expert at IBM, and we talked a lot about how to solve problems using AI.
Brains website: https://brains.dev/
Andre's Linkedin: https://www.linkedin.com/in/andrefelipelopes/
Brains' Linkedin: https://www.linkedin.com/company/brains-brazilian-ai-networks/
Instagram of the podcast: https://www.instagram.com/podcast.lifewithai
Linkedin of the podcast: https://www.linkedin.com/company/life-with-ai