How to fine-tune LLMs with Tunix


Original URL: https://www.youtube.com/watch?v=8essLqkBsX8

Unlock the full potential of your large language models with Tunix, an innovative open-source JAX-based library for post-training. This video explains the two-stage LLM training process, focusing on how Tunix excels in the post-training phase to instill strong reasoning capabilities. See a practical example of using Tunix with reinforcement learning to improve math problem-solving, leveraging its efficiency on accelerators like Google TPUs. Improve your LLM performance with this powerful tool.
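The reward in this kind of RLVR (reinforcement learning with verifiable rewards) setup is typically a simple programmatic check rather than a learned reward model: GSM8K reference answers end with "#### <number>", so a completion can be scored by whether its final number matches. Below is a minimal sketch of such a reward under that assumption; the function name and parsing details are illustrative, not Tunix's actual API:

```python
import re

def gsm8k_reward(completion: str, reference: str) -> float:
    """Verifiable reward: 1.0 if the model's final number matches the
    GSM8K reference answer (the value after '####'), else 0.0.

    Illustrative only — the Tunix GRPO example linked below may parse
    answers differently.
    """
    def last_number(text: str):
        # Strip thousands separators, then grab the last int/decimal.
        nums = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
        return nums[-1] if nums else None

    ref = last_number(reference.split("####")[-1])
    pred = last_number(completion)
    return 1.0 if pred is not None and pred == ref else 0.0
```

Because the reward is exact-match and automatic, it scales to large sampled batches on TPUs without any human labeling in the loop.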

Resources:
GitHub for Tunix → https://goo.gle/4854A9X
Tunix GRPO example → https://goo.gle/46M9UwF
Additional examples → https://goo.gle/4nCfIjE
DeepSeekMath (GRPO) paper → https://goo.gle/3IA5ukt

Chapters:
0:00 - Introduction to Tunix
0:17 - Understanding LLM training stages
0:35 - Tunix: A JAX-based LLM post-training library
0:50 - Exploring Tunix's capabilities and supported models
1:05 - Reinforcement learning for LLMs overview
1:25 - RLVR for math reasoning demo (GSM8K dataset)
1:50 - Setting up and training with GRPO
2:05 - Tunix performance results and benefits
2:20 - Getting involved with Tunix
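For the GRPO training step in the chapter list, the core idea (per the DeepSeekMath paper linked under Resources) is to score each sampled completion relative to the other completions drawn for the same prompt, which removes the need for a separate learned value model. Here is a minimal JAX sketch of that group-relative advantage computation; names and shapes are illustrative assumptions, not Tunix's actual API:

```python
import jax.numpy as jnp

def grpo_advantages(rewards: jnp.ndarray, eps: float = 1e-6) -> jnp.ndarray:
    """Group-relative advantages as in DeepSeekMath's GRPO.

    rewards: [num_prompts, group_size] — one scalar reward per sampled
    completion, with group_size completions drawn for each prompt.
    Each reward is normalized by its own group's mean and std, so a
    completion competes against siblings from the same prompt.
    """
    mean = rewards.mean(axis=-1, keepdims=True)
    std = rewards.std(axis=-1, keepdims=True)
    return (rewards - mean) / (std + eps)

# Example: 2 prompts x 4 samples of the 0/1 verifiable reward above.
# Correct completions get positive advantages, incorrect ones negative:
# grpo_advantages(jnp.array([[1., 0., 0., 1.],
#                            [0., 0., 0., 1.]]))
```

In a full GRPO trainer these advantages then weight a clipped, PPO-style policy-gradient loss; see the Tunix GRPO example above for the library's end-to-end version.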

Subscribe to Google for Developers → https://goo.gle/developers

Speaker: Wei Wei
Products Mentioned: Google AI

