Chinchilla is a massive language model released by DeepMind as part of a recent paper on scaling large language models in a compute-optimal manner. With only 70 billion parameters, it outperforms recent models such as GPT-3, Gopher, and Megatron-Turing NLG, which use hundreds of billions of parameters. The authors achieve this by training over 400 language models of varying sizes to find the optimal balance between model size and amount of training data for a given compute budget.
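As a rough illustration (my own sketch, not code from the paper or the video): the paper approximates training compute as C ≈ 6·N·D FLOPs, and its compute-optimal fits come out to roughly 20 training tokens per parameter. Assuming that rule of thumb, a tiny Python helper can back out a compute-optimal model size and token count from a FLOP budget:

import math

TOKENS_PER_PARAM = 20  # assumed rule-of-thumb ratio from the paper's fits

def compute_optimal_config(flops_budget):
    # Solve C = 6 * N * (TOKENS_PER_PARAM * N) for N, then derive D.
    n_params = math.sqrt(flops_budget / (6 * TOKENS_PER_PARAM))
    n_tokens = TOKENS_PER_PARAM * n_params
    return n_params, n_tokens

# Plugging in Chinchilla's approximate budget (~5.8e23 FLOPs) returns
# roughly 70 billion parameters and 1.4 trillion tokens.
print(compute_optimal_config(5.8e23))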
Outline:
0:00 – Overview
1:51 – Paper Intro
6:15 – Methods
18:14 – Scaling Implications
23:43 – Chinchilla Overview
25:48 – Chinchilla Performance
29:49 – Summary
30:07 – Thoughts & Critiques
Paper (Training Compute-Optimal Large Language Models):