Episodes
Scaling Laws for Neural Language Models (https://arxiv.org/abs/2001.08361)
Summary: This research paper empirically investigates scaling laws for the performance of Transformer-based language models. The authors find that performance scales predictably as a power law with model size, dataset size, and the compute used for training, while showing weak dependence on other architectural details. They establish equations that predict overfitting and training speed, enabling optimal compute budget...
Published 12/01/24
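For listeners who want the form of the result, the power-law relations the summary alludes to look roughly like the following in the paper, where L is test loss, N is model size (non-embedding parameters), D is dataset size in tokens, and C_min is optimally allocated training compute; the exponents are the paper's approximate fitted values:

L(N) = (N_c / N)^{\alpha_N},                 \alpha_N \approx 0.076   % loss vs. model size
L(D) = (D_c / D)^{\alpha_D},                 \alpha_D \approx 0.095   % loss vs. dataset size
L(C_\mathrm{min}) = (C_c / C_\mathrm{min})^{\alpha_C^\mathrm{min}},   \alpha_C^\mathrm{min} \approx 0.050   % loss vs. compute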