Scaling Laws for Neural Language Models
https://arxiv.org/abs/2001.08361
Summary:
This research paper empirically investigates scaling laws for the performance of Transformer-based language models. The authors find that test loss scales predictably as a power law with model size, dataset size, and the compute used for training, while showing only weak dependence on other architectural details such as network width or depth. They establish simple equations that predict overfitting and training speed, enabling optimal allocation of a fixed compute budget: compute-efficient training favors very large models trained on a relatively modest amount of data and stopped well before convergence.
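As a rough sketch of the functional forms the paper reports (with L the test cross-entropy loss, N the number of non-embedding parameters, D the dataset size in tokens, and C the training compute; the scale constants N_c, D_c, C_c and the exponents are empirical fits, each law holding when the other two factors are not the bottleneck):

  L(N) \approx (N_c / N)^{\alpha_N}
  L(D) \approx (D_c / D)^{\alpha_D}
  L(C) \approx (C_c / C)^{\alpha_C}

The fitted exponents are small (on the order of 0.05-0.1), so each order-of-magnitude increase in N, D, or C yields a steady but modest reduction in loss.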
Published 12/01/24