Scaling Laws for Neural Language Models
https://arxiv.org/abs/2001.08361
Summary:
This research paper empirically investigates scaling laws for the performance of Transformer-based language models. The authors find that cross-entropy loss scales predictably as a power law with model size, dataset size, and the amount of compute used for training, with some trends spanning more than seven orders of magnitude.
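
As a rough illustration of the power-law form, the minimal Python sketch below evaluates the paper's reported fit of loss versus model size, L(N) = (N_c / N)^(alpha_N), using the constants the paper reports (alpha_N ≈ 0.076, N_c ≈ 8.8e13 non-embedding parameters); the function name and sample sizes are illustrative, not from the source.

```python
# Sketch of the paper's parameter-count scaling law:
#   L(N) = (N_c / N) ** alpha_N
# Constants are the fits reported in Kaplan et al. (2020); N counts
# non-embedding parameters, and data/compute are assumed non-bottlenecked.
ALPHA_N = 0.076
N_C = 8.8e13

def predicted_loss(n_params: float) -> float:
    """Predicted cross-entropy test loss (nats/token) for a model
    with n_params non-embedding parameters."""
    return (N_C / n_params) ** ALPHA_N

# Illustrative model sizes (hypothetical, chosen to show the trend).
for n in (1e6, 1e8, 1e10):
    print(f"N = {n:.0e} params -> predicted loss ~ {predicted_loss(n):.2f}")
```

Each 100x increase in parameter count multiplies the predicted loss by the same constant factor, which is what "power law" means in this context: loss is linear in log N on a log-log plot.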
Published 12/01/24