Event Information
CS Machine Learning Seminar: Faster Neural Network Training, Algorithmically

Speaker: Jonathan Frankle
Zoom link: https://umd.zoom.us/j/95197245230?pwd=cDRlVWRVeXBHcURGQkptSHpIS0VGdz09
Password: 828w

Abstract

I will argue that we should exploit the approximate nature of neural network training and change the math of training in order to improve efficiency. I will discuss how we have put this approach into practice at MosaicML, including the empirical approach we take to research, the dozens of algorithmic changes we have studied (which are freely available as open source), the science of how these changes interact with one another (the composition problem), and how we evaluate whether they have been effective. I will also detail several surprises we have encountered and lessons we have learned along the way. In the year since we began this work in earnest, we have reduced the training times of standard computer vision benchmarks (ResNet-50 on ImageNet, DeepLab-v3 on ADE20K) by 5-7x and of standard language models (BERT and GPT on C4) by 2-4x, and we believe we are just scratching the surface.

Biography

Relevant references:
* [2110.00476] ResNet strikes back: An improved training procedure in timm (arxiv.org)
* Mosaic ResNet Recipe Overview
* Mosaic ResNet Recipe Deep Dive
* Mosaic Research Methodology