Memory Efficient Adaptive Optimization

Rohan Anil, Vineet Gupta, Tomer Koren, Yoram Singer

Advances in Neural Information Processing Systems 32 (2019), pages 9749-9758. Poster.
Editors: H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett
Publisher: Curran Associates, Inc.
Paper accepted and presented at the Neural Information Processing Systems Conference (http://nips.cc/).

Abstract

Adaptive gradient-based optimizers such as Adagrad and Adam are crucial for achieving state-of-the-art performance in machine translation and language modeling. However, these methods maintain second-order statistics for each parameter, thus introducing significant memory overheads that restrict the size of the model being used as well as the number of examples in a mini-batch. We describe an effective and flexible adaptive optimization method with greatly reduced memory overhead. Our method retains the benefits of per-parameter adaptivity while allowing significantly larger models and batch sizes. We give convergence guarantees for our method, and demonstrate its effectiveness in training very large translation and language models with up to 2-fold speedups compared to the state-of-the-art.