Compressing Gradient Optimizers via Count-Sketches
Ryan Spring 1, Anastasios Kyrillidis 1, Vijai Mohan 2, Anshumali Shrivastava 1 2
Abstract
Training large-scale models efficiently is a challenging task. There are numerous publica...
Learned Optimizers that Scale and Generalize
Olga Wichrowska 1, Niru Maheswaranathan 2 3, Matthew W. Hoffman 4, Sergio Gómez Colmenarejo 4, Misha Denil 4, Nando de Freitas 4, Jascha Sohl-Dickstein 1
Abstract
learning show that, given suf...