The Variational Predictive Natural Gradient. Da Tang (1), Rajesh Ranganath (2). Abstract: family. The variational family plus the model together define the variational objective. The variational objective can Variational inference tra...
The Anisotropic Noise in Stochastic Gradient Descent: Its Behavior of Escaping from Sharp Minima and Regularization Effects. Zhanxing Zhu (1, 2, 3), Jingfeng Wu (1), Bing Yu (1), Lei Wu (1), Jinwen Ma (1). Abstract: Understanding the behavior of stochas...
Stochastic Gradient Push for Distributed Deep Learning. Mahmoud Assran (1, 2), Nicolas Loizou (1, 3), Nicolas Ballas (1), Mike Rabbat (1). Abstract: tributed training of deep networks (Goyal et al., 2017; Li et al., 2014). Worker nodes compute local mi...
Simple Stochastic Gradient Methods for Non-Smooth Non-Convex Regularized Optimization. Michael R. Metel (1), Akiko Takeda (1, 2). Abstract: where $f_j(w) = F(w, \xi_j)$ and has a Lipschitz continuous gradient. Our work focuses on stochastic grad...
Semi-Cyclic Stochastic Gradient Descent. Hubert Eichner (1), Tomer Koren (1), H. Brendan McMahan (1), Nathan Srebro (2), Kunal Talwar (1). Abstract: SGD, typically a few hundred devices are chosen randomly by the server to participate; critically, ho...
Riemannian adaptive stochastic gradient algorithms on matrix manifolds. Hiroyuki Kasai (1), Pratik Jawanpuria (2), Bamdev Mishra (2). Abstract: ADAM (Kingma & Ba, 2015), arguably the most popular adaptive gradient method, additionally empl...
Quantile Stein Variational Gradient Descent for Batch Bayesian Optimization. Chengyue Gong (1), Jian Peng (2), Qiang Liu (1). Abstract: or expensive objective function as a random variable and leverage Bayesian inference, typically with a Gau...
Overparameterized Nonlinear Learning: Gradient Descent Takes the Shortest Path? Samet Oymak (1), Mahdi Soltanolkotabi (2). Abstract: parameterized by $\theta \in \mathbb{R}^p$ to a training dataset of $n$ input- Many modern learning tasks involve fitting no...
Nonlinear Stein Variational Gradient Descent for Learning Diversified Mixture Models. Dilin Wang (1), Qiang Liu (1). Abstract: where $\theta_i$ can be the component parameters in mixture models or additive models, or the posterior samples in Baye...
Nonlinear Distributional Gradient Temporal-Difference Learning. Chao Qu (1), Shie Mannor (2), Huan Xu (3, 4). Abstract: intermediate step to generate good control policy (Gelly & Silver, 2008; Tesauro, 1992). The value function is known We devi...
Global convergence of neuron birth-death dynamics. Grant M. Rotskoff (1), Samy Jelassi (2, 3), Joan Bruna (1, 2), Eric Vanden-Eijnden (1). Abstract: tic gradient descent converge asymptotically to the target function in the large data limit. Neural n...
Model Based Conditional Gradient Method with Armijo-like Line Search. Yura Malitsky (1), Peter Ochs (2). Abstract: largest singular value of the gradient that defines the linear function. In contrast, related proximal minimization algo- T...
Learning-to-Learn Stochastic Gradient Descent with Biased Regularization. Giulia Denevi (1, 2), Carlo Ciliberto (3, 4), Riccardo Grazzi (1, 4), Massimiliano Pontil (1, 4). Abstract: tasks from a prescribed family. To highlight the difference betwee...
Learning a Compressed Sensing Measurement Matrix via Gradient Unrolling. Shanshan Wu (1), Alexandros G. Dimakis (1), Sujay Sanghavi (1), Felix X. Yu (2), Daniel Holtmann-Rice (2), Dmitry Storcheus (2), Afshin Rostamizadeh (2), Sanjiv Kumar (2). Abstract: If d >...
Gradient Descent Finds Global Minima of Deep Neural Networks. Simon S. Du (1), Jason D. Lee (2), Haochuan Li (3, 4), Liwei Wang (5, 4), Xiyu Zhai (6). Abstract: The second mysterious phenomenon in training deep neural networks is “deeper networks are harder...
Error Feedback Fixes SignSGD and other Gradient Compression Schemes. Sai Praneeth Karimireddy (1), Quentin Rebjock (1), Sebastian U. Stich (1), Martin Jaggi (1). Abstract: Algorithm 1 EF-SIGNSGD (SIGNSGD with Error-Feedb.) Sign-based algorith...
Escaping Saddle Points with Adaptive Gradient Methods. Matthew Staib (1, 2), Sashank Reddi (3), Satyen Kale (3), Sanjiv Kumar (3), Suvrit Sra (1). Abstract: Adagrad updates the parameters in the following manner: Adaptive methods such as Adam and RMSPro...
Efficient Dictionary Learning with Gradient Descent. Dar Gilboa (1, 2), Sam Buchanan (3, 2), John Wright (3, 2). Abstract: these may be highly suboptimal, since nonconvex optimization is in general an NP-hard problem (Bertsekas, 1999). Randomly in...
DOUBLESQUEEZE: Parallel Stochastic Gradient Descent with Double-pass Error-Compensated Compression. Hanlin Tang (1), Xiangru Lian (1), Chen Yu (1), Tong Zhang (2), Ji Liu (3, 1). Abstract: 1. Introduction A standard approach in large scale machine le...