The Heavy-Tail Phenomenon in SGD. Mert Gürbüzbalaban, Umut Şimşekli, Lingjiong Zhu. Abstract. 1. Introduction. In recent years, various notions of capacity and The learning problem in neural networks can be expressed as comple...
Byzantine-Resilient High-Dimensional SGD with Local Iterations on Heterogeneous Data. Deepesh Data, Suhas Diggavi. Abstract. (Dean et al., 2012) (e.g., training a machine learning model without collecting the clients’ data, whi...
BASGD: Buffered Asynchronous SGD for Byzantine Learning. Yi-Rui Yang, Wu-Jun Li. Abstract. Lee et al., 2017; Lian et al., 2017; Zhao et al., 2017; Sun et al., 2018; Wangni et al., 2018; Zhao et al., 2018; Zhou Distributed learning has beco...
Bias-Variance Reduced Local SGD for Less Heterogeneous Federated Learning. Tomoya Murata, Taiji Suzuki. Abstract. set of the whole dataset which is not explicitly exchanged. In recent federated learning, introduced by Konečny...
Accelerating Gossip SGD with Periodic Global Averaging. Yiming Chen, Kun Yuan, Yingya Zhang, Pan Pan, Yinghui Xu, Wotao Yin. Abstract. Communication overhead hinders the scalability [table residue: METHOD, EPOCH, ACC.%, TIME (HRS.); PARALLEL SGD 12076...]
SGD Learns One-Layer Networks in WGANs. Qi Lei, Jason D. Lee, Alexandros G. Dimakis, Constantinos Daskalakis. Abstract. bution with low sample complexity. However, these results cannot be algorithmically attained via practical GAN...
Orthogonalized SGD and Nested Architectures for Anytime Neural Networks. Chengcheng Wan, Henry Hoffmann, Shan Lu, Michael Maire. Abstract. We propose a novel variant of SGD customized for training network architec... [figure labels: Input, O1, O2]
Moniqua: Modulo Quantized Communication in Decentralized SGD. Yucheng Lu, Christopher De Sa. Abstract. (Li et al., 2014a;b) or the MPI AllReduce operation (Gropp et al., 1999). Such a design, however, puts heavy pressure Running Stoc...
Momentum Improves Normalized SGD. Ashok Cutkosky, Harsh Mehta. Abstract. that SGD’s convergence rate is independent of the dimension of the problem, allowing it to scale more easily to We provide an improved analysis of normalized...
Landscape Connectivity and Dropout Stability of SGD Solutions for Over-parameterized Neural Networks. Alexander Shevchenko, Marco Mondelli. Abstract. an over-parameterized network via SGD typically leads to a solution that has s...
From Local SGD to Local Fixed-Point Methods for Federated Learning. Grigory Malinovsky, Dmitry Kovalev, Elnur Gasanov, Laurent Condat, Peter Richtárik. Abstract. one place is to communicate, to keep moving towards the solution of t...
Fractional Underdamped Langevin Dynamics: Retargeting SGD with Momentum under Heavy-Tailed Gradient Noise. Umut Şimşekli, Lingjiong Zhu, Yee Whye Teh, Mert Gürbüzbalaban. Abstract. 1. Introduction. Stochastic gradien...
Closing the convergence gap of SGD without replacement. Shashank Rajput, Anant Gupta, Dimitris Papailiopoulos. Abstract. the choice of a single or a subset of sampled functions f_i, and α represents the step size. With- and without re-...
AdaScale SGD: A User-Friendly Algorithm for Distributed Training. Tyler B. Johnson, Pulkit Agrawal, Haijie Gu, Carlos Guestrin. Abstract. training algorithms. During each iteration, SGD applies a small and noisy update to the m...
A Unified Theory of Decentralized SGD with Changing Topology and Local Updates. Anastasia Koloskova, Nicolas Loizou, Sadra Boreiri, Martin Jaggi, Sebastian U. Stich. Abstract. et al., 2016; 2017; Kairouz et al., 2019) has emerged, bu...
SGD without Replacement: Sharper Rates for General Smooth Convex Functions. Dheeraj Nagaraj, Praneeth Netrapalli, Prateek Jain. Abstract. f(x; i) : R^d → R is the i-th component function. For example, in standard ERM and deep learnin...
SGD: General Analysis and Improved Rates. Robert M. Gower, Nicolas Loizou, Xun Qian, Alibek Sailanbayev, Egor Shulgin, Peter Richtárik. Abstract. where each f_i : R^d → R is smooth (but not necessarily convex). Further, we assume th...
Random Shuffling Beats SGD after Finite Epochs. Jeff HaoChen, Suvrit Sra. Abstract. 1. Introduction. A long-standing problem in optimization is We focus on minimization of the finite-sum proving that RANDOMSHUFFLE, the without-repl...
SGD and Hogwild! Convergence Without the Bounded Gradients Assumption. Lam M. Nguyen, Phuong Ha Nguyen, Marten van Dijk, Peter Richtárik, Katya Scheinberg, Martin Takáč. Abstract. 1. Introduction. Stochastic gradient desce...