Stochastic Gradient MCMC Methods for Hidden Markov Models. Yi-An Ma, Nicholas J. Foti, Emily B. Fox. Abstract. distribution in the presence of such noise. Significant headway has been made in developing such correct SG-MCMC Stochast...
Relative Fisher Information and Natural Gradient for Learning Large Modular Models. Ke Sun, Frank Nielsen. Abstract. The FIM is not invariant and depends on the parameterization. We can optionally write I(Θ) as IΘ(Θ) to em- Fishe...
Provable Alternating Gradient Descent for Non-negative Matrix Factorization with Strong Correlations. Yuanzhi Li, Yingyu Liang. Abstract. By doing so, one can avoid cancellation of different features and improve interpretabili...
Learning to Learn without Gradient Descent by Gradient Descent. Yutian Chen, Matthew W. Hoffman, Sergio Gómez Colmenarejo, Misha Denil, Timothy P. Lillicrap, Matt Botvinick, Nando de Freitas. Abstract. dient descent, evolutiona...
Learning Gradient Descent: Better Generalization and Longer Horizons. Kaifeng Lv, Shunhua Jiang, Jian Li. Abstract. Training deep neural networks is a highly non- 1.1. Existing Work. To address the above issue, a promising approach is...
Lazifying Conditional Gradient Algorithms. Gábor Braun, Sebastian Pokutta, Daniel Zink. Algorithm 1 Frank-Wolfe Algorithm (Frank & Wolfe, 1956). Abstract. Conditional Gradient algorithms (also often called Frank-Wolfe algori...
High-Dimensional Variance-Reduced Stochastic Gradient Expectation-Maximization Algorithm. Rongda Zhu, Lingxiao Wang, Chengxiang Zhai, Quanquan Gu. Abstract. studies (Balakrishnan et al., 2014; Wang et al., 2014; Yi & Caramani...
Gradient Coding: Avoiding Stragglers in Distributed Learning. Rashish Tandon, Qi Lei, Alexandros G. Dimakis, Nikos Karampatziakis. Abstract. We propose a novel coding theoretic framework for mitigating stragglers i...
Gradient Projection Iterative Sketch for Large-Scale Constrained Least-Squares. Junqi Tang, Mohammad Golbabaee, Mike E. Davies. Abstract. gorithms, the first stream is the stochastic gradient descent (SGD) and its variance-red...
Gradient Boosted Decision Trees for High Dimensional Sparse Output. Si Si, Huan Zhang, S. Sathiya Keerthi, Dhruv Mahajan, Inderjit S. Dhillon, Cho-Jui Hsieh. Abstract. multi-label learning and multi-class classification belong to...
Globally Optimal Gradient Descent for a ConvNet with Gaussian Inputs. Alon Brutzkus, Amir Globerson. Abstract. cent attempts to bridge this gap between theory and practice. Several works focus on the geometric properties of Deep lea...
Evaluating the Variance of Likelihood-Ratio Gradient Estimators. Seiya Tokui, Issei Sato. Abstract. fore variance reduction is crucial for practical learning. However, few things are known about its theoretical as- The likeliho...
Convergence Analysis of Proximal Gradient with Momentum for Nonconvex Optimization. Qunwei Li, Yi Zhou, Yingbin Liang, Pramod K. Varshney. Abstract. In this work, we investigate the accelerated prox- Algorithm 1 APG. Input: y_1 = x_1 = x_0...
Conditional Accelerated Lazy Stochastic Gradient Descent. Guanghui Lan, Sebastian Pokutta, Yi Zhou, Daniel Zink. Abstract. Compared to most other first-order methods, such as e.g., gradient descent algorithms and accelerated gra...
Asynchronous Stochastic Gradient Descent with Delay Compensation. Shuxin Zheng, Qi Meng, Taifeng Wang, Wei Chen, Nenghai Yu, Zhi-Ming Ma, Tie-Yan Liu. Abstract. (Zhang et al., 2015; Chen & Huo, 2016; Chen et al., 2016). With the fast de...
An Analytical Formula of Population Gradient for two-layered ReLU network and its Applications in Convergence and Critical Point Analysis. Yuandong Tian. Abstract. and why simple methods like gradient descent can solve the complica...