On the Implicit Bias of Initialization Shape: Beyond Infinitesimal Mirror Descent. Shahar Azulay 1, Edward Moroshko 2, Mor Shpigel Nacson 2, Blake Woodworth 3, Nathan Srebro 3, Amir Globerson 1, Daniel Soudry 2. Abstract: parameterized model...
On the Explicit Role of Initialization on the Convergence and Implicit Bias of Overparametrized Linear Networks. Hancheng Min 1,2, Salma Tarmoun 1,3, René Vidal 1,4, Enrique Mallada 1,2. Abstract: without explicit regularization, enjoys g...
Improving Transformer Optimization Through Better Initialization. Xiao Shi Huang 1,2, Felipe Pérez 1, Jimmy Ba 3,2, Maksims Volkovs 1. Abstract: et al., 2019; Sun et al., 2019). Despite the broad applications, optimization in the Transfo...
Gradient descent with identity initialization efficiently learns positive definite linear transformations by deep residual networks. Peter L. Bartlett 1, David P. Helmbold 2, Philip M. Long 3. Abstract. 1. Introduction. We analyze algo...