PowerNorm: Rethinking Batch Normalization in Transformers

Sheng Shen¹ Zhewei Yao¹ Amir Gholami¹ Michael W. Mahoney¹ Kurt Keutzer¹

Abstract

The standard normalization method for neural

1. Introduction

Normalization has become o...