Relative Positional Encoding for Transformers with Linear Complexity. Antoine Liutkus 1, Ondřej Cífka 2, Shih-Lun Wu 3 4 5, Umut Şimşekli 6, Yi-Hsuan Yang 3 5, Gaël Richard 2. Abstract. Figure 1. Examples of attention patterns obs...
Relative Fisher Information and Natural Gradient for Learning Large Modular Models. Ke Sun 1, Frank Nielsen 2 3. Abstract. The FIM is not invariant and depends on the parameterization. We can optionally write I(Θ) as IΘ(Θ) to em- Fishe...