Training data-efficient image Transformers & distillation through attention. Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou. Abstract. Recently, neural network...
Thinking Like Transformers. Gail Weiss, Yoav Goldberg, Eran Yahav. Abstract. a transformer operates at a higher level of abstraction, reasoning in terms of a composition of sequence operations [...] What is the computational model behi...
Relative Positional Encoding for Transformers with Linear Complexity. Antoine Liutkus, Ondřej Cífka, Shih-Lun Wu, Umut Şimşekli, Yi-Hsuan Yang, Gaël Richard. Abstract. Figure 1. Examples of attention patterns obs...
OmniNet: Omnidirectional Representations from Transformers. Yi Tay, Mostafa Dehghani, Vamsi Aribandi, Jai Gupta, Philip Pham, Zhen Qin, Dara Bahri, Da-Cheng Juan, Donald Metzler. Abstract. key defining characteristics in these...
Linear Transformers Are Secretly Fast Weight Programmers. Imanol Schlag, Kazuki Irie, Jürgen Schmidhuber. Abstract. field network (Ramsauer et al., 2021; Krotov & Hopfield, 2016; Demircigil et al., 2017). It extends a form of...
Generative Adversarial Transformers. Drew A. Hudson, C. Lawrence Zitnick. Abstract. Figure 1. Sample images generated by the GANsformer, along with a visualization of the model attention maps. We introduce the GANsformer, a novel...
Differentiable Spatial Planning using Transformers. Devendra Singh Chaplot, Deepak Pathak, Jitendra Malik. Project webpage: https://devendrachaplot.github.io/projects/spatial-planning-Transformers Abstract. Figure...
Catformer: Designing Stable Transformers via Sensitivity Analysis. Jared Quincy Davis, Albert Gu, Krzysztof Choromanski, Tri Dao, Christopher Re, Chelsea Finn, Percy Liang. Abstract. to ameliorate these challenges, they requ...
CATE: Computation-aware Neural Architecture Encoding with Transformers. Shen Yan, Kaiqiang Song, Fei Liu, Mi Zhang. Abstract. 2020) or designing efficient architecture search and evaluation methods (Luo et al., 2018b; Shi et al. ...
ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases. Stéphane d’Ascoli, Hugo Touvron, Matthew L. Leavitt, Ari S. Morcos, Giulio Biroli, Levent Sagun. Abstract. 1. Introduction. Convolutional arch...
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention. Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, François Fleuret. Abstract. by the global receptive field of self-attention, which proc...
Stabilizing Transformers for Reinforcement Learning. Emilio Parisotto, H. Francis Song, Jack W. Rae, Razvan Pascanu, Caglar Gulcehre, Siddhant M. Jayakumar, Max Jaderberg, Raphaël Lopez Kaufman, Aidan Clark, Seb Noury, Matthe...
PowerNorm: Rethinking Batch Normalization in Transformers. Sheng Shen, Zhewei Yao, Amir Gholami, Michael W. Mahoney, Kurt Keutzer. Abstract. 1. Introduction. The standard normalization method for neural [...] Normalization has become o...