Trees with Attention for Set Prediction Tasks
Roy Hirsch, Ran Gilad-Bachrach
Abstract: Tree-based models, such as Decision Tree (DT), Random Forest (RF) and Gradient Boosting Decision Tree (GBT)... In many machine learning applicati...

Training data-efficient image transformers & distillation through attention
Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou
Abstract: Recently, neural network...

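The abstract is truncated above. As a hedged sketch of the "distillation through attention" idea this paper is known for: a dedicated distillation token's output head is trained against the teacher's predicted class, alongside the usual class-token loss. The function names and the equal 0.5/0.5 weighting are illustrative assumptions, not the paper's exact recipe.

```python
import numpy as np

def cross_entropy(logits, target):
    """Cross-entropy of one example from raw logits."""
    z = logits - logits.max()
    log_probs = z - np.log(np.exp(z).sum())
    return -log_probs[target]

def hard_distillation_loss(cls_logits, dist_logits, label, teacher_logits):
    """Hard-label distillation (sketch): the class-token head is trained
    on the true label, the distillation-token head on the teacher's
    argmax prediction; both terms weighted equally here (assumption)."""
    teacher_label = int(np.argmax(teacher_logits))
    return 0.5 * cross_entropy(cls_logits, label) + \
           0.5 * cross_entropy(dist_logits, teacher_label)
```
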
SimAM: A Simple, Parameter-Free Attention Module for Convolutional Neural Networks
Lingxiao Yang, Ru-Yuan Zhang, Lida Li, Xiaohua Xie
Abstract: In this paper, we propose a conceptually simple but very effectiv...

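The mechanism itself is cut off above. As a minimal sketch of a parameter-free attention module in SimAM's spirit, the code below gates each neuron by a sigmoid of an inverse-energy term computed from per-channel statistics; the lambda default, array layout, and function name are assumptions.

```python
import numpy as np

def simam(x, lam=1e-4):
    """Parameter-free attention (sketch). x: feature map of shape
    (C, H, W); each neuron is weighted by a sigmoid of an inverse-energy
    term, with no learnable parameters."""
    n = x.shape[1] * x.shape[2] - 1
    mu = x.mean(axis=(1, 2), keepdims=True)
    d = (x - mu) ** 2                          # squared deviation per neuron
    v = d.sum(axis=(1, 2), keepdims=True) / n  # per-channel variance estimate
    e_inv = d / (4 * (v + lam)) + 0.5          # inverse energy per neuron
    return x / (1.0 + np.exp(-e_inv))          # sigmoid gating

features = np.random.default_rng(0).standard_normal((8, 16, 16))
print(simam(features).shape)  # (8, 16, 16)
```
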
Poolingformer: Long Document Modeling with Pooling Attention
Hang Zhang, Yeyun Gong, Yelong Shen, Weisheng Li, Jiancheng Lv, Nan Duan, Weizhu Chen
[Figure: (a) single-level local attention; (b) two-level pooling attention]
Abstract: In th...

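Only the figure's panel titles survive above, so the sketch below illustrates what a two-level design of that kind could look like: each query first attends within a local sliding window, then to average-pooled summaries of the whole sequence. The window and pooling sizes, and the summing of the two reads, are assumptions.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def two_level_pooling_attention(q, k, v, window=4, pool=4):
    """Two-level attention (sketch): a local sliding-window pass plus a
    global pass over average-pooled keys/values; the reads are summed."""
    n, d = q.shape
    out = np.zeros_like(v)
    for i in range(n):  # level 1: attend within a local window
        lo, hi = max(0, i - window), min(n, i + window + 1)
        w = softmax(q[i] @ k[lo:hi].T / np.sqrt(d))
        out[i] = w @ v[lo:hi]
    m = n - n % pool    # level 2: attend to pooled global summaries
    kp = k[:m].reshape(-1, pool, d).mean(axis=1)
    vp = v[:m].reshape(-1, pool, d).mean(axis=1)
    return out + softmax(q @ kp.T / np.sqrt(d)) @ vp

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((32, 16)) for _ in range(3))
print(two_level_pooling_attention(q, k, v).shape)  # (32, 16)
```
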
Perceiver: General Perception with Iterative Attention
Andrew Jaegle, Felix Gimeno, Andrew Brock, Andrew Zisserman, Oriol Vinyals, Joao Carreira
Abstract: One glaring issue with strong architectural priors is that they are ofte...

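As a hedged sketch of the iterative attention named in the title: a small latent array repeatedly cross-attends to a much larger input array, so per-pass cost scales with the product of the two lengths rather than the square of the input length. The sizes and the plain residual update are assumptions.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def perceiver_readout(inputs, n_latents=8, n_iters=4, seed=0):
    """Iterative cross-attention (sketch): latents attend to inputs,
    costing O(n_latents * n_inputs) per pass, not O(n_inputs ** 2)."""
    rng = np.random.default_rng(seed)
    n, d = inputs.shape
    latents = rng.standard_normal((n_latents, d))
    for _ in range(n_iters):
        attn = softmax(latents @ inputs.T / np.sqrt(d))
        latents = latents + attn @ inputs  # residual cross-attention read
    return latents

inputs = np.random.default_rng(1).standard_normal((1024, 32))
print(perceiver_readout(inputs).shape)  # (8, 32)
```
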
Learning Self-Modulating Attention in Continuous Time Space with Applications to Sequential Recommendation
Chao Chen, Haoyu Geng, Nianzu Yang, Junchi Yan, Daiyue Xue, Jianping Yu, Xiaokang Yang
Abstract: ...fade away due to mat...

Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius, Heng Wang, Lorenzo Torresani
Abstract: Video understanding shares several high-level similarities with NLP. First of all, videos and sentences are b...

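The abstract breaks off above. This paper is best known for comparing space-time attention factorizations; as a rough sketch under that reading, the "divided" variant below applies attention over time at each spatial location, then over space within each frame. The shapes and the absence of projections, heads, and residuals are simplifying assumptions.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def divided_space_time_attention(x):
    """Divided space-time attention (sketch) on tokens x of shape
    (T, S, d): temporal attention per spatial location, then spatial
    attention per frame."""
    x = x.copy()
    t, s, d = x.shape
    for j in range(s):  # temporal step: attend across the T frames
        a = softmax(x[:, j] @ x[:, j].T / np.sqrt(d))
        x[:, j] = a @ x[:, j]
    for i in range(t):  # spatial step: attend across the S locations
        a = softmax(x[i] @ x[i].T / np.sqrt(d))
        x[i] = a @ x[i]
    return x

video = np.random.default_rng(0).standard_normal((4, 9, 16))  # T=4, S=9, d=16
print(divided_space_time_attention(video).shape)  # (4, 9, 16)
```
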
Evolving Attention with Residual Convolutions
Yujing Wang, Yaming Yang, Jiangang Bai, Mingliang Zhang, Jing Bai, Jing Yu, Ce Zhang, Gao Huang, Yunhai Tong
Abstract: Transformer is a ubiquitous model for natural lan...

EL-Attention: Memory Efficient Lossless Attention for Generation
Yu Yan, Jiusheng Chen, Weizhen Qi, Nikhil Bhendawade, Yeyun Gong, Nan Duan, Ruofei Zhang
Abstract: ...pruning layer (Fan et al., 2019) or training a smaller student mo...

Bayesian Attention Belief Networks
Shujian Zhang, Xinjie Fan, Bo Chen, Mingyuan Zhou
Abstract: ...of the Transformer structure, it becomes possible to train unprecedentedly large models on big datasets (Devlin et al., ...). Attention-based...

AutoAttend: Automated Attention Representation Search
Chaoyu Guan, Xin Wang, Wenwu Zhu
Abstract: Self-attention mechanisms have been widely adopted in many machine le...

Attention is not all you need: pure attention loses rank doubly exponentially with depth
Yihe Dong, Jean-Baptiste Cordonnier, Andreas Loukas
Abstract: ...attention layers. Surprisingly, we find that pure self-attention networks (S...

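The rank-collapse claim in the title can be checked numerically. The toy sketch below iterates pure self-attention, with no skip connections, MLPs, or value projections (matching the "pure" setting), and tracks the distance of the token matrix from its best token-uniform rank-1 approximation; the sizes are arbitrary.

```python
import numpy as np

def softmax_rows(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
n, d = 16, 32
x = rng.standard_normal((n, d))
for depth in range(1, 11):
    a = softmax_rows(x @ x.T / np.sqrt(d))  # row-stochastic attention matrix
    x = a @ x                               # pure self-attention update
    residual = x - x.mean(axis=0)           # zero iff x is rank-1 (all rows equal)
    print(depth, np.linalg.norm(residual))  # shrinks rapidly with depth
```
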
Sparse Sinkhorn Attention
Yi Tay, Dara Bahri, Liu Yang, Donald Metzler, Da-Cheng Juan
Abstract: This paper proposes a new method for (1) reducing the memory complexity of the dot-product attention mechanism and... We propose Sparse Si...

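The method's name points at Sinkhorn normalization, which is standard and easy to sketch: alternately normalizing the rows and columns of a positive matrix drives it toward a doubly stochastic one, a differentiable relaxation of a permutation that, in this paper's setting, can re-sort sequence blocks before local attention. The iteration count is an assumption.

```python
import numpy as np

def sinkhorn(scores, n_iters=20):
    """Sinkhorn normalization (sketch): returns an approximately doubly
    stochastic matrix from raw block-to-block scores."""
    p = np.exp(scores - scores.max())         # positive matrix, stabilized
    for _ in range(n_iters):
        p = p / p.sum(axis=1, keepdims=True)  # normalize rows
        p = p / p.sum(axis=0, keepdims=True)  # normalize columns
    return p

block_scores = np.random.default_rng(0).standard_normal((4, 4))
p = sinkhorn(block_scores)
print(p.sum(axis=0), p.sum(axis=1))  # both close to all-ones
```
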
Low-Rank Bottleneck in Multi-head Attention Models
Srinadh Bhojanapalli, Chulhee Yun, Ankit Singh Rawat, Sashank Reddi, Sanjiv Kumar
Abstract: ...to the recurrent models. Self-attention models also have found applications in visio...

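The bottleneck in the title is easy to see numerically: with h heads and model width d, each head projects to dimension d/h, so a head's n-by-n score matrix has rank at most d/h and cannot realize arbitrary attention patterns once d/h < n. A minimal check, with arbitrary sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, heads = 64, 64, 8
dh = d // heads                        # per-head width: 8
x = rng.standard_normal((n, d))
wq = rng.standard_normal((d, dh))      # per-head query projection
wk = rng.standard_normal((d, dh))      # per-head key projection
scores = (x @ wq) @ (x @ wk).T         # one head's n x n score matrix
print(np.linalg.matrix_rank(scores))   # 8 = dh, far below n = 64
```
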
Infinite Attention: NNGP and NTK for Deep Attention Networks
Jiri Hron, Yasaman Bahri, Jascha Sohl-Dickstein, Roman Novak
Abstract: ...et al., 2019; Novak et al., 2019; Li & Liang, 2018; Allen-Zhu et al., 2019; Du et al., 2019; Arora et al...

Cost-Effective Interactive Attention Learning with Neural Attention Processes
Jay Heo, Junhyeon Park, Hyewon Jeong, Kwangjoon Kim, Juho Lee, Eunho Yang, Sung Ju Hwang
Abstract: ...of the model, at the same time, makes it difficult t...

BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning
Asa Cooper Stickland, Iain Murray
Abstract: However, fine-tuning separate models for each task often works better in practice. Although we...

Area Attention
Yang Li, Lukasz Kaiser, Samy Bengio, Si Si
Abstract: ...embeddings of an image (Xu et al., 2015) or the hidden states of encoding an input sentence (Bahdanau et al., 2014; Luo...). Existing attention mechanisms are trained to at-...

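As a hedged 1-D sketch of attending to "areas" rather than single items: every contiguous span up to a maximum size becomes a candidate memory entry, with its key the mean of the span's keys and its value the sum of the span's values (one common formulation; the exact aggregation here is an assumption).

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def area_attention(q, k, v, max_area=3):
    """1-D area attention (sketch): every contiguous span of up to
    max_area items is a candidate; a span's key is the mean of its item
    keys and its value the sum of its item values."""
    n, d = k.shape
    area_k, area_v = [], []
    for size in range(1, max_area + 1):
        for start in range(n - size + 1):
            area_k.append(k[start:start + size].mean(axis=0))
            area_v.append(v[start:start + size].sum(axis=0))
    area_k, area_v = np.stack(area_k), np.stack(area_v)
    w = softmax(q @ area_k.T / np.sqrt(d))  # attend over all areas
    return w @ area_v

rng = np.random.default_rng(0)
q = rng.standard_normal(16)
k, v = rng.standard_normal((10, 16)), rng.standard_normal((10, 16))
print(area_attention(q, k, v).shape)  # (16,)
```
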
Overcoming Catastrophic Forgetting with Hard Attention to the Task
Joan Serrà, Dídac Surís, Marius Miron, Alexandros Karatzoglou
Abstract: ...in the advancement towards more general artificial intelligence systems (Le...

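Only the motivation survives above; the mechanism in the title can be hinted at with a tiny sketch: a learned per-task, per-layer embedding is squashed by a sigmoid whose scale s is annealed upward during training, so the resulting unit mask becomes nearly binary ("hard"). The function and variable names are illustrative.

```python
import numpy as np

def task_attention_mask(task_embedding, s):
    """Per-task gating (sketch): sigmoid(s * e) with a large, annealed
    scale s approaches a binary mask over a layer's units."""
    return 1.0 / (1.0 + np.exp(-s * task_embedding))

e = np.random.default_rng(0).standard_normal(6)
print(task_attention_mask(e, s=1.0))   # soft mask, useful while training
print(task_attention_mask(e, s=50.0))  # nearly binary "hard" mask
```
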
Online and Linear-Time Attention by Enforcing Monotonic Alignments
Colin Raffel, Minh-Thang Luong, Peter J. Liu, Ron J. Weiss, Douglas Eck
Abstract: ...mechanisms (Bahdanau et al., 2015). In a sequence-to-sequence model with attent...

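The fragment ends just as the attention mechanism is introduced. As a hedged sketch of the online, linear-time behavior the title describes: at test time the decoder scans the memory left to right from its previously attended position and stops at the first entry whose selection probability passes a threshold, so attention never moves backward and each input is visited at most once overall. The threshold and the raw energy inputs are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hard_monotonic_attend(energies, prev_pos, threshold=0.5):
    """Test-time monotonic attention (sketch): scan forward from the
    last attended index and pick the first memory entry selected."""
    for j in range(prev_pos, len(energies)):
        if sigmoid(energies[j]) >= threshold:
            return j            # attend here; next step resumes at j
    return len(energies) - 1    # nothing selected: fall back to the end

mem_energies = np.array([-2.0, -1.0, 0.5, 2.0, -3.0])
print(hard_monotonic_attend(mem_energies, prev_pos=0))  # 2
```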