The Lipschitz Constant of Self-Attention. Hyunjik Kim, George Papamakarios, Andriy Mnih. Abstract: constraint for neural networks, to control how much a network's output can change relative to its input. Such Lips- Lipschitz cons...
Synthesizer: Rethinking Self-Attention for Transformer Models. Yi Tay, Dara Bahri, Donald Metzler, Da-Cheng Juan, Zhe Zhao, Che Zheng. Abstract: widely attributed to this Self-Attention mechanism since fully connected token grap...
SparseBERT: Rethinking the Importance Analysis in Self-Attention. Han Shi, Jiahui Gao, Xiaozhe Ren, Hang Xu, Xiaodan Liang, Zhenguo Li, James T. Kwok. Abstract: include the BERT (Devlin et al., 2019), which achieves state-of-the-ar...
Lipschitz Normalization for Self-Attention Layers with Application to Graph Neural Networks. George Dasoulas, Kevin Scaman, Aladin Virmaux. Abstract: classification (Velickovic et al., 2018; Li et al., 2016) and computer vision...
LieTransformer: Equivariant Self-Attention for Lie Groups. Michael Hutchinson, Charline Le Lan, Sheheryar Zaidi, Emilien Dupont, Yee Whye Teh, Hyunjik Kim. Abstract: duces the number of parameters and computational cost. This ha...
Self-Attention Generative Adversarial Networks. Han Zhang, Ian Goodfellow, Dimitris Metaxas, Augustus Odena. Abstract: lutional GANs (Odena et al., 2017; Miyato et al., 2018; Miyato & Koyama, 2018) have much more difficulty in Int...
Self-Attention Graph Pooling. Junhyun Lee, Inyeop Lee, Jaewoo Kang. Abstract: compositionality of grid-structured data (Simoncelli & Olshausen, 2001; Bronstein et al., 2017). As a result, CNNs Advanced methods of applying deep le...