ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision
Wonjae Kim, Bokyung Son, Ildoo Kim
Abstract (truncated): Vision-and-Language Pre-training (VLP) has im…
[Figure residue: visual embedding schema; region feature; image; CNN]
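The contrast drawn in that figure is between heavy visual embedders (region features from a detector, or grid features from a CNN) and ViLT's convolution-free patch projection. A minimal sketch of the patch-projection idea, with our own function and parameter names:

    import torch
    import torch.nn as nn

    def patch_embed(img: torch.Tensor, patch: int, d_model: int) -> torch.Tensor:
        # Slice the image into non-overlapping patches and linearly project
        # each one; no CNN backbone or region detector is involved.
        b, c, h, w = img.shape
        patches = img.unfold(2, patch, patch).unfold(3, patch, patch)
        patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(b, -1, c * patch * patch)
        proj = nn.Linear(c * patch * patch, d_model)  # created inline only for brevity
        return proj(patches)  # (batch, num_patches, d_model)

    print(patch_embed(torch.randn(2, 3, 32, 32), patch=16, d_model=64).shape)
    # torch.Size([2, 4, 64])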
Synthesizer: Rethinking Self-Attention for Transformer Models
Yi Tay, Dara Bahri, Donald Metzler, Da-Cheng Juan, Zhe Zhao, Che Zheng
Abstract (truncated): …widely attributed to this self-attention mechanism, since fully connected token grap…
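The paper's question is whether the dot-product interaction is necessary at all. A minimal sketch of its Dense Synthesizer variant under our own naming: attention weights are predicted from each token alone by a small MLP over a fixed maximum length, so no query-key products are ever formed.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DenseSynthesizerHead(nn.Module):
        # Single-head sketch: an MLP maps each token to one logit per
        # attended position, replacing softmax(Q K^T / sqrt(d)).
        def __init__(self, d_model: int, max_len: int):
            super().__init__()
            self.proj = nn.Sequential(
                nn.Linear(d_model, d_model),
                nn.ReLU(),
                nn.Linear(d_model, max_len),
            )
            self.value = nn.Linear(d_model, d_model)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            seq_len = x.size(1)                   # x: (batch, seq_len, d_model)
            logits = self.proj(x)[..., :seq_len]  # (batch, seq_len, seq_len)
            weights = F.softmax(logits, dim=-1)   # no token-token dot products
            return weights @ self.value(x)

    out = DenseSynthesizerHead(d_model=64, max_len=32)(torch.randn(2, 16, 64))
    print(out.shape)  # torch.Size([2, 16, 64])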
Which Transformer architecture fits my data? A vocabulary bottleneck in self-attention
Noam Wies, Yoav Levine, Daniel Jannai, Amnon Shashua
Abstract (truncated): …unchanged, the chosen ratio between the number of self-attention layers (dept…
MSA Transformer
Roshan Rao, Jason Liu, Robert Verkuil, Joshua Meier, John F. Canny, Pieter Abbeel, Tom Sercu, Alexander Rives
Abstract (truncated): Unsupervised protein language models tr…
[Figure residue: column attention; untied row attention; feed forward]
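The surviving figure labels name the key design: attention is factorized along the two axes of a multiple sequence alignment, rows (positions within one sequence) and columns (the same position across sequences), rather than over all residues jointly. A rough single-MSA sketch with our own naming, feed-forward sublayer and normalization omitted:

    import torch
    import torch.nn as nn

    class AxialMSABlock(nn.Module):
        def __init__(self, d_model: int, num_heads: int = 4):
            super().__init__()
            self.row = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
            self.col = nn.MultiheadAttention(d_model, num_heads, batch_first=True)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (num_seqs, seq_len, d_model) -- one alignment, batch dim omitted
            x = x + self.row(x, x, x, need_weights=False)[0]        # within each sequence
            xt = x.transpose(0, 1)                                  # (seq_len, num_seqs, d)
            xt = xt + self.col(xt, xt, xt, need_weights=False)[0]   # across sequences
            return xt.transpose(0, 1)

    out = AxialMSABlock(d_model=64)(torch.randn(8, 20, 64))
    print(out.shape)  # torch.Size([8, 20, 64])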
Generative Video Transformer: Can Objects be the Words?
Yi-Fu Wu, Jaesik Yoon, Sungjin Ahn
Abstract (truncated): …interest to develop an analogous generative pre-training procedure for videos, the computational overhead in dealing…
Transformer Hawkes Process
Simiao Zuo, Haoming Jiang, Zichong Li, Tuo Zhao, Hongyuan Zha
Abstract (truncated): …hundreds of millions of users generate large amounts of tweets, which are essentially sequences of events at different time… Modern d…
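In a Hawkes process the object being modeled is a conditional intensity over event times; the paper's contribution is to compute the history embedding with a Transformer. A hedged sketch of a continuous softplus intensity of the kind used there (the exact parameterization in the paper may differ, and all names below are ours):

    import torch
    import torch.nn.functional as F

    def intensity(h_j: torch.Tensor, w: torch.Tensor, alpha: float, b: float,
                  t: float, t_j: float) -> torch.Tensor:
        # lambda(t) for t in the interval after event j: softplus keeps the
        # intensity positive, and the (t - t_j) / t_j term lets it change
        # continuously between events rather than jumping at each one.
        return F.softplus(alpha * (t - t_j) / t_j + h_j @ w + b)

    h_j = torch.randn(32)  # history embedding produced by the Transformer
    w = torch.randn(32)
    print(intensity(h_j, w, alpha=0.1, b=0.0, t=2.5, t_j=2.0))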
On Layer Normalization in the Transformer Architecture
Ruibin Xiong, Yunchang Yang, Di He, Kai Zheng, Shuxin Zheng, Chen Xing, Huishuai Zhang, Yanyan Lan, Liwei Wang, Tie-Yan Liu
1. Introduction (truncated): The Transformer is…
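The architectural distinction the paper analyzes fits in a few lines. A toy block with our own naming, attention sublayer only: Post-LN (the original Transformer) normalizes after the residual addition, Pre-LN normalizes the sublayer input.

    import torch
    import torch.nn as nn

    class Block(nn.Module):
        def __init__(self, d_model: int, pre_ln: bool):
            super().__init__()
            self.pre_ln = pre_ln
            self.norm = nn.LayerNorm(d_model)
            self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            if self.pre_ln:  # Pre-LN: x + Sublayer(LN(x))
                h = self.norm(x)
                return x + self.attn(h, h, h, need_weights=False)[0]
            # Post-LN: LN(x + Sublayer(x))
            return self.norm(x + self.attn(x, x, x, need_weights=False)[0])

The paper's argument is that the Pre-LN placement gives well-behaved gradients at initialization, which is why it trains stably without the learning-rate warm-up stage Post-LN needs.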
Non-autoregressive Machine Translation with Disentangled Context Transformer
Jungo Kasai, James Cross, Marjan Ghazvininejad, Jiatao Gu
Abstract (truncated): …conditional independence and prevents the model from properly capturing the h…
Learning to Encode Position for Transformer with Continuous Dynamical Model
Xuanqing Liu, Hsiang-Fu Yu, Inderjit S. Dhillon, Cho-Jui Hsieh
Abstract (truncated): We introduce a new way of learning to encode…
1. Introduction (truncated): Transformer-based m…
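A minimal sketch of the idea the title names, under our own naming: position encodings are samples of a continuous trajectory p(t) driven by a learned vector field dp/dt = f(t, p), integrated here with a crude forward Euler loop (the paper uses a proper ODE solver; this is only illustrative).

    import torch
    import torch.nn as nn

    class ODEPositionEncoder(nn.Module):
        def __init__(self, d_model: int, hidden: int = 64):
            super().__init__()
            self.f = nn.Sequential(
                nn.Linear(d_model + 1, hidden), nn.Tanh(), nn.Linear(hidden, d_model)
            )
            self.p0 = nn.Parameter(torch.zeros(d_model))  # initial position state

        def forward(self, seq_len: int, dt: float = 0.1) -> torch.Tensor:
            p, out = self.p0, []
            for i in range(seq_len):
                out.append(p)
                t = torch.tensor([i * dt])
                p = p + dt * self.f(torch.cat([p, t]))  # Euler step of dp/dt = f(t, p)
            return torch.stack(out)  # (seq_len, d_model), added to token embeddings

    print(ODEPositionEncoder(d_model=16)(seq_len=5).shape)  # torch.Size([5, 16])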
Improving Transformer Optimization Through Better Initialization
Xiao Shi Huang, Felipe Pérez, Jimmy Ba, Maksims Volkovs
Abstract (truncated): …et al., 2019; Sun et al., 2019). Despite the broad applications, optimization in the Transfo…
Encoding Musical Style with Transformer Autoencoders
Kristy Choi, Curtis Hawthorne, Ian Simon, Monica Dinculescu, Jesse Engel
Abstract (truncated): …twofold. First, Transformers (Vaswani et al., 2017) and their variants excel as unconditio…
The Evolved Transformer
David R. So, Chen Liang, Quoc V. Le
Abstract (truncated): Recent works have highlighted the strength of…
1. Introduction (truncated): …models, although some effort has also been invested in searching for sequence models (Zoph & Le, 2017; Pham et al., 2018…
Set Transformer: A Framework for Attention-based Permutation-Invariant Neural Networks
Juho Lee, Yoonho Lee, Jungtaek Kim, Adam R. Kosiorek, Seungjin Choi, Yee Whye Teh
Abstract (truncated): …1997; Maron & Lozano-Pérez, 1998) is an examp…
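One concrete piece of the framework is easy to show: pooling by multihead attention, where a few learned seed vectors attend over the input set, yielding a summary that is invariant to the order of set elements. A minimal sketch with our own naming:

    import torch
    import torch.nn as nn

    class PMA(nn.Module):
        # k learned seeds query the set; output shape is (batch, k, d_model)
        # regardless of set size or ordering.
        def __init__(self, d_model: int, num_heads: int = 4, num_seeds: int = 1):
            super().__init__()
            self.seeds = nn.Parameter(torch.randn(1, num_seeds, d_model))
            self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            seeds = self.seeds.expand(x.size(0), -1, -1)
            return self.attn(seeds, x, x, need_weights=False)[0]

    pma = PMA(d_model=32)
    x = torch.randn(2, 10, 32)
    shuffled = x[:, torch.randperm(10)]  # reorder the set elements
    print(torch.allclose(pma(x), pma(shuffled), atol=1e-5))  # True: order-invariant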
Insertion Transformer: Flexible Sequence Generation via Insertion Operations
Mitchell Stern, William Chan, Jamie Kiros, Jakob Uszkoreit
Abstract (truncated): …Chan et al., 2016), speech synthesis (Oord et al., 2016a; Wang et al., 2017), ima…
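The decoding loop that distinguishes this model from left-to-right generation can be sketched independently of the network: start from an empty canvas and repeatedly insert a token at a chosen slot, stopping on an end marker. Everything below is our own toy naming; score stands in for the trained model.

    from typing import Callable, List, Tuple

    def insertion_decode(score: Callable[[List[str]], Tuple[int, str]],
                         max_steps: int = 10) -> List[str]:
        # Each step the model picks a slot in [0, len(canvas)] and a token
        # to insert there; '<eos>' terminates generation.
        canvas: List[str] = []
        for _ in range(max_steps):
            slot, token = score(canvas)
            if token == "<eos>":
                break
            canvas.insert(slot, token)
        return canvas

    # Toy stand-in that builds "a b c" center-out: b first, then a, then c,
    # an order a left-to-right decoder could never use.
    plan = [(0, "b"), (0, "a"), (2, "c"), (0, "<eos>")]
    print(insertion_decode(lambda canvas: plan[len(canvas)]))  # ['a', 'b', 'c']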
Equivariant Transformer Networks
Kai Sheng Tai, Peter Bailis, Gregory Valiant
Abstract (truncated): How can prior knowledge o…
1. Introduction (truncated): …scaling to each training image). While data augmentation typically helps reduce the test error of CNN-based models,…
Image Transformer
Niki Parmar, Ashish Vaswani, Jakob Uszkoreit, Łukasz Kaiser, Noam Shazeer, Alexander Ku, Dustin Tran
[Table 1 caption, truncated: Three outputs of a CelebA super-resolution model followed by three image completions by…]