Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention. Angelos Katharopoulos¹², Apoorv Vyas¹², Nikolaos Pappas³, François Fleuret¹². Abstract by the global receptive field of self-attention, which pro-c...
Learning Longer-term Dependencies in RNNs with Auxiliary Losses. Trieu H. Trinh¹, Andrew M. Dai, Minh-Thang Luong, Quoc V. Le. {thtrieu,adai,thangluong,qvl}@google.com. Abstract Figure 1. An overview of our method. The auxiliary loss...
Focused Hierarchical RNNs for Conditional Sequence Processing. Nan Rosemary Ke¹²³, Konrad Żołna⁴¹, Alessandro Sordoni³, Zhouhan Lin¹³⁵, Adam Trischler³, Yoshua Bengio¹⁶⁷, Joelle Pineau⁸⁹⁷, Laurent Charlin¹¹⁰, Chris Pal¹². Abstract...