REALM: Retrieval-Augmented Language Model Pre-training. Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, Ming-Wei Chang. Abstract: [Figure 1 caption: REALM augments language model pre-training with a neural knowledge retriever that r...]
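The surviving caption names REALM's core mechanism: marginalize the language-model prediction over retrieved documents, p(y | x) = sum_z p(y | x, z) p(z | x), with the retriever scoring each document z by an embedding inner product with the input x. A minimal Python sketch under those assumptions; embed() and the caller-supplied cond_prob() are hypothetical stand-ins for the paper's learned BERT-style encoders:

import numpy as np

def embed(text: str, dim: int = 8) -> np.ndarray:
    # Hypothetical deterministic "encoder" so the sketch runs end to end;
    # REALM uses trained Transformer encoders here.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(dim)
    return v / np.linalg.norm(v)

def retrieve_probs(x: str, corpus: list[str]) -> np.ndarray:
    # p(z | x): softmax over inner products of query and document embeddings.
    scores = np.array([embed(x) @ embed(z) for z in corpus])
    e = np.exp(scores - scores.max())
    return e / e.sum()

def marginal_prob(x: str, y: str, corpus: list[str], cond_prob) -> float:
    # p(y | x) = sum_z p(y | x, z) * p(z | x); cond_prob plays the role of
    # the knowledge-augmented encoder's p(y | x, z).
    return float(sum(p * cond_prob(y, x, z)
                     for p, z in zip(retrieve_probs(x, corpus), corpus)))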
UNILMv2: Pseudo-Masked Language Models for Unified Language Model Pre-training. Hangbo Bao, Li Dong, Furu Wei, Wenhui Wang, Nan Yang, Xiaodong Liu, Yu Wang, Songhao Piao, Jianfeng Gao, Ming Zhou, Hsiao-Wuen Hon. Abstract: ...
PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization. Jingqing Zhang, Yao Zhao, Mohammad Saleh, Peter J. Liu. Abstract: Recent work pre-training Transformers with [Figure 1 caption: The base architecture of PE...]
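PEGASUS's pre-training objective removes whole sentences from a document and trains the model to generate them from the remainder, as a self-supervised proxy for abstractive summarization. A simplified sketch of building one (source, target) pair, assuming a naive regex sentence splitter and a toy length-based importance score in place of the paper's principal-sentence selection (e.g., ROUGE against the rest of the document):

import re

MASK = "<mask_1>"

def gap_sentence_example(document: str, ratio: float = 0.3):
    # Naive sentence split; the paper uses a proper tokenizer.
    sents = [s.strip() for s in re.split(r"(?<=[.!?])\s+", document) if s.strip()]
    k = max(1, int(len(sents) * ratio))
    # Toy importance score: prefer longer sentences (assumption, not the
    # paper's method, which selects "principal" sentences by ROUGE).
    ranked = sorted(range(len(sents)), key=lambda i: len(sents[i]), reverse=True)
    chosen = set(ranked[:k])
    source = " ".join(MASK if i in chosen else s for i, s in enumerate(sents))
    target = " ".join(sents[i] for i in sorted(chosen))
    return source, target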
Using Pre-training Can Improve Model Robustness and Uncertainty. Dan Hendrycks, Kimin Lee, Mantas Mazeika. Abstract: Surprisingly, pre-training provides no performance benefit on various tasks and architectures over training f...
MASS: Masked Sequence to Sequence Pre-training for Language Generation. Kaitao Song, Xu Tan, Tao Qin, Jianfeng Lu, Tie-Yan Liu. Abstract: ...while pre-training has plenty of data (Girshick et al., 2014; Szegedy et al., 2015; Ouyang et al....
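MASS pre-trains an encoder-decoder by masking one contiguous span of the input sentence and training the decoder to reconstruct exactly that span. A simplified sketch of one training example under that scheme; the right-shifted, otherwise-masked decoder input is a simplifying assumption, not necessarily the paper's exact setup:

import random

MASK = "<mask>"

def mass_example(tokens: list, span_frac: float = 0.5, seed: int = 0):
    # Encoder sees the sentence with a contiguous span replaced by <mask>;
    # the decoder's target is exactly that span (the paper masks ~50%).
    rng = random.Random(seed)
    n = len(tokens)
    k = max(1, int(n * span_frac))
    start = rng.randrange(n - k + 1)
    enc_input = tokens[:start] + [MASK] * k + tokens[start + k:]
    dec_target = tokens[start:start + k]
    # Decoder input: the target shifted right, so each step conditions only
    # on previously generated span tokens.
    dec_input = [MASK] + dec_target[:-1]
    return enc_input, dec_input, dec_target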