I-BERT: Integer-only BERT Quantization. Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer. Abstract 2019), and the GPT family (Brown et al., 2020; Radford et al., 2018; 2019)), have achieved a significant accura...
PoWER-BERT: Accelerating BERT Inference via Progressive Word-vector Elimination. Saurabh Goyal, Anamitra Roy Choudhury, Saurabh M. Raje, Venkatesan T. Chakaravarthy, Yogish Sabharwal, Ashish Verma. Abstract applications ran...
Efficient Training of BERT by Progressively Stacking. Linyuan Gong, Di He, Zhuohan Li, Tao Qin, Liwei Wang, Tie-Yan Liu. Abstract especially in domains that require particular expertise. Unsupervised pre-training is commonly use...
BERT and PALs: Projected Attention Layers for Efficient Adaptation in Multi-Task Learning. Asa Cooper Stickland, Iain Murray. Abstract However, fine-tuning separate models for each task often works better in practice. Although we...