Synthesizer: Rethinking Self-Attention for Transformer Models. Yi Tay¹, Dara Bahri¹, Donald Metzler¹, Da-Cheng Juan¹, Zhe Zhao¹, Che Zheng¹. Abstract: widely attributed to this self-attention mechanism since fully connected token grap...
SparseBERT: Rethinking the Importance Analysis in Self-attention. Han Shi¹, Jiahui Gao², Xiaozhe Ren³, Hang Xu³, Xiaodan Liang⁴, Zhenguo Li³, James T. Kwok¹. Abstract: include the BERT (Devlin et al., 2019), which achieves state-of-the-ar...
Soft then Hard: Rethinking the Quantization in Neural Image Compression. Zongyu Guo¹, Zhizheng Zhang¹, Runsen Feng¹, Zhibo Chen¹. Abstract: Quantization is one of the key challenges for neural image compression. Since the gradient of qua...
Rethinking Rotated Object Detection with Gaussian Wasserstein Distance Loss. Xue Yang¹,²,³, Junchi Yan¹,², Qi Ming⁴, Wentao Wang¹, Xiaopeng Zhang³, Qi Tian³. Abstract: Figure 1. Comparison of the detection results between Smooth L1 loss-bas...
Rethinking Neural vs. Matrix-Factorization Collaborative Filtering: the Theoretical Perspectives. Da Xu¹, Chuanwei Ruan², Evren Korpeoglu¹, Sushant Kumar¹, Kannan Achan¹. Abstract: explored thanks to their interpretability and com...
FILTRA: Rethinking Steerable CNN by Filter Transform. Bo Li¹, Qili Wang¹, Gim Hee Lee². Abstract: put is hard-baked to transform accordingly when the input reflects or rotates. A plenty of works develop this idea Steerable CNN imposes the p...
Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers. Zhuohan Li¹, Eric Wallace¹, Sheng Shen¹, Kevin Lin¹, Kurt Keutzer¹, Dan Klein¹, Joseph E. Gonzalez¹. Abstract: Common Train Small Stop T...
TASKNORM: Rethinking Batch Normalization for Meta-Learning. John Bronskill¹, Jonathan Gordon¹, James Requeima¹,², Sebastian Nowozin³, Richard E. Turner¹,³. Abstract: the-art performance in a range of benchmark tasks (Finn et al., 2017;...
Rethinking Bias-Variance Trade-off for Generalization of Neural Networks. Zitong Yang¹, Yaodong Yu¹, Chong You¹, Jacob Steinhardt¹,², Yi Ma¹. Abstract: from a mismatch between the model class and the underlying data distribution, and is...
PowerNorm: Rethinking Batch Normalization in Transformers. Sheng Shen¹, Zhewei Yao¹, Amir Gholami¹, Michael W. Mahoney¹, Kurt Keutzer¹. Abstract: 1. Introduction The standard normalization method for neural Normalization has become o...
GradientDICE: Rethinking Generalized Offline Estimation of Stationary Values. Shangtong Zhang¹, Bo Liu², Shimon Whiteson¹. Abstract: evaluation is more flexible. We can evaluate a new policy with existing data in a replay buffer (Lin,...
Rethinking Lossy Compression: The Rate-Distortion-Perception Tradeoff. Yochai Blau¹, Tomer Michaeli¹. Abstract: are rooted in Shannon’s seminal work on rate-distortion theory (Shannon, 1959), which analyzes the fundamental Los...
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. Mingxing Tan¹, Quoc V. Le¹. Abstract: Convolutional Neural Networks (ConvNets) are commonly developed at a fixed resource...