Train simultaneously, generalize better: Stability of gradient-based minimax learners. Farzan Farnia, Asuman Ozdaglar. Abstract: ... 2014) and adversarial training (Madry et al., 2017) have achieved great success over a wide array of ...
Just Train Twice: Improving Group Robustness without Training Group Information. Evan Zheran Liu, Behzad Haghgoo, Annie S. Chen, Aditi Raghunathan, Pang Wei Koh, Shiori Sagawa, Percy Liang, Chelsea Finn. Abstract: ... can be especially ...
Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers. Zhuohan Li, Eric Wallace, Sheng Shen, Kevin Lin, Kurt Keutzer, Dan Klein, Joseph E. Gonzalez. Abstract: (figure labels: Common, Train Small, Stop T...) ...
One Size Fits All: Can We Train One Denoiser for All Noise Levels? Abhiram Gnanasambandam, Stanley H. Chan. Abstract: ... arguably universal for all learning-based estimators. When such a problem arises, the most straightforward solut...
How to Train Your Neural ODE: the World of Jacobian and Kinetic Regularization. Chris Finlay, Jörn-Henrik Jacobsen, Levon Nurbekyan, Adam M. Oberman. Abstract: (figure panels: (a) optimal transport map, (b) generic flow) Training neural ODEs on large ...