Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning. Yue Wu 1,2; Shuangfei Zhai 1; Nitish Srivastava 1; Joshua Susskind 1; Jian Zhang 1; Ruslan Salakhutdinov 2; Hanlin Goh 1. Abstract: leveraging prior experience (Lange et a...
Representation Matters: Offline Pretraining for Sequential Decision Making. Mengjiao Yang 1; Ofir Nachum 1. Abstract: Figure 1. A summary of the advantages of representation learning via contrastive self-prediction, across a variet...
OptiDICE: Offline Policy Optimization via Stationary Distribution Correction Estimation. Jongmin Lee 1; Wonseok Jeon 2,3; Byung-Jun Lee 4; Joelle Pineau 2,3,5; Kee-Eung Kim 1,6. Abstract: and then to deploy the model with its parameter fixed w...
Offline Meta-Reinforcement Learning with Advantage Weighting. Eric Mitchell 1; Rafael Rafailov 1; Xue Bin Peng 2; Sergey Levine 2; Chelsea Finn 1. Abstract: of reinforcement learning algorithms, when the goal is to ultimately learn many ta...
Offline Reinforcement Learning with Fisher Divergence Critic Regularization. Ilya Kostrikov 1,2; Jonathan Tompson 2; Rob Fergus 1,3; Ofir Nachum 2. Abstract: where deploying a new policy to interact with the live environment is expensive...
Offline Reinforcement Learning with Pseudometric Learning. Robert Dadashi 1; Shideh Rezaeifar 2; Nino Vieillard 1,3; Léonard Hussenot 1,4; Olivier Pietquin 1; Matthieu Geist 1. Abstract: that generated these experiences (Pomerleau, 199...
Offline Contextual Bandits with Overparameterized Models. David Brandfonbrener 1; William F. Whitney 1; Rajesh Ranganath 1; Joan Bruna 1. Abstract: In contrast, the best performance in modern supervised learning is often achieved by mas...
Is Pessimism Provably Efficient for Offline RL? Ying Jin 1; Zhuoran Yang 2; Zhaoran Wang 3. Abstract: Vinyals et al., 2017) relies on two ingredients: (i) expressive function approximators, e.g., deep neural networks (LeCun We study offl...
Instabilities of Offline RL with Pre-Trained Neural Representation. Ruosong Wang 1; Yifan Wu 1; Ruslan Salakhutdinov 1; Sham M. Kakade 2,3. Abstract: 2018; Wang et al., 2018; Yu et al., 2019); it is seeing much recent interest due to the large a...
Conservative Objective Models for Effective Offline Model-Based Optimization. Brandon Trabucco 1; Aviral Kumar 1; Xinyang Geng 1; Sergey Levine 1. Abstract: In this paper, we aim to solve data-driven model- Learned base...
Actionable Models: Unsupervised Offline Reinforcement Learning of Robotic Skills. Yevgen Chebotar 1; Karol Hausman 1; Yao Lu 1; Ted Xiao 1; Dmitry Kalashnikov 1; Jake Varley 1; Alex Irpan 1; Benjamin Eysenbach 1,2; Ryan Julian 1,3; Chelsea Finn 1,4...
Online Pricing with Offline Data: Phase Transition and Inverse Square Law. Jinzhi Bu 1; David Simchi-Levi 1; Yunzong Xu 1. Abstract: offline historical dataset (based on historical actions) at the time that the learner starts an online lea...
GradientDICE: Rethinking Generalized Offline Estimation of Stationary Values. Shangtong Zhang 1; Bo Liu 2; Shimon Whiteson 1. Abstract: evaluation is more flexible. We can evaluate a new policy with existing data in a replay buffer (Lin,...
FormulaZero: Distributionally Robust Online Adaptation via Offline Population Synthesis. Aman Sinha 1; Matthew O’Kelly 2; Hongrui Zheng 2; Rahul Mangharam 2; John Duchi 1; Russ Tedrake 3. Abstract: de l’Automobile, 2019). Empirically,...
An Optimistic Perspective on Offline Reinforcement Learning. Rishabh Agarwal 1; Dale Schuurmans 1,2; Mohammad Norouzi 1. Abstract: unsafe, or require a high-fidelity simulator that is often difficult to build (Dulac-Arnold et al., 201...