Muesli: Combining Improvements in Policy Optimization. Matteo Hessel, Ivo Danihelka, Fabio Viola, Arthur Guez, Simon Schmitt, Laurent Sifre, Theophane Weber, David Silver, Hado van Hasselt. Abstract: Median human-normalized s...
Combining Pessimism with Optimism for Robust and Efficient Model-Based Deep Reinforcement Learning. Sebastian Curi, Ilija Bogunovic, Andreas Krause. Abstract: ... unpredictable ways. The main goal is then to learn a policy that provab...
Combining Differentiable PDE Solvers and Graph Neural Networks for Fluid Flow Prediction. Filipe de Avila Belbute-Peres, Thomas D. Economon, J. Zico Kolter. Abstract: ... computational fluid dynamics (CFD) simulations, these equatio...
Combining Parametric and Nonparametric Models for Off-Policy Evaluation. Omer Gottesman, Yao Liu, Scott Sussex, Emma Brunskill, Finale Doshi-Velez. Abstract: ... trajectories under the evaluation policy via stitching together actual t...
A Contrastive Divergence for Combining Variational Inference and MCMC. Francisco J. R. Ruiz, Michalis K. Titsias. Abstract: We develop a method for combining VI and MCMC that improves an explicit variational distribution (i.e., wit...
Warm-starting Contextual Bandits: Robustly Combining Supervised and Bandit Feedback. Chicheng Zhang, Alekh Agarwal, Hal Daumé III, John Langford, Sahand N. Negahban. Abstract: ... ensuring that such a system does not need to suffer to...
Combining Model-Based and Model-Free Updates for Trajectory-Centric Reinforcement Learning. Yevgen Chebotar, Karol Hausman, Marvin Zhang, Gaurav Sukhatme, Stefan Schaal, Sergey Levine. Abstract: Figure 1. Real robot tasks us...