Near-Optimal Model-Free Reinforcement Learning in Non-stationary Episodic MDPs. Weichao Mao¹, Kaiqing Zhang¹, Ruihao Zhu², David Simchi-Levi², Tamer Başar¹. Abstract: through sequential interactions with an initially unknown but...
Reinforcement Learning for Non-stationary Markov Decision Processes: The Blessing of (More) Optimism. Wang Chi Cheung¹, David Simchi-Levi², Ruihao Zhu². Abstract: imizes its cumulative rewards, while facing the following challenge...
Optimizing for the Future in Non-stationary MDPs. Yash Chandak¹, Georgios Theocharous², Shiv Shankar¹, Martha White³, Sridhar Mahadevan¹,², Philip S. Thomas¹. Abstract: creased friction and thus, change in the system dynamics. Similarly...
Non-stationary Delayed Bandits with Intermediate Observations. Claire Vernade¹, András György¹, Timothy A. Mann¹. Abstract: Delayed feedback in online learning has been addressed both in the full information setting (see, e.g....
Nonstationary Nonseparable Random Fields. Kangrui Wang¹, Oliver Hamelijnck¹,², Theodoros Damoulas¹,², Mark Steel². Abstract: applicable to general R^D input spaces. Consider a spatio-temporal stochastic process Z(s,t) that has a stati...