Learning Human Objectives by Evaluating Hypothetical Behavior. Siddharth Reddy¹, Anca D. Dragan¹, Sergey Levine¹, Shane Legg², Jan Leike². Abstract. We seek to align agent behavior with a user’s o...
Data-Efficient Policy Evaluation Through Behavior Policy Search. Josiah P. Hanna¹, Philip S. Thomas²,³, Peter Stone¹, Scott Niekum¹. Abstract. Methods that evaluate π_e while selecting actions according to π_e are termed on-policy. Pre...