Importance Sampling Policy Evaluation with an Estimated Behavior Policy

Josiah P. Hanna¹, Scott Niekum¹, Peter Stone¹

Abstract

determine the expected return (sum of rewards) that an evaluation policy, π_e, will obtain when deploy...