Posterior Value Functions: Hindsight Baselines for Policy Gradient Methods. Chris Nota¹, Bruno Castro da Silva¹, Philip S. Thomas¹. Abstract: cases, such information can be useful for assessing which outcomes were likely to have occurr...
Low-Variance and Zero-Variance Baselines for Extensive-Form Games. Trevor Davis¹†, Martin Schmid², Michael Bowling²¹. Abstract: et al., 2015), and to beat human professionals in another (Moravčík et al., 2017; Brown & Sandholm, 2...
The Mirage of Action-Dependent Baselines in Reinforcement Learning. George Tucker¹, Surya Bhupatiraju¹², Shixiang Gu¹³⁴, Richard E. Turner³, Zoubin Ghahramani³⁵, Sergey Levine¹⁶. Abstract: et al., 2015a; 2017) are a class of model-free...