Fingerprint Policy Optimisation for Robust Reinforcement Learning

Supratik Paul¹, Michael A. Osborne², Shimon Whiteson¹

Abstract

across all possible settings. Fortunately, policies can often be trained and tested in a simulator t...