Efficient Bias-Span-Constrained Exploration-Exploitation in Reinforcement Learning

Ronan Fruit¹ Matteo Pirotta¹ Alessandro Lazaric² Ronald Ortner³

Abstract

and, at each step, it executes the policy with highest optimistic...