DouZero: Mastering DouDizhu with Self-Play Deep Reinforcement Learning. Daochen Zha¹, Jingru Xie², Wenye Ma², Sheng Zhang³, Xiangru Lian², Xia Hu¹, Ji Liu². Abstract: example, AlphaGo (Silver et al., 2016), AlphaZero (Silver et al., 2018)...
Provable Self-Play Algorithms for Competitive Reinforcement Learning. Yu Bai¹, Chi Jin². Abstract: conflicting rewards (so that they essentially compete with each other) yet can be trained in a centralized fashion (i.e. Self-Play, wh...