Value Iteration in Continuous Actions, States and Time. Michael Lutter, Shie Mannor, Jan Peters, Dieter Fox, Animesh Garg. Abstract: Value Iteration, Fitted Value Iteration, Continuous Fitted Value Iteration. Classical value ite...
Value Alignment Verification. Daniel S. Brown, Jordan Schneider, Anca Dragan, Scott Niekum. Abstract: provide a theoretical analysis of the problem of efficient value alignment verification: how to efficiently test whether a As humans...
UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning. Tarun Gupta, Anuj Mahajan, Bei Peng, Wendelin Böhmer, Shimon Whiteson. Abstract: factorization, the joint action value function can be decentrally m...
Posterior Value Functions: Hindsight Baselines for Policy Gradient Methods. Chris Nota, Bruno Castro da Silva, Philip S. Thomas. Abstract: cases, such information can be useful for assessing which outcomes were likely to have occurr...
PID Accelerated Value Iteration Algorithm. Amir-massoud Farahmand, Mohammad Ghavamzadeh. Abstract: approximation of the value or action-value functions, i.e., $V_{k+1} \leftarrow T^{\pi} V_k$ or $Q_{k+1} \leftarrow T^{*} Q_k$. For discounted MDPs, the convergence...
DFAC Framework: Factorizing the Value Function via Quantile Mixture for Multi-Agent Distributional Q-Learning. Wei-Fang Sun, Cheng-Kuang Lee, Chun-Yi Lee. Abstract: optimize the overall rewards in each episode. Nevertheless,...
Decoupling Value and Policy for Generalization in Reinforcement Learning. Roberta Raileanu, Rob Fergus. Abstract: ization (Farebrother et al., 2018; Zhang et al., 2018a; Cobbe et al., 2018; Igl et al., 2019), data augmentation (Cobb...
Multi-Agent Routing Value Iteration Network. Quinlan Sykora, Mengye Ren, Raquel Urtasun. Abstract: Figure 1. A visualization of the route produced by a fleet of twenty vehicles using our proposed algorithm. Colors denote different In th...
Constrained Markov Decision Processes via Backward Value Functions. Harsh Satija, Philip Amortila, Joelle Pineau. Abstract: algorithms has been limited to simulators, where the learning algorithm has the ability to reset th...
The Value Function Polytope in Reinforcement Learning. Robert Dadashi, Adrien Ali Taïga, Nicolas Le Roux, Dale Schuurmans, Marc G. Bellemare. Abstract: Line theorem. We show that policies that agree on all but one state generate...
The Information-Theoretic Value of Unlabeled Data in Semi-Supervised Learning. Alexander Golovnev, Dávid Pál, Balázs Szörényi. Abstract: of algorithms indexed by the (uncountably many) distributions over the domain...
Separating value functions across time-scales. Joshua Romoff, Peter Henderson, Ahmed Touati, Emma Brunskill, Joelle Pineau, Yann Ollivier. Abstract: convergence properties, making learning more efficient and stable (Bertsekas...
Self-Similar Epochs: Value in Arrangement. Eliav Buchnik, Edith Cohen, Avinatan Hassidim, Yossi Matias. Abstract: broad: entities can be of one or multiple types and example associations used for training can be raw or preprocesse...
Concentration Inequalities for Conditional Value at Risk. Philip S. Thomas, Erik Learned-Miller. Abstract: Prashanth & Ghavamzadeh, 2013; Chow & Ghavamzadeh, 2014; Tamar et al., 2015; Pinto et al., 2017; Morimura et al., In this pape...
Composing Value Functions in Reinforcement Learning. Benjamin van Niekerk, Steven James, Adam Earle, Benjamin Rosman. Abstract: previous abilities. An important property for lifelong-learning In reinforcement learning (RL), o...
Smoothed Action Value Functions for Learning Gaussian Policies. Ofir Nachum, Mohammad Norouzi, George Tucker, Dale Schuurmans. Abstract: hard-max notion of Q-value, defined as the expected return of following an optimal policy. S...
QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning. Tabish Rashid, Mikayel Samvelyan, Christian Schroeder de Witt, Gregory Farquhar, Jakob Foerster, Shimon Whiteson. Abstract: (a) 5 Marine...
Policy and Value Transfer in Lifelong Reinforcement Learning. David Abel, Yuu Jinnai, Yue Guo, George Konidaris, Michael L. Littman. Abstract: computed policies from related tasks (Fernández & Veloso, 2006; Taylor & Stone, 20...
Deep Value Networks Learn to Evaluate and Iteratively Refine Structured Outputs. Michael Gygli, Mohammad Norouzi, Anelia Angelova. Abstract: complicated high level reasoning to resolve ambiguity. We approach structured output pr...