<h1 id="awesome-deep-reinforcement-learning">Awesome Deep Reinforcement
Learning</h1>
<blockquote>
<p><strong>Mar 1 2024 update: HILP added</strong></p>
<p><strong>July 2022 update: EDDICT added</strong></p>
<p><strong>Mar 2022 update: a few papers released in early
2022</strong></p>
<p><strong>Dec 2021 update: Unsupervised RL</strong></p>
</blockquote>
<h2 id="introduction-to-awesome-drl">Introduction to awesome drl</h2>
<p>Reinforcement learning is the fundamental framework for building AGI,
so this project collects important contributions to deep RL.</p>
<h2 id="landscape-of-deep-rl">Landscape of Deep RL</h2>
<figure>
<img src="images/awesome-drl.png" alt="updated Landscape of DRL" />
<figcaption aria-hidden="true">updated Landscape of
<strong>DRL</strong></figcaption>
</figure>
<h2 id="content">Content</h2>
<ul>
<li><a href="#awesome-deep-reinforcement-learning">Awesome Deep
Reinforcement Learning</a>
<ul>
<li><a href="#introduction-to-awesome-drl">Introduction to awesome
drl</a></li>
<li><a href="#landscape-of-deep-rl">Landscape of Deep RL</a></li>
<li><a href="#content">Content</a></li>
<li><a href="#general-guidances">General guidances</a></li>
<li><a href="#2022">2022</a></li>
<li><a href="#foundations-and-theory">Foundations and theory</a></li>
<li><a href="#general-benchmark-frameworks">General benchmark
frameworks</a></li>
<li><a href="#unsupervised">Unsupervised</a></li>
<li><a href="#offline">Offline</a></li>
<li><a href="#value-based">Value based</a></li>
<li><a href="#policy-gradient">Policy gradient</a></li>
<li><a href="#explorations">Explorations</a></li>
<li><a href="#actor-critic">Actor-Critic</a></li>
<li><a href="#model-based">Model-based</a></li>
<li><a href="#model-free--model-based">Model-free + Model-based</a></li>
<li><a href="#hierarchical">Hierarchical</a></li>
<li><a href="#option">Option</a></li>
<li><a href="#connection-with-other-methods">Connection with other
methods</a></li>
<li><a href="#connecting-value-and-policy-methods">Connecting value and
policy methods</a></li>
<li><a href="#reward-design">Reward design</a></li>
<li><a href="#unifying">Unifying</a></li>
<li><a href="#faster-drl">Faster DRL</a></li>
<li><a href="#multi-agent">Multi-agent</a></li>
<li><a href="#new-design">New design</a></li>
<li><a href="#multitask">Multitask</a></li>
<li><a href="#observational-learning">Observational Learning</a></li>
<li><a href="#meta-learning">Meta Learning</a></li>
<li><a href="#distributional">Distributional</a></li>
<li><a href="#planning">Planning</a></li>
<li><a href="#safety">Safety</a></li>
<li><a href="#inverse-rl">Inverse RL</a></li>
<li><a href="#no-reward-rl">No reward RL</a></li>
<li><a href="#time">Time</a></li>
<li><a href="#adversarial-learning">Adversarial learning</a></li>
<li><a href="#use-natural-language">Use Natural Language</a></li>
<li><a
href="#generative-and-contrastive-representation-learning">Generative
and contrastive representation learning</a></li>
<li><a href="#belief">Belief</a></li>
<li><a href="#pac">PAC</a></li>
<li><a href="#applications">Applications</a></li>
</ul></li>
</ul>
<p>Illustrations:</p>
<p><img src="images/ACER.png" /></p>
<p><strong>Recommendations and suggestions are welcome</strong>.</p>
<h2 id="general-guidances">General guidance</h2>
<ul>
<li><a href="https://github.com/hanjuku-kaso/awesome-offline-rl">Awesome
Offline RL</a></li>
<li><a href="http://reinforcementlearning.today/">Reinforcement Learning
Today</a></li>
<li><a
href="http://mlanctot.info/files/papers/Lanctot_MARL_RLSS2019_Lille.pdf">Multiagent
Reinforcement Learning by Marc Lanctot, RLSS @ Lille</a> 11 July
2019</li>
<li><a href="https://david-abel.github.io/notes/rldm_2019.pdf">RLDM 2019
Notes by David Abel</a> 11 July 2019</li>
<li><a href="RLNL.md">A Survey of Reinforcement Learning Informed by
Natural Language</a> 10 Jun 2019 <a
href="https://arxiv.org/pdf/1906.03926.pdf">arxiv</a></li>
<li><a href="ChallengesRealWorldRL.md">Challenges of Real-World
Reinforcement Learning</a> 29 Apr 2019 <a
href="https://arxiv.org/pdf/1904.12901.pdf">arxiv</a></li>
<li><a href="RayInterference.md">Ray Interference: a Source of Plateaus
in Deep Reinforcement Learning</a> 25 Apr 2019 <a
href="https://arxiv.org/pdf/1904.11455.pdf">arxiv</a></li>
<li><a href="p10.md">Principles of Deep RL by David Silver</a></li>
<li><a href="https://www.jianshu.com/p/dfd987aa765a">University AI's
general introduction to deep RL (in Chinese)</a></li>
<li><a href="https://spinningup.openai.com/en/latest/">OpenAI's Spinning
Up</a></li>
<li><a
href="https://thegradient.pub/the-promise-of-hierarchical-reinforcement-learning/">The
Promise of Hierarchical Reinforcement Learning</a> 9 Mar 2019</li>
<li><a href="reproducing.md">Deep Reinforcement Learning that
Matters</a> 30 Jan 2019 <a
href="https://arxiv.org/pdf/1709.06560.pdf">arxiv</a></li>
</ul>
<h2 id="section">2024</h2>
<ul>
<li><a href="HILP.md">Foundation Policies with Hilbert
Representations</a> <a href="https://arxiv.org/abs/2402.15567">arxiv</a>
<a href="https://github.com/seohongpark/HILP">repo</a> 23 Feb 2024</li>
</ul>
<h2 id="section-1">2022</h2>
<ul>
<li>Reinforcement Learning with Action-Free Pre-Training from Videos <a
href="https://arxiv.org/abs/2203.13880">arxiv</a> <a
href="https://github.com/younggyoseo/apv">repo</a></li>
</ul>
<h2 id="generalist-policies">Generalist policies</h2>
<ul>
<li><a href="HILP.md">Foundation Policies with Hilbert
Representations</a> <a href="https://arxiv.org/abs/2402.15567">arxiv</a>
<a href="https://github.com/seohongpark/HILP">repo</a> 23 Feb 2024</li>
</ul>
<h2 id="foundations-and-theory">Foundations and theory</h2>
<ul>
<li><a href="GNLBE.md">General non-linear Bellman equations</a> 9 July
2019 <a href="https://arxiv.org/pdf/1907.07331.pdf">arxiv</a></li>
<li><a href="MCGE.md">Monte Carlo Gradient Estimation in Machine
Learning</a> 25 Jun 2019 <a
href="https://arxiv.org/pdf/1906.10652.pdf">arxiv</a></li>
</ul>
<h2 id="general-benchmark-frameworks">General benchmark frameworks</h2>
<ul>
<li><a href="https://github.com/google/brax/">Brax</a>
<img src="https://github.com/google/brax/raw/main/docs/img/brax_logo.gif" width="336" height="80" alt="BRAX"/></li>
</ul>
<p><img
src="https://github.com/google/brax/raw/main/docs/img/fetch.gif" /> * <a
href="https://github.com/deepmind/android_env">Android-Env</a> * <img
src="https://github.com/deepmind/android_env/raw/main/docs/images/device_control.gif" />
* <a href="http://mujoco.org/">MuJoCo</a> | <a
href="https://github.com/tigerneil/mujoco-zh">MuJoCo Chinese version</a>
* <a href="https://github.com/rll-research/url_benchmark">Unsupervised
RL Benchmark</a> * <a
href="https://github.com/rail-berkeley/d4rl">Dataset for Offline RL</a>
* <a href="https://github.com/deepmind/spriteworld">Spriteworld: a
flexible, configurable python-based reinforcement learning
environment</a> * <a
href="https://github.com/chainer/chainerrl-visualizer">Chainerrl
Visualizer</a> * <a href="BSRL.md">Behaviour Suite for Reinforcement
Learning</a> 13 Aug 2019 <a
href="https://arxiv.org/pdf/1908.03568.pdf">arxiv</a> | <a
href="https://github.com/deepmind/bsuite">code</a> * <a
href="Coinrun.md">Quantifying Generalization in Reinforcement
Learning</a> 20 Dec 2018 <a
href="https://arxiv.org/pdf/1812.02341.pdf">arxiv</a> * <a
href="SRL.md">S-RL Toolbox: Environments, Datasets and Evaluation
Metrics for State Representation Learning</a> 25 Sept 2018 * <a
href="https://github.com/google/dopamine">dopamine</a> * <a
href="https://github.com/deepmind/pysc2">StarCraft II</a> * <a
href="https://github.com/deepmind/trfl">tfrl</a> * <a
href="https://github.com/chainer/chainerrl">chainerrl</a> * <a
href="https://github.com/PaddlePaddle/PARL">PARL</a> * <a
href="https://github.com/opendilab/DI-engine">DI-engine: a generalized
decision intelligence engine. It supports various Deep RL algorithms</a>
* <a href="https://github.com/opendilab/PPOxFamily">PPO x Family: Course
in Chinese for Deep RL</a></p>
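<p>Most of the frameworks above expose a Gym-style
<code>reset</code>/<code>step</code> interface. As a concrete reference
point, here is a minimal random-agent interaction loop; it assumes the
Gymnasium package, and the listed frameworks differ in details such as
seeding and termination flags:</p>
<pre><code class="language-python">import gymnasium as gym

# Build an environment; CartPole-v1 is a standard sanity-check task.
env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)

episode_return = 0.0
for _ in range(500):
    action = env.action_space.sample()  # random policy as a placeholder
    obs, reward, terminated, truncated, info = env.step(action)
    episode_return += reward
    if terminated or truncated:
        print("episode return:", episode_return)
        obs, info = env.reset()
        episode_return = 0.0
env.close()
</code></pre>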
<h2 id="unsupervised">Unsupervised</h2>
<ul>
<li><a href="https://arxiv.org/abs/2110.15191">URLB: Unsupervised
Reinforcement Learning Benchmark</a> 28 Oct 2021</li>
<li><a href="https://arxiv.org/abs/2108.13956">APS: Active Pretraining
with Successor Feature</a> 31 Aug 2021</li>
<li><a href="https://arxiv.org/abs/2103.04551">Behavior From the Void:
Unsupervised Active Pre-Training</a> 8 Mar 2021</li>
<li><a href="https://arxiv.org/abs/2102.11271">Reinforcement Learning
with Prototypical Representations</a> 22 Feb 2021</li>
<li><a href="https://arxiv.org/abs/1906.05274">Efficient Exploration via
State Marginal Matching</a> 12 Jun 2019</li>
<li><a href="https://arxiv.org/abs/1906.04161">Self-Supervised
Exploration via Disagreement</a> 10 Jun 2019</li>
<li><a href="https://arxiv.org/abs/1810.12894">Exploration by Random
Network Distillation</a> 30 Oct 2018</li>
<li><a href="https://arxiv.org/abs/1802.06070">Diversity is All You
Need: Learning Skills without a Reward Function</a> 16 Feb 2018</li>
<li><a href="https://arxiv.org/pdf/1705.05363">Curiosity-driven
Exploration by Self-supervised Prediction</a> 15 May 2017</li>
</ul>
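<p>To make the skill-discovery entries above concrete, here is a
minimal sketch of the intrinsic reward from "Diversity is All You
Need": a discriminator learns to infer the active skill from visited
states, and the agent is rewarded when its skill is easy to infer. The
network sizes and the uniform skill prior are illustrative
assumptions:</p>
<pre><code class="language-python">import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_SKILLS, OBS_DIM = 8, 4  # illustrative sizes

# Discriminator q(z|s): predicts which skill produced a state.
discriminator = nn.Sequential(
    nn.Linear(OBS_DIM, 64), nn.ReLU(), nn.Linear(64, NUM_SKILLS))
opt = torch.optim.Adam(discriminator.parameters(), lr=3e-4)

def intrinsic_reward(states, skills):
    """DIAYN pseudo-reward: log q(z|s) - log p(z), with p(z) uniform.

    skills: LongTensor of skill indices, shape [batch].
    """
    with torch.no_grad():
        log_q = F.log_softmax(discriminator(states), dim=-1)
        log_q_z = log_q.gather(1, skills.unsqueeze(1)).squeeze(1)
    log_p_z = -torch.log(torch.tensor(float(NUM_SKILLS)))
    return log_q_z - log_p_z

def update_discriminator(states, skills):
    loss = F.cross_entropy(discriminator(states), skills)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()
</code></pre>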
<h2 id="offline">Offline</h2>
<ul>
<li><a href="https://arxiv.org/abs/2102.06961">PerSim: Data-efficient
Offline Reinforcement Learning with Heterogeneous Agents via
Personalized Simulators</a> 10 Nov 2021</li>
<li><a href="">A General Offline Reinforcement Learning Framework for
Interactive Recommendation</a> AAAI 2021</li>
</ul>
<h2 id="value-based">Value based</h2>
<ul>
<li><a href="SVRL.md">Harnessing Structures for Value-Based Planning and
Reinforcement Learning</a> 5 Feb 2020 <a
href="https://arxiv.org/abs/1909.12255">arxiv</a> | <a
href="https://github.com/YyzHarry/SV-RL">code</a></li>
<li><a href="RVF.md">Recurrent Value Functions</a> 23 May 2019 <a
href="https://arxiv.org/pdf/1905.09562.pdf">arxiv</a></li>
<li><a href="LipschitzQ.md">Stochastic Lipschitz Q-Learning</a> 24 Apr
2019 <a href="https://arxiv.org/pdf/1904.10653.pdf">arxiv</a></li>
<li><a href="https://arxiv.org/pdf/1710.11417">TreeQN and ATreeC:
Differentiable Tree-Structured Models for Deep Reinforcement
Learning</a> 8 Mar 2018</li>
<li><a href="https://arxiv.org/pdf/1803.00933.pdf">DISTRIBUTED
PRIORITIZED EXPERIENCE REPLAY</a> 2 Mar 2018</li>
<li><a href="Rainbow.md">Rainbow: Combining Improvements in Deep
Reinforcement Learning</a> 6 Oct 2017</li>
<li><a href="DQfD.md">Learning from Demonstrations for Real World
Reinforcement Learning</a> 12 Apr 2017</li>
<li><a href="Dueling.md">Dueling Network Architecture</a></li>
<li><a href="DDQN.md">Double DQN</a></li>
<li><a href="PER.md">Prioritized Experience</a></li>
<li><a href="DQN.md">Deep Q-Networks</a></li>
</ul>
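<p>The DQN-family entries above differ mainly in how the bootstrap
target is built. As a hedged sketch (network definitions, replay
sampling and hyperparameters are assumed), the Double DQN target
selects the next action with the online network but evaluates it with
the target network, which reduces Q-learning's overestimation bias:</p>
<pre><code class="language-python">import torch

GAMMA = 0.99  # illustrative discount factor

def double_dqn_target(online_net, target_net, rewards, next_states, dones):
    """y = r + gamma * (1 - done) * Q_target(s', argmax_a Q_online(s', a))."""
    with torch.no_grad():
        # Action selection uses the online network ...
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
        # ... while action evaluation uses the target network.
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
    return rewards + GAMMA * (1.0 - dones) * next_q
</code></pre>
<p>The TD loss is then, e.g., a Huber loss between
<code>online_net(states).gather(1, actions)</code> and this target;
Prioritized Experience Replay reweights that loss by sampling
priority.</p>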
<h2 id="policy-gradient">Policy gradient</h2>
<ul>
<li><a href="PPG.md">Phasic Policy Gradient</a> 9 Sep 2020 <a
href="https://arxiv.org/pdf/2009.04416.pdf">arxiv</a> <a
href="https://github.com/openai/phasic-policy-gradient">code</a></li>
<li><a href="OVPG.md">An operator view of policy gradient methods</a> 22
Jun 2020 <a href="https://arxiv.org/pdf/2006.11266.pdf">arxiv</a></li>
<li><a href="DirPG.md">Direct Policy Gradients: Direct Optimization of
Policies in Discrete Action Spaces</a> 14 Jun 2019 <a
href="https://arxiv.org/pdf/1906.06062.pdf">arxiv</a></li>
<li><a href="PGS.md">Policy Gradient Search: Online Planning and Expert
Iteration without Search Trees</a> 7 Apr 2019 <a
href="https://arxiv.org/pdf/1904.03646.pdf">arxiv</a></li>
<li><a href="SPU.md">SUPERVISED POLICY UPDATE FOR DEEP REINFORCEMENT
LEARNING</a> 24 Dec 2018 <a
href="https://arxiv.org/pdf/1805.11706v4.pdf">arxiv</a></li>
<li><a href="PPO-CMA.md">PPO-CMA: Proximal Policy Optimization with
Covariance Matrix Adaptation</a> 5 Oct 2018 <a
href="https://arxiv.org/pdf/1810.02541v6.pdf">arxiv</a></li>
<li><a href="CAPG.md">Clipped Action Policy Gradient</a> 22 June
2018</li>
<li><a href="EPG.md">Expected Policy Gradients for Reinforcement
Learning</a> 10 Jan 2018</li>
<li><a href="PPO.md">Proximal Policy Optimization Algorithms</a> 20 July
2017</li>
<li><a href="DPPO.md">Emergence of Locomotion Behaviours in Rich
Environments</a> 7 July 2017</li>
<li><a href="IPG.md">Interpolated Policy Gradient: Merging On-Policy and
Off-Policy Gradient Estimation for Deep Reinforcement Learning</a> 1 Jun
2017</li>
<li><a href="PGSQL.md">Equivalence Between Policy Gradients and Soft
Q-Learning</a></li>
<li><a href="TRPO.md">Trust Region Policy Optimization</a></li>
<li><a href="DEBP.md">Reinforcement Learning with Deep Energy-Based
Policies</a></li>
<li><a href="QPROP.md">Q-PROP: SAMPLE-EFFICIENT POLICY GRADIENT WITH AN
OFF-POLICY CRITIC</a></li>
</ul>
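<p>Several of the entries above (TRPO, PPO, PPO-CMA, PPG) revolve
around constraining how far each policy update moves. A minimal sketch
of the PPO clipped surrogate loss, assuming log-probabilities and
advantage estimates are computed elsewhere:</p>
<pre><code class="language-python">import torch

CLIP_EPS = 0.2  # illustrative clipping range

def ppo_clip_loss(new_log_probs, old_log_probs, advantages):
    """Negative clipped surrogate: maximize min(r*A, clip(r)*A)."""
    ratio = torch.exp(new_log_probs - old_log_probs)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - CLIP_EPS, 1.0 + CLIP_EPS) * advantages
    # Clipping removes the incentive to push the ratio outside [1-eps, 1+eps].
    return -torch.min(unclipped, clipped).mean()
</code></pre>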
<h2 id="explorations">Explorations</h2>
<ul>
<li><a href="EDDICT.md">Entropic Desired Dynamics for Intrinsic
Control</a> 2021 <a
href="https://openreview.net/pdf?id=lBSSxTgXmiK">openreview</a></li>
<li><a href="Disagreement.md">Self-Supervised Exploration via
Disagreement</a> 10 Jun 2019 <a
href="https://arxiv.org/pdf/1906.04161.pdf">arxiv</a></li>
<li><a href="MBIE-EB.md">Approximate Exploration through State
Abstraction</a> 24 Jan 2019</li>
<li><a href="UBE.md">The Uncertainty Bellman Equation and
Exploration</a> 15 Sep 2017</li>
<li><a href="NoisyNet.md">Noisy Networks for Exploration</a> 30 Jun 2017
<a
href="https://github.com/Kaixhin/NoisyNet-A3C">implementation</a></li>
<li><a href="PhiEB.md">Count-Based Exploration in Feature Space for
Reinforcement Learning</a> 25 Jun 2017</li>
<li><a href="NDM.md">Count-Based Exploration with Neural Density
Models</a> 14 Jun 2017</li>
<li><a href="QEnsemble.md">UCB and InfoGain Exploration via
Q-Ensembles</a> 11 Jun 2017</li>
<li><a href="MMRB.md">Minimax Regret Bounds for Reinforcement
Learning</a> 16 Mar 2017</li>
<li><a href="incentivizing.md">Incentivizing Exploration In
Reinforcement Learning With Deep Predictive Models</a></li>
<li><a href="EX2.md">EX2: Exploration with Exemplar Models for Deep
Reinforcement Learning</a></li>
</ul>
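<p>As one concrete piece of the exploration machinery above, here is a
sketch of the factorised noisy linear layer from "Noisy Networks for
Exploration": learned per-weight noise scales replace epsilon-greedy
action noise. The initialisation constants follow the paper; everything
else is a minimal illustration:</p>
<pre><code class="language-python">import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyLinear(nn.Module):
    """Factorised-Gaussian noisy linear layer (NoisyNet sketch)."""
    def __init__(self, in_features, out_features, sigma0=0.5):
        super().__init__()
        self.in_features, self.out_features = in_features, out_features
        bound = 1.0 / math.sqrt(in_features)
        self.weight_mu = nn.Parameter(
            torch.empty(out_features, in_features).uniform_(-bound, bound))
        self.weight_sigma = nn.Parameter(
            torch.full((out_features, in_features), sigma0 * bound))
        self.bias_mu = nn.Parameter(
            torch.empty(out_features).uniform_(-bound, bound))
        self.bias_sigma = nn.Parameter(
            torch.full((out_features,), sigma0 * bound))

    @staticmethod
    def _f(x):
        return x.sign() * x.abs().sqrt()

    def forward(self, x):
        # Fresh factorised noise each forward pass drives exploration.
        eps_in = self._f(torch.randn(self.in_features, device=x.device))
        eps_out = self._f(torch.randn(self.out_features, device=x.device))
        weight = self.weight_mu + self.weight_sigma * torch.outer(eps_out, eps_in)
        bias = self.bias_mu + self.bias_sigma * eps_out
        return F.linear(x, weight, bias)
</code></pre>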
<h2 id="actor-critic">Actor-Critic</h2>
<ul>
<li><a href="Geoff-PAC.md">Generalized Off-Policy Actor-Critic</a> 27
Mar 2019</li>
<li><a href="https://arxiv.org/pdf/1812.05905.pdf">Soft Actor-Critic
Algorithms and Applications</a> 29 Jan 2019</li>
<li><a href="REACTOR.md">The Reactor: A Sample-Efficient Actor-Critic
Architecture</a> 15 Apr 2017</li>
<li><a href="ACER.md">SAMPLE EFFICIENT ACTOR-CRITIC WITH EXPERIENCE
REPLAY</a></li>
<li><a href="UNREAL.md">REINFORCEMENT LEARNING WITH UNSUPERVISED
AUXILIARY TASKS</a></li>
<li><a href="DDPG.md">Continuous control with deep reinforcement
learning</a></li>
</ul>
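<p>A minimal sketch of the deterministic actor-critic update behind the
DDPG entry above; the networks, optimisers and replay batch are assumed
to exist, and SAC-style entropy terms are omitted:</p>
<pre><code class="language-python">import torch
import torch.nn.functional as F

GAMMA, TAU = 0.99, 0.005  # illustrative discount and Polyak rate

def ddpg_update(actor, critic, target_actor, target_critic,
                actor_opt, critic_opt, batch):
    s, a, r, s2, done = batch
    # Critic: regress Q(s, a) onto the bootstrapped target.
    with torch.no_grad():
        target_q = r + GAMMA * (1.0 - done) * target_critic(s2, target_actor(s2))
    critic_loss = F.mse_loss(critic(s, a), target_q)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
    # Actor: ascend the critic's value of its own actions.
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
    # Polyak-average the target networks for stability.
    for net, target in ((actor, target_actor), (critic, target_critic)):
        for p, tp in zip(net.parameters(), target.parameters()):
            tp.data.mul_(1.0 - TAU).add_(TAU * p.data)
</code></pre>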
<h2 id="model-based">Model-based</h2>
<ul>
<li><a href="sc.md">Self-Consistent Models and Values</a> 25 Oct 2021 <a
href="https://arxiv.org/pdf/2110.12840.pdf">arxiv</a></li>
<li><a href="parametric.md">When to use parametric models in
reinforcement learning?</a> 12 Jun 2019 <a
href="https://arxiv.org/pdf/1906.05243.pdf">arxiv</a></li>
<li><a href="https://arxiv.org/pdf/1903.00374.pdf">Model Based
Reinforcement Learning for Atari</a> 5 Mar 2019</li>
<li><a href="MBDQN.md">Model-Based Stabilisation of Deep Reinforcement
Learning</a> 6 Sep 2018</li>
<li><a href="IBP.md">Learning model-based planning from scratch</a> 19
July 2017</li>
</ul>
<h2 id="model-free-model-based">Model-free + Model-based</h2>
<ul>
<li><a href="I2As.md">Imagination-Augmented Agents for Deep
Reinforcement Learning</a> 19 July 2017</li>
</ul>
<h2 id="hierarchical">Hierarchical</h2>
<ul>
<li><a href="HIRO.md">WHY DOES HIERARCHY (SOMETIMES) WORK SO WELL IN
REINFORCEMENT LEARNING?</a> 23 Sep 2019 <a
href="https://arxiv.org/pdf/1909.10618.pdf">arxiv</a></li>
<li><a href="HAL.md">Language as an Abstraction for Hierarchical Deep
Reinforcement Learning</a> 18 Jun 2019 <a
href="https://arxiv.org/pdf/1906.07343.pdf">arxiv</a></li>
</ul>
<h2 id="option">Option</h2>
<ul>
<li><a href="VALOR.md">Variational Option Discovery Algorithms</a> 26
July 2018</li>
<li><a href="LFOD.md">A Laplacian Framework for Option Discovery in
Reinforcement Learning</a> 16 Jun 2017</li>
</ul>
<h2 id="connection-with-other-methods">Connection with other
methods</h2>
<ul>
<li><a href="GVG.md">Robust Imitation of Diverse Behaviors</a></li>
<li><a href="GAIL.md">Learning human behaviors from motion capture by
adversarial imitation</a></li>
<li><a href="GANAC.md">Connecting Generative Adversarial Networks and
Actor-Critic Methods</a></li>
</ul>
<h2 id="connecting-value-and-policy-methods">Connecting value and policy
methods</h2>
<ul>
<li><a href="PCL.md">Bridging the Gap Between Value and Policy Based
Reinforcement Learning</a></li>
<li><a href="PGQ.md">Policy gradient and Q-learning</a></li>
</ul>
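<p>The PGQ/PCL line of work above rests on an identity between
entropy-regularised Q-values and a Boltzmann policy:
pi(a|s) = exp((Q(s,a) - V(s)) / tau) with
V(s) = tau * logsumexp(Q(s, .) / tau). A small numerical sketch of that
correspondence:</p>
<pre><code class="language-python">import torch

def soft_value_and_policy(q_values, tau=1.0):
    """Entropy-regularised state value and the induced Boltzmann policy."""
    v = tau * torch.logsumexp(q_values / tau, dim=-1, keepdim=True)
    policy = torch.exp((q_values - v) / tau)  # identical to softmax(q/tau)
    return v.squeeze(-1), policy

q = torch.tensor([[1.0, 2.0, 0.5]])
v, pi = soft_value_and_policy(q, tau=0.5)
# pi sums to 1; as tau shrinks toward 0 it approaches the greedy policy.
</code></pre>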
<h2 id="reward-design">Reward design</h2>
<ul>
<li><a href="VICE.md">End-to-End Robotic Reinforcement Learning without
Reward Engineering</a> 16 Apr 2019 <a
href="https://arxiv.org/pdf/1904.07854.pdf">arxiv</a></li>
<li><a href="RLCRC.md">Reinforcement Learning with Corrupted Reward
Channel</a> 23 May 2017</li>
</ul>
<h2 id="unifying">Unifying</h2>
<ul>
<li><a href="MSRL.md">Multi-step Reinforcement Learning: A Unifying
Algorithm</a></li>
</ul>
<h2 id="faster-drl">Faster DRL</h2>
<ul>
<li><a href="NEC.md">Neural Episodic Control</a></li>
</ul>
<h2 id="multi-agent">Multi-agent</h2>
<ul>
<li><a href="Dip.md">No Press Diplomacy: Modeling Multi-Agent
Gameplay</a> 4 Sep 2019 <a
href="https://arxiv.org/pdf/1909.02128.pdf">arxiv</a></li>
<li><a href="OPRE">Options as responses: Grounding behavioural
hierarchies in multi-agent RL</a> 6 Jun 2019 <a
href="https://arxiv.org/pdf/1906.01470.pdf">arxiv</a></li>
<li><a href="MERL.md">Evolutionary Reinforcement Learning for
Sample-Efficient Multiagent Coordination</a> 18 Jun 2019 <a
href="https://arxiv.org/pdf/1906.07315.pdf">arxiv</a></li>
<li><a href="ROMMEO.md">A Regularized Opponent Model with Maximum
Entropy Objective</a> 17 May 2019 <a
href="https://arxiv.org/pdf/1905.08087.pdf">arxiv</a></li>
<li><a href="NashDQN.md">Deep Q-Learning for Nash Equilibria:
Nash-DQN</a> 23 Apr 2019 <a
href="https://arxiv.org/pdf/1904.10554.pdf">arxiv</a></li>
<li><a href="MRL.md">Malthusian Reinforcement Learning</a> 3 Mar 2019 <a
href="https://arxiv.org/pdf/1812.07019.pdf">arxiv</a></li>
<li><a href="bad.md">Bayesian Action Decoder for Deep Multi-Agent
Reinforcement Learning</a> 4 Nov 2018</li>
<li><a href="ISMCI.md">INTRINSIC SOCIAL MOTIVATION VIA CAUSAL INFLUENCE
IN MULTI-AGENT RL</a> 19 Oct 2018</li>
<li><a
href="http://www.cs.ox.ac.uk/people/shimon.whiteson/pubs/rashidicml18.pdf">QMIX:
Monotonic Value Function Factorisation for Deep Multi-Agent
Reinforcement Learning</a> 30 Mar 2018</li>
<li><a href="SOM.md">Modeling Others using Oneself in Multi-Agent
Reinforcement Learning</a> 26 Feb 2018</li>
<li><a href="SGA.md">The Mechanics of n-Player Differentiable Games</a>
15 Feb 2018</li>
<li><a href="RoboSumo.md">Continuous Adaptation via Meta-Learning in
Nonstationary and Competitive Environments</a> 10 Oct 2017</li>
<li><a href="LOLA.md">Learning with Opponent-Learning Awareness</a> 13
Sep 2017</li>
<li><a href="COMA.md">Counterfactual Multi-Agent Policy
Gradients</a></li>
<li><a href="MADDPG.md">Multi-Agent Actor-Critic for Mixed
Cooperative-Competitive Environments</a> 7 Jun 2017</li>
<li><a href="BiCNet.md">Multiagent Bidirectionally-Coordinated Nets for
Learning to Play StarCraft Combat Games</a> 29 Mar 2017</li>
</ul>
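<p>For the value-factorisation thread above (QMIX in particular), here
is a compact sketch of the mixing idea: per-agent utilities are
combined into a joint value whose mixing weights are generated from the
global state and forced non-negative, so the joint greedy action
decomposes across agents. The sizes and single-layer hypernetworks are
simplifications:</p>
<pre><code class="language-python">import torch
import torch.nn as nn
import torch.nn.functional as F

class QMixer(nn.Module):
    """Monotonic mixing of per-agent Q-values (QMIX sketch)."""
    def __init__(self, n_agents, state_dim, embed_dim=32):
        super().__init__()
        self.n_agents, self.embed_dim = n_agents, embed_dim
        # Hypernetworks: the global state generates the mixing weights.
        self.hyper_w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.hyper_b1 = nn.Linear(state_dim, embed_dim)
        self.hyper_w2 = nn.Linear(state_dim, embed_dim)
        self.hyper_b2 = nn.Linear(state_dim, 1)

    def forward(self, agent_qs, state):
        # agent_qs: [batch, n_agents]; state: [batch, state_dim]
        b = agent_qs.size(0)
        # abs() keeps every mixing weight non-negative, which makes
        # Q_tot monotone in each agent's Q-value.
        w1 = self.hyper_w1(state).abs().view(b, self.n_agents, self.embed_dim)
        b1 = self.hyper_b1(state).view(b, 1, self.embed_dim)
        hidden = F.elu(torch.bmm(agent_qs.unsqueeze(1), w1) + b1)
        w2 = self.hyper_w2(state).abs().view(b, self.embed_dim, 1)
        b2 = self.hyper_b2(state).view(b, 1, 1)
        return (torch.bmm(hidden, w2) + b2).view(b)
</code></pre>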
<h2 id="new-design">New design</h2>
<ul>
<li><a href="https://arxiv.org/pdf/1802.01561.pdf">IMPALA: Scalable
Distributed Deep-RL with Importance Weighted Actor-Learner
Architectures</a> 9 Feb 2018</li>
<li><a href="RECUR.md">Reverse Curriculum Generation for Reinforcement
Learning</a></li>
<li><a href="HIRL.md">Trial without Error: Towards Safe Reinforcement
Learning via Human Intervention</a></li>
<li><a href="DualMDP.md">Learning to Design Games: Strategic
Environments in Deep Reinforcement Learning</a> 5 July 2017</li>
</ul>
<h2 id="multitask">Multitask</h2>
<ul>
<li><a href="https://arxiv.org/pdf/1803.03835.pdf">Kickstarting Deep
Reinforcement Learning</a> 10 Mar 2018</li>
<li><a href="ZSTG.md">Zero-Shot Task Generalization with Multi-Task Deep
Reinforcement Learning</a> 7 Nov 2017</li>
<li><a href="Distral.md">Distral: Robust Multitask Reinforcement
Learning</a> 13 July 2017</li>
</ul>
<h2 id="observational-learning">Observational Learning</h2>
<ul>
<li><a href="OLRL.md">Observational Learning by Reinforcement
Learning</a> 20 Jun 2017</li>
</ul>
<h2 id="meta-learning">Meta Learning</h2>
<ul>
<li><a href="GVF.md">Discovery of Useful Questions as Auxiliary
Tasks</a> 10 Sep 2019 <a
href="https://arxiv.org/pdf/1909.04607.pdf">arxiv</a></li>
<li><a href="MetaSS.md">Meta-learning of Sequential Strategies</a> 8 May
2019 <a href="https://arxiv.org/pdf/1905.03030.pdf">arxiv</a></li>
<li><a href="PEARL.md">Efficient Off-Policy Meta-Reinforcement Learning
via Probabilistic Context Variables</a> 19 Mar 2019 <a
href="https://arxiv.org/pdf/1903.08254.pdf">arxiv</a></li>
<li><a href="E2.md">Some Considerations on Learning to Explore via
Meta-Reinforcement Learning</a> 11 Jan 2019 <a
href="https://arxiv.org/pdf/1803.01118.pdf">arxiv</a></li>
<li><a href="MGRL.md">Meta-Gradient Reinforcement Learning</a> 24 May
2018 <a href="https://arxiv.org/pdf/1805.09801.pdf">arxiv</a></li>
<li><a href="ProMP.md">ProMP: Proximal Meta-Policy Search</a> 16 Oct
2018 <a href="https://arxiv.org/pdf/1810.06784">arxiv</a></li>
<li><a href="UML.md">Unsupervised Meta-Learning for Reinforcement
Learning</a> 12 Jun 2018</li>
</ul>
<h2 id="distributional">Distributional</h2>
<ul>
<li><a href="GANQL.md">GAN Q-learning</a> 20 July 2018</li>
<li><a href="IQN.md">Implicit Quantile Networks for Distributional
Reinforcement Learning</a> 14 Jun 2018</li>
<li><a href="GTD.md">Nonlinear Distributional Gradient
Temporal-Difference Learning</a> 20 May 2018</li>
<li><a href="D4PG.md">DISTRIBUTED DISTRIBUTIONAL DETERMINISTIC POLICY
GRADIENTS</a> 23 Apr 2018</li>
<li><a href="C51-analysis.md">An Analysis of Categorical Distributional
Reinforcement Learning</a> 22 Feb 2018</li>
<li><a href="QR-DQN.md">Distributional Reinforcement Learning with
Quantile Regression</a> 27 Oct 2017</li>
<li><a href="C51.md">A Distributional Perspective on Reinforcement
Learning</a> 21 July 2017</li>
</ul>
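<p>The quantile-based entries above (QR-DQN, IQN) share one core
ingredient: the quantile Huber loss, which trains each output head to
match a fixed quantile of the return distribution. A minimal sketch,
with the construction of the target samples left to the caller:</p>
<pre><code class="language-python">import torch
import torch.nn.functional as F

def quantile_huber_loss(pred_quantiles, target_samples, kappa=1.0):
    """QR-DQN-style loss; pred_quantiles [B, N], target_samples [B, M]."""
    n = pred_quantiles.size(1)
    # Midpoint quantile fractions tau_i = (i + 0.5) / N.
    taus = (torch.arange(n, dtype=torch.float32) + 0.5) / n
    # Pairwise TD errors u = target - prediction, shape [B, M, N].
    u = target_samples.unsqueeze(2) - pred_quantiles.unsqueeze(1)
    huber = F.huber_loss(pred_quantiles.unsqueeze(1).expand_as(u),
                         target_samples.unsqueeze(2).expand_as(u),
                         reduction="none", delta=kappa)
    # Asymmetric weight |tau - 1[u negative]| tilts the regression so
    # each head converges to its own quantile rather than the mean.
    indicator = u.detach().lt(0.0).float()
    weight = (taus - indicator).abs()
    return (weight * huber).mean()
</code></pre>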
<h2 id="planning">Planning</h2>
<ul>
<li><a href="SoRB.md">Search on the Replay Buffer: Bridging Planning and
Reinforcement Learning</a> 12 June 2019 <a
href="https://arxiv.org/pdf/1906.05253.pdf">arxiv</a></li>
</ul>
<h2 id="safety">Safety</h2>
<ul>
<li><a href="MPO.md">Robust Reinforcement Learning for Continuous
Control with Model Misspecification</a> 18 Jun 2019 <a
href="https://arxiv.org/pdf/1906.07516.pdf">arxiv</a></li>
<li><a href="Viper.md">Verifiable Reinforcement Learning via Policy
Extraction</a> 22 May 2018 <a
href="https://arxiv.org/pdf/1805.08328.pdf">arxiv</a></li>
</ul>
<h2 id="inverse-rl">Inverse RL</h2>
<ul>
<li><a href="OP-GAIL.md">ADDRESSING SAMPLE INEFFICIENCY AND REWARD BIAS
IN INVERSE REINFORCEMENT LEARNING</a> 9 Sep 2018</li>
</ul>
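<p>For context on the reward-bias issue the entry above tackles:
adversarial IRL derives its reward from a discriminator D(s, a) trained
to separate expert from agent transitions, and the two standard reward
forms carry opposite implicit biases (a per-step survival bonus versus
a per-step penalty). A minimal illustration:</p>
<pre><code class="language-python">import torch

def adversarial_rewards(d_logits, eps=1e-8):
    """Two common discriminator-based rewards, given logits of D(s, a).

    D is trained so that sigmoid(d_logits) is near 1 on expert data.
    """
    d = torch.sigmoid(d_logits)
    r_bonus = -torch.log(1.0 - d + eps)  # always positive: survival bonus
    r_penalty = torch.log(d + eps)       # always negative: hastens episode end
    return r_bonus, r_penalty
</code></pre>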
<h2 id="no-reward-rl">No reward RL</h2>
<ul>
<li><a href="VISR.md">Fast Task Inference with Variational Intrinsic
Successor Features</a> 2 Jun 2019 <a
href="https://arxiv.org/pdf/1906.05030.pdf">arxiv</a></li>
<li><a href="https://arxiv.org/pdf/1705.05363">Curiosity-driven
Exploration by Self-supervised Prediction</a> 15 May 2017</li>
</ul>
<h2 id="time">Time</h2>
<ul>
<li><a href="Intervaltime.md">Interval timing in deep reinforcement
learning agents</a> 31 May 2019 <a
href="https://arxiv.org/pdf/1905.13469.pdf">arxiv</a></li>
<li><a href="PEB.md">Time Limits in Reinforcement Learning</a></li>
</ul>
<h2 id="adversarial-learning">Adversarial learning</h2>
<ul>
<li><a href="LQR+GAIfO.md">Sample-efficient Adversarial Imitation
Learning from Observation</a> 18 Jun 2019 <a
href="https://arxiv.org/pdf/1906.07374.pdf">arxiv</a></li>
</ul>
<h2 id="use-natural-language">Use Natural Language</h2>
<ul>
<li><a href="LEARN.md">Using Natural Language for Reward Shaping in
Reinforcement Learning</a> 31 May 2019 <a
href="https://www.cs.utexas.edu/~ai-lab/downloadPublication.php?filename=http://www.cs.utexas.edu/users/ml/papers/goyal.ijcai19.pdf&amp;pubid=127757">arxiv</a></li>
</ul>
<h2 id="generative-and-contrastive-representation-learning">Generative
and contrastive representation learning</h2>
<ul>
<li><a href="ST-DIM.md">Unsupervised State Representation Learning in
Atari</a> 19 Jun 2019 <a
href="https://arxiv.org/pdf/1906.08226.pdf">arxiv</a></li>
</ul>
<h2 id="belief">Belief</h2>
<ul>
<li><a href="GenerativeBelief.md">Shaping Belief States with Generative
Environment Models for RL</a> 24 Jun 2019 <a
href="https://arxiv.org/pdf/1906.09237v2.pdf">arxiv</a></li>
</ul>
<h2 id="pac">PAC</h2>
<ul>
<li><a href="COF-PAC.md">Provably Convergent Off-Policy Actor-Critic
with Function Approximation</a> 11 Nov 2019 <a
href="https://arxiv.org/pdf/1911.04384.pdf">arxiv</a></li>
</ul>
<h2 id="applications">Applications</h2>
<ul>
<li><a href="bdope.md">Benchmarks for Deep Off-Policy Evaluation</a> 30
Mar 2021 <a href="https://arxiv.org/pdf/2103.16596.pdf">arxiv</a></li>
<li><a href="Reciprocity.md">Learning Reciprocity in Complex Sequential
Social Dilemmas</a> 19 Mar 2019 <a
href="https://arxiv.org/pdf/1903.08082.pdf">arxiv</a></li>
<li><a href="dmimic.md">DeepMimic: Example-Guided Deep Reinforcement
Learning of Physics-Based Character Skills</a> 9 Apr 2018</li>
<li><a href="RLTUNER.md">TUNING RECURRENT NEURAL NETWORKS WITH
REINFORCEMENT LEARNING</a></li>
</ul>