<h1 id="awesome-deep-reinforcement-learning">Awesome Deep Reinforcement
Learning</h1>
<blockquote>
<p><strong>Mar 1 2024 update: HILP added</strong></p>
<p><strong>July 2022 update: EDDICT added</strong></p>
<p><strong>Mar 2022 update: a few papers released in early
2022</strong></p>
<p><strong>Dec 2021 update: Unsupervised RL</strong></p>
</blockquote>
<h2 id="introduction-to-awesome-drl">Introduction to awesome drl</h2>
<p>Reinforcement learning is the fundamental framework for building AGI.
This project therefore collects important contributions to deep
reinforcement learning.</p>
<h2 id="landscape-of-deep-rl">Landscape of Deep RL</h2>
<figure>
<img src="images/awesome-drl.png" alt="updated Landscape of DRL" />
<figcaption aria-hidden="true">updated Landscape of
<strong>DRL</strong></figcaption>
</figure>
<h2 id="content">Content</h2>
<ul>
<li><a href="#awesome-deep-reinforcement-learning">Awesome Deep
Reinforcement Learning</a>
<ul>
<li><a href="#introduction-to-awesome-drl">Introduction to awesome
drl</a></li>
<li><a href="#landscape-of-deep-rl">Landscape of Deep RL</a></li>
<li><a href="#content">Content</a></li>
<li><a href="#general-guidances">General guidances</a></li>
<li><a href="#2022">2022</a></li>
<li><a href="#foundations-and-theory">Foundations and theory</a></li>
<li><a href="#general-benchmark-frameworks">General benchmark
frameworks</a></li>
<li><a href="#unsupervised">Unsupervised</a></li>
<li><a href="#offline">Offline</a></li>
<li><a href="#value-based">Value based</a></li>
<li><a href="#policy-gradient">Policy gradient</a></li>
<li><a href="#explorations">Explorations</a></li>
<li><a href="#actor-critic">Actor-Critic</a></li>
<li><a href="#model-based">Model-based</a></li>
<li><a href="#model-free--model-based">Model-free + Model-based</a></li>
<li><a href="#hierarchical">Hierarchical</a></li>
<li><a href="#option">Option</a></li>
<li><a href="#connection-with-other-methods">Connection with other
methods</a></li>
<li><a href="#connecting-value-and-policy-methods">Connecting value and
policy methods</a></li>
<li><a href="#reward-design">Reward design</a></li>
<li><a href="#unifying">Unifying</a></li>
<li><a href="#faster-drl">Faster DRL</a></li>
<li><a href="#multi-agent">Multi-agent</a></li>
<li><a href="#new-design">New design</a></li>
<li><a href="#multitask">Multitask</a></li>
<li><a href="#observational-learning">Observational Learning</a></li>
<li><a href="#meta-learning">Meta Learning</a></li>
<li><a href="#distributional">Distributional</a></li>
<li><a href="#planning">Planning</a></li>
<li><a href="#safety">Safety</a></li>
<li><a href="#inverse-rl">Inverse RL</a></li>
<li><a href="#no-reward-rl">No reward RL</a></li>
<li><a href="#time">Time</a></li>
<li><a href="#adversarial-learning">Adversarial learning</a></li>
<li><a href="#use-natural-language">Use Natural Language</a></li>
<li><a
href="#generative-and-contrastive-representation-learning">Generative
and contrastive representation learning</a></li>
<li><a href="#belief">Belief</a></li>
<li><a href="#pac">PAC</a></li>
<li><a href="#applications">Applications</a></li>
</ul></li>
</ul>
<p>Illustrations:</p>
<p><img src="images/ACER.png" /></p>
<p><strong>Recommendations and suggestions are welcome</strong>.</p>
<h2 id="general-guidances">General guidances</h2>
<ul>
<li><a href="https://github.com/hanjuku-kaso/awesome-offline-rl">Awesome Offline RL</a></li>
<li><a href="http://reinforcementlearning.today/">Reinforcement Learning Today</a></li>
<li><a href="http://mlanctot.info/files/papers/Lanctot_MARL_RLSS2019_Lille.pdf">Multiagent Reinforcement Learning by Marc Lanctot, RLSS @ Lille</a> 11 July 2019</li>
<li><a href="https://david-abel.github.io/notes/rldm_2019.pdf">RLDM 2019 Notes by David Abel</a> 11 July 2019</li>
<li><a href="RLNL.md">A Survey of Reinforcement Learning Informed by Natural Language</a> 10 Jun 2019 <a href="https://arxiv.org/pdf/1906.03926.pdf">arxiv</a></li>
<li><a href="ChallengesRealWorldRL.md">Challenges of Real-World Reinforcement Learning</a> 29 Apr 2019 <a href="https://arxiv.org/pdf/1904.12901.pdf">arxiv</a></li>
<li><a href="RayInterference.md">Ray Interference: a Source of Plateaus in Deep Reinforcement Learning</a> 25 Apr 2019 <a href="https://arxiv.org/pdf/1904.11455.pdf">arxiv</a></li>
<li><a href="p10.md">Principles of Deep RL by David Silver</a></li>
<li><a href="https://www.jianshu.com/p/dfd987aa765a">University AI's general introduction to deep RL (in Chinese)</a></li>
<li><a href="https://spinningup.openai.com/en/latest/">OpenAI's Spinning Up</a></li>
<li><a href="https://thegradient.pub/the-promise-of-hierarchical-reinforcement-learning/">The Promise of Hierarchical Reinforcement Learning</a> 9 Mar 2019</li>
<li><a href="reproducing.md">Deep Reinforcement Learning that Matters</a> 30 Jan 2019 <a href="https://arxiv.org/pdf/1709.06560.pdf">arxiv</a></li>
</ul>
<h2 id="section">2024</h2>
<ul>
<li><a href="HILP.md">Foundation Policies with Hilbert
Representations</a> <a href="https://arxiv.org/abs/2402.15567">arxiv</a>
<a href="https://github.com/seohongpark/HILP">repo</a> 23 Feb 2024</li>
</ul>
<h2 id="section-1">2022</h2>
<ul>
<li>Reinforcement Learning with Action-Free Pre-Training from Videos <a
href="https://arxiv.org/abs/2203.13880">arxiv</a> <a
href="https://github.com/younggyoseo/apv">repo</a></li>
</ul>
<h2 id="generalist-policies">Generalist policies</h2>
<ul>
<li><a href="HILP.md">Foundation Policies with Hilbert
Representations</a> <a href="https://arxiv.org/abs/2402.15567">arxiv</a>
<a href="https://github.com/seohongpark/HILP">repo</a> 23 Feb 2024</li>
</ul>
<h2 id="foundations-and-theory">Foundations and theory</h2>
<ul>
<li><a href="GNLBE.md">General non-linear Bellman equations</a> 9 July
2019 <a href="https://arxiv.org/pdf/1907.07331.pdf">arxiv</a></li>
<li><a href="MCGE.md">Monte Carlo Gradient Estimation in Machine
Learning</a> 25 Jun 2019 <a
href="https://arxiv.org/pdf/1906.10652.pdf">arxiv</a></li>
</ul>
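<p>Since the Monte Carlo gradient estimation survey above underpins most
policy-gradient derivations, here is a minimal sketch contrasting the
score-function (REINFORCE) estimator with the pathwise
(reparameterization) estimator for a Gaussian; the toy objective, sample
size, and parameter values are illustrative assumptions, not taken from
the paper.</p>
<pre class="python"><code>import numpy as np

rng = np.random.default_rng(0)
mu, n = 1.5, 200_000
eps = rng.normal(size=n)
x = mu + eps                        # samples from N(mu, 1)

def f(z):
    return z ** 2                   # E[f(x)] = mu**2 + 1, so d/dmu = 2*mu

# Score-function (REINFORCE) estimator: f(x) * d log p(x; mu) / d mu.
score_grad = np.mean(f(x) * (x - mu))

# Pathwise (reparameterization) estimator: differentiate through x = mu + eps.
pathwise_grad = np.mean(2.0 * (mu + eps))

print(score_grad, pathwise_grad, "true:", 2.0 * mu)
</code></pre>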
<h2 id="general-benchmark-frameworks">General benchmark frameworks</h2>
<ul>
<li><a href="https://github.com/google/brax/">Brax</a>
<img src="https://github.com/google/brax/raw/main/docs/img/brax_logo.gif" width="336" height="80" alt="BRAX"/>
<img src="https://github.com/google/brax/raw/main/docs/img/fetch.gif" /></li>
<li><a href="https://github.com/deepmind/android_env">Android-Env</a>
<img src="https://github.com/deepmind/android_env/raw/main/docs/images/device_control.gif" /></li>
<li><a href="http://mujoco.org/">MuJoCo</a> | <a href="https://github.com/tigerneil/mujoco-zh">MuJoCo Chinese version</a></li>
<li><a href="https://github.com/rll-research/url_benchmark">Unsupervised RL Benchmark</a></li>
<li><a href="https://github.com/rail-berkeley/d4rl">Dataset for Offline RL</a></li>
<li><a href="https://github.com/deepmind/spriteworld">Spriteworld: a flexible, configurable python-based reinforcement learning environment</a></li>
<li><a href="https://github.com/chainer/chainerrl-visualizer">ChainerRL Visualizer</a></li>
<li><a href="BSRL.md">Behaviour Suite for Reinforcement Learning</a> 13 Aug 2019 <a href="https://arxiv.org/pdf/1908.03568.pdf">arxiv</a> | <a href="https://github.com/deepmind/bsuite">code</a></li>
<li><a href="Coinrun.md">Quantifying Generalization in Reinforcement Learning</a> 20 Dec 2018 <a href="https://arxiv.org/pdf/1812.02341.pdf">arxiv</a></li>
<li><a href="SRL.md">S-RL Toolbox: Environments, Datasets and Evaluation Metrics for State Representation Learning</a> 25 Sept 2018</li>
<li><a href="https://github.com/google/dopamine">dopamine</a></li>
<li><a href="https://github.com/deepmind/pysc2">StarCraft II</a></li>
<li><a href="https://github.com/deepmind/trfl">trfl</a></li>
<li><a href="https://github.com/chainer/chainerrl">chainerrl</a></li>
<li><a href="https://github.com/PaddlePaddle/PARL">PARL</a></li>
<li><a href="https://github.com/opendilab/DI-engine">DI-engine: a generalized decision intelligence engine. It supports various Deep RL algorithms</a></li>
<li><a href="https://github.com/opendilab/PPOxFamily">PPO x Family: Course in Chinese for Deep RL</a></li>
</ul>
<h2 id="unsupervised">Unsupervised</h2>
<ul>
<li><a href="https://arxiv.org/abs/2110.15191">URLB: Unsupervised
Reinforcement Learning Benchmark</a> 28 Oct 2021</li>
<li><a href="https://arxiv.org/abs/2108.13956">APS: Active Pretraining
with Successor Feature</a> 31 Aug 2021</li>
<li><a href="https://arxiv.org/abs/2103.04551">Behavior From the Void:
Unsupervised Active Pre-Training</a> 8 Mar 2021</li>
<li><a href="https://arxiv.org/abs/2102.11271">Reinforcement Learning
with Prototypical Representations</a> 22 Feb 2021</li>
<li><a href="https://arxiv.org/abs/1906.05274">Efficient Exploration via
State Marginal Matching</a> 12 Jun 2019</li>
<li><a href="https://arxiv.org/abs/1906.04161">Self-Supervised
Exploration via Disagreement</a> 10 Jun 2019</li>
<li><a href="https://arxiv.org/abs/1810.12894">Exploration by Random
Network Distillation</a> 30 Oct 2018</li>
<li><a href="https://arxiv.org/abs/1802.06070">Diversity is All You
Need: Learning Skills without a Reward Function</a> 16 Feb 2018</li>
<li><a href="https://arxiv.org/pdf/1705.05363">Curiosity-driven
Exploration by Self-supervised Prediction</a> 15 May 2017</li>
</ul>
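<p>Several of the entries above, most directly Exploration by Random
Network Distillation, reward the agent for visiting novel states by
measuring how poorly a trained predictor matches a fixed, randomly
initialized target network. The sketch below is a minimal NumPy
illustration of that idea; the network shapes, nonlinearity, and
learning rate are illustrative assumptions.</p>
<pre class="python"><code>import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, FEAT_DIM, LR = 8, 16, 1e-2

# Fixed random target network (never trained) and a learned predictor.
W_target = rng.normal(size=(OBS_DIM, FEAT_DIM))
W_pred = rng.normal(size=(OBS_DIM, FEAT_DIM))

def intrinsic_reward(obs):
    """Prediction error of the predictor against the frozen target."""
    target = np.tanh(obs @ W_target)
    pred = np.tanh(obs @ W_pred)
    return float(np.mean((pred - target) ** 2))

def update_predictor(obs):
    """One gradient step pulling the predictor toward the target features."""
    target = np.tanh(obs @ W_target)
    pred = np.tanh(obs @ W_pred)
    grad_pre = 2.0 * (pred - target) * (1.0 - pred ** 2) / FEAT_DIM
    W_pred[...] -= LR * np.outer(obs, grad_pre)

obs = rng.normal(size=OBS_DIM)
novel_bonus = intrinsic_reward(obs)
for _ in range(200):
    update_predictor(obs)
print(novel_bonus, "decays to", intrinsic_reward(obs))
</code></pre>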
<h2 id="offline">Offline</h2>
<ul>
<li><a href="https://arxiv.org/abs/2102.06961">PerSim: Data-efficient
Offline Reinforcement Learning with Heterogeneous Agents via
Personalized Simulators</a> 10 Nov 2021</li>
<li><a href="">A General Offline Reinforcement Learning Framework for
Interactive Recommendation</a> AAAI 2021</li>
</ul>
<h2 id="value-based">Value based</h2>
<ul>
<li><a href="SVRL.md">Harnessing Structures for Value-Based Planning and
Reinforcement Learning</a> 5 Feb 2020 <a
href="https://arxiv.org/abs/1909.12255">arxiv</a> | <a
href="https://github.com/YyzHarry/SV-RL">code</a></li>
<li><a href="RVF.md">Recurrent Value Functions</a> 23 May 2019 <a
href="https://arxiv.org/pdf/1905.09562.pdf">arxiv</a></li>
<li><a href="LipschitzQ.md">Stochastic Lipschitz Q-Learning</a> 24 Apr
2019 <a href="https://arxiv.org/pdf/1904.10653.pdf">arxiv</a></li>
<li><a href="https://arxiv.org/pdf/1710.11417">TreeQN and ATreeC:
Differentiable Tree-Structured Models for Deep Reinforcement
Learning</a> 8 Mar 2018</li>
<li><a href="https://arxiv.org/pdf/1803.00933.pdf">DISTRIBUTED
PRIORITIZED EXPERIENCE REPLAY</a> 2 Mar 2018</li>
<li><a href="Rainbow.md">Rainbow: Combining Improvements in Deep
Reinforcement Learning</a> 6 Oct 2017</li>
<li><a href="DQfD.md">Learning from Demonstrations for Real World
Reinforcement Learning</a> 12 Apr 2017</li>
<li><a href="Dueling.md">Dueling Network Architecture</a></li>
<li><a href="DDQN.md">Double DQN</a></li>
<li><a href="PER.md">Prioritized Experience</a></li>
<li><a href="DQN.md">Deep Q-Networks</a></li>
</ul>
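<p>For orientation, the value-based entries above build on the one-step
temporal-difference target behind DQN, with Double DQN decoupling action
selection (online network) from action evaluation (target network).
Below is a minimal tabular sketch of both targets; the table sizes and
discount factor are illustrative assumptions.</p>
<pre class="python"><code>import numpy as np

GAMMA = 0.99
rng = np.random.default_rng(0)

n_states, n_actions = 5, 3
q_online = rng.normal(size=(n_states, n_actions))   # trained network (here: a table)
q_target = rng.normal(size=(n_states, n_actions))   # periodically copied target network

def dqn_target(reward, next_state, done):
    """Standard DQN target: max over the target network's Q-values."""
    bootstrap = 0.0 if done else np.max(q_target[next_state])
    return reward + GAMMA * bootstrap

def double_dqn_target(reward, next_state, done):
    """Double DQN target: online net picks the action, target net evaluates it."""
    if done:
        return reward
    a_star = int(np.argmax(q_online[next_state]))
    return reward + GAMMA * q_target[next_state, a_star]

print(dqn_target(1.0, next_state=2, done=False))
print(double_dqn_target(1.0, next_state=2, done=False))
</code></pre>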
<h2 id="policy-gradient">Policy gradient</h2>
<ul>
<li><a href="PPG.md">Phasic Policy Gradient</a> 9 Sep 2020 <a
href="https://arxiv.org/pdf/2009.04416.pdf">arxiv</a> <a
href="https://github.com/openai/phasic-policy-gradient">code</a></li>
<li><a href="OVPG.md">An operator view of policy gradient methods</a> 22
Jun 2020 <a href="https://arxiv.org/pdf/2006.11266.pdf">arxiv</a></li>
<li><a href="DirPG.md">Direct Policy Gradients: Direct Optimization of
Policies in Discrete Action Spaces</a> 14 Jun 2019 <a
href="https://arxiv.org/pdf/1906.06062.pdf">arxiv</a></li>
<li><a href="PGS.md">Policy Gradient Search: Online Planning and Expert
Iteration without Search Trees</a> 7 Apr 2019 <a
href="https://arxiv.org/pdf/1904.03646.pdf">arxiv</a></li>
<li><a href="SPU.md">SUPERVISED POLICY UPDATE FOR DEEP REINFORCEMENT
LEARNING</a> 24 Dec 2018 <a
href="https://arxiv.org/pdf/1805.11706v4.pdf">arxiv</a></li>
<li><a href="PPO-CMA.md">PPO-CMA: Proximal Policy Optimization with
Covariance Matrix Adaptation</a> 5 Oct 2018 <a
href="https://arxiv.org/pdf/1810.02541v6.pdf">arxiv</a></li>
<li><a href="CAPG.md">Clipped Action Policy Gradient</a> 22 June
2018</li>
<li><a href="EPG.md">Expected Policy Gradients for Reinforcement
Learning</a> 10 Jan 2018</li>
<li><a href="PPO.md">Proximal Policy Optimization Algorithms</a> 20 July
2017</li>
<li><a href="DPPO.md">Emergence of Locomotion Behaviours in Rich
Environments</a> 7 July 2017</li>
<li><a href="IPG.md">Interpolated Policy Gradient: Merging On-Policy and
Off-Policy Gradient Estimation for Deep Reinforcement Learning</a> 1 Jun
2017</li>
<li><a href="PGSQL.md">Equivalence Between Policy Gradients and Soft
Q-Learning</a></li>
<li><a href="TRPO.md">Trust Region Policy Optimization</a></li>
<li><a href="DEBP.md">Reinforcement Learning with Deep Energy-Based
Policies</a></li>
<li><a href="QPROP.md">Q-PROP: SAMPLE-EFFICIENT POLICY GRADIENT WITH AN
OFF-POLICY CRITIC</a></li>
</ul>
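<p>Many of the methods above (PPO, PPO-CMA, PPG) revolve around the same
clipped surrogate objective. The snippet below is a minimal NumPy sketch
of that loss for a batch of transitions; the clip range and the made-up
log-probabilities and advantages are illustrative assumptions.</p>
<pre class="python"><code>import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Negative clipped surrogate objective, averaged over the batch."""
    ratio = np.exp(logp_new - logp_old)            # pi_new(a|s) / pi_old(a|s)
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    # Take the pessimistic (minimum) of the clipped and unclipped objectives.
    surrogate = np.minimum(ratio * advantages, clipped * advantages)
    return -float(np.mean(surrogate))

# Toy batch: log-probs under the new and old policies plus advantage estimates.
logp_new = np.array([-0.9, -1.2, -0.3])
logp_old = np.array([-1.0, -1.0, -1.0])
advantages = np.array([1.5, -0.5, 0.7])
print(ppo_clip_loss(logp_new, logp_old, advantages))
</code></pre>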
<h2 id="explorations">Explorations</h2>
<ul>
<li><a href="EDDICT.md">Entropic Desired Dynamics for Intrinsic
Control</a> 2021 <a
href="https://openreview.net/pdf?id=lBSSxTgXmiK">openreview</a></li>
<li><a href="Disagreement.md">Self-Supervised Exploration via
Disagreement</a> 10 Jun 2019 <a
href="https://arxiv.org/pdf/1906.04161.pdf">arxiv</a></li>
<li><a href="MBIE-EB.md">Approximate Exploration through State
Abstraction</a> 24 Jan 2019</li>
<li><a href="UBE.md">The Uncertainty Bellman Equation and
Exploration</a> 15 Sep 2017</li>
<li><a href="NoisyNet.md">Noisy Networks for Exploration</a> 30 Jun 2017
<a
href="https://github.com/Kaixhin/NoisyNet-A3C">implementation</a></li>
<li><a href="PhiEB.md">Count-Based Exploration in Feature Space for
Reinforcement Learning</a> 25 Jun 2017</li>
<li><a href="NDM.md">Count-Based Exploration with Neural Density
Models</a> 14 Jun 2017</li>
<li><a href="QEnsemble.md">UCB and InfoGain Exploration via
Q-Ensembles</a> 11 Jun 2017</li>
<li><a href="MMRB.md">Minimax Regret Bounds for Reinforcement
Learning</a> 16 Mar 2017</li>
<li><a href="incentivizing.md">Incentivizing Exploration In
Reinforcement Learning With Deep Predictive Models</a></li>
<li><a href="EX2.md">EX2: Exploration with Exemplar Models for Deep
Reinforcement Learning</a></li>
</ul>
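<p>Several entries above (MBIE-EB-style abstraction, count-based
exploration with density models or feature counts) add to the task
reward a bonus that shrinks as a state's visit count grows. Here is a
minimal tabular sketch of such a bonus; the bonus coefficient and the
toy state are illustrative assumptions.</p>
<pre class="python"><code>from collections import defaultdict
import math

visit_counts = defaultdict(int)

def count_based_reward(state, extrinsic_reward, beta=0.1):
    """Extrinsic reward plus an MBIE-EB-style bonus beta / sqrt(N(s))."""
    visit_counts[state] += 1
    bonus = beta / math.sqrt(visit_counts[state])
    return extrinsic_reward + bonus

# The bonus decays as the same state is revisited.
for _ in range(3):
    print(count_based_reward(state=(0, 0), extrinsic_reward=0.0))
</code></pre>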
<h2 id="actor-critic">Actor-Critic</h2>
<ul>
<li><a href="Geoff-PAC.md">Generalized Off-Policy Actor-Critic</a> 27
Mar 2019</li>
<li><a href="https://arxiv.org/pdf/1812.05905.pdf">Soft Actor-Critic
Algorithms and Applications</a> 29 Jan 2019</li>
<li><a href="REACTOR.md">The Reactor: A Sample-Efficient Actor-Critic
Architecture</a> 15 Apr 2017</li>
<li><a href="ACER.md">SAMPLE EFFICIENT ACTOR-CRITIC WITH EXPERIENCE
REPLAY</a></li>
<li><a href="UNREAL.md">REINFORCEMENT LEARNING WITH UNSUPERVISED
AUXILIARY TASKS</a></li>
<li><a href="DDPG.md">Continuous control with deep reinforcement
learning</a></li>
</ul>
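<p>The actor-critic entries above share a common recipe: a critic
estimates values, the actor is updated with advantage-weighted
log-probabilities, and entropy or auxiliary terms regularize the loss
(as in UNREAL). The sketch below shows a one-step advantage actor-critic
loss for a single transition; the loss coefficients and toy numbers are
illustrative assumptions.</p>
<pre class="python"><code>import numpy as np

GAMMA, VALUE_COEF, ENTROPY_COEF = 0.99, 0.5, 0.01

def a2c_losses(logp_action, probs, value, reward, next_value, done):
    """One-step advantage actor-critic loss terms for a single transition."""
    bootstrap = 0.0 if done else GAMMA * next_value
    td_target = reward + bootstrap
    advantage = td_target - value
    policy_loss = -logp_action * advantage            # actor: advantage-weighted log-prob
    value_loss = 0.5 * (td_target - value) ** 2       # critic: squared TD error
    entropy = -float(np.sum(probs * np.log(probs)))   # encourages exploration
    total = policy_loss + VALUE_COEF * value_loss - ENTROPY_COEF * entropy
    return total, policy_loss, value_loss, entropy

probs = np.array([0.7, 0.2, 0.1])
print(a2c_losses(np.log(probs[0]), probs, value=0.4, reward=1.0,
                 next_value=0.6, done=False))
</code></pre>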
<h2 id="model-based">Model-based</h2>
<ul>
<li><a href="sc.md">Self-Consistent Models and Values</a> 25 Oct 2021 <a
href="https://arxiv.org/pdf/2110.12840.pdf">arxiv</a></li>
<li><a href="parametric.md">When to use parametric models in
reinforcement learning?</a> 12 Jun 2019 <a
href="https://arxiv.org/pdf/1906.05243.pdf">arxiv</a></li>
<li><a href="https://arxiv.org/pdf/1903.00374.pdf">Model Based
Reinforcement Learning for Atari</a> 5 Mar 2019</li>
<li><a href="MBDQN.md">Model-Based Stabilisation of Deep Reinforcement
Learning</a> 6 Sep 2018</li>
<li><a href="IBP.md">Learning model-based planning from scratch</a> 19
July 2017</li>
</ul>
<h2 id="model-free-model-based">Model-free + Model-based</h2>
<ul>
<li><a href="I2As.md">Imagination-Augmented Agents for Deep
Reinforcement Learning</a> 19 July 2017</li>
</ul>
<h2 id="hierarchical">Hierarchical</h2>
<ul>
<li><a href="HIRO.md">WHY DOES HIERARCHY (SOMETIMES) WORK SO WELL IN
REINFORCEMENT LEARNING?</a> 23 Sep 2019 <a
href="https://arxiv.org/pdf/1909.10618.pdf">arxiv</a></li>
<li><a href="HAL.md">Language as an Abstraction for Hierarchical Deep
Reinforcement Learning</a> 18 Jun 2019 <a
href="https://arxiv.org/pdf/1906.07343.pdf">arxiv</a></li>
</ul>
<h2 id="option">Option</h2>
<ul>
<li><a href="VALOR.md">Variational Option Discovery Algorithms</a> 26
July 2018</li>
<li><a href="LFOD.md">A Laplacian Framework for Option Discovery in
Reinforcement Learning</a> 16 Jun 2017</li>
</ul>
<h2 id="connection-with-other-methods">Connection with other
methods</h2>
<ul>
<li><a href="GVG.md">Robust Imitation of Diverse Behaviors</a></li>
<li><a href="GAIL.md">Learning human behaviors from motion capture by
adversarial imitation</a></li>
<li><a href="GANAC.md">Connecting Generative Adversarial Networks and
Actor-Critic Methods</a></li>
</ul>
<h2 id="connecting-value-and-policy-methods">Connecting value and policy
methods</h2>
<ul>
<li><a href="PCL.md">Bridging the Gap Between Value and Policy Based
Reinforcement Learning</a></li>
<li><a href="PGQ.md">Policy gradient and Q-learning</a></li>
</ul>
<h2 id="reward-design">Reward design</h2>
<ul>
<li><a href="VICE.md">End-to-End Robotic Reinforcement Learning without
Reward Engineering</a> 16 Apr 2019 <a
href="https://arxiv.org/pdf/1904.07854.pdf">arxiv</a></li>
<li><a href="RLCRC.md">Reinforcement Learning with Corrupted Reward
Channel</a> 23 May 2017</li>
</ul>
<h2 id="unifying">Unifying</h2>
<ul>
<li><a href="MSRL.md">Multi-step Reinforcement Learning: A Unifying
Algorithm</a></li>
</ul>
<h2 id="faster-drl">Faster DRL</h2>
<ul>
<li><a href="NEC.md">Neural Episodic Control</a></li>
</ul>
<h2 id="multi-agent">Multi-agent</h2>
<ul>
<li><a href="Dip.md">No Press Diplomacy: Modeling Multi-Agent
Gameplay</a> 4 Sep 2019 <a
href="https://arxiv.org/pdf/1909.02128.pdf">arxiv</a></li>
<li><a href="OPRE">Options as responses: Grounding behavioural
hierarchies in multi-agent RL</a> 6 Jun 2019 <a
href="https://arxiv.org/pdf/1906.01470.pdf">arxiv</a></li>
<li><a href="MERL.md">Evolutionary Reinforcement Learning for
Sample-Efficient Multiagent Coordination</a> 18 Jun 2019 <a
href="https://arxiv.org/pdf/1906.07315.pdf">arxiv</a></li>
<li><a href="ROMMEO.md">A Regularized Opponent Model with Maximum
Entropy Objective</a> 17 May 2019 <a
href="https://arxiv.org/pdf/1905.08087.pdf">arxiv</a></li>
<li><a href="NashDQN.md">Deep Q-Learning for Nash Equilibria:
Nash-DQN</a> 23 Apr 2019 <a
href="https://arxiv.org/pdf/1904.10554.pdf">arxiv</a></li>
<li><a href="MRL.md">Malthusian Reinforcement Learning</a> 3 Mar 2019 <a
href="https://arxiv.org/pdf/1812.07019.pdf">arxiv</a></li>
<li><a href="bad.md">Bayesian Action Decoder for Deep Multi-Agent
Reinforcement Learning</a> 4 Nov 2018</li>
<li><a href="ISMCI.md">INTRINSIC SOCIAL MOTIVATION VIA CAUSAL INFLUENCE
IN MULTI-AGENT RL</a> 19 Oct 2018</li>
<li><a
href="http://www.cs.ox.ac.uk/people/shimon.whiteson/pubs/rashidicml18.pdf">QMIX:
Monotonic Value Function Factorisation for Deep Multi-Agent
Reinforcement Learning</a> 30 Mar 2018</li>
<li><a href="SOM.md">Modeling Others using Oneself in Multi-Agent
Reinforcement Learning</a> 26 Feb 2018</li>
<li><a href="SGA.md">The Mechanics of n-Player Differentiable Games</a>
15 Feb 2018</li>
<li><a href="RoboSumo.md">Continuous Adaptation via Meta-Learning in
Nonstationary and Competitive Environments</a> 10 Oct 2017</li>
<li><a href="LOLA.md">Learning with Opponent-Learning Awareness</a> 13
Sep 2017</li>
<li><a href="COMA.md">Counterfactual Multi-Agent Policy
Gradients</a></li>
<li><a href="MADDPG.md">Multi-Agent Actor-Critic for Mixed
Cooperative-Competitive Environments</a> 7 Jun 2017</li>
<li><a href="BiCNet.md">Multiagent Bidirectionally-Coordinated Nets for
Learning to Play StarCraft Combat Games</a> 29 Mar 2017</li>
</ul>
<h2 id="new-design">New design</h2>
<ul>
<li><a href="https://arxiv.org/pdf/1802.01561.pdf">IMPALA: Scalable
Distributed Deep-RL with Importance Weighted Actor-Learner
Architectures</a> 9 Feb 2018</li>
<li><a href="RECUR.md">Reverse Curriculum Generation for Reinforcement
Learning</a></li>
<li><a href="HIRL.md">Trial without Error: Towards Safe Reinforcement
Learning via Human Intervention</a></li>
<li><a href="DualMDP.md">Learning to Design Games: Strategic
Environments in Deep Reinforcement Learning</a> 5 July 2017</li>
</ul>
<h2 id="multitask">Multitask</h2>
<ul>
<li><a href="https://arxiv.org/pdf/1803.03835.pdf">Kickstarting Deep
Reinforcement Learning</a> 10 Mar 2018</li>
<li><a href="ZSTG.md">Zero-Shot Task Generalization with Multi-Task Deep
Reinforcement Learning</a> 7 Nov 2017</li>
<li><a href="Distral.md">Distral: Robust Multitask Reinforcement
Learning</a> 13 July 2017</li>
</ul>
<h2 id="observational-learning">Observational Learning</h2>
<ul>
<li><a href="OLRL.md">Observational Learning by Reinforcement
Learning</a> 20 Jun 2017</li>
</ul>
<h2 id="meta-learning">Meta Learning</h2>
<ul>
<li><a href="GVF.md">Discovery of Useful Questions as Auxiliary
Tasks</a> 10 Sep 2019 <a
href="https://arxiv.org/pdf/1909.04607.pdf">arxiv</a></li>
<li><a href="MetaSS.md">Meta-learning of Sequential Strategies</a> 8 May
2019 <a href="https://arxiv.org/pdf/1905.03030.pdf">arxiv</a></li>
<li><a href="PEARL.md">Efficient Off-Policy Meta-Reinforcement Learning
via Probabilistic Context Variables</a> 19 Mar 2019 <a
href="https://arxiv.org/pdf/1903.08254.pdf">arxiv</a></li>
<li><a href="E2.md">Some Considerations on Learning to Explore via
Meta-Reinforcement Learning</a> 11 Jan 2019 <a
href="https://arxiv.org/pdf/1803.01118.pdf">arxiv</a></li>
<li><a href="MGRL.md">Meta-Gradient Reinforcement Learning</a> 24 May
2018 <a href="https://arxiv.org/pdf/1805.09801.pdf">arxiv</a></li>
<li><a href="ProMP.md">ProMP: Proximal Meta-Policy Search</a> 16 Oct
2018 <a href="https://arxiv.org/pdf/1810.06784">arxiv</a></li>
<li><a href="UML.md">Unsupervised Meta-Learning for Reinforcement
Learning</a> 12 Jun 2018</li>
</ul>
<h2 id="distributional">Distributional</h2>
<ul>
<li><a href="GANQL.md">GAN Q-learning</a> 20 July 2018</li>
<li><a href="IQN.md">Implicit Quantile Networks for Distributional
Reinforcement Learning</a> 14 Jun 2018</li>
<li><a href="GTD.md">Nonlinear Distributional Gradient
Temporal-Difference Learning</a> 20 May 2018</li>
<li><a href="D4PG.md">DISTRIBUTED DISTRIBUTIONAL DETERMINISTIC POLICY
GRADIENTS</a> 23 Apr 2018</li>
<li><a href="C51-analysis.md">An Analysis of Categorical Distributional
Reinforcement Learning</a> 22 Feb 2018</li>
<li><a href="QR-DQN.md">Distributional Reinforcement Learning with
Quantile Regression</a> 27 Oct 2017</li>
<li><a href="C51.md">A Distributional Perspective on Reinforcement
Learning</a> 21 July 2017</li>
</ul>
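<p>The distributional entries above replace a scalar value estimate with
a full return distribution: C51 projects onto a fixed categorical
support, while QR-DQN regresses a set of quantiles with an asymmetric
Huber loss. Below is a minimal NumPy sketch of that quantile-regression
Huber loss; the number of quantiles, kappa, and the toy quantile values
are illustrative assumptions.</p>
<pre class="python"><code>import numpy as np

def quantile_huber_loss(pred_quantiles, target_quantiles, kappa=1.0):
    """QR-DQN-style loss between predicted and target return quantiles."""
    n = len(pred_quantiles)
    tau_hat = (np.arange(n) + 0.5) / n                    # quantile midpoints
    # Pairwise TD errors u[i, j] = target_j - prediction_i.
    u = target_quantiles[None, :] - pred_quantiles[:, None]
    huber = np.where(kappa &gt;= np.abs(u),
                     0.5 * u ** 2,
                     kappa * (np.abs(u) - 0.5 * kappa))
    # Asymmetric weight: 1.0 where the TD error u is negative, else 0.0.
    indicator = (0.0 &gt; u).astype(float)
    weight = np.abs(tau_hat[:, None] - indicator)
    return float(np.mean(np.sum(weight * huber / kappa, axis=0)))

pred = np.array([0.0, 0.5, 1.0, 1.5])       # predicted quantiles for one (s, a)
target = np.array([0.2, 0.7, 1.1, 1.6])     # Bellman-updated target quantiles
print(quantile_huber_loss(pred, target))
</code></pre>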
<h2 id="planning">Planning</h2>
<ul>
<li><a href="SoRB.md">Search on the Replay Buffer: Bridging Planning and
Reinforcement Learning</a> 12 June 2019 <a
href="https://arxiv.org/pdf/1906.05253.pdf">arxiv</a></li>
</ul>
<h2 id="safety">Safety</h2>
<ul>
<li><a href="MPO.md">Robust Reinforcement Learning for Continuous
Control with Model Misspecification</a> 18 Jun 2019 <a
href="https://arxiv.org/pdf/1906.07516.pdf">arxiv</a></li>
<li><a href="Viper.md">Verifiable Reinforcement Learning via Policy
Extraction</a> 22 May 2018 <a
href="https://arxiv.org/pdf/1805.08328.pdf">arxiv</a></li>
</ul>
<h2 id="inverse-rl">Inverse RL</h2>
<ul>
<li><a href="OP-GAIL.md">ADDRESSING SAMPLE INEFFICIENCY AND REWARD BIAS
IN INVERSE REINFORCEMENT LEARNING</a> 9 Sep 2018</li>
</ul>
<h2 id="no-reward-rl">No reward RL</h2>
<ul>
<li><a href="VISR.md">Fast Task Inference with Variational Intrinsic
Successor Features</a> 2 Jun 2019 <a
href="https://arxiv.org/pdf/1906.05030.pdf">arxiv</a></li>
<li><a href="https://arxiv.org/pdf/1705.05363">Curiosity-driven
Exploration by Self-supervised Prediction</a> 15 May 2017</li>
</ul>
<h2 id="time">Time</h2>
<ul>
<li><a href="Intervaltime.md">Interval timing in deep reinforcement
learning agents</a> 31 May 2019 <a
href="https://arxiv.org/pdf/1905.13469.pdf">arxiv</a></li>
<li><a href="PEB.md">Time Limits in Reinforcement Learning</a></li>
</ul>
<h2 id="adversarial-learning">Adversarial learning</h2>
<ul>
<li><a href="LQR+GAIfO.md">Sample-efficient Adversarial Imitation
Learning from Observation</a> 18 Jun 2019 <a
href="https://arxiv.org/pdf/1906.07374.pdf">arxiv</a></li>
</ul>
<h2 id="use-natural-language">Use Natural Language</h2>
<ul>
<li><a href="LEARN.md">Using Natural Language for Reward Shaping in
Reinforcement Learning</a> 31 May 2019 <a
href="https://www.cs.utexas.edu/~ai-lab/downloadPublication.php?filename=http://www.cs.utexas.edu/users/ml/papers/goyal.ijcai19.pdf&amp;pubid=127757">arxiv</a></li>
</ul>
<h2 id="generative-and-contrastive-representation-learning">Generative
and contrastive representation learning</h2>
<ul>
<li><a href="ST-DIM.md">Unsupervised State Representation Learning in
Atari</a> 19 Jun 2019 <a
href="https://arxiv.org/pdf/1906.08226.pdf">arxiv</a></li>
</ul>
<h2 id="belief">Belief</h2>
<ul>
<li><a href="GenerativeBelief.md">Shaping Belief States with Generative
Environment Models for RL</a> 24 Jun 2019 <a
href="https://arxiv.org/pdf/1906.09237v2.pdf">arxiv</a></li>
</ul>
<h2 id="pac">PAC</h2>
<ul>
<li><a href="COF-PAC.md">Provably Convergent Off-Policy Actor-Critic
with Function Approximation</a> 11 Nov 2019 <a
href="https://arxiv.org/pdf/1911.04384.pdf">arxiv</a></li>
</ul>
<h2 id="applications">Applications</h2>
<ul>
<li><a href="bdope.md">Benchmarks for Deep Off-Policy Evaluation</a> 30
Mar 2021 <a href="https://arxiv.org/pdf/2103.16596.pdf">arxiv</a></li>
<li><a href="Reciprocity.md">Learning Reciprocity in Complex Sequential
Social Dilemmas</a> 19 Mar 2019 <a
href="https://arxiv.org/pdf/1903.08082.pdf">arxiv</a></li>
<li><a href="dmimic.md">DeepMimic: Example-Guided Deep Reinforcement
Learning of Physics-Based Character Skills</a> 9 Apr 2018</li>
<li><a href="RLTUNER.md">TUNING RECURRENT NEURAL NETWORKS WITH
REINFORCEMENT LEARNING</a></li>
</ul>
<p><a href="https://github.com/tigerneil/awesome-deep-rl">deeprl.md
Github</a></p>