# Awesome Deep Reinforcement Learning

> **Mar 1 2024 update: HILP added**
>
> **July 2022 update: EDDICT added**
>
> **Mar 2022 update: a few papers released in early 2022**
>
> **Dec 2021 update: Unsupervised RL**

## Introduction to awesome drl

Reinforcement learning is a fundamental framework for building AGI, so this awesome DRL project collects important contributions to the field.

## Landscape of Deep RL

![updated Landscape of **DRL**](images/awesome-drl.png)

## Content

- [Awesome Deep Reinforcement Learning](#awesome-deep-reinforcement-learning)
  - [Introduction to awesome drl](#introduction-to-awesome-drl)
  - [Landscape of Deep RL](#landscape-of-deep-rl)
  - [Content](#content)
  - [General guidances](#general-guidances)
  - [2024](#2024)
  - [2022](#2022)
  - [Generalist policies](#generalist-policies)
  - [Foundations and theory](#foundations-and-theory)
  - [General benchmark frameworks](#general-benchmark-frameworks)
  - [Unsupervised](#unsupervised)
  - [Offline](#offline)
  - [Value based](#value-based)
  - [Policy gradient](#policy-gradient)
  - [Explorations](#explorations)
  - [Actor-Critic](#actor-critic)
  - [Model-based](#model-based)
  - [Model-free + Model-based](#model-free--model-based)
  - [Hierarchical](#hierarchical)
  - [Option](#option)
  - [Connection with other methods](#connection-with-other-methods)
  - [Connecting value and policy methods](#connecting-value-and-policy-methods)
  - [Reward design](#reward-design)
  - [Unifying](#unifying)
  - [Faster DRL](#faster-drl)
  - [Multi-agent](#multi-agent)
  - [New design](#new-design)
  - [Multitask](#multitask)
  - [Observational Learning](#observational-learning)
  - [Meta Learning](#meta-learning)
  - [Distributional](#distributional)
  - [Planning](#planning)
  - [Safety](#safety)
  - [Inverse RL](#inverse-rl)
  - [No reward RL](#no-reward-rl)
  - [Time](#time)
  - [Adversarial learning](#adversarial-learning)
  - [Use Natural Language](#use-natural-language)
  - [Generative and contrastive representation learning](#generative-and-contrastive-representation-learning)
  - [Belief](#belief)
  - [PAC](#pac)
  - [Applications](#applications)

Illustrations:

![](images/ACER.png)

**Recommendations and suggestions are welcome**.

## General guidances

* [Awesome Offline RL](https://github.com/hanjuku-kaso/awesome-offline-rl)
* [Reinforcement Learning Today](http://reinforcementlearning.today/)
* [Multiagent Reinforcement Learning by Marc Lanctot RLSS @ Lille](http://mlanctot.info/files/papers/Lanctot_MARL_RLSS2019_Lille.pdf) 11 July 2019
* [RLDM 2019 Notes by David Abel](https://david-abel.github.io/notes/rldm_2019.pdf) 11 July 2019
* [A Survey of Reinforcement Learning Informed by Natural Language](RLNL.md) 10 Jun 2019 [arxiv](https://arxiv.org/pdf/1906.03926.pdf)
* [Challenges of Real-World Reinforcement Learning](ChallengesRealWorldRL.md) 29 Apr 2019 [arxiv](https://arxiv.org/pdf/1904.12901.pdf)
* [Ray Interference: a Source of Plateaus in Deep Reinforcement Learning](RayInterference.md) 25 Apr 2019 [arxiv](https://arxiv.org/pdf/1904.11455.pdf)
* [Principles of Deep RL by David Silver](p10.md)
* [University AI's general introduction to deep RL (in Chinese)](https://www.jianshu.com/p/dfd987aa765a)
* [OpenAI's Spinning Up](https://spinningup.openai.com/en/latest/)
* [The Promise of Hierarchical Reinforcement Learning](https://thegradient.pub/the-promise-of-hierarchical-reinforcement-learning/) 9 Mar 2019
* [Deep Reinforcement Learning that Matters](reproducing.md) 30 Jan 2019 [arxiv](https://arxiv.org/pdf/1709.06560.pdf)

## 2024

* [Foundation Policies with Hilbert Representations](HILP.md) [arxiv](https://arxiv.org/abs/2402.15567) [repo](https://github.com/seohongpark/HILP) 23 Feb 2024

## 2022

* Reinforcement Learning with Action-Free Pre-Training from Videos [arxiv](https://arxiv.org/abs/2203.13880) [repo](https://github.com/younggyoseo/apv)

## Generalist policies

* [Foundation Policies with Hilbert Representations](HILP.md) [arxiv](https://arxiv.org/abs/2402.15567) [repo](https://github.com/seohongpark/HILP) 23 Feb 2024

## Foundations and theory

* [General non-linear Bellman equations](GNLBE.md) 9 July 2019 [arxiv](https://arxiv.org/pdf/1907.07331.pdf)
* [Monte Carlo Gradient Estimation in Machine Learning](MCGE.md) 25 Jun 2019 [arxiv](https://arxiv.org/pdf/1906.10652.pdf)

## General benchmark frameworks

* [Brax](https://github.com/google/brax/)
  ![](https://github.com/google/brax/raw/main/docs/img/fetch.gif)
* [Android-Env](https://github.com/deepmind/android_env)
  ![](https://github.com/deepmind/android_env/raw/main/docs/images/device_control.gif)
* [MuJoCo](http://mujoco.org/) | [MuJoCo Chinese version](https://github.com/tigerneil/mujoco-zh)
* [Unsupervised RL Benchmark](https://github.com/rll-research/url_benchmark)
* [Datasets for Offline RL (D4RL)](https://github.com/rail-berkeley/d4rl)
* [Spriteworld: a flexible, configurable Python-based reinforcement learning environment](https://github.com/deepmind/spriteworld)
* [ChainerRL Visualizer](https://github.com/chainer/chainerrl-visualizer)
* [Behaviour Suite for Reinforcement Learning](BSRL.md) 13 Aug 2019 [arxiv](https://arxiv.org/pdf/1908.03568.pdf) | [code](https://github.com/deepmind/bsuite)
* [Quantifying Generalization in Reinforcement Learning](Coinrun.md) 20 Dec 2018 [arxiv](https://arxiv.org/pdf/1812.02341.pdf)
* [S-RL Toolbox: Environments, Datasets and Evaluation Metrics for State Representation Learning](SRL.md) 25 Sept 2018
* [Dopamine](https://github.com/google/dopamine)
* [StarCraft II](https://github.com/deepmind/pysc2)
* [TRFL](https://github.com/deepmind/trfl)
* [ChainerRL](https://github.com/chainer/chainerrl)
* [PARL](https://github.com/PaddlePaddle/PARL)
* [DI-engine: a generalized decision intelligence engine supporting various deep RL algorithms](https://github.com/opendilab/DI-engine)
* [PPO x Family: Course in Chinese for Deep RL](https://github.com/opendilab/PPOxFamily)

## Unsupervised

* [URLB: Unsupervised Reinforcement Learning Benchmark](https://arxiv.org/abs/2110.15191) 28 Oct 2021
* [APS: Active Pretraining with Successor Features](https://arxiv.org/abs/2108.13956) 31 Aug 2021
* [Behavior From the Void: Unsupervised Active Pre-Training](https://arxiv.org/abs/2103.04551) 8 Mar 2021
* [Reinforcement Learning with Prototypical Representations](https://arxiv.org/abs/2102.11271) 22 Feb 2021
* [Efficient Exploration via State Marginal Matching](https://arxiv.org/abs/1906.05274) 12 Jun 2019
* [Self-Supervised Exploration via Disagreement](https://arxiv.org/abs/1906.04161) 10 Jun 2019
* [Exploration by Random Network Distillation](https://arxiv.org/abs/1810.12894) 30 Oct 2018
* [Diversity is All You Need: Learning Skills without a Reward Function](https://arxiv.org/abs/1802.06070) 16 Feb 2018
* [Curiosity-driven Exploration by Self-supervised Prediction](https://arxiv.org/pdf/1705.05363) 15 May 2017

## Offline

* [PerSim: Data-efficient Offline Reinforcement Learning with Heterogeneous Agents via Personalized Simulators](https://arxiv.org/abs/2102.06961) 10 Nov 2021
* [A General Offline Reinforcement Learning Framework for Interactive Recommendation]() AAAI 2021

## Value based

* [Harnessing Structures for Value-Based Planning and Reinforcement Learning](SVRL.md) 5 Feb 2020 [arxiv](https://arxiv.org/abs/1909.12255) | [code](https://github.com/YyzHarry/SV-RL)
* [Recurrent Value Functions](RVF.md) 23 May 2019 [arxiv](https://arxiv.org/pdf/1905.09562.pdf)
* [Stochastic Lipschitz Q-Learning](LipschitzQ.md) 24 Apr 2019 [arxiv](https://arxiv.org/pdf/1904.10653.pdf)
* [TreeQN and ATreeC: Differentiable Tree-Structured Models for Deep Reinforcement Learning](https://arxiv.org/pdf/1710.11417) 8 Mar 2018
* [DISTRIBUTED PRIORITIZED EXPERIENCE REPLAY](https://arxiv.org/pdf/1803.00933.pdf) 2 Mar 2018
* [Rainbow: Combining Improvements in Deep Reinforcement Learning](Rainbow.md) 6 Oct 2017
* [Learning from Demonstrations for Real World Reinforcement Learning](DQfD.md) 12 Apr 2017
* [Dueling Network Architecture](Dueling.md)
* [Double DQN](DDQN.md)
* [Prioritized Experience Replay](PER.md)
* [Deep Q-Networks](DQN.md)

## Policy gradient

* [Phasic Policy Gradient](PPG.md) 9 Sep 2020 [arxiv](https://arxiv.org/pdf/2009.04416.pdf) [code](https://github.com/openai/phasic-policy-gradient)
* [An operator view of policy gradient methods](OVPG.md) 22 Jun 2020 [arxiv](https://arxiv.org/pdf/2006.11266.pdf)
* [Direct Policy Gradients: Direct Optimization of Policies in Discrete Action Spaces](DirPG.md) 14 Jun 2019 [arxiv](https://arxiv.org/pdf/1906.06062.pdf)
* [Policy Gradient Search: Online Planning and Expert Iteration without Search Trees](PGS.md) 7 Apr 2019 [arxiv](https://arxiv.org/pdf/1904.03646.pdf)
* [SUPERVISED POLICY UPDATE FOR DEEP REINFORCEMENT LEARNING](SPU.md) 24 Dec 2018 [arxiv](https://arxiv.org/pdf/1805.11706v4.pdf)
* [PPO-CMA: Proximal Policy Optimization with Covariance Matrix Adaptation](PPO-CMA.md) 5 Oct 2018 [arxiv](https://arxiv.org/pdf/1810.02541v6.pdf)
* [Clipped Action Policy Gradient](CAPG.md) 22 June 2018
* [Expected Policy Gradients for Reinforcement Learning](EPG.md) 10 Jan 2018
* [Proximal Policy Optimization Algorithms](PPO.md) 20 July 2017
* [Emergence of Locomotion Behaviours in Rich Environments](DPPO.md) 7 July 2017
* [Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning](IPG.md) 1 Jun 2017
* [Equivalence Between Policy Gradients and Soft Q-Learning](PGSQL.md)
* [Trust Region Policy Optimization](TRPO.md)
* [Reinforcement Learning with Deep Energy-Based Policies](DEBP.md)
* [Q-PROP: SAMPLE-EFFICIENT POLICY GRADIENT WITH AN OFF-POLICY CRITIC](QPROP.md)

## Explorations

* [Entropic Desired Dynamics for Intrinsic Control](EDDICT.md) 2021 [openreview](https://openreview.net/pdf?id=lBSSxTgXmiK)
* [Self-Supervised Exploration via Disagreement](Disagreement.md) 10 Jun 2019 [arxiv](https://arxiv.org/pdf/1906.04161.pdf)
* [Approximate Exploration through State Abstraction](MBIE-EB.md) 24 Jan 2019
* [The Uncertainty Bellman Equation and Exploration](UBE.md) 15 Sep 2017
* [Noisy Networks for Exploration](NoisyNet.md) 30 Jun 2017 [implementation](https://github.com/Kaixhin/NoisyNet-A3C)
* [Count-Based Exploration in Feature Space for Reinforcement Learning](PhiEB.md) 25 Jun 2017
* [Count-Based Exploration with Neural Density Models](NDM.md) 14 Jun 2017
* [UCB and InfoGain Exploration via Q-Ensembles](QEnsemble.md) 11 Jun 2017
* [Minimax Regret Bounds for Reinforcement Learning](MMRB.md) 16 Mar 2017
* [Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models](incentivizing.md)
* [EX2: Exploration with Exemplar Models for Deep Reinforcement Learning](EX2.md)

## Actor-Critic

* [Generalized Off-Policy Actor-Critic](Geoff-PAC.md) 27 Mar 2019
* [Soft Actor-Critic Algorithms and Applications](https://arxiv.org/pdf/1812.05905.pdf) 29 Jan 2019
* [The Reactor: A Sample-Efficient Actor-Critic Architecture](REACTOR.md) 15 Apr 2017
* [SAMPLE EFFICIENT ACTOR-CRITIC WITH EXPERIENCE REPLAY](ACER.md)
* [REINFORCEMENT LEARNING WITH UNSUPERVISED AUXILIARY TASKS](UNREAL.md)
* [Continuous control with deep reinforcement learning](DDPG.md)

## Model-based

* [Self-Consistent Models and Values](sc.md) 25 Oct 2021 [arxiv](https://arxiv.org/pdf/2110.12840.pdf)
* [When to use parametric models in reinforcement learning?](parametric.md) 12 Jun 2019 [arxiv](https://arxiv.org/pdf/1906.05243.pdf)
* [Model Based Reinforcement Learning for Atari](https://arxiv.org/pdf/1903.00374.pdf) 5 Mar 2019
* [Model-Based Stabilisation of Deep Reinforcement Learning](MBDQN.md) 6 Sep 2018
* [Learning model-based planning from scratch](IBP.md) 19 July 2017

## Model-free + Model-based

* [Imagination-Augmented Agents for Deep Reinforcement Learning](I2As.md) 19 July 2017

## Hierarchical

* [WHY DOES HIERARCHY (SOMETIMES) WORK SO WELL IN REINFORCEMENT LEARNING?](HIRO.md) 23 Sep 2019 [arxiv](https://arxiv.org/pdf/1909.10618.pdf)
* [Language as an Abstraction for Hierarchical Deep Reinforcement Learning](HAL.md) 18 Jun 2019 [arxiv](https://arxiv.org/pdf/1906.07343.pdf)

## Option

* [Variational Option Discovery Algorithms](VALOR.md) 26 July 2018
* [A Laplacian Framework for Option Discovery in Reinforcement Learning](LFOD.md) 16 Jun 2017

## Connection with other methods

* [Robust Imitation of Diverse Behaviors](GVG.md)
* [Learning human behaviors from motion capture by adversarial imitation](GAIL.md)
* [Connecting Generative Adversarial Networks and Actor-Critic Methods](GANAC.md)

## Connecting value and policy methods

* [Bridging the Gap Between Value and Policy Based Reinforcement Learning](PCL.md)
* [Policy gradient and Q-learning](PGQ.md)

## Reward design

* [End-to-End Robotic Reinforcement Learning without Reward Engineering](VICE.md) 16 Apr 2019 [arxiv](https://arxiv.org/pdf/1904.07854.pdf)
* [Reinforcement Learning with Corrupted Reward Channel](RLCRC.md) 23 May 2017

## Unifying

* [Multi-step Reinforcement Learning: A Unifying Algorithm](MSRL.md)

## Faster DRL

* [Neural Episodic Control](NEC.md)

## Multi-agent

* [No Press Diplomacy: Modeling Multi-Agent Gameplay](Dip.md) 4 Sep 2019 [arxiv](https://arxiv.org/pdf/1909.02128.pdf)
* [Options as responses: Grounding behavioural hierarchies in multi-agent RL](OPRE) 6 Jun 2019 [arxiv](https://arxiv.org/pdf/1906.01470.pdf)
* [Evolutionary Reinforcement Learning for Sample-Efficient Multiagent Coordination](MERL.md) 18 Jun 2019 [arxiv](https://arxiv.org/pdf/1906.07315.pdf)
* [A Regularized Opponent Model with Maximum Entropy Objective](ROMMEO.md) 17 May 2019 [arxiv](https://arxiv.org/pdf/1905.08087.pdf)
* [Deep Q-Learning for Nash Equilibria: Nash-DQN](NashDQN.md) 23 Apr 2019 [arxiv](https://arxiv.org/pdf/1904.10554.pdf)
* [Malthusian Reinforcement Learning](MRL.md) 3 Mar 2019 [arxiv](https://arxiv.org/pdf/1812.07019.pdf)
* [Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning](bad.md) 4 Nov 2018
* [INTRINSIC SOCIAL MOTIVATION VIA CAUSAL INFLUENCE IN MULTI-AGENT RL](ISMCI.md) 19 Oct 2018
* [QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning](http://www.cs.ox.ac.uk/people/shimon.whiteson/pubs/rashidicml18.pdf) 30 Mar 2018
* [Modeling Others using Oneself in Multi-Agent Reinforcement Learning](SOM.md) 26 Feb 2018
* [The Mechanics of n-Player Differentiable Games](SGA.md) 15 Feb 2018
* [Continuous Adaptation via Meta-Learning in Nonstationary and Competitive Environments](RoboSumo.md) 10 Oct 2017
* [Learning with Opponent-Learning Awareness](LOLA.md) 13 Sep 2017
* [Counterfactual Multi-Agent Policy Gradients](COMA.md)
* [Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments](MADDPG.md) 7 Jun 2017
* [Multiagent Bidirectionally-Coordinated Nets for Learning to Play StarCraft Combat Games](BiCNet.md) 29 Mar 2017

## New design

* [IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures](https://arxiv.org/pdf/1802.01561.pdf) 9 Feb 2018
* [Reverse Curriculum Generation for Reinforcement Learning](RECUR.md)
* [Trial without Error: Towards Safe Reinforcement Learning via Human Intervention](HIRL.md)
* [Learning to Design Games: Strategic Environments in Deep Reinforcement Learning](DualMDP.md) 5 July 2017

## Multitask

* [Kickstarting Deep Reinforcement Learning](https://arxiv.org/pdf/1803.03835.pdf) 10 Mar 2018
* [Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning](ZSTG.md) 7 Nov 2017
* [Distral: Robust Multitask Reinforcement Learning](Distral.md) 13 July 2017

## Observational Learning

* [Observational Learning by Reinforcement Learning](OLRL.md) 20 Jun 2017

## Meta Learning

* [Discovery of Useful Questions as Auxiliary Tasks](GVF.md) 10 Sep 2019 [arxiv](https://arxiv.org/pdf/1909.04607.pdf)
* [Meta-learning of Sequential Strategies](MetaSS.md) 8 May 2019 [arxiv](https://arxiv.org/pdf/1905.03030.pdf)
* [Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables](PEARL.md) 19 Mar 2019 [arxiv](https://arxiv.org/pdf/1903.08254.pdf)
* [Some Considerations on Learning to Explore via Meta-Reinforcement Learning](E2.md) 11 Jan 2019 [arxiv](https://arxiv.org/pdf/1803.01118.pdf)
* [Meta-Gradient Reinforcement Learning](MGRL.md) 24 May 2018 [arxiv](https://arxiv.org/pdf/1805.09801.pdf)
* [ProMP: Proximal Meta-Policy Search](ProMP.md) 16 Oct 2018 [arxiv](https://arxiv.org/pdf/1810.06784)
* [Unsupervised Meta-Learning for Reinforcement Learning](UML.md) 12 Jun 2018

## Distributional

* [GAN Q-learning](GANQL.md) 20 July 2018
* [Implicit Quantile Networks for Distributional Reinforcement Learning](IQN.md) 14 Jun 2018
* [Nonlinear Distributional Gradient Temporal-Difference Learning](GTD.md) 20 May 2018
* [DISTRIBUTED DISTRIBUTIONAL DETERMINISTIC POLICY GRADIENTS](D4PG.md) 23 Apr 2018
* [An Analysis of Categorical Distributional Reinforcement Learning](C51-analysis.md) 22 Feb 2018
* [Distributional Reinforcement Learning with Quantile Regression](QR-DQN.md) 27 Oct 2017
* [A Distributional Perspective on Reinforcement Learning](C51.md) 21 July 2017

## Planning

* [Search on the Replay Buffer: Bridging Planning and Reinforcement Learning](SoRB.md) 12 June 2019 [arxiv](https://arxiv.org/pdf/1906.05253.pdf)

## Safety

* [Robust Reinforcement Learning for Continuous Control with Model Misspecification](MPO.md) 18 Jun 2019 [arxiv](https://arxiv.org/pdf/1906.07516.pdf)
* [Verifiable Reinforcement Learning via Policy Extraction](Viper.md) 22 May 2018 [arxiv](https://arxiv.org/pdf/1805.08328.pdf)

## Inverse RL

* [ADDRESSING SAMPLE INEFFICIENCY AND REWARD BIAS IN INVERSE REINFORCEMENT LEARNING](OP-GAIL.md) 9 Sep 2018

## No reward RL

* [Fast Task Inference with Variational Intrinsic Successor Features](VISR.md) 2 Jun 2019 [arxiv](https://arxiv.org/pdf/1906.05030.pdf)
* [Curiosity-driven Exploration by Self-supervised Prediction](https://arxiv.org/pdf/1705.05363) 15 May 2017

## Time

* [Interval timing in deep reinforcement learning agents](Intervaltime.md) 31 May 2019 [arxiv](https://arxiv.org/pdf/1905.13469.pdf)
* [Time Limits in Reinforcement Learning](PEB.md)

## Adversarial learning

* [Sample-efficient Adversarial Imitation Learning from Observation](LQR+GAIfO.md) 18 Jun 2019 [arxiv](https://arxiv.org/pdf/1906.07374.pdf)

## Use Natural Language

* [Using Natural Language for Reward Shaping in Reinforcement Learning](LEARN.md) 31 May 2019 [paper](https://www.cs.utexas.edu/~ai-lab/downloadPublication.php?filename=http://www.cs.utexas.edu/users/ml/papers/goyal.ijcai19.pdf&pubid=127757)

## Generative and contrastive representation learning

* [Unsupervised State Representation Learning in Atari](ST-DIM.md) 19 Jun 2019 [arxiv](https://arxiv.org/pdf/1906.08226.pdf)

## Belief

* [Shaping Belief States with Generative Environment Models for RL](GenerativeBelief.md) 24 Jun 2019 [arxiv](https://arxiv.org/pdf/1906.09237v2.pdf)

## PAC

* [Provably Convergent Off-Policy Actor-Critic with Function Approximation](COF-PAC.md) 11 Nov 2019 [arxiv](https://arxiv.org/pdf/1911.04384.pdf)

## Applications

* [Benchmarks for Deep Off-Policy Evaluation](bdope.md) 30 Mar 2021 [arxiv](https://arxiv.org/pdf/2103.16596.pdf)
* [Learning Reciprocity in Complex Sequential Social Dilemmas](Reciprocity.md) 19 Mar 2019 [arxiv](https://arxiv.org/pdf/1903.08082.pdf)
* [DeepMimic: Example-Guided Deep Reinforcement Learning of Physics-Based Character Skills](dmimic.md) 9 Apr 2018
* [TUNING RECURRENT NEURAL NETWORKS WITH REINFORCEMENT LEARNING](RLTUNER.md)