Files
awesome-awesomeness/terminal/deeprl7
2024-04-20 19:22:54 +02:00

37 KiB

Awesome Deep Reinforcement Learning
 
Mar 1 2024 update: HILP added

July 2022 update: EDDICT added

Mar 2022 update: a few papers released in early 2022

Dec 2021 update: Unsupervised RL
 
Introduction to awesome drl
Reinforcement learning is the fundamental framework for building AGI. Therefore we share important contributions within this awesome drl project.
 
Landscape of Deep RL
 
!updated Landscape of DRL (images/awesome-drl.png)
 
Content
- Awesome Deep Reinforcement Learning (#awesome-deep-reinforcement-learning)
- Introduction to awesome drl (#introduction-to-awesome-drl)
- Landscape of Deep RL (#landscape-of-deep-rl)
- Content (#content)
- General guidances (#general-guidances)
- 2022 (#2022)
- Foundations and theory (#foundations-and-theory)
- General benchmark frameworks (#general-benchmark-frameworks)
- Unsupervised (#unsupervised)
- Offline (#offline)
- Value based (#value-based)
- Policy gradient (#policy-gradient)
- Explorations (#explorations)
- Actor-Critic (#actor-critic)
- Model-based (#model-based)
- Model-free + Model-based (#model-free--model-based)
- Hierarchical (#hierarchical)
- Option (#option)
- Connection with other methods (#connection-with-other-methods)
- Connecting value and policy methods (#connecting-value-and-policy-methods)
- Reward design (#reward-design)
- Unifying (#unifying)
- Faster DRL (#faster-drl)
- Multi-agent (#multi-agent)
- New design (#new-design)
- Multitask (#multitask)
- Observational Learning (#observational-learning)
- Meta Learning (#meta-learning)
- Distributional (#distributional)
- Planning (#planning)
- Safety (#safety)
- Inverse RL (#inverse-rl)
- No reward RL (#no-reward-rl)
- Time (#time)
- Adversarial learning (#adversarial-learning)
- Use Natural Language (#use-natural-language)
- Generative and contrastive representation learning (#generative-and-contrastive-representation-learning)
- Belief (#belief)
- PAC (#pac)
- Applications (#applications)
 
Illustrations:
 
! (images/ACER.png)
 
Recommendations and suggestions are welcome.
General guidances
Awesome Offline RL (https://github.com/hanjuku-kaso/awesome-offline-rl)
Reinforcement Learning Today (http://reinforcementlearning.today/)
Multiagent Reinforcement Learning by Marc Lanctot RLSS @ Lille (http://mlanctot.info/files/papers/Lanctot_MARL_RLSS2019_Lille.pdf) 11 July 2019
RLDM 2019 Notes by David Abel (https://david-abel.github.io/notes/rldm_2019.pdf) 11 July 2019
A Survey of Reinforcement Learning Informed by Natural Language (RLNL.md) 10 Jun 2019 arxiv (https://arxiv.org/pdf/1906.03926.pdf)
Challenges of Real-World Reinforcement Learning (ChallengesRealWorldRL.md) 29 Apr 2019 arxiv (https://arxiv.org/pdf/1904.12901.pdf)
Ray Interference: a Source of Plateaus in Deep Reinforcement Learning (RayInterference.md) 25 Apr 2019 arxiv (https://arxiv.org/pdf/1904.11455.pdf)
Principles of Deep RL by David Silver (p10.md)
University AI's General introduction to deep rl (in Chinese) (https://www.jianshu.com/p/dfd987aa765a)
OpenAI's spinningup (https://spinningup.openai.com/en/latest/)
The Promise of Hierarchical Reinforcement Learning (https://thegradient.pub/the-promise-of-hierarchical-reinforcement-learning/) 9 Mar 2019
Deep Reinforcement Learning that Matters (reproducing.md) 30 Jan 2019 arxiv (https://arxiv.org/pdf/1709.06560.pdf)
 
2024
Foundation Policies with Hilbert Representations (HILP.md) arxiv (https://arxiv.org/abs/2402.15567) repo (https://github.com/seohongpark/HILP) 23 Feb 2024
 
2022
Reinforcement Learning with Action-Free Pre-Training from Videos arxiv (https://arxiv.org/abs/2203.13880) repo (https://github.com/younggyoseo/apv)
 
Generalist policies
Foundation Policies with Hilbert Representations (HILP.md) arxiv (https://arxiv.org/abs/2402.15567) repo (https://github.com/seohongpark/HILP) 23 Feb 2024
 
Foundations and theory
 
General non-linear Bellman equations (GNLBE.md) 9 July 2019 arxiv (https://arxiv.org/pdf/1907.07331.pdf)
Monte Carlo Gradient Estimation in Machine Learning (MCGE.md) 25 Jun 2019 arxiv (https://arxiv.org/pdf/1906.10652.pdf)
 
General benchmark frameworks
 
Brax (https://github.com/google/brax/)
 
! (https://github.com/google/brax/raw/main/docs/img/fetch.gif)
Android-Env (https://github.com/deepmind/android_env)
! (https://github.com/deepmind/android_env/raw/main/docs/images/device_control.gif)
MuJoCo (http://mujoco.org/) | MuJoCo Chinese version (https://github.com/tigerneil/mujoco-zh)
Unsupervised RL Benchmark (https://github.com/rll-research/url_benchmark)
Dataset for Offline RL (https://github.com/rail-berkeley/d4rl)
Spriteworld: a flexible, configurable python-based reinforcement learning environment (https://github.com/deepmind/spriteworld)
Chainerrl Visualizer (https://github.com/chainer/chainerrl-visualizer)
Behaviour Suite for Reinforcement Learning (BSRL.md) 13 Aug 2019 arxiv (https://arxiv.org/pdf/1908.03568.pdf) | code (https://github.com/deepmind/bsuite)
Quantifying Generalization in Reinforcement Learning (Coinrun.md) 20 Dec 2018 arxiv (https://arxiv.org/pdf/1812.02341.pdf)
S-RL Toolbox: Environments, Datasets and Evaluation Metrics for State Representation Learning (SRL.md) 25 Sept 2018
dopamine (https://github.com/google/dopamine)
StarCraft II (https://github.com/deepmind/pysc2)
tfrl (https://github.com/deepmind/trfl)
chainerrl (https://github.com/chainer/chainerrl)
PARL (https://github.com/PaddlePaddle/PARL)
DI-engine: a generalized decision intelligence engine. It supports various Deep RL algorithms (https://github.com/opendilab/DI-engine)
PPO x Family: Course in Chinese for Deep RL (https://github.com/opendilab/PPOxFamily)
 
Unsupervised
 
URLB: Unsupervised Reinforcement Learning Benchmark (https://arxiv.org/abs/2110.15191) 28 Oct 2021
APS: Active Pretraining with Successor Feature (https://arxiv.org/abs/2108.13956) 31 Aug 2021
Behavior From the Void: Unsupervised Active Pre-Training (https://arxiv.org/abs/2103.04551) 8 Mar 2021
Reinforcement Learning with Prototypical Representations (https://arxiv.org/abs/2102.11271) 22 Feb 2021
Efficient Exploration via State Marginal Matching (https://arxiv.org/abs/1906.05274) 12 Jun 2019
Self-Supervised Exploration via Disagreement (https://arxiv.org/abs/1906.04161) 10 Jun 2019
Exploration by Random Network Distillation (https://arxiv.org/abs/1810.12894) 30 Oct 2018
Diversity is All You Need: Learning Skills without a Reward Function (https://arxiv.org/abs/1802.06070) 16 Feb 2018
Curiosity-driven Exploration by Self-supervised Prediction (https://arxiv.org/pdf/1705.05363) 15 May 2017
 
Offline
PerSim: Data-efficient Offline Reinforcement Learning with Heterogeneous Agents via Personalized Simulators (https://arxiv.org/abs/2102.06961) 10 Nov 2021
A General Offline Reinforcement Learning Framework for Interactive Recommendation () AAAI 2021
 
 
Value based
 
Harnessing Structures for Value-Based Planning and Reinforcement Learning (SVRL.md) 5 Feb 2020 arxiv (https://arxiv.org/abs/1909.12255) | code (https://github.com/YyzHarry/SV-RL)
Recurrent Value Functions (RVF.md) 23 May 2019 arxiv (https://arxiv.org/pdf/1905.09562.pdf)
Stochastic Lipschitz Q-Learning (LipschitzQ.md) 24 Apr 2019 arxiv (https://arxiv.org/pdf/1904.10653.pdf)
TreeQN and ATreeC: Differentiable Tree-Structured Models for Deep Reinforcement Learning (https://arxiv.org/pdf/1710.11417) 8 Mar 2018
DISTRIBUTED PRIORITIZED EXPERIENCE REPLAY (https://arxiv.org/pdf/1803.00933.pdf) 2 Mar 2018
Rainbow: Combining Improvements in Deep Reinforcement Learning (Rainbow.md) 6 Oct 2017
Learning from Demonstrations for Real World Reinforcement Learning (DQfD.md) 12 Apr 2017
Dueling Network Architecture (Dueling.md)
Double DQN (DDQN.md)
Prioritized Experience (PER.md)
Deep Q-Networks (DQN.md)
 
Policy gradient
 
Phasic Policy Gradient (PPG.md) 9 Sep 2020 arxiv (https://arxiv.org/pdf/2009.04416.pdf) code (https://github.com/openai/phasic-policy-gradient)
An operator view of policy gradient methods (OVPG.md) 22 Jun 2020 arxiv (https://arxiv.org/pdf/2006.11266.pdf)
Direct Policy Gradients: Direct Optimization of Policies in Discrete Action Spaces (DirPG.md) 14 Jun 2019 arxiv (https://arxiv.org/pdf/1906.06062.pdf)
Policy Gradient Search: Online Planning and Expert Iteration without Search Trees (PGS.md) 7 Apr 2019 arxiv (https://arxiv.org/pdf/1904.03646.pdf)
SUPERVISED POLICY UPDATE FOR DEEP REINFORCEMENT LEARNING (SPU.md) 24 Dec 2018 arxiv (https://arxiv.org/pdf/1805.11706v4.pdf)
PPO-CMA: Proximal Policy Optimization with Covariance Matrix Adaptation (PPO-CMA.md) 5 Oct 2018 arxiv (https://arxiv.org/pdf/1810.02541v6.pdf)
Clipped Action Policy Gradient (CAPG.md) 22 June 2018
Expected Policy Gradients for Reinforcement Learning (EPG.md) 10 Jan 2018
Proximal Policy Optimization Algorithms (PPO.md) 20 July 2017
Emergence of Locomotion Behaviours in Rich Environments (DPPO.md) 7 July 2017
Interpolated Policy Gradient: Merging On-Policy and Off-Policy Gradient Estimation for Deep Reinforcement Learning (IPG.md) 1 Jun 2017
Equivalence Between Policy Gradients and Soft Q-Learning (PGSQL.md)
Trust Region Policy Optimization (TRPO.md)
Reinforcement Learning with Deep Energy-Based Policies (DEBP.md)
Q-PROP: SAMPLE-EFFICIENT POLICY GRADIENT WITH AN OFF-POLICY CRITIC (QPROP.md)
 
Explorations
 
Entropic Desired Dynamics for Intrinsic Control (EDDICT.md) 2021 openreview (https://openreview.net/pdf?id=lBSSxTgXmiK)
Self-Supervised Exploration via Disagreement (Disagreement.md) 10 Jun 2019 arxiv (https://arxiv.org/pdf/1906.04161.pdf)
Approximate Exploration through State Abstraction (MBIE-EB.md) 24 Jan 2019
The Uncertainty Bellman Equation and Exploration (UBE.md) 15 Sep 2017
Noisy Networks for Exploration (NoisyNet.md) 30 Jun 2017 implementation (https://github.com/Kaixhin/NoisyNet-A3C)
Count-Based Exploration in Feature Space for Reinforcement Learning (PhiEB.md) 25 Jun 2017
Count-Based Exploration with Neural Density Models (NDM.md) 14 Jun 2017
UCB and InfoGain Exploration via Q-Ensembles (QEnsemble.md) 11 Jun 2017
Minimax Regret Bounds for Reinforcement Learning (MMRB.md) 16 Mar 2017
Incentivizing Exploration In Reinforcement Learning With Deep Predictive Models (incentivizing.md)
EX2: Exploration with Exemplar Models for Deep Reinforcement Learning (EX2.md)
 
Actor-Critic
 
Generalized Off-Policy Actor-Critic (Geoff-PAC.md) 27 Mar 2019
Soft Actor-Critic Algorithms and Applications (https://arxiv.org/pdf/1812.05905.pdf) 29 Jan 2019
The Reactor: A Sample-Efficient Actor-Critic Architecture (REACTOR.md) 15 Apr 2017
SAMPLE EFFICIENT ACTOR-CRITIC WITH EXPERIENCE REPLAY (ACER.md)
REINFORCEMENT LEARNING WITH UNSUPERVISED AUXILIARY TASKS (UNREAL.md)
Continuous control with deep reinforcement learning (DDPG.md)
 
Model-based

Self-Consistent Models and Values (sc.md) 25 Oct 2021 arxiv (https://arxiv.org/pdf/2110.12840.pdf)
When to use parametric models in reinforcement learning? (parametric.md) 12 Jun 2019 arxiv (https://arxiv.org/pdf/1906.05243.pdf)
Model Based Reinforcement Learning for Atari (https://arxiv.org/pdf/1903.00374.pdf) 5 Mar 2019
Model-Based Stabilisation of Deep Reinforcement Learning (MBDQN.md) 6 Sep 2018
Learning model-based planning from scratch (IBP.md) 19 July 2017
 
Model-free + Model-based
 
Imagination-Augmented Agents for Deep Reinforcement Learning (I2As.md) 19 July 2017
 
Hierarchical
 
WHY DOES HIERARCHY (SOMETIMES) WORK SO WELL IN REINFORCEMENT LEARNING? (HIRO.md) 23 Sep 2019 arxiv (https://arxiv.org/pdf/1909.10618.pdf)
Language as an Abstraction for Hierarchical Deep Reinforcement Learning (HAL.md) 18 Jun 2019 arxiv (https://arxiv.org/pdf/1906.07343.pdf)
 
Option
 
Variational Option Discovery Algorithms (VALOR.md) 26 July 2018
A Laplacian Framework for Option Discovery in Reinforcement Learning (LFOD.md) 16 Jun 2017
 
Connection with other methods
 
Robust Imitation of Diverse Behaviors (GVG.md)
Learning human behaviors from motion capture by adversarial imitation (GAIL.md)
Connecting Generative Adversarial Networks and Actor-Critic Methods (GANAC.md)
 
Connecting value and policy methods
 
Bridging the Gap Between Value and Policy Based Reinforcement Learning (PCL.md)
Policy gradient and Q-learning (PGQ.md)
 
Reward design
 
End-to-End Robotic Reinforcement Learning without Reward Engineering (VICE.md) 16 Apr 2019 arxiv (https://arxiv.org/pdf/1904.07854.pdf)
Reinforcement Learning with Corrupted Reward Channel (RLCRC.md) 23 May 2017
 
Unifying
 
Multi-step Reinforcement Learning: A Unifying Algorithm (MSRL.md)
 
Faster DRL
 
Neural Episodic Control (NEC.md)
 
Multi-agent
 
No Press Diplomacy: Modeling Multi-Agent Gameplay (Dip.md) 4 Sep 2019 arxiv (https://arxiv.org/pdf/1909.02128.pdf)
Options as responses: Grounding behavioural hierarchies in multi-agent RL (OPRE) 6 Jun 2019 arxiv (https://arxiv.org/pdf/1906.01470.pdf)
Evolutionary Reinforcement Learning for Sample-Efficient Multiagent Coordination (MERL.md) 18 Jun 2019 arxiv (https://arxiv.org/pdf/1906.07315.pdf)
A Regularized Opponent Model with Maximum Entropy Objective (ROMMEO.md) 17 May 2019 arxiv (https://arxiv.org/pdf/1905.08087.pdf)
Deep Q-Learning for Nash Equilibria: Nash-DQN (NashDQN.md) 23 Apr 2019 arxiv (https://arxiv.org/pdf/1904.10554.pdf)
Malthusian Reinforcement Learning (MRL.md) 3 Mar 2019 arxiv (https://arxiv.org/pdf/1812.07019.pdf)
Bayesian Action Decoder for Deep Multi-Agent Reinforcement Learning (bad.md) 4 Nov 2018
INTRINSIC SOCIAL MOTIVATION VIA CAUSAL INFLUENCE IN MULTI-AGENT RL (ISMCI.md) 19 Oct 2018
QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning (http://www.cs.ox.ac.uk/people/shimon.whiteson/pubs/rashidicml18.pdf) 30 Mar 2018
Modeling Others using Oneself in Multi-Agent Reinforcement Learning (SOM.md) 26 Feb 2018
The Mechanics of n-Player Differentiable Games (SGA.md) 15 Feb 2018
Continuous Adaptation via Meta-Learning in Nonstationary and Competitive Environments (RoboSumo.md) 10 Oct 2017
Learning with Opponent-Learning Awareness (LOLA.md) 13 Sep 2017
Counterfactual Multi-Agent Policy Gradients (COMA.md)
Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments (MADDPG.md) 7 Jun 2017
Multiagent Bidirectionally-Coordinated Nets for Learning to Play StarCraft Combat Games (BiCNet.md) 29 Mar 2017
 
New design
 
IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures (https://arxiv.org/pdf/1802.01561.pdf) 9 Feb 2018
Reverse Curriculum Generation for Reinforcement Learning (RECUR.md)
Trial without Error: Towards Safe Reinforcement Learning via Human Intervention (HIRL.md)
Learning to Design Games: Strategic Environments in Deep Reinforcement Learning (DualMDP.md) 5 July 2017
 
Multitask
 
Kickstarting Deep Reinforcement Learning (https://arxiv.org/pdf/1803.03835.pdf) 10 Mar 2018
Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning (ZSTG.md) 7 Nov 2017
Distral: Robust Multitask Reinforcement Learning (Distral.md) 13 July 2017
 
Observational Learning
 
Observational Learning by Reinforcement Learning (OLRL.md) 20 Jun 2017
 
Meta Learning
 
Discovery of Useful Questions as Auxiliary Tasks (GVF.md) 10 Sep 2019 arxiv (https://arxiv.org/pdf/1909.04607.pdf)
Meta-learning of Sequential Strategies (MetaSS.md) 8 May 2019 arxiv (https://arxiv.org/pdf/1905.03030.pdf)
Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables (PEARL.md) 19 Mar 2019 arxiv (https://arxiv.org/pdf/1903.08254.pdf)
Some Considerations on Learning to Explore via Meta-Reinforcement Learning (E2.md) 11 Jan 2019 arxiv (https://arxiv.org/pdf/1803.01118.pdf)
Meta-Gradient Reinforcement Learning (MGRL.md) 24 May 2018 arxiv (https://arxiv.org/pdf/1805.09801.pdf)
ProMP: Proximal Meta-Policy Search (ProMP.md) 16 Oct 2018 arxiv (https://arxiv.org/pdf/1810.06784)
Unsupervised Meta-Learning for Reinforcement Learning (UML.md) 12 Jun 2018
 
Distributional
 
GAN Q-learning (GANQL.md) 20 July 2018
Implicit Quantile Networks for Distributional Reinforcement Learning (IQN.md) 14 Jun 2018
Nonlinear Distributional Gradient Temporal-Difference Learning (GTD.md) 20 May 2018
DISTRIBUTED DISTRIBUTIONAL DETERMINISTIC POLICY GRADIENTS (D4PG.md) 23 Apr 2018
An Analysis of Categorical Distributional Reinforcement Learning (C51-analysis.md) 22 Feb 2018
Distributional Reinforcement Learning with Quantile Regression (QR-DQN.md) 27 Oct 2017
A Distributional Perspective on Reinforcement Learning (C51.md) 21 July 2017
 
Planning
 
Search on the Replay Buffer: Bridging Planning and Reinforcement Learning (SoRB.md) 12 June 2019 arxiv (https://arxiv.org/pdf/1906.05253.pdf)
 
Safety
 
Robust Reinforcement Learning for Continuous Control with Model Misspecification (MPO.md) 18 Jun 2019 arxiv (https://arxiv.org/pdf/1906.07516.pdf)
Verifiable Reinforcement Learning via Policy Extraction (Viper.md) 22 May 2018 arxiv (https://arxiv.org/pdf/1805.08328.pdf)
 
Inverse RL
 
ADDRESSING SAMPLE INEFFICIENCY AND REWARD BIAS IN INVERSE REINFORCEMENT LEARNING (OP-GAIL.md) 9 Sep 2018
 
No reward RL
 
Fast Task Inference with Variational Intrinsic Successor Features (VISR.md) 2 Jun 2019 arxiv (https://arxiv.org/pdf/1906.05030.pdf)
Curiosity-driven Exploration by Self-supervised Prediction (https://arxiv.org/pdf/1705.05363) 15 May 2017
 
Time
 
Interval timing in deep reinforcement learning agents (Intervaltime.md) 31 May 2019 arxiv (https://arxiv.org/pdf/1905.13469.pdf)
Time Limits in Reinforcement Learning (PEB.md)
 
Adversarial learning
 
Sample-efficient Adversarial Imitation Learning from Observation (LQR+GAIfO.md) 18 Jun 2019 arxiv (https://arxiv.org/pdf/1906.07374.pdf)
 
Use Natural Language
 
Using Natural Language for Reward Shaping in Reinforcement Learning (LEARN.md) 31 May 2019 arxiv
(https://www.cs.utexas.edu/~ai-lab/downloadPublication.php?filename=http://www.cs.utexas.edu/users/ml/papers/goyal.ijcai19.pdf&pubid=127757)
 
Generative and contrastive representation learning
 
Unsupervised State Representation Learning in Atari (ST-DIM.md) 19 Jun 2019 arxiv (https://arxiv.org/pdf/1906.08226.pdf)
 
Belief
 
Shaping Belief States with Generative Environment Models for RL (GenerativeBelief.md) 24 Jun 2019 arxiv (https://arxiv.org/pdf/1906.09237v2.pdf)
 
PAC
Provably Convergent Off-Policy Actor-Critic with Function Approximation (COF-PAC.md) 11 Nov 2019 arxiv (https://arxiv.org/pdf/1911.04384.pdf)
 
 
Applications
Benchmarks for Deep Off-Policy Evaluation (bdope.md) 30 Mar 2021 arxiv (https://arxiv.org/pdf/2103.16596.pdf)
Learning Reciprocity in Complex Sequential Social Dilemmas (Reciprocity.md) 19 Mar 2019 arxiv (https://arxiv.org/pdf/1903.08082.pdf)
DeepMimic: Example-Guided Deep Reinforcement Learning of Physics-Based Character Skills (dmimic.md) 9 Apr 2018
TUNING RECURRENT NEURAL NETWORKS WITH REINFORCEMENT LEARNING (RLTUNER.md)