References and tools

This page lists the main references and tools used in this course.

Books

Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction (2nd ed.). The MIT Press.
Stefano V. Albrecht, Filippos Christianos, and Lukas Schäfer. Multi-Agent Reinforcement Learning: Foundations and Modern Approaches. MIT Press, 2024.
Mitchell, T. (1997). Reinforcement Learning. In Machine Learning. McGraw-Hill.
Russell, S.; Norvig, P. Inteligência Artificial, 3rd ed., Campus Elsevier, 2013.

Papers

Peter Henderson, Riashat Islam, Philip Bachman, Joelle Pineau, Doina Precup, and David Meger. 2018. Deep reinforcement learning that matters. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence and Thirtieth Innovative Applications of Artificial Intelligence Conference and Eighth AAAI Symposium on Educational Advances in Artificial Intelligence (AAAI'18/IAAI'18/EAAI'18). AAAI Press, Article 392, 3207–3214.
Dohare, S., Hernandez-Garcia, J.F., Lan, Q. et al. Loss of plasticity in deep continual learning. Nature 632, 768–774 (2024). https://doi.org/10.1038/s41586-024-07711-7
Watkins, C.J.C.H., Dayan, P. Q-Learning. Mach Learn 8, 279–292 (1992). https://doi.org/10.1007/BF00992698
Williams, R.J. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8, 229–256 (1992). https://doi.org/10.1007/BF00992696.
Mnih V, Kavukcuoglu K, Silver D, Graves A, Antonoglou I, Wierstra D, Riedmiller M. Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602. 2013 Dec 19.
Mnih, V., Kavukcuoglu, K., Silver, D. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015). https://doi.org/10.1038/nature14236
Schulman J, Levine S, Abbeel P, Jordan M, Moritz P. Trust region policy optimization. In International conference on machine learning 2015 Jun 1 (pp. 1889-1897). PMLR.
van Hasselt, H., Guez, A. and Silver, D. 2016. Deep Reinforcement Learning with Double Q-Learning. Proceedings of the AAAI Conference on Artificial Intelligence. 30, 1 (Mar. 2016). DOI: https://doi.org/10.1609/aaai.v30i1.10295.
Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347. 2017 Jul 20.
Silver D. et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science 362, 1140–1144 (2018). DOI: 10.1126/science.aar6404.
Silver D., Singh S., Precup D., Sutton R. Reward is enough, Artificial Intelligence, Volume 299, 2021, https://doi.org/10.1016/j.artint.2021.103535.
M. Mehdi Afsar, Trafford Crump, and Behrouz Far. 2022. Reinforcement Learning based Recommender Systems: A Survey. ACM Comput. Surv. 55, 7, Article 145 (July 2023), 38 pages. https://doi.org/10.1145/3543846
Shuo Sun, Rundong Wang, and Bo An. 2023. Reinforcement Learning for Quantitative Trading. ACM Trans. Intell. Syst. Technol. 14, 3, Article 44 (June 2023), 29 pages. https://doi.org/10.1145/3582560
Andrej Karpathy. Deep Reinforcement Learning: Pong from Pixels. Available at http://karpathy.github.io/2016/05/31/rl/. Last accessed April 30, 2023.
Antonin Raffin, Ashley Hill, Adam Gleave, Anssi Kanervisto, Maximilian Ernestus, and Noah Dormann. Stable-Baselines3: Reliable Reinforcement Learning Implementations. Journal of Machine Learning Research, 2021. http://jmlr.org/papers/v22/20-1364.html.
Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Tim Harley, Timothy P. Lillicrap, David Silver, and Koray Kavukcuoglu. 2016. Asynchronous methods for deep reinforcement learning. In Proceedings of the 33rd International Conference on Machine Learning - Volume 48 (ICML'16). JMLR.org, 1928–1937.

Implementations and tutorials

Deep Reinforcement Learning from OpenAI. Available at https://spinningup.openai.com/en/latest/index.html. Last accessed April 30, 2023.
Brockman G, Cheung V, Pettersson L, Schneider J, Schulman J, Tang J, Zaremba W. OpenAI Gym. arXiv preprint arXiv:1606.01540. 2016 Jun 5.
Raffin A., Hill A., Gleave A., Kanervisto A., Ernestus M., Dormann N. Stable-Baselines3: Reliable Reinforcement Learning Implementations. Journal of Machine Learning Research, 2021. http://jmlr.org/papers/v22/20-1364.html.
The 37 Implementation Details of Proximal Policy Optimization. Available at https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/. Last accessed May 2023.
Understanding Proximal Policy Optimization (Schulman et al., 2017). Available at https://blog.tylertaewook.com/post/proximal-policy-optimization. Last accessed May 2023.
Simonini, T. Proximal Policy Optimization (PPO). Unit 8 of the Deep Reinforcement Learning Class with Hugging Face. Available at https://huggingface.co/blog/deep-rl-ppo. Last accessed May 2023.
Deep Reinforcement Learning Class with Hugging Face. Available at https://huggingface.co/learn/deep-rl-course/unit0/introduction. Last accessed February 2024.

Tools

The Farama Foundation: the group responsible for maintaining the Gymnasium and PettingZoo projects.
Kaggle Environments Project.
How to use the Gymnasium API: a Python library for single-agent reinforcement learning.
SuperSuit: wrappers for RL environments.
CleanRL: a site with several implementations of RL algorithms.
Worldgen: Emergent tool use from multi-agent interaction.
Highway envs.
Tianshou is a reinforcement learning platform based on pure PyTorch.
Unity Machine Learning Agents.
Reinforcement learning for Recommendation Systems.
FlatLand.
Drone Swarm Search Environment.
MARLlib environments.
Multi-Robot Warehouse Environments.