References and tools

This page lists the main references and tools used in this course.

Books

  1. Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction (2nd ed.). The MIT Press.

  2. Stefano V. Albrecht, Filippos Christianos, and Lukas Schäfer. Multi-Agent Reinforcement Learning: Foundations and Modern Approaches. MIT Press, 2024.

  3. Mitchell, T. (1997). Reinforcement Learning. In Machine Learning. McGraw-Hill.

  4. Russell, S., & Norvig, P. (2013). Inteligência Artificial (3rd ed.). Campus Elsevier.

Papers

  1. Watkins, C.J.C.H., Dayan, P. Q-Learning. Machine Learning 8, 279–292 (1992). https://doi.org/10.1007/BF00992698

  2. Williams, R.J. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8, 229–256 (1992). https://doi.org/10.1007/BF00992696

  3. Mnih V, Kavukcuoglu K, Silver D, Graves A, Antonoglou I, Wierstra D, Riedmiller M. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602. 2013 Dec 19.

  4. Mnih, V., Kavukcuoglu, K., Silver, D. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015). https://doi.org/10.1038/nature14236

  5. Schulman J, Levine S, Abbeel P, Jordan M, Moritz P. Trust region policy optimization. In International conference on machine learning 2015 Jun 1 (pp. 1889-1897). PMLR.

  6. van Hasselt, H., Guez, A. and Silver, D. 2016. Deep Reinforcement Learning with Double Q-Learning. Proceedings of the AAAI Conference on Artificial Intelligence. 30, 1 (Mar. 2016). DOI: https://doi.org/10.1609/aaai.v30i1.10295.

  7. Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347. 2017 Jul 20.

  8. Silver D. et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science 362, 1140–1144 (2018). DOI: 10.1126/science.aar6404.

  9. Silver D., Singh S., Precup D., Sutton R. Reward is enough, Artificial Intelligence, Volume 299, 2021, https://doi.org/10.1016/j.artint.2021.103535.

  10. M. Mehdi Afsar, Trafford Crump, and Behrouz Far. 2022. Reinforcement Learning based Recommender Systems: A Survey. ACM Comput. Surv. 55, 7, Article 145 (July 2023), 38 pages. https://doi.org/10.1145/3543846

  11. Shuo Sun, Rundong Wang, and Bo An. 2023. Reinforcement Learning for Quantitative Trading. ACM Trans. Intell. Syst. Technol. 14, 3, Article 44 (June 2023), 29 pages. https://doi.org/10.1145/3582560

  12. Andrej Karpathy. Deep Reinforcement Learning: Pong from Pixels. Available at http://karpathy.github.io/2016/05/31/rl/. Last accessed April 30, 2023.

  13. Antonin Raffin, Ashley Hill, Adam Gleave, Anssi Kanervisto, Maximilian Ernestus, and Noah Dormann. Stable-Baselines3: Reliable Reinforcement Learning Implementations. Journal of Machine Learning Research, 2021. http://jmlr.org/papers/v22/20-1364.html.

Implementations and tutorials

  1. OpenAI. Spinning Up in Deep RL. Available at https://spinningup.openai.com/en/latest/index.html. Last accessed April 30, 2023.

  2. Brockman G, Cheung V, Pettersson L, Schneider J, Schulman J, Tang J, Zaremba W. OpenAI Gym. arXiv preprint arXiv:1606.01540. 2016 Jun 5.

  3. Raffin A., Hill A., Gleave A., Kanervisto A., Ernestus M., Dormann N. Stable-Baselines3: Reliable Reinforcement Learning Implementations. Journal of Machine Learning Research, 2021. http://jmlr.org/papers/v22/20-1364.html.

  4. Andrej Karpathy. Deep Reinforcement Learning: Pong from Pixels. Available at http://karpathy.github.io/2016/05/31/rl/. Last accessed April 30, 2023.

  5. The 37 Implementation Details of Proximal Policy Optimization. Available at https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/. Last accessed May 2023.

  6. Understanding Proximal Policy Optimization (Schulman et al., 2017). Available at https://blog.tylertaewook.com/post/proximal-policy-optimization. Last accessed May 2023.

  7. Simonini, T. Proximal Policy Optimization (PPO). Unit 8 of the Deep Reinforcement Learning Class with Hugging Face. Available at https://huggingface.co/blog/deep-rl-ppo. Last accessed May 2023.

  8. Deep Reinforcement Learning Class with Hugging Face. Available at https://huggingface.co/learn/deep-rl-course/unit0/introduction. Last accessed February 2024.

Tools

  1. The Farama Foundation: this group is responsible for maintaining the Gymnasium and PettingZoo projects.
  2. Kaggle Environments Project.
  3. How to use the Gymnasium API: Gymnasium is a Python library for single-agent reinforcement learning.
  4. SuperSuit: wrappers for RL environments.
  5. CleanRL: site with several implementations of RL algorithms.
  6. Worldgen: Emergent tool use from multi-agent interaction.
  7. Highway envs.
  8. Tianshou is a reinforcement learning platform based on pure PyTorch.
  9. Unity Machine Learning Agents.
  10. Reinforcement learning for Recommendation Systems.
  11. FlatLand.
  12. Drone Swarm Search Environment.
  13. MARLlib environments.
  14. Multi-Robot Warehouse Environments.

Last update: April 23, 2024