Reinforcement Learning

Syllabus

Reinforcement learning. Reinforcement learning algorithms. Implementation of autonomous agents using reinforcement learning.

By the end of the course, the student will be able to:

Build a reinforcement learning system for sequential decision-making.
Understand how to formalize a task as a reinforcement learning problem, how to implement a solution, and how to evaluate it.
Understand the types of reinforcement learning algorithms: value-based, policy gradient, and actor-critic.
Understand how reinforcement learning relates to supervised and unsupervised learning.

Introduction to Reinforcement Learning.
Implementation of autonomous agents using reinforcement learning.
Taxonomy of reinforcement learning algorithms.
The Q-Learning algorithm.
The Sarsa algorithm.
Deep Reinforcement Learning.
Deep Q-Learning algorithms.
Reinforce: a Policy Gradient algorithm.
Actor-Critic algorithms.
Implementations of autonomous agents using projects such as Farama's Gymnasium and Kaggle's reinforcement learning library.
Examples of solutions using reinforcement learning.
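
As an illustration of the Q-Learning topic listed above, the following is a minimal sketch of tabular Q-learning with an epsilon-greedy policy. It is not taken from the course material. The five-state chain environment, the function names (step, greedy_action, train_q_learning) and the hyperparameter values are all assumptions chosen to keep the example self-contained, so it does not depend on Gymnasium or any other library.

```python
import random

# Hypothetical toy environment (an assumption, not part of the syllabus):
# a chain of 5 states. The agent starts in state 0; reaching state 4
# yields reward 1 and ends the episode.
N_STATES = 5
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Return (next_state, reward, done) for the toy chain."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def greedy_action(q_row, rng):
    """Pick an action with maximal value, breaking ties at random."""
    best = max(q_row)
    return rng.choice([a for a in ACTIONS if q_row[a] == best])


def train_q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Learn a Q-table for the toy chain with tabular Q-learning."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy behaviour policy: explore with probability epsilon.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = greedy_action(q[state], rng)
            next_state, reward, done = step(state, action)
            # Off-policy target: bootstrap from the best next action.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train_q_learning()
# Greedy policy for the non-terminal states 0..3.
policy = [greedy_action(row, random.Random(0)) for row in q_table[:-1]]
print(policy)
```

Under these assumptions the learned greedy policy should favor moving right in every non-terminal state. Replacing max(q[next_state]) in the target with the value of the action actually selected in the next state would turn the update into Sarsa, the on-policy counterpart that also appears in the topic list.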

GÉRON, A. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd ed., O'Reilly, 2021.
SUTTON, R.; BARTO, A. Reinforcement Learning: An Introduction. 2nd ed. The MIT Press, 2018.
Van Hasselt, H., Guez, A. and Silver, D., 2016, March. Deep reinforcement learning with double Q-learning. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 30, No. 1).
Schulman, J., Wolski, F., Dhariwal, P., Radford, A. and Klimov, O., 2017. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.
Brockman, G. et al., 2016. OpenAI Gym. arXiv preprint arXiv:1606.01540.

Laura Graesser and Wah Loon Keng. 2019. Foundations of Deep Reinforcement Learning: Theory and Practice in Python (1st ed.). Addison-Wesley Professional.
NORVIG, P.; RUSSELL, S. Inteligência Artificial, 3rd ed., Campus Elsevier, 2013.
SILVER, D.; SINGH, S.; PRECUP, D.; SUTTON, R. Reward is enough. Artificial Intelligence, Vol. 299, 2021.
MuZero: Mastering Go, chess, shogi and Atari without rules. Published December 2020.
SILVER, D.; HUBERT T.; SCHRITTWIESER, J.; ANTONOGLOU, I.; LAI, M.; GUEZ, A. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science 362, 1140-1144 (2018).
Mnih, V., Kavukcuoglu, K., Silver, D., Graves, A., Antonoglou, I., Wierstra, D. and Riedmiller, M., 2013. Playing Atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602.
Peter Henderson, Riashat Islam, Philip Bachman, Joelle Pineau, Doina Precup, and David Meger. 2018. Deep reinforcement learning that matters. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence and Thirtieth Innovative Applications of Artificial Intelligence Conference and Eighth AAAI Symposium on Educational Advances in Artificial Intelligence (AAAI'18/IAAI'18/EAAI'18). AAAI Press, Article 392, 3207–3214.
Dohare, S., Hernandez-Garcia, J.F., Lan, Q. et al. Loss of plasticity in deep continual learning. Nature 632, 768–774 (2024). https://doi.org/10.1038/s41586-024-07711-7