Using RL in a competitive environment

The goal of this activity is to implement an agent that plays tic-tac-toe using the Q-Learning or SARSA algorithm and to show the results.

Tic-tac-toe player

Do you remember this code?

#
# Reference: https://pettingzoo.farama.org/environments/classic/tictactoe/
#

from pettingzoo.classic import tictactoe_v3

def play_random_agent(agent, obs):
    # Sample random actions until one allowed by the action mask is found.
    x = env.action_space(agent).sample()
    while obs['action_mask'][x] != 1:
        x = env.action_space(agent).sample()
    return x

def play_my_agent(agent, obs):
    # TODO you must implement your code here (return a legal action index)
    pass

env = tictactoe_v3.env(render_mode='human')
env.reset()

not_finish = True
while not_finish:
    for agent in ['player_1', 'player_2']:
        # env.last() returns the observation, reward and status for the agent about to act.
        observation, reward, termination, truncation, info = env.last()
        if termination or truncation:
            not_finish = False
        else:
            if agent == 'player_1':
                action = play_random_agent(agent, observation)
            else:
                action = play_my_agent(agent, observation)
            print(f'{agent} plays: {action}')
            env.step(action)

# Final rewards for each player after the game ends.
print(env.rewards)
env.close()

The exercise today is to implement a reinforcement learning agent that plays tic-tac-toe and always wins or draws, never losing.
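As a starting point, the tabular Q-learning update can be sketched as below. This is only a minimal sketch under illustrative assumptions: the table Q, the hyperparameters alpha, gamma and epsilon, and the helper names choose_action and q_update are placeholders, not a required interface.

import random
from collections import defaultdict

# Minimal sketch of tabular Q-learning (all names and values are illustrative).
alpha = 0.1    # learning rate
gamma = 0.9    # discount factor
epsilon = 0.1  # exploration rate

# Q-table: maps (state_id, action) pairs to estimated action values, defaulting to 0.
Q = defaultdict(float)

def choose_action(state_id, legal_actions):
    # Epsilon-greedy selection restricted to the legal moves.
    if random.random() < epsilon:
        return random.choice(legal_actions)
    return max(legal_actions, key=lambda a: Q[(state_id, a)])

def q_update(state_id, action, reward, next_state_id, next_legal_actions):
    # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    best_next = max((Q[(next_state_id, a)] for a in next_legal_actions), default=0.0)
    Q[(state_id, action)] += alpha * (reward + gamma * best_next - Q[(state_id, action)])

For SARSA, the only change is that best_next is replaced by the value of the action actually chosen in the next state.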

In this case, the states (obs) are represented by a matrix. How can each possible matrix configuration be transformed into a state id? How many states are possible? Is it possible to define a function that takes the matrix as input and generates an id for each state?
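One possible approach, shown as a sketch below, assumes the observation layout described in the PettingZoo documentation: obs['observation'] is a 3x3x2 array of binary planes, the first plane holding the current agent's marks and the second the opponent's. Each cell is read as a ternary digit and the board becomes a base-3 number, so there are at most 3**9 = 19683 ids (fewer are reachable in legal play).

def board_to_state_id(board):
    # Encode the 3x3x2 observation as a base-3 integer.
    # Each cell contributes one digit: 0 = empty, 1 = my mark, 2 = opponent's mark.
    state_id = 0
    for row in range(3):
        for col in range(3):
            if board[row][col][0] == 1:
                digit = 1
            elif board[row][col][1] == 1:
                digit = 2
            else:
                digit = 0
            state_id = state_id * 3 + digit
    return state_id

Inside play_my_agent, board_to_state_id(obs['observation']) would then give the key used to index the Q-table.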

Deliver

  • This exercise must be done by a group of 3 students.

  • The deadline is 03/19/2023 23:30 -0300.

  • The implementation must be delivered through GitHub Classroom. This is the link: https://classroom.github.com/a/5MNmW_QO.

  • You must add everything necessary to run this project to the repository, such as the README file, the requirements file, and the code.