In [ ]:
import pandas as pd
sarsa = pd.read_csv('./results/rewards_cliffwalking_sarsa_exploration.csv', header=None)
qlearning = pd.read_csv('./results/rewards_cliffwalking_qlearning_exploration.csv', header=None)
In [ ]:
import matplotlib.pyplot as plt
# Calculate the 50-episode rolling average for both reward series
sarsa_avg = sarsa[0].rolling(window=50).mean()
qlearning_avg = qlearning[0].rolling(window=50).mean()
# Plotting the rolling average series
plt.plot(sarsa_avg, label='SARSA')
plt.plot(qlearning_avg, label='Q-Learning')
# Adding labels and title
plt.xlabel('Episode')
plt.ylabel('Reward')
plt.title('SARSA vs Q-Learning (exploration/exploitation approach) on Cliff Walking Problem (50-episode rolling average)')
# Adding legend
plt.legend()
# Displaying the plot
plt.show()
The Q-Learning algorithm is more optimistic than SARSA. As a result, during training an agent trained with Q-Learning tends to make more mistakes in environments that punish heavily.
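The gap comes from the update targets of the two algorithms. Below is a minimal sketch of the two tabular update rules; the function names and the `Q` array layout are illustrative and not taken from the training scripts that produced these CSV files:
In [ ]:
import numpy as np

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    # On-policy: bootstrap from the action the agent will actually take next,
    # so the cost of exploratory steps near the cliff shows up in the estimates.
    target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (target - Q[s, a])

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    # Off-policy: bootstrap from the greedy maximum over the next state's actions,
    # which keeps the estimates optimistic about the risky path along the cliff.
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])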
In [ ]:
import pandas as pd
sarsa = pd.read_csv('./results/rewards_cliffwalking_sarsa.csv', header=None)
qlearning = pd.read_csv('./results/rewards_cliffwalking_qlearning.csv', header=None)
In [ ]:
import matplotlib.pyplot as plt
# Calculate the 50-episode rolling average for both reward series
sarsa_avg = sarsa[0].rolling(window=50).mean()
qlearning_avg = qlearning[0].rolling(window=50).mean()
# Plotting the rolling average series
plt.plot(sarsa_avg, label='SARSA')
plt.plot(qlearning_avg, label='Q-Learning')
# Adding labels and title
plt.xlabel('Episode')
plt.ylabel('Reward')
plt.title('SARSA vs Q-Learning (greedy approach) on Cliff Walking Problem (50-episode rolling average)')
# Adding legend
plt.legend()
# Displaying the plot
plt.show()
Even when the agent selects its actions greedily, relying more heavily on the Q-table, the agent trained with Q-Learning tends to make more mistakes than the agent trained with SARSA during the training phase.
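For reference, the greedy and exploration/exploitation selection strategies compared across these two experiments can be sketched as a single epsilon-greedy helper; this is a hypothetical function, assuming a tabular `Q` with one row per state:
In [ ]:
import numpy as np

def select_action(Q, s, epsilon=0.0, rng=None):
    # epsilon = 0.0 reproduces purely greedy selection from the Q-table;
    # a positive epsilon gives the exploration/exploitation behaviour.
    rng = rng if rng is not None else np.random.default_rng()
    if rng.random() < epsilon:
        return int(rng.integers(Q.shape[1]))  # random exploratory action
    return int(np.argmax(Q[s]))               # greedy action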
In [ ]:
sarsa = pd.read_csv('./results/rewards_taxi_driver_sarsa.csv', header=None)
qlearning = pd.read_csv('./results/rewards_taxi_driver_qlearning.csv', header=None)
In [ ]:
import matplotlib.pyplot as plt
# Calculate the 50-episode rolling average for both reward series
sarsa_avg = sarsa[0].rolling(window=50).mean()
qlearning_avg = qlearning[0].rolling(window=50).mean()
# Plotting the rolling average series
plt.plot(sarsa_avg, label='SARSA')
plt.plot(qlearning_avg, label='Q-Learning')
# Adding labels and title
plt.xlabel('Episode')
plt.ylabel('Reward')
plt.title('SARSA vs Q-Learning on Taxi Driver Problem (50-episode rolling average)')
# Adding legend
plt.legend()
# Displaying the plot
plt.show()
In environments that are less punishing, however, the agent trained with Q-Learning behaves similarly to the agent trained with SARSA.