This is coursework for a reinforcement learning class. The goal is to build a Q-learning agent that plays Enduro, an old racing game published by Activision for the Atari 2600. The rules are simple: you have to overtake as many cars as possible, and your car suddenly decelerates if it hits another car. This translates to a positive reward when the agent overtakes another car and a negative reward when it is overtaken.
The features I built are the agent's position relative to the center of the road, its movement state, and the positions of the opponents. The state-action value function Q(s,a) is built from a parameter vector and a feature vector. The parameter vector is updated using the off-policy TD update rule. I wrote the code here.
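To make the update concrete, here is a minimal sketch of off-policy TD (Q-learning) with function approximation. It assumes Q(s,a) is the dot product of the parameter vector and the feature vector for that state-action pair, which is one common reading of the description above and not necessarily how the coursework code does it. The function names, the learning rate `alpha`, the discount `gamma`, and the toy feature values are all hypothetical.

```python
import numpy as np


def q_value(theta, phi):
    """Q(s, a) under the assumed linear form: theta . phi(s, a)."""
    return float(np.dot(theta, phi))


def q_learning_update(theta, phi_sa, reward, next_phis,
                      alpha=0.01, gamma=0.9, terminal=False):
    """Return the parameter vector after one Q-learning step.

    phi_sa    -- feature vector of the state-action pair just taken
    next_phis -- feature vectors of every action available in the next state
    """
    # Off-policy target: bootstrap from the greedy action in the next state,
    # regardless of which action the behaviour policy will actually take.
    best_next = 0.0 if terminal else max(q_value(theta, p) for p in next_phis)
    td_error = reward + gamma * best_next - q_value(theta, phi_sa)
    # For a linear Q, the gradient with respect to theta is the feature vector.
    return theta + alpha * td_error * phi_sa


# Toy usage with made-up features (not the real Enduro features).
theta = np.zeros(3)
phi_sa = np.array([1.0, 0.0, 0.5])
next_phis = [np.array([0.0, 1.0, 0.0]), np.array([1.0, 1.0, 0.0])]
theta = q_learning_update(theta, phi_sa, reward=1.0,
                          next_phis=next_phis, alpha=0.1)
```

The maximum over the next state's actions is what makes the rule off-policy: the agent can explore (for example with an epsilon-greedy policy) while the parameters are still pulled toward the value of the greedy policy.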
The agent overtaking opponents after about 200 epochs: