round 0
record 0


$Q(S_t,A_t) \leftarrow (1-\alpha)\,Q(S_t,A_t) + \alpha\,[R_t + \gamma \max_a Q(S_{t+1},a)]$
$\text{where}\ S = (dx, dy, v_{\text{bird}})$
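
The update rule above can be sketched as a tabular Q-learning step. This is a minimal illustration, not the original implementation: the action encoding (`0` = do nothing, `1` = flap), the helper names `make_q_table` and `q_update`, the default values of `alpha` and `gamma`, and the assumption that $(dx, dy, v_{\text{bird}})$ has already been discretized into a hashable tuple are all hypothetical.

```python
from collections import defaultdict

# Hypothetical action encoding: 0 = do nothing, 1 = flap.
ACTIONS = (0, 1)


def make_q_table():
    """Q-table mapping a state tuple to one value per action, initialized to 0."""
    return defaultdict(lambda: [0.0] * len(ACTIONS))


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one Q-learning update and return the new Q(state, action).

    Q(S_t, A_t) <- (1 - alpha) * Q(S_t, A_t)
                   + alpha * (R_t + gamma * max_a Q(S_{t+1}, a))
    """
    best_next = max(q[next_state])        # max_a Q(S_{t+1}, a)
    target = reward + gamma * best_next   # R_t + gamma * max_a Q(S_{t+1}, a)
    q[state][action] = (1 - alpha) * q[state][action] + alpha * target
    return q[state][action]


# Usage: states are assumed to be discretized (dx, dy, v_bird) tuples.
q = make_q_table()
s, s_next = (10, -3, 2), (9, -1, 1)
q[s_next][1] = 2.0  # pretend flapping in the next state is already valued at 2.0
new_value = q_update(q, s, 0, reward=1.0, next_state=s_next, alpha=0.5, gamma=0.5)
```

With `alpha = 0.5` and `gamma = 0.5`, the target is `1.0 + 0.5 * 2.0 = 2.0`, so the stored value moves halfway from `0.0` to `2.0`. A dictionary keyed by state tuples keeps the table sparse, which only works if the continuous state has been binned first.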

https://www.cs.toronto.edu/~vmnih/docs/dqn.pdf