Chapter 7 n-step Bootstrapping

The pseudocode for n-step TD prediction is shown below.

Given a fixed policy π, this algorithm estimates the state-value function v_π, that is, the value of every state in the environment under π.

_images/nstep_td_pseudocode.jpg
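For reference, the core of the pseudocode is the n-step return and the update it drives. In Sutton & Barto's notation, with step size α and discount γ, the estimate for the state visited at time t is updated as follows. This is the textbook rule, written out here for clarity rather than taken from the IntroRL source:

```latex
G_{t:t+n} = R_{t+1} + \gamma R_{t+2} + \cdots + \gamma^{n-1} R_{t+n} + \gamma^{n} V(S_{t+n})

V(S_t) \leftarrow V(S_t) + \alpha \left[ G_{t:t+n} - V(S_t) \right]
```

If the episode ends at time T with t + n ≥ T, the bootstrap term γ^n V(S_{t+n}) is dropped and G_{t:t+n} is simply the Monte Carlo return G_t. Setting n = 1 recovers TD(0).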

IntroRL implements the above pseudocode in the class NStepTDWalker.
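The following is a minimal, self-contained sketch of tabular n-step TD prediction, useful for seeing how the pseudocode's τ = t − n + 1 bookkeeping works. It is not the NStepTDWalker API: the function `n_step_td` and its `reset`/`step` callables are hypothetical names chosen for illustration, and `step` is assumed to sample one transition under the policy being evaluated.

```python
import math
import random
from typing import Callable, Dict, Hashable, List, Optional, Tuple

State = Hashable
Step = Callable[[State], Tuple[float, State, bool]]


def n_step_td(reset: Callable[[], State], step: Step, n: int, alpha: float,
              gamma: float = 1.0, num_episodes: int = 100,
              V: Optional[Dict[State, float]] = None) -> Dict[State, float]:
    """Tabular n-step TD prediction of v_pi (hypothetical helper).

    `step(s)` samples one transition under the fixed policy and returns
    (reward, next_state, done). Unvisited and terminal states count as 0.
    """
    V = {} if V is None else V
    for _ in range(num_episodes):
        states: List[State] = [reset()]
        rewards: List[float] = [0.0]      # placeholder so rewards[t] is R_t
        T = math.inf                       # episode length, unknown until done
        t = 0
        while True:
            if t < T:
                r, s_next, done = step(states[t])
                rewards.append(r)
                states.append(s_next)
                if done:
                    T = t + 1
            tau = t - n + 1                # time whose estimate is updated
            if tau >= 0:
                end = min(tau + n, T)
                G = sum(gamma ** (i - tau - 1) * rewards[i]
                        for i in range(tau + 1, end + 1))
                if tau + n < T:            # bootstrap from V(S_{tau+n})
                    G += gamma ** n * V.get(states[tau + n], 0.0)
                v = V.get(states[tau], 0.0)
                V[states[tau]] = v + alpha * (G - v)
            if tau == T - 1:               # last state of the episode updated
                break
            t += 1
    return V


# Usage: 19-state random walk (states 1..19, terminals 0 and 20, start 10).
rng = random.Random(0)


def reset() -> int:
    return 10


def step(s: int) -> Tuple[float, int, bool]:
    s2 = s + rng.choice((-1, 1))           # equiprobable left/right policy
    if s2 == 0:
        return -1.0, s2, True
    if s2 == 20:
        return 1.0, s2, True
    return 0.0, s2, False


V = n_step_td(reset, step, n=4, alpha=0.05, num_episodes=1000)
true_v = {s: (s - 10) / 10 for s in range(1, 20)}   # -0.9 ... 0.9
rms = math.sqrt(sum((V.get(s, 0.0) - true_v[s]) ** 2 for s in true_v) / 19)
print(f"RMS error after 1000 episodes: {rms:.3f}")
```

A dictionary keyed by state keeps the sketch independent of how states are labelled; with a constant α the estimates keep fluctuating around v_π rather than converging exactly.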

Figure 7.2 Random Walk

Figure 7.2 illustrates that n-step TD with an intermediate value of n can outperform both extremes: TD(0), which is the case n = 1, and Monte Carlo, which corresponds to n = ∞.

For the 19-state random walk, the image below compares the results of NStepTDWalker from IntroRL with the published Sutton & Barto values.
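The sketch below shows how a single point on a Figure 7.2 curve can be computed: values start at 0, the RMS error over the 19 states is measured after each episode, and that error is averaged over the first 10 episodes and over repeated runs, as described in Sutton & Barto's caption. It is an independent illustration, not the IntroRL "Figure 7.2 Code"; the names `run_episode` and `average_rms` are hypothetical, and γ = 1 with rewards of −1 (left exit) and +1 (right exit) is assumed.

```python
import math
import random

N_STATES = 19                      # non-terminal states 1..19; 0 and 20 terminal
TRUE_V = [(s - 10) / 10 for s in range(1, N_STATES + 1)]   # -0.9 ... 0.9


def rms_error(V):
    """Root-mean-square error over the 19 non-terminal states."""
    return math.sqrt(sum((V[s] - TRUE_V[s - 1]) ** 2
                         for s in range(1, N_STATES + 1)) / N_STATES)


def run_episode(V, n, alpha, rng):
    """One episode of tabular n-step TD (gamma = 1) started from state 10."""
    states, rewards = [10], [0.0]  # rewards[0] is a placeholder
    T, t = math.inf, 0
    while True:
        if t < T:
            s2 = states[t] + rng.choice((-1, 1))
            if s2 == 0:
                rewards.append(-1.0)
            elif s2 == N_STATES + 1:
                rewards.append(1.0)
            else:
                rewards.append(0.0)
            states.append(s2)
            if s2 in (0, N_STATES + 1):
                T = t + 1
        tau = t - n + 1
        if tau >= 0:
            end = min(tau + n, T)
            G = sum(rewards[tau + 1:end + 1])   # undiscounted n-step return
            if tau + n < T:
                G += V[states[tau + n]]
            V[states[tau]] += alpha * (G - V[states[tau]])
        if tau == T - 1:
            return
        t += 1


def average_rms(n, alpha, runs=100, episodes=10, seed=0):
    """RMS error averaged over the first `episodes` episodes and `runs` runs."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        V = [0.0] * (N_STATES + 2)  # terminal entries stay 0
        for _ in range(episodes):
            run_episode(V, n, alpha, rng)
            total += rms_error(V)
    return total / (runs * episodes)


for n in (1, 2, 4, 8):
    print(n, round(average_rms(n, alpha=0.2), 3))
```

Sweeping `alpha` over a grid for each n and plotting the results gives curves of the same kind as Figure 7.2.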

_images/figure_7_2_random_walk_19.png

The code used to generate the above figure is: Figure 7.2 Code

Few Examples

Chapter 7 gives very few examples against which to verify IntroRL routines.

Perhaps of interest, however, is n-step Sarsa, which applies the n-step idea to action values instead of state values.

_images/nstep_sarsa_pseudocode.jpg

Using the above pseudocode, IntroRL implements NStepSarsaWalker for evaluating a policy by estimating its action values.
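For comparison with n-step TD, the n-step Sarsa return bootstraps from an action value instead of a state value. In Sutton & Barto's notation (the textbook rule, not copied from the IntroRL source), the update for the state-action pair visited at time t is:

```latex
G_{t:t+n} = R_{t+1} + \gamma R_{t+2} + \cdots + \gamma^{n-1} R_{t+n} + \gamma^{n} Q(S_{t+n}, A_{t+n})

Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha \left[ G_{t:t+n} - Q(S_t, A_t) \right]
```

As with n-step TD, the bootstrap term is dropped when t + n ≥ T.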

IntroRL also implements NStepSarsaQStarFinder for calculating the optimal action-value function Q*.
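As a rough illustration of control with n-step Sarsa, here is a minimal sketch that learns action values while acting ε-greedily. It is not NStepSarsaQStarFinder: `n_step_sarsa`, the corridor environment, and all parameter values are hypothetical. With a fixed ε the method learns the values of the ε-greedy policy, which only approximate Q*; letting ε decay toward 0 is a common way to approach Q* more closely.

```python
import math
import random
from collections import defaultdict
from typing import Callable, Dict, Hashable, Sequence, Tuple

Key = Tuple[Hashable, Hashable]


def n_step_sarsa(reset: Callable[[], Hashable],
                 step: Callable[[Hashable, Hashable], Tuple[float, Hashable, bool]],
                 actions: Sequence[Hashable], n: int, alpha: float,
                 epsilon: float, gamma: float = 1.0, num_episodes: int = 1000,
                 seed: int = 0) -> Dict[Key, float]:
    """On-policy n-step Sarsa with a fixed epsilon (hypothetical helper)."""
    rng = random.Random(seed)
    Q: Dict[Key, float] = defaultdict(float)   # terminal entries stay 0

    def eps_greedy(s):
        if rng.random() < epsilon:
            return rng.choice(actions)
        best = max(Q[(s, a)] for a in actions)
        return rng.choice([a for a in actions if Q[(s, a)] == best])  # break ties randomly

    for _ in range(num_episodes):
        S = [reset()]
        A = [eps_greedy(S[0])]
        R = [0.0]                    # placeholder so R[t] is R_t
        T, t = math.inf, 0
        while True:
            if t < T:
                r, s_next, done = step(S[t], A[t])
                R.append(r)
                S.append(s_next)
                if done:
                    T = t + 1
                else:
                    A.append(eps_greedy(s_next))
            tau = t - n + 1
            if tau >= 0:
                end = min(tau + n, T)
                G = sum(gamma ** (i - tau - 1) * R[i] for i in range(tau + 1, end + 1))
                if tau + n < T:      # bootstrap from Q(S_{tau+n}, A_{tau+n})
                    G += gamma ** n * Q[(S[tau + n], A[tau + n])]
                key = (S[tau], A[tau])
                Q[key] += alpha * (G - Q[key])
            if tau == T - 1:
                break
            t += 1
    return Q


# Usage: corridor 0..5, goal 5, reward -1 per step; moving left at 0 stays at 0.
GOAL = 5


def reset() -> int:
    return 0


def step(s: int, a: int) -> Tuple[float, int, bool]:
    s2 = max(0, s + a)
    return -1.0, s2, s2 == GOAL


Q = n_step_sarsa(reset, step, actions=(-1, 1), n=3, alpha=0.1,
                 epsilon=0.1, num_episodes=3000, seed=0)
greedy = [max((-1, 1), key=lambda a: Q[(s, a)]) for s in range(GOAL)]
print("greedy action per state:", greedy)
```

Zero initial values with a reward of −1 per step are optimistic, which pushes the agent to try every action early on even before ε-exploration kicks in.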