.. chapter_7 Chapter 7 n-step Bootstrapping ============================== The pseudo code for n-step TD is shown below. Given a policy, this algorithm will estimate the state values in the environment. .. image:: _static/nstep_td_pseudocode.jpg **IntroRL** implements the above pseudo code with the class `NStepTDWalker <./_static/colorized_scripts/agent_supt/nstep_td_eval_walker.html>`_ Figure 7.2 Random Walk ---------------------- Figure 7.2 illustrates the fact that an n-step algorithm can outperform both TD(0) and Monte Carlo. For the 19 state random walk process, the image below compares the results of the `NStepTDWalker <./_static/colorized_scripts/agent_supt/nstep_td_eval_walker.html>`_ from **IntroRL** with the published `Sutton & Barto `_ values. .. image:: _static/figure_7_2_random_walk_19.png The code used to generate the above figure is: `Figure 7.2 Code <./_static/colorized_scripts/examples/chapter_7/figure_7_2.html>`_ Few Examples ------------ Chapter 7 gives very few examples against which to verify **IntroRL** routines. Perhaps of interest, however, is the use of the n-step Sarsa routine. .. image:: _static/nstep_sarsa_pseudocode.jpg Using the above pseudo code, **IntroRL** implements `NStepSarsaWalker <./_static/colorized_scripts/agent_supt/nstep_sarsa_eval_walker.html>`_ for evaluating a policy. Also implemented is `NStepSarsaQStarFinder <./_static/colorized_scripts/agent_supt/nstep_sarsa_qstar_walker.html>`_ for calculating the optimum action value function ``Q*``.