  • Chapter 9

    Chapter 9: On-policy Prediction with Approximation From this chapter, we...

  • Chapter 7

    Chapter 7: n-step Bootstrapping n-step TD methods span a spectrum with M...

  • Chapter 6

    Chapter 6: Temporal-Difference Learning Temporal-difference (TD) learnin...

  • Chapter 5

    Chapter 5: Monte Carlo Methods Monte Carlo (MC) methods are learning met...

  • Chapter 4

    Chapter 4: Dynamic Programming Dynamic programming computes optimal poli...

  • Chapter 3

    Chapter 3: Finite Markov Decision Processes Basic Definitions MDP is the...

  • Chapter 2

    Chapter 2: Multi-armed Bandits Multi-armed bandits can be seen as the si...

  • Pointer Networks

    Pointer Networks Oriol Vinyals, Meire Fortunato, Navdeep Jaitly Google, B...

  • Neural Computation of Decisions in Optimization Problems

    Neural Computation of Decisions in Optimization Problems J. J. Hopfield,...