DRL - 01 Introduction

ML 23-1 deep reinforcement learning

scenario of deep reinforcement learning

  • learning to play GO
  • Supervised vs Reinforcement
  • applications

    Gym: https://gym.openai.com/

    Universe: https://openai.com/blog/universe/

  • difficulties of reinforcement learning

    reward delay: some actions yield no immediate reward and look useless at the moment, but they influence the future and help the agent obtain rewards later.

    agent's actions affect the subsequent data it receives; the agent needs to explore, trying both good and bad behaviors.

  • outline

Policy-based Approach - Learning an Actor

  • machine learning $\approx$ looking for a function
  • the three steps of finding a function

  • DRL

    1. neural network as actor

      input: a vector or matrix, e.g. pixels

      output: the probability of taking each action (stochastic)

    2. goodness of function

      supervised learning vs DRL

    3. pick the best
    • gradient ascent


    • add a baseline
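The steps above can be sketched as a minimal REINFORCE-style gradient-ascent update with a baseline. The two-armed bandit, learning rate, and moving-average baseline here are all illustrative assumptions, not the lecture's exact setup:

```python
import math
import random

random.seed(0)

def softmax(prefs):
    exps = [math.exp(p) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]

# Toy two-armed bandit: action 1 pays more on average (both payoffs made up).
def pull(action):
    return random.gauss(1.0 if action == 1 else 0.2, 0.1)

prefs = [0.0, 0.0]   # policy parameters: one preference per action
lr = 0.1             # learning rate for gradient ascent
baseline = 0.0       # running average of rewards

for step in range(2000):
    probs = softmax(prefs)
    a = random.choices([0, 1], weights=probs)[0]
    r = pull(a)
    baseline += 0.01 * (r - baseline)        # moving-average baseline
    advantage = r - baseline                 # reward relative to the baseline
    # Gradient ascent on expected reward:
    # d log pi(a) / d prefs[k] = 1[k == a] - probs[k]
    for k in range(2):
        grad_log = (1.0 if k == a else 0.0) - probs[k]
        prefs[k] += lr * advantage * grad_log

print(softmax(prefs))
```

Subtracting the baseline does not change the expected gradient, but it keeps updates from always being positive when all rewards are positive.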

critics

the critic evaluates an observation (state)

Actor-Critic

ML 23-2 policy gradient (Supplementary Explanation)

ML 23-3 RL

interact with environments

the behaviors the machine learns affect what happens next, so all actions are treated as a whole (a trajectory)
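A minimal interaction loop makes this concrete: what gets scored is the return of the whole trajectory, not any single action. The environment here is a hypothetical toy, and the actor is just random:

```python
import random

random.seed(1)

# Hypothetical toy environment: the agent walks on a line and is
# rewarded while it stays on the positive side.
class ToyEnv:
    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):          # action is -1 or +1
        self.pos += action
        reward = 1.0 if self.pos > 0 else 0.0
        return self.pos, reward

env = ToyEnv()
state = env.reset()
trajectory = []                      # the whole episode tau = (s1, a1, r1, ...)
total_reward = 0.0
for t in range(10):
    action = random.choice([-1, 1])  # a random actor, for illustration
    state, reward = env.step(action)
    trajectory.append((state, action, reward))
    total_reward += reward           # R(tau): the score of the trajectory as a whole

print(total_reward)
```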

components

the environment and the reward function cannot be controlled; we can only adjust the actor's behavior

critic

evaluating the critic:

Monte-Carlo (MC):

Temporal Difference (TD):
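The two estimation styles can be contrasted on a tiny two-state chain (the chain, rewards, and step size are illustrative assumptions): MC averages complete observed returns, while TD bootstraps from the next state's current estimate:

```python
# Tiny deterministic chain: s0 -> s1 -> terminal, with rewards 1.0 then 0.0.
episodes = [[("s0", 1.0), ("s1", 0.0)] for _ in range(100)]
gamma = 1.0

# Monte-Carlo: average the full return G observed after each visit.
V_mc = {"s0": 0.0, "s1": 0.0}
counts = {"s0": 0, "s1": 0}
for ep in episodes:
    G = 0.0
    returns = []
    for s, r in reversed(ep):        # accumulate the return backwards
        G = r + gamma * G
        returns.append((s, G))
    for s, G in returns:             # incremental mean of observed returns
        counts[s] += 1
        V_mc[s] += (G - V_mc[s]) / counts[s]

# Temporal Difference: update toward r + gamma * V(s') after every step.
V_td = {"s0": 0.0, "s1": 0.0, "end": 0.0}
alpha = 0.1
for ep in episodes:
    states = [s for s, _ in ep] + ["end"]
    for i, (s, r) in enumerate(ep):
        s_next = states[i + 1]
        V_td[s] += alpha * (r + gamma * V_td[s_next] - V_td[s])

print(V_mc["s0"], V_td["s0"])
```

MC waits for the episode to end and has higher variance; TD updates after every step but is biased by the current estimate of the next state.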

Q-learning

if the actions cannot be enumerated, the argmax over Q blows up; use PDPG instead

pathwise derivative policy gradient
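For a small discrete action set, the critic alone yields a policy by enumerating actions and taking the argmax over Q. This tabular Q-learning sketch (toy dynamics and epsilon-greedy exploration are all made up here) shows the enumeration that becomes intractable for continuous actions:

```python
import random

random.seed(0)

# Tabular Q-learning on hypothetical dynamics: 3 states, 3 actions.
actions = [0, 1, 2]
Q = {(s, a): 0.0 for s in range(3) for a in actions}
alpha, gamma, epsilon = 0.5, 0.9, 0.2

def step(s, a):
    # made-up dynamics: action 2 always reaches the rewarding state 2
    s_next = 2 if a == 2 else random.choice([0, 1])
    r = 1.0 if s_next == 2 else 0.0
    return s_next, r

s = 0
for _ in range(500):
    if random.random() < epsilon:
        a = random.choice(actions)                       # explore
    else:
        a = max(actions, key=lambda a: Q[(s, a)])        # enumerate actions, take argmax
    s_next, r = step(s, a)
    best_next = max(Q[(s_next, a2)] for a2 in actions)   # again an argmax over actions
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    s = s_next

print(max(actions, key=lambda a: Q[(0, a)]))             # greedy action in state 0
```

When the action space is continuous, both `max` calls cannot be computed by enumeration; the pathwise derivative policy gradient instead trains an actor network to output the action that maximizes Q.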

Asynchronous Advantage Actor-Critic (A3C)

imitation learning



similar to GAN:

Author: Yang

Posted: 2022-03-19

Updated: 2022-03-24
