I. Differences from supervised learning:
(1) Closed-loop: the learner's actions influence its later inputs.
(2) The learner is not told which actions to take; it must discover which actions yield the most reward by trying them.
(3) An action may affect not only the immediate reward but also the next situation and, through that, all subsequent rewards.
(4) The exploration-exploitation dilemma: neither exploration nor exploitation can be pursued exclusively without failing at the task.
(5) It considers the whole problem of a goal-directed agent interacting with an uncertain environment.
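
Points (1), (2) and (4) can be illustrated with a small sketch. The example below is not from these notes: it assumes a hypothetical three-armed Bernoulli bandit and an epsilon-greedy agent, and the name `run_bandit` and the reward probabilities are made up for illustration. The agent is never told the correct arm; it only sees the reward of the arm it chose, and its own choices determine which data it gets to learn from.

```python
import random


def run_bandit(probs, epsilon, steps, seed=0):
    """Epsilon-greedy agent on a Bernoulli bandit (illustrative sketch).

    probs   -- hypothetical success probability of each arm (hidden from the agent)
    epsilon -- probability of exploring (picking a uniformly random arm)
    Returns (pull counts per arm, total reward).
    """
    rng = random.Random(seed)
    estimates = [0.0] * len(probs)  # agent's running estimate of each arm's value
    counts = [0] * len(probs)       # how often each arm has been pulled
    total_reward = 0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try an arm at random.
            action = rng.randrange(len(probs))
        else:
            # Exploit: pick the arm with the highest estimate (ties -> lowest index).
            action = max(range(len(probs)), key=lambda a: estimates[a])

        # The environment returns only a reward for the chosen arm,
        # never a label saying which arm would have been correct.
        reward = 1 if rng.random() < probs[action] else 0

        # Incremental sample-average update of the chosen arm's estimate.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]
        total_reward += reward

    return counts, total_reward


if __name__ == "__main__":
    probs = [0.2, 0.5, 0.8]  # assumed values, chosen only for the demo
    for eps in (0.0, 0.1, 1.0):
        counts, total = run_bandit(probs, eps, steps=2000)
        print(f"epsilon={eps}: pulls per arm={counts}, total reward={total}")
```

With `epsilon=0.0` the agent only exploits and, in this setup, never leaves the first arm it tried, so it cannot discover that another arm is better. With `epsilon=1.0` it only explores and never uses what it has learned. An intermediate value mixes the two, which is the trade-off point (4) describes.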