Reinforcement Learning 1 - 白红宇

Published: 2019-06-14

1. How reinforcement learning differs from supervised learning:

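（1）The problem is closed-loop: the learning system's actions influence its later inputs.

（2）The learner is not told which actions to take; it must discover which actions yield the most reward by trying them.

（3）Actions may affect not only the immediate reward but also the next situation and, through that, all subsequent rewards.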

（4）There is a trade-off between exploration and exploitation: the dilemma is that neither can be pursued exclusively without failing at the task.

（5）The setting is a goal-directed agent interacting with an uncertain environment.
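The points above can be made concrete with a minimal sketch. It uses an epsilon-greedy agent on a multi-armed bandit, which is a technique chosen here for illustration and not something the original note specifies. The function name `run_bandit`, the three reward probabilities, and the step count are all assumptions. A bandit has no "next situation", so the sketch illustrates points (1), (2), (4) and (5) but not the delayed effects in point (3).

```python
import random


def run_bandit(epsilon, steps=2000, seed=0):
    """Epsilon-greedy agent on a hypothetical 3-armed Bernoulli bandit.

    Returns the average reward per step and the number of pulls per arm.
    """
    rng = random.Random(seed)
    true_probs = [0.2, 0.5, 0.8]        # assumed values, hidden from the agent
    counts = [0] * len(true_probs)      # how often each arm was pulled
    values = [0.0] * len(true_probs)    # running mean reward per arm
    total = 0.0
    for _ in range(steps):
        # The agent is never told which arm is correct; it has to try them.
        if rng.random() < epsilon:
            arm = rng.randrange(len(true_probs))                         # explore
        else:
            arm = max(range(len(true_probs)), key=lambda a: values[a])   # exploit
        # The uncertain environment answers with a stochastic reward.
        reward = 1.0 if rng.random() < true_probs[arm] else 0.0
        # Closed loop: the reward updates the estimates that drive later choices.
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
        total += reward
    return total / steps, counts


if __name__ == "__main__":
    for eps in (0.0, 0.1, 1.0):
        avg, counts = run_bandit(eps)
        print(f"epsilon={eps}: average reward {avg:.3f}, pulls per arm {counts}")
```

With `epsilon=0.0` the agent only exploits: ties are broken toward the first arm, so it never pulls any arm other than the first and never learns that better arms exist. With `epsilon=1.0` it only explores and never uses what it has learned. Intermediate values mix the two, which is the dilemma described in point (4).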

Reposted from: https://www.cnblogs.com/Wanggcong/p/6017114.html
