Reinforcement Learning (PPT)

Uploader: he****18  Document ID: 253369587  Upload date: 2024-12-12  Format: PPT  Pages: 20  Size: 1.03MB
REINFORCEMENT LEARNING
Overview & Applications to Music
Gautam Bhattacharya
MUMT 621

Rise of the machine

let us assume that we are playing against an imperfect player, one whose play is sometimes incorrect and allows us to win. For the moment, in fact, let us consider draws and losses to be equally bad for us. How might we construct a player that will find the imperfections in its opponent's play and learn to maximize its chances of winning?
- Sutton, R. S., and A. G. Barto. 1998. Reinforcement Learning: An Introduction.

Goals & topics

  - What is Reinforcement Learning?
  - History
  - Introduction
  - Individuality & Examples
  - Elements of a Reinforcement Learning System
  - The Reinforcement Problem - An Example
  - Applications to Music
  - Questions & Comments

History

... heterostatic theory of adaptive systems developed by A. Harry Klopf, but in 1979 we came to realize that perhaps the simplest of the ideas, which had long been taken for granted, had received surprisingly little attention from a computational perspective. This was simply the idea of a learning system that wants something, that adapts its behaviour in order to maximize a special signal from its environment. This was the idea of a hedonistic learning system, or, as we would say now, the idea of reinforcement learning.
- Sutton, R. S., and A. G. Barto. 1998. Reinforcement Learning: An Introduction.

Introduction

What is Reinforcement Learning?

Reinforcement learning is learning what to do - how to map situations to actions - so as to maximize a numerical reward signal.
- Sutton, R. S., and A. G. Barto. 1998. Reinforcement Learning: An Introduction.

These two characteristics:
  - trial-and-error
  - delayed reward
These are the two most important distinguishing features of reinforcement learning.

Introduction

The formulation is intended to include just these three aspects:
  - sensation
  - action
  - goal
Clearly, such an agent must be able to sense the state of the environment to some extent and must be able to take actions that affect the state. The agent also must have a goal or goals relating to the state of the environment.
- Sutton, R. S., and A. G. Barto. 1998. Reinforcement Learning: An Introduction.

DIFFERENCES WITH RESPECT TO SUPERVISED LEARNING

  - Learning with a critic as opposed to learning with a teacher.
  - Reinforcement learning = INTERACTIVE learning
    In interactive problems it is often impractical to obtain examples of desired behaviour that are both correct and representative of all the situations in which the agent has to act.
  - Reinforcement Learning looks at the bigger picture
    For example, we have mentioned that much of machine learning research is concerned with supervised learning without explicitly specifying how such an ability would finally be useful.
- Sutton, R. S., and A. G. Barto. 1998. Reinforcement Learning: An Introduction.

Challenges

One of the challenges that arise in reinforcement learning and not in other kinds of learning is the trade-off between exploration and exploitation.

The whole problem of a goal-directed agent interacting with an uncertain environment

All reinforcement learning agents have explicit goals, can sense aspects of their environments, and can choose actions to influence their environments. Moreover, it is usually assumed from the beginning that the agent has to operate despite significant uncertainty about the environment it faces.
- Sutton, R. S., and A. G. Barto. 1998. Reinforcement Learning: An Introduction.

Examples

A master chess player makes a move. The choice is informed both by planning - anticipating possible replies and counterreplies - and by immediate, intuitive judgments of the desirability of particular positions and moves.

A mobile robot decides whether it should enter a new room in search of more trash to collect or start trying to find its way back to its battery recharging station. It makes its decision based on how quickly and easily it has been able to find the recharger in
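
As an illustration that is not part of the original slides, the trade-off between exploration and exploitation named under Challenges can be sketched with an epsilon-greedy agent on a small multi-armed bandit. The function name `run_epsilon_greedy`, the Gaussian reward model, and every parameter value below are assumptions chosen for the sketch; the agent learns by trial and error from a numerical reward signal only.

```python
import random


def run_epsilon_greedy(true_means, epsilon, steps, seed=0):
    """Play a Gaussian multi-armed bandit with an epsilon-greedy agent.

    Hypothetical helper for illustration: names, reward model, and
    parameter values are assumptions, not taken from the slides.
    Returns the estimated action values and the per-action pull counts.
    """
    rng = random.Random(seed)
    n_actions = len(true_means)
    estimates = [0.0] * n_actions  # agent's current value estimate per action
    counts = [0] * n_actions       # how often each action has been tried

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action to learn more about it.
            action = rng.randrange(n_actions)
        else:
            # Exploit: pick the action that currently looks best.
            action = max(range(n_actions), key=lambda a: estimates[a])

        # The environment answers with a noisy numerical reward.
        reward = rng.gauss(true_means[action], 1.0)

        # Incremental sample-average update of the chosen action's estimate.
        counts[action] += 1
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates, counts


if __name__ == "__main__":
    estimates, counts = run_epsilon_greedy([0.2, 0.5, 1.5], epsilon=0.1, steps=5000)
    print("estimated values:", [round(v, 2) for v in estimates])
    print("pull counts:", counts)
```

With `epsilon=0` the agent only exploits and may lock onto whichever action happened to look good first. A small positive `epsilon` keeps it sampling the alternatives, at the cost of occasionally taking an action it believes is worse.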