REINFORCEMENT LEARNING
Overview & Applications to Music
Gautam Bhattacharya
MUMT 621

Rise of the Machine
"Let us assume that we are playing against an imperfect player, one whose play is sometimes incorrect and allows us to win. For the moment, in fact, let us consider draws and losses to be equally bad for us. How might we construct a player that will find the imperfections in its opponent's play and learn to maximize its chances of winning?"
- Sutton, R. S., and A. G. Barto. 1998. Reinforcement Learning: An Introduction.

Goals & Topics
- What is Reinforcement Learning?
- History
- Introduction
- Individuality & Examples
- Elements of a Reinforcement Learning System
- The Reinforcement Problem: An Example
- Applications to Music
- Questions & Comments

History
The heterostatic theory of adaptive systems developed by A. Harry Klopf.
"... but in 1979 we came to realize that perhaps the simplest of the ideas, which had long been taken for granted, had received surprisingly little attention from a computational perspective. This was simply the idea of a learning system that 'wants' something, that adapts its behaviour in order to maximize a special signal from its environment. This was the idea of a hedonistic learning system, or, as we would say now, the idea of reinforcement learning."
- Sutton and Barto (1998)

Introduction
What is Reinforcement Learning?
"Reinforcement learning is learning what to do (how to map situations to actions) so as to maximize a numerical reward signal."
- Sutton and Barto (1998)
Two characteristics, trial-and-error and delayed reward, are the most important distinguishing features of reinforcement learning.

Introduction
The formulation is intended to include just these three aspects:
- sensation
- action
- goal
"Clearly, such an agent must be able to sense the state of the environment to some extent and must be able to take actions that affect the state. The agent also must have a goal or goals relating to the state of the environment."
- Sutton and Barto (1998)

DIFFERENCES WITH RESPECT TO SUPERVISED LEARNING
- Learning with a critic, as opposed to learning with a teacher.
- Reinforcement learning = INTERACTIVE learning. In interactive problems it is often impractical to obtain examples of desired behaviour that are both correct and representative of all the situations in which the agent has to act.
- Reinforcement learning looks at the bigger picture: "For example, we have mentioned that much of machine learning research is concerned with supervised learning without explicitly specifying how such an ability would finally be useful."
- Sutton and Barto (1998)

Challenges
- "One of the challenges that arise in reinforcement learning and not in other kinds of learning is the trade-off between exploration and exploitation."
- Reinforcement learning considers the whole problem of a goal-directed agent interacting with an uncertain environment: "All reinforcement learning agents have explicit goals, can sense aspects of their environments, and can choose actions to influence their environments. Moreover, it is usually assumed from the beginning that the agent has to operate despite significant uncertainty about the environment it faces."
- Sutton and Barto (1998)

Examples
- A master chess player makes a move. The choice is informed both by planning (anticipating possible replies and counterreplies) and by immediate, intuitive judgments of the desirability of particular positions and moves.
- A mobile robot decides whether it should enter a new room in search of more trash to collect or start trying to find its way back to its battery recharging station. It makes its decision based on how quickly and easily it has been able to find the recharger in …
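The exploration/exploitation trade-off and the trial-and-error character of reinforcement learning described in these slides can be illustrated with a minimal sketch: an epsilon-greedy agent on a multi-armed bandit. This is a deliberately simplified setting (no states, immediate rather than delayed reward), and epsilon-greedy is only one common way to balance exploring and exploiting, not a method the slides prescribe. The function name `run_epsilon_greedy`, the arm payout probabilities, and `epsilon=0.1` are hypothetical choices for illustration.

```python
import random


def run_epsilon_greedy(arm_probs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy action selection on a Bernoulli multi-armed bandit.

    arm_probs: hypothetical win probability of each arm (unknown to the agent).
    Returns (value estimates, pull counts, total reward).
    """
    rng = random.Random(seed)
    n_arms = len(arm_probs)
    estimates = [0.0] * n_arms  # sample-average reward estimate per arm
    counts = [0] * n_arms       # number of times each arm was tried
    total_reward = 0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random arm to improve the estimates.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm that currently looks best
            # (on ties, max() returns the lowest index).
            arm = max(range(n_arms), key=lambda a: estimates[a])

        # The environment returns a numerical reward signal (1 = win, 0 = no win).
        reward = 1 if rng.random() < arm_probs[arm] else 0
        total_reward += reward

        # Trial-and-error update: incremental sample average for the chosen arm.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return estimates, counts, total_reward


if __name__ == "__main__":
    est, cnt, total = run_epsilon_greedy([0.2, 0.5, 0.8])
    print("estimates:", [round(e, 2) for e in est])
    print("pulls:", cnt)
    print("total reward:", total)
```

Setting `epsilon=0` makes the agent purely greedy, so it can lock onto whichever arm first looks good (here, arm 0) and never discover the better arms. A larger epsilon spends more pulls exploring. Because the bandit has no states and no delayed reward, it illustrates only the exploration/exploitation side of the full reinforcement learning problem.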