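Neural Network based Learning Control

7.1 Reinforcement Learning

Neural network learning methods fall into three classes:

1. Supervised learning, for example BP (backpropagation): there is an explicit "teacher" signal.
2. Unsupervised learning: there is no "teacher" signal of any kind. Learning relies only on the internal information of the input data, comparable to self-organizing clustering methods.
3. Reinforcement learning: the idea comes from psychology. Put simply, a person has a sum of money and several investment options A, B and C. He invests in B and makes a profit, so he invests in B again, until B stops making money or some sudden event convinces him that A is better, at which point he moves the money to A.

The reinforcement learning scheme proposed by Barto et al. can be called the ASE/ACE model. It consists of two elements:

ASE: Associative Search Element
ACE: Adaptive Critic Element

The ASE determines the control signal y. The ACE refines the reinforcement signal r into an improved reinforcement signal. The ASE and the ACE each have n input channels, obtained by decoding the system state S (the same as in CMAC), and exactly one channel is selected at any instant. The control signal and the weight update of each channel are given by:

[Equations: control signal and channel weight updates]

where w_i and v_i are the channel weights of the ASE and the ACE respectively, the ACE output is the improved reinforcement signal, the remaining symbols are coefficients, and noise is random noise.

[Figure: a DECODER maps the state of the Cart-Pole system onto n channels with weights v1, v2, ..., vn and w1, w2, ..., wn]

Mathematical model of the Cart-Pole system

[Equations: Cart-Pole dynamics]

Failure conditions

[Equations: failure conditions]

Clearly, the output of each element is determined almost entirely by the weight of the selected channel; the ASE is also slightly affected by the noise. The weights are learned almost independently: only channels that have been selected at some point are updated, and all others stay unchanged. As a result, when a completely new situation is encountered, the controller may output a completely wrong control signal and FAIL.

Two approaches to Neural Network based Learning Control

7.2 Direct Inverse Modelling
7.3 Learning Control with a Distal Teacher (Distal Learning)

The control problem

[Figure: the Learner receives an intention and produces an action; the Environment turns the action into an outcome. Second diagram: Inverse Model followed by Environment, with signals y*, x[n-1], u[n-1], y[n-1]]

1. The Direct Inverse Modeling approach to learning an inverse model

[Figure: Environment and Inverse Model, with signals x[n-1], y[n], u[n-1] and a +/- comparison]

2. The distal learning approach to learning an inverse model

[Figure: Environment and Forward Model, with signals x[n-1], y[n], u[n-1] and a +/- comparison]

2.1 Learning the forward model using the prediction error y[n] − ŷ[n], where ŷ[n] is the output of the forward model

2.2 Learning the inverse model via the forward model using the performance error y*[n] − y[n]

[Figure: Inverse Model followed by Forward Model, with signals y*[n-1], x[n-1], u[n-1], y[n] and the error y*[n] − y[n]]

The control systems

1. The direct inverse modeling approach

[Figure: Inverse Model and Environment, with signals y*[n], u[n-1], y[n] and a +/- comparison]

1.2 Example: Learning control of CSTR using CMAC

[Figure: control loop with blocks CMAC memory, CMAC training, CMAC response, CSTR, P controller, extreme controller, control switch, reference and coordinator; signals Sd, ep, ed, ud, up, ue, uc, So]

The CSTR system (continuous-stirred tank reactor)

[Equations: CSTR model]

This may be transformed to the dimensionless form:

[Equations: dimensionless CSTR model]

where
x1 is the conversion rate, related to the reaction concentration;
x2 is the reaction temperature in dimensionless form;
Uf and Uc are the control variables corresponding to the input flow rate F and the coolant temperature Tc, respectively;
the remaining symbols are system parameters.

Temperature control

[Figure: reactor with feed, product and jacket]

CMAC based learning control approach

The current outcome state is So(x1, x2, dx1), the current setting is x1e(k) and the next setting is x1e(k+1), where dx1(k) = x1(k) − x1(k−1).

Let
ed = x1e(k+1) − x1(k−1)
ep = x1e(k) − x1(k)
where ed is the difference between the next setting and the current output, and ep is the current deviation between the desired and the actual output.

IF |ed| > threshold, THEN take the extreme control, i.e.
    IF ed > threshold, THEN Uc = Umax
    IF ed < −threshold, THEN Uc = Umin
OTHERWISE take the learning control
    Uc = Up + Ud
    Up = ep * Kp
    Ud = CMAC response

CMAC training

So(x1(k+1), x2(k+1), dx1(k+1)) is the input to the CMAC.
Uc(k) is the "teacher signal" for the training.

So is the result caused by Uc(k). Therefore, if the input to the CMAC is So, the corresponding output should be Uc(k).

This is the end of one control-learning cycle; successive cycles repeat the same steps.

The Distal Learning Control Approach

[Figure: control loop with blocks NN1, NN2, P, extreme control, switch, CSTR, coordinator and reference]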
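The switching rule and the training step of the CMAC based learning control approach can be sketched in code. This is a minimal illustration, not the implementation behind the slides: the class `TinyCMAC`, its tiling scheme, the function `control_signal` and all parameter values are hypothetical, and the comparison directions (extreme control when ed exceeds the threshold in either direction) are read from context.

```python
import math
from collections import defaultdict


class TinyCMAC:
    """Hypothetical minimal CMAC: a few offset tilings over the input space.

    The response is the sum of one weight per tiling, so nearby inputs
    share most of their weights and generalize to each other.
    """

    def __init__(self, n_tilings=4, resolution=0.1, lr=0.5):
        self.n_tilings = n_tilings
        self.resolution = resolution
        self.lr = lr
        self.weights = defaultdict(float)  # unseen cells start at 0.0

    def _cells(self, state):
        # One active cell per tiling; each tiling is shifted by a fraction
        # of the cell width.
        cells = []
        for t in range(self.n_tilings):
            offset = t * self.resolution / self.n_tilings
            idx = tuple(math.floor((s + offset) / self.resolution) for s in state)
            cells.append((t, idx))
        return cells

    def response(self, state):
        return sum(self.weights[c] for c in self._cells(state))

    def train(self, state, target):
        # Spread the output error evenly over the active cells.
        cells = self._cells(state)
        error = target - sum(self.weights[c] for c in cells)
        for c in cells:
            self.weights[c] += self.lr * error / self.n_tilings


def control_signal(ed, ep, ud, kp, threshold, u_min, u_max):
    """Return (Uc, mode) following the IF/THEN/OTHERWISE rule."""
    if ed > threshold:
        return u_max, "extreme"
    if ed < -threshold:
        return u_min, "extreme"
    return kp * ep + ud, "learning"  # Uc = Up + Ud with Up = ep * Kp
```

In one control-learning cycle, `ud = cmac.response(So)` would feed `control_signal`, the resulting Uc(k) would be applied to the plant, and the new outcome state would then be stored with `cmac.train(So_next, uc)`, so that Uc(k) acts as the teacher signal for the state it produced.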
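The two stages of the distal learning approach (2.1 and 2.2) can be illustrated on a toy problem. The sketch below assumes a hypothetical scalar linear plant y[n] = a*x[n-1] + b*u[n-1] and linear forward and inverse models; the function names, learning rates and plant coefficients are assumptions chosen for illustration, and real applications would use neural networks in place of the linear models.

```python
import random


def environment(x, u, a=0.8, b=0.5):
    """Hypothetical plant, unknown to the learner: y[n] = a*x[n-1] + b*u[n-1]."""
    return a * x + b * u


def learn_forward_model(steps=2000, lr=0.2, seed=0):
    """Stage 2.1: fit y_hat = wx*x + wu*u from the prediction error y - y_hat."""
    rng = random.Random(seed)
    wx, wu = 0.0, 0.0
    for _ in range(steps):
        x, u = rng.uniform(-1, 1), rng.uniform(-1, 1)
        y = environment(x, u)
        pred_err = y - (wx * x + wu * u)
        wx += lr * pred_err * x
        wu += lr * pred_err * u
    return wx, wu


def learn_inverse_model(wu, steps=2000, lr=0.5, seed=1):
    """Stage 2.2: fit u = kx*x + ky*y_star from the performance error y_star - y.

    The error is passed back through the frozen forward model, whose
    sensitivity d(y_hat)/du equals wu for this linear model.
    """
    rng = random.Random(seed)
    kx, ky = 0.0, 0.0
    for _ in range(steps):
        x, y_star = rng.uniform(-1, 1), rng.uniform(-1, 1)
        u = kx * x + ky * y_star      # action proposed by the inverse model
        y = environment(x, u)         # actual outcome
        perf_err = y_star - y
        grad_u = perf_err * wu        # distal error mapped onto the action
        kx += lr * grad_u * x
        ky += lr * grad_u * y_star
    return kx, ky
```

After `wx, wu = learn_forward_model()` and `kx, ky = learn_inverse_model(wu)`, the action `u = kx*x + ky*y_star` should drive the plant output close to the target `y_star`. The inverse model never sees a target action directly; only the outcome error is available, which is why the forward model is needed to translate it into an action error.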
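Returning to the ASE/ACE model of Section 7.1, the sketch below shows how the selected channel alone drives the output and how only channels with a nonzero trace get updated. The update rules follow the commonly cited formulation of Barto, Sutton and Anderson and are an assumption here; the class name `AseAce`, the coefficient names (`alpha`, `beta`, `gamma`, `delta`, `lam`) and their default values are likewise assumptions.

```python
import random


class AseAce:
    """Sketch of an ASE/ACE learner over n decoded channels (one active at a time)."""

    def __init__(self, n, alpha=1000.0, beta=0.5, gamma=0.95,
                 delta=0.9, lam=0.8, noise_std=0.01, seed=0):
        self.w = [0.0] * n      # ASE weights
        self.v = [0.0] * n      # ACE weights
        self.e = [0.0] * n      # ASE eligibility traces
        self.xbar = [0.0] * n   # ACE input traces
        self.alpha, self.beta, self.gamma = alpha, beta, gamma
        self.delta, self.lam = delta, lam
        self.noise_std = noise_std
        self.rng = random.Random(seed)
        self.p_prev = 0.0       # ACE prediction from the previous step

    def act(self, i):
        """Control signal y in {+1, -1}: sign of the selected weight plus noise."""
        noise = self.rng.gauss(0.0, self.noise_std)
        return 1 if self.w[i] + noise >= 0 else -1

    def learn(self, i, y, r, failed=False):
        """Update all weights after action y in channel i and reinforcement r."""
        p = 0.0 if failed else self.v[i]            # ACE prediction, zero on failure
        r_hat = r + self.gamma * p - self.p_prev    # improved reinforcement signal
        for j in range(len(self.w)):
            # Weights move in proportion to their traces, so channels that
            # were never selected (zero trace) stay unchanged.
            self.w[j] += self.alpha * r_hat * self.e[j]
            self.v[j] += self.beta * r_hat * self.xbar[j]
            x_j = 1.0 if (j == i and not failed) else 0.0
            self.e[j] = self.delta * self.e[j] + (1 - self.delta) * y * x_j
            self.xbar[j] = self.lam * self.xbar[j] + (1 - self.lam) * x_j
        self.p_prev = p
        return r_hat
```

A failure signal (r = -1) that follows an action in channel i lowers the weights of that channel only, which matches the remark that the weights are learned almost independently and that unvisited channels carry no information.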