Neural Network based Learning Control

Uploaded by: yx****d | Document ID: 242973592 | Upload date: 2024-09-13 | Format: PPT | Pages: 19 | Size: 121 KB
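Slide 2
7.1 Reinforcement Learning (Chinese: 再励学习, 自强式学习)

Neural network learning methods fall into three classes:
- Supervised learning, e.g. BP (backpropagation): an explicit "teacher" signal is available.
- Unsupervised learning: there is no "teacher" signal of any kind. Learning relies only on the internal structure of the input data, much like self-organizing methods.
- Reinforcement learning: the idea originates in psychology. Put simply, a person has some money and several investment options A, B and C. He invests in B and makes a profit, so he invests in B again, and keeps doing so until B stops paying off or some sudden event convinces him that A is better. He then moves the money to A.

Slide 3
The reinforcement learning scheme proposed by Barto et al. can be called the ASE/ACE model. It consists of two elements:
- ASE: Associative Search Element
- ACE: Adaptive Critic Element

The ASE determines the control signal $y$. The ACE refines the reinforcement signal $r$ into an improved reinforcement signal. The ASE and the ACE each have $n$ input channels, obtained by decoding the system state $S$ (the same idea as in CMAC), and exactly one channel is selected at any instant.

The control signal and the weight update for each channel are determined as follows:
[equations missing from the extracted slide text]

Slide 4
Here the two weight symbols denote the channel weights of the ASE and the ACE respectively (labelled w1 ... wn and v1 ... vn in the block diagram), the improved reinforcement signal is the output of the ACE, the remaining four symbols are the associated coefficients, and noise is a random noise term.

Slide 5
[Figure: ASE/ACE block diagram. Labels: DECODER, Cart-Pole system, weights v1, v2, ..., vn and w1, w2, ..., wn.]

Slide 6
Mathematical model of the Cart-Pole system:
[equations missing from the extracted slide text]

Failure conditions:
[conditions missing from the extracted slide text]

Clearly, the output of each element is determined almost entirely by the weight of the selected channel, with the ASE slightly perturbed by noise. The weights are learned almost independently of one another: only channels that have actually been selected are ever updated, and all others stay unchanged. As a result, when the system meets a completely new situation it may output a completely wrong control signal and end in FAIL.

Slide 7
Two approaches to Neural Network based Learning Control
7.2 Direct Inverse Modelling
7.3 Learning Control with a Distal Teacher (Distal Learning)

Slide 8
The control problem
[Figure: Learner and Environment blocks. Signals: intention, action, outcome.]
[Figure: Inverse Model and Environment blocks. Signals: $y^*$, $x_{n-1}$, $u_{n-1}$, $y_{n-1}$.]

Slide 9
1. The direct inverse modeling approach to learning an inverse model
[Figure: Environment and Inverse Model blocks. Signals: $x_{n-1}$, $y_n$, $u_{n-1}$, with a +/- comparison node.]

Slide 10
2. The distal learning approach to learning an inverse model
2.1 Learning the forward model using the prediction error $y_n - \hat{y}_n$
[Figure: Environment and Forward Model blocks. Signals: $x_{n-1}$, $u_{n-1}$, $y_n$, $\hat{y}_n$, with a +/- comparison node.]

Slide 11
2.2 Learning the inverse model via the forward model using the performance error $y^*_n - y_n$
[Figure: Inverse Model and Forward Model blocks. Signals: $y^*_{n-1}$, $x_{n-1}$, $u_{n-1}$, $y_n$, error $y^*_n - y_n$.]

Slide 12
The control systems
1. The direct inverse modeling approach
[Figure: Environment and Inverse Model blocks. Signals: $y^*_n$, $u_{n-1}$, $y_n$, with a +/- comparison node.]

Slide 13
1.2 Example: learning control of a CSTR using CMAC
[Figure: control system block diagram. Blocks: CMAC memory, CMAC training, CMAC response, CSTR, P controller, extreme controller, control switch, reference, coordinator. Signals: Sd, So, ep, ed, ud, up, ue, uc.]

Slide 14
The CSTR system (continuous-stirred tank reactor)
[equations missing from the extracted slide text]

This may be transformed to the dimensionless form:
[equations missing from the extracted slide text]

Slide 15
where
- $x_1$ is the conversion rate, related to the reaction concentration;
- $x_2$ is the reaction temperature in dimensionless form;
- $U_f$ and $U_c$ are the control variables corresponding to the input flow rate $F$ and the coolant temperature $T_c$, respectively;
- the remaining symbols (lost in extraction) are system parameters.

Slide 16
Temperature control
[Figure: reactor schematic. Labels: feed, product, jacket.]

Slide 17
CMAC based learning control approach

Current outcome state $S_o = (x_1, x_2, dx_1)$, current setting $x_{1e}(k)$, next setting $x_{1e}(k+1)$, where $dx_1(k) = x_1(k) - x_1(k-1)$.

Let
$e_d = x_{1e}(k+1) - x_1(k-1)$
$e_p = x_{1e}(k) - x_1(k)$
where $e_d$ is the difference between the next setting and the current output, and $e_p$ is the current deviation between the desired and the actual output.

IF $|e_d| > \text{threshold}$, THEN take the extreme control, i.e.
- IF $e_d > \text{threshold}$, THEN $U_c = U_{max}$
- IF $e_d < -\text{threshold}$, THEN $U_c = U_{min}$

OTHERWISE take the learning control:
$U_c = U_p + U_d$
$U_p = K_p \, e_p$
$U_d$ = CMAC response

Slide 18
CMAC training
- $S_o = (x_1(k+1), x_2(k+1), dx_1(k+1))$ is the input to the CMAC.
- $U_c(k)$ is the "teacher signal" for the training.

The reasoning is that $S_o$ is the result caused by $U_c(k)$. Therefore, if the input to the CMAC is $S_o$, the corresponding output should be $U_c(k)$.

This ends one control-learning cycle. Successive cycles proceed in the same way.

Slide 19
The Distal Learning Control Approach
[Figure: control system block diagram. Blocks: NN1, NN2, P, extreme control, switch, CSTR, coordinator, reference.]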
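The ASE/ACE update equations did not survive in the slide text, so the following is only a minimal sketch of how such a learner could look. It assumes the commonly cited form of the Barto, Sutton and Anderson scheme (one-hot decoded state, eligibility traces, an internal reinforcement of the form r + gamma * p(t) - p(t-1)), which may differ from what the slides actually showed. The class name `AseAce`, the parameter names `alpha`, `beta`, `gamma`, `delta`, `lam`, and all default values are illustrative assumptions.

```python
import random


class AseAce:
    """Sketch of an ASE/ACE learner with one active channel per step (assumed form)."""

    def __init__(self, n, alpha=1000.0, beta=0.5, gamma=0.95,
                 delta=0.9, lam=0.8, noise_std=0.01, seed=0):
        self.n = n
        self.w = [0.0] * n      # ASE weights (action preference per channel)
        self.v = [0.0] * n      # ACE weights (prediction per channel)
        self.e = [0.0] * n      # ASE eligibility traces
        self.xbar = [0.0] * n   # ACE input traces
        self.prev_p = 0.0       # ACE prediction from the previous step
        self.alpha, self.beta, self.gamma = alpha, beta, gamma
        self.delta, self.lam, self.noise_std = delta, lam, noise_std
        self.rng = random.Random(seed)

    def step(self, channel, r, failed=False):
        """Learn from reinforcement r, then pick an action for `channel`.

        Returns +1 or -1 as the control signal, or 0 after a failure.
        """
        # ACE: prediction for the current state (zero at failure).
        p = 0.0 if failed else self.v[channel]
        # Improved (internal) reinforcement signal.
        r_hat = r + self.gamma * p - self.prev_p
        # Only channels with non-zero traces, i.e. recently selected, change.
        for j in range(self.n):
            self.w[j] += self.alpha * r_hat * self.e[j]
            self.v[j] += self.beta * r_hat * self.xbar[j]
        if failed:
            # Start a new trial with cleared traces.
            self.e = [0.0] * self.n
            self.xbar = [0.0] * self.n
            self.prev_p = 0.0
            return 0
        # ASE: the output depends on the selected channel's weight plus noise.
        noise = self.rng.gauss(0.0, self.noise_std)
        y = 1 if self.w[channel] + noise >= 0.0 else -1
        # Decay all traces, then credit the selected channel.
        for j in range(self.n):
            self.e[j] *= self.delta
            self.xbar[j] *= self.lam
        self.e[channel] += (1.0 - self.delta) * y
        self.xbar[channel] += 1.0 - self.lam
        self.prev_p = p
        return y
```

In this sketch a failure signalled right after channel 0 was used lowers both `w[0]` and `v[0]`, while channels that were never selected keep their initial weights. That mirrors the remark in the slides that only selected channels are ever corrected.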
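The switching rule and the CMAC training step from the CSTR example can be sketched as follows. The control rule follows the slide text (extreme control when |ed| exceeds a threshold, otherwise Up + Ud). The CMAC itself is not specified in the slides, so `TinyCMAC` is a hypothetical stand-in: a small tile-coding table with a standard LMS-style update. Its name, `n_tilings`, `cell_width` and `lr` are assumptions made for illustration, and no CSTR plant model is included.

```python
import math
from collections import defaultdict


def control_signal(e_d, e_p, cmac_response, *, threshold, k_p, u_min, u_max):
    """Return Uc according to the extreme/learning switching rule."""
    if e_d > threshold:        # set-point far above the output
        return u_max
    if e_d < -threshold:       # set-point far below the output
        return u_min
    u_p = k_p * e_p            # proportional term Up = Kp * ep
    u_d = cmac_response        # learned term Ud
    return u_p + u_d


class TinyCMAC:
    """Hypothetical minimal CMAC: overlapping tilings over the state So."""

    def __init__(self, n_tilings=8, cell_width=0.1, lr=0.5):
        self.n_tilings = n_tilings
        self.cell_width = cell_width
        self.lr = lr
        self.weights = defaultdict(float)  # memory cells, created on demand

    def _active_cells(self, state):
        # One cell per tiling; each tiling is shifted by a fraction of a cell.
        cells = []
        for t in range(self.n_tilings):
            offset = t * self.cell_width / self.n_tilings
            index = tuple(math.floor((s + offset) / self.cell_width) for s in state)
            cells.append((t, index))
        return cells

    def response(self, state):
        """CMAC output Ud: sum of the weights of the active cells."""
        return sum(self.weights[c] for c in self._active_cells(state))

    def train(self, state, teacher):
        """Move the output for `state` toward the teacher signal Uc(k)."""
        error = teacher - self.response(state)
        step = self.lr * error / self.n_tilings
        for c in self._active_cells(state):
            self.weights[c] += step
```

One control-learning cycle would then compute `u_c = control_signal(e_d, e_p, cmac.response(s_o), ...)`, apply `u_c` to the plant, observe the next state, and call `cmac.train(next_s_o, u_c)`, so that the state caused by `u_c` is stored as the input that should recall `u_c`.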