This paper presents an OCPA (operant conditioning probabilistic automaton) bionic autonomous learning system, based on Skinner's operant conditioning theory, for solving the balance control problem of a two-wheeled flexible robot. The OCPA learning system consists of two stages: in the first stage, an operant action is selected stochastically from a set of operant actions and used as the input of the control system; in the second stage, the learning system gathers the orientation information of the system and uses it for optimization until the control target is achieved. At the same time, the size of the operant action set can be reduced automatically during learning to avoid low-probability events. A theoretical analysis of the designed OCPA learning system proves the convergence of its operant conditioning learning mechanism, namely that the operant action entropy converges to a minimum as learning proceeds. The OCPA learning system is then applied to the posture balance control of a two-wheeled flexible self-balancing robot. In the initial state the robot has no posture balance skill, and every operant in the operant set has the same selection probability. As learning proceeds, the selection probability of the optimal operant gradually tends to one and the operant action entropy gradually tends to its minimum, so the robot gradually learns the posture balance skill.
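The two-stage loop described above can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not give the update rule, so the code assumes a standard linear reward-inaction update from learning-automata theory, and the toy plant (a fixed initial tilt of 1.0 and five corrective commands), the learning rate, and the pruning threshold are all hypothetical values chosen only for illustration.

```python
import math
import random


def entropy(probs):
    """Shannon entropy (in nats) of the action-selection distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_action(probs, rng):
    """Stage 1: draw an action index at random according to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def reinforce(probs, chosen, improved, rate):
    """Stage 2: linear reward-inaction update (an assumed rule).

    A favourable orientation raises the chosen action's probability and
    scales the others down; an unfavourable one leaves probs unchanged.
    """
    if not improved:
        return list(probs)
    return [p + rate * (1.0 - p) if i == chosen else p * (1.0 - rate)
            for i, p in enumerate(probs)]


def prune(actions, probs, floor):
    """Drop actions whose probability fell below floor, then renormalize."""
    kept = [(a, p) for a, p in zip(actions, probs) if p >= floor]
    total = sum(p for _, p in kept)
    return [a for a, _ in kept], [p / total for _, p in kept]


def run(steps=2000, seed=0, rate=0.05, floor=1e-3):
    """Run the toy learner; return final actions, probs, entropy history."""
    rng = random.Random(seed)
    actions = [-2.0, -1.0, 0.0, 1.0, 2.0]  # hypothetical corrective commands
    probs = [1.0 / len(actions)] * len(actions)  # no skill: uniform start
    history = [entropy(probs)]
    tilt = 1.0  # hypothetical tilt the robot is released from on each trial
    for _ in range(steps):
        k = select_action(probs, rng)
        residual = abs(tilt + actions[k])
        improved = residual < abs(tilt)  # orientation: did the tilt shrink?
        probs = reinforce(probs, k, improved, rate)
        actions, probs = prune(actions, probs, floor)
        history.append(entropy(probs))
    return actions, probs, history


if __name__ == "__main__":
    final_actions, final_probs, hist = run()
    print("remaining actions:", final_actions)
    print("selection probabilities:", final_probs)
    print("entropy at start and end:", hist[0], hist[-1])
```

In this toy setting only the command -1.0 reduces the tilt, so it is the only action ever reinforced; its selection probability grows toward one, the other actions fall below the threshold and are removed from the set, and the entropy drops from its uniform maximum. A real balance controller would condition the probabilities on the robot's state, which this sketch deliberately omits.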