This paper presents a method for learning derived predicate rules for planning domains under partial observability. In the Planning Domain Description Language (PDDL), derived predicates are a compact way to describe the indirect effects of actions, and they are an important part of planning domain models and search control knowledge. For most planning domains, however, writing derived predicate rules from scratch is difficult, even for experts, so it is worthwhile to study how such rules can be acquired automatically from observed plans. Earlier work obtains derivation rules by revising an initial, incomplete domain theory. Its main drawback is that very few training examples are available for the predicates to be learned, because the examples are generated in a very limited way. The underlying reason is that this work assumes the environment to be unobservable. In the real world, many indirect effects of actions can in fact be observed, either by simple visual inspection or with dedicated tools. This paper therefore adds observations that reflect the indirect effects of actions, which increases the number of training examples for the predicates to be learned and thereby improves learning accuracy. In addition, to supply predicates that the inductive learning process cannot learn, the paper proposes a post-processing method that makes the learned rules semantically more complete. Experiments on benchmark domains with derived predicates show that the proposed method is feasible and effective. More broadly, this work benefits research on the automatic modeling of planning domains and the automatic acquisition of control knowledge.
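
To make the notion of a derived predicate concrete, the following is a minimal sketch of how a derived predicate captures an indirect effect of an action. It is not the learning method of this paper. The Blocks World rules for `above`, the function name `derive_above`, and the block names are assumptions chosen for illustration only.

```python
def derive_above(on_facts):
    """Compute the derived predicate above(x, y) from the basic predicate on(x, y).

    Illustrative rules (an assumption, not taken from the paper):
        above(x, y) <- on(x, y)
        above(x, y) <- on(x, z) and above(z, y)
    The rules are applied repeatedly until no new fact can be added (a fixpoint).
    """
    above = set(on_facts)
    changed = True
    while changed:
        changed = False
        for (x, z) in on_facts:
            for (z2, y) in list(above):
                if z == z2 and (x, y) not in above:
                    above.add((x, y))
                    changed = True
    return above


# State before the action: a is on b, and b is on c.
before = {("a", "b"), ("b", "c")}
# A hypothetical action moves a to the table; its only direct effect is deleting on(a, b).
after = before - {("a", "b")}

# Facts of the derived predicate that disappear as an indirect effect of the action.
lost = derive_above(before) - derive_above(after)
print(sorted(lost))
```

In this sketch the action never mentions `above`, yet `above(a, c)` stops holding once `on(a, b)` is deleted. An observation of such a fact in an intermediate state is the kind of information that could serve as a training example for the predicate to be learned.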