The vehicle-trajectory evaluation problem in the presence of a standard (demonstration) trajectory was studied through inverse reinforcement learning and the principle of reward-function reshaping under policy invariance, and a trajectory evaluation technique based on intention analysis was proposed. Inverse reinforcement learning was applied to the standard trajectory and the trajectory under evaluation to obtain their corresponding feature weights; each weight vector was then extended, under the policy-invariance condition, into a linear subspace, and the evaluation score was obtained by computing the distance between subspaces defined by their orthogonal projection matrices. The method was validated in a four-wheel vehicle simulation experiment on trajectories of several typical driving styles. Experimental results show that the method scores obstacle-avoidance trajectories according to their difference from the standard trajectory, overcomes the non-uniqueness of reward functions corresponding to the same policy, and effectively resolves the difficulty of quantitatively comparing vehicle trajectories.
The trajectory evaluation problem when a demonstration from an expert is available was investigated through inverse reinforcement learning and the reward reshaping technique under policy invariance. A novel intention-based method was presented. The weights of the given trajectory and the demonstration were determined with respect to a fixed group of features. The linear subspaces spanned by these two weight vectors were computed by using the reward reshaping technique. The norm of orthogonal projections was calculated and used to measure the difference between subspaces. In the four-wheel vehicle simulation experiment, the approach was tested by applying it to trajectories generated in several typical scenarios. Empirical results showed that, for the given trajectories, the approach can yield reasonable marks in finite steps according to the difference between the given trajectory and the demonstration. The approach can eliminate the ambiguity brought by the inherent ill-posedness of inverse problems, and overcome the difficulties of trajectory evaluation.
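The core computation described above, comparing the subspaces spanned by two feature-weight vectors via their orthogonal projection matrices, can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: it assumes each weight vector spans a one-dimensional subspace and uses the spectral norm of the difference of projection matrices as the distance; the function name `subspace_distance` and the example weight vectors are hypothetical.

```python
import numpy as np

def subspace_distance(w1, w2):
    """Distance between the lines spanned by weight vectors w1 and w2,
    measured via their orthogonal projection matrices."""
    # Orthogonal projection onto the subspace spanned by each vector:
    # P = w w^T / (w^T w). Scaling w leaves P unchanged, which mirrors
    # the invariance of the policy under positive scaling of the reward.
    P1 = np.outer(w1, w1) / (w1 @ w1)
    P2 = np.outer(w2, w2) / (w2 @ w2)
    # Spectral norm of the difference: 0 for identical subspaces,
    # 1 for orthogonal one-dimensional subspaces.
    return np.linalg.norm(P1 - P2, 2)

# A scaled copy of the same weight vector spans the same subspace,
# so its distance to the original is zero.
w_demo = np.array([1.0, 2.0, -1.0])
print(subspace_distance(w_demo, 3.0 * w_demo))   # → 0.0 (up to rounding)

# Orthogonal directions give the maximal distance of 1.
print(subspace_distance(np.array([1.0, 0.0]), np.array([0.0, 1.0])))  # → 1.0
```

Because the projection matrix is invariant to rescaling of the weight vector, this distance depends only on the direction of the recovered weights, which is one way to sidestep the non-uniqueness of reward functions that induce the same policy.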