Equipment failure text clustering is a new approach to discovering failure patterns and preventing failures. Failure keywords are defined by analyzing the grammar of equipment failure-mode descriptions. Based on these keywords, an integrated semantic distance is proposed that combines two aspects, the coincidence (overlap) degree and the reverse-order degree. The properties of this semantic distance are analyzed, and a maximum cluster clustering algorithm is constructed. Clustering results on a dataset of equipment failure phenomena show that the method clusters equipment failures quickly and accurately, with an average F-measure above 0.93. The method has been successfully applied in a J2EE-based equipment support information system.
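The abstract gives neither the distance formula nor the clustering procedure, so what follows is only a minimal Python sketch under explicit assumptions. Each failure text is assumed to be already reduced to an ordered list of keywords. The coincidence degree is taken to be the Jaccard overlap of the keyword sets. The reverse-order degree is taken to be the fraction of shared-keyword pairs that appear in opposite order. The two are mixed with a hypothetical weight `w`. `max_cluster` is a greedy "largest neighbourhood first" stand-in, not the authors' algorithm. All function names, as well as `w=0.7` and `threshold=0.4`, are illustrative.

```python
from itertools import combinations


def first_positions(seq):
    """Map each keyword to the index of its first occurrence in seq."""
    pos = {}
    for i, k in enumerate(seq):
        pos.setdefault(k, i)
    return pos


def coincidence(a, b):
    """Assumed coincidence degree: Jaccard overlap of the two keyword sets."""
    union = set(a) | set(b)
    if not union:
        return 0.0
    return len(set(a) & set(b)) / len(union)


def inversion_ratio(a, b):
    """Assumed reverse-order degree: share of common-keyword pairs ordered differently."""
    pos_a, pos_b = first_positions(a), first_positions(b)
    common = [k for k in pos_a if k in pos_b]
    if not common:
        return 1.0  # nothing shared: treat as maximally disordered
    pairs = list(combinations(common, 2))
    if not pairs:
        return 0.0  # a single shared keyword cannot be out of order
    inverted = sum(
        1 for x, y in pairs
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )
    return inverted / len(pairs)


def distance(a, b, w=0.7):
    """Hypothetical combined distance in [0, 1]; w weights overlap against order."""
    return w * (1 - coincidence(a, b)) + (1 - w) * inversion_ratio(a, b)


def max_cluster(texts, threshold=0.4, w=0.7):
    """Greedy stand-in for a 'maximum cluster' procedure (not the paper's algorithm)."""
    n = len(texts)
    dist = [[distance(texts[i], texts[j], w) for j in range(n)] for i in range(n)]
    remaining = set(range(n))
    clusters = []
    while remaining:
        # Seed with the item that has the most unassigned neighbours within threshold;
        # ties go to the smallest index because max() keeps the first maximum.
        seed = max(
            sorted(remaining),
            key=lambda i: sum(1 for j in remaining if dist[i][j] <= threshold),
        )
        # The seed is always included, so every iteration shrinks `remaining`.
        cluster = sorted({seed} | {j for j in remaining if dist[seed][j] <= threshold})
        clusters.append(cluster)
        remaining -= set(cluster)
    return clusters


# Made-up keyword sequences (keyword extraction itself is not shown).
texts = [
    ["engine", "overheat", "coolant", "leak"],
    ["engine", "overheat", "coolant"],
    ["radio", "no", "signal"],
    ["radio", "signal", "noise"],
]
print(max_cluster(texts))  # [[0, 1], [2, 3]]
```

With these assumed definitions, identical keyword lists are at distance 0 and disjoint lists are at distance 1. `threshold` therefore controls how close each member must be to its cluster's seed.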