Even after cleaning, clinical behavior data still contain noise in their temporal relationships, so sequential mining algorithms applied directly to such data struggle to discover high-quality patterns. A temporal normalization model is proposed that defines sequential and parallel relationships between temporal behaviors. An intersection coefficient is computed for each of these relationships, and the results are used to identify noise in the temporal relationships between behaviors. The noise is then removed under the criterion that, after normalization, no noise remains between any pair of behaviors while the original correct relationships are left unchanged. An algorithm implementing the model was developed, and tests on sample data show that the processed data meet the requirements of subsequent pattern mining.