To extract the interest and behavior features of web users, a user interest feature extraction model based on the hidden semi-Markov model (HSMM) is proposed. The model uses the probability of state dwell time to govern the user's browsing behavior, which ties the hidden states describing interest features more closely to time. Because an HSMM can generate multiple observation sequences, the text is divided into several text-block sub-regions, so that the features of each sub-region correspond to one of the observation sequences. Experimental results show that feature extraction with the HSMM achieves higher accuracy and recall than the HMM method.
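The role of the state dwell-time probability can be illustrated with a minimal sketch of an explicit-duration HSMM used as a generator. This is not the model evaluated in the paper: the state names, keyword observations, and all probabilities below are hypothetical values chosen only for illustration, and `sample_hsmm` is a name introduced here. Unlike an HMM, each hidden state first draws how long it lasts from its own duration distribution, then emits one observation per step for that duration before moving to a different state.

```python
import random

# Hypothetical parameters, invented for illustration only.
INITIAL = {"sports": 0.6, "news": 0.4}
# Explicit-duration HSMMs usually forbid self-transitions;
# how long a state lasts is decided by DURATION instead.
TRANSITION = {"sports": {"news": 1.0}, "news": {"sports": 1.0}}
# Probability of staying 2 or 3 steps in each hidden interest state.
DURATION = {"sports": {2: 0.5, 3: 0.5}, "news": {2: 0.5, 3: 0.5}}
# Observation (keyword) probabilities per hidden state.
EMISSION = {
    "sports": {"match": 0.7, "score": 0.3},
    "news": {"election": 0.5, "market": 0.5},
}


def sample_hsmm(initial, transition, duration, emission, length, seed=0):
    """Sample `length` hidden states and observations from an explicit-duration HSMM."""
    rng = random.Random(seed)

    def draw(dist):
        # dist maps each outcome to its probability.
        outcomes = list(dist)
        weights = [dist[o] for o in outcomes]
        return rng.choices(outcomes, weights=weights, k=1)[0]

    states, observations = [], []
    state = draw(initial)
    while len(states) < length:
        dwell = draw(duration[state])  # how long the state is occupied
        for _ in range(dwell):
            if len(states) == length:  # truncate the final segment
                break
            states.append(state)
            observations.append(draw(emission[state]))
        state = draw(transition[state])  # move to the next hidden state
    return states, observations


states, observations = sample_hsmm(
    INITIAL, TRANSITION, DURATION, EMISSION, length=12, seed=1
)
```

In this sketch every segment except possibly the truncated last one lasts exactly as long as the drawn dwell time, which is the property that lets an HSMM couple hidden interest states to time more directly than the implicit geometric durations of an HMM.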