为了准确挖掘出同一主题的大量网络新闻的线索发展脉络,该文提出了一种基于条件随机场模型的网络新闻主题线索发掘方法。首先,根据新闻主题线索句的识别规则提取出相关特征,并应用到条件随机场模型中提取出主题线索句;然后,按照时间顺序构建原始线索链;最后,对语义相近的原始线索链进行合并处理,获得最终的新闻主题发展脉络。实验结果表明,该方法在主题线索句识别上有较好的效果,最终得到的主题线索脉络能够较清晰地展现新闻发展趋势。
To accurately find out the clues of the same topic from a large number of Web news, a method of topic clues mining is proposed based on the Conditional Random Fields model. Firstly, according to the identification rules of the topic sentence, the relative characteristics were extracted and utilized on the Conditional Random Field model to get the candidate topic sentences. Then the lexical chains of topic clues were built by chronological order and lexical weight. Finally the similar clue chains in semantic needed to be merged and the whole development context of network news can be described. The experiment results show the method proposed achieves a good performance on the topic clue sentence extraction and the topic clue chains obtained can clearly show the development trend of network news.