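How to increase the visibility of library collections and improve their utilization in the information age is a problem that urgently needs to be studied and solved. Linking real-time news to library collections can raise the visibility of those collections and increase their use. It provides users with rich, comprehensive reading material and professional knowledge, and helps them form good habits of thorough, in-depth reading and thinking. The proposed framework for real-time news analysis and collection recommendation is based on fast data processing technology. It identifies topics that interest users by analyzing real-time news on the web, applies fast data processing, latent semantic analysis, non-negative matrix factorization, weighted matrix factorization, and related methods to analyze and process the data semantically, and then classifies and recommends library collection resources by topic. Analysis and experiments on OCLC's million-record dataset and Yahoo! News show that this recommendation framework and method work well in practice.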
With the development of the web, reading is increasingly regarded as a form of entertainment, such as browsing Twitter or blogs, rather than as study that involves in-depth thought. Real-time news, for example, is a popular kind of web-published information that helps people keep up with the latest events. At the same time, the works held in libraries, which contain in-depth thought and domain knowledge, are often overlooked in daily life. Little research addresses how to provide professional domain knowledge and an extended reading list to users who are interested in the specific topics mentioned in real-time newswires, even though library collections hold a large body of domain knowledge and application examples that can help users understand those topics well. Hence, in this paper, we propose a novel method that links real-time news to the corresponding records in the library. An extended reading list from the library can be recommended using natural language processing and semantic analysis: we recommend related library records to users who are interested in a target news article. Our experiments use natural language processing together with the latent semantic analysis (LSA), non-negative matrix factorization (NMF), and weighted matrix factorization (WMF) methods. As the corpus of catalogue records, we use the WorldCat-million dataset released by OCLC in 2012. The dataset contains metadata records for nearly 1.2 million of the materials most widely held in libraries. The metadata comprises approximately 80 million linked data triples, which help users find linked resources easily on the web. As the corpus of news articles, we collected articles from the Yahoo! News RSS feeds, dated from the 5th of April to the 7th of July, 2014, 95 days in total. To obtain an objective view of performance, we randomly selected 500 news articles (about 10% of the news article set) for evaluation.
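The linking step described above can be illustrated with a minimal sketch. It is not the method evaluated in this paper: plain TF-IDF cosine similarity stands in for the LSA, NMF, and WMF models, and the function names (`tokenize`, `tfidf_vectors`, `rank_records`, `hit_rate_at_k`), the toy records, and the hit-rate definition (a query counts as a hit when at least one relevant record appears among its top k) are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and keep purely alphabetic whitespace-separated tokens."""
    return [tok for tok in text.lower().split() if tok.isalpha()]


def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) with a smoothed IDF term."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        term_freq = Counter(toks)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in term_freq.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_records(news_text, records, k=10):
    """Return indices of the k records most similar to the news text.

    Hypothetical helper: a TF-IDF baseline, not the paper's LSA/NMF/WMF models.
    Ties are broken by the lower record index.
    """
    vectors = tfidf_vectors(list(records) + [news_text])
    news_vec, record_vecs = vectors[-1], vectors[:-1]
    scored = [(cosine(news_vec, rec), i) for i, rec in enumerate(record_vecs)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:k]]


def hit_rate_at_k(ranked_lists, relevant_sets, k=10):
    """Fraction of queries with at least one relevant record in the top k.

    Assumed definition; the paper's exact metric may differ.
    """
    if not ranked_lists:
        return 0.0
    hits = sum(
        1
        for ranked, relevant in zip(ranked_lists, relevant_sets)
        if relevant.intersection(ranked[:k])
    )
    return hits / len(ranked_lists)


# Toy data, invented for illustration only.
records = [
    "history of the world cup football tournament",
    "introduction to volcano geology and eruptions",
    "cooking pasta at home",
]
news = "volcano eruptions force evacuation"
top = rank_records(news, records, k=2)
```

In this sketch, a latent-factor model would replace `tfidf_vectors` with low-dimensional document vectors, while the ranking and evaluation steps would keep the same shape.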
The results are evaluated with the top-10 recall hit rate, which shows that WMF performs better than LSA and NMF. This newswire-library linking offers