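With the arrival of the big data era, online reviews about the same product or topic show unprecedented dispersion and diversity across domains, languages, and other dimensions, which poses a major challenge for opinion retrieval. Opinion retrieval in a big data environment is no longer equivalent to opinion retrieval over large-scale data; it involves resolving many issues, including gaps in scale, domain, and language. This article explores solutions to the scale problem from the perspectives of latent semantic indexing, PageRank, MapReduce, and SQL-on-Hadoop; analyzes solutions to the cross-domain problem from the perspectives of common feature selection, target-domain document selection, query expansion, and transfer learning; and studies solutions to the cross-language problem from the perspectives of multilingual dictionary construction, corpus alignment, user feedback and user behavior, and domain knowledge alignment. It aims to address the usability of opinion retrieval in a big data environment, enrich research in this field, and promote the study and application of opinion retrieval methods.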
With the advent of the big data era, opinion retrieval faces a major challenge: reviews and comments on the same product or topic are dispersed and diversified across domains and languages. Opinion retrieval in the big data environment is no longer limited to research on large-scale information retrieval, but extends to resolving scalability, cross-domain, and cross-language problems. This paper discusses approaches to the scalability problem, including latent semantic indexing, PageRank, MapReduce, and SQL-on-Hadoop. It analyzes approaches to the cross-domain problem, including common feature selection, document selection in the target domain, query expansion, and transfer learning. It also examines approaches to the cross-language problem, including multilingual dictionary construction, corpus alignment, user feedback and user behavior, and domain knowledge alignment. The purpose of the paper is to address the practicability of opinion retrieval, as well as information retrieval, in the big data era, and thus to promote the research and application of opinion retrieval methods.