Synonyms play an important role in many natural language processing applications, such as information retrieval, automatic summarization, sentiment analysis, and machine translation. This paper presents a method for mining synonyms from a large-scale corpus by combining Latent Semantic Analysis (LSA) with context mutual information, and analyzes how the choice of word context window, the weight computation, LSA dimensionality reduction, and cosine similarity each contribute to synonym extraction. Experimental results show that the method markedly improves synonym extraction performance.
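
As a rough illustration of the four stages named above (window-based context selection, mutual-information weighting, LSA dimensionality reduction, cosine similarity), the following minimal sketch runs them on a toy corpus. It is one plausible reading of the pipeline, not the paper's implementation: the function names, the use of positive pointwise mutual information as the weight, the window size of 2, the rank `k=4`, and the toy sentences are all assumptions made for the example.

```python
from collections import Counter

import numpy as np


def cooccurrence_counts(sentences, window=2):
    """Count (word, context word) pairs inside a symmetric window."""
    pair_counts = Counter()
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pair_counts[(word, tokens[j])] += 1
    return pair_counts


def ppmi_matrix(pair_counts):
    """Build a word-by-context matrix weighted by positive PMI (assumed weighting)."""
    vocab = sorted({w for w, _ in pair_counts})
    index = {w: k for k, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for (w, c), n in pair_counts.items():
        counts[index[w], index[c]] = n
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore"):
        pmi = np.log2(counts * total / (row * col))  # -inf where count is 0
    pmi[~np.isfinite(pmi)] = 0.0
    return vocab, np.maximum(pmi, 0.0)


def lsa_embed(matrix, k):
    """Reduce dimensionality with a rank-k truncated SVD (the LSA step)."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]


def cosine_neighbors(vocab, vectors, word, top_n=3):
    """Rank the other words by cosine similarity to `word`."""
    norms = np.linalg.norm(vectors, axis=1)
    norms[norms == 0] = 1.0  # avoid dividing by zero for empty rows
    unit = vectors / norms[:, None]
    sims = unit @ unit[vocab.index(word)]
    order = np.argsort(-sims)
    ranked = [(vocab[i], float(sims[i])) for i in order if vocab[i] != word]
    return ranked[:top_n]


# Hypothetical toy corpus: "big" and "large" share identical contexts.
corpus = [
    ["a", "big", "house"], ["a", "large", "house"],
    ["a", "big", "dog"], ["a", "large", "dog"],
    ["the", "small", "cat"], ["the", "tiny", "cat"],
]
vocab, weighted = ppmi_matrix(cooccurrence_counts(corpus, window=2))
vectors = lsa_embed(weighted, k=4)
neighbors = cosine_neighbors(vocab, vectors, "big")
print(neighbors)
```

Because "big" and "large" occur in identical contexts in this toy corpus, their rows in the weighted matrix coincide and "large" should rank as the nearest neighbour of "big". On real data the window size, the weighting scheme, and the retained rank would all need tuning, which is the kind of comparison the abstract describes.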