Identifying the relations between entities helps in understanding text content more fully. Entity relation templates make it possible to extract large numbers of entity relations from massive unstructured text and to store them in structured form. Targeting the characteristics of Tibetan text on the internet, this paper represents Tibetan entities with templates, applies an unsupervised word-similarity method based on word2vec to build a near-synonym resource, and implements a Tibetan word semantic similarity system. On this basis, it builds a model for acquiring entity relation templates based on similarity calculation. We evaluate the model on a corpus crawled from the Qinghai Lake Tibetan website (amdotibet). The experimental results show that the proposed Tibetan entity relation template extraction method is reasonably effective and achieves good experimental results.
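The idea of using word-vector similarity to acquire additional relation templates can be illustrated with a minimal sketch. It is not the paper's implementation: the triple form of a template, the function names `near_synonyms` and `expand_template`, the 0.9 threshold, and the toy three-dimensional vectors are all assumptions made for illustration, and the English tokens merely stand in for Tibetan relation words whose vectors would in practice come from a trained word2vec model.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors standing in for word2vec output (assumption).
VECTORS = {
    "born_in": [0.9, 0.1, 0.0],
    "native_of": [0.8, 0.2, 0.1],
    "works_at": [0.0, 0.9, 0.3],
}


def near_synonyms(word, vectors, threshold=0.9):
    """Return (word, score) pairs whose similarity to `word` meets the threshold."""
    base = vectors[word]
    scored = [
        (other, cosine(base, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    kept = [(w, s) for w, s in scored if s >= threshold]
    return sorted(kept, key=lambda pair: -pair[1])


def expand_template(seed, vectors, threshold=0.9):
    """Derive new templates by swapping the seed's relation word for near-synonyms."""
    left, relation, right = seed
    return [(left, w, right) for w, _ in near_synonyms(relation, vectors, threshold)]


if __name__ == "__main__":
    seed = ("<PERSON>", "born_in", "<PLACE>")
    for template in expand_template(seed, VECTORS):
        print(template)
```

In this sketch a seed template is expanded only through its relation word; the threshold controls how aggressively near-synonyms are accepted, and a suitable value would have to be tuned on real data.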