Semantic knowledge resources encode rich linguistic theory and serve as an important interface between linguistic knowledge and language engineering. Taking the Adjective Syntactic-Semantics Dictionary as a case study, this paper explores methods for automatically expanding such resources. Our goal is to use a large-scale corpus to extend the dictionary's vocabulary and the syntactic patterns associated with each word. Specifically, we group the dictionary words into 97 categories according to their syntactic patterns and use a classifier to map each new word (a word not yet in the dictionary) to one of these categories, which turns dictionary expansion into a multi-class classification problem. The approach rests on the assumption that new words and dictionary words exhibit similar syntactic structures in a large corpus. We construct the training data by distant supervision, which avoids extensive manual annotation. Training combines shallow machine learning methods with deep neural networks and achieves promising results. The experiments show that deep neural networks can learn syntactic structure information and effectively improve accuracy on the mapping task.
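The distant-supervision step and the mapping of new words to pattern categories can be illustrated with a minimal sketch. It is not the paper's system: the toy English dictionary, the corpus, the function names, and the count-based scoring are all assumptions made for illustration, standing in for the Chinese adjective dictionary, the large-scale corpus, and the shallow and deep classifiers described above. Each corpus occurrence of a dictionary word inherits that word's category as its label, and a new word is assigned the category whose contexts overlap most with its own.

```python
from collections import Counter, defaultdict

# Hypothetical toy dictionary: adjective -> syntactic-pattern category id.
# (The paper uses 97 categories; two are enough for a sketch.)
DICTIONARY = {"fond": 0, "afraid": 0, "tall": 1, "heavy": 1}

# Hypothetical tokenized corpus containing dictionary words and new words.
CORPUS = [
    ["she", "is", "fond", "of", "music"],
    ["he", "is", "afraid", "of", "dogs"],
    ["the", "tree", "is", "very", "tall"],
    ["the", "box", "is", "very", "heavy"],
    ["they", "are", "proud", "of", "her"],
    ["the", "road", "is", "very", "long"],
]


def context_features(tokens, i, window=2):
    """Positional context features around token i; the target word is left out."""
    feats = []
    for off in range(-window, window + 1):
        if off == 0:
            continue
        j = i + off
        word = tokens[j] if 0 <= j < len(tokens) else "<pad>"
        feats.append(f"{off}:{word}")
    return feats


def build_training_data(corpus, dictionary):
    """Distant supervision: each occurrence of a dictionary word is labeled
    with that word's category, with no manual annotation of sentences."""
    data = []
    for tokens in corpus:
        for i, tok in enumerate(tokens):
            if tok in dictionary:
                data.append((context_features(tokens, i), dictionary[tok]))
    return data


def train_profiles(data):
    """Accumulate per-category counts of context features (a simple stand-in
    for a trained classifier)."""
    profiles = defaultdict(Counter)
    for feats, label in data:
        profiles[label].update(feats)
    return profiles


def classify_word(word, corpus, profiles):
    """Score each category by feature overlap summed over all occurrences of
    `word`; return the best category, or None if there is no evidence."""
    scores = Counter()
    for tokens in corpus:
        for i, tok in enumerate(tokens):
            if tok != word:
                continue
            for feat in context_features(tokens, i):
                for label, profile in profiles.items():
                    scores[label] += profile[feat]
    if not any(scores.values()):
        return None
    return max(scores, key=scores.get)


if __name__ == "__main__":
    profiles = train_profiles(build_training_data(CORPUS, DICTIONARY))
    for new_word in ("proud", "long"):
        print(new_word, classify_word(new_word, CORPUS, profiles))
```

In this toy setting, "proud" shares the following "of" with the category-0 words and "long" shares the preceding "is very" with the category-1 words, so each new word inherits the syntactic patterns of the matching group. A real system would replace the count profiles with a learned model over far richer context.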