Three word co-occurrence language networks are constructed from a large-scale microblog corpus and analyzed with complex network analysis tools. The main purpose is to explore the feasibility of applying complex network analysis methods to microblog text, and thereby to study the distinctive characteristics of microblog language networks. The results show that complex network analysis is feasible for microblog text, and that network parameters such as degree distribution, clustering coefficient, and average shortest path reflect the stylistic features of microblog language. This research extends the application of complex network methods in linguistics and provides a feasible approach to microblog content mining based on complex networks.
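The pipeline summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how the three networks differ, what co-occurrence window was used, or which analysis tools were applied, so everything below is an assumption made for illustration: words are linked when they are adjacent (`window=2`), the network is undirected and unweighted, the input is already segmented into words, and all function names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict, deque
from itertools import combinations


def build_cooccurrence_network(sentences, window=2):
    """Link two distinct words that occur within `window` tokens of each other.

    Assumption: window=2 links adjacent words only; the source does not
    state the window size it used.
    """
    graph = defaultdict(set)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            graph[word]  # register the word as a node even if it gets no edge
            for other in tokens[i + 1 : i + window]:
                if other != word:
                    graph[word].add(other)
                    graph[other].add(word)
    return graph


def degree_distribution(graph):
    """Return P(k): the fraction of nodes that have degree k."""
    counts = Counter(len(neighbors) for neighbors in graph.values())
    n = len(graph)
    return {k: c / n for k, c in sorted(counts.items())}


def average_clustering(graph):
    """Mean local clustering coefficient; nodes with degree < 2 count as 0."""
    total = 0.0
    for neighbors in graph.values():
        k = len(neighbors)
        if k < 2:
            continue
        # Count edges that exist among this node's neighbors.
        links = sum(1 for a, b in combinations(neighbors, 2) if b in graph[a])
        total += 2 * links / (k * (k - 1))
    return total / len(graph)


def average_shortest_path(graph):
    """Mean BFS distance over all ordered pairs of mutually reachable nodes."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


# Hypothetical toy input: each post is already split into words.
posts = [["a", "b", "c"], ["a", "c"]]
network = build_cooccurrence_network(posts)
print(degree_distribution(network))
print(average_clustering(network))
print(average_shortest_path(network))
```

For real Chinese microblog text, a word segmentation step would be needed before network construction, and a corpus-scale network would normally be analyzed with a dedicated graph library rather than these quadratic-time helpers.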