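A recent interdisciplinary linguistic study by scholars at the Massachusetts Institute of Technology, published in the leading international journal Proceedings of the National Academy of Sciences of the United States of America (PNAS), used publicly released dependency treebanks to carry out a statistical analysis of 37 languages. It concluded that human languages tend toward dependency distance minimization. The study was widely discussed in the media, but it has some shortcomings. Dependency distance is the linear distance between two syntactically related words. It is constrained by working memory and is closely related to the complexity of syntactic processing, which is why human languages tend to minimize dependency distance. Research on dependency distance minimization based on syntactically annotated corpora shows that big-data methods play an important role in the study of language and cognition. Modern linguistics is strongly interdisciplinary. Mutual borrowing and integration among the disciplines involved in language research help to reveal in depth how the language system works and how language relates to cognition.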
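The following minimal Python sketch makes dependency distance concrete. It is not the method of the PNAS study or of any particular treebank tool. It assumes a CoNLL-style encoding in which each word stores the 1-based position of its governor, with 0 marking the root. Following the measure described in this interview, it counts the words intervening between governor and dependent, so adjacent words have a distance of 0. Some studies instead use the raw positional difference, which is always one greater. The function names and the two hand-annotated example analyses are hypothetical and simplified for illustration.

```python
from typing import List


def dependency_distances(heads: List[int]) -> List[int]:
    """Return one dependency distance per governor-dependent pair.

    Assumed encoding (CoNLL-style): heads[k] is the 1-based position of the
    governor of the word at position k + 1, and 0 marks the root word.
    Distance is the number of words intervening between the governor and
    its dependent, so adjacent words get a distance of 0.
    """
    distances = []
    for position, head in enumerate(heads, start=1):
        if head == 0:
            # The root has no governor, so it contributes no dependency.
            continue
        distances.append(abs(position - head) - 1)
    return distances


def mean_dependency_distance(heads: List[int]) -> float:
    """Average dependency distance of a sentence (0.0 if it has no dependencies)."""
    distances = dependency_distances(heads)
    if not distances:
        return 0.0
    return sum(distances) / len(distances)


# Hypothetical, simplified hand annotations (not taken from any treebank).
# "John threw out the old trash": John/out/trash -> threw, the/old -> trash
short_order = [2, 0, 2, 6, 6, 2]
# "John threw the old trash out": same relations, particle moved to the end
long_order = [2, 0, 5, 5, 2, 2]

print(dependency_distances(short_order), mean_dependency_distance(short_order))
# [0, 0, 1, 0, 3] 0.8
print(dependency_distances(long_order), mean_dependency_distance(long_order))
# [0, 1, 0, 2, 3] 1.2
```

Under this annotation, the order that keeps the particle next to its verb has the smaller total and mean distance. Corpus studies aggregate this kind of contrast over entire treebanks. A single sentence pair only illustrates the measure and does not demonstrate the minimization tendency.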
This interview examines a recent study on dependency distance (length) minimization and introduces earlier work on the topic and its significance. Within the framework of dependency grammar (DG), dependency distance, or dependency length, is regarded as an insightful metric of syntactic complexity. According to dependency grammar, the syntactic structure of a sentence consists of nothing but dependencies between individual words. This assumption is widely accepted in computational linguistics and in theoretical linguistics. A dependency relation has the following core properties. It is a binary relation between two linguistic units. It is usually asymmetrical, with one of the two units acting as the governor and the other as the dependent. It is classified in terms of a range of general grammatical relations, which are conventionally shown by a label on top of the arc linking the two units. Because sentences unfold linearly, the governor and the dependent may or may not be adjacent, so governors and dependents may be separated by different linear distances. This linear distance is termed dependency distance (length). It is usually measured by the number of words intervening between the governor and the dependent, and it is believed to be closely related to parsing (processing) difficulty. In dependency grammar, a sentence is parsed by taking in its words one at a time. At each parsing state, the parser tries to establish a syntactic relation between the word currently being processed and a preceding one. As a cognitive activity, syntactic parsing is carried out in working memory (WM), and different dependency distances impose different burdens on it. The intervening words may strain the capacity of WM. Alternatively, because memory decays over time, they may make the earlier word difficult to retrieve. Hence, a longer dependency distance, that is, more intervening words, probably means greater syntactic complexity and a higher cognitive cost in processing. Given the cognitive possibility that dependency distance positively