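Spoken language understanding (SLU) is one of the key technologies for building spoken dialogue systems. It faces two main challenges: 1) robustness, because input utterances are often ill-formed; and 2) portability, that is, the SLU component should be quickly portable to new domains and languages. This paper proposes a new SLU approach based on two-stage classification. The first stage is topic classification, which identifies the topic of the user's input utterance. The second stage is topic-dependent semantic slot classification, which extracts the corresponding semantic slot/value pairs according to the identified topic. The approach achieves deep understanding of user utterances while remaining robust. It is largely data-driven, and its training data are relatively easy to annotate, so it can be ported readily to new domains and languages. Experiments were conducted in a Chinese transportation inquiry domain and the English DARPA Communicator domain, and the results demonstrate the effectiveness of the approach.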
Spoken language understanding (SLU) is one of the key components of a spoken dialogue system. One challenge for SLU is robustness, since the speech recognizer inevitably makes errors and spoken language is plagued by a wide range of spontaneous speech phenomena. Another challenge is portability: traditional rule-based SLU approaches require linguistic experts to handcraft a domain-specific grammar for parsing, which is time-consuming and laborious. A new SLU approach based on two-stage classification is proposed. First, a topic classifier identifies the topic of an input utterance. Then, restricted to the recognized topic, topic-dependent semantic slot classifiers are trained to extract the corresponding slot-value pairs. The advantage of the proposed approach is that it is mainly data-driven and requires only a minimally annotated corpus for training, while retaining both robustness and depth of understanding for spoken language. Experiments have been conducted in the Chinese public transportation information inquiry domain and the English DARPA Communicator domain. The good performance in both domains demonstrates the viability of the proposed approach.
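To make the two-stage idea concrete, the following is a minimal sketch, not the classifiers or features used in the paper. It assumes a tiny add-one-smoothed naive Bayes model as a stand-in for both stages, and every name in it (the topics `ShowRoute` and `ShowBusStops`, the slot names, the `GAZETTEER`, and the four training utterances) is hypothetical and chosen only for illustration. Stage 1 predicts a topic from the words of the utterance; stage 2 uses the slot classifier belonging to that topic to label each candidate value from its neighbouring words.

```python
from collections import Counter, defaultdict
import math


class NaiveBayes:
    """Tiny multinomial naive Bayes with add-one smoothing (illustrative stand-in)."""

    def __init__(self):
        self.label_counts = Counter()
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, samples):
        for feats, label in samples:
            self.label_counts[label] += 1
            self.feat_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def predict(self, feats):
        total = sum(self.label_counts.values())

        def score(label):
            counts = self.feat_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            s = math.log(self.label_counts[label] / total)
            for f in feats:
                s += math.log((counts[f] + 1) / denom)
            return s

        return max(self.label_counts, key=score)


# Hypothetical toy corpus: each utterance carries only a topic label and
# its slot/value pairs, i.e. a minimal annotation.
TRAIN = [
    ("how do i get from zoo to airport", "ShowRoute",
     {"origin": "zoo", "destination": "airport"}),
    ("i want to go to museum from station", "ShowRoute",
     {"origin": "station", "destination": "museum"}),
    ("where does bus 320 stop", "ShowBusStops", {"bus": "320"}),
    ("list the stops of bus 11", "ShowBusStops", {"bus": "11"}),
]
GAZETTEER = {"zoo", "airport", "museum", "station"}  # assumed place names


def context_features(tokens, i):
    """Describe a candidate value by its left and right neighbour words."""
    prev_tok = tokens[i - 1] if i > 0 else "<s>"
    next_tok = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
    return [f"prev={prev_tok}", f"next={next_tok}"]


# Stage 1: one topic classifier over the words of the utterance.
topic_clf = NaiveBayes().fit([(text.split(), topic) for text, topic, _ in TRAIN])

# Stage 2: one slot classifier per topic, trained on that topic's data only.
slot_samples = defaultdict(list)
for text, topic, slots in TRAIN:
    tokens = text.split()
    for slot, value in slots.items():
        slot_samples[topic].append(
            (context_features(tokens, tokens.index(value)), slot))
slot_clfs = {t: NaiveBayes().fit(s) for t, s in slot_samples.items()}


def understand(utterance):
    """Return (topic, {slot: value}) for one utterance."""
    tokens = utterance.split()
    topic = topic_clf.predict(tokens)
    frame = {}
    for i, tok in enumerate(tokens):
        if tok in GAZETTEER or tok.isdigit():  # candidate slot values
            slot = slot_clfs[topic].predict(context_features(tokens, i))
            frame[slot] = tok
    return topic, frame


print(understand("how can i get from museum to zoo"))
print(understand("which stops does bus 320 have"))
```

Because the slot classifiers are selected by the predicted topic, each one only has to discriminate among the slots of a single topic, which is one way to read the "restriction of the recognized target topic" described above. A practical system would replace the toy model, gazetteer, and features with components trained on real annotated corpora.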