Building on previous research, this study combines three strands of work (Youth Chinese, topics, and word lists) to construct a Youth Chinese topic bank and a topic-specific word list. First, guided by the encyclopedic nature of language-arts content, 12 sets of representative Southeast Asian Youth Chinese textbooks were selected as the corpus, and a hierarchical topic bank for Youth Chinese was built from them. Next, topic words were clustered using techniques from computational linguistics; the clusters were then manually screened to retain words that are closely related to their topic, frequently used, and relatively easy, and the retained words were ranked by relevance and frequency of use. The resulting Youth Chinese topic-specific word list covers 60 third-level topics and contains 2,970 entries.