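Unlike alphabetic writing systems, Chinese text entry requires input method software to convert a pinyin string into a string of Chinese characters. The input method has therefore become a basic tool for human-computer interaction among Chinese users, and related technologies have long been a focus of both academia and industry. In research on Chinese input methods, the characteristics of user behavior play a crucial role in lexicon construction, algorithm design, interaction design, and performance evaluation, yet studies in this area remain scarce because the data are difficult to obtain and analyze. Using more than 410 million user input records collected by a Chinese input method with users' permission, this paper analyzes the behavior of Chinese input method users. It mines behavioral characteristics such as the differences in input word frequency across categories of applications and the differing choices that different users make among candidate entries within the same category of application. The results offer some guidance for understanding Chinese input method user behavior in depth and, in turn, for improving the performance of input method software.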
Unlike alphabetic languages, Chinese requires input software to transform Pinyin strings into characters. Input software therefore plays an important role in human-computer interaction (HCI) for Chinese users. In research on Chinese input methods, it is important to examine user behavior in order to improve dictionary construction, algorithm design, interaction design, and performance evaluation. However, such work is scarce because of the difficulty of collecting the corresponding behavior data. With the help of a company that develops a widely used Chinese input software, we collected, with users' consent, input logs containing 410 million input strings. Analyzing these logs, we focus on the following behavioral features: the distribution of input string length, character/word/phrase selection in different kinds of application software, and the adoption of abbreviations. The conclusions help us better understand users' input behavior and suggest possible ways to improve input software design.