Social media search faces two main challenges. First, messages are short, which makes it difficult to build indices over them. Second, a ranked list is a single, uniform form of presentation that cannot convey the overall structure of social media data. This paper proposes a visual-analytics search system that mines the multiple semantic features implicit in the raw data to strengthen the index structure, and that provides an interactive interface to visualize these features and filter search results, making search more accurate. Taking Twitter data as the object of study, our approach first extracts the topics and named entities implicit in the data, based on their semantic correlation over time. On this basis, a hierarchical semantic graph model is built to simplify the semantic relations between topics and named entities; the same model provides the index structure needed for visual queries. During exploration, a set of split rings represents the multiple semantic features of the data, and the system offers several interactions that help users explore further. Case studies show that split rings reveal patterns more clearly than linked lines and bubble sets, and make it easier for users to locate messages of interest. User studies indicate that users find the results they want more easily with this system than with conventional search.
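As an illustration of how a hierarchical index over time, topics, and named entities could support filtered queries, the following is a minimal sketch under stated assumptions. It is not the paper's model: the three-level nesting (time bucket, topic, entity), the `Message` fields, and the functions `build_semantic_index` and `query` are all hypothetical, and topic and entity extraction is assumed to have been done already.

```python
from collections import defaultdict
from dataclasses import dataclass

NO_ENTITY = ""  # hypothetical placeholder for messages without named entities


@dataclass(frozen=True)
class Message:
    """A message with pre-extracted features (all fields are assumptions)."""
    msg_id: int
    hour: int            # coarse time bucket
    topic: str           # topic label assigned by some upstream extractor
    entities: frozenset  # named entities found in the text


def build_semantic_index(messages):
    """Nest message ids as: time bucket -> topic -> entity -> {ids}."""
    index = defaultdict(lambda: defaultdict(lambda: defaultdict(set)))
    for m in messages:
        # Keep entity-free messages reachable through a placeholder key.
        for e in (m.entities or frozenset({NO_ENTITY})):
            index[m.hour][m.topic][e].add(m.msg_id)
    return index


def query(index, topic=None, entity=None, hours=None):
    """Return ids of messages matching every given filter (None = any)."""
    hits = set()
    for hour, topics in index.items():
        if hours is not None and hour not in hours:
            continue
        for t, ents in topics.items():
            if topic is not None and t != topic:
                continue
            for e, ids in ents.items():
                if entity is None or e == entity:
                    hits |= ids
    return hits


# Toy data, invented purely for illustration.
msgs = [
    Message(1, 9, "election", frozenset({"Obama", "Ohio"})),
    Message(2, 9, "election", frozenset({"Romney"})),
    Message(3, 10, "sports", frozenset({"Ohio"})),
    Message(4, 10, "election", frozenset()),
]
index = build_semantic_index(msgs)
ohio_ids = query(index, entity="Ohio")
morning_election_ids = query(index, topic="election", hours={9})
```

In an interactive setting, selecting a ring segment would correspond to fixing one or more of these filters. The nested dictionaries keep each filter a simple key lookup, at the cost of storing a message id once per entity it mentions.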