A new method is presented for detecting magnetic resonance imaging (MRI) images in online biological literature using the information in figure captions. In academic papers, every figure has a corresponding caption. A figure usually consists of several panels, and different parts of the caption describe different panels. The caption is therefore segmented into panel annotations, each matched to its panel, and these annotations are used to detect MRI images. Drawing on regular language theory, image pointers that refer to panels are located in the caption; the pointers split the caption into panel annotations and match each annotation to its panel. Building on an analysis of panel annotations, a mixed-annotation method is proposed that distinguishes two cases, figures whose panels are all of the same type and figures whose panels are of different types, and uses either the panel annotation or the whole unsegmented caption as the text feature for image recognition, depending on which case applies. Experimental results show that the method identifies MRI images in online biological literature well.
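The segmentation step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that image pointers are parenthesized panel letters such as "(A)", "(B, C)" or "(A-C)", and that each pointer precedes the text it annotates. The names `POINTER`, `expand_labels` and `segment_caption` are hypothetical and introduced only for illustration.

```python
import re

# Assumed pointer format (not specified in the abstract): parenthesized
# single letters, optionally joined by commas or a hyphen range.
POINTER = re.compile(r"\(([A-Za-z](?:\s*[,\-]\s*[A-Za-z])*)\)")


def expand_labels(pointer_text):
    """Expand 'B, C' or 'A-C' into a list of upper-case panel labels."""
    labels = []
    for part in pointer_text.split(","):
        part = part.strip()
        if "-" in part:
            bounds = [p.strip().upper() for p in part.split("-")]
            start, end = bounds[0], bounds[-1]
            labels.extend(chr(c) for c in range(ord(start), ord(end) + 1))
        else:
            labels.append(part.upper())
    return labels


def segment_caption(caption):
    """Split a caption into panel annotations keyed by panel label.

    Assumes each pointer precedes its text; text before the first
    pointer is treated as a shared preamble and is not assigned.
    """
    matches = list(POINTER.finditer(caption))
    annotations = {}
    for i, match in enumerate(matches):
        # The annotation runs from this pointer to the next one (or the end).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(caption)
        text = caption[match.end():end].strip(" .;,")
        for label in expand_labels(match.group(1)):
            annotations[label] = text
    return annotations


caption = (
    "Figure 2. Brain imaging of patient 1. "
    "(A) T1-weighted MRI scan. (B, C) Histology of the lesion."
)
print(segment_caption(caption))
# {'A': 'T1-weighted MRI scan', 'B': 'Histology of the lesion', 'C': 'Histology of the lesion'}
```

In this sketch, each panel annotation could then be passed to a text classifier as the feature for its panel. Captions that use other pointer styles (for example "left panel" or "top row") would need additional patterns.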