Based on the characteristics of gene names in biomedical text, a set of new features is proposed for gene name recognition. Starting from a reduced feature set, the proposed features, in the form of gene lexicons, are incorporated into it. The method was evaluated on the BioCreative II shared dataset using a global linear model and the perceptron learning algorithm. The results show that a small number of highly discriminative features still allows the system to achieve high performance, and that incorporating the new features further improves both precision and recall to some extent.
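The abstract names the model but not its details. As a rough illustration of how a global linear model, score(x, y) = w · Φ(x, y), can be trained with the perceptron for gene-mention tagging, here is a minimal Python sketch. It assumes a BIO tag scheme, first-order tag transitions, and Viterbi decoding. The feature templates in `token_features` are hypothetical orthographic cues, not the paper's reduced feature set or its lexicon features, and the toy sentences and labels are made up.

```python
from collections import defaultdict

TAGS = ("O", "B", "I")  # assumed BIO tags: Outside, Begin, Inside a gene mention
START = "<S>"           # pseudo tag/token marking the sentence start


def token_features(tokens, i):
    """Hypothetical orthographic features for token i (not the paper's feature set)."""
    w = tokens[i]
    feats = ["w=" + w.lower(),
             "suf3=" + w[-3:].lower(),
             "prev=" + (tokens[i - 1].lower() if i > 0 else START)]
    if any(c.isdigit() for c in w):
        feats.append("has_digit")
    if any(c.isupper() for c in w):
        feats.append("has_upper")
    if "-" in w:
        feats.append("has_hyphen")
    return feats


def local_features(tokens, i, prev_tag, tag):
    """Conjoin token features with the current tag and add a tag-transition feature."""
    return [f + "|" + tag for f in token_features(tokens, i)] + ["trans=" + prev_tag + ">" + tag]


def global_features(tokens, tags):
    """Phi(x, y): local feature counts summed over every position of the sequence."""
    counts = defaultdict(int)
    prev = START
    for i, tag in enumerate(tags):
        for f in local_features(tokens, i, prev, tag):
            counts[f] += 1
        prev = tag
    return counts


def score(tokens, tags, weights):
    """Global linear score w . Phi(x, y)."""
    return sum(weights.get(f, 0.0) * c for f, c in global_features(tokens, tags).items())


def local_score(tokens, i, prev_tag, tag, weights):
    return sum(weights.get(f, 0.0) for f in local_features(tokens, i, prev_tag, tag))


def viterbi(tokens, weights):
    """Exact argmax of the global score over all tag sequences."""
    if not tokens:
        return []
    # best[i][t] = (best score of a prefix ending with tag t at position i, previous tag)
    best = [{t: (local_score(tokens, 0, START, t, weights), None) for t in TAGS}]
    for i in range(1, len(tokens)):
        best.append({
            t: max((best[i - 1][p][0] + local_score(tokens, i, p, t, weights), p) for p in TAGS)
            for t in TAGS
        })
    tags = [max(TAGS, key=lambda t: best[-1][t][0])]
    for i in range(len(tokens) - 1, 0, -1):  # follow back-pointers
        tags.append(best[i][tags[-1]][1])
    return tags[::-1]


def train(data, epochs=5):
    """Structured perceptron: on a mistake, add Phi(gold) and subtract Phi(predicted)."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for tokens, gold in data:
            pred = viterbi(tokens, weights)
            if pred != gold:
                for f, c in global_features(tokens, gold).items():
                    weights[f] += c
                for f, c in global_features(tokens, pred).items():
                    weights[f] -= c
    return weights


if __name__ == "__main__":
    toy = [("the BRCA1 gene is mutated".split(), ["O", "B", "O", "O", "O"]),
           ("the IL-2 receptor binds".split(), ["O", "B", "I", "O"])]
    w = train(toy, epochs=10)
    print(viterbi("the p53 gene".split(), w))  # result depends on the learned weights
```

Each feature string is conjoined with the current tag, so a single weight vector scores both token evidence and tag transitions, and Viterbi search stays exact because the global score decomposes over adjacent tag pairs. Collins-style training commonly averages the weights over all updates to reduce overfitting. That refinement is left out here to keep the sketch short.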