A semi-supervised approach to product named entity recognition in the business information domain is presented. The structural features of the different parts of product named entities, and the relationships among those parts, are studied, and on this basis a three-level semi-supervised learning framework is built that combines rules and dictionaries with statistical methods. A hidden conditional random field model is then constructed to make fuller use of the hidden states in the data obtained by bootstrapping: labels that the bootstrapping algorithm fails to learn are treated as hidden states. Experiments in the digital camera domain show that, with only a small amount of manually labeled data, the method performs well at recognizing product named entities in text such as web pages.
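The abstract does not give the model's equations. As a rough illustration of what treating unlearned labels as hidden states can mean, the sketch below computes the log-likelihood of a partially labeled sequence under a plain linear-chain CRF, summing over every tag an unlabeled position could take. This is an assumed simplification for illustration, not the authors' hidden CRF: the tag set, the token scores, and the function names (`log_partition`, `partial_log_likelihood`) are hypothetical, and feature extraction and training are omitted.

```python
import math
from typing import List, Optional, Sequence, Set

# Hypothetical tag set for illustration only.
TAGS = ["O", "BRAND", "MODEL"]


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_partition(emit, trans, allowed: Sequence[Set[int]]) -> float:
    """Forward algorithm over a linear-chain CRF lattice.

    emit[t][y]       -- score of tag y at position t
    trans[p][y]      -- score of moving from tag p to tag y
    allowed[t]       -- tag indices permitted at position t
    Returns log of the summed exp-scores of all permitted tag paths.
    """
    n_tags = len(trans)
    alpha = [emit[0][y] if y in allowed[0] else -math.inf for y in range(n_tags)]
    for t in range(1, len(emit)):
        alpha = [
            logsumexp([alpha[p] + trans[p][y] for p in range(n_tags)]) + emit[t][y]
            if y in allowed[t]
            else -math.inf
            for y in range(n_tags)
        ]
    return logsumexp(alpha)


def partial_log_likelihood(emit, trans, labels: List[Optional[int]]) -> float:
    """log P(observed labels), marginalizing positions whose label is None.

    A None label stands for a tag that bootstrapping could not assign;
    it is treated as hidden and summed over rather than fixed.
    """
    every = set(range(len(trans)))
    constrained = [every if lab is None else {lab} for lab in labels]
    free = [every] * len(labels)
    return log_partition(emit, trans, constrained) - log_partition(emit, trans, free)


if __name__ == "__main__":
    # Hypothetical scores for the tokens "Canon", "EOS", "5D".
    emit = [[0.1, 1.2, -0.3], [0.0, 0.4, 0.9], [-0.2, 0.1, 1.5]]
    trans = [[0.2, -0.1, 0.0], [-0.5, 0.3, 0.8], [0.1, -0.4, 0.2]]
    # "EOS" was left unlabeled, so its tag stays hidden (None).
    print(partial_log_likelihood(emit, trans, [1, None, 2]))
```

The only difference between the labeled and the partially labeled case is which tags the lattice permits at each position, so hidden labels cost nothing extra beyond a second forward pass.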