Text region location is important for character recognition and retrieval in images with complex backgrounds. Existing methods achieve high precision and recall, but their high computational complexity makes them impractical for real systems. This paper proposes a text region location method based on connected component filtering and K-means clustering. First, the input image is segmented into three layers by an adaptive image segmentation method, and connected components are extracted from the character color layers. Then, features are extracted from each connected component, and an Adaboost classifier is used to filter out non-character components. Finally, the candidate character components are grouped into text regions by K-means clustering on their positions and color layers. Experimental results show that the proposed method achieves precision and recall comparable to those of existing methods while having lower computational complexity.
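The pipeline described above (connected components, component filtering, K-means grouping) can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: the binary layer masks are taken as given rather than produced by the adaptive segmentation, a simple pixel-count rule stands in for the Adaboost classifier, and the feature vector (centroid row, centroid column, weighted layer index), the `layer_weight` value, and the farthest-point seeding are illustrative choices. All function names are hypothetical.

```python
from collections import deque

def connected_components(mask):
    """Return the 4-connected foreground components of a binary mask as pixel lists."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for r in range(h):
        for c in range(w):
            if mask[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue, pixels = deque([(r, c)]), []
                while queue:
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                comps.append(pixels)
    return comps

def sqdist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=20):
    """Lloyd's K-means with deterministic farthest-point seeding (an assumption)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sqdist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sqdist(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            # Keep the old center if a cluster happens to become empty.
            new_centers.append(tuple(sum(v) / len(members) for v in zip(*members))
                               if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels

def locate_text_regions(layers, k, is_character, layer_weight=10.0):
    """Filter components per layer, cluster them, and return one bounding box per cluster."""
    comps, feats = [], []
    for idx, mask in enumerate(layers):
        for pixels in connected_components(mask):
            if is_character(pixels):  # stand-in for the Adaboost filter
                cy = sum(y for y, _ in pixels) / len(pixels)
                cx = sum(x for _, x in pixels) / len(pixels)
                comps.append(pixels)
                feats.append((cy, cx, layer_weight * idx))
    if not feats:
        return []
    labels = kmeans(feats, k)
    boxes = []
    for j in range(k):
        pts = [p for pix, l in zip(comps, labels) if l == j for p in pix]
        if pts:
            ys, xs = [y for y, _ in pts], [x for _, x in pts]
            boxes.append((min(ys), min(xs), max(ys), max(xs)))  # top, left, bottom, right
    return boxes

# Toy layer: two pairs of 2x2 "characters" plus one isolated noise pixel.
rows = [
    "................",
    ".##.##..........",
    ".##.##..........",
    "................",
    ".......#........",
    "................",
    "................",
    "................",
    "..........##.##.",
    "..........##.##.",
    "................",
]
layer = [[ch == "#" for ch in row] for row in rows]
boxes = locate_text_regions([layer], k=2, is_character=lambda px: len(px) >= 2)
print(boxes)  # [(1, 1, 2, 5), (8, 10, 9, 14)]
```

In this toy case the single noise pixel is rejected by the size rule and the four remaining components fall into two clusters, one per text region. In practice the number of clusters `k` would have to be chosen or estimated, which the abstract does not specify.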