Text regions in an image provide an important clue for identifying image spam. To obtain text-region information from spam images, two algorithms are proposed: a Hough-transform-based algorithm for extracting slanted text regions, and an eight-neighborhood algorithm that removes tiny edges to reduce interference from the image background. The two algorithms are combined into an automatic text-region extraction method that is insensitive to the color, font, size, location, and orientation of the text. Text-region extraction experiments were carried out on a data set of 100 spam images. The results show that the new method achieves good text-region extraction performance.
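The abstract names two building blocks without giving their details, so the following is only a minimal Python sketch of the general ideas, not the authors' algorithms. It assumes the input is a binary edge map stored as a list of rows of 0/1 values. The function names `remove_small_components` and `estimate_skew_degrees`, the `min_size` threshold, and the 1-degree angular resolution are all hypothetical choices made for illustration.

```python
import math
from collections import Counter, deque

# Offsets of the eight neighbors of a pixel (horizontal, vertical, diagonal).
NEIGHBORS_8 = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
               if (dr, dc) != (0, 0)]


def remove_small_components(grid, min_size):
    """Return a copy of a 0/1 grid in which every 8-connected component
    with fewer than min_size pixels has been cleared (illustrative only)."""
    if not grid:
        return []
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    seen = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first search collects one 8-connected component.
            component = [(r, c)]
            seen[r][c] = True
            queue = deque(component)
            while queue:
                cr, cc = queue.popleft()
                for dr, dc in NEIGHBORS_8:
                    nr, nc = cr + dr, cc + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and grid[nr][nc] == 1 and not seen[nr][nc]):
                        seen[nr][nc] = True
                        component.append((nr, nc))
                        queue.append((nr, nc))
            if len(component) < min_size:
                for pr, pc in component:
                    out[pr][pc] = 0
    return out


def estimate_skew_degrees(grid):
    """Estimate the direction of the dominant straight line in a 0/1 grid
    with a basic Hough transform (1-degree steps, 1-pixel rho bins).

    Returns the line direction in degrees in [-90, 90), measured in image
    coordinates (x to the right, y downward), or None for an empty grid.
    """
    points = [(c, r) for r, row in enumerate(grid)
              for c, value in enumerate(row) if value == 1]
    if not points:
        return None
    best_votes, best_theta = 0, 0
    for theta in range(180):
        t = math.radians(theta)
        cos_t, sin_t = math.cos(t), math.sin(t)
        # Each pixel votes for the line rho = x*cos(theta) + y*sin(theta).
        votes = Counter(round(x * cos_t + y * sin_t) for x, y in points)
        top = max(votes.values())
        if top > best_votes:
            best_votes, best_theta = top, theta
    # theta is the angle of the line's normal; the line itself is at theta - 90.
    return best_theta - 90
```

In a pipeline of this kind, the small-component filter would typically run first so that background clutter casts fewer votes in the Hough accumulator. The estimated angle could then be used to rotate or project the edge map before locating text lines. How the original method orders and combines these steps is not stated in the abstract.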