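In many applications, computer-generated images must be distinguished from camera-captured photographs before images are used for data mining. This paper addresses collections of product images on the Internet (images whose subject is a product, hereafter "product images"). It discusses several existing methods for distinguishing computer-generated images from photographs, analyzes their limitations when applied to product images, and proposes a simple and effective method suited to product images. The algorithm first separates text images from pictorial images on the basis of their different visual effects and classifies the text images as computer-generated; it then applies a combined judgment based on the dominant color, color diversity, and other features of the remaining pictorial images to identify those that are computer-generated. The effectiveness of the algorithm was verified on an online product image library of 40,000 images.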
With the development of online shopping, pictures presented as a shopping guide play an increasingly important role in online transactions. In many applications, it is necessary to distinguish graphics from photographs before data mining. For pictures of goods on the Internet, several existing ways of distinguishing graphics from photographs are discussed and their limitations are analyzed. A simple and effective algorithm is then proposed. First, text images are separated from photographs according to human visual habits and are taken as graphics. Then, based on an analysis of the dominant color and color diversity of the remaining images, those generated by computer are identified. Results on an online goods picture library of 40,000 pictures demonstrate the effectiveness and applicability of the algorithm.
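The second stage, judging images by dominant color and color diversity, can be illustrated with a minimal sketch. The abstract does not give the paper's actual features, thresholds, or decision rule, so everything below is an assumption made for illustration: the coarse RGB quantization, the function names `quantize`, `color_features`, and `looks_computer_generated`, the threshold values, and the rule that combines the two features with a logical AND.

```python
from collections import Counter


def quantize(pixel, levels=8):
    """Map an (r, g, b) pixel with 0-255 channels to a coarse color bin."""
    step = 256 // levels
    r, g, b = pixel
    return (r // step, g // step, b // step)


def color_features(pixels, levels=8):
    """Return (dominant_ratio, diversity) for a non-empty list of RGB pixels.

    dominant_ratio: share of pixels that fall in the most frequent color bin.
    diversity: number of distinct color bins that occur in the image.
    """
    counts = Counter(quantize(p, levels) for p in pixels)
    total = sum(counts.values())
    dominant_ratio = counts.most_common(1)[0][1] / total
    diversity = len(counts)
    return dominant_ratio, diversity


def looks_computer_generated(pixels, min_dominant=0.5, max_diversity=16):
    """Hypothetical rule: one color dominates AND few distinct colors occur.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    dominant_ratio, diversity = color_features(pixels)
    return dominant_ratio >= min_dominant and diversity <= max_diversity


# A flat two-color banner versus a synthetic image covering many color bins.
flat_banner = [(255, 255, 255)] * 90 + [(255, 0, 0)] * 10
varied_image = [(r, g, b)
                for r in range(0, 256, 32)
                for g in range(0, 256, 32)
                for b in range(0, 256, 32)]

print(looks_computer_generated(flat_banner))
print(looks_computer_generated(varied_image))
```

In practice the thresholds would have to be tuned on labeled product images, and a product photograph shot against a plain white background could also show a high dominant-color ratio, which is one reason a combined judgment over several features is plausible here.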