Web spam detection is of great practical significance, but only a small number of labeled pages are available, so a semi-supervised co-training method is used to detect spam pages. Page features are divided into two views: a content view and a link view. Independent component analysis (ICA) is first applied to each view separately to extract its independent components, and co-training is then performed on the transformed views to predict the label of each page. Experimental results show that the method effectively improves the accuracy of Web spam detection, and that applying ICA to each view separately is more effective than the other methods compared.
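The co-training step on the two views can be illustrated with a minimal sketch. It is not the implementation evaluated in the paper: the nearest-centroid classifier, the confidence measure (the margin between the two centroid distances), the synthetic two-dimensional views, and all names (`co_train`, `content_view`, `link_view`, `seed_labels`) are assumptions made for illustration. The per-view ICA step is omitted here, so the feature vectors stand in for each view's already extracted independent components.

```python
import random
from math import dist

def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def fit(X, y):
    """Stand-in classifier (an assumption): one mean vector per class, 0 = normal, 1 = spam."""
    return {c: centroid([x for x, t in zip(X, y) if t == c]) for c in (0, 1)}

def predict(model, x):
    """Return (label, confidence); confidence is the gap between the two centroid distances."""
    d0, d1 = dist(x, model[0]), dist(x, model[1])
    return (0 if d0 <= d1 else 1), abs(d0 - d1)

def co_train(views, seed_labels, rounds=5, per_round=2):
    """Co-training over several views of the same pages.

    views: list of views; each view is a list of feature vectors, one per page.
    seed_labels: dict mapping page index -> known label (both classes must occur).
    Returns a dict with the seed labels plus the pseudo-labels added during training.
    """
    labels = dict(seed_labels)
    n_pages = len(views[0])
    for _ in range(rounds):
        idx = sorted(labels)
        y = [labels[i] for i in idx]
        # Train one classifier per view on all currently labeled pages.
        models = [fit([view[i] for i in idx], y) for view in views]
        unlabeled = [i for i in range(n_pages) if i not in labels]
        if not unlabeled:
            break
        # Each view's classifier pseudo-labels the pages it is most confident about,
        # so the other view's classifier sees them as training data in the next round.
        for view, model in zip(views, models):
            ranked = sorted(unlabeled, key=lambda i: predict(model, view[i])[1], reverse=True)
            for i in ranked[:per_round]:
                # If both views pick the same page, the first view's label is kept.
                labels.setdefault(i, predict(model, view[i])[0])
    return labels

# Synthetic data (an assumption): 20 normal and 20 spam pages, two 2-D views each.
rng = random.Random(0)
truth = [0] * 20 + [1] * 20

def make_view(spam_centre):
    """Normal pages cluster around the origin, spam pages around (spam_centre, spam_centre)."""
    return [[rng.gauss(spam_centre * t, 0.5) for _ in range(2)] for t in truth]

content_view = make_view(3.0)
link_view = make_view(3.0)
seed_labels = {0: 0, 1: 0, 20: 1, 21: 1}  # the "small number of labeled pages"

final_labels = co_train([content_view, link_view], seed_labels)
pseudo = [i for i in final_labels if i not in seed_labels]
accuracy = sum(final_labels[i] == truth[i] for i in pseudo) / len(pseudo)
print(f"pseudo-labeled pages: {len(pseudo)}, pseudo-label accuracy: {accuracy:.2f}")
```

In a fuller pipeline, each view would first be passed through its own ICA transform, fitted on that view alone, before being handed to `co_train`; the choice of base classifier and of the number of pages added per round are likewise left open here.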