位置:成果数据库 > 期刊 > 期刊详情页
TST: Threshold Based Similarity Transitivity Method in Collaborative Filtering with Cloud Computing
  • ISSN号:1007-0214
  • 期刊名称:Tsinghua Science and Technology
  • 时间:2013.6.15
  • 页码:318-327
  • 分类:TP391.41[自动化与计算机技术—计算机应用技术;自动化与计算机技术—计算机科学与技术] TP393.098[自动化与计算机技术—计算机应用技术;自动化与计算机技术—计算机科学与技术]
  • 作者机构:[1]Department of Automation, Research Institute of Information Technology and Tsinghua National Laboratory for Information Science and Technology (TNList), Tsinghua University,Beijing100084,China, [2]Research Institute of Information Technology and Tsinghua National Laboratory for Information Science and Technology (TNList), Tsinghua University,Beijing100084,China, [3]Department of Computer Science and Technologies and Tsinghua National Laboratory for Information Science and Technology (TNList), Tsinghua University,Beijing100084,China, [4]Department of Electronic Engineering and Tsinghua National Laboratory for Information Science and Technology (TNList), Tsinghua University,Beijing100084
  • 相关基金:supported by Ministry of Science and Technology of China under the National Key Basic Research and Development (973) Program of China (Nos. 2012CB315801 and 2011CB302805); the National Natural Science Foundation of China A3 Program (No. 61161140320);the National Natural Science Foundation of China (No. 61233016); supported by Intel Research Council with the title of Security Vulnerability Analysis based on Cloud Platform with Intel IA Architecture
  • 相关项目:下一代互联网安全与隐私关键性技术的研究
中文摘要:

Collaborative filtering solves information overload problem by presenting personalized content to individual users based on their interests, which has been extensively applied in real-world recommender systems. As a class of simple but efficient collaborative filtering method, similarity based approaches make predictions by finding users with similar taste or items that have been similarly chosen. However, as the number of users or items grows rapidly, the traditional approach is suffering from the data sparsity problem. Inaccurate similarities derived from the sparse user-item associations would generate the inaccurate neighborhood for each user or item. Consequently, its poor recommendation drives us to propose a Threshold based Similarity Transitivity (TST) method in this paper. TST firstly filters out those inaccurate similarities by setting an intersection threshold and then replaces them with the transitivity similarity. Besides, the TST method is designed to be scalable with MapReduce framework based on cloud computing platform. We evaluate our algorithm on the public data set MovieLens and a real-world data set from AppChina (an Android application market) with several well-known metrics including precision, recall, coverage, and popularity. The experimental results demonstrate that TST copes well with the tradeoff between quality and quantity of similarity by setting an appropriate threshold. Moreover, we can experimentally find the optimal threshold which will be smaller as the data set becomes sparser. The experimental results also show that TST significantly outperforms the traditional approach even when the data becomes sparser.

英文摘要:

Collaborative filtering solves information overload problem by presenting personalized content to individual users based on their interests, which has been extensively applied in real-world recommender systems. As a class of simple but efficient collaborative filtering method, similarity based approaches make predictions by finding users with similar taste or items that have been similarly chosen. However, as the number of users or items grows rapidly, the traditional approach is suffering from the data sparsity problem. Inaccurate similarities derived from the sparse user-item associations would generate the inaccurate neighborhood for each user or item. Consequently, its poor recommendation drives us to propose a Threshold based Similarity Transitivity (TST) method in this paper. TST firstly filters out those inaccurate similarities by setting an intersection threshold and then replaces them with the transitivity similarity. Besides, the TST method is designed to be scalable with MapReduce framework based on cloud computing platform. We evaluate our algorithm on the public data set MovieLens and a real-world data set from AppChina (an Android application market) with several well-known metrics including precision, recall, coverage, and popularity. The experimental results demonstrate that TST copes well with the tradeoff between quality and quantity of similarity by setting an appropriate threshold. Moreover, we can experimentally find the optimal threshold which will be smaller as the data set becomes sparser. The experimental results also show that TST significantly outperforms the traditional approach even when the data becomes sparser.

同期刊论文项目
同项目期刊论文
期刊信息
  • 《清华大学学报:自然科学英文版》
  • 主管单位:教育部
  • 主办单位:清华大学
  • 主编:孙家广
  • 地址:北京市海淀区清华园
  • 邮编:100084
  • 邮箱:journal@tsinghua.edu.cn
  • 电话:010-62788108 62792994
  • 国际标准刊号:ISSN:1007-0214
  • 国内统一刊号:ISSN:11-3745/N
  • 邮发代号:82-627
  • 获奖情况:
  • 国内外数据库收录:
  • 美国化学文摘(网络版),美国数学评论(网络版),德国数学文摘,荷兰文摘与引文数据库,美国工程索引,美国剑桥科学文摘
  • 被引量:323