To address the difficulty of retrieving documents from massive collections of unstructured documents, a network document sharing system based on cloud storage was designed and developed. The system uses Hadoop, Lucene, and Mahout to implement document storage, full-text search, and recommendation, respectively. Tests show that the network document sharing system enables users to obtain documents more quickly and efficiently.