With the rapid development of the information society, social information is becoming increasingly abundant. Such information affects stock market volatility. However, it is huge in volume and mostly unstructured, which makes analyzing its impact on the market difficult. This paper addresses the problem with distributed computing and examines the impact of social information on the stock market from two aspects: information volume and information sentiment. Using a purpose-built SparkR platform, it first discusses how to perform feature selection and sentiment classification of stock-market social information in a big-data environment. It then compares the impact of information volume and of information sentiment on the market, showing that changes in information sentiment describe market volatility more accurately than information volume does. To further validate the feasibility of the approach, the paper defines several sentiment measurement methods, compares their advantages and disadvantages, and then proposes an integrated solution for analyzing the impact of social information on stock market volatility. Finally, experiments verify the effectiveness of the proposed solution.
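The abstract contrasts information volume with information sentiment and mentions several sentiment measurement methods without defining them. The following is a minimal single-machine sketch, not the paper's SparkR implementation. It assumes posts have already been labeled positive, negative, or neutral by an upstream classifier. It aggregates them into a daily volume series and two hypothetical sentiment indices: a net ratio and a smoothed log ratio. It then provides Pearson's r for correlating any of these series with a market series. The input format, the function names `daily_measures` and `pearson`, and both indices are illustrative assumptions, not the measures defined in the paper.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical input: (date, label) with label +1 positive, -1 negative, 0 neutral,
# assumed to come from an upstream sentiment classifier.
Post = Tuple[str, int]


def daily_measures(posts: Iterable[Post]) -> Dict[str, Dict[str, float]]:
    """Aggregate labeled posts into per-day volume and two illustrative sentiment indices."""
    counts = defaultdict(lambda: [0, 0, 0])  # [positive, negative, total]
    for date, label in posts:
        c = counts[date]
        if label > 0:
            c[0] += 1
        elif label < 0:
            c[1] += 1
        c[2] += 1  # information volume counts every post, neutral included

    result = {}
    for date, (pos, neg, total) in sorted(counts.items()):
        # Net ratio in [-1, 1]; set to 0 when the day has no polar posts.
        net = (pos - neg) / (pos + neg) if pos + neg else 0.0
        # Log ratio with +1 smoothing so zero counts stay finite.
        log_ratio = math.log((1 + pos) / (1 + neg))
        result[date] = {"volume": float(total), "net": net, "log_ratio": log_ratio}
    return result


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for a constant series")
    return cov / math.sqrt(vx * vy)


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the paper.
    toy = [("d1", 1), ("d1", 1), ("d1", -1), ("d1", 0), ("d2", -1), ("d3", 1)]
    for day, m in daily_measures(toy).items():
        print(day, m)
```

In a full pipeline, the per-day series (for example `[m["net"] for m in measures.values()]` and `[m["volume"] for m in measures.values()]`) would each be passed to `pearson` together with a volatility series aligned on the same dates. This mirrors, in simplified form, the abstract's comparison of volume-based and sentiment-based signals. At scale, the grouping step is what a distributed framework such as SparkR would parallelize.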