[Purpose/Significance] Predictive modeling is one of the basic tasks of data mining. Most existing privacy-preserving predictive models are built on a centralized architecture and therefore inevitably carry security and robustness vulnerabilities. A new, decentralized predictive modeling method that protects sensitive information while retaining data-processing capability is thus important both for cross-institutional interoperability and for information security and sharing services at the national level. [Method/Process] This paper proposes a new predictive modeling architecture, the model chain, which integrates privacy-preserving online machine learning with a private blockchain network, uses transaction metadata to pass partial models between sites, and introduces a new proof-of-information algorithm to determine the execution order of the online learning process. [Result/Conclusion] In the model chain, every participating site contributes to estimating the model parameters without disclosing any of its own information (i.e., only model parameters, never user data, are exchanged between institutions). Blockchain-based predictive modeling can effectively improve cross-institutional interoperability while avoiding both the disclosure of sensitive information and the systemic and policy risks that arise when data processing must be coordinated by a central server. The method can help support the roadmap design for nationwide information-sharing services. [Limitations] The technical implementation of the model chain still needs to be evaluated in specific network environments, which is the goal of follow-up research.
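The abstract does not specify the learning algorithm, the block format, or how the proof-of-information rule works, so the following is only a minimal sketch under stated assumptions. It assumes online logistic regression updated by SGD at each site. It also assumes a transaction whose metadata carries nothing but the weight vector. Finally, it assumes a hypothetical proof-of-information rule in which the site whose private data the current model fits worst (highest loss) updates next. The names `Site`, `proof_of_information`, and `run_model_chain` are illustrative. The "chain" is an in-memory, hash-linked list standing in for a private blockchain network.

```python
import hashlib
import json
import math
import random
from dataclasses import dataclass


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


@dataclass
class Site:
    """A participating institution; its records never leave this object."""
    name: str
    records: list  # (features, label) pairs, label in {0, 1}

    def loss(self, weights):
        """Mean logistic loss of the shared model on this site's private data."""
        total = 0.0
        for x, y in self.records:
            p = min(max(sigmoid(dot(weights, x)), 1e-12), 1 - 1e-12)
            total -= y * math.log(p) + (1 - y) * math.log(1 - p)
        return total / len(self.records)

    def update(self, weights, lr=0.1, epochs=5):
        """Online SGD over local records; only the new weights are returned."""
        w = list(weights)
        for _ in range(epochs):
            for x, y in self.records:
                err = sigmoid(dot(w, x)) - y
                w = [wi - lr * err * xi for wi, xi in zip(w, x)]
        return w


def make_block(prev_hash, tx):
    """Hash-link a transaction to the previous block (stand-in for a real ledger)."""
    body = json.dumps({"prev": prev_hash, "tx": tx}, sort_keys=True)
    return {"prev": prev_hash, "tx": tx,
            "hash": hashlib.sha256(body.encode()).hexdigest()}


def proof_of_information(sites, weights):
    """Hypothetical rule: the site with the highest loss on the current model goes next."""
    return max(sites, key=lambda s: s.loss(weights))


def run_model_chain(sites, dim, rounds):
    weights = [0.0] * dim
    chain = [make_block("0" * 64, {"site": None, "weights": weights})]
    for _ in range(rounds):
        site = proof_of_information(sites, weights)
        weights = site.update(weights)
        # Transaction metadata carries model parameters only, never user records.
        chain.append(make_block(chain[-1]["hash"], {"site": site.name, "weights": weights}))
    return chain


def synthetic_site(name, n, rng, true_w):
    """Generate private records from a logistic model (for illustration only)."""
    records = []
    for _ in range(n):
        x = [1.0] + [rng.gauss(0, 1) for _ in range(len(true_w) - 1)]
        y = 1 if rng.random() < sigmoid(dot(true_w, x)) else 0
        records.append((x, y))
    return Site(name, records)


if __name__ == "__main__":
    rng = random.Random(0)
    true_w = [0.5, 2.0, -1.0]
    sites = [synthetic_site(f"site-{i}", 50, rng, true_w) for i in range(3)]
    for block in run_model_chain(sites, dim=3, rounds=6)[1:]:
        print(block["tx"]["site"], [round(w, 2) for w in block["tx"]["weights"]])
```

The design choice worth noting is that `Site.update` and `Site.loss` are the only functions that touch `records`. Only the resulting weight vector enters a transaction, which mirrors the abstract's claim that parameters, not user data, move between institutions. Selecting the next site by highest local loss is one plausible reading of "proof-of-information", not the paper's confirmed definition.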