Using the technical principles of the chess-playing program "Deep Blue" as a point of comparison, this paper studies the core modules of AlphaGo: the supervised learning policy network, the fast rollout policy, the reinforcement learning policy network, and the value network. The implementation of Monte Carlo tree search (MCTS) guided by the policy and value networks is analyzed in some detail. Taking AlphaGo's technological breakthroughs as a starting point, potential applications of artificial intelligence (AI) in the physical, information, cognitive, and social domains of the war space are forecast, and AI military application programs funded by the U.S. Defense Advanced Research Projects Agency (DARPA) are analyzed. Finally, based on the Observe, Orient, Decide, Act (OODA) loop theory, the potentially disruptive effects of applying AI in the military domain are studied.
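As a rough illustration of how a policy network and a value network can guide MCTS, the sketch below shows one common formulation of the selection, leaf evaluation, and backup steps. It is an assumption-laden sketch, not the implementation analyzed in the paper: the names `Edge`, `select_action`, `leaf_value`, and `backup`, the constant `c_puct`, and the mixing weight `mix` are all hypothetical, the network outputs are supplied as plain numbers, and the sign flip between the two players' perspectives is omitted.

```python
import math
from dataclasses import dataclass


@dataclass
class Edge:
    """Statistics for one (state, action) pair in the search tree."""
    prior: float               # move probability, assumed to come from a policy network
    visits: int = 0            # number of simulations that passed through this edge
    total_value: float = 0.0   # sum of leaf evaluations backed up through this edge

    @property
    def q(self) -> float:
        # Mean action value; 0.0 for an edge that has not been visited yet.
        return self.total_value / self.visits if self.visits else 0.0


def select_action(edges: dict, c_puct: float = 1.0):
    """Pick the action maximizing mean value plus a prior-weighted exploration bonus."""
    total_visits = sum(e.visits for e in edges.values())
    # Sketch choice: clamp to 1 so priors still rank moves before any visit.
    sqrt_total = math.sqrt(max(1, total_visits))

    def score(item):
        _, edge = item
        bonus = c_puct * edge.prior * sqrt_total / (1 + edge.visits)
        return edge.q + bonus

    return max(edges.items(), key=score)[0]


def leaf_value(value_estimate: float, rollout_outcome: float, mix: float = 0.5) -> float:
    """Blend a value-network estimate with a fast-rollout result (mix is an assumed weight)."""
    return (1 - mix) * value_estimate + mix * rollout_outcome


def backup(path: list, value: float) -> None:
    """Update visit counts and value sums along the traversed edges."""
    for edge in path:
        edge.visits += 1
        edge.total_value += value


# Usage: the higher-prior move is tried first; a poor evaluation then shifts the search.
edges = {"a": Edge(prior=0.6), "b": Edge(prior=0.4)}
first = select_action(edges)
backup([edges[first]], leaf_value(-1.0, -1.0))
second = select_action(edges)
```

The exploration bonus shrinks as an edge accumulates visits, so the search starts from moves the policy network favors and gradually relies on the backed-up evaluations instead.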