Database query technology, currently the main method of IT auditing, relies on auditors' prior knowledge to find audit doubts. When relevant audit knowledge is lacking, however, auditors can hardly identify doubtful points in massive audit data. To address this problem, we propose an audit-doubt discovery method based on iterative clustering. Without any prior knowledge, the method analyzes audit indicators and automatically identifies as audit doubts the few auditees whose behavior differs markedly from that of the majority. Using various sources of unstructured information and web crawling technology, we automatically extracted high-frequency audit findings from 140 audit reports and selected financial indicators accordingly. We then collected the 2008-2012 financial statement data of 913 listed companies and ran iterative clustering on them, which uncovered 68 companies with audit doubts for further analysis. Comparing these results with unstructured information disclosed online by the China Securities Regulatory Commission (CSRC) and other organizations verified the effectiveness of the method. The results show that iterative clustering helps auditors discover audit doubts in massive data autonomously, narrows the scope of doubt screening, and improves audit efficiency.
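The iterative clustering idea can be sketched as follows. The abstract does not state which clustering algorithm, how many clusters, or what size threshold defines a "minority" of auditees, so this minimal sketch assumes plain k-means with deterministic farthest-point seeding and a hypothetical cut-off `min_frac`. In each round, clusters holding less than that fraction of the remaining auditees are flagged as audit doubts and removed, and the rest is re-clustered until no small cluster appears. Each point stands for one company's vector of (already standardized) financial indicators; the names `kmeans`, `iterative_cluster_outliers`, `min_frac`, and `max_rounds` are illustrative and are not the authors' implementation.

```python
from collections import Counter
from typing import List, Optional, Sequence, Set

Vector = Sequence[float]  # one auditee's (standardized) indicator values


def _sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two indicator vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, max_iter: int = 100) -> List[int]:
    """Lloyd's k-means with deterministic farthest-point seeding (an assumption)."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Seed the next center at the point farthest from all existing centers.
        farthest = max(points, key=lambda p: min(_sq_dist(p, c) for c in centers))
        centers.append(list(farthest))
    labels: Optional[List[int]] = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: _sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def iterative_cluster_outliers(points: List[Vector], k: int = 2,
                               min_frac: float = 0.1, max_rounds: int = 10) -> Set[int]:
    """Return indices of auditees repeatedly isolated in small clusters."""
    remaining = list(range(len(points)))
    flagged: Set[int] = set()
    for _ in range(max_rounds):
        if len(remaining) <= k:
            break
        labels = kmeans([points[i] for i in remaining], k)
        sizes = Counter(labels)
        # Hypothetical rule: a cluster below min_frac of the current population is "minority" behavior.
        small = {c for c, n in sizes.items() if n < min_frac * len(remaining)}
        if not small:
            break  # every cluster is large: no further doubts to peel off
        flagged.update(i for i, lab in zip(remaining, labels) if lab in small)
        remaining = [i for i, lab in zip(remaining, labels) if lab not in small]
    return flagged


if __name__ == "__main__":
    # Illustrative data only: 20 "ordinary" companies on a grid plus two deviant ones.
    data = [(0.1 * i, 0.1 * j) for i in range(5) for j in range(4)]
    data += [(10.0, 10.0), (10.5, 9.5)]
    print(sorted(iterative_cluster_outliers(data)))  # → [20, 21]
```

Removing flagged auditees before re-clustering is intended to let a subtler layer of deviant companies show up once the most extreme ones no longer pull the cluster centers toward themselves. With real financial ratios, the indicators would normally be standardized first so that no single ratio dominates the distance.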