With the development of Web service technologies, a large number of public Web services have been published on the Internet. When developing software systems with these services, their textual descriptions (such as introductions and user manuals), which are generally written in natural language, help service consumers locate, understand, and use appropriate Web services. Existing work on service discovery and retrieval usually obtains such descriptions only from services' WSDL files. However, our investigation shows that the WSDL files of most public Web services on the Internet contain little or no such descriptive text. This paper therefore proposes a Web-search-based approach to enriching the textual descriptions of public Web services using information sources outside their WSDL files. Given a target Web service, the approach collects related Web pages that contain the service's identifying features, extracts text fragments from these pages, uses information retrieval techniques to compute the relevance of each fragment to the service, and selects the most relevant fragments as enriched descriptions. Experiments on real data from the Internet show that related Web pages can be found for about 51% of public Web services, and that descriptions can be enriched for about 88% of those services. The collected Web services and their textual descriptions are publicly available.
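The abstract states only that "information retrieval techniques" score extracted fragments against the target service. It does not name the retrieval model. The sketch below assumes a TF-IDF vector space model with cosine similarity; this is our substitution and not necessarily the authors' method. It also assumes the service is represented by a hypothetical "profile" string built from identifiers in its WSDL file, such as service and operation names. The functions `rank_fragments`, `tfidf_vectors`, and `cosine`, the `threshold` parameter, and the sample data are all illustrative.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Split camelCase identifiers (common in WSDL names) into separate words,
    # then lowercase and keep alphanumeric runs as tokens.
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    # Build sparse TF-IDF vectors (dicts) over the given documents.
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    # Smoothed IDF so that terms present in every document keep a nonzero weight.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in df}
    return [{t: c * idf[t] for t, c in Counter(toks).items()} for toks in tokenized]


def cosine(a, b):
    # Cosine similarity of two sparse vectors; 0.0 if either is empty.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_fragments(service_profile, fragments, threshold=0.1):
    # Hypothetical helper: score each fragment against the service profile
    # and keep those at or above the threshold, most relevant first.
    vecs = tfidf_vectors([service_profile] + fragments)
    query, frag_vecs = vecs[0], vecs[1:]
    scored = [(cosine(query, v), f) for v, f in zip(frag_vecs, fragments)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(s, f) for s, f in scored if s >= threshold]


if __name__ == "__main__":
    profile = "WeatherService getWeatherByCity getForecast"  # illustrative WSDL names
    fragments = [
        "This service returns the current weather and forecast for a given city.",
        "Click here to log in to your account.",
    ]
    for score, fragment in rank_fragments(profile, fragments):
        print(f"{score:.2f}  {fragment}")
```

In this example the navigation fragment shares no terms with the profile, so it scores 0 and is dropped. The weather fragment is kept. A real system would also need the fragment extraction and page collection steps described above, which this sketch does not cover.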