The World Wide Web has become an important source of information owing to its explosive growth and spread over the past two decades. The tremendous amount of web data has opened a new era for data analysis and mining systems. More and more web applications need to extract, mine, and integrate data from enormous data sources. However, because web pages are semi-structured, the data they present is not directly consumable by machines. Web information extraction aims to extract structured data from web pages, a very challenging problem due to the large scale and high heterogeneity of web data. This paper surveys state-of-the-art web information extraction studies, analyzes the advantages and limitations of each method, and categorizes and compares existing approaches.