It appears you don't have support to open PDFs in this web browser. To view this file, Open with your PDF reader
Abstract
Table is one of the ways to visualize information on web pages. The abundant number of web pages that compose the World Wide Web has been the motivation of information extraction and information retrieval research, including the research for table extraction. Besides, there is a need for a system which is designed to specifically handle location-related information. Based on this background, this research is conducted to provide a way to extract location-related data from web tables so that it can be used in the development of Geographic Information Retrieval (GIR) system. The location-related data will be identified by the toponym (location name). In this research, a rule-based approach with gazetteer is used to recognize toponym from web table. Meanwhile, to extract data from a table, a combination of rule-based approach and statistical-based approach is used. On the statistical-based approach, Conditional Random Fields (CRF) model is used to understand the schema of the table. The result of table extraction is presented on JSON format. If a web table contains toponym, a field will be added on the JSON document to store the toponym values. This field can be used to index the table data in accordance to the toponym, which then can be used in the development of GIR system.
You have requested "on-the-fly" machine translation of selected content from our databases. This functionality is provided solely for your convenience and is in no way intended to replace human translation. Show full disclaimer
Neither ProQuest nor its licensors make any representations or warranties with respect to the translations. The translations are automatically generated "AS IS" and "AS AVAILABLE" and are not retained in our systems. PROQUEST AND ITS LICENSORS SPECIFICALLY DISCLAIM ANY AND ALL EXPRESS OR IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES FOR AVAILABILITY, ACCURACY, TIMELINESS, COMPLETENESS, NON-INFRINGMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Your use of the translations is subject to all use restrictions contained in your Electronic Products License Agreement and by using the translation functionality you agree to forgo any and all claims against ProQuest or its licensors for your use of the translation functionality and any output derived there from. Hide full disclaimer
Details
1 School of Electrical Engineering and Informatics, Bandung Institute of Technology