Abstract

This paper proposes a novel heterogeneous Web data extraction algorithm based on a modified hidden conditional random fields model. Because traditional linear-chain conditional random fields cannot effectively handle complex, heterogeneous Web data extraction, the author modifies the standard hidden conditional random fields in three aspects, which include using a hidden Markov model to calculate the hidden variables and training the modified model in two stages. In the first stage, each training data sequence is learned using a hidden Markov model, which makes the hidden variables visible. In the second stage, the model parameters are learned for a given sequence. Finally, performance is evaluated experimentally on two standard datasets, the "EData dataset" and the "Research Papers dataset". Compared with existing Web data extraction methods, the results indicate that the proposed algorithm extracts useful information from heterogeneous Web data effectively and efficiently.
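
The first stage described above, in which a hidden Markov model makes the hidden variables of a sequence visible, can be illustrated with a minimal sketch. This is not the author's implementation: it assumes that "making hidden variables visible" corresponds to Viterbi decoding of the most likely hidden-state path, and the state names, token types, and probabilities below are hypothetical toy values chosen only for illustration.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path for a non-empty observation sequence."""

    def log(p):
        # Log-probabilities avoid numerical underflow on long sequences.
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: log-probability of the best path that ends in state s at step t.
    best = [{s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}]
    # back[t][s]: predecessor of state s on that best path.
    back = [{}]

    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + log(emit_p[s][observations[t]])
            back[t][s] = prev

    # Trace back from the best final state to recover the full path.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Hypothetical toy model: two hidden states and two observable token types.
states = ["field", "value"]
start_p = {"field": 0.8, "value": 0.2}
trans_p = {
    "field": {"field": 0.3, "value": 0.7},
    "value": {"field": 0.6, "value": 0.4},
}
emit_p = {
    "field": {"text": 0.9, "digits": 0.1},
    "value": {"text": 0.3, "digits": 0.7},
}

decoded = viterbi(["text", "digits", "text"], states, start_p, trans_p, emit_p)
print(decoded)  # ['field', 'value', 'field']
```

Under this assumption, the decoded state sequence could serve as observed values for the hidden variables when the parameters are learned in the second stage; how the paper actually combines the two stages is not specified in the abstract.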

Details

Title
Heterogeneous Web Data Extraction Algorithm Based On Modified Hidden Conditional Random Fields
Author
Publication title
Volume
9
Issue
4
Pages
993-999
Number of pages
7
Publication year
2014
Publication date
Apr 2014
Publisher
Academy Publisher
Place of publication
Oulu
Country of publication
Finland
ISSN
1796-2056
Source type
Scholarly Journal
Language of publication
English
Document type
Feature
Document feature
Diagrams; Equations; Graphs; Illustrations; Tables
ProQuest document ID
1520576292
Document URL
https://www.proquest.com/scholarly-journals/heterogeneous-web-data-extraction-algorithm-based/docview/1520576292/se-2?accountid=208611
Copyright
Copyright Academy Publisher Apr 2014
Last updated
2023-11-28
Database
ProQuest One Academic