Abstract

Purpose

The purpose of this paper is to propose a framework for extracting knowledge, namely entities and the relationships between them, from unstructured texts in digital humanities (DH).

Design/methodology/approach

The proposed cooperative crowdsourcing framework (CCF) uses both human–computer cooperation and crowdsourcing to achieve high-quality and scalable knowledge extraction. CCF integrates active learning with a novel category-based crowdsourcing mechanism to help domain experts label and verify extracted knowledge.
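The combination described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes entropy-based uncertainty sampling as the active learning step and a simple highest-skill rule as the category-based routing step, and every name, category, probability and skill score below is hypothetical.

```python
import math
from collections import defaultdict


def entropy(probs):
    """Shannon entropy of a predicted label distribution (higher means less certain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_uncertain(predictions, budget):
    """Active learning step (assumed): keep the `budget` most uncertain items."""
    ranked = sorted(predictions, key=lambda item: entropy(item["probs"]), reverse=True)
    return ranked[:budget]


def route_by_category(items, worker_skills):
    """Crowdsourcing step (assumed): send each item to the worker with the
    highest skill score for that item's category."""
    assignments = defaultdict(list)
    for item in items:
        best = max(
            worker_skills,
            key=lambda worker: worker_skills[worker].get(item["category"], 0.0),
        )
        assignments[best].append(item["id"])
    return dict(assignments)


# Hypothetical model outputs for four extracted candidates.
predictions = [
    {"id": "t1", "category": "person", "probs": [0.98, 0.02]},
    {"id": "t2", "category": "place", "probs": [0.55, 0.45]},
    {"id": "t3", "category": "person", "probs": [0.60, 0.40]},
    {"id": "t4", "category": "place", "probs": [0.90, 0.10]},
]

# Hypothetical per-category skill scores for two domain experts.
worker_skills = {
    "expert_a": {"person": 0.9, "place": 0.4},
    "expert_b": {"person": 0.5, "place": 0.8},
}

queue = select_uncertain(predictions, budget=2)
assignments = route_by_category(queue, worker_skills)
print(assignments)
```

In this sketch only the two least certain candidates reach the experts, and each goes to the expert rated highest for its category; the labels they return would then be fed back into model training.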

Findings

The case study shows that CCF can effectively and efficiently extract knowledge from multi-sourced heterogeneous data in the field of Tang poetry. Specifically, CCF achieves higher knowledge extraction accuracy than state-of-the-art methods; the active learning mechanism maximizes the contribution of feedback to model training; and the proposed category-based crowdsourcing mechanism scales up effective human–computer collaboration by considering the specialization of workers in different categories of tasks.

Research limitations/implications

This research proposes CCF to enable high-quality and scalable knowledge extraction in the field of Tang poetry. CCF can be generalized to other fields of DH by introducing domain knowledge and experts.

Practical implications

The extracted knowledge is machine-understandable and can support the research of Tang poetry and knowledge-driven intelligent applications in DH.

Originality/value

CCF is the first human-in-the-loop knowledge extraction framework that integrates active learning and crowdsourcing mechanisms; the human–computer cooperation method uses the feedback of domain experts through the active learning mechanism; the category-based crowdsourcing mechanism matches categories of DH data to the specializations of domain experts.

Details

Business indexing term
Title
A cooperative crowdsourcing framework for knowledge extraction in digital humanities – cases on Tang poetry
Alternate title
Digital humanities
Publication title
Volume
72
Issue
2
Pages
243-261
Number of pages
19
Publication year
2020
Publication date
2020
Publisher
Emerald Group Publishing Limited
Place of publication
Bradford
Country of publication
United Kingdom
ISSN
20503806
e-ISSN
17583748
Source type
Scholarly Journal
Language of publication
English
Document type
Journal Article
Publication history
Online publication date
2020-02-25
Milestone dates
2019-07-31 (Received); 2019-11-15 (Revised); 2020-01-04 (Revised); 2020-01-05 (Revised); 2020-01-14 (Revised); 2020-01-27 (Accepted)
First posting date
25 Feb 2020
ProQuest document ID
2498985491
Document URL
https://www.proquest.com/scholarly-journals/cooperative-crowdsourcing-framework-knowledge/docview/2498985491/se-2?accountid=208611
Copyright
© Emerald Publishing Limited 2020
Last updated
2025-11-14
Database
ProQuest One Academic