Abstract

There are several problems in Pattern Recognition and Data Mining that, by their nature, allow objects to belong to more than one class or cluster. DClustR is a dynamic overlapping clustering algorithm that, in the task of document clustering, has shown the best trade-off between cluster quality and efficiency among the existing dynamic overlapping clustering algorithms. Despite these achievements, DClustR may be less useful in applications dealing with a large number of documents, owing to its computational complexity and the amount of memory it needs to process such collections. In this paper, a GPU-based parallel version of DClustR, named CUDA-DClus, is proposed to improve the efficiency of DClustR on problems involving a large number of documents. An experimental evaluation conducted over several standard document collections showed that CUDA-DClus performs better in terms of efficiency and memory consumption.

Details

Title
Algoritmo incremental de agrupamiento con traslape para el procesamiento de grandes colecciones de datos [Incremental overlapping clustering algorithm for processing large data collections]
Author
González-Soler, Lázaro Janier; Pérez-Suárez, Airel; Chang-Fernández, Leonardo
Pages
1-12
Publication year
2015
Publication date
2015
Publisher
Dr. Luis Camilo Ortigueira Sánchez
e-ISSN
2255-5684
Source type
Scholarly Journal
Language of publication
Spanish
ProQuest document ID
1755954320
Copyright
Copyright Dr. Luis Camilo Ortigueira Sánchez 2015