© 2018. This work is published under https://creativecommons.org/licenses/by/3.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Pattern Recognition and Data Mining pose several problems in which, by their inherent nature, an object can belong to more than one class; that is, clusters can overlap each other. OClustR and DClustR are overlapping clustering algorithms that have shown, in the task of document clustering, the best trade-off between cluster quality and efficiency among existing overlapping clustering algorithms. Despite these achievements, both algorithms are O(n^2), so they could be less useful in applications dealing with a large number of documents. Moreover, although DClustR can efficiently process changes in an already clustered collection, the amount of memory it uses could make it unsuitable for applications dealing with very large document collections. In this paper, two GPU-based parallel algorithms, named CUDA-OClus and CUDA-DClus, are proposed in order to enhance the efficiency of OClustR and DClustR, respectively, in problems dealing with a very large number of documents. The experimental evaluation conducted over several standard document collections showed the correctness of both CUDA-OClus and CUDA-DClus, and also their better performance in terms of efficiency and memory consumption.

Alternate abstract:

OClustR and DClustR are overlapping clustering algorithms that achieve good results, but their complexity is of quadratic order. This paper presents two GPU-based parallel algorithms: CUDA-OClus and CUDA-DClus. In the experiments, they demonstrated the ability to work with large amounts of data.

Details

Title
Static and Incremental Overlapping Clustering Algorithms for Large Collections Processing in GPU
Author
González-Soler, Lázaro Janier 1 ; Pérez-Suárez, Airel 2 ; Chang, Leonardo 3 

1 Advanced Technologies Application Center (CENATAV), 7ma A # 21406, Playa, CP: 12200, Havana, Cuba. E-mail: [email protected] and http://www.cenatav.co.cu/index.php/profile/profile/userprofile/jsoler
2 Advanced Technologies Application Center (CENATAV), 7ma A # 21406, Playa, CP: 12200, Havana, Cuba. E-mail: [email protected] and http://www.cenatav.co.cu/index.php/profile/profile/userprofile/asuarez
3 Advanced Technologies Application Center (CENATAV), 7ma A # 21406, Playa, CP: 12200, Havana, Cuba. E-mail: [email protected] and http://www.cenatav.co.cu/index.php/profile/profile/userprofile/lchang
Pages
229-244
Publication year
2018
Publication date
Jun 2018
Publisher
Slovenian Society Informatika / Slovensko društvo Informatika
ISSN
03505596
e-ISSN
18543871
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2133763598