Full Text

Copyright © 2015 Jingjing Wang and Chen Lin. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Locality Sensitive Hashing (LSH) has been proposed as an efficient technique for similarity joins on high-dimensional data. The efficiency and approximation rate of LSH depend on the numbers of false positives and false negatives it generates. In many domains, reducing the number of false positives is crucial; in other application scenarios, a balance between false positives and false negatives is preferred. To address these needs, we propose Personalized Locality Sensitive Hashing (PLSH), which embeds a new banding scheme to tailor the number of false positives, the number of false negatives, and the sum of both. PLSH is implemented in parallel using the MapReduce framework to handle similarity joins on large-scale data. Experimental studies on real and simulated data verify the efficiency and effectiveness of PLSH compared with state-of-the-art methods.
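
As a rough illustration of why a banding scheme controls false positives and false negatives, the sketch below uses the textbook LSH banding model, not the PLSH scheme itself, whose details are not given in this record. It assumes a signature split into `bands` bands of `rows` rows each, so that a pair with similarity s becomes a candidate with probability 1 - (1 - s^rows)^bands. The function names, the similarity threshold, and the sample similarities are all hypothetical.

```python
def collision_probability(s: float, bands: int, rows: int) -> float:
    """Probability that a pair with similarity s agrees on at least one band
    (standard banding model; an assumption, not the PLSH formula)."""
    return 1.0 - (1.0 - s ** rows) ** bands


def expected_errors(similarities, threshold, bands, rows):
    """Expected false positives and false negatives over a list of
    pairwise similarities, for a given similarity threshold."""
    false_positives = 0.0
    false_negatives = 0.0
    for s in similarities:
        p = collision_probability(s, bands, rows)
        if s < threshold:
            # Dissimilar pair that still becomes a candidate.
            false_positives += p
        else:
            # Similar pair that never becomes a candidate.
            false_negatives += 1.0 - p
    return false_positives, false_negatives


if __name__ == "__main__":
    sample = [0.2, 0.4, 0.6, 0.8]  # hypothetical pairwise similarities
    for bands, rows in [(20, 5), (20, 10)]:
        fp, fn = expected_errors(sample, 0.5, bands, rows)
        print(f"bands={bands} rows={rows} expected FP={fp:.4f} FN={fn:.4f}")
```

Under this model, adding rows per band lowers the expected false positives and raises the expected false negatives, which is the trade-off the abstract says PLSH is designed to tailor.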

Details

Title
MapReduce Based Personalized Locality Sensitive Hashing for Similarity Joins on Large Scale Data
Author
Wang, Jingjing; Lin, Chen
Publication year
2015
Publication date
2015
Publisher
John Wiley & Sons, Inc.
ISSN
16875265
e-ISSN
16875273
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
1679858127