Abstract
With the rapid advance of information technology, we are witnessing tremendous growth in data generation. Retrieving desirable objects quickly and effectively from such massive data remains a challenging issue. Hash learning provides a feasible solution for fast retrieval due to its potential advantages. In this paper, we develop a novel hashing model based on an information-preserving metric for image retrieval. First, we theoretically reveal the relationship between the Hamming distances of binary representations and the angles between their corresponding original data in the Euclidean space. We further demonstrate that suitably scaled Hamming distances are unbiased estimates of these angles and concentrate around them as the length of the binary representations grows. Based on this property, we introduce an information-preserving metric and take it into account within our hashing model, so that the information embedded within the original data can be preserved as much as possible during the encoding stage of hash learning. To verify the superiority of our hashing model, we conducted comprehensive experiments on three public image collections. The experimental results show that the proposed hashing algorithm achieves competitive performance compared to representative hashing algorithms.
Introduction
With an ever-growing number of digital devices and mobile network facilities, we are witnessing the growth of data generation at an unprecedented rate [1]. Such massive amounts of data have become increasingly important in recent years due to their wide range of potential applications in many fields, including computer vision, natural language processing, medical diagnosis, media analysis, and so on [2]. One important application where big data has had a significant impact is image retrieval, whose purpose is to retrieve or locate desirable images from a collection of digital images. Traditional image retrieval methods, such as text-based and content-based ones, have shown competitive performance. However, the increasing availability of large-scale image collections poses many challenges to these traditional retrieval methods, which often assume data of modest scale [3]. This has spurred growing interest in developing efficient techniques for retrieving specific images from large-scale collections [4].
Hash learning is a promising technique for improving retrieval efficiency in large-scale image retrieval [5]. As we know, efficiency is a critical aspect of practical big data applications, since we always expect to locate desirable objects for a given query immediately. For example, it is unacceptable for face matching to take more than five seconds in security-check scenarios at airports or elsewhere. Hash learning provides a feasible solution to efficient retrieval for large-scale image collections by transforming high-dimensional images into binary representations. With such binary representations, retrieval operations can be carried out directly with CPU instructions (e.g., XOR and bit counting), making image retrieval extremely fast [5]. Besides, data storage and communication overhead can also be significantly reduced, thereby improving the efficiency of learning systems [6]. Due to these potential advantages, hash learning has received increasing attention and made significant progress in image retrieval in recent years.
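As a minimal illustration of this efficiency argument, the Hamming distance between two codes packed into machine words reduces to a single XOR followed by a bit count; the short Python sketch below uses illustrative 8-bit codes and is only meant to demonstrate the idea.

```python
# Hamming distance between two binary codes packed into integers:
# one XOR exposes the differing bits, and a population count tallies them.
def hamming_distance(code_a: int, code_b: int) -> int:
    return bin(code_a ^ code_b).count("1")

a = 0b10110010   # illustrative 8-bit codes
b = 0b10011011
print(hamming_distance(a, b))  # -> 3 differing bits
```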
According to whether the data itself is used to construct the hash functions, traditional hashing techniques can be broadly grouped into two categories: data-oblivious hashing and data-aware hashing [7]. The former directly encodes high-dimensional data into binary representations in a random manner, whereas the latter constructs hash functions by exploiting inherent or structural properties of the data. Notable examples of traditional hashing algorithms include Locality-Sensitive Hashing (LSH) [8], ITerative Quantization (ITQ) [9] and FasteR Online Sketching Hashing (FROSH) [10]. With its ability to extract powerful representational features from data, deep learning has been extensively studied in hash learning and has emerged as one of the most prominent hashing techniques [11]. For instance, Bi-half [12] employs an autoencoder as its backbone, followed by a bi-half layer to produce higher-quality binary codes; this bi-half layer explicitly quantizes real-valued features into an optimal half-half bit distribution for each hash channel. Despite significant advancements in hash learning for image retrieval, several open and challenging issues remain unresolved, necessitating further research efforts. Traditional hashing techniques offer high efficiency and scalability but often suffer from relatively poor performance. Conversely, deep hashing, while achieving superior performance, typically requires extensive training time, limiting its applicability in large-scale scenarios.
In this paper, we introduce a novel hashing method, dubbed HIP (Hashing with Information Preservation). It automatically generates binary codes by leveraging an innovative information-preserving metric tailored for binary representations. Generally speaking, traditional hashing techniques often underperform conventional information retrieval methods due to the significant information loss incurred when transforming real-valued data into binary representations. To address this challenge, we theoretically elucidate the relationship between binary representations and their original data, demonstrating that properly scaled Hamming distances in the binary space serve as unbiased estimators of the angles between the original data in the Euclidean space. Leveraging this insight, we derive a new distance metric to preserve information during the transformation of representations. This metric minimizes information loss, ensuring that the essential properties of the original data space are retained. Furthermore, we reformulate the hash learning problem as a convex optimization task, which can be solved via singular value decomposition (SVD), thereby facilitating efficient and effective hash function learning.
In a nutshell, the main contributions of this paper are briefly highlighted as follows:
We conduct a theoretical analysis of the information-fidelity property of the encoding phase of hash learning. Specifically, the scaled Hamming distances between binary representations serve as unbiased estimations of the angles between their corresponding original data, and they become increasingly accurate as the length of the binary representations grows. To our knowledge, this theoretical property has not been previously discussed in the literature.
Leveraging this theoretical property, we introduce a novel distance metric aimed at preserving information. Furthermore, we reformulate the hashing problem as a convex optimization problem, enabling the learned binary representations to retain the information embedded within the original data as much as possible.
We propose an innovative hash learning algorithm based on the aforementioned information-preserving metric, making the generated binary codes more compact and powerful. Experimental results on publicly available datasets demonstrate the effectiveness of the proposed hashing algorithm.
Related work
The increasing demand for efficient hashing techniques across a diverse array of real-world applications is driven by their numerous potential advantages, including high efficiency, low communication costs, and reduced storage requirements [13]. In this context, we provide a brief overview of several representative hash learning methods.
Broadly speaking, hash learning techniques can be generally categorized into two groups: data-oblivious hashing and data-aware hashing [7]. Data-oblivious hashing generates hash functions randomly for projection without considering the data during the encoding stage. This approach is also referred to as data-independent hashing. The most classical and prevalent data-oblivious hashing algorithm is Locality-Sensitive Hashing (LSH) [8]. Owing to its extreme speed and robustness, LSH has been extensively extended and adapted in various ways. For example, Ertl [14] proposed a method that collectively calculates hash bits, i.e., the components of binary codes, to enhance performance and subsequently developed a class of one-pass LSH algorithms. Huang et al. [15] adopted reverse query-aware LSH functions, designed as random projections coupled with query-aware interval identification, to approximately identify furthest neighbors in external memory. Another notable variant is Shift-invariant Kernel LSH (SKLSH) [16], which employs a distribution-free encoding scheme to project data. In SKLSH, the Hamming distance between two binary codes correlates with the value of a Gaussian kernel between their corresponding data vectors.
In contrast, data-aware hashing typically constructs hash or projection functions by leveraging the inherent or structural properties of data during the encoding stage. Consequently, it exhibits relatively competitive performance compared to data-oblivious methods. Representative examples of data-aware hashing methods include ITerative Quantization (ITQ) [9], Density Sensitive Hashing (DSH) [17], PCA Hashing (PCAH) [18], Spectral Hashing (SH) [19], FasteR Online Sketching Hashing (FROSH) [10] and Spherical Hashing (SpH) [20]. As an example, FROSH independently transforms high-dimensional data into compact binary codes with an online sketching strategy, thereby significantly reducing the training time of the hashing model [10]. DistillHash [21] selectively distills data pairs whose labels are consistent with those predicted by a classifier; hash functions are then constructed within a Bayesian learning framework, leveraging the distilled data to optimize the hashing process.
In recent years, deep learning techniques have been extensively explored within the realm of hash learning, a domain commonly termed deep hashing. It is noteworthy that the aforementioned hashing algorithms predominantly rely on hand-crafted features to construct models. While these hand-crafted features are comprehensible to humans to some extent, they fail to fully capture multi-level semantic information. Consequently, the retrieval performance of traditional hashing models cannot be significantly enhanced. To address this limitation, numerous scholars have turned to deep learning techniques, e.g., CNNs, AlexNet, VGG and ResNet, to extract high-order semantic features from data; these deep features are then used to train hashing models. For instance, PLDH [22] generates binary representations of data based on pseudo-labels, which are derived from the similarity relations of data by a convolutional neural network. SSDH (Semantic Structure-based unsupervised Deep Hashing) [23] derives a pair-wise loss function from the semantic structures of data points when designing its deep hashing network, where only pairs of points with distinctly small or large distances are considered. SWH (Self-supervised Weighted Hashing) [24] initially mitigates noise through a sophisticated graph-based approach and then learns weighted binary codes by leveraging a pseudo-Siamese structure, thereby enabling the ranking of fine-grained distances. SGH (Stochastic Generative Hashing) [25] learns hash functions by using a generative model, wherein the discrepancy between the encoding and decoding processes is captured through the Minimum Description Length principle.
In addition to deep learning, various advanced learning techniques, such as matrix factorization and graph learning, have also been employed to derive compact binary codes in hash learning [26]. For instance, Liu et al. [2] extracted structural information embedded within data through sparse, parameter-free matrix factorization and subsequently generated binary codes by minimizing quantization loss. Xiao et al. [27] utilized the Cauchy loss to quantify the errors in label matrix decomposition, thereby enhancing the robustness of the resultant hashing model. Lian et al. [28] transformed the binary constraints of matrix factorization into discrete constraints during the encoding stage of binary representations and then solved the optimization problem via a block coordinate descent approach. Wang et al. [29] first represented each image as a graph node via a GCN-based auto-encoder and then derived semantic binary codes with an end-to-end hashing model. SconGAN [30] adopts a generative network with dense residual blocks and content-fidelity terms to augment data samples; a supervised contrastive learning mechanism is also introduced to strengthen the discriminative capability of the features, and the quality of the generated hash codes is further improved by the use of a similarity loss and a semantic retention loss.
Recent advancements in hash learning have focused on constructing hashing models for cross-modal data, which contain abundant and diverse information. Cross-modal hashing typically projects data from different modalities into a common space and subsequently generates binary codes based on this unified space. For example, EMCHL (Extensible Multi-similarity Cross-modal Hash Learning) [31] exploits the Hamming distance and a log-likelihood function to represent the self-similarity of unimodal data and the reciprocal-similarity of cross-modal data, respectively. To obtain semantically meaningful binary codes, a common and sparse latent semantic space of cross-modal data is derived via a low-rank constraint. OSCMFH (Online Supervised Collective Matrix Factorization Hashing) [32] extracts semantic representations by performing matrix factorization on each modality and preserves semantic similarities by constraining their correlations. Binary codes are subsequently generated by quantizing latent semantic modality-specific representations. EDCH (Efficient Discrete Cross-Modal Hashing) [33] captures intra-modal and inter-modal semantic information via label embedding and latent subspace, respectively. Additionally, a discrete optimization strategy is introduced to generate binary representations.
Although great progress in hash learning has been made, challenges and open problems remain [34]. On one hand, state-of-the-art hashing algorithms tend to be computationally intensive and sensitive to hyper-parameter settings, as a multitude of hyper-parameters must be meticulously configured. On the other hand, conventional hashing methods, while highly efficient and robust, have relatively poor retrieval performance. The underlying reason is that a great amount of information may be lost during the encoding stage. Even so, LSH remains competitive when the length of the learned binary codes exceeds 512 bits [8]. In this work, we endeavor to strike a balance between efficiency and performance by introducing a new distance metric.
Methodology
In this section, we first undertake a theoretical analysis of the relationship between the Hamming distances of binary representations and the angles between their corresponding original data in the Euclidean space. Subsequently, we introduce a novel distance metric to reduce information loss during the encoding stage of binary representations. Based on this information-preserving metric, we further develop an effective hashing algorithm.
For convenience of discussion, hereafter a bold-faced uppercase letter, e.g., $\mathbf{X}$, refers to a matrix representation of data, while a bold-faced lowercase letter, e.g., $\mathbf{x}$, denotes a column vector or random variable. An individual element of $\mathbf{X}$ or $\mathbf{x}$ is marked by its subscript. For instance, $x_i$ indicates the $i$-th element of $\mathbf{x}$. $\mathbf{X}^T$ and $\mathbf{X}^{-1}$ are the transpose and inverse of $\mathbf{X}$, respectively. The Frobenius norm (a.k.a. $\ell_2$-norm) of a matrix or vector is represented as $\|\cdot\|_F$.
Information-preserving metric
Assume $\mathbf{X}$ is a collection consisting of $n$ images, i.e., $\mathbf{X} = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n] \in \mathbb{R}^{p \times n}$. Each image $\mathbf{x}_i$ ($i = 1, \ldots, n$) is denoted as a $p$-dimensional column vector. Let $h(\cdot)$ be a hash function that projects an image in the $p$-dimensional Euclidean space into a binary value in the Hamming space. Mathematically, the hash function is defined as follows:
$$h(\mathbf{x}): \mathbb{R}^p \rightarrow \{-1, +1\}. \tag{1}$$
This definition shows that the vector $\mathbf{x}_i$ can be encoded as a binary value, i.e., $-1$ or $1$, by using the hash function $h(\cdot)$. When there are $m$ such hash functions ($h_1, h_2, \ldots, h_m$), the image $\mathbf{x}_i$ can be represented as $m$ binary values ($h_1(\mathbf{x}_i), h_2(\mathbf{x}_i), \ldots, h_m(\mathbf{x}_i)$). In other words, $\mathbf{x}_i$ can be represented as a binary vector $\mathbf{b}_i \in \{-1, +1\}^m$, termed the hash or binary code, after these $m$ hash functions are applied. Under this context, the image collection $\mathbf{X}$ can be encoded into a binary representation $\mathbf{B} \in \{-1, +1\}^{m \times n}$ by these hash functions. It is noteworthy that in the literature, binary values are sometimes represented as 0 and 1, i.e., $\{0, 1\}$. For the sake of discussion, here we use these notations interchangeably without causing confusion.
Locality-Sensitive Hashing (LSH) is one of the most straightforward hashing methods. It randomly generates a real-valued vector from a Gaussian distribution and simply takes the sign of the dot product between the real-valued vector and the image data [8]. Formally, for a given image $\mathbf{x}_i$, the binary value generated by LSH is denoted as
$$h(\mathbf{x}_i) = \operatorname{sgn}(\mathbf{w}^T \mathbf{x}_i), \tag{2}$$
where $\mathbf{w} \in \mathbb{R}^p$ is a random vector drawn from a Gaussian distribution, and $\operatorname{sgn}(\cdot)$ is a variant of the sign function applied in an element-wise manner, where an entry is 1 if the corresponding element is larger than 0; otherwise, the entry is $-1$. With $m$ random vectors available, the binary code of the image $\mathbf{x}_i$ is represented as
$$\mathbf{b}_i = \operatorname{sgn}(\mathbf{W}^T \mathbf{x}_i), \tag{3}$$
where $\mathbf{W} = [\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_m] \in \mathbb{R}^{p \times m}$ refers to the matrix comprising the $m$ random vectors.
Given two images $\mathbf{x}_i$ and $\mathbf{x}_j$ ($i \neq j$), the Euclidean distance between them is
$$d_E(\mathbf{x}_i, \mathbf{x}_j) = \|\mathbf{x}_i - \mathbf{x}_j\|_2 = \sqrt{\|\mathbf{x}_i\|_2^2 + \|\mathbf{x}_j\|_2^2 - 2\,\|\mathbf{x}_i\|_2\,\|\mathbf{x}_j\|_2 \cos\theta_{ij}}, \tag{4}$$
where $\theta_{ij}$ denotes the angle between $\mathbf{x}_i$ and $\mathbf{x}_j$. After performing the projection operation with the random matrix $\mathbf{W}$, the Hamming distance between them with respect to $\mathbf{W}$ is
$$d_H(\mathbf{x}_i, \mathbf{x}_j; \mathbf{W}) = \sum_{k=1}^{m} \mathbb{I}\big[\,b_{ik} \neq b_{jk}\,\big], \tag{5}$$
where $\mathbf{b}_i$ and $\mathbf{b}_j$ are the binary codes, generated by Eq. 3, of $\mathbf{x}_i$ and $\mathbf{x}_j$, respectively, $b_{ik}$ is the $k$-th binary value of $\mathbf{b}_i$, and $\mathbb{I}[\cdot]$ is the indicator function. According to this definition, it is evident that the maximal Hamming distance between $\mathbf{b}_i$ and $\mathbf{b}_j$ equals $m$, corresponding to the length of the binary codes.
From the definitions above, one can easily observe that significant information may be lost when images are encoded into binary representations through the hash projection operation. As a matter of fact, we observe that the following property holds.
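Before stating the formal properties, the random-projection encoding and the Hamming distance defined above can be sketched in a few lines of NumPy; the dimensions and variable names below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
p, m = 512, 64                        # feature dimension, code length
W = rng.standard_normal((p, m))       # m Gaussian projection vectors (columns)

def encode(x: np.ndarray) -> np.ndarray:
    """Binary code sgn(W^T x) in {-1, +1}^m for a single p-dimensional vector."""
    return np.where(W.T @ x > 0, 1, -1)

def hamming(b1: np.ndarray, b2: np.ndarray) -> int:
    """Number of positions where two {-1, +1} codes disagree."""
    return int(np.sum(b1 != b2))

x1, x2 = rng.standard_normal(p), rng.standard_normal(p)
print(hamming(encode(x1), encode(x2)), "of", m, "bits differ")
```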
Proposition 1
Assume that $\mathbf{x}_i, \mathbf{x}_j \in \mathbb{R}^p$ are two vectors, and $\mathbf{W} \in \mathbb{R}^{p \times m}$ is a random matrix consisting of $m$ column vectors, each of which is independent and identically distributed. For any two positive real values $\alpha, \beta > 0$, the following equation holds:
$$d_H(\alpha\mathbf{x}_i, \beta\mathbf{x}_j; \mathbf{W}) = d_H(\mathbf{x}_i, \mathbf{x}_j; \mathbf{W}). \tag{6}$$
Proof
Let $\mathbf{b}_i$ and $\mathbf{b}_j$ be the binary representations of $\mathbf{x}_i$ and $\mathbf{x}_j$ with respect to $\mathbf{W}$, respectively; that is, we have $\mathbf{b}_i = \operatorname{sgn}(\mathbf{W}^T\mathbf{x}_i)$ and $\mathbf{b}_j = \operatorname{sgn}(\mathbf{W}^T\mathbf{x}_j)$ according to Eq. 3. Since $\alpha$ is a positive real value, it does not change the sign of the projection of $\mathbf{x}_i$ in the Hamming space. Thus, we have $\operatorname{sgn}(\mathbf{W}^T(\alpha\mathbf{x}_i)) = \operatorname{sgn}(\mathbf{W}^T\mathbf{x}_i) = \mathbf{b}_i$, and likewise for $\beta\mathbf{x}_j$. This implies that both $\alpha\mathbf{x}_i$ and $\mathbf{x}_i$ have the same binary representation in the Hamming space with respect to $\mathbf{W}$. Based on this fact, we can safely conclude that
$$d_H(\alpha\mathbf{x}_i, \beta\mathbf{x}_j; \mathbf{W}) = d_H(\mathbf{x}_i, \mathbf{x}_j; \mathbf{W}). \tag{7}$$
The proposition shows that the Hamming distance between two vectors is unchanged when they are scaled by positive constants, whereas their Euclidean distance changes under the same scaling. In addition, the Hamming distance is consistent with the angle between them in the Euclidean space.
Proposition 2
Assume that $\theta_{ij}$ is the angle between any two vectors $\mathbf{x}_i$ and $\mathbf{x}_j$, and each column of $\mathbf{W}$ is independently and identically drawn from a Gaussian distribution. For the Hamming distance defined in Eq. 5, we have the following property:
$$E\big[d_H(\mathbf{x}_i, \mathbf{x}_j; \mathbf{W})\big] = \frac{m\,\theta_{ij}}{\pi}, \tag{8}$$
where $E$ is the mathematical expectation and $\theta_{ij} \in [0, \pi]$.
Proof
According to the definitions above, we have
$$E\big[d_H(\mathbf{x}_i, \mathbf{x}_j; \mathbf{W})\big] = \sum_{k=1}^{m} \Pr\big[\operatorname{sgn}(\mathbf{w}_k^T\mathbf{x}_i) \neq \operatorname{sgn}(\mathbf{w}_k^T\mathbf{x}_j)\big] = m \Pr\big[\operatorname{sgn}(\mathbf{w}^T\mathbf{x}_i) \neq \operatorname{sgn}(\mathbf{w}^T\mathbf{x}_j)\big].$$
Let $\mathbf{u} = \mathbf{Q}\mathbf{x}_i$ and $\mathbf{v} = \mathbf{Q}\mathbf{x}_j$, where $\mathbf{Q}$ is an orthogonal matrix that rotates the two vectors into a common two-dimensional coordinate plane. Since the Gaussian distribution of $\mathbf{w}$ is rotationally invariant, the equation above can be rewritten as
$$E\big[d_H(\mathbf{x}_i, \mathbf{x}_j; \mathbf{W})\big] = m \Pr\big[\operatorname{sgn}(\mathbf{w}^T\mathbf{u}) \neq \operatorname{sgn}(\mathbf{w}^T\mathbf{v})\big] = m \cdot \frac{\theta_{ij}}{\pi},$$
since the two signs differ exactly when the random hyperplane orthogonal to $\mathbf{w}$ separates $\mathbf{u}$ and $\mathbf{v}$, which happens with probability $\theta_{ij}/\pi$. Eventually, Eq. 8 holds, i.e., $E\big[d_H(\mathbf{x}_i, \mathbf{x}_j; \mathbf{W})\big] = \frac{m\,\theta_{ij}}{\pi}$.
This proposition indicates that the angle between two vectors can be preserved after they are embedded in the Hamming space. Based on Proposition 2, we can further derive the following property.
Proposition 3
Assume that $\theta_{ij}$ is the angle between $\mathbf{x}_i$ and $\mathbf{x}_j$, and each entry of $\mathbf{W}$ is independently and identically drawn from a Gaussian distribution. The binary representation of $\mathbf{x}_i$ in the Hamming space with respect to $\mathbf{W}$ is $\mathbf{b}_i = \operatorname{sgn}(\mathbf{W}^T\mathbf{x}_i)$. Let $\hat{\theta}_{ij} = \frac{\pi}{m}\, d_H(\mathbf{x}_i, \mathbf{x}_j; \mathbf{W})$. It is an unbiased estimation of $\theta_{ij}$, i.e., $E[\hat{\theta}_{ij}] = \theta_{ij}$, and for any $\epsilon > 0$, we have
$$\Pr\Big[\big|\hat{\theta}_{ij} - \theta_{ij}\big| \geq \epsilon\Big] \leq 2\exp\!\Big(-\frac{2m\epsilon^2}{\pi^2}\Big). \tag{9}$$
Proof
For the unbiased estimation, it can be directly derived according to Proposition 2. Indeed, we have the fact that
$$E\big[\hat{\theta}_{ij}\big] = \frac{\pi}{m}\, E\big[d_H(\mathbf{x}_i, \mathbf{x}_j; \mathbf{W})\big] = \frac{\pi}{m} \cdot \frac{m\,\theta_{ij}}{\pi} = \theta_{ij}.$$
Now let us turn to the concentration inequality. Specifically, one can observe that $d_H(\mathbf{x}_i, \mathbf{x}_j; \mathbf{W})$ is a sum of $m$ independent indicator variables, each of which equals 1 with probability $\theta_{ij}/\pi$ according to Proposition 2. Applying Hoeffding's inequality to this sum yields
$$\Pr\Big[\Big|\frac{1}{m}\, d_H(\mathbf{x}_i, \mathbf{x}_j; \mathbf{W}) - \frac{\theta_{ij}}{\pi}\Big| \geq \frac{\epsilon}{\pi}\Big] \leq 2\exp\!\Big(-\frac{2m\epsilon^2}{\pi^2}\Big),$$
and rescaling by $\pi$ gives exactly Eq. 9. Thus, the equation holds.
In a similar vein, we can conclude that $\lim_{m \to \infty} \Pr\big[|\hat{\theta}_{ij} - \theta_{ij}| \geq \epsilon\big] = 0$ also holds; that is, $\hat{\theta}_{ij}$ becomes an increasingly accurate estimate of $\theta_{ij}$ as the code length $m$ grows.
The propositions above signify that the Hamming distance between the binary representations of two vectors is consistent with the angle between them, while the norm information underlying their Euclidean distance is lost during the encoding stage. That is to say, the angle information between data points can still be preserved after they are encoded into binary representations, even though some other information is lost [35].
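The angle-preservation property can be checked numerically: for two fixed vectors, the scaled Hamming distance $\frac{\pi}{m} d_H$ should fluctuate around the true angle and tighten as $m$ grows. The following sketch, with arbitrarily chosen dimensions, illustrates this behaviour.

```python
import numpy as np

rng = np.random.default_rng(1)
p = 128
x, y = rng.standard_normal(p), rng.standard_normal(p)
theta = np.arccos(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

for m in (32, 256, 4096):
    W = rng.standard_normal((p, m))          # Gaussian projections
    bx = np.where(W.T @ x > 0, 1, -1)        # binary codes of x and y
    by = np.where(W.T @ y > 0, 1, -1)
    estimate = np.pi * np.sum(bx != by) / m  # (pi / m) * Hamming distance
    print(f"m={m:5d}  estimated angle={estimate:.3f}  true angle={theta:.3f}")
```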
Based on the property above, we introduce a new distance metric between two data vectors, $\mathbf{x}_i$ and $\mathbf{x}_j$, as follows to preserve information as much as possible:
$$d_{IP}(\mathbf{x}_i, \mathbf{x}_j) = \frac{\pi}{m}\, d_H(\mathbf{x}_i, \mathbf{x}_j; \mathbf{W}), \tag{10}$$
where $d_H(\mathbf{x}_i, \mathbf{x}_j; \mathbf{W})$ is the Hamming distance, defined in Eq. 5, between the binary representations of $\mathbf{x}_i$ and $\mathbf{x}_j$ after being projected with $\mathbf{W}$. It is worth mentioning that this distance metric satisfies non-negativity, identity, symmetry, and the triangular inequality.
Objective optimization
The objective of hash learning is to encode real-valued data into binary representations while minimizing information loss [2, 5]. To achieve this purpose, we exploit the distance metric defined in Eq. 10 to construct a hashing model, ensuring that the information embedded within the data is preserved to the greatest extent possible. To be specific, for any images $\mathbf{x}_i, \mathbf{x}_j \in \mathbf{X}$ and a random projection matrix $\mathbf{W}$, the objective function of our hashing method is formulated as follows:
$$\min_{\mathbf{W}} \sum_{i=1}^{n}\sum_{j=1}^{n} \Big( d_E(\mathbf{x}_i, \mathbf{x}_j) - d_{IP}(\mathbf{x}_i, \mathbf{x}_j) \Big)^2. \tag{11}$$
The objective function above shows that our hashing model aims to preserve the distribution of images after they are projected from the Euclidean space into the Hamming space. Assuming all image data have been normalized, i.e., $\|\mathbf{x}_i\|_2 = 1$, we can substitute Eq. 4 and Eq. 10 into Eq. 11 and reformulate the objective function as follows:
$$\min_{\mathbf{W}} \sum_{i=1}^{n}\sum_{j=1}^{n} \Big( \theta_{ij} - \frac{\pi}{m}\, d_H(\mathbf{x}_i, \mathbf{x}_j; \mathbf{W}) \Big)^2, \tag{12}$$
where the normalized Euclidean distance $d_E(\mathbf{x}_i, \mathbf{x}_j) = \sqrt{2 - 2\cos\theta_{ij}}$ is approximately represented as $\theta_{ij}$. To address the issue of imbalanced feature variance, we further perform an orthogonal transformation on the projection vectors to minimize the quantization errors. Let $\mathbf{R} \in \mathbb{R}^{m \times m}$ be a random orthogonal matrix. The aforementioned objective function can then be organized as the following optimization problem:
$$\min_{\mathbf{W}, \mathbf{R}, \mathbf{B}} \; -\operatorname{tr}\big(\mathbf{W}^T \mathbf{X} \mathbf{X}^T \mathbf{W}\big) + \big\|\mathbf{B} - \mathbf{R}^T \mathbf{W}^T \mathbf{X}\big\|_F^2, \quad \text{s.t.}\ \mathbf{W}^T\mathbf{W} = \mathbf{I},\ \mathbf{R}^T\mathbf{R} = \mathbf{I},\ \mathbf{B} \in \{-1, +1\}^{m \times n}. \tag{13}$$
For the first term of Eq. 13, its equivalent representation is
$$\max_{\mathbf{W}} \operatorname{tr}\big(\mathbf{W}^T \mathbf{X} \mathbf{X}^T \mathbf{W}\big), \quad \text{s.t.}\ \mathbf{W}^T\mathbf{W} = \mathbf{I}, \tag{14}$$
which can be solved directly by commonly used strategies. Indeed, the optimal solution of Eq. 14 is the matrix $\mathbf{W}$ that comprises the top $m$ eigenvectors of $\mathbf{X}\mathbf{X}^T$.
Let us turn our attention to the second term of Eq. 13. Suppose that $\mathbf{V} = \mathbf{W}^T \mathbf{X}$ and $\mathbf{B} = \operatorname{sgn}(\mathbf{R}^T \mathbf{V})$. One can observe that
$$\big\|\mathbf{B} - \mathbf{R}^T \mathbf{V}\big\|_F^2 = \|\mathbf{B}\|_F^2 + \|\mathbf{V}\|_F^2 - 2\operatorname{tr}\big(\mathbf{B}^T \mathbf{R}^T \mathbf{V}\big) = nm + \|\mathbf{V}\|_F^2 - 2\operatorname{tr}\big(\mathbf{B}^T \mathbf{R}^T \mathbf{V}\big). \tag{15}$$
Based on this observation, minimizing the second term of Eq. 13 can be represented in the following equivalent form:
$$\max_{\mathbf{B}, \mathbf{R}} \operatorname{tr}\big(\mathbf{B}^T \mathbf{R}^T \mathbf{V}\big), \quad \text{s.t.}\ \mathbf{R}^T\mathbf{R} = \mathbf{I},\ \mathbf{B} \in \{-1, +1\}^{m \times n}. \tag{16}$$
The binary representation $\mathbf{B}$ and the rotation matrix $\mathbf{R}$ can be derived through iterative operations in an alternating way. To be specific, we first obtain $\mathbf{B}$ while $\mathbf{R}$ is fixed; similarly, once $\mathbf{B}$ is available, we update $\mathbf{R}$ in turn.
Get $\mathbf{B}$ with $\mathbf{R}$ fixed. According to the definition above, the binary representation can be obtained directly via the element-wise sign function, i.e., $\mathbf{B} = \operatorname{sgn}(\mathbf{R}^T \mathbf{V})$, when the rotation matrix is fixed.
Get $\mathbf{R}$ with $\mathbf{B}$ fixed. When the binary representation matrix $\mathbf{B}$ is available, the optimization problem of Eq. 16 can be simply written in the following form:
$$\max_{\mathbf{R}} \operatorname{tr}\big(\mathbf{R}^T \mathbf{V} \mathbf{B}^T\big), \quad \text{s.t.}\ \mathbf{R}^T\mathbf{R} = \mathbf{I}. \tag{17}$$
It is noticeable that Eq. 17 is the orthogonal Procrustes problem, which can be solved by means of singular value decomposition (SVD) [2]. Thus, we first adopt SVD to decompose $\mathbf{V}\mathbf{B}^T$ into the form of $\mathbf{U}\boldsymbol{\Sigma}\hat{\mathbf{U}}^T$, and then $\mathbf{U}\hat{\mathbf{U}}^T$ is regarded as the rotation matrix, i.e., $\mathbf{R} = \mathbf{U}\hat{\mathbf{U}}^T$.
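The rotation update just described can be written compactly; the sketch below assumes the projected data $\mathbf{V}$ and the codes $\mathbf{B}$ are stored as $m \times n$ NumPy arrays, and the function name is an illustrative choice.

```python
import numpy as np

def update_rotation(V: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Solve the orthogonal Procrustes problem of Eq. 17 for fixed codes B.
    V, B: (m, n) projected data and binary codes; returns an (m, m) rotation."""
    U, _, Vh = np.linalg.svd(V @ B.T)   # V B^T = U diag(s) Vh
    return U @ Vh                       # rotation maximizing trace(R^T V B^T)
```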
Algorithm details
Based on the discussion above, we summarize the implementation details of our hashing algorithm with the information-preserving metric as Algorithm 1. The proposed hashing algorithm works in a straightforward manner and is readily comprehensible. During the training stage, model construction mainly consists of three steps: initialization (steps 1 to 3), learning (steps 4 to 9), and code generation (steps 10 to 11). Before initializing the projection matrix $\mathbf{W}$ and the rotation matrix $\mathbf{R}$, the data collection $\mathbf{X}$ must first be normalized. Subsequently, the projection matrix $\mathbf{W}$ is initially assigned the top $m$ eigenvectors of $\mathbf{X}\mathbf{X}^T$, derived from the solution of the sub-optimization problem specified in Eq. 14. Meanwhile, the rotation matrix $\mathbf{R}$ is initialized as an identity matrix. The learning stage aims to obtain the optimal value of $\mathbf{R}$ by solving the optimization problem formulated in Eq. 16. As elaborated in the subsection above, $\mathbf{B}$ and $\mathbf{R}$ are updated alternately through the iteration process, which terminates after a predefined number of iterations. Once the optimal $\mathbf{R}$ is available, the projection matrix is updated to $\mathbf{W} \leftarrow \mathbf{W}\mathbf{R}$. Finally, the binary representations of $\mathbf{X}$ are derived with respect to the updated $\mathbf{W}$ using the element-wise sign function, i.e., $\mathbf{B} = \operatorname{sgn}(\mathbf{W}^T\mathbf{X})$.
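Putting the pieces together, the training procedure described above (PCA-style initialization, alternating updates of $\mathbf{B}$ and $\mathbf{R}$, and final code generation) can be sketched in NumPy as follows; this is a minimal illustration rather than the authors' reference implementation, and all function and variable names are ours.

```python
import numpy as np

def train_hip(X: np.ndarray, m: int, n_iter: int = 30):
    """Sketch of HIP training. X: (p, n) normalized data; m: code length.
    Returns the projection matrix W (p, m) and the binary codes B (m, n)."""
    # Initialization: W holds the top-m eigenvectors of X X^T (cf. Eq. 14),
    # and the rotation R starts from the identity matrix.
    eigvals, eigvecs = np.linalg.eigh(X @ X.T)
    W = eigvecs[:, np.argsort(eigvals)[::-1][:m]]
    R = np.eye(m)

    V = W.T @ X                            # projected data, shape (m, n)
    for _ in range(n_iter):
        B = np.sign(R.T @ V)               # update B with R fixed
        B[B == 0] = 1                      # break ties away from zero
        U, _, Vh = np.linalg.svd(V @ B.T)  # Procrustes step (cf. Eq. 17)
        R = U @ Vh                         # update R with B fixed
    W = W @ R                              # fold the rotation into the projection
    return W, np.sign(W.T @ X)

# Usage with placeholder data: 512-dim features, 1,000 samples, 64-bit codes.
rng = np.random.default_rng(1)
X = rng.standard_normal((512, 1000))
X /= np.linalg.norm(X, axis=0, keepdims=True)  # normalize each image vector
W, B = train_hip(X, m=64)
print(W.shape, B.shape)                        # (512, 64) (64, 1000)
```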
Once the hashing model is constructed, we can proceed to the retrieval of images. Given a query image $\mathbf{q}$, we first normalize it and then obtain its binary representation $\mathbf{b}_q = \operatorname{sgn}(\mathbf{W}^T\mathbf{q})$ with the projection matrix $\mathbf{W}$ learned during the training stage. Thereafter, the Hamming distances between the query and the images in the collection with respect to $\mathbf{W}$ are calculated. Finally, the top-$k$ nearest neighbors of $\mathbf{q}$ are returned as the retrieval result.
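The query stage admits an equally short sketch, reusing the $\mathbf{W}$ and $\mathbf{B}$ produced by the training sketch above; the helper name and the value of $k$ are illustrative.

```python
import numpy as np

def retrieve_topk(q: np.ndarray, W: np.ndarray, B: np.ndarray, k: int = 10):
    """Indices of the k database items closest to query q in the Hamming space.
    q: (p,) raw query vector; W: (p, m) learned projection; B: (m, n) database codes."""
    q = q / np.linalg.norm(q)                    # normalize the query as in training
    bq = np.sign(W.T @ q)                        # binary representation of the query
    bq[bq == 0] = 1
    hamming = np.sum(B != bq[:, None], axis=0)   # distance to every database code
    return np.argsort(hamming)[:k]
```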
Assume the data collection contains $n$ images with $p$ dimensions. For the initialization step of model construction, the computational cost mainly lies in initializing the projection matrix $\mathbf{W}$, i.e., obtaining the eigenvectors of $\mathbf{X}\mathbf{X}^T$. Generally, the time complexity of this step is $O(np^2 + p^3)$. Within the iteration, the most expensive step is to decompose $\mathbf{V}\mathbf{B}^T$ as $\mathbf{U}\boldsymbol{\Sigma}\hat{\mathbf{U}}^T$ by using SVD, i.e., step 6, although $\mathbf{B}$ and $\mathbf{R}$ are updated alternately. As we know, the computational time of this SVD is $O(m^3)$. Hence, the time cost of the iteration is $O(c(nm^2 + m^3))$, where $c$ is the maximal number of iteration steps. Besides, updating $\mathbf{W}$ in step 10 needs $O(pm^2)$ time, while obtaining the binary representations takes $O(nmp)$. Thus, the overall time complexity of our hashing algorithm is $O(np^2 + p^3 + c(nm^2 + m^3) + nmp)$.
Algorithm 1: HIP: Hashing with information preservation
[See PDF for image]
Experimental evaluation
To verify the effectiveness of our proposed hashing method, we conducted a series of comparison experiments with several representative hashing algorithms on three commonly used image collections. This section first presents the experimental settings and then discusses the experimental results.
Experimental settings
We evaluated the retrieval performance of our method (HIP) against baseline methods using three widely adopted image datasets: CIFAR10, MNIST, and CALTECH101. These datasets are frequently utilized to benchmark hashing models in the literature. Below is a brief description of each dataset:
CIFAR-10: This dataset contains six classes of animals (frog, cat, dog, horse, deer and bird) and four classes of vehicles (plane, car, truck and ship), where each class has 6,000 images of 32×32 pixels, 60,000 images in total. In the experiments, each image was represented by a 512-dimensional GIST feature vector. Besides, we randomly took 500 images from each class as test images and 1,000 images from each class as training ones. As a result, the training collection contained 10,000 images, while the test one had 5,000 color images.
MNIST: This dataset consists of 60,000 monochrome images in total, divided into ten classes corresponding to handwritten digits from 0 to 9. Each digit class contains 6,000 monochrome images of 28×28 pixels. Before being fed into the hashing models, each image was flattened into a 784-dimensional grayscale feature vector. Similar to CIFAR-10, we randomly picked 500 images per class for testing and 1,000 images per class for training, resulting in test and training datasets of 5,000 and 10,000 images, respectively; a short sketch of this per-class sampling is given after the dataset descriptions.
CALTECH101: This dataset includes 9,000 color images tagged with 101 object categories commonly found in daily life, such as animals, food and vehicles. Each class contains a varying number of images, ranging from 40 to 800, and the images themselves are of varying sizes. For convenience, each image was represented by a 512-dimensional GIST feature vector during the experiments. Moreover, we adopted a similar strategy to generate the query and training data; that is, we randomly selected 1,000 images for testing and 5,000 images for training.
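As a concrete reference for the per-class sampling used on CIFAR-10 and MNIST, the following sketch draws a fixed number of test and training indices from every class; the label array and the helper name are illustrative assumptions, not part of the original experimental code.

```python
import numpy as np

def per_class_split(labels: np.ndarray, n_test: int, n_train: int, seed: int = 0):
    """Randomly pick n_test test and n_train training indices from every class."""
    rng = np.random.default_rng(seed)
    test_idx, train_idx = [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == c))
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:n_test + n_train])
    return np.array(test_idx), np.array(train_idx)

# CIFAR-10 / MNIST protocol: 500 test and 1,000 training images per class.
labels = np.repeat(np.arange(10), 6000)          # placeholder labels, 10 classes
test_idx, train_idx = per_class_split(labels, n_test=500, n_train=1000)
print(len(test_idx), len(train_idx))             # 5000 10000
```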
To make a comprehensive evaluation, we adopted four widely used protocols, including precision, recall, the precision-recall curve and mean average precision (mAP), to evaluate the retrieval performance of the hashing models in our experiments. Let $q$ denote a query, and let $R_q$ and $A_q$ represent the sets of relevant and retrieved images for $q$, respectively. Precision refers to the fraction of the retrieved results that are relevant to the query image, i.e.,
$$\mathrm{precision}(q) = \frac{|R_q \cap A_q|}{|A_q|}. \tag{18}$$
In other words, it is the ratio of the number of correctly retrieved images to the number of all retrieved images. In contrast, recall is defined as the fraction of the relevant images that are successfully retrieved by the hashing models. Formally, the recall metric is
$$\mathrm{recall}(q) = \frac{|R_q \cap A_q|}{|R_q|}. \tag{19}$$
From the definitions above, one can observe that the precision score is 1 if every retrieved image is relevant to the query, whereas the recall score is 1 if all relevant images of the query are retrieved. This also implies that a good retrieval model should not only retrieve accurate results but also recover the majority of relevant images. To characterize this trade-off exactly, the precision-recall curve was also adopted in the experiments. The mean average precision denotes the mean of the average precision scores over all queries and is formally represented as
$$\mathrm{mAP} = \frac{1}{|Q|} \sum_{q \in Q} \mathrm{AP}(q), \tag{20}$$
where $Q$ is the set of queries and $\mathrm{AP}(q)$ denotes the average precision of query $q$.
Following the routine experimental protocols in the literature, we randomly selected 1,000 images from the test data as queries for each image collection. To mitigate bias, the experimental procedures were repeated ten times, and the average retrieval performance over these repetitions is reported in this paper. The proposed hashing algorithm was implemented in Matlab. For all experiments, the parameters of the baseline algorithms were configured to the values recommended in the HABIR toolkit 1. For HIP, the maximal number of iterations was empirically set to 30. All experiments were conducted on a platform equipped with an Intel Core i7 CPU and 12GB of main memory.
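For completeness, the evaluation metrics defined above can be computed as in the following sketch; the function names and the toy query data are illustrative and do not reproduce the actual evaluation code used in the experiments.

```python
import numpy as np

def precision_recall(retrieved: np.ndarray, relevant: set):
    """Precision and recall of a ranked list `retrieved` against a relevant set."""
    hits = sum(1 for r in retrieved if r in relevant)
    return hits / len(retrieved), hits / len(relevant)

def average_precision(retrieved: np.ndarray, relevant: set) -> float:
    """Average precision of a single query (precision accumulated at each hit)."""
    hits, ap = 0, 0.0
    for rank, r in enumerate(retrieved, start=1):
        if r in relevant:
            hits += 1
            ap += hits / rank
    return ap / max(len(relevant), 1)

# mAP is the mean of the per-query average precisions over the query set Q.
retrieved_lists = [np.array([3, 7, 1]), np.array([5, 2, 8])]
relevant_sets = [{3, 1, 4}, {9, 2}]
mAP = np.mean([average_precision(r, s) for r, s in zip(retrieved_lists, relevant_sets)])
print(round(mAP, 3))  # -> 0.403
```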
Results and discussion
As we know, precision and recall are commonly used metrics for assessing the performance of hashing models. Without loss of generality, we followed this tradition by using these metrics to evaluate the effectiveness of our proposed HIP method. Specifically, we trained the hashing models on the training image collections, and subsequently selected 1,000 images at random to serve as queries for estimating the precision and recall scores of the trained models.
[See PDF for image]
Fig. 1
The precision comparison of the hashing models with 64 bits of binary representation
Figure 1 describes the comparison of precision scores for the various hashing models on the three image collections when the length of binary representations is set to 64 bits. From the comparison results in Fig. 1, we can conclude that the proposed hashing model achieved notably higher precision scores than the baseline methods. For instance, HIP exhibited the highest precision among all compared hashing algorithms. In particular, HIP demonstrated significant advantages over the baselines on the CIFAR10 and CALTECH101 image datasets, with the largest difference between HIP and SpH on CIFAR10 amounting to 12.2%. These facts show that the proposed hashing algorithm can effectively preserve the information embedded in the original data. Similar results were observed when the length of binary representations was varied to other values (e.g., 8, 16, 32, 128). Due to space limitations, these additional results are not presented individually.
[See PDF for image]
Fig. 2
The recall comparison of the hashing models with 64 bits of binary representation
Figure 2 presents the comparison of recall scores for the hashing models on the image collections with a binary representation length of 64 bits. According to the experimental results in Fig. 2, one can draw a similar conclusion: the proposed hashing algorithm still exhibited promising performance when compared to the baseline algorithms. For example, the advantages of HIP were pronounced, particularly on the CIFAR10 and CALTECH101 datasets. Indeed, the largest difference in recall scores between HIP and SpH on the CIFAR10 collection amounted to 13.5%.
It is worth noting that an effective retrieval model should not only accurately retrieve desired results (high precision) but also capture a comprehensive set of positive results (high recall). Due to this fact, we combined precision and recall together, and adopted the precision-recall curve as a metric to evaluate the retrieval performance of the hashing models. Fig. 3 illustrates the precision-recall curves of the hashing models on the image collections when the length of binary representations is 64. Observing the curves in Fig. 3, it is evident that for each curve, the precision score decreased smoothly with the increase of the recall score, and vice versa. It is reasonable because the precision score is sensitive to true positive results, whereas the recall score is sensitive to false negative results. Moreover, the precision-recall curves of HIP were higher than those of the baseline algorithms on all image collections. Another observation is that among the baseline algorithms, ITQ generally outperformed the others, while PCAH performed the worst. These facts manifest that the proposed hashing method has superior retrieval capability compared to the others.
[See PDF for image]
Fig. 3
The precision-recall curves of the hashing models with 64 bits of binary representation
In addition to the precision-recall curve, we also employed the mean average precision (mAP), a widely used evaluation metric in the field of information retrieval, to assess and compare the retrieval performance of the hashing models. As the name implies, mAP is the mean of the average precision scores across a set of queries. During our experiments, we ran the hashing algorithms on the image collections to generate binary representations with different lengths (i.e., 32, 64, 128 and 256) and then calculated the corresponding mAP scores. Table 1 presents the mAP comparison of the hashing models with different lengths of binary representations. As shown in Table 1, the proposed hashing algorithm significantly outperformed the baseline methods, as evident from the notably higher mAP scores achieved by HIP in most cases. When the length of binary representations was short (e.g., 32), the performance of HIP was slightly worse than that of Bi-half and DistillHash. However, as the length of binary representations increased, the performance of HIP became significantly better than the others. This demonstrates that the introduced information-preserving metric can effectively preserve the information embedded within the original data during the encoding stage, especially when the length of binary representations is long enough. Another interesting observation is that SKLSH exhibited higher mAP scores when the length of binary representations was large (e.g., 256 bits), whereas PCAH achieved better performance for shorter binary representations.
Table 1. The comparison of mAP scores of the hashing models with different bits of binary representation
Method | CIFAR10 (32) | CIFAR10 (64) | CIFAR10 (128) | CIFAR10 (256) | MNIST (32) | MNIST (64) | MNIST (128) | MNIST (256) | CALTECH101 (32) | CALTECH101 (64) | CALTECH101 (128) | CALTECH101 (256) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
LSH | 0.151 | 0.222 | 0.301 | 0.359 | 0.282 | 0.452 | 0.558 | 0.461 | 0.288 | 0.402 | 0.463 | 0.416 |
PCAH | 0.116 | 0.103 | 0.086 | 0.348 | 0.336 | 0.302 | 0.231 | 0.631 | 0.135 | 0.116 | 0.096 | 0.511 |
ITQ | 0.296 | 0.337 | 0.359 | 0.079 | 0.581 | 0.659 | 0.691 | 0.162 | 0.444 | 0.481 | 0.512 | 0.079 |
SKLSH | 0.111 | 0.168 | 0.361 | 0.377 | 0.152 | 0.291 | 0.406 | 0.721 | 0.217 | 0.341 | 0.497 | 0.527 |
DSH | 0.241 | 0.314 | 0.324 | 0.436 | 0.387 | 0.524 | 0.597 | 0.671 | 0.281 | 0.362 | 0.377 | 0.637 |
SH | 0.175 | 0.198 | 0.225 | 0.377 | 0.403 | 0.436 | 0.503 | 0.661 | 0.268 | 0.294 | 0.361 | 0.431 |
SpH | 0.245 | 0.294 | 0.333 | 0.241 | 0.315 | 0.368 | 0.425 | 0.506 | 0.302 | 0.349 | 0.383 | 0.359 |
Bi–half | 0.576 | 0.595 | 0.631 | 0.657 | 0.732 | 0.758 | 0.783 | 0.802 | 0.538 | 0.581 | 0.642 | 0.687 |
DistillHash | 0.571 | 0.583 | 0.597 | 0.618 | 0.715 | 0.729 | 0.734 | 0.758 | 0.546 | 0.578 | 0.634 | 0.672 |
HIP | 0.491 | 0.637 | 0.758 | 0.843 | 0.711 | 0.821 | 0.884 | 0.923 | 0.657 | 0.751 | 0.819 | 0.863 |
To demonstrate the retrieval effectiveness of our hashing model, we visualized the retrieval results returned by the hashing algorithms. Fig. 4 exemplifies a query for the digit '8', where the top 30 images retrieved from the MNIST dataset by each hashing model using 64 hash bits are shown. The choice of the digit '8' was deliberate, as it is prone to confusion with '0', '3', '5', '6', and '9'. Upon examining the visualization results, it is evident that our proposed hashing model, HIP, accurately retrieved images of the digit '8' within the top 30 results. In contrast, the retrieval results returned by the baseline algorithms partially contained incorrect images. From this perspective, the proposed hashing algorithm exhibits competitive performance and surpasses the representative hashing methods.
[See PDF for image]
Fig. 4
The visualized effect of top 30 images retrieved by the hashing models for the query of digit ‘8’ in MNIST
Conclusions
In this paper, we propose a novel hashing model based on an information-preserving metric for image retrieval. To motivate the introduction of this metric, we undertake a theoretical analysis of the relationship between the Hamming distances of binary representations and the angles between their corresponding original data in the Euclidean space. Our theoretical findings reveal that suitably scaled Hamming distances serve as unbiased estimators of these angles and concentrate around them as the length of the binary representations grows. Based on this observation, we incorporate the information-preserving metric into our hashing model, thereby ensuring that the information embedded within the original data is preserved to the greatest extent possible during the encoding stage of hash learning. To demonstrate the superiority of our hashing model, we conducted comprehensive experiments on three publicly available image collections. The experimental results show that the proposed hashing algorithm exhibits promising performance compared to representative and classic hashing algorithms.
In future work, we plan to extend our information-preserving metric to deep hashing frameworks, leveraging the rich semantic features extracted by deep learning models. Additionally, exploring how to adapt our hashing model to handle multi-view data presents an interesting direction for further research.
Acknowledgements
The authors would like to thank the anonymous referees and the editors for their valuable comments and suggestions, which have greatly improved the paper.
Author contributions
H.L. and Z.W. designed and implemented the proposed algorithms; Z.Z. was actively involved in the algorithm design, debugging and experimental work; Z.L. and Z.W. provided valuable high-level guidance during algorithm design and implementation; H.L. wrote the paper; X.W. reviewed and edited the paper. All authors read and approved the final manuscript.
Funding
This work was partially funded by the Natural Science Foundation (NSF) of China (Grant No. 61976195) and the Natural Science Foundation of Zhejiang Province (Grant No. LZ23F020003, LR23F020001).
Data availability
No datasets were generated or analysed during the current study.
Declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
The authors have read and approved the final manuscript.
Competing interests
The authors declare no competing interests.
https://github.com/willard-yuan/hashing-baseline-for-image-retrieval.
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
References
1. Abdalla, HB. A brief survey on big data: technologies, terminologies and data-intensive applications. J Big Data; 2022; 9,
2. Liu, H; Li, X; Zhang, S; Tian, Q. Adaptive hashing with sparse matrix factorization. IEEE Trans Neural Netw Learn Syst; 2020; 31,
3. Yang, F; Zhang, Q; Ding, X; Ma, F; Cao, J; Tong, D. Semantic preserving asymmetric discrete hashing for cross-modal retrieval. Appl Intell; 2023; 53,
4. Geroski, T; Jakovljevic, DG; Filipovic, N. Big data in multiscale modelling: from medical image processing to personalized models. J Big Data; 2023; 10,
5. Liu, H; Zhou, W; Zhang, H; Li, G; Zhang, S; Li, X. Bit reduction for locality-sensitive hashing. IEEE Trans Neural Netw Learn Syst; 2024; 35,
6. Hancock, JT; Wang, H; Khoshgoftaar, TM; Liang, Q. Data reduction techniques for highly imbalanced medicare big data. J Big Data; 2024; 11,
7. Liu, H; Zhou, W; Wu, Z; Zhang, S; Li, G; Li, X. Refining codes for locality sensitive hashing. IEEE Trans Knowl Data Eng; 2024; 36,
8. Cai, D. A revisit of hashing algorithms for approximate nearest neighbor search. IEEE Trans Knowl Data Eng; 2021; 33,
9. Gong, Y; Lazebnik, S; Gordo, A; Perronnin, F. Iterative quantization: a procrustean approach to learning binary codes for large-scale image retrieval. IEEE Trans Pattern Anal Mach Intell; 2013; 35,
10. Chen, X; Yang, H; Zhao, S; King, I; Lyu, MR. Making online sketching hashing even faster. IEEE Trans Knowl Data Eng; 2021; 33,
11. Bolhasani, H; Jassbi, SJ; Sharifi, A. DLA-E: a deep learning accelerator for endoscopic images classification. J Big Data; 2023; 10,
12. Li, Y; van Gemert, J. Deep unsupervised image hashing by maximizing bit entropy. Proceedings of the AAAI Conference on Artificial Intelligence; 2021; [DOI: https://dx.doi.org/10.1609/aaai.v35i3.16296]
13. Darban, ZZ; Webb, GI; Pan, S; Aggarwal, CC; Salehi, M. Deep learning for time series anomaly detection: a survey. ACM Computing Survey; 2025; 57,
14. Ertl, O. Probminhash - a class of locality-sensitive hash algorithms for the (probability) jaccard similarity. IEEE Trans Knowl Data Eng; 2022; 34,
15. Huang, Q; Feng, J; Fang, Q; Ng, W. Two efficient hashing schemes for high-dimensional furthest neighbor search. IEEE Trans Knowl Data Eng; 2017; 29,
16. Raginsky M, Lazebnik S. Locality-sensitive binary codes from shift-invariant kernels. In: Proc. of the 23rd Annual Conference on Neural Information Processing Systems (NeurIPS09). 2009. pp. 1509–17. https://proceedings.neurips.cc/paper/2009/hash/a5e00132373a7031000fd987a3c9f87b-Abstract.html.
17. Jin, Z; Li, C; Lin, Y; Cai, D. Density sensitive hashing. IEEE Trans Cybern; 2014; 44,
18. Jégou, H; Perronnin, F; Douze, M; Sánchez, J; Pérez, P; Schmid, C. Aggregating local image descriptors into compact codes. IEEE Trans Pattern Anal Mach Intell; 2012; 34,
19. Weiss Y, Torralba A, Fergus R. Spectral hashing. In: Proc. of the 21nd Annual Conference on Neural Information Processing Systems (NeurIPS08). 2008. pp. 1753–60. https://proceedings.neurips.cc/paper_files/paper/2008/hash/d58072be2820e8682c0a27c0518e805e-Abstract.html.
20. Heo J-P, Lee Y, He J, Chang S-F, Yoon S-E. Spherical hashing. In: Proc. of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR12). 2012. pp. 2957–64. https://doi.org/10.1109/CVPR.2012.6248024.
21. Yang E, Liu T, Deng C, Liu W, Tao D. Distillhash: Unsupervised deep hashing by distilling data pairs. In: Proc. of the 2019 IEEE/CVF Conf. Computer Vision and Pattern Recognition (CVPR). 2019. pp. 2941–50. https://doi.org/10.1109/CVPR.2019.00306.
22. Liu, H; Yin, M; Wu, Z; Zhao, L; Li, Q; Zhu, X et al. PLDH: pseudo-labels based deep hashing. Mathematics; 2023; 11,
23. Yang E, Deng C, Liu T, Liu W, Tao D. Semantic structure-based unsupervised deep hashing. In: Prof. of the 27th Int. Joint Conf. Artificial Intelligence (IJCAI). 2018. pp. 1064–70. https://doi.org/10.24963/ijcai.2018/148.
24. Li P, Li P, Yin W. Self-supervised weighted hashing with topology mining. In: Proc. of the 7th Int. Conf. Control Engineering and Artificial Intelligence (CCEAI23). 2023. 143–8. https://doi.org/10.1145/3580219.3580245.
25. Dai B, Guo R, Kumar S, He N, Song L. Stochastic generative hashing. In: Proc. of the 34th Int. Conf. Machine Learning (ICML17). 2017. pp. 913–22. https://doi.org/10.48550/arXiv.1701.02815.
26. Li P, Li Y, Xie H, Zhang L. Neighborhood-adaptive structure augmented metric learning. In: Proc. of the AAAI Conf. Artificial Intelligence. 2022. pp. 1367–75. https://doi.org/10.1609/aaai.v36i2.20025.
27. Xiao, Y; Zhang, W; Dai, X; Dai, X; Zhang, N. Robust supervised discrete hashing. Neurocomputing; 2022; 483, pp. 398-410. [DOI: https://dx.doi.org/10.1016/j.neucom.2021.09.077]
28. Lian, D; Xie, X; Chen, E. Discrete matrix factorization and extension for fast item recommendation. IEEE Trans Knowl Data Eng; 2021; 33,
29. Wang, Y; Song, J; Zhou, K; Liu, Y. Unsupervised deep hashing with node representation for image retrieval. Pattern Recogn; 2021; 112, [DOI: https://dx.doi.org/10.1016/j.patcog.2020.107785] 107785.
30. Lei, L; Guo, D; Shen, Z; Wu, Z. Image hashing retrieval based on generative adversarial networks. Appl Intell; 2023; 53,
31. Tan, J; Yang, Z; Ye, J; Chen, R; Cheng, Y; Qin, J et al. Cross-modal hash retrieval based on semantic multiple similarity learning and interactive projection matrix learning. Inf Sci; 2023; 648, [DOI: https://dx.doi.org/10.1016/j.ins.2023.119571] 119571.
32. Shu, Z; Li, L; Yu, J; Zhang, D; Yu, Z; Wu, X. Online supervised collective matrix factorization hashing for cross-modal retrieval. Appl Intell; 2023; 53,
33. Yang, F; Zhang, Q; Ma, F; Ding, X; Liu, Y; Tong, D. Efficient discrete cross-modal hashing with semantic correlations and similarity preserving. Inf Sci; 2023; 643, [DOI: https://dx.doi.org/10.1016/j.ins.2023.119222] 119222.
34. Singh, A; Gupta, S. Learning to hash: a comprehensive survey of deep learning-based hashing methods. Knowl Inf Syst; 2022; 64,
35. Yi X, Caramanis C, Price E. Binary embedding: fundamental limits and fast algorithm. In: Proc. the 32nd Int. Conf. Machine Learning (ICML15). 2015. pp. 2162–70. https://proceedings.mlr.press/v37/yi15.html.
© The Author(s) 2025. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the "License").