1. Introduction
The advent of Internet age brings the explosive growth of image resources. Although managing and retrieving images by semantic tags is a common and effective way, there are still a large number of untagged or not fully tagged images. However, it is not easy to carry out manual annotation regarding the cost of human resources and the semantic nuances of annotation under the background of various cultures, religions, and languages. Moreover, the cognition bias caused by subjectivity could induce semantic discrepancies as well. Thus, how to design an efficient automatic image annotation algorithm to provide accurate labels for untagged images has been an urgent problem.
Automatic image annotation (AIA) refers to the process that computers automatically provide one or more semantic tags that can reflect the content of a specific image through algorithms. It is a mapping from images to semantic concepts, namely, the process of understanding images. Image annotation is based on image feature representations, and features utilized in different tasks have different representation abilities [1–3]. For example, global color and texture features have been successfully used in retrieving similar images [4], while local structure features perform well in tasks of object classification and matching [5, 6]. In general, features that depict images from different views can provide complementary information. Thus a rational fusion of multiview features contributes to more comprehensive depiction for images, which can be beneficial to image searching, classification, or other related tasks.
Many multiview learning algorithms have been proposed for operating some tasks such as classification, retrieval, and clustering based on multiview features. According to the levels of feature fusion, multiview learning methods can be grouped into two categories [7]: feature-level fusion such as MKL [8], SVM-2K [9], and CCA [10] and classifier-level fusion such as hierarchical SVM [11]. Some experimental studies show that classifier-level fusion outperforms simple feature concatenation, whereas sophisticated feature-level fusion usually performs better than classifier-level fusion [11, 12].
Recently, many image annotation algorithms use a variety of underlying features to improve annotation performance [8–10]. On one hand, the multiview features improve the accuracy, but on the other hand the strategies decrease the efficiency and applicability of the algorithms because of the increase of feature dimensions. Moreover, many existing multiview learning algorithms are unsupervised; that is, they do not make use of the label information in the training set. Such fused features may not effectively contain the semantic relationship between samples. This paper proposes a semisupervised learning framework based on graph embedding and multiview NMF (GENMF). In GENMF, feature fusion and dimension reduction are firstly performed by the proposed graph embedded multiview NMF algorithm, and then the new obtained features are used to annotate images through KNN-based approach.
2. Related Works
Existing image annotation algorithms can be roughly divided into two categories [13]: model-based learning methods and database-based retrieval methods. Model-based methods explore the relationship between high-level semantic concepts and low-level visual features to discover a mapping function through machine learning or knowledge models for image annotation. Unlike model-based methods, database-based methods do not need to set up the mapping function based on the training set but directly provide a sequence of candidate labels according to the already annotated images in the database.
There are three kinds of model-based learning methods for image annotation: classification based methods, possibility based methods, and topic model-based methods. Classification based methods [14–16] treat tags as specific class labels and explore the mapping relations between low-level visual features and labels through machine learning methods. The essence of this kind of methods is transforming image annotation to image classification. Different classifiers are used to establish mapping functions between low-level features (from images or regions) and semantic concepts. Labels with the high confidence from the classifiers are annotated to images. Different from classification based methods, possibility based methods [17, 18] do not use classifiers to build the mapping functions but explore the relationship between the underlying features of the image and the semantic labels based on unsupervised probability and statistics models. They utilize the relations to calculate the joint probability of images and labels or the conditional probability of labels given an image and then estimate the possible labels through statistical inference. Topic model-based methods [19, 20] use latent topics to associate low-level visual features with high-level semantic concepts to implement image annotation.
The model-based methods have three difficulties in practical applications. First, the learning models trained on the datasets with finite image types and semantic labels can hardly reflect the characteristics of feature distributions in the real world, which leads to unsatisfactory annotation performance when facing new features and semantic labels. Second, the limited size of training sets may result in overfitting and low generalization ability of the models. Third, low-level features may often fail to express high-level semantic information because they belong to different feature spaces. Thus, it is also hard to establish a mapping model between image features and semantic concepts because of the semantic gap.
The essence of retrieval based method is directly providing a list of candidate labels for the images to be tagged based on the existing datasets with complete and valid label information. Most common retrieval methods are based on KNN [21–23]: they retrieve k images with the highest similarity to the input image from the database, and the labels of the k images are sorted based on the statistical relationship or weighted statistical relationship to generate the candidate labels of the input images. The other category is graph-based methods [24–27] that utilize image feature distance to establish relevant graphs of samples. Based on the assumption that neighboring images in the relevant graph have similar labels (label smoothness), the similarity between nodes and the global structural characteristics of the relevant graph are used to propagate and enrich the node information including labels and classes. This kind of semisupervised learning methods is suitable for not fully tagged datasets existing on the Internet.
Traditional graph-based methods usually label images by aggregating multiple features into one feature and building a relation graph based on this feature. In [25], it is pointed out that traditional methods cannot effectively capture the unique information for each feature and proposes to utilize different features to establish relation subgraphs and then link these subgraphs to form a supergraph. Based on the supergraph, label propagation is achieved through the graph-based method. In [26], different feature graphs are built based on different features of the images and then the relationship between images is constructed through the graph-based method based on different feature graphs. Furthermore, the relationship between images and different features can be also constructed. Finally, the two relationships, namely, the relation between images and the relation between images and different features, can be fused by a designed objective function to obtain good candidates for the labels.
In [27], a graph learning KNN (GLKNN) is proposed by combining KNN-based method and graph-based method. GLKNN first uses graph-based method to propagate the labels of the K nearest neighbors to the new image and obtain one sequence of candidate labels, then GLKNN employs the naive-Bayes nearest neighbor algorithm to establish the relationship between labels and image features for obtaining another sequence of candidate labels. Finally, the two candidate label sequences are linearly combined as the final predicted labels. In [28], graph embedding discriminant analysis is applied to classify marine fish marine fish species by constructing intraclass similarity graph and interclass penalty graph. Although the algorithm improves the performance of classification and clustering by utilizing class labels to build graph embedded term, the traditional graph embedding algorithm is not suitable for multilabel problems with multilabel images because there is no intraclass and interclass relationship. In [21, 22], different models based on metric methods are proposed to enhance the representation ability of features and further improve the performance of image annotation. However, the metric based feature processing only linearly embeds the original features and does not reduce the feature dimension. In [13], multiple features are fused by concatenation, which ignores the manifold characters of different features and high feature dimension results in low efficiency of the algorithm.
For reducing the dimensions of each feature for annotation, an extended local sensitive discriminant analysis algorithm is proposed by constructing relevant and irrelevant graphs in [29]. Generally, feature dimension reduction methods based on NMF decomposition are for single-view features. References [30, 31] extend this method to multiview features by simply concatenating multiple vectors into one feature vector before further dimension reduction. However, this concatenation way can cause vector dimension disaster. Besides, multiview features are descriptions from different views for images so that simple connection does not make good sense. Then a multiview NMF model based on shared coefficient matrix is developed for capturing the latent feature patterns in multiview features [32], where different view features have their own basis matrices and share a coefficient matrix. The proposed model is used for solving classification and clustering problems and is not suitable for multilabel problems with multilabel images.
Based on the above reviews, this paper proposes a semisupervised learning model based on multiview NMF and graph embedding. A novel multiview NMF algorithm based on graph embedding is developed to fuse the multiview features and reduce the dimension of the fused features by designing appropriate graph embedded regularization terms. Then, the image annotation is performed by using the new features through a KNN-based algorithm.
3. The Proposed Methods
In this section, we elaborate the proposed semisupervised framework for automatic image annotation. First, the graph embedding terms for multilabel problems are constructed through semantic similarity matrix. Second, an objective function is established by adding graph embedded semantic constraints. Third, the update rules for optimizing are derived in detail. Finally, the overall framework of the algorithm is presented.
3.1. Graph Embedding for Multilabel Problem
The traditional graph embedding model is introduced for classification problems, in which each sample has only one label, so that the Laplacian matrices L and
Let
Having the relevant and irrelevant matrices, the following two constraint items
3.2. An Automatic Image Annotation Model Based on Multiview Feature NMF and Graph Embedding
Let
Furthermore, graph embedding regularization terms (7) and (8) are combined with the above loss function, then
3.3. Update Rules Derivation
The established model is semisupervised, and only part of the data has label information. The objective function can be rewritten in the form of block matrix. The following subsection will give the derivation of update rules.
The update rule of formula (10) is derived as follows:
3.4. Framework of the GENMF
The schematic diagram of the proposed GENMF model can be illustrated as in Figure 1. First, multiview features are extracted from images as the input matrix X in (10). Equations (1)-(8) are utilized to build graph embedding regularization terms as the input matrices L and
Algorithm 1 gives the pseudocode of the GENMF.
Algorithm 1: Multiview NMF with graph embedding for image annotation.
Input: Image set
Output: Predicted label matrix
(
(
(
(
(
(
(
(
4. Experimental Studies
4.1. Dataset and Experiment Design
The main purpose of the proposed algorithm is to improve the performance of automatic image annotation by fusing the multiview features and reducing the feature dimension, which makes it better to represent semantic concepts under semantic constraints in new low-dimensional feature spaces. So this paper selects the dataset Corel5k with 15 different features, and Corel5k consists of 4500 images for training and 499 images for test, which is available on http://lear.inrialpes.fr. The 15 features are all low-level image features including Gist, DenseSift, DenseSiftV3H1, HarrisSift, HarrisSiftV3H1, DenseHue, DenseHueV3H1, HarrisHue, HarrisHueV3H1, Rgb, RgbV3H1, Lab, LabV3H1, Hsv, and HsvV3H1. In the experiment, we select a local feature DenseSiftV3H1, a global feature Gist, and a color feature Hsv.
In the experiments, the multiple features except Gist are regularized through L2-normalization, and the normalized features are input into the GENMF to obtain low-dimensional representations. Then the low-dimensional feature vectors are input into the 2PKNN annotation algorithm to obtain the predicted labels for the test set. The performance of the algorithm is evaluated in terms of four metrics Pre, Rec, F1, and N+. Table 1 lists the parameters used in the experiments.
Table 1
Parameters required in the algorithm and their ranges of values.
Notation | Description | Range of values |
---|---|---|
| Weight for graph embedding terms | |
| Dimension of the new features | |
| Label-relevant coefficient | |
| Label-irrelevant coefficient | |
4.2. Experimental Results
4.2.1. Convergence Curve of Loss Function
Figure 3 shows the convergence curves of loss function with different parameters. It can be observed that, after about 300 iterations, the trend of the loss curve tends to be stable.
[figures omitted; refer to PDF]
4.2.2. The Influence of Different
The relation matrix
4.2.3. The Influence of Different
Figure 5 shows the varying curve of Pre, Rec, F1, and N+ in the case of K = 300 with different
4.2.4. The Influence of Different Feature Dimensions
Figure 6 shows the annotation performance curves when α is taken as 0, 1000, and 2000, respectively, and the value of K changes from 100 to 800 with an increase of 100 each time. The three curves with different values of parameter α show the consistent trend of change. In Figure 6-1, the accuracy increases with the increase of dimension because more information can be retained, and the curve becomes stable until α reaches 2000. The worst performance is at α = 0. Figure 6-2 shows that the recall rate decreases slightly with the increase of dimension because the requirement for retrieval is higher with the increase of dimension. In Figure 6-3, F1 is reflecting the comprehensive effect of the accuracy and recall rate. It can be observed that the F1 increases in the interval
4.2.5. Comparison with Existing Annotation Algorithms
Table 2 presents the comparison results with existing annotation algorithms. RMLF [36] optimizes the final prediction tag score by fusing prediction tag scores of 15 different features. LDMKL [14] and SDMKL [14] use the different classifiers based on the nonlinear kernel of three-layer network to annotate images. 2PKNN [22] uses two steps for annotation: after dealing with data imbalance, images are annotated through a KNN-based method in data-balanced dataset. LJNMF [31], merging features [31], and Scoefficients [31] consider different kinds of NMF modeling, extract new features, and annotate images through a KNN-based method. TagProp (ML) [21] and TagProp (σML) [21] acquire discriminative feature fusion on the training set by designing a metric learning model and annotate images using weighted KNN method. JEC [37] is a KNN-based algorithm based on the average distance of multiple features, which is a benchmark algorithm for image annotation. MRFA [38] proposes a new semantic context modeling and learning method based on multimarkov random fields. SML [39] is a discriminative model that treats each label as one class in multiclass classification problems; GS [38] introduces the regularization-based feature selection algorithm to exploit the sparsity and clustering properties of features.
Table 2
Comparison results with other annotation algorithms.
Methods | Pre | Rec | F1 | N+ |
---|---|---|---|---|
SML | 23 | 29 | 25.7 | 137 |
JEC | 27 | 32 | 29.3 | 139 |
GS | 30 | 33 | 31.4 | 146 |
MRFA | 31 | 36 | 33.3 | 172 |
TagProp(ML) | 31 | 37 | 33.7 | 146 |
TagProp( | 33 | 42 | 37.0 | 160 |
RMLF | 29.7 | 32.6 | 31.1 | - |
Merging features | 33 | 40 | 36.5 | - |
Scoefficients | 30 | 39 | 34.6 | - |
LJNMF(3f’) | 35 | 43 | 39.1 | - |
2PKNN(3f) | 32 | 28 | 30.6 | 177 |
SDMKL | 38 | 25 | 30 | 158 |
LDMKL | 44 | 29 | 34.9 | 179 |
GENMF (3f) | 38 | 39 | 39.2 | 168 |
In Table 2, the note (3f) denotes using the three features selected in this paper, and the note (3f’) indicates using three features that are not the same as in this paper. The results of other algorithms are directly taken from respective literatures and all the 15 features are utilized. Our algorithm uses only three features, and it can be seen in Table 2 that the proposed GENMF achieves the competitive performance.
4.2.6. The Best, Average, and Standard Deviation of the Results
Table 3 shows the best, average, and standard deviation of the results using 10 independent runs. The NMF-based algorithms have a certain randomness, and different initial values may produce different results. Table 3 shows that the influence of different initialization values is limited, but better performance could be expected if a better initialization strategy is chosen. Besides, the average time consumption of the proposed GENMF with the new low-dimensional features is 13.945 seconds to label all 499 test images, whereas utilizing the original features to label takes 34.652 seconds, which is about 2.5 times that of GENMF.
Table 3
The maximum, mean, and standard deviation of results using 10 independent runs.
metrics | Precision | Recall | F1 | N+ |
---|---|---|---|---|
mean | 0.38 | 0.39 | 0.392 | 168 |
SD | 0.017 | 0.010 | 0.012 | 4.50 |
maximum | 0.41 | 0.40 | 0.398 | 175 |
5. Conclusions
In this paper, we propose a semisupervised framework based on graph embedding and multiview nonnegative matrix factorization for automatic image annotation with multilabel images. The main purpose of the proposed algorithm is to improve the performance of automatic image annotation by fusing multiview features and reducing feature dimension, which makes it better to represent semantic concepts under semantic constraints in new low-dimensional feature spaces. For feature fusion and dimension deduction, a novel graph embedding term is constructed based on the relevant graph and the irrelevant graph. Then, the fusion of multiview features and the reduction of dimensionality are realized based on multiview NMF model. Moreover, the updated rules of the model are derived. Finally, images are annotated by using a KNN-based approach. Experimental results validate that the proposed algorithm can achieve competitive performance in terms of accuracy and efficiency.
Conflicts of Interest
The authors declare that there are no conflicts of interest regarding the publication of this paper.
Acknowledgments
The authors are grateful to the support of the National Natural Science Foundation of China (61572104, 61103146, 61425002, and 61751203), the Fundamental Research Funds for the Central Universities (DUT17JC04), and the Project of the Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University (93K172017K03).
[1] L. Mai, H. Jin, Z. Lin, C. Fang, J. Brandt, F. Liu, "Spatial-Semantic Image Search by Visual Feature Synthesis," Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1121-1130, DOI: 10.1109/CVPR.2017.125, .
[2] H. Guan, W. A. Smith, "BRISKS: Binary Features for Spherical Images on a Geodesic Grid," Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4516-4524, DOI: 10.1109/CVPR.2017.519, .
[3] Y. Zhang, W. Lin, Q. Li, W. Cheng, X. Zhang, "Multiple-level feature-based measure for retargeted image quality," IEEE Transactions on Image Processing, vol. 27 no. 1, pp. 451-463, DOI: 10.1109/TIP.2017.2761556, 2018.
[4] F. Zhang, B. W. Wah, "Fundamental principles on learning new features for effective dense matching," IEEE Transactions on Image Processing, vol. 27 no. 2, pp. 822-836, DOI: 10.1109/TIP.2017.2752370, 2018.
[5] H. Zhang, V. M. Patel, R. Chellappa, "Hierarchical Multimodal Metric Learning for Multimodal Classification," Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3057-3065, DOI: 10.1109/CVPR.2017.312, .
[6] P. Li, Q. Wang, H. Zeng, L. Zhang, "Local Log-Euclidean Multivariate Gaussian Descriptor and Its Application to Image Classification," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39 no. 4, pp. 803-817, DOI: 10.1109/TPAMI.2016.2560816, 2017.
[7] Y. Luo, T. Liu, D. Tao, C. Xu, "Multiview matrix completion for multilabel image classification," IEEE Transactions on Image Processing, vol. 24 no. 8, pp. 2355-2368, DOI: 10.1109/TIP.2015.2421309, 2015.
[8] G. R. Lanckriet, N. Cristianini, P. Bartlett, L. El Ghaoui, M. I. Jordan, "Learning the kernel matrix with semidefinite programming," Journal of Machine Learning Research, vol. 5, pp. 323-330, 2004.
[9] J. D. R. Farquhar, D. R. Hardoon, H. Meng, J. Shawe-Taylor, S. Szedmak, "Two view learning: SVM-2K, theory and practice," Proceedings of the 2005 Annual Conference on Neural Information Processing Systems, NIPS 2005, pp. 355-362, .
[10] D. R. Hardoon, S. Szedmak, J. Shawe-Taylor, "Canonical correlation analysis: an overview with application to learning methods," Neural Computation, vol. 16 no. 12, pp. 2639-2664, DOI: 10.1162/0899766042321814, 2004.
[11] J. Kludas, E. Bruno, S. Marchand-Maillet, "Information Fusion in Multimedia Information Retrieval," Adaptive Multimedia Retrieval: Retrieval, User, and Semantics, vol. 4918, pp. 147-159, DOI: 10.1007/978-3-540-79860-6_12, 2008.
[12] C. G. M. Snoek, M. Worring, A. W. M. Smeulders, "Early versus late fusion in semantic video analysis," pp. 399-402, DOI: 10.1145/1101149.1101236, .
[13] Y. Gu, X. Qian, Q. Li, M. Wang, R. Hong, Q. Tian, "Image Annotation by Latent Community Detection and Multikernel Learning," IEEE Transactions on Image Processing, vol. 24 no. 11, pp. 3450-3463, DOI: 10.1109/TIP.2015.2443501, 2015.
[14] M. Jiu, H. Sahbi, "Nonlinear Deep Kernel Learning for Image Annotation," Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 1551-1555, .
[15] L. Sun, H. Ge, S. Yoshida, Y. Liang, G. Tan, "Support vector description of clusters for content-based image annotation," Pattern Recognition, vol. 47 no. 3, pp. 1361-1374, DOI: 10.1016/j.patcog.2013.10.015, 2014.
[16] M.-L. Zhang, L. Wu, "LIFT: Multi-label learning with label-specific features," Proceedings of the 22nd International Joint Conference on Artificial Intelligence, IJCAI 2011, pp. 1609-1614, .
[17] M. Zand, S. Doraisamy, A. Abdul Halin, M. R. Mustaffa, "Visual and semantic context modeling for scene-centric image annotation," Multimedia Tools and Applications, vol. 76 no. 6, pp. 8547-8571, DOI: 10.1007/s11042-016-3500-5, 2017.
[18] D. Tian, Z. Shi, "Automatic image annotation based on Gaussian mixture model considering cross-modal correlations," Journal of Visual Communication and Image Representation, vol. 44, pp. 50-60, DOI: 10.1016/j.jvcir.2017.01.015, 2017.
[19] J. Tian, Y. Huang, Z. Guo, X. Qi, Z. Chen, T. Huang, "A multi-modal topic model for image annotation using text analysis," IEEE Signal Processing Letters, vol. 22 no. 7, pp. 886-890, DOI: 10.1109/LSP.2014.2375341, 2015.
[20] K. Pliakos, C. Kotropoulos, "PLSA driven image annotation, classification, and tourism recommendation," Proceedings of the IEEE International Conference on Image Processing, pp. 3003-3007, .
[21] M. Guillaumin, T. Mensink, J. Verbeek, C. Schmid, "TagProp: discriminative metric learning in nearest neighbor models for image auto-annotation," pp. 309-316, DOI: 10.1109/iccv.2009.5459266, .
[22] Y. Verma, V. Jawahar C, "Image Annotation Using Metric Learning in Semantic Neighbourhoods," Proceedings of the European Conference on Computer Vision, pp. 836-849, .
[23] M. M. Kalayeh, H. Idrees, M. Shah, "NMF-KNN: image annotation using weighted multi-view non-negative matrix factorization," pp. 184-191, DOI: 10.1109/cvpr.2014.31, .
[24] Z. Chen, M. Chen, K. Q. Weinberger, "Marginalized denoising for link prediction and multi-label learning," Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence, pp. 1707-1713, .
[25] S. Hamid Amiri, M. Jamzad, "Efficient multi-modal fusion on supergraph for scalable image annotation," Pattern Recognition, vol. 48 no. 7, pp. 2241-2253, DOI: 10.1016/j.patcog.2015.01.015, 2015.
[26] Z. He, C. Chen, J. Bu, P. Li, D. Cai, "Multi-view based multi-label propagation for image annotation," Neurocomputing, vol. 168 no. C, pp. 853-860, DOI: 10.1016/j.neucom.2015.05.039, 2015.
[27] F. Su, L. Xue, "Graph learning on K nearest neighbours for automatic image annotation," Proceedings of the 5th ACM International Conference on Multimedia Retrieval, ICMR 2015, pp. 403-410, .
[28] S. Hasija, M. J. Buragohain, S. Indu, "Fish species classification using graph embedding discriminant analysis," Proceedings of the 2017 International Conference on Machine Vision and Information Technology, CMVIT 2017, pp. 81-86, .
[29] X. Liu, R. Liu, F. Li, Q. Cao, "Graph-based dimensionality reduction for KNN-based image annotation," Proceedings of the 21st International Conference on Pattern Recognition, ICPR 2012, pp. 1253-1256, .
[30] J. BenAbdallah, J. C. Caicedo, F. A. Gonzalez, O. Nasraoui, "Multimodal image annotation using non-negative matrix factorization," Proceedings of the 2010 IEEE/WIC/ACM International Conference on Web Intelligence, WI 2010, pp. 128-135, .
[31] R. Rad, M. Jamzad, "Automatic image annotation by a loosely joint non-negative matrix factorisation," IET Computer Vision, vol. 9 no. 6, pp. 806-813, DOI: 10.1049/iet-cvi.2014.0413, 2015.
[32] Z. Guan, L. Zhang, J. Peng, J. Fan, "Multi-View Concept Learning for Data Representation," IEEE Transactions on Knowledge and Data Engineering, vol. 27 no. 11, pp. 3016-3028, DOI: 10.1109/TKDE.2015.2448542, 2015.
[33] H. Wang, C. Ding, H. Huang, "Multi-label Linear Discriminant Analysis," Proceedings of the European Conference on Computer Vision, pp. 126-139, .
[34] N. Guan, X. Huang, L. Lan, Z. Luo, X. Zhang, "Graph based semi-supervised non-negative matrix factorization for document clustering," Proceedings of the 11th IEEE International Conference on Machine Learning and Applications, ICMLA 2012, pp. 404-408, .
[35] C.-J. Lin, "On the convergence of multiplicative update algorithms for non-negative matrix factorization," IEEE Transactions on Neural Networks and Learning Systems, vol. 18 no. 6, pp. 1589-1596, DOI: 10.1109/tnn.2007.895831, 2007.
[36] Y. Yao, X. Xin, P. Guo, "A rank minimization-based late fusion method for multi-label image annotation," Proceedings of the 23rd International Conference on Pattern Recognition, ICPR 2016, pp. 847-852, .
[37] A. Makadia, V. Pavlovic, S. Kumar, "A New Baseline for Image Annotation," Proceedings of the European Conference on Computer Vision, pp. 316-329, .
[38] Y. Xiang, X. Zhou, T.-S. Chua, C.-W. Ngo, "A revisit of generative model for automatic image annotation using markov random fields," Proceedings of the 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, CVPR Workshops 2009, pp. 1153-1160, .
[39] G. Carneiro, A. B. Chan, P. J. Moreno, N. Vasconcelos, "Supervised learning of semantic classes for image annotation and retrieval," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29 no. 3, pp. 394-410, DOI: 10.1109/TPAMI.2007.61, 2007.
You have requested "on-the-fly" machine translation of selected content from our databases. This functionality is provided solely for your convenience and is in no way intended to replace human translation. Show full disclaimer
Neither ProQuest nor its licensors make any representations or warranties with respect to the translations. The translations are automatically generated "AS IS" and "AS AVAILABLE" and are not retained in our systems. PROQUEST AND ITS LICENSORS SPECIFICALLY DISCLAIM ANY AND ALL EXPRESS OR IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES FOR AVAILABILITY, ACCURACY, TIMELINESS, COMPLETENESS, NON-INFRINGMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Your use of the translations is subject to all use restrictions contained in your Electronic Products License Agreement and by using the translation functionality you agree to forgo any and all claims against ProQuest or its licensors for your use of the translation functionality and any output derived there from. Hide full disclaimer
Copyright © 2018 Hongwei Ge et al. This is an open access article distributed under the Creative Commons Attribution License (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. https://creativecommons.org/licenses/by/4.0/
Abstract
Automatic image annotation is for more accurate image retrieval and classification by assigning labels to images. This paper proposes a semisupervised framework based on graph embedding and multiview nonnegative matrix factorization (GENMF) for automatic image annotation with multilabel images. First, we construct a graph embedding term in the multiview NMF based on the association diagrams between labels for semantic constraints. Then, the multiview features are fused and dimensions are reduced based on multiview NMF algorithm. Finally, image annotation is achieved by using the new features through a KNN-based approach. Experiments validate that the proposed algorithm has achieved competitive performance in terms of accuracy and efficiency.
You have requested "on-the-fly" machine translation of selected content from our databases. This functionality is provided solely for your convenience and is in no way intended to replace human translation. Show full disclaimer
Neither ProQuest nor its licensors make any representations or warranties with respect to the translations. The translations are automatically generated "AS IS" and "AS AVAILABLE" and are not retained in our systems. PROQUEST AND ITS LICENSORS SPECIFICALLY DISCLAIM ANY AND ALL EXPRESS OR IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES FOR AVAILABILITY, ACCURACY, TIMELINESS, COMPLETENESS, NON-INFRINGMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Your use of the translations is subject to all use restrictions contained in your Electronic Products License Agreement and by using the translation functionality you agree to forgo any and all claims against ProQuest or its licensors for your use of the translation functionality and any output derived there from. Hide full disclaimer