GAP: Geometric Aggregation of Popularity Metrics

1. Introduction

Popularity is an abstract notion that expresses how much attention a certain item, person, or concept has recently received. Today, estimating an entity’s popularity is desirable in many areas such as music [1], social media [2], science [3], cinema [4], and the Internet [5]. The temporal patterns of popularity gain vary from entity to entity and can exhibit either viral or steady behavior [6]. Additionally, when multiple performance metrics are available for each entity, an optimal approach for aggregating the metrics or the resulting rankings is of interest [7,8]. Many such methods have been proposed in the multi-criteria decision analysis (MCDA) research literature [9,10].

This study focuses on the estimation of music artist popularity. For music-related products, popularity has traditionally been measured through sales and music top charts. Currently, there is an abundance of online sources that we can draw data from, including streams, downloads, and queries related to music tracks, albums, artists, and musical genres. Considering these modern sources as popularity metrics is reasonable for two reasons. First, music consumer interest is now directed to online music sources rather than to traditional record stores and the purchase of physical albums. Second, not all countries release charts, and those that exist may not be easy to obtain, so comparing popularity across countries is difficult with the traditional methods. Therefore, we consider web-based artist popularity metrics, such as YouTube views, Spotify popularity, and Facebook mentions, for aggregation.

Determining the popularity of a music track, artist, or genre has attracted increased research interest during the last few years. Many ways to define music popularity have been proposed that make use of information available online: posts on microblog websites [11,12,13,14] and in the blogosphere [15], search queries and the number of shared files in peer-to-peer networks [13,16], play counts in social media music sites such as Last.fm [12,17], the amount of radio airplay and the music industry awards received [18], and popularity indices provided by streaming platforms such as Spotify [17]. The traditional ways of determining music popularity, such as the Billboard Magazine chart, are also used for comparison with the modern web-based popularity indices [16,19]. In [18], the authors claimed that three factors in synergy, namely the music acoustic content, the artist’s reputation, and the number of comments regarding the track, are able to classify a music track as popular or not with high accuracy. Furthermore, the level of public recognition of a music track has been investigated, providing a different perspective on the evaluation of music entities [20].

Although many studies have been conducted on the estimation of artist popularity, the determination of an evaluation method for such popularity scores remains a challenge as no general agreement regarding an acceptable ground truth has been established. This leads researchers to evaluation through comparison with several other existing popularity metrics such as Spotify popularity, page counts, and the charts. In Table 1, we present the evaluation methods (and ground truth) followed by research papers for their proposed popularity scores.

Moreover, to the best of our knowledge, all popularity scores that have been proposed in the research literature to date are univariate, while the method that we propose herein is the first to combine several diverse sources and metrics of popularity in order to summarize the whole picture of an entity’s popularity. Although the most natural choice for metric aggregation is a simple average, how to handle many different sources is not obvious, and it is useful to evaluate and compare non-linear methods as well. Furthermore, being popular with regard to one or some of the monitored metrics is sufficient to characterize an entity as popular; hence, robustness in such cases is desirable in a metric aggregation method. Our method leverages the area of geometrical shapes formed by the metrics’ values in a non-linear manner; thus, we name it Geometric Aggregation of Popularity metrics (GAP0 and GAP1 are two variations of the same concept). Finally, we conduct a comparative study including the average normalized metric value and two other non-linear metric aggregation methods.

The rest of the paper is structured as follows. In Section 2, the proposed methodology is illustrated. In Section 3, the experimental setup is elaborated and results are presented, and in Section 4, conclusions are given.

2. Geometric Aggregation of Popularity Metrics

2.1. Definition

Here, we propose an aggregation method that leverages multi-source web-based information in order to assess the level of an entity’s current popularity. In order to determine the popularity of entity e at time t, we first normalize the respective metric values $v_{e,t,i}$ for $i = 1, \dots, n$ (where n is the number of monitored metrics for the entity under study) to [0, 1] using a power transformation as in Equation (1):

$$ m_{e,i,t} = \frac{v_{e,t,i}^{P}}{T} \tag{1} $$

where $T$ is the chosen maximum power-transformed value (discussed below), $v_{e,t,i}$ is the initial value of metric i at time t for entity e, $P = \log(T)/\log(V_{t,i}) = \log_{V_{t,i}}(T)$ with $V_{t,i}$ the maximum of $v_{e,t,i}$ over all entities e, and $m_{e,i,t}$ is the normalized metric value. The choice of the exponent $P$ derives from the observation that $V_{t,i}^{P} = T$ and $0^{P} = 0$, which result in $m_{e,i,t} \in [0, 1]$, given that $v_{e,t,i} \in [0, V_{t,i}]$. We did not opt for a simple “divide by maximum” or “min-max” normalization because there are metrics with huge variation such as YouTube views that in some cases reach billions, and thus, artists with millions of views would seem non-important. Furthermore, we did not opt for a log transform because there are metrics with a small range such as Spotify popularity, with values from zero to 100. In this case, after the transformation, all normalized values would be between zero and ∼4.61, and a significant Spotify popularity increase, e.g., from 50 to 80 (being 3.91 and 4.38 after the natural log transformation), would not affect the aggregated popularity correspondingly. The power transform alleviates both issues with a relatively high T = 100, which could be optimized if one considers an appropriate ground truth.
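The normalization of Equation (1) can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the function name `power_normalize`, the guard against a maximum of at most 1, and the view counts are all assumptions made for the example.

```python
import math


def power_normalize(values, T=100.0):
    """Map raw metric values to [0, 1] via m = v**P / T, with P = log(T) / log(V)."""
    V = max(values)  # V_{t,i}: maximum raw value over all entities
    if V <= 1:
        # Assumption of this sketch: log(V) must be positive for P to be usable.
        raise ValueError("sketch assumes the maximum raw value exceeds 1")
    P = math.log(T) / math.log(V)  # exponent chosen so that V**P == T
    return [v ** P / T for v in values]


# Hypothetical 30-day view counts for four artists (illustrative numbers only).
views = [0, 1_000, 1_000_000, 1_000_000_000]
normalized = power_normalize(views)
divided_by_max = [v / max(views) for v in views]

print([round(m, 4) for m in normalized])
print(divided_by_max)
```

With these made-up numbers, an artist with one million views is mapped to roughly 0.22 by the power transform, whereas dividing by the maximum would map the same artist to 0.001.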

After the normalization, we considered the unit circle and n equidistant points $k_i$ on it. On each radius from $k_i$ to the center, we selected the point $l_i$ with distance $m_{e,i,t}$ from $k_i$. Geometric Aggregation of Popularity metrics (GAP0) is then defined as:

$$ \mathrm{GAP} = \frac{E_{out} - E_{in}}{E_{out}} \cdot 100 \tag{2} $$

where $E_{out}$ is the area of the outer regular n-sided polygon determined by $k_i$ and $E_{in}$ is the area of the inner polygon determined by $l_i$. If an artist performs best on all metrics, the inner polygon would coincide with the circle’s center, and the geometric aggregation of popularity metrics would be 100; while if an artist performs worst on all metrics, the inner polygon would coincide with the outer regular polygon, and the geometric aggregation of popularity metrics would be zero. All other cases result in intermediate values. Of course, different orders of the metrics result in different popularity scores; thus, for consistency, we first sorted the metric values and then applied the computations on the sorted sequence of metrics.

Figure 1a illustrates the computation of Geometric Aggregation of Popularity metrics (GAP0) for the artist “The Rasmus” on 2 April 2019, resulting in GAP0 = 62.0. A second approach (GAP1) was to represent the metrics by the sides of the polygon and not by the vertices. Thus, the inner polygon in this case was the aggregate of n isosceles triangles whose two equal sides have length $1 - m_{e,i,t}$, as depicted in Figure 1b. The popularity was then calculated as in the first approach by applying Equation (2), resulting in GAP1 = 60.4. For comparison, the simple average of the normalized metrics multiplied by 100 was 38.9.
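The geometric construction can be reproduced directly from polygon areas. The sketch below is an illustration under stated assumptions rather than the authors' code: the function names are hypothetical, the inner polygon area for GAP0 is computed with the shoelace formula, and at least three metrics are assumed so that the polygons are non-degenerate.

```python
import math


def polygon_area(points):
    """Shoelace formula for a polygon given as (x, y) vertices in order."""
    twice_area = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def gap0_geometric(m):
    """GAP0 via Equation (2); m holds normalized metric values in [0, 1], len(m) >= 3."""
    m = sorted(m)  # the paper sorts the metrics for consistency
    n = len(m)
    # k_i: n equidistant points on the unit circle (outer polygon vertices).
    outer = [(math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n))
             for i in range(n)]
    # l_i: point at distance m_i from k_i towards the center, i.e. radius 1 - m_i.
    inner = [((1 - mi) * x, (1 - mi) * y) for mi, (x, y) in zip(m, outer)]
    e_out, e_in = polygon_area(outer), polygon_area(inner)
    return (e_out - e_in) / e_out * 100


def gap1_geometric(m):
    """GAP1 via Equation (2): one isosceles triangle with legs 1 - m_i per metric."""
    n = len(m)
    theta = 2 * math.pi / n
    e_out = n * 0.5 * math.sin(theta)
    e_in = sum(0.5 * (1 - mi) ** 2 * math.sin(theta) for mi in m)
    return (e_out - e_in) / e_out * 100


print(gap0_geometric([0.5, 0.5, 0.5, 0.5]))
print(gap1_geometric([0.5, 0.5, 0.5, 0.5]))
```

When every metric equals 0.5, the inner polygon is the outer one scaled by one half, so its area is one quarter of the outer area and both variants give 75 (up to floating-point error).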

2.2. Additional Analytical Results on Geometric Aggregation of Popularity Metrics

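Although this is not obvious from the geometric definition, GAP0(m) and GAP1(m) have simple closed forms given the vector of normalized metric values $m = \{m_{e,i,t} \mid i = 1, \dots, n\}$ for entity e at time t (expressed here on the [0, 1] scale, i.e., without the factor of 100 in Equation (2)):

$$ \mathrm{GAP0}(m) = \frac{1}{n} \sum_{i=1}^{n} \left[ m_{e,i,t} + m_{e,i+1,t} \cdot (1 - m_{e,i,t}) \right] $$

$$ \mathrm{GAP1}(m) = \frac{1}{n} \sum_{i=1}^{n} \left( 2 \cdot m_{e,i,t} - m_{e,i,t}^{2} \right) $$

where $m_{e,n+1,t} = m_{e,1,t}$.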
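The closed forms translate into a few lines of Python. The sketch below is illustrative only; the function names `gap0`, `gap1`, and `aap` are assumptions, and the input is the simulated metric vector reported in Table 6.

```python
def gap0(m):
    """Closed-form GAP0 on the [0, 1] scale; metrics are sorted first, as in the paper."""
    m = sorted(m)
    n = len(m)
    # m[(i + 1) % n] implements the wrap-around m_{e,n+1,t} = m_{e,1,t}.
    return sum(m[i] + m[(i + 1) % n] * (1 - m[i]) for i in range(n)) / n


def gap1(m):
    """Closed-form GAP1 on the [0, 1] scale (independent of the metric order)."""
    return sum(2 * x - x * x for x in m) / len(m)


def aap(m):
    """Simple average of the normalized metric values."""
    return sum(m) / len(m)


# Simulated metric values from Table 6.
metrics = [0.17, 0.12, 0.15, 0.87]
print(round(100 * gap0(metrics), 3))
print(round(100 * gap1(metrics), 3))
print(round(100 * aap(metrics), 3))
```

Multiplied by 100, the three values agree with the GAP0, GAP1, and AAP entries of Table 6 to within rounding.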

Proof.

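For GAP0(m), the inner polygon’s area is the sum of the areas of n triangles:

$$ E_{in} = \sum_{i=1}^{n} \frac{1}{2} (1 - m_{e,i,t}) \cdot (1 - m_{e,i+1,t}) \cdot \sin\theta $$

where $\theta = 2\pi/n$.

The outer polygon’s area is the sum of the areas of n equal triangles: $E_{out} = n \cdot \left( \frac{1}{2} \cdot 1 \cdot 1 \cdot \sin\theta \right) = \frac{n \cdot \sin\theta}{2}$. Hence, according to Equation (2):

$$
\begin{aligned}
\mathrm{GAP0}(m) &= \frac{\frac{n \cdot \sin\theta}{2} - \sum_{i=1}^{n} \frac{1}{2} (1 - m_{e,i,t}) \cdot (1 - m_{e,i+1,t}) \cdot \sin\theta}{\frac{n \cdot \sin\theta}{2}} \\
&= 1 - \frac{1}{n} \sum_{i=1}^{n} (1 - m_{e,i,t}) \cdot (1 - m_{e,i+1,t}) \\
&= \frac{1}{n} \sum_{i=1}^{n} \left[ 1 - (1 - m_{e,i,t}) \cdot (1 - m_{e,i+1,t}) \right] \\
&= \frac{1}{n} \sum_{i=1}^{n} \left[ m_{e,i,t} + m_{e,i+1,t} \cdot (1 - m_{e,i,t}) \right]
\end{aligned}
$$

For GAP1(m), the inner polygon’s area is the sum of the areas of n isosceles triangles:

$$ E_{in} = \sum_{i=1}^{n} \frac{1}{2} (1 - m_{e,i,t})^{2} \cdot \sin\theta $$

The outer polygon’s area is the same as before:

$$ E_{out} = \frac{n \cdot \sin\theta}{2} $$

Hence,

$$
\begin{aligned}
\mathrm{GAP1}(m) &= \frac{\frac{n \cdot \sin\theta}{2} - \sum_{i=1}^{n} \frac{1}{2} (1 - m_{e,i,t})^{2} \cdot \sin\theta}{\frac{n \cdot \sin\theta}{2}} \\
&= 1 - \frac{1}{n} \sum_{i=1}^{n} (1 - m_{e,i,t})^{2} \\
&= \frac{1}{n} \sum_{i=1}^{n} \left( 2 \cdot m_{e,i,t} - m_{e,i,t}^{2} \right)
\end{aligned}
$$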

□

Furthermore, considering the most natural choice for popularity aggregation, i.e., the average normalized metric values (Average Artist Popularity (AAP)):

$$ \mathrm{AAP}(m) = \frac{1}{n} \sum_{i=1}^{n} m_{e,i,t} $$

it is remarkable that:

$$ \mathrm{AAP}(m) \le \mathrm{GAP1}(m) \le \mathrm{GAP0}(m) $$

for all sorted m, with $m_{e,i,t} \in [0, 1]$.

Proof.

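The first part of the inequality is straightforward:

$$ 0 \le m_{e,i,t} \le 1 \Rightarrow m_{e,i,t}^{2} \le m_{e,i,t} \Rightarrow 0 \le m_{e,i,t} - m_{e,i,t}^{2} \Rightarrow m_{e,i,t} \le 2 \cdot m_{e,i,t} - m_{e,i,t}^{2} $$

$$ \Rightarrow \frac{1}{n} \sum_{i=1}^{n} m_{e,i,t} \le \frac{1}{n} \sum_{i=1}^{n} \left( 2 \cdot m_{e,i,t} - m_{e,i,t}^{2} \right) \Rightarrow \mathrm{AAP}(m) \le \mathrm{GAP1}(m) $$

For the second part of the inequality, we begin with the assumption that m is sorted:

$$ m_{e,i,t} \le m_{e,i+1,t} \quad \forall i = 1, \dots, n-1 $$

The difference $D_i$ between the methods GAP1 and GAP0 per metric i is:

$$
\begin{aligned}
D_i &= 2 \cdot m_{e,i,t} - m_{e,i,t}^{2} - \left[ m_{e,i,t} + m_{e,i+1,t} \cdot (1 - m_{e,i,t}) \right] \\
&= m_{e,i,t} - m_{e,i,t}^{2} - m_{e,i+1,t} + m_{e,i+1,t} \cdot m_{e,i,t} \\
&= m_{e,i,t} (1 - m_{e,i,t}) - m_{e,i+1,t} (1 - m_{e,i,t}) \\
&= (1 - m_{e,i,t}) (m_{e,i,t} - m_{e,i+1,t}) \le 0, \quad \forall i = 1, \dots, n-1
\end{aligned}
$$

and the corresponding difference for $i = n$ is $D_n = (1 - m_{e,n,t}) (m_{e,n,t} - m_{e,1,t}) \ge 0$. The total difference between the two models then is:

$$
\begin{aligned}
\mathrm{GAP1}(m) - \mathrm{GAP0}(m) &= \frac{1}{n} \sum_{i=1}^{n} \left( 2 \cdot m_{e,i,t} - m_{e,i,t}^{2} \right) - \frac{1}{n} \sum_{i=1}^{n} \left[ m_{e,i,t} + m_{e,i+1,t} \cdot (1 - m_{e,i,t}) \right] \\
&= \frac{1}{n} \sum_{i=1}^{n} \left\{ 2 \cdot m_{e,i,t} - m_{e,i,t}^{2} - \left[ m_{e,i,t} + m_{e,i+1,t} \cdot (1 - m_{e,i,t}) \right] \right\} = \frac{1}{n} \sum_{i=1}^{n} D_i \\
&= \frac{1}{n} \left[ (1 - m_{e,n,t}) (m_{e,n,t} - m_{e,1,t}) + \sum_{i=1}^{n-1} (1 - m_{e,i,t}) (m_{e,i,t} - m_{e,i+1,t}) \right] \\
&= \frac{1}{n} \left[ m_{e,n,t} - m_{e,1,t} - m_{e,n,t}^{2} + m_{e,1,t} \cdot m_{e,n,t} + \sum_{i=1}^{n-1} \left( m_{e,i,t} - m_{e,i+1,t} - m_{e,i,t}^{2} + m_{e,i+1,t} \cdot m_{e,i,t} \right) \right] \\
&= \frac{1}{n} \left[ m_{e,n,t} - m_{e,1,t} - m_{e,n,t}^{2} + m_{e,1,t} \cdot m_{e,n,t} + \sum_{i=1}^{n-1} m_{e,i,t} - \sum_{i=1}^{n-1} m_{e,i+1,t} - \sum_{i=1}^{n-1} m_{e,i,t}^{2} + \sum_{i=1}^{n-1} m_{e,i+1,t} \cdot m_{e,i,t} \right] \\
&= \frac{1}{n} \left[ m_{e,n,t} - m_{e,1,t} - m_{e,n,t}^{2} + m_{e,1,t} \cdot m_{e,n,t} + m_{e,1,t} - m_{e,n,t} - \sum_{i=1}^{n-1} m_{e,i,t}^{2} + \sum_{i=1}^{n-1} m_{e,i+1,t} \cdot m_{e,i,t} \right] \\
&= \frac{1}{n} \left( m^{T} m_r - m^{T} m \right)
\end{aligned}
$$

where $m_r = [m_{e,2,t}, m_{e,3,t}, \dots, m_{e,n,t}, m_{e,1,t}]$ is m rolled by −1. According to the Cauchy–Schwarz inequality, and because $m_r$ is a permutation of m (so that $m_r^{T} m_r = m^{T} m$):

$$ |\langle m, m_r \rangle|^{2} \le \langle m, m \rangle \langle m_r, m_r \rangle \Rightarrow (m^{T} m_r)^{2} \le (m^{T} m) \cdot (m_r^{T} m_r) = (m^{T} m)^{2} $$

$$ \overset{m_{e,i,t} \ge 0}{\Longrightarrow} m^{T} m_r - m^{T} m \le 0 \Rightarrow \frac{1}{n} \left( m^{T} m_r - m^{T} m \right) \le 0 \Rightarrow \mathrm{GAP1}(m) - \mathrm{GAP0}(m) \le 0 \Rightarrow \mathrm{GAP1}(m) \le \mathrm{GAP0}(m) $$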

□
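The ordering AAP(m) ≤ GAP1(m) ≤ GAP0(m) can also be checked numerically. The following sketch is only a sanity check on randomly drawn sorted vectors, not a substitute for the proof; the function names, the number of trials, and the small floating-point tolerance are assumptions of the example.

```python
import random


def aap(m):
    return sum(m) / len(m)


def gap1(m):
    return sum(2 * x - x * x for x in m) / len(m)


def gap0(m):
    n = len(m)
    return sum(m[i] + m[(i + 1) % n] * (1 - m[i]) for i in range(n)) / n


def ordering_holds(trials=10_000, seed=0, tol=1e-12):
    """Check AAP <= GAP1 <= GAP0 on random sorted vectors with entries in [0, 1]."""
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(3, 12)
        m = sorted(rng.random() for _ in range(n))
        if not (aap(m) <= gap1(m) + tol and gap1(m) <= gap0(m) + tol):
            return False
    return True


print(ordering_holds())
```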

3. Experimental Setup

3.1. Data Set

For this study, our starting point was the list of N = 2349 artists provided by a collaborating record label, Playground Music. Most of the artists were Swedish, yet artists of several other nationalities were also included. For each of these artists, we monitored online popularity metrics from social media and streaming platforms on a daily basis.

In Table 2, we present the sources and metrics that we used as input to the popularity metric aggregation methods. For each artist, we monitored some or all of these 12 metrics since May 2018, and thus, we could compute the corresponding artist popularity timelines. For Last.fm artist play counts and YouTube channel views, we used as input only the number of plays/views during the last 30 days because the total number may be misleading, in terms of current popularity estimation.

3.2. Competitive Aggregation Methods

For evaluation and comparison purposes, we employed two non-linear aggregation methods from multi-criteria decision analysis, together with the simple average method (AAP).

The first non-linear aggregation method was the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) [10], which takes into account the Euclidean distance of the vector containing an entity’s metric values from the best and the worst possible alternative. The second was the Preference Ranking Organization method for enrichment evaluation (PRO) [9], which takes into account the number of metrics for which an entity outperforms another entity and finally combines all differences in order to compute each entity’s score.
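To make the TOPSIS idea concrete, the following sketch implements a textbook variant of the method. It is an assumption-laden illustration, not the configuration used in the study: the paper does not state the weights or the normalization, so equal weights, Euclidean column normalization, and "higher is better" for every metric are choices made here, and the function name `topsis` is hypothetical.

```python
import math


def topsis(matrix, weights=None):
    """Closeness of each row (entity) to the ideal alternative, in [0, 1].

    matrix: list of rows, one per entity, with one benefit-type metric per column.
    weights: optional per-column weights (assumption: equal weights by default).
    """
    n_cols = len(matrix[0])
    weights = weights or [1.0 / n_cols] * n_cols
    # Euclidean norm of each column; fall back to 1.0 for an all-zero column.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) or 1.0
             for j in range(n_cols)]
    scaled = [[weights[j] * row[j] / norms[j] for j in range(n_cols)]
              for row in matrix]
    best = [max(row[j] for row in scaled) for j in range(n_cols)]
    worst = [min(row[j] for row in scaled) for j in range(n_cols)]
    scores = []
    for row in scaled:
        d_best = math.dist(row, best)    # Euclidean distance to the best alternative
        d_worst = math.dist(row, worst)  # Euclidean distance to the worst alternative
        total = d_best + d_worst
        scores.append(d_worst / total if total > 0 else 0.0)
    return scores


# Three hypothetical entities described by two normalized metrics.
print(topsis([[1.0, 1.0], [0.5, 0.5], [0.0, 0.0]]))
```

In this toy input, the first entity coincides with the best alternative and the last with the worst, so their scores are 1 and 0, and the middle entity lies halfway between them.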

3.3. Evaluation

We evaluated all the aggregation methods by comparing the produced artist rankings and actual values with the ground truth using the following measures of similarity:

Spearman’s correlation ($r_S$)
Pearson’s correlation ($r_P$)
Mutual Information (MI)
Overall Rank Overlap (ORO) [12]
Spearman’s Footrule distance (F) [7]
Kendall’s tau ($r_K$)
Kendall’s tau distance (K) [7]
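The two rank-distance measures in this list can be sketched as follows. The definitions used in the study are those of [7]; the normalization to [0, 1] by the maximum attainable distance, the tie-breaking rule, and the function names below are assumptions of this illustration.

```python
from itertools import combinations


def ranks(scores):
    """Rank position of each item (0 = highest score); ties broken by input order."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    result = [0] * len(scores)
    for position, item in enumerate(order):
        result[item] = position
    return result


def footrule_distance(a, b):
    """Spearman's footrule: summed absolute rank differences, scaled to [0, 1]."""
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    return sum(abs(x - y) for x, y in zip(ra, rb)) / (n * n // 2)


def kendall_tau_distance(a, b):
    """Fraction of item pairs that the two rankings order differently."""
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    discordant = sum(1 for i, j in combinations(range(n), 2)
                     if (ra[i] - ra[j]) * (rb[i] - rb[j]) < 0)
    return discordant / (n * (n - 1) // 2)


# Hypothetical popularity scores for four artists from two methods.
method_a = [0.9, 0.7, 0.4, 0.1]
method_b = [0.1, 0.4, 0.7, 0.9]  # exactly the reverse ordering
print(footrule_distance(method_a, method_a), kendall_tau_distance(method_a, method_a))
print(footrule_distance(method_a, method_b), kendall_tau_distance(method_a, method_b))
```

Identical rankings give a distance of 0 under both measures, and fully reversed rankings give 1 under this normalization.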

F and K are distance measures, so smaller values are better, whereas all other indices are similarity measures, so higher values are better. As the ground truth, we used the Last.fm artist play counts and YouTube channel views (summed streams over the last 30 days for both metrics).

3.4. Results

In the Introduction, we cited many studies that considered already existing popularity metrics as the ground truth in order to evaluate other popularity scores. We accordingly opted for Last.fm play counts and YouTube channel views (summed streams over the last 30 days) as the ground truth for evaluation purposes. We chose these metrics because we believed that streaming activity reflected artist popularity more accurately than fan count (followers are not always committed to the artist), social media mentions (which are not always related to music), or proprietary “black-box” popularity scores (e.g., Spotify popularity). Furthermore, streaming activity is considered by music business stakeholders as more closely related to artist profits than all other metrics. The five aforementioned aggregation methods, GAP0, GAP1, AAP, TOPSIS, and PRO, were compared and the results are presented here.

In Figure 2, we use scatter plots to compare the values of all aggregation methods with the normalized Last.fm artist plays and YouTube channel views on a single date, 2 April 2019. Furthermore, in Table 3, we present the corresponding similarity measures: Pearson correlation ($r_P$) and Mutual Information (MI) for the linear and non-linear interrelationship, respectively, between the aggregation methods and the target variables. We also investigated whether the best aggregation method differed significantly from the other methods in terms of similarity to the target. The statistical significance of the differences was estimated as proposed in [21] (dependent overlapping variables) for Pearson correlation and using a randomization test for mutual information.

The randomization test was defined as follows. We denote by $y \in \mathbb{R}^{N}$ the target variable, by $x_1 \in \mathbb{R}^{N}$ and $x_2 \in \mathbb{R}^{N}$ the aggregation methods under comparison, and by $\theta^{*} = I(x_1, y) - I(x_2, y)$ the test statistic, where $I(\cdot)$ is the mutual information. Considering an approach similar to the permutation test proposed in [22], the test statistic value $\theta_r$ of the $r$-th Monte Carlo simulation was computed from the permuted data, which were obtained by pooling $x_1$ and $x_2$ and assigning N of the pooled values, randomly sampled without replacement, to the $x_1$ group. The rest were assigned to the $x_2$ group. We considered R = 1000 Monte Carlo simulations for the computation of the p-values, which were then determined by Equation (3):

$$ p\text{-value} = \frac{\left| \{ r : \theta_r \ge \theta^{*} \} \right|}{R} \tag{3} $$

where $|\cdot|$ denotes the cardinality of a set.

To the best of our knowledge, there are many parametric statistical tests for differences in Pearson correlation [23], yet none for differences in mutual information; thus, we opted for the randomization test. It was apparent, from the scatter plots of Figure 2 (where the dots are more concentrated) and from the correlation analysis of Table 3 (where the similarity scores are higher), that all aggregation methods were correlated with Last.fm artist plays to a much higher degree than with YouTube channel views. Thus, we finally chose Last.fm artist plays as the ground truth for our experiments. In Table 4, the similarity of the aggregation methods with Last.fm artist plays on 2 April 2019 is reported using all measures of similarity. The statistical significance of the corresponding differences was estimated by Zou’s method [21] for Pearson correlation and also using the previously described randomization test for all measures of similarity.
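The randomization test of Equation (3) can be sketched generically, with the similarity measure passed in as a function. This is an illustration under assumptions and not the authors' code: the mutual information estimator used in the study is not specified in the text, so a hand-written Pearson correlation stands in for $I(\cdot)$, and the function names and synthetic data are hypothetical.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation; stand-in similarity (the paper uses mutual information)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b) if var_a > 0 and var_b > 0 else 0.0


def randomization_p_value(x1, x2, y, similarity, R=1000, seed=0):
    """One-sided p-value |{r : theta_r >= theta*}| / R, following Equation (3)."""
    rng = random.Random(seed)
    n = len(y)
    theta_star = similarity(x1, y) - similarity(x2, y)
    pooled = list(x1) + list(x2)
    hits = 0
    for _ in range(R):
        rng.shuffle(pooled)  # first n values form the x1 group, the rest the x2 group
        theta_r = similarity(pooled[:n], y) - similarity(pooled[n:], y)
        if theta_r >= theta_star:
            hits += 1
    return hits / R


# Synthetic data: x1 tracks the target closely, x2 is unrelated noise.
data_rng = random.Random(1)
y = [data_rng.gauss(0, 1) for _ in range(100)]
x1 = [v + data_rng.gauss(0, 0.1) for v in y]
x2 = [data_rng.gauss(0, 1) for _ in range(100)]
print(randomization_p_value(x1, x2, y, similarity=pearson, R=200))
```

Any callable that takes two equal-length sequences and returns a number can be supplied as `similarity`, so a mutual information estimator could replace the Pearson stand-in without changing the test itself.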

In Table 5, the average similarity between Last.fm artist plays and the aggregation methods across time (from 1 July 2018 until 31 May 2019) is reported in terms of linear/non-linear correlation and rank correlation/distance. The results showed that GAP1 exhibited the best performance in three out of seven measures of similarity, AAP in two, GAP0 and PRO in one each, and TOPSIS in none. Furthermore, the statistical significance of the differences in average similarity was investigated for all similarity measures using Student’s t-test, with the p-values corrected for multiple comparisons using the Bonferroni correction (α = 0.05). For each similarity measure, we conducted four comparisons (the best aggregation method vs. each of the rest), so 4 × 7 = 28 comparisons were considered, and the 28 corresponding p-values were modified through the Bonferroni correction.
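For reference, the Bonferroni correction amounts to multiplying each p-value by the number of comparisons and capping the result at one. The sketch below uses hypothetical p-values that are not taken from the study; only the number of comparisons (28) comes from the text, and the function name is an assumption.

```python
def bonferroni(p_values, n_comparisons):
    """Bonferroni-adjusted p-values: p * k, capped at 1."""
    return [min(1.0, p * n_comparisons) for p in p_values]


# Hypothetical raw p-values from three of the 28 comparisons (illustrative only).
raw = [0.001, 0.01, 0.04]
adjusted = bonferroni(raw, n_comparisons=28)
alpha = 0.05
for p_raw, p_adj in zip(raw, adjusted):
    print(p_raw, round(p_adj, 3), "significant" if p_adj < alpha else "not significant")
```

Under this correction, a raw p-value must be below 0.05 / 28 ≈ 0.0018 to remain significant at α = 0.05.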

Although the aggregation methods produced similar artist popularities and rankings (few statistically significant differences were observed), the correlation analysis showed that GAP produced popularity values that were closer to the target than the other aggregation methods when considering the non-linear similarity measure of mutual information, but not when considering the linear correlation. This indicated the advantage of GAP in capturing more complex popularity patterns than the simple average, which produced higher values only in linear correlation. In terms of ranking, GAP exhibited less distance from the target’s ranking with regard to the Spearman’s footrule and Kendall’s tau distance measures and more proximity to the target’s ranking with regard to Kendall’s tau. PRO best approximated the target’s artist ranking with regard to the Spearman correlation coefficient, and GAP and AAP showed almost identical rankings with regard to overall rank overlap.

In Figure 3, we present the aggregation methods’ timelines for 10 popular artists with the highest discrepancy among the monitored popularity metrics. We focused on artists whose popularity differed across metrics because, otherwise, the aggregation methods would provide the same information as the individual metrics and the comparison among them would not yield noteworthy conclusions. In order to select them, we first identified the set A of the 100 most popular artists on a certain date, 1 April 2019, by sorting the sums of differences between each artist’s metric values and the maximum metric values in our dataset, as shown in Equation (4):

$$ \operatorname{argsort} \left( \sum_{i=1}^{n} \left( m_{:,i,t_0} - \max(m_{:,i,t_0}) \right) \right) \tag{4} $$

where n is the number of metrics, $t_0$ = 1 April 2019, $m_{:,i,t_0}$ is the vector of normalized metric values for metric i, time $t_0$, and all artists, and $\max(v)$ is the maximum value in vector v. Subsequently, we employed Shannon entropy [24] as a measure of discrepancy on the distribution of normalized metric values per artist and selected the 10 artists of set A that exhibited the highest discrepancy, namely the lowest entropy, as shown in Equation (5):

$$ \operatorname{argsort} \left\{ E(m_{a,:,t_0}) \mid a \in A \right\} \tag{5} $$

where $m_{a,:,t_0}$ is the vector of normalized metric values regarding artist a at time $t_0$ and $E(v)$ is the Shannon entropy computed on vector v. The vector $m_{a,:,t_0}$ was divided by the sum of its elements in order to sum to one, prior to entropy calculation.
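The entropy-based selection of Equation (5) can be sketched as follows. This is an illustration under assumptions: the logarithm base is not stated in the text (base 2 is used here), zero-valued metrics are skipped by the usual convention that 0·log 0 = 0, and the function names, artist names, and metric values are hypothetical.

```python
import math


def shannon_entropy(values):
    """Entropy (in bits) of a non-negative vector after scaling it to sum to one."""
    total = sum(values)
    if total == 0:
        return 0.0
    probabilities = [v / total for v in values if v > 0]
    return -sum(p * math.log2(p) for p in probabilities)


def most_discrepant(metrics_by_artist, k):
    """Return the k artists with the lowest entropy, i.e. the highest discrepancy."""
    return sorted(metrics_by_artist,
                  key=lambda artist: shannon_entropy(metrics_by_artist[artist]))[:k]


# Hypothetical normalized metric vectors for three artists.
artists = {
    "artist_a": [0.25, 0.25, 0.25, 0.25],  # evenly popular across metrics
    "artist_b": [0.90, 0.05, 0.03, 0.02],  # popular mainly in one metric
    "artist_c": [0.50, 0.40, 0.05, 0.05],
}
print(most_discrepant(artists, k=2))
```

A vector that is flat across metrics has the maximum entropy (2 bits for four metrics), while a vector concentrated on a single metric approaches zero, which is why low entropy is used as the indicator of high discrepancy.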

It was observed that these 10 artists retained high aggregated popularity values in terms of GAP, despite their low level of popularity in some individual metrics, while AAP produced lower popularity values as a result of that low popularity in some individual metrics. Furthermore, GAP0, GAP1, and AAP exhibited more stable trajectories than TOPSIS and PRO, whose higher volatility partly explained their inferior performance. The fact that GAP produced higher popularity values when the artist was popular in one or some metrics while not popular in the others was considered a major advantage compared to AAP. The reason for that was twofold: (a) it was not common for artists to be popular on all platforms, as they tended to be active mainly in one or some of them; and (b) being popular on one or some platforms was sufficient for an artist to be characterized as popular in general.

In Table 6, we present a simulated example in order to showcase this advantage. Although a low popularity level was exhibited in most metrics, being popular in Metric 4 enabled GAP to exhibit a relatively high popularity estimate. In contrast, AAP assigned a relatively low popularity estimate to the same entity. Finally, Table 7 presents three artists from our dataset whose metric values are distributed as in the simulated case, and the same conclusion was drawn from these example cases.

4. Discussion

In this study, we proposed an aggregation method for popularity metrics that leveraged diverse sources of popularity information such as metrics derived from social media and streaming platforms. This was the first attempt to aggregate multiple popularity sources in the academic literature related to music information retrieval, and it yielded satisfactory results on the useful task of summarizing the whole popularity picture of an artist. Its algorithm used geometrical shapes formed by the individual metrics’ values of each entity, and it was found to outperform the most natural choice for metric aggregation, a simple average, with respect to several measures of similarity between the computed metrics and reference data. Furthermore, the proposed aggregation method was robust even when the artist under study was popular only in some of the monitored popularity metrics. Finally, we should mention that our methodology could be extended for use in several other areas such as cinema and football, in which actors and players would serve as entities and their social media accounts and other related factors (e.g., tickets/jerseys sold) as metrics. Future work will include the evaluation of all metric aggregation methods on other tasks, such as the prediction of individual metrics’ future values.

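Table 1. Evaluation methods (and ground truth) followed by research papers for their proposed popularity scores.
Paper	Evaluation Method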
Grace et al. 2008	One popularity proxy evaluated by user study (sentiment of comments on artists’ pages in MySpace)
Koenigstein and Shavitt 2009	One popularity proxy compared with the Billboard Hot 100 (P2P search queries from Gnutella)
Schedl et al. 2010	Four popularity proxies compared pairwise (page counts Google-Exalead, Twitter posts, shared folders in Gnutella P2P, Last.fm play counts)
Schedl 2011	One popularity proxy compared with Last.fm’s charts (number of tweets with regard to an artist)
Bellogin et al. 2013	Four popularity proxies compared pairwise (EchoNest score, Spotify popularity, number of Last.fm play counts, number of clicks related to an artist from Bit.ly)
Kim et al. 2014	One popularity proxy used to predict Billboard ranks (number of tweets)

Table 2. Sources and metrics used as input to the popularity metric aggregation methods.
Source	Metric
Deezer	artist fans
Facebook	fans
	mentions
Last.fm	artist listeners
	artist plays (the last 30 days)
Soundcloud	artist followers
Spotify	artist followers
	artist popularity
Twitter	user followers
	user listed
YouTube	channel subscribers
	channel views (the last 30 days)

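Table 3. Pearson correlation (r_P) and Mutual Information (MI) between the aggregation methods and the target variables (Last.fm artist plays and YouTube channel views) on 2 April 2019.
	Last.fm		YouTube
	r_P	MI	r_P	MI
GAP0	0.8086	0.7150	0.5655	0.5084⁵
GAP1	0.8073	0.7164⁴	0.5496	0.4990
AAP	0.8287²,⁴	0.7098	0.5539	0.5046
TOPSIS	0.7531	0.6381	0.5908²,⁵	0.4912
PRO	0.6325	0.6629	0.4262	0.4484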

Table 4. Similarity of the aggregation methods with Last.fm artist plays on 2 April 2019.
	r_S	r_P	MI	ORO	F	r_K	K
GAP0	0.8609	0.8086	0.7150	0.8195	0.1526	0.7791	0.1105
GAP1	0.8624	0.8073	0.7164⁴	0.8194	0.1523⁵	0.7799⁵	0.1101⁵
AAP	0.8605	0.8287²,⁴	0.7098	0.8195	0.1526	0.7790	0.1105
TOPSIS	0.8315	0.7531	0.6381	0.7938	0.1722	0.7513	0.1244
PRO	0.8656⁵	0.6325	0.6629	0.8069	0.1583	0.7743	0.1128

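Table 5. Average similarity between Last.fm artist plays and the aggregation methods across time (1 July 2018 to 31 May 2019).
	r_S	r_P	MI	ORO	F	r_K	K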
GAP0	0.8541	0.8047	0.6948⁴	0.8152	0.1568	0.7733	0.1134
GAP1	0.8556	0.8039	0.6921	0.8152	0.1564⁴	0.7741⁴	0.1129⁴
AAP	0.8538	0.8281²	0.6930	0.8152⁴	0.1567	0.7732	0.1134
TOPSIS	0.8240	0.7569	0.6273	0.7892	0.1769	0.7452	0.1274
PRO	0.8566⁵	0.6229	0.6380	0.8031	0.1622	0.7678	0.1161

Table 6. Simulated example of an entity that is popular in only one metric.
Metric	Popularity	GAP0	GAP1	AAP
1	0.17	58.1	44.9	32.7
2	0.12
3	0.15
4	0.87

Table 7. Three artists of the dataset with metric values distributed as in the simulated case.
	John Lundvik	Red Hot	Denz
DAF	0.045	0.066	0
FF	0.131	0	0.156
FM	0.140	0	0.036
LAL	0.169	0.109	0.135
LAP	0.358	0.026	0.208
SCAF	0.017	0.299	0.062
SPAF	0.163	0.036	0.191
SAP	0.806	0.020	0.755
TUF	0.089	0	0
TUL	0.034	0	0
YCS	0.080	0.642	0.140
YCV	0.088	0.700	0.261
GAP0	31.5	26.0	29.4
GAP1	27.9	23.3	25.9
AAP	17.7	15.8	16.2

Author Contributions

Data curation, C.K. and M.S.; formal analysis, C.K.; funding acquisition, S.P. and I.K.; investigation, C.K.; methodology, C.K.; project administration, M.S., S.P., and I.K.; software, C.K. and M.S.; supervision, S.P. and I.K.; validation, C.K.; visualization, C.K.; writing, original draft, C.K.; writing, review and editing, C.K. and S.P. All authors have read and agreed to the published version of the manuscript.

Funding

This work is partially funded by the European Commission under Contract Number H2020-761634 FuturePulse.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; nor in the decision to publish the results.

© 2020. This work is licensed under http://creativecommons.org/licenses/by/3.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Estimating and analyzing the popularity of an entity is an important task for professionals in several areas, e.g., music, social media, and cinema. Furthermore, the ample availability of online data should enhance our insights into the collective consumer behavior. However, effectively modeling popularity and integrating diverse data sources are very challenging problems with no consensus on the optimal approach to tackle them. To this end, we propose a non-linear method for popularity metric aggregation based on geometrical shapes derived from the individual metrics’ values, termed Geometric Aggregation of Popularity metrics (GAP). In this work, we particularly focus on the estimation of artist popularity by aggregating web-based artist popularity metrics. Finally, even though the most natural choice for metric aggregation would be a linear model, our approach leads to stronger rank correlation and non-linear correlation scores compared to linear aggregation schemes. More precisely, our approach outperforms the simple average method in five out of seven evaluation measures.

Details

Title

GAP: Geometric Aggregation of Popularity Metrics

Author

Koutlis, Christos; Schinas, Manos; Papadopoulos, Symeon; Kompatsiaris, Ioannis

First page

323

Publication year

2020

Publication date

2020

Publisher

MDPI AG

e-ISSN

20782489

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.3390/info11060323

ProQuest document ID

2414917408
