Corrected Evolutive Kendall’s τ Coefficients for

1. Introduction

The analysis of rankings of scores (cardinal rankings) or, in particular, rankings composed of natural numbers (ordinal rankings) has been studied from different perspectives, depending on the ultimate goal of the researchers or practitioners (see [1]). When the interest is in obtaining a consensus score that summarizes the opinion of various judges, the mathematical tools used usually aim to find a ranking that minimizes a given distance metric (see the seminal papers [2,3] for some properties of different metrics). In such a case, we say that a distance metric minimizes disagreement. We can place in this area the methods called voter systems, ranking aggregation, and others (see the detailed review in [4]).

When the interest is focused on comparing two series of rankings, one of the key points is to obtain a measure that describes the evolution of the series. In this case, we have a series of rankings such that each one of them prioritizes the elements based on the scores obtained at a particular time (see [5]). For example, sports rankings belong to this category. Obviously, at the end of a season, there is no need to find a consensus ranking since, by the nature of sports leagues, it is the last ranking that summarizes the result of the overall season. The same happens with the stock market, the rankings of the richest people made by Fortune magazine [6], university rankings (e.g., [7,8]), song rankings based on the number of downloads, streams, or sales (see [9]), etc. Our work focuses on the behavior of a series of rankings.

The terminology applied to rankings is not unique. For example, in [10] the term partial is used to indicate rankings in which ties are present, while in [11] the term partial indicates that not all the objects are compared. In this paper, we use the terminology coined in [4,12]. We speak of complete rankings when all the objects are compared (as in a football league) and of incomplete rankings when there are absent objects (as in a Top k ranking). We explicitly use the terms with ties or without ties to indicate whether we consider the presence of tied objects in the rankings. We recall that in [11] the term linear order is used when all objects are compared and no ties are allowed (that is, for us, complete rankings with no ties) and the term weak ordering when all objects are compared, but ties are allowed (that is, for us, complete rankings with ties).

Incomplete rankings appear in multiple areas. For example, in national or European grant calls, judges evaluate only a subset of the applications, and therefore each judge handles an incomplete ranking. The same happens in literary contests, where each judge only reads a small number of manuscripts. In the case of the results shown by search engines, only the Top k web pages are displayed, which is, as a consequence, an incomplete ranking.

We use, and extend, the results of some previous papers. Some concepts are taken from [5], where a method to compare series of complete rankings with no ties was presented, and from [13], where a method to compare series of complete rankings with ties was analyzed. We also make reference to [14], where some theoretical aspects were studied. In all these works, there are two main ingredients:

1. The use of generalizations of the classical concept of Kendall’s $τ$ coefficient of disagreement [15,16,17];
2. The use of graphs associated with the series of rankings as a tool to visualize and also to help in the definition of the coefficients that summarize the “behaviour” of the series of rankings.

Regarding extensions of Kendall’s $τ$ coefficient, the first attempt to incorporate an axiomatic distance metric was in [2], followed by the works [11,18,19].

More recently, in [4] these previous works were revised and a new axiomatic framework for incomplete rankings was introduced. To the best of our knowledge, the last paper devoted to an axiomatic study of incomplete rankings is [12], where an extension of Kendall’s $τ$ coefficient to the case of incomplete rankings with ties is presented.

Kendall’s $τ$ has been extensively used, and extensions continue to appear in the literature up to the present day [10,12,20]. In particular, Kendall’s $τ$ has been recently reviewed for ophthalmic research in [21], and it is a tool used in neuroscience studies (e.g., [22]) and in bioinformatics [23].

Regarding the use of graphs to represent a series of rankings, we recall, in particular, that a graph can be used to describe the crossings between two rankings. This graph is called a permutation graph (see [24,25]). When a graph is defined to show the consecutive crossings between a series of m rankings, it is called a Competitivity graph [5]. This concept corresponds to that of intersection graph of a concatenation of permutation diagrams in graph theory (see [26]). For more relations on graphs associated with rankings, see [14].

In this paper, we take some results of [4,12] as our starting point to develop two coefficients to describe the evolution of a series of $m \geq 2$ incomplete rankings with ties. When applied to the case of only two rankings, our measures reduce to the measures given in [4,12].

We also extend the study of a series of complete rankings with ties developed in [13] to the case of incomplete rankings with ties. We make use of the standard modern notation in the field of rankings mainly based on [10,12,27], among others.

We take as our starting point the definition of $τ_{x}$ of [12], which is based on the computation of a sum of the form $\sum_{i = 1}^{n} \sum_{j = 1}^{n} A_{i j} B_{i j}$ that involves the entries of two matrices A and B indicating the relative positions of the elements of two rankings. In Theorem 1, we give an expression of this sum as a function of the type of interactions between a pair of elements $\{i, j\}$ from one ranking to the next one (e.g., changes from tie to untie, absence of one of the elements in one ranking, crossings, etc.). This result allows for writing $τ_{x}$ (and ${\hat{τ}}_{x}$ ) in terms of the interactions of the elements of the rankings.

On the one hand, this theoretical result allows the sum $\sum_{i = 1}^{n} \sum_{j = 1}^{n} A_{i j} B_{i j}$ to be computed without explicitly computing the matrices involved. On the other hand, it allows for interpreting the interactions of a series of rankings by using a permutation graph or, more generally, a competitivity graph, whose edges are weighted to represent the weight of the corresponding interactions over the whole series of rankings.

We define two coefficients $τ_{e v}^{•}$ and ${\hat{τ}}_{e v}^{•}$ for series of incomplete rankings with ties by using an analogy based on previous well-established definitions. We recall that, in the field of incomplete rankings, “intuition” is usually invoked to prefer some measures over others since, when handling an incomplete ranking, there is no unique way to interpret the results (see this kind of reasoning in [4,12]). In our case, the behaviour of our measures is checked by ensuring that they are well normalized and that they reduce to well-known cases in limit situations.

Finally, other contributions of the paper are placed on a practical field. We give a methodology to study the movements of rankings (of songs) in Spotify by using two different approaches: the cases of series of incomplete rankings without ties and series of incomplete rankings with ties.

The structure of the paper is as follows. In Section 2, we recall Kendall’s $τ$ and give the fundamental relations that will be useful throughout the paper. In Section 3, we recall the notation and basic results for the case of two incomplete rankings with ties allowed.

In Section 4, we give the fundamental theoretical result of the paper and some remarks that give insight both into the validity and application of this result. In Section 5, we recall some definitions from [13] to measure the evolution of m complete rankings with ties. In Section 6, we present two coefficients, denoted as $τ_{e v}^{•}$ and ${\hat{τ}}_{e v}^{•}$ to characterize the evolution of m incomplete rankings with ties and some examples are given. In Section 7, we illustrate the applicability of the new coefficients by using some real data obtained from Spotify charts. Finally, in Section 8, we outline the main conclusions of the paper.

2. Preliminaries

In [16] it is shown that Kendall’s $τ$ coefficient (also called measure of disarray) associated with two rankings with the same number of elements n can be written in the form

(1) $τ = 1 - \frac{2 s}{\frac{1}{2} n (n - 1)}$

where s is the minimum number of interchanges required to transform one ranking into the other. This coefficient is a measure of the intensity of rank correlation. The coefficient can also be written as

(2) $τ = \frac{P - Q}{\frac{1}{2} n (n - 1)}$

where P is the number of pairs of elements that maintain their relative order when passing from the first ranking to the second one (that is, the first element is above the second in both rankings, or below it in both) and Q is the number of pairs of elements that interchange their order (that is, in one ranking, the first element is above the second and, in the other ranking, the first element is below the second, or vice versa).

Note that Q and s are equal. Furthermore, this quantity can be identified with the number of crossings or inversions when passing from the first ranking to the second. For this reason, throughout the paper, we will keep in mind that Equation (1) gives the equivalence between the number of crossings and the associated $τ$ . This will be important in what follows since we will deal with different extensions of Kendall’s $τ$ coefficient and since one of our preferred tools will be counting the number of crossings, as in [5].
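As an illustration, the equivalence between Equations (1) and (2) can be checked numerically. The following is a minimal sketch, assuming that each complete ranking without ties is stored as a list of positions; the function name `kendall_tau` and the sample rankings are ours and are not taken from [16].

```python
from fractions import Fraction
from itertools import combinations

def kendall_tau(x, y):
    """Return (tau, P, Q) for two complete rankings without ties.

    x[i] and y[i] are the positions of element i in each ranking.
    """
    n = len(x)
    P = Q = 0
    for i, j in combinations(range(n), 2):
        # Same sign of both differences: the pair keeps its relative order.
        if (x[i] - x[j]) * (y[i] - y[j]) > 0:
            P += 1
        else:
            Q += 1  # the pair crosses
    pairs = n * (n - 1) // 2
    return Fraction(P - Q, pairs), P, Q  # Equation (2)

# Hypothetical sample data: the first two and the last two elements swap.
x = [1, 2, 3, 4]
y = [2, 1, 4, 3]
tau, P, Q = kendall_tau(x, y)

# Equation (1) with s = Q gives the same value as Equation (2).
assert tau == 1 - Fraction(2 * Q, len(x) * (len(x) - 1) // 2)
```

Exact fractions are used instead of floating-point numbers so that both expressions can be compared with `==`.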

We recall from [27] that a distance metric $d (a, b)$ can be transformed into a correlation coefficient $τ (a, b)$ by the formula

(3) $τ (a, b) = 1 - \frac{2 d (a, b)}{d_{m a x} (a, b)}$

where $d_{m a x} (a, b)$ is the maximum possible distance between two rankings. We recall that a distance metric between two rankings $a$ and $b$ is a non-negative real function f that is symmetric ( $f (a, b) = f (b, a)$ , for any pair of rankings), regular ( $f (a, b) = 0 \leftrightarrow a = b$ ), and satisfies the triangle inequality ( $f (a, c) \leq f (a, b) + f (b, c)$ , for any rankings a, b, and c). Note that Equation (1) is of this form, since $n (n - 1) / 2$ is the maximum number of crossings between two given rankings. The same happens with Spearman’s $ρ$ coefficient. In [16], Spearman’s $ρ$ for two ordinal complete rankings $x = (x_{1}, x_{2}, \dots, x_{n})$ and $y = (y_{1}, y_{2}, \dots, y_{n})$ with $x_{i}, y_{i} \in \mathbb{N}$ is defined by

$ρ = 1 - \frac{6 \sum_{i = 1}^{n} {(x_{i} - y_{i})}^{2}}{n^{3} - n}$

and this is of the form (3) since it is easy to show that the maximum value of $\sum_{i = 1}^{n} {(x_{i} - y_{i})}^{2}$ occurs when one ranking is the reverse of the other and, as a consequence, the maximum value of the distance metric $d (x, y) = \sum_{i = 1}^{n} {(x_{i} - y_{i})}^{2}$ is $\frac{1}{3} (n^{3} - n)$ (see [3] for this and other properties of distance metrics).
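The claim about the maximum of the sum of squared differences can be checked by brute force for a small n. The sketch below is only a numerical sanity check under our own naming (`squared_distance`, `spearman_rho`), not a proof, and the choice $n = 5$ is arbitrary.

```python
from fractions import Fraction
from itertools import permutations

def squared_distance(x, y):
    """Sum of squared position differences between two rankings."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))

def spearman_rho(x, y):
    """Spearman's rho for two complete ordinal rankings without ties."""
    n = len(x)
    return 1 - Fraction(6 * squared_distance(x, y), n**3 - n)

n = 5
identity = tuple(range(1, n + 1))
reverse = identity[::-1]

# Exhaustive search over all n! rankings (feasible only for small n).
d_max = max(squared_distance(identity, p) for p in permutations(identity))

# The maximum is reached by the reversed ranking and equals (n^3 - n) / 3.
assert d_max == squared_distance(identity, reverse) == (n**3 - n) // 3
# With d_max in the denominator of (3), the reversed ranking gives rho = -1.
assert spearman_rho(identity, reverse) == -1
```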

We also recall that a permutation graph (called competitivity graph in [5]) is associated with two rankings over the same elements in such a way that the nodes represent the elements and two nodes are connected with an edge if the corresponding elements cross their positions when passing from one ranking to the other.

In this way, it is clear that the number of edges of this graph is precisely s. Furthermore, another quantity (borrowed from graph theory) is also introduced in [5]: the Normalized Mean Strength $N S$ , that is, the normalized sum of the weights of the edges of a weighted graph. When considering only two rankings and their corresponding competitivity graph, we have the following relation

(4) $N S = \frac{1 - τ}{2}$

that gives the equivalence between the Normalized Mean Strength and Kendall’s $τ$ for two rankings. Note that $τ \in [- 1, 1]$ and $N S \in [0, 1]$ . We consider that the measure $N S$ is more intuitive than $τ$ since it allows us to interpret the movements or activity of a series of rankings as a percentage.
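Relation (4) can be illustrated with a small sketch that builds the edge list of the permutation graph of two rankings. We assume here unit edge weights and normalization by the maximum number of edges $n (n - 1) / 2$ , which is the choice consistent with (4) for two rankings; the general definition of $N S$ is the one given in [5]. All function names and the sample data are ours.

```python
from fractions import Fraction
from itertools import combinations

def permutation_graph_edges(a, b):
    """Pairs (i, j) of elements that cross when passing from a to b."""
    return [(i, j) for i, j in combinations(range(len(a)), 2)
            if (a[i] - a[j]) * (b[i] - b[j]) < 0]

def normalized_strength(a, b):
    """NS for two rankings, assuming unit weights and n(n-1)/2 as normalizer."""
    n = len(a)
    return Fraction(len(permutation_graph_edges(a, b)), n * (n - 1) // 2)

def kendall_tau(a, b):
    """Equation (1), with s the number of edges of the permutation graph."""
    n = len(a)
    s = len(permutation_graph_edges(a, b))
    return 1 - Fraction(2 * s, n * (n - 1) // 2)

# Hypothetical complete rankings without ties (positions of elements 0..3).
a = [1, 2, 3, 4]
b = [3, 1, 2, 4]
edges = permutation_graph_edges(a, b)

assert normalized_strength(a, b) == (1 - kendall_tau(a, b)) / 2  # Equation (4)
```

In this example one third of all possible crossings take place, which is the percentage reading of $N S$ mentioned above.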

3. Coefficients for Two Incomplete Rankings with Ties

In this section, we recall some definitions used in [4,12]. We will use the next three ingredients in order to define a coefficient to compare two rankings:

1. A vector to define the ordinal ranking (including the description of absent elements and tied elements);
2. A matrix to indicate the relative positions of the elements of the ranking (including absent and tied elements);
3. A formula to define the coefficients for a pair of rankings by using the entries of their associated matrices defined in the previous step.

Let $V = \{v_{1}, v_{2}, \dots, v_{n}\}$ be the objects to be ranked, with $n > 1$ . The ranking is given by

(5) $a = [a_{1}, a_{2}, \dots, a_{n}]$

where $a_{i}$ is the position of $v_{i}$ in the ranking. Note that if $a_{i} = a_{j}$ , then $v_{i}$ and $v_{j}$ are tied. If $v_{i}$ is not ranked, then it is denoted as $a_{i} = •$ . We also define the set

$V_{a} = \{v_{i} \in V \mid a_{i} \neq •\} .$

We define an $n \times n$ matrix $A = (A_{i j})$ , with entries $A_{i j}$ associated to $a$ as follows:

(6) $A_{i j} = \begin{cases} 1 & \text{if } a_{i} \leq a_{j} \\ - 1 & \text{if } a_{i} > a_{j} \\ 0 & \text{if } i = j, \ a_{i} = •, \text{ or } a_{j} = • \end{cases}$
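Definition (6) translates directly into code. The following minimal sketch assumes that a ranking vector is stored as a Python list in which `None` plays the role of the symbol •; this encoding, the constant `ABSENT`, and the function name `ranking_matrix` are our own choices.

```python
ABSENT = None  # our stand-in for the symbol •

def ranking_matrix(a):
    """Matrix A of Equation (6) for a ranking vector a."""
    n = len(a)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j or a[i] is ABSENT or a[j] is ABSENT:
                continue  # third branch of (6): the entry stays 0
            A[i][j] = 1 if a[i] <= a[j] else -1
    return A

# Hypothetical ranking: elements 1 and 4 tied in position 2, element 2 absent.
a = [2, ABSENT, 1, 2]
for row in ranking_matrix(a):
    print(row)
```

Note that a tied pair gives $A_{i j} = A_{j i} = 1$ , whereas an untied pair gives entries of opposite sign; the case analysis in the proof of Theorem 1 relies on this asymmetry.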

According to [12], we define the coefficients

(7) $τ_{x} (a, b) = \frac{\sum_{i = 1}^{n} \sum_{j = 1}^{n} A_{i j} B_{i j}}{n (n - 1)}$

and, when $\bar{n} > 1$ ,

(8) ${\hat{τ}}_{x} (a, b) = \frac{n (n - 1)}{\bar{n} (\bar{n} - 1)} τ_{x} (a, b)$

where $\bar{n}$ is the number of elements $v_{i}$ ranked in both $a$ and $b$ . That is:

(9) $\bar{n} = | V_{a} \cap V_{b} |$

Example 1.

Let $V = \{1, 2, 3, 4, 5, 6, 7, 8\}$ , and let us consider two rankings $a$ and $b$ . Then, $a = [6, 4, 5, 5, •, 2, 1, 3]$ represents the incomplete ranking with ties $(7, 6, 8, 2, 3 - 4, 1)$ , where $3 - 4$ indicates tied elements. Analogously, $b = [3, 3, 2, 2, •, 1, •, 4]$ represents the ranking $(6, 3 - 4, 1 - 2, 8)$ . Note that $n = 8$ and $\bar{n} = 6$ .
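The correspondence between a position vector and the ordered ranking it represents can be made explicit with a short sketch. We assume that `None` encodes the symbol • and that the elements are labelled $1, \dots, n$ as in Example 1; the helper names `decode` and `common_count` are ours.

```python
from collections import defaultdict

ABSENT = None  # our stand-in for the symbol •

def decode(a):
    """Group the element labels 1..n by position, best position first."""
    groups = defaultdict(list)
    for element, position in enumerate(a, start=1):
        if position is not ABSENT:
            groups[position].append(element)
    return [groups[p] for p in sorted(groups)]

def common_count(a, b):
    """Number of elements ranked in both a and b, as in (9)."""
    return sum(p is not ABSENT and q is not ABSENT for p, q in zip(a, b))

a = [6, 4, 5, 5, ABSENT, 2, 1, 3]
b = [3, 3, 2, 2, ABSENT, 1, ABSENT, 4]

# Each inner list is a group of tied elements.
assert decode(a) == [[7], [6], [8], [2], [3, 4], [1]]
assert decode(b) == [[6], [3, 4], [1, 2], [8]]
assert common_count(a, b) == 6
```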

Note that $τ_{x}$ with complete rankings and no ties reduces to the classic Kendall’s $τ$ given by (1), while ${\hat{τ}}_{x}$ is a renormalization of $τ_{x}$ , verifying $| {\hat{τ}}_{x} | \geq | τ_{x} |$ .
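As an illustration of (7) and (8), the following sketch evaluates both coefficients for the rankings of Example 1 directly from the entries of the matrices defined in (6). It assumes `None` as the encoding of •; the function names are ours, and the numerical values in the final comment are obtained from this sketch, not quoted from [12].

```python
from fractions import Fraction

ABSENT = None  # our stand-in for the symbol •

def entry(a, i, j):
    """Entry A_ij of Equation (6)."""
    if i == j or a[i] is ABSENT or a[j] is ABSENT:
        return 0
    return 1 if a[i] <= a[j] else -1

def tau_x(a, b):
    """Equation (7)."""
    n = len(a)
    total = sum(entry(a, i, j) * entry(b, i, j)
                for i in range(n) for j in range(n))
    return Fraction(total, n * (n - 1))

def tau_hat_x(a, b):
    """Equation (8); requires at least two commonly ranked elements."""
    n = len(a)
    n_bar = sum(p is not ABSENT and q is not ABSENT for p, q in zip(a, b))
    if n_bar < 2:
        raise ValueError("tau_hat_x needs n_bar > 1")
    return Fraction(n * (n - 1), n_bar * (n_bar - 1)) * tau_x(a, b)

# Rankings of Example 1.
a = [6, 4, 5, 5, ABSENT, 2, 1, 3]
b = [3, 3, 2, 2, ABSENT, 1, ABSENT, 4]
print(tau_x(a, b), tau_hat_x(a, b))  # 1/14 2/15
```

Both values have the same sign and $| {\hat{τ}}_{x} | \geq | τ_{x} |$ , as expected from the renormalization factor in (8).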

As we will see, Definition 6 in Section 6 is based on an analogy with Equation (1). To that end, it will be necessary to count all the possible cases when passing from $a$ to $b$ (interactions between the relative positions of pairs of elements, such as crossings, passing from tie to untie, leaving the ranking, etc.). We do this in the next section.

4. Main Result

The following result is the fundamental theoretical result of this paper. This result will allow us to write $τ_{x}$ and ${\hat{τ}}_{x}$ in terms of the interactions of the rankings’ elements. It opens the possibility of giving weights to the interactions, as is a common practice in modern definitions of Kendall’s tau [10]. This result also constitutes our starting point to define a coefficient for a series of more than two incomplete rankings. This theorem also allows giving insight into the differences between $τ_{x}$ and ${\hat{τ}}_{x}$ . Some other consequences are detailed in the remarks below and in Corollary 1.

Theorem 1.

Given two vectors $a$ , $b$ representing incomplete rankings of n elements with ties, represented as in (5), and their corresponding matrices $A = (A_{i j})$ and $B = (B_{i j})$ defined by (6), it holds that

(10) $\sum_{i = 1}^{n} \sum_{j = 1}^{n} A_{i j} B_{i j} = n (n - 1) - 4 s - 2 n_{t u} - 2 N_{i n c}$

where

(11) $N_{i n c} = \binom{n_{• •}}{2} + \binom{n_{* •}}{2} + \binom{n_{• *}}{2} + n_{• •} (n_{* •} + n_{• *} + n_{* *}) + n_{* *} (n_{* •} + n_{• *}) + n_{* •} n_{• *}$

s is the number of crossings, that is, the number of pairs $\{i, j\}$ such that $a_{i} < a_{j}$ and $b_{i} > b_{j}$ , or $a_{i} > a_{j}$ and $b_{i} < b_{j}$ .
$n_{t u}$ is the number of pairs that are tied in only one ranking (from tie to untie or vice versa), that is, such that $a_{i} = a_{j}$ and $b_{i} \neq b_{j}$ , or $a_{i} \neq a_{j}$ and $b_{i} = b_{j}$ .

In the definitions of s and $n_{t u}$ , it is assumed that $a_{i}$ and $b_{i}$ are different from •. For the cases when one or more • may appear, the following notation holds:

$n_{• •}$ is the number of entries such that $a_{i} = b_{i} = •$ ;
$n_{• *}$ is the number of entries such that $a_{i} = •$ and $b_{i} \neq •$ ;
$n_{* •}$ is the number of entries such that $a_{i} \neq •$ and $b_{i} = •$ .

Finally, we also need to define $n_{* *}$ as the number of entries such that $a_{i} \neq •$ and $b_{i} \neq •$ .
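Before going through the proof, identity (10) can be tested numerically. The sketch below compares the brute-force double sum with the right-hand side of (10) on randomly generated vectors. It assumes `None` as the encoding of •; the suffixes `d` (for •) and `s` (for ∗) in the variable names, the helper names, and the random test set-up are ours. A passing test is a sanity check, not a substitute for the proof.

```python
import random
from itertools import combinations
from math import comb

ABSENT = None  # our stand-in for the symbol •

def entry(a, i, j):
    """Entry A_ij of Equation (6)."""
    if i == j or a[i] is ABSENT or a[j] is ABSENT:
        return 0
    return 1 if a[i] <= a[j] else -1

def lhs(a, b):
    """Double sum on the left-hand side of (10)."""
    n = len(a)
    return sum(entry(a, i, j) * entry(b, i, j)
               for i in range(n) for j in range(n))

def rhs(a, b):
    """Right-hand side of (10), with N_inc computed from (11)."""
    n = len(a)
    both = [i for i in range(n) if a[i] is not ABSENT and b[i] is not ABSENT]
    s = n_tu = 0
    for i, j in combinations(both, 2):
        da, db = a[i] - a[j], b[i] - b[j]
        if da * db < 0:
            s += 1                      # crossing
        elif (da == 0) != (db == 0):
            n_tu += 1                   # tied in exactly one ranking
    n_dd = sum(a[i] is ABSENT and b[i] is ABSENT for i in range(n))      # (•, •)
    n_ds = sum(a[i] is ABSENT and b[i] is not ABSENT for i in range(n))  # (•, *)
    n_sd = sum(a[i] is not ABSENT and b[i] is ABSENT for i in range(n))  # (*, •)
    n_ss = len(both)                                                     # (*, *)
    n_inc = (comb(n_dd, 2) + comb(n_sd, 2) + comb(n_ds, 2)
             + n_dd * (n_sd + n_ds + n_ss)
             + n_ss * (n_sd + n_ds)
             + n_sd * n_ds)
    return n * (n - 1) - 4 * s - 2 * n_tu - 2 * n_inc

rng = random.Random(0)  # fixed seed so that the check is reproducible
for _ in range(200):
    n = rng.randint(2, 8)
    a = [rng.choice([ABSENT, 1, 2, 3]) for _ in range(n)]
    b = [rng.choice([ABSENT, 1, 2, 3]) for _ in range(n)]
    assert lhs(a, b) == rhs(a, b)
```

The function `rhs` never builds the matrices $A$ and $B$ ; it only inspects pairs of commonly ranked elements and four row counts.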

Proof of Theorem 1.

For each pair $\{i, j\}$ we will evaluate the term $A_{i j} B_{i j} + A_{j i} B_{j i}$ in the expression $\sum_{i = 1}^{n} \sum_{j = 1}^{n} A_{i j} B_{i j}$ . The case $i = j$ gives $A_{i i} B_{i i} + A_{i i} B_{i i} = 0$ .

Thus, we focus on pairs $\{i, j\}$ with $i \neq j$ . There is a total of $n (n - 1) / 2$ such pairs. It is useful to consider the basic cell of the pair $\{i, j\}$ with $i < j$ ,

$(\begin{matrix} a_{i} & b_{i} \\ a_{j} & b_{j} \end{matrix})$

where $a_{k}$ and $b_{k}$ can be natural numbers or a • if the element k is not ranked in $a$ or $b$ , respectively.

Let us study first the cases that can appear when no • is present in the basic cell.

The Complete Case (C):

That is, $a_{k} \neq •$ and $b_{k} \neq •$ for $k \in \{i, j\}$ . We distinguish four types of basic cells.

Type C.1: Not crossing, and no ties in $a$ or in $b$ .

For example:

$(\begin{matrix} 1 & 3 \\ 2 & 4 \end{matrix}) or (\begin{matrix} 2 & 4 \\ 1 & 3 \end{matrix}) .$

Thus, we have $a_{i} \neq a_{j}$ and $b_{i} \neq b_{j}$ , and two cases can appear:

C.1.1. If $a_{i} < a_{j}$ and $b_{i} < b_{j}$ , then $A_{i j} B_{i j} + A_{j i} B_{j i} = 1 \cdot 1 + (- 1) \cdot (- 1) = 2$ .
C.1.2. If $a_{i} > a_{j}$ and $b_{i} > b_{j}$ , then $A_{i j} B_{i j} + A_{j i} B_{j i} = (- 1) \cdot (- 1) + 1 \cdot 1 = 2$ .

Type C.2: Crossing.

For example:

$(\begin{matrix} 1 & 4 \\ 2 & 3 \end{matrix}) or (\begin{matrix} 2 & 3 \\ 1 & 4 \end{matrix}) .$

Again, we have $a_{i} \neq a_{j}$ and $b_{i} \neq b_{j}$ and two more cases can appear:

C.2.1. If $a_{i} < a_{j}$ and $b_{i} > b_{j}$ , then $A_{i j} B_{i j} + A_{j i} B_{j i} = 1 \cdot (- 1) + (- 1) \cdot 1 = - 2$ .
C.2.2. If $a_{i} > a_{j}$ and $b_{i} < b_{j}$ , then $A_{i j} B_{i j} + A_{j i} B_{j i} = (- 1) \cdot 1 + 1 \cdot (- 1) = - 2$ .

Type C.3: From tie to untie or viceversa.

For example:

$(\begin{matrix} 1 & 3 \\ 1 & 4 \end{matrix}), (\begin{matrix} 1 & 4 \\ 1 & 3 \end{matrix}), (\begin{matrix} 3 & 1 \\ 4 & 1 \end{matrix}), or (\begin{matrix} 4 & 1 \\ 3 & 1 \end{matrix})$

We have $a_{i} = a_{j}$ and $b_{i} \neq b_{j}$ , or $a_{i} \neq a_{j}$ and $b_{i} = b_{j}$ . Therefore, four cases can appear:

C.3.1. If $a_{i} = a_{j}$ and $b_{i} < b_{j}$ , then $A_{i j} B_{i j} + A_{j i} B_{j i} = 1 \cdot 1 + 1 \cdot (- 1) = 0$ .
C.3.2. If $a_{i} = a_{j}$ and $b_{i} > b_{j}$ , then $A_{i j} B_{i j} + A_{j i} B_{j i} = 1 \cdot (- 1) + 1 \cdot 1 = 0$ .
C.3.3. If $a_{i} < a_{j}$ and $b_{i} = b_{j}$ , then $A_{i j} B_{i j} + A_{j i} B_{j i} = 1 \cdot 1 + (- 1) \cdot 1 = 0$ .
C.3.4. If $a_{i} > a_{j}$ and $b_{i} = b_{j}$ , then $A_{i j} B_{i j} + A_{j i} B_{j i} = (- 1) \cdot 1 + 1 \cdot 1 = 0$ .

Type C.4: From tie to tie.

For example:

$(\begin{matrix} 1 & 2 \\ 1 & 2 \end{matrix})$

That is, we have $a_{i} = a_{j}$ and $b_{i} = b_{j}$ , and then $A_{i j} B_{i j} + A_{j i} B_{j i} = 1 \cdot 1 + 1 \cdot 1 = 2$ .

We denote the number of pairs of each case using the terminology of Table 1. In particular, $n_{n c}$ denotes the number of pairs of Type C.1 (no crossing and no ties), which is the notation used in (12) and (13). Note that $n_{t t}$ is the number of pairs that are tied in both rankings, that is, such that $a_{i} = a_{j}$ and $b_{i} = b_{j}$ . Note also that $n_{t u}$ is the number of pairs that go from tie to untie or vice versa.
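The contributions of the four complete types can be reproduced mechanically from (6). The sketch below evaluates $A_{i j} B_{i j} + A_{j i} B_{j i}$ for one representative cell of each type, taken from the examples given above; the helper names `sign_entry` and `cell_contribution` are ours.

```python
def sign_entry(p, q):
    """Off-diagonal entry of (6) for two ranked elements with positions p, q."""
    return 1 if p <= q else -1

def cell_contribution(ai, bi, aj, bj):
    """A_ij*B_ij + A_ji*B_ji for a basic cell with no absent element."""
    return (sign_entry(ai, aj) * sign_entry(bi, bj)
            + sign_entry(aj, ai) * sign_entry(bj, bi))

# One representative cell per type, read row by row as (a_i, b_i, a_j, b_j).
examples = {
    "C.1": (1, 3, 2, 4),  # no crossing, no ties
    "C.2": (1, 4, 2, 3),  # crossing
    "C.3": (1, 3, 1, 4),  # tied in a only
    "C.4": (1, 2, 1, 2),  # tied in both rankings
}
contributions = {t: cell_contribution(*cell) for t, cell in examples.items()}
assert contributions == {"C.1": 2, "C.2": -2, "C.3": 0, "C.4": 2}
```

Only the types C.1, C.2, and C.4 contribute to the double sum, with values 2, -2, and 2, respectively.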

The Incomplete Case (I):

There is at least one • in the basic cell. In other words, there is some k such that $a_{k} = •$ , or $b_{k} = •$ , or both. We distinguish seven cases:

Type I.1: Four •. That is, $a_{i} = a_{j} = b_{i} = b_{j} = •$ , or graphically

$(\begin{matrix} • & • \\ • & • \end{matrix})$

Then $A_{i j} B_{i j} + A_{j i} B_{j i} = 0 \cdot 0 + 0 \cdot 0 = 0$ . Let us denote by $n_{• •}$ the number of null rows that appear in the matrix with columns $a$ and $b$ . Therefore, we have $\binom{n_{• •}}{2}$ pairs $\{i, j\}$ of this type.

Type I.2: Three •. That is, a cell of one of these forms

$(\begin{matrix} • & • \\ * & • \end{matrix}), (\begin{matrix} * & • \\ • & • \end{matrix}), (\begin{matrix} • & • \\ • & * \end{matrix}), or (\begin{matrix} • & * \\ • & • \end{matrix})$

where ∗ is a number (not a •). Therefore, we have four cases, but all are similar to this one: $a_{i} \neq •$ and $a_{j} = b_{i} = b_{j} = •$ . Then, $A_{i j} B_{i j} + A_{j i} B_{j i} = 0 \cdot 0 + 0 \cdot 0 = 0$ .

Denoting by $n_{* •}$ the number of rows of the form $(* •)$ in the $n \times 2$ matrix $(a b)$ , and by $n_{• *}$ the number of rows of the form $(• *)$ in the same matrix, it is clear that the number of pairs $\{i, j\}$ of this type is $n_{• •} (n_{* •} + n_{• *})$ .

Type I.3: Two •, one on each ranking. That is, any cell of one of these forms

$(\begin{matrix} • & • \\ * & * \end{matrix}), (\begin{matrix} * & * \\ • & • \end{matrix}), (\begin{matrix} • & * \\ * & • \end{matrix}), or (\begin{matrix} * & • \\ • & * \end{matrix})$

These four cases can be reduced to two:

I.3.1. If $a_{i} = b_{i} = •$ , $a_{j} \neq •$ , and $b_{j} \neq •$ , then $A_{i j} B_{i j} + A_{j i} B_{j i} = 0 \cdot 0 + 0 \cdot 0 = 0$ .
I.3.2. If $a_{i} = •$ , $a_{j} \neq •$ , $b_{i} \neq •$ , and $b_{j} = •$ , then $A_{i j} B_{i j} + A_{j i} B_{j i} = 0 \cdot 0 + 0 \cdot 0 = 0$ .

Denoting by $n_{* *}$ the number of rows of the form $(* *)$ in the $n \times 2$ matrix $(a b)$ , it is clear that the number of pairs $\{i, j\}$ of this type is $n_{• •} n_{* *} + n_{* •} n_{• *}$ .

Type I.4: Tied in one ranking and two • in the other. For example,

$(\begin{matrix} 1 & • \\ 1 & • \end{matrix}), (\begin{matrix} • & 1 \\ • & 1 \end{matrix})$

That is, we have two cases, which are similar to this one: $a_{i} = a_{j}$ and $b_{i} = b_{j} = •$ , and then $A_{i j} B_{i j} + A_{j i} B_{j i} = 0 \cdot 0 + 0 \cdot 0 = 0$ .

Let us denote by $n_{a}$ the number of different natural numbers in $a$ and by $n_{b}$ the number of different natural numbers in $b$ . Let $n_{i •}$ be the number of rows of the form $(i, •)$ in the matrix $(a b)$ , for $i = 1, \dots, n_{a}$ and, analogously, let $n_{• i}$ be the number of rows of the form $(•, i)$ in the matrix $(a b)$ , for $i = 1, \dots, n_{b}$ . Then, it is straightforward to see that the number of cases of this type is given by

$\sum_{i = 1}^{n_{a}} \binom{n_{i •}}{2} + \sum_{i = 1}^{n_{b}} \binom{n_{• i}}{2} .$

Type I.5: Tied in one ranking, one • in the other. For example

$(\begin{matrix} 1 & • \\ 1 & 2 \end{matrix}), (\begin{matrix} 1 & 2 \\ 1 & • \end{matrix}), (\begin{matrix} • & 1 \\ 2 & 1 \end{matrix}), (\begin{matrix} 2 & 1 \\ • & 1 \end{matrix}) .$

We have the following 4 cases:

I.5.1. If $a_{i} = a_{j}$ , $b_{i} = •$ , and $b_{j} \neq •$ , then $A_{i j} B_{i j} + A_{j i} B_{j i} = 0 \cdot 0 + 0 \cdot 0 = 0$ .
I.5.2. If $a_{i} = a_{j}$ , $b_{i} \neq •$ , and $b_{j} = •$ , then $A_{i j} B_{i j} + A_{j i} B_{j i} = 0 \cdot 0 + 0 \cdot 0 = 0$ .
I.5.3. If $a_{i} = •$ , $a_{j} \neq •$ , and $b_{i} = b_{j}$ , then $A_{i j} B_{i j} + A_{j i} B_{j i} = 0 \cdot 0 + 0 \cdot 0 = 0$ .
I.5.4. If $a_{i} \neq •$ , $a_{j} = •$ , and $b_{i} = b_{j}$ , then $A_{i j} B_{i j} + A_{j i} B_{j i} = 0 \cdot 0 + 0 \cdot 0 = 0$ .

Let $n_{i *}$ be the number of rows of the form $(i, *)$ (where ∗ can be i) in the same matrix, with $i \in \{1, 2, \dots, n_{a}\}$ .

Analogously, let $n_{* i}$ be the number of rows of the form $(*, i)$ (where ∗ can be i) in the matrix $(a b)$ . Then, it is straightforward to see that the number of cases of this type is given by

$\sum_{i = 1}^{n_{a}} n_{i *} n_{i •} + \sum_{i = 1}^{n_{b}} n_{* i} n_{• i} .$

Type I.6: Two • in one ranking and different numbers in the other.

For example

$(\begin{matrix} 1 & • \\ 2 & • \end{matrix}), (\begin{matrix} • & 2 \\ • & 1 \end{matrix})$

We have here only two cases:

I.6.1. If $a_{i} \neq a_{j}$ and $b_{i} = b_{j} = •$ , then $A_{i j} B_{i j} + A_{j i} B_{j i} = (\pm 1) \cdot 0 + (\pm 1) \cdot 0 = 0$ .
I.6.2. If $a_{i} = a_{j} = •$ and $b_{i} \neq b_{j}$ , then $A_{i j} B_{i j} + A_{j i} B_{j i} = 0 \cdot (\pm 1) + 0 \cdot (\pm 1) = 0$ .

Then, it is easy to see that the number of pairs $\{i, j\}$ of this type is

$\binom{n_{* •}}{2} + \binom{n_{• *}}{2} - \sum_{i = 1}^{n_{a}} \binom{n_{i •}}{2} - \sum_{i = 1}^{n_{b}} \binom{n_{• i}}{2}$

where we have subtracted the number of cases of the type I.4.

Type I.7: Only one • and no ties.

For example, cells of the form

$(\begin{matrix} 1 & 1 \\ 2 & • \end{matrix}), (\begin{matrix} 1 & • \\ 2 & 1 \end{matrix}), (\begin{matrix} 1 & 1 \\ • & 2 \end{matrix}), (\begin{matrix} • & 2 \\ 1 & 1 \end{matrix})$

We can have four cases, which are similar to these:

I.7.1. If $a_{i} < a_{j}$ , $b_{i} \neq •$ , and $b_{j} = •$ , then $A_{i j} B_{i j} + A_{j i} B_{j i} = 1 \cdot 0 + (- 1) \cdot 0 = 0$ .
I.7.2. If $a_{i} > a_{j}$ , $b_{i} \neq •$ , and $b_{j} = •$ , then $A_{i j} B_{i j} + A_{j i} B_{j i} = (- 1) \cdot 0 + 1 \cdot 0 = 0$ .

As in Type I.5, let $n_{i *}$ be the number of rows of the form $(i, *)$ (where ∗ can be i) in the matrix $(a b)$ , with $i \in \{1, 2, \dots, n_{a}\}$ and, analogously, let $n_{* i}$ be the number of rows of the form $(*, i)$ (where ∗ can be i) in the matrix $(a b)$ , with $i \in \{1, 2, \dots, n_{b}\}$ . Then, the number of pairs $\{i, j\}$ of this type is given by

$n_{* *} (n_{* •} + n_{• *}) - \sum_{i = 1}^{n_{a}} n_{i *} n_{i •} - \sum_{i = 1}^{n_{b}} n_{* i} n_{• i}$

where we have subtracted the number of cases of the type I.5.

In Table 2 we overview the number of cases for each type of the incomplete case.

To end the proof, we add the contributions for all the cases, complete (C) and incomplete (I), to the sum $\sum_{i = 1}^{n} \sum_{j = 1}^{n} A_{i j} B_{i j}$ and we obtain

(12) $\sum_{i = 1}^{n} \sum_{j = 1}^{n} A_{i j} B_{i j} = 2 n_{n c} - 2 s + 2 n_{t t}$

Now, taking into account that all the cases must amount up to the total number of pairs we have

(13) $\frac{n (n - 1)}{2} = n_{n c} + s + n_{t t} + n_{t u} + N_{i n c}$

where $N_{i n c}$ is the sum of all the cases in Table 2. By plugging $n_{n c} = \frac{n (n - 1)}{2} - s - n_{t t} - n_{t u} - N_{i n c}$ into (12), we finally get

$\sum_{i = 1}^{n} \sum_{j = 1}^{n} A_{i j} B_{i j} = n (n - 1) - 4 s - 2 n_{t u} - 2 N_{i n c}$

where

$N_{i n c} = \binom{n_{• •}}{2} + \binom{n_{* •}}{2} + \binom{n_{• *}}{2} + n_{• •} (n_{* •} + n_{• *} + n_{* *}) + n_{* *} (n_{* •} + n_{• *}) + n_{* •} n_{• *}$

□

In the next example, we illustrate the previous result.

Example 2.

Given the rankings $a = [1, •, 2, •, 3, 2, •, •, •, 1]$ and $b = [2, •, 4, 2, •, 1, 3, 3, •, 2]$ , then $n = 10$ , $n_{• •} = 2$ , $n_{• *} = 3$ , $n_{* •} = 1$ , $n_{* *} = 4$ , $s = 2$ (corresponding to the pairs $\{1, 6\}$ and $\{6, 10\}$ ), $n_{t u} = 1$ (corresponding to the pair $\{3, 6\}$ ), $n_{t t} = 1$ (corresponding to the pair $\{1, 10\}$ ), $n_{a} = 3$ , $n_{b} = 4$ , $n_{1 •} = n_{2 •} = 0$ , $n_{3 •} = 1$ , $n_{• 1} = 0$ , $n_{• 2} = 1$ , $n_{• 3} = 2$ , $n_{• 4} = 0$ , $n_{1 *} = 2$ , $n_{2 *} = 2$ , $n_{3 *} = 0$ , $n_{* 1} = 1$ , $n_{* 2} = 2$ , $n_{* 3} = 0$ , and $n_{* 4} = 1$ .

From the parameters of Table 3, we obtain $N_{i n c} = 39$ . Thus, it is easy to check that $\sum_{i = 1}^{n} \sum_{j = 1}^{n} A_{i j} B_{i j} = n (n - 1) - 4 s - 2 n_{t u} - 2 N_{i n c} = 2$ as stated in Theorem 1.

The number of pairs $\{i, j\}$ is 45, corresponding to the following cells

$(\begin{matrix} 1 & 2 \\ • & • \end{matrix}), (\begin{matrix} 1 & 2 \\ 2 & 4 \end{matrix}), (\begin{matrix} 1 & 2 \\ • & 2 \end{matrix}), (\begin{matrix} 1 & 2 \\ 3 & • \end{matrix}), (\begin{matrix} 1 & 2 \\ 2 & 1 \end{matrix}), (\begin{matrix} 1 & 2 \\ • & 3 \end{matrix}), (\begin{matrix} 1 & 2 \\ • & 3 \end{matrix}), (\begin{matrix} 1 & 2 \\ • & • \end{matrix})$

$(\begin{matrix} 1 & 2 \\ 1 & 2 \end{matrix}), (\begin{matrix} • & • \\ 2 & 4 \end{matrix}), (\begin{matrix} • & • \\ • & 2 \end{matrix}), (\begin{matrix} • & • \\ 3 & • \end{matrix}), (\begin{matrix} • & • \\ 2 & 1 \end{matrix}), (\begin{matrix} • & • \\ • & 3 \end{matrix}), (\begin{matrix} • & • \\ • & 3 \end{matrix}), (\begin{matrix} • & • \\ • & • \end{matrix})$

$(\begin{matrix} • & • \\ 1 & 2 \end{matrix}), (\begin{matrix} 2 & 4 \\ • & 2 \end{matrix}), (\begin{matrix} 2 & 4 \\ 3 & • \end{matrix}), (\begin{matrix} 2 & 4 \\ 2 & 1 \end{matrix}), (\begin{matrix} 2 & 4 \\ • & 3 \end{matrix}), (\begin{matrix} 2 & 4 \\ • & 3 \end{matrix}), (\begin{matrix} 2 & 4 \\ • & • \end{matrix}), (\begin{matrix} 2 & 4 \\ 1 & 2 \end{matrix})$

$(\begin{matrix} • & 2 \\ 3 & • \end{matrix}), (\begin{matrix} • & 2 \\ 2 & 1 \end{matrix}), (\begin{matrix} • & 2 \\ • & 3 \end{matrix}), (\begin{matrix} • & 2 \\ • & 3 \end{matrix}), (\begin{matrix} • & 2 \\ • & • \end{matrix}), (\begin{matrix} • & 2 \\ 1 & 2 \end{matrix}), (\begin{matrix} 3 & • \\ 2 & 1 \end{matrix}), (\begin{matrix} 3 & • \\ • & 3 \end{matrix})$

$(\begin{matrix} 3 & • \\ • & 3 \end{matrix}), (\begin{matrix} 3 & • \\ • & • \end{matrix}), (\begin{matrix} 3 & • \\ 1 & 2 \end{matrix}), (\begin{matrix} 2 & 1 \\ • & 3 \end{matrix}), (\begin{matrix} 2 & 1 \\ • & 3 \end{matrix}), (\begin{matrix} 2 & 1 \\ • & • \end{matrix}), (\begin{matrix} 2 & 1 \\ 1 & 2 \end{matrix}), (\begin{matrix} • & 3 \\ • & 3 \end{matrix})$

$(\begin{matrix} • & 3 \\ • & • \end{matrix}), (\begin{matrix} • & 3 \\ 1 & 2 \end{matrix}), (\begin{matrix} • & 3 \\ • & • \end{matrix}), (\begin{matrix} • & 3 \\ 1 & 2 \end{matrix}), (\begin{matrix} • & • \\ 1 & 2 \end{matrix})$

and the numbers of cases of each type of the incomplete case appearing in Theorem 1 are shown in Table 3.
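The classification of the 45 cells can be reproduced with a short sketch that assigns to each pair the type used in the proof of Theorem 1. It assumes `None` as the encoding of •; the function `cell_type` and its decision order are our own reading of the type definitions, so the resulting counts should be taken as a cross-check of the example rather than as a replacement for Table 3.

```python
from collections import Counter
from itertools import combinations

ABSENT = None  # our stand-in for the symbol •

def cell_type(ai, bi, aj, bj):
    """Type label of the basic cell with rows (a_i, b_i) and (a_j, b_j)."""
    dots_a = (ai is ABSENT) + (aj is ABSENT)
    dots_b = (bi is ABSENT) + (bj is ABSENT)
    dots = dots_a + dots_b
    if dots == 4:
        return "I.1"
    if dots == 3:
        return "I.2"
    if dots == 2:
        if dots_a == 1:
            return "I.3"  # one • in each ranking
        # Both • lie in the same ranking: look at the other ranking.
        tied = (bi == bj) if dots_a == 2 else (ai == aj)
        return "I.4" if tied else "I.6"
    if dots == 1:
        # The ranking without • decides between I.5 (tied) and I.7.
        tied = (bi == bj) if dots_a == 1 else (ai == aj)
        return "I.5" if tied else "I.7"
    da, db = ai - aj, bi - bj
    if da == 0 and db == 0:
        return "C.4"
    if da == 0 or db == 0:
        return "C.3"
    return "C.1" if da * db > 0 else "C.2"

# Rankings of Example 2.
a = [1, ABSENT, 2, ABSENT, 3, 2, ABSENT, ABSENT, ABSENT, 1]
b = [2, ABSENT, 4, 2, ABSENT, 1, 3, 3, ABSENT, 2]

counts = Counter(cell_type(a[i], b[i], a[j], b[j])
                 for i, j in combinations(range(len(a)), 2))
n_inc = sum(v for t, v in counts.items() if t.startswith("I"))
assert n_inc == 39 and sum(counts.values()) == 45
```

The counts of the types C.2, C.3, and C.4 returned by this sketch are the values of s, $n_{t u}$ , and $n_{t t}$ stated in the example.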

Remark 1.

By using (10) and (7) we obtain

(14) $τ_{x} = 1 - \frac{4 (s + \frac{1}{2} n_{t u}) + 2 N_{i n c}}{n (n - 1)}$

which can be thought of as an extension of (1) to the case of two incomplete rankings with ties. This formula is one of the original contributions of this paper. Note that the term $N_{i n c}$ is known since it is given by (11). This formula will be useful in Section 6 to define our measure of correlation for a series of incomplete rankings with ties.
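Formula (14) only needs the three counts s, $n_{t u}$ , and $N_{i n c}$ . As a minimal sketch (the function name is ours), it can be evaluated with the values of Example 2, and it reduces to (1) when there are neither ties nor absent elements.

```python
from fractions import Fraction

def tau_x_from_counts(n, s, n_tu, n_inc):
    """Equation (14); note that 4 (s + n_tu / 2) = 4 s + 2 n_tu."""
    return 1 - Fraction(4 * s + 2 * n_tu + 2 * n_inc, n * (n - 1))

# Counts of Example 2: n = 10, s = 2, n_tu = 1, N_inc = 39.
assert tau_x_from_counts(10, 2, 1, 39) == Fraction(2, 90)

# Complete rankings without ties: (14) becomes 1 - 4 s / (n (n - 1)), i.e., (1).
n, s = 4, 2
assert tau_x_from_counts(n, s, 0, 0) == 1 - Fraction(2 * s, n * (n - 1) // 2)
```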

Remark 2.

For two complete rankings with ties allowed, Equation (10) simplifies to

(15) $\sum_{i = 1}^{n} \sum_{j = 1}^{n} A_{i j} B_{i j} = n (n - 1) - 4 s - 2 n_{t u}$

If we recall the definition of the distance of Kemeny and Snell [2], which depends on a matrix $C (a) = (C_{i j} (a))$ such that

(16) $C_{i j} (a) = \begin{cases} 1 & \text{if element } i \text{ is preferred to element } j \\ - 1 & \text{if element } j \text{ is preferred to element } i \\ 0 & \text{if } i = j, \text{ or if both elements } i \text{ and } j \text{ are tied} \end{cases}$

then, by following a similar procedure as in the proof of Theorem 1, it is easy to show that

(17) $\sum_{i j} | C_{i j} (a) - C_{i j} (b) | = 4 s + 2 n_{t u}$

and by using (15) we get

(18) $\sum_{i = 1}^{n} \sum_{j = 1}^{n} A_{i j} B_{i j} = n (n - 1) - \sum_{i j} | C_{i j} (a) - C_{i j} (b) |$

which is in agreement with the results shown in [27], but we obtain it as a particular case of Theorem 1.
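The relation between the Kemeny and Snell matrices and the counts s and $n_{t u}$ can be checked on a small example. In the sketch below the two matrices of (16) are compared entry by entry with the same index order, and a smaller position is taken to mean "preferred"; the function names and the sample rankings are ours.

```python
from itertools import combinations

def ks_matrix(a):
    """Matrix C(a) of (16) for a complete ranking (ties allowed)."""
    n = len(a)
    return [[0 if a[i] == a[j] else (1 if a[i] < a[j] else -1)
             for j in range(n)] for i in range(n)]

def ks_sum(a, b):
    """Sum over all (i, j) of |C_ij(a) - C_ij(b)|."""
    Ca, Cb = ks_matrix(a), ks_matrix(b)
    n = len(a)
    return sum(abs(Ca[i][j] - Cb[i][j]) for i in range(n) for j in range(n))

def crossings_and_tu(a, b):
    """Return (s, n_tu) for two complete rankings with ties."""
    s = n_tu = 0
    for i, j in combinations(range(len(a)), 2):
        da, db = a[i] - a[j], b[i] - b[j]
        if da * db < 0:
            s += 1
        elif (da == 0) != (db == 0):
            n_tu += 1
    return s, n_tu

# Hypothetical complete rankings with ties.
a = [1, 2, 2, 3]
b = [2, 1, 3, 3]
s, n_tu = crossings_and_tu(a, b)
assert ks_sum(a, b) == 4 * s + 2 * n_tu
```

Each crossing changes two entries by 2 and each tie-to-untie change modifies two entries by 1, which explains the coefficients 4 and 2.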

Remark 3.

The number of elements ranked in both $a$ and $b$ , denoted as $\bar{n}$ in (9), is precisely $n_{* *}$ . Let us check that $N_{i n c}$ given by (11) can be rewritten as

(19) $N_{i n c} = \binom{n}{2} - \binom{\bar{n}}{2}$

To that end, it is needed to use that $n_{* *} = \bar{n}$ and

(20) $n_{• *} + n_{* •} + n_{• •} = n - \bar{n}$

To see this, we first note that

(21) $\begin{matrix} \binom{n_{• •}}{2} + \binom{n_{* •}}{2} + \binom{n_{• *}}{2} & = & \frac{1}{2} [n_{• •}^{2} + n_{* •}^{2} + n_{• *}^{2} - (n_{• •} + n_{* •} + n_{• *})] \\ & = & \frac{1}{2} [n_{• •}^{2} + n_{* •}^{2} + n_{• *}^{2} - n + \bar{n}] \end{matrix}$

Second, we can simplify, by using (20)

(22) $n_{• •} (n_{* •} + n_{• *} + n_{* *}) = n_{• •} (n - n_{• •})$

Third, note that, by using (20),

(23) $n_{* *} (n_{* •} + n_{• *}) = \bar{n} n - {\bar{n}}^{2} - \bar{n} n_{• •}$

Now, by using (21)–(23) we have that $N_{i n c}$ given by (11) becomes

$N_{i n c} = \frac{1}{2} n_{• •}^{2} + \frac{1}{2} {(n_{* •} + n_{• *})}^{2} + \frac{1}{2} (\bar{n} - n) + n_{• •} (n - n_{• •} - \bar{n}) + \bar{n} (n - \bar{n})$

and since

$\frac{1}{2} {(n_{* •} + n_{• *})}^{2} = \frac{1}{2} (n^{2} - 2 n \bar{n} + {\bar{n}}^{2} + 2 \bar{n} n_{• •} - 2 n n_{• •} + n_{• •}^{2})$

we get

$N_{i n c} = \frac{1}{2} (\bar{n} - n) + \frac{1}{2} (n^{2} - 2 n \bar{n} + {\bar{n}}^{2}) + \bar{n} n - {\bar{n}}^{2} = \frac{n (n - 1)}{2} + \frac{\bar{n} - {\bar{n}}^{2}}{2}$

that is to say

$N_{i n c} = \binom{n}{2} - \binom{\bar{n}}{2}$

and the proof is done. Note also that, by using (13), we have: $\binom{\bar{n}}{2} = n_{n c} + s + n_{t t} + n_{t u}$.
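As a quick numerical sanity check of (19), the following minimal sketch sums the pair counts by element type and compares the result with the closed form for every small configuration. It assumes that the count of incomplete pairs decomposes as the terms appearing in (21)–(23) plus the cross term $n_{* •} n_{• *}$; the function names are illustrative and are not part of the paper.

```python
from itertools import product
from math import comb


def n_inc_by_types(n_dd, n_sd, n_ds, n_ss):
    """Count incomplete pairs from the four element types.

    n_dd: elements absent in both rankings,
    n_sd: elements ranked only in a,
    n_ds: elements ranked only in b,
    n_ss: elements ranked in both rankings (this is n_bar).
    Assumed decomposition: pairs inside each "absent" type, plus every
    mixed pair in which at least one element is absent somewhere.
    """
    within = comb(n_dd, 2) + comb(n_sd, 2) + comb(n_ds, 2)   # cf. (21)
    with_dd = n_dd * (n_sd + n_ds + n_ss)                    # cf. (22)
    with_ss = n_ss * (n_sd + n_ds)                           # cf. (23)
    cross = n_sd * n_ds                                      # remaining mixed pairs
    return within + with_dd + with_ss + cross


def n_inc_closed_form(n, n_bar):
    """Closed form (19): all pairs minus pairs of commonly ranked elements."""
    return comb(n, 2) - comb(n_bar, 2)


def check_identity(max_n=8):
    """Return True if both counts agree for every configuration with n <= max_n."""
    for n_dd, n_sd, n_ds, n_ss in product(range(max_n + 1), repeat=4):
        n = n_dd + n_sd + n_ds + n_ss
        if n > max_n:
            continue
        if n_inc_by_types(n_dd, n_sd, n_ds, n_ss) != n_inc_closed_form(n, n_ss):
            return False
    return True
```

The exhaustive loop is only a check on small cases; the algebraic argument above is what proves the identity in general.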

This last remark motivates the next result.

Corollary 1.

Given two vectors $a$ , $b$ representing incomplete rankings of n elements with ties and their corresponding matrices $A = (A_{i j})$ and $B = (B_{i j})$ , it holds that

(24) $\sum_{i = 1}^{n} \sum_{j = 1}^{n} A_{i j} B_{i j} = \bar{n} (\bar{n} - 1) - 4 s - 2 n_{t u}$

where $\bar{n}$ is the number of common ranked elements in both rankings (see (9)), $s$ is the number of crossings, that is, the number of pairs $\{i, j\}$ such that $a_{i} < a_{j}$ and $b_{i} > b_{j}$, or $a_{i} > a_{j}$ and $b_{i} < b_{j}$, and $n_{t u}$ is the number of pairs that are tied in only one ranking (from tie to untie or vice versa), that is, such that $a_{i} = a_{j}$ and $b_{i} \neq b_{j}$, or $a_{i} \neq a_{j}$ and $b_{i} = b_{j}$.

With (24), it is easy to obtain the maximum and minimum of the expression $\sum_{i = 1}^{n} \sum_{j = 1}^{n} A_{i j} B_{i j}$ . When $s = 0$ and $n_{t u} = 0$ we have

$\sum_{i = 1}^{n} \sum_{j = 1}^{n} A_{i j} B_{i j} = \bar{n} (\bar{n} - 1)$

which is the maximum value of $\sum_{i = 1}^{n} \sum_{j = 1}^{n} A_{i j} B_{i j}$. Analogously, by taking $s = \binom{\bar{n}}{2}$, which is the maximum number of crossings, and consequently $n_{t u} = 0$, we obtain from (24)

$\sum_{i = 1}^{n} \sum_{j = 1}^{n} A_{i j} B_{i j} = \bar{n} (\bar{n} - 1) - 4 \binom{\bar{n}}{2} = - \bar{n} (\bar{n} - 1)$

which is the minimum value of $\sum_{i = 1}^{n} \sum_{j = 1}^{n} A_{i j} B_{i j}$. These facts, which are in agreement with the results shown in [12], explain why ${\hat{τ}}_{x}$ defined by (8) takes values in $[- 1, 1]$.

Remark 4.

By using (7) and (24) we obtain

(25) $τ_{x} = \frac{\bar{n} (\bar{n} - 1)}{n (n - 1)} - \frac{4 s + 2 n_{t u}}{n (n - 1)}$

and from (8) and (25) we get

(26) ${\hat{τ}}_{x} = 1 - \frac{4 s + 2 n_{t u}}{\bar{n} (\bar{n} - 1)}$

Remark 5.

As we have pointed out in (3), a distance metric $d (a, b)$ can be transformed into a correlation coefficient $τ (a, b)$ by the formula

(27) $τ (a, b) = 1 - \frac{2 d (a, b)}{d_{m a x} (a, b)}$

Note that in expression (14), when $N_{i n c} \neq 0$ , the quantity $n (n - 1)$ is not the maximum value of the distance metric $d (a, b) = 2 s + n_{t u} + N_{i n c}$ (see Example 6). This problem does not appear with the use of ${\hat{τ}}_{x}$ since, by using (26) we can identify a “distance metric” given by $\hat{d} (a, b) = 2 s + n_{t u}$ and its maximum value is achieved when $s = \bar{n} (\bar{n} - 1) / 2$ (and consequently $n_{t u} = 0$ ) and has the value of

${\hat{d}}_{m a x} = \bar{n} (\bar{n} - 1)$

Therefore, ${\hat{τ}}_{x}$ should be preferred over $τ_{x}$ in terms of normalization (see [12] for other considerations). This fact will be useful for the definition that we will introduce in Section 6.

In the next examples, we illustrate the two previous remarks. Note that when $s = 0$ and $n_{t u} = 0$ then, by (26), ${\hat{τ}}_{x} = 1$ and it is not affected by the presence of • in the rankings. By analogy with (4), we denote the Normalized Mean Strength of a and b as

$N S (a_{1}, a_{2}) = \frac{1 - τ_{x}}{2}, \quad \text{and} \quad \hat{N S} (a_{1}, a_{2}) = \frac{1 - {\hat{τ}}_{x}}{2} .$

Example 3.

Let $a_{1} = [1, 2, 3, •, •, •]$ and $a_{2} = [1, •, 2, 3, •, •]$ . Here $\bar{n} = 2$, $s = 0$ and $n_{t u} = 0$, so that (19) and (25) give: $N_{i n c} (a_{1}, a_{2}) = 14$ , $τ_{x} (a_{1}, a_{2}) = 2 / 30 = 0.0667$ , $N S (a_{1}, a_{2}) = 0.4667$ , ${\hat{τ}}_{x} (a_{1}, a_{2}) = 1$ , and $\hat{N S} (a_{1}, a_{2}) = 0.0$ .

Example 4.

Let $a_{1} = [1, 2, 3, 4, •, •]$ and $a_{2} = [1, •, 2, 3, 4, •]$ . It is easy to obtain: $N_{i n c} (a_{1}, a_{2}) = 12$ , $τ_{x} (a_{1}, a_{2}) = 0.2$ , $N S (a_{1}, a_{2}) = 0.4$ , ${\hat{τ}}_{x} (a_{1}, a_{2}) = 1$ , and $\hat{N S} (a_{1}, a_{2}) = 0.0$ .

The next example shows the results when a ranking is compared to itself and its reverse ranking for the case of complete rankings (note that $τ_{x} = {\hat{τ}}_{x}$ since $\bar{n} = n$ ).

Example 5.

Let $a_{1} = [1, 2, 3, 4, 5, 6]$ and $a_{2} = [6, 5, 4, 3, 2, 1]$ . Then

$\begin{array}{lcc} & a_{1} \to a_{1} & a_{1} \to a_{2} \\ N_{i n c} & 0 & 0 \\ τ_{x} & 1.0 & - 1.0 \\ N S & 0.0 & 1.0 \\ {\hat{τ}}_{x} & 1.0 & - 1.0 \\ \hat{N S} & 0.0 & 1.0 \end{array}$

The next example shows that $τ_{x}$ does not take its limit values when the rankings are incomplete and that ${\hat{τ}}_{x}$ is not defined when there are no elements in common in both rankings.

Example 6.

Let $a_{1} = [1, 2, 3, •, •, •]$ , $a_{2} = [•, •, •, 3, 2, 1]$ , $a_{3} = [1, 2, 3, 4, •, •]$ , and $a_{4} = [•, •, 4, 3, 2, 1]$ . Then

$\begin{array}{lccc} & a_{1} \to a_{1} & a_{1} \to a_{2} & a_{3} \to a_{4} \\ N_{i n c} & 12 & 15 & 14 \\ τ_{x} & 0.2 & 0.0 & - 0.0667 \\ N S & 0.4 & 0.5 & 0.5333 \\ {\hat{τ}}_{x} & 1.0 & \text{not defined} & - 1.0 \\ \hat{N S} & 0.0 & \text{not defined} & 1.0 \end{array}$
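The values in Examples 4 to 6 can be reproduced with a short script based on (19), (25) and (26). This is a minimal sketch, assuming that a ranking is stored as a Python list whose entry `i` is the position of element `i + 1` and that `None` plays the role of •; the function names are illustrative and do not come from the paper.

```python
from itertools import combinations


def pair_counts(a, b):
    """Return (n_bar, s, n_tu) for two rankings; None marks an absent element."""
    common = [i for i in range(len(a)) if a[i] is not None and b[i] is not None]
    s = 0      # crossings
    n_tu = 0   # pairs tied in exactly one of the two rankings
    for i, j in combinations(common, 2):
        da = (a[i] > a[j]) - (a[i] < a[j])   # sign of a_i - a_j
        db = (b[i] > b[j]) - (b[i] < b[j])   # sign of b_i - b_j
        if da * db < 0:
            s += 1
        elif (da == 0) != (db == 0):
            n_tu += 1
    return len(common), s, n_tu


def n_inc(a, b):
    """Number of incomplete pairs, using the closed form (19)."""
    n = len(a)
    n_bar = pair_counts(a, b)[0]
    return n * (n - 1) // 2 - n_bar * (n_bar - 1) // 2


def tau_x(a, b):
    """Coefficient tau_x computed through (25)."""
    n = len(a)
    n_bar, s, n_tu = pair_counts(a, b)
    return (n_bar * (n_bar - 1) - 4 * s - 2 * n_tu) / (n * (n - 1))


def tau_x_hat(a, b):
    """Scaled coefficient (26); None when fewer than two common elements exist."""
    n_bar, s, n_tu = pair_counts(a, b)
    if n_bar < 2:
        return None
    return 1 - (4 * s + 2 * n_tu) / (n_bar * (n_bar - 1))
```

For instance, `tau_x([1, 2, 3, 4, None, None], [1, None, 2, 3, 4, None])` evaluates to 0.2 and `tau_x_hat` returns 1.0 for the same pair, in line with Example 4. Returning `None` when $\bar{n} < 2$ is a design choice of this sketch for the "not defined" case.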

Our main practical result in this paper is the definition of a measure that deals not only with two rankings $a_{1}$ and $a_{2}$ , as we have seen so far, but with a series of incomplete rankings with ties $\{a_{1}, a_{2}, \dots, a_{m}\}$ that, in practical situations, exhibits some kind of time evolution (e.g., a sports ranking during a season where there may be ties or inclusion/elimination of teams, charts of songs ordered on a daily/weekly basis, etc.). In order to define this measure, it will be useful to recall some concepts defined for complete rankings.

5. Treatment of More Than Two Complete Rankings. Known Results

To study the evolution of more than two rankings we will use the concept of Kendall distance defined in [10], where some weights were introduced to measure the changes when passing from one ranking to the next. After that, we will recall how to extend this definition to a series of m complete rankings, as in [13].

5.1. Kendall Distance for Complete Rankings with Penalty Parameters

We recall the definition of Kendall distance with penalty parameters p and q from [10,13].

Definition 1.

Let $a$ and $b$ be two complete rankings with ties of the set $N = \{1, \dots, n\}$ , and penalty parameters $p \in [0, \frac{1}{2}]$ and $q \in [0, \frac{1}{2}]$ . The Kendall distance with penalty parameters p and q is defined as

(28) $K^{(p, q)} (a, b) = \sum_{\{i, j\} \subseteq N} {\bar{K}}_{i, j}^{(p, q)} (a, b)$

where ${\bar{K}}_{i, j}^{(p, q)} (a, b)$ is computed according to the following cases:

Case 1: i and j are tied neither in $a$ nor in $b$ . If they cross their positions when passing from $a$ to $b$ , then ${\bar{K}}_{i, j}^{(p, q)} = 1$ ; otherwise, ${\bar{K}}_{i, j}^{(p, q)} = 0$ .
Case 2: i and j are tied in both $a$ and $b$ . Then ${\bar{K}}_{i, j}^{(p, q)} = q$ .
Case 3: i and j are tied in only one ranking. Then ${\bar{K}}_{i, j}^{(p, q)} = p$ .
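The three cases of Definition 1 translate directly into a loop over pairs. The following minimal sketch assumes complete rankings stored as Python lists in which `a[i]` is the position of element `i + 1`; the function name `kendall_pq` is illustrative and does not come from [10,13].

```python
from itertools import combinations


def kendall_pq(a, b, p=0.5, q=0.0):
    """Kendall distance with penalty parameters p and q (complete rankings)."""
    total = 0.0
    for i, j in combinations(range(len(a)), 2):
        tied_a = a[i] == a[j]
        tied_b = b[i] == b[j]
        if not tied_a and not tied_b:
            # Case 1: penalty 1 only if the pair swaps its relative order.
            if (a[i] < a[j]) != (b[i] < b[j]):
                total += 1
        elif tied_a and tied_b:
            # Case 2: tied in both rankings.
            total += q
        else:
            # Case 3: tied in exactly one ranking.
            total += p
    return total
```

With the default values `p=0.5` and `q=0.0`, a full reversal of three untied elements, `kendall_pq([1, 2, 3], [3, 2, 1])`, gives 3.0, one unit per crossing.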

Remark 6.

The penalty parameters p and q are bounded and take into account the cases where there exist tied elements in $a$ , in $b$ , or in both. For our purposes of measuring competitiveness, it is reasonable to assign $p = 1 / 2$ to pairs that are tied in only one ranking, and $q = 0$ to pairs that are tied in both of them. These assignments are inspired by [10], where it is proved that $p \in [0.5, 1]$ is required for $K^{(p, 0)}$ to be a metric.

Remark 7.

Note that, by using the notation introduced in Theorem 1, it is easy to see that

$K^{(p, q)} (a, b) = s + p n_{t u} + q n_{t t}$

where $n_{t t}$ is the number of pairs $\{i, j\}$ that go from tie to tie. Therefore, by using (14) with $N_{i n c} = 0$ we get

(29) $τ_{x} (a, b) = 1 - \frac{4 K^{(0.5, 0)} (a, b)}{n (n - 1)}$

that is, once more, a relation of the form (3). We see here another consequence of Theorem 1: it opens the possibility of defining new metrics based on putting penalties to the cases $n_{• •}$ , $n_{• *}$ , etc. since it gives an explicit expression on these cases.

With the previous definitions, we can deal with the general case of the study of a series of complete rankings. We do this in the next section.

5.2. Series of Complete Rankings with Ties

In [13], it was shown how to extend Definition 1 to m complete rankings with ties in a natural way. We recall these definitions here because they will be extended in Section 6 to a series of incomplete rankings.

Definition 2.

Given m complete rankings with ties $a_{1}, a_{2}, \dots a_{m}$ of n elements, we define the evolutive Kendall distance with penalty parameters p and q as

(30) $K_{e v}^{(p, q)} (a_{1}, a_{2}, \dots, a_{m}) = \sum_{i = 1}^{m - 1} K^{(p, q)} (a_{i}, a_{i + 1}) .$

When handling m rankings it is natural to include a new case (see [13]) that consists of a series of ties between a crossing (see Example 7 further on). Thus it is convenient to define a new case in the definition of $K_{e v}^{(p, q)} (a_{1}, a_{2}, \dots, a_{m})$ according to the following rule.

Definition 3.

Given m complete rankings with ties $a_{1}, a_{2}, \dots a_{m}$ of n elements, we define the crossing after ties coefficient ${\bar{K}}_{i, j}^{c a t} (a_{1}, a_{2}, \dots, a_{m})$ following the rule

Case 4. If there exists a maximal set of rankings $a_{t_{1}}, \dots, a_{t_{k}}$ such that for each $ℓ = 1, \dots, k$ the pair $\{i, j\}$ is not tied in $a_{t_{ℓ}}$ , but is tied in $a_{t_{ℓ} + 1}, a_{t_{ℓ} + 2}, \dots, a_{t_{ℓ} + s}$ , with $s \geq 1$ , it is not tied in $a_{t_{ℓ} + s + 1}$ and, moreover, $\{i, j\}$ exchange their relative positions between $a_{t_{ℓ}}$ and $a_{t_{ℓ} + s + 1}$ . In this case ${\bar{K}}_{i, j}^{c a t} (a_{1}, a_{2}, \dots, a_{m}) = k$ , where k is the number of rankings in the maximal set of rankings $a_{t_{1}}, \dots, a_{t_{k}}$ verifying the aforementioned property.

Example 7.

Given the rankings with ties

$\begin{array}{c} r_{1} & r_{2} & r_{3} & r_{4} & r_{5} & r_{6} \\ 1 & 1, 2 & 1, 2 & 2 & 1, 2 & 1 \\ 2 & 3 & 3 & 1 & 3 & 2 \\ 3 & 4 & 4 & 3 & 4 & 3 \\ 4 & 4 & 4 & 4 \end{array}$

the corresponding $a_{i}$ are

$\begin{array}{c} a_{1} & a_{2} & a_{3} & a_{4} & a_{5} & a_{6} \\ 1 & 1 & 1 & 2 & 1 & 1 \\ 2 & 1 & 1 & 1 & 1 & 2 \\ 3 & 2 & 2 & 2 & 2 & 3 \\ 4 & 3 & 3 & 3 & 3 & 4 \end{array}$

we have that the only nonzero crossing after ties coefficient is

${\bar{K}}_{1, 2}^{c a t} (a_{1}, a_{2}, \dots, a_{6}) = 2$

since we have the appearance of the two series

$\begin{array}{c} a_{1} & a_{2} & a_{3} & a_{4} \\ 1 & 1 & 1 & 2 \\ 2 & 1 & 1 & 1 \end{array} and \begin{array}{c} a_{4} & a_{5} & a_{6} \\ 2 & 1 & 1 \\ 1 & 1 & 2 \end{array}$

which show a series of ties between a crossing of the pair $\{i, j\} = \{1, 2\}$ .
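Case 4 can be detected for a given pair by scanning, along the series, the sign of the difference between the positions of the two elements: a crossing after ties occurs each time a nonzero sign is followed by one or more ties and then by the opposite sign. The sketch below is one possible reading of Definition 3 for complete rankings; the function name `crossing_after_ties` is hypothetical, and indices are 0-based.

```python
def crossing_after_ties(series, i, j):
    """Count the crossings after ties of the pair (i, j) along a series.

    `series` is a list of complete rankings; each ranking is a list of
    positions, so series[t][i] is the position of element i at time t.
    """
    count = 0
    last = 0   # last nonzero sign seen (0 means none yet)
    ties = 0   # number of consecutive ties since `last`
    for a in series:
        sg = (a[i] > a[j]) - (a[i] < a[j])
        if sg == 0:
            ties += 1
            continue
        # Untied ranking: check whether it closes a "sign, ties, opposite sign" run.
        if last == -sg and ties >= 1:
            count += 1
        last, ties = sg, 0
    return count


# Vectors a_1, ..., a_6 as listed in Example 7.
example7 = [
    [1, 2, 3, 4],
    [1, 1, 2, 3],
    [1, 1, 2, 3],
    [2, 1, 2, 3],
    [1, 1, 2, 3],
    [1, 2, 3, 4],
]
```

With these vectors, `crossing_after_ties(example7, 0, 1)` gives 2 for the pair $\{1, 2\}$, matching the value stated in Example 7.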

By including the cases given by Definition 3 in the sum defined in Definition 2, in [13] a corrected evolutive distance in the following form is defined.

Definition 4.

Given m complete rankings with ties $a_{1}, a_{2}, \dots a_{m}$ of n elements we define the corrected evolutive Kendall distance with penalty parameters p and q as follows:

(31) $K_{c e v}^{(p, q)} (a_{1}, \dots, a_{m}) = K_{e v}^{(p, q)} (a_{1}, \dots, a_{m}) + \sum_{{i, j}} {\bar{K}}_{i, j}^{c a t} (a_{1}, \dots, a_{m}),$

where the summation is over the pairs ${i, j}$ that verify Case 4 in Definition 3.

Following the same argument as in [13], it is easy to show that

(32) $max [K_{c e v}^{(0.5, 0)} (a_{1}, \dots, a_{m})] = \frac{1}{2} (m - 1) n (n - 1)$

Now, in analogy with (3) and (14), the Kendall’s evolutive coefficient $τ_{e v}$ for a series of m complete rankings with ties can be defined as

(33) $τ_{e v} (a_{1}, a_{2}, \dots a_{m}) = 1 - \frac{4 K_{c e v}^{(0.5, 0)} (a_{1}, \dots, a_{m})}{(m - 1) n (n - 1)} \in [- 1, 1]$

With these previous definitions we can present the new coefficients for incomplete rankings with ties.

6. New Coefficients for Series of Incomplete Rankings with Ties

Given a series $\{a_{1}, a_{2}, \dots, a_{m}\}$ of incomplete rankings with ties, Definitions 1–4 can be applied directly to each pair of rankings $a_{i}$ and $a_{j}$ by assuming that there is no penalty for absent elements (regarding Definitions 1 and 2) and that these absent elements (denoted by •) contribute neither to ties nor to crossings after ties (regarding Definitions 3 and 4). That is, those definitions are applied as they are, ignoring the effect of the absent elements.

Keeping this in mind and, in analogy with (14), given a series of m incomplete rankings we could include the effect of the incomplete cases by defining

(34) $τ_{e v}^{*} = 1 - \frac{2 d_{e v o l} (a_{1}, a_{2}, \dots, a_{m})}{max (d_{e v o l})}$

with

$d_{e v o l} (a_{1}, a_{2}, \dots, a_{m}) = 2 K_{c e v}^{(p = 0.5, q = 0)} (a_{1}, \dots, a_{m}) + \sum_{i = 1}^{m - 1} N_{i n c} (a_{i}, a_{i + 1})$

where $N_{i n c} (a_{i}, a_{i + 1})$ is the number of incomplete cases when passing from ranking $a_{i}$ to ranking $a_{i + 1}$ . Note that the explicit form of $N_{i n c} (a_{i}, a_{i + 1})$ for each pair of consecutive rankings is given by (11) in Theorem 1 and Corollary 1. The value of $\max (d_{e v o l})$ depends on $N_{i n c} (a_{i}, a_{i + 1})$ . We have seen in Remark 5 that the definition of $τ_{x}$ corresponds to taking $d_{m a x} (a, b)$ as the value corresponding to $N_{i n c} = 0$ (and that is the reason why $τ_{x}$ is not well normalized). We can translate here the same reasoning and formalize it in the next definition.

Definition 5.

Given m incomplete rankings with ties $a_{1}, a_{2}, \dots a_{m}$ of n elements we define the corrected evolutive Kendall’s τ coefficient for the series with penalty parameters $p = 0.5$ and $q = 0$ as follows:

(35) $τ_{e v}^{•} = 1 - \frac{4 K_{c e v}^{(0.5, 0)} (a_{1}, \dots, a_{m}) + 2 \sum_{i = 1}^{m - 1} N_{i n c} (a_{i}, a_{i + 1})}{(m - 1) n (n - 1)}$

where $K_{c e v}^{(0.5, 0)} (a_{1}, \dots, a_{m})$ is given by Definition 4, and $N_{i n c} (a_{i}, a_{i + 1})$ is given by (11).

Here we have the same drawback as we showed for $τ_{x}$ in Remark 5: $τ_{e v}^{•}$ is not properly normalized and it cannot get the values $\pm 1$ if any $N_{i n c} (a_{i}, a_{i + 1}) \neq 0$ . Therefore, in analogy with (26), we introduce a new coefficient in the following definition.

Definition 6.

Given m incomplete rankings with ties $a_{1}, a_{2}, \dots a_{m}$ of n elements, such that ${\bar{n}}_{i, i + 1} > 1$ , for all $i = 1, 2, \dots, m - 1$ , we define the scaled corrected evolutive Kendall’s τ coefficient for the series with penalty parameters $p = 0.5$ and $q = 0$ as follows:

(36) ${\hat{τ}}_{e v}^{•} = 1 - \frac{2 K_{c e v}^{(0.5, 0)} (a_{1}, \dots, a_{m})}{\max (K_{c e v}^{(0.5, 0)} (a_{1}, \dots, a_{m}))}$

where $K_{c e v}^{(0.5, 0)} (a_{1}, \dots, a_{m})$ is given by Definition 4 and with

(37) $\max [K_{c e v}^{(0.5, 0)} (a_{1}, \dots, a_{m})] = \frac{1}{2} \sum_{i = 1}^{m - 1} {\bar{n}}_{i, i + 1} ({\bar{n}}_{i, i + 1} - 1)$

where ${\bar{n}}_{i, i + 1}$ denotes the common ranked elements between $a_{i}$ and $a_{i + 1}$ .

Note that the maximum in (37) must be nonzero, that is, we need ${\bar{n}}_{i, i + 1} > 1$ for some i.

Remark 8.

In the limit case of m complete rankings with ties, note that Equation (37) collapses to Equation (32). Note also that ${\hat{τ}}_{e v}^{•}$ is affected by the crossings, the passes from tie to untie (or vice versa), and the long crossings (crossings after ties, ${\bar{K}}_{i, j}^{c a t} (a_{1}, a_{2}, \dots, a_{m})$ , given by Definition 3), due to the term $2 K_{c e v}^{(p = 0.5, q = 0)} (a_{1}, \dots, a_{m})$ . The effect of the elements that are out of the rankings appears explicitly through the term ${\bar{n}}_{i, i + 1}$ , which takes into account the position neither in $a_{i}$ nor in $a_{i + 1}$ . ${\hat{τ}}_{e v}^{•}$ is well normalized, that is, ${\hat{τ}}_{e v}^{•} \in [- 1, 1]$ .

Example 8.

Let $n = 6$ . Given the series of incomplete rankings with ties $a_{1} = [1, 2, 3, 4, 5, 6]$ , $a_{2} = [1, 2, 3, •, •, •]$ , and $a_{3} = [1, 2, •, •, •, •]$ , an easy computation shows $K_{c e v} (a_{1}, a_{2}, a_{3}) = 0$ and thus ${\hat{τ}}_{e v}^{•} = 1$ . Note that $τ_{e v}^{•} = 0.1333$ .

Example 9.

Let $n = 6$ . Given the series of incomplete rankings with ties $a_{1} = [1, 2, 3, 4, 5, 6]$ , $a_{2} = [3, 2, 1, •, •, •]$ , and $a_{3} = [1, 2, •, •, •, •]$ , it is easy to obtain that $K_{c e v} (a_{1}, a_{2}, a_{3}) = 4 = max (K_{c e v})$ and thus ${\hat{τ}}_{e v}^{•} = - 1$ . Note that $τ_{e v}^{•} = - 0.1333$ .
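Definitions 4 to 6 can be put together in a few lines to reproduce Examples 8 and 9. The sketch below makes several assumptions that are not spelled out in the text: rankings are Python lists of positions with `None` standing for •, only elements present in both consecutive rankings contribute to crossings and ties, and the absence of either element of a pair interrupts a run of ties when looking for crossings after ties. All function names are illustrative.

```python
from itertools import combinations


def sign(x, y):
    return (x > y) - (x < y)


def pair_terms(a, b):
    """Return (n_bar, s, n_tu) for two consecutive rankings."""
    common = [i for i in range(len(a)) if a[i] is not None and b[i] is not None]
    s = n_tu = 0
    for i, j in combinations(common, 2):
        da, db = sign(a[i], a[j]), sign(b[i], b[j])
        if da * db < 0:
            s += 1                      # crossing
        elif (da == 0) != (db == 0):
            n_tu += 1                   # tie to untie or vice versa
    return len(common), s, n_tu


def cat(series, i, j):
    """Crossings after ties of the pair (i, j); absences reset the scan."""
    count, last, ties = 0, 0, 0
    for a in series:
        if a[i] is None or a[j] is None:
            last, ties = 0, 0           # assumption: an absence breaks the run
            continue
        sg = sign(a[i], a[j])
        if sg == 0:
            ties += 1
        else:
            if last == -sg and ties >= 1:
                count += 1
            last, ties = sg, 0
    return count


def evolutive_coefficients(series):
    """Return (K_cev, tau_ev_dot, tau_ev_dot_hat) following (31), (35), (36)."""
    m, n = len(series), len(series[0])
    k_ev, n_inc_sum, k_max = 0.0, 0, 0.0
    for a, b in zip(series, series[1:]):
        n_bar, s, n_tu = pair_terms(a, b)
        k_ev += s + 0.5 * n_tu                                      # p = 0.5, q = 0
        n_inc_sum += n * (n - 1) // 2 - n_bar * (n_bar - 1) // 2    # (19)
        k_max += 0.5 * n_bar * (n_bar - 1)                          # (37)
    k_cev = k_ev + sum(cat(series, i, j) for i, j in combinations(range(n), 2))
    tau = 1 - (4 * k_cev + 2 * n_inc_sum) / ((m - 1) * n * (n - 1))
    tau_hat = 1 - 2 * k_cev / k_max if k_max > 0 else None
    return k_cev, tau, tau_hat
```

With the data of Example 9, `evolutive_coefficients([[1, 2, 3, 4, 5, 6], [3, 2, 1, None, None, None], [1, 2, None, None, None, None]])` returns $K_{c e v} = 4$, $τ_{e v}^{•} \approx - 0.1333$ and ${\hat{τ}}_{e v}^{•} = - 1$.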

As we have seen in the above definitions, the importance of Theorem 1 and Corollary 1 consists of giving the explicit formula for $N_{i n c} (a_{i}, a_{i + 1})$ to allow for the computation of the coefficient ${\hat{τ}}_{e v}^{•}$ for the series of m incomplete rankings with ties. Note that ${\hat{τ}}_{e v}^{•} \in [- 1, 1]$ . For the particular case when the rankings are complete, we have $N_{i n c} (a_{i}, a_{i + 1}) = 0$ for all the pairs of consecutive rankings and ${\bar{n}}_{i, i + 1} = n$ , for $i = 1, 2, \dots, m - 1$ , and therefore Equation (36) reduces to the complete case given by Equation (33), that is, ${\hat{τ}}_{e v}^{•}$ collapses to $τ_{e v}$ .

Another contribution of Theorem 1 and Definition 6 is that they are useful to describe the behavior of the series of m rankings in terms of a competitivity graph. We can define a weighted graph for each one of the interactions between the elements when passing from $a_{i}$ to $a_{i + 1}$ : crossings, passing from tie to untie (or vice versa), and crossing after ties. Moreover, for each kind of graph, we can add the contributions of all the pairs of consecutive rankings to obtain a projected graph for any interaction (crossings, passing from tie to untie (or vice versa), and crossing after ties). The procedure is the following. First, we construct an undirected graph for each pair of rankings $a_{k}, a_{k + 1}$ by identifying each element i as a node and defining an edge between i and j by the rule: there is an edge connecting $\{i, j\}$ with weight ${\bar{K}}_{i, j}^{(p, q)} (a_{k}, a_{k + 1})$ when this weight is nonzero. By adding the $m - 1$ undirected graphs we obtain a projected graph with a total sum of weights $K_{e v}^{(p = 0.5, q = 0)} (a_{1}, \dots, a_{m})$ . By adding the crossing after ties term to the projected graph we have all the ingredients appearing in Definition 6. We show this procedure by using the next example with $m = 6$ and $n = 8$ .

Example 10.

Given the series of incomplete rankings with ties

$\begin{array}{cccccc} r_{1} & r_{2} & r_{3} & r_{4} & r_{5} & r_{6} \\ 5 & 2 & 4 & 6 & 2 & 1 \\ 7 & 1 & 8 & 1, 4 & 1, 4 & 5 \\ 3 & 8 & 3 & 3 & 6, 7 & 8 \\ 8 & 3 & 2, 6 & 8 & 5 & 3 \\ 1, 4 & 5, 7 & 5, 7 & 2 & 3 & 4 \\ & 4 & 1 & 7 & 8 & \end{array}$

the corresponding $a_{i}$ are

$\begin{array}{c} a_{1} & a_{2} & a_{3} & a_{4} & a_{5} & a_{6} \\ 5 & 2 & 6 & 2 & 2 & 1 \\ • & 1 & 4 & 5 & 1 & • \\ 3 & 4 & 3 & 3 & 5 & 4 \\ 5 & 6 & 1 & 2 & 2 & 5 \\ 1 & 5 & 5 & • & 4 & 2 \\ • & • & 4 & 1 & 3 & • \\ 2 & 5 & 5 & 6 & 3 & • \\ 4 & 3 & 2 & 4 & 6 & 3 \end{array}$

In this example we have $n = 8$ , and an easy computation leads to the parameters shown in Table 4. For each pair of consecutive rankings it is easy to compute the parameters defined in Theorem 1: $n_{• •}$ , $n_{• *}$ , $n_{* •}$ , $n_{* *}$ , s, $n_{t u}$ , and $n_{t t}$ . Then, by using Equation (11) in Theorem 1 we can obtain, for any pair of rankings, the value $N_{i n c}$ . $\bar{n}$ is the number of common elements, given by (9). The coefficient $τ_{e v}^{•}$ is given by (35), and the coefficient ${\hat{τ}}_{e v}^{•}$ is given by (36). In analogy with (4) we can define the corresponding normalized mean strengths given by

(38) $N S^{•} = \frac{(1 - τ_{e v}^{•})}{2}$

and

(39) ${\hat{N S}}^{•} = \frac{(1 - {\hat{τ}}_{e v}^{•})}{2}$

Finally, in Table 4 we include the coefficients $τ_{x}$ and ${\hat{τ}}_{x}$ given by (7) and (8), respectively. These last coefficients are included to show that our new coefficients $τ_{e v}^{•}$ and ${\hat{τ}}_{e v}^{•}$ reduce to them when only a pair of rankings are considered.

To compute our new coefficients $τ_{e v}^{•}$ and ${\hat{τ}}_{e v}^{•}$ for the whole series of rankings $a_{1}$ to $a_{6}$ we need some previous parameters. First, we need the value

$\sum_{i = 1}^{5} N_{i n c} (a_{i}, a_{i + 1}) = 52$

To compute $K_{c e v}^{(p = 0.5, q = 0)} (a_{1}, \dots, a_{6})$ , given by (31), we first need the value of the crossing after ties coefficients ${\bar{K}}_{i, j}^{c a t} (a_{1}, \dots, a_{6})$ , given by Definition 3. Note that the unique long crossing occurs for the pair $\{1, 4\}$ : the elements tagged as 1 and 4 are such that 4 is above 1 in $r_{3}$ , both elements are tied in rankings $r_{4}$ and $r_{5}$ , and, finally, 4 is below 1 in ranking $r_{6}$ . Note, for example, that the pair $\{5, 7\}$ does not satisfy the conditions of crossing after ties. Therefore, the only term that contributes to $\sum_{\{i, j\}} {\bar{K}}_{i, j}^{c a t}$ is

${\bar{K}}_{1, 4}^{c a t} (a_{1}, \dots, a_{6}) = 1$

With respect to $K_{e v}^{(p = 0.5, q = 0)} (a_{1}, \dots, a_{6})$ , given by (30), we need to compute the terms ${\bar{K}}_{i, j}^{(p, q)} (a_{k}, a_{k + 1})$ , given by (28), for every pair of consecutive rankings. A detailed computation shows that, in this example, we have 42 crossings and 6 cases of tie to untie or vice versa. The precise pairs of elements that contribute to these cases are shown in the corresponding projected weighted graphs in Figure 1. The crossing after ties case is represented in Figure 2.
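The projected weighted graphs can be assembled as plain edge-weight dictionaries. The following sketch assumes the same conventions as before (lists of positions, `None` for •, only elements present in both consecutive rankings interact) and weights each tie change by $p = 0.5$; the function name and the dictionary representation are choices made for illustration, not the implementation used for the figures.

```python
from collections import defaultdict
from itertools import combinations


def projected_graphs(series, p=0.5):
    """Return two dicts {(i, j): weight} with 1-based labels and i < j.

    The first dict accumulates crossings, the second accumulates the
    tie-to-untie (or untie-to-tie) changes, each weighted by p.
    """
    crossings = defaultdict(float)
    tie_changes = defaultdict(float)
    for a, b in zip(series, series[1:]):
        common = [i for i in range(len(a)) if a[i] is not None and b[i] is not None]
        for i, j in combinations(common, 2):
            da = (a[i] > a[j]) - (a[i] < a[j])
            db = (b[i] > b[j]) - (b[i] < b[j])
            if da * db < 0:
                crossings[(i + 1, j + 1)] += 1.0
            elif (da == 0) != (db == 0):
                tie_changes[(i + 1, j + 1)] += p
    return dict(crossings), dict(tie_changes)


N = None
# Vectors a_1, ..., a_6 of Example 10 (one list per ranking, elements 1 to 8).
example10 = [
    [5, N, 3, 5, 1, N, 2, 4],
    [2, 1, 4, 6, 5, N, 5, 3],
    [6, 4, 3, 1, 5, 4, 5, 2],
    [2, 5, 3, 2, N, 1, 6, 4],
    [2, 1, 5, 2, 4, 3, 3, 6],
    [1, N, 4, 5, 2, N, N, 3],
]
```

For `example10`, the weights of the crossing graph add up to 42 and those of the tie-change graph add up to 3 (six changes weighted by 0.5), in agreement with the counts stated in the text.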

Therefore we have all the ingredients to compute $K_{c e v}^{(p = 0.5, q = 0)}$ . That is

$\begin{matrix} K_{c e v}^{(p = 0.5, q = 0)} (a_{1}, \dots, a_{6}) & = & K_{e v}^{(p = 0.5, q = 0)} (a_{1}, \dots, a_{6}) + \sum_{\{i, j\}} {\bar{K}}_{i, j}^{c a t} (a_{1}, \dots, a_{6}) \\ & = & \sum_{i = 1}^{5} K^{(p, q)} (a_{i}, a_{i + 1}) + \sum_{\{i, j\}} {\bar{K}}_{i, j}^{c a t} (a_{1}, \dots, a_{6}) \end{matrix}$

and, by Remark 7, we know that

$K^{(p, q)} (a, b) = s + p n_{t u} + q n_{t t}$

Therefore, we have

$\begin{matrix} K_{c e v}^{(p = 0.5, q = 0)} (a_{1}, \dots, a_{6}) & = & (9 + 12 + 8 + 9 + 4) + 0.5 (2 + 0 + 2 + 1 + 1) + {\bar{K}}_{1, 4}^{c a t} (a_{1}, \dots, a_{6}) \\ & = & 42 + 3 + 1 = 46 . \end{matrix}$

By using (35), we obtain

$τ_{e v}^{•} = 1 - \frac{4 \cdot 46 + 2 \cdot 52}{(6 - 1) 8 \cdot 7} = 1 - 1.0286 = - 0.0286$

that corresponds to an equivalent normalized mean strength

$N S^{•} = \frac{(1 - τ_{e v}^{•})}{2} = 0.5143$

Finally, regarding ${\hat{τ}}_{e v}^{•}$ , we have

${\hat{τ}}_{e v}^{•} = 1 - \frac{2 \cdot 46}{\frac{1}{2} (6 \cdot 5 + 7 \cdot 6 + 7 \cdot 6 + 7 \cdot 6 + 5 \cdot 4)} = 1 - \frac{4 \cdot 46}{176} = - 0.0455$

that corresponds to

${\hat{N S}}^{•} = 0.5227 .$

All in all, we conclude that ${\hat{τ}}_{e v}^{•}$ is a proper coefficient for the evaluation of m incomplete rankings with ties and can be considered as a natural extension of the coefficient ${\hat{τ}}_{x}$ presented in [12]. In the next section we apply the new coefficients $τ_{e v}^{•}$ and ${\hat{τ}}_{e v}^{•}$ to real rankings appearing on Spotify charts.

7. Results

Spotify is one of the major music streaming services worldwide, with 299 million monthly active users, as of July 2020 [28]. The company Spotify Technology S.A. has been listed on the New York Stock Exchange since 2018. As of September 2020, the company offers a catalog of 60 million tracks and operates in 92 countries from Albania to Vietnam [29]. Spotify divides the monthly active users into four regions [30]: Europe (35%), North America (26%), Latin America (22%) and rest of the world (17%). The app is available on several devices, such as computers, smartphones, tablets, wearable devices, etc. The users can choose between a free service (called Freemium or Ad-Supported) or a Premium service. In any case, the user can listen by streaming any song of the catalog (that is, the user does not own the song’s digital file, but can listen to it). It is accepted that music streaming services have transformed the entire music market—see [31]—and they have evolved very fast, changing their services and capabilities. For example, Spotify has signed some partnerships with Microsoft [32], Sony [33] and Facebook [34] among other big companies. There exists a large amount of literature about Spotify, but it is mainly focused on Economics and Music. To the best of our knowledge, a small number of papers are devoted to the mathematical aspects of the rankings produced by Spotify. Among these papers, we have [35,36]. A paper that studies the relationship between personality and type of music is [37]. See [38] for more details about Spotify.

Like other services on the Internet, Spotify provides some chart lists (song rankings) based on the number of streams on the platform. The Top 200 (see [39,40,41]), which is one of the topics of our study, belongs to this kind of ranking. Another ranking that we are interested in is called Viral 50, which is an evolution of the original Social 50 ranking (see [42,43,44]) that incorporated into the song chart the effect of the social sharing of a track by Spotify users. This sharing included platforms such as Facebook and Twitter. It is not completely clear to us how this ranking is computed, but it aims to gather fresh songs that acquire high impact on social networks through new release promotions, special appearances on TV shows, music festivals, tours, etc. (see [45] for an example of how a viral song became a Top 100 song in 2013).

Due to the situation caused by the COVID-19 pandemic, the live music business suffered some setbacks, such as festivals being cancelled worldwide, a reduction in public-performance licensing, and other related factors (see [46]). As an example, Warner Music Group Corp showed a total revenue fall of 1.7% in the first quarter of 2020 compared to the first quarter of 2019 [47]. Spotify also reported some impact on its business, but it seemed that consumption recovered, and monthly active users increased faster in the first quarter of 2020 than in the same period of 2019 [30]. Some perturbations in Spotify streaming were also reported by the music analytics company Chartmetric, which observed a change in the type of consumption of Spotify streams by music genre in the period from 3 March 2020 to 9 April 2020 and concluded that it appeared to be a pandemic-induced lifestyle change [48].

With regard to the Viral 50, it is reasonable to think that the fact that many artists (such as Lady Gaga, Alicia Keys, and Cardi B [46]) postponed big releases may have decreased the movements in these charts.

7.1. Method to Convert Spotify Lists into Incomplete Rankings

Both the Spotify Top 200 and Viral 50 lists can be treated as incomplete rankings, since some elements (songs) leave the list and others (new songs) enter it. Let us call any of these rankings a Top k ranking. In order to handle these Top k rankings, our methodology consists of the following steps:

1. Select a set of m lists $\{v_{1}, v_{2}, \dots, v_{m}\}$ with k entries in each $v_{i}$ .
2. Denote as n the number of different songs that appear on these m lists. We tag these songs from 1 to n, following the order in which they first appear, reading the lists from the first to the last one, and each list from top to bottom. Denote by $t_{i}$ the tagged version of $v_{i}$ , for $i = 1, 2, \dots, m$ , including all the n songs.
3. Denote by $r_{1}$ a vector with entries from 1 to n. The first k values correspond to the elements in $v_{1}$ .
4. Construct the rankings $r_{i}$ for $i = 2, \dots, m$ , in the following form:
- (a) The first k entries of $r_{i}$ are copied from $t_{i}$ ;
- (b) The rest of the entries form a vector $s_{i}$ and come from the elements that quit from $t_{i - 1}$ plus the elements that, being in $s_{i - 1}$ , are not included in $t_{i}$ .
These $n - k$ elements preserve their relative order. This order is not important since these elements are not included in the Top k ranking $t_{i}$ .
5. From each $t_{i}$ , we construct the corresponding incomplete ranking $a_{i}$ given by (5).

Example 11.

Let us consider three Top 4 lists ( $v_{1}, v_{2}, v_{3}$ ) and construct the corresponding three rankings ( $a_{1}, a_{2}, a_{3}$ ). Here we have $m = 3$ and $k = 4$ .

$|\begin{matrix} \begin{matrix} v_{1} & v_{2} & v_{3} \end{matrix} \\ \begin{matrix} A & B & F \\ B & C & C \\ C & E & B \\ D & A & E \end{matrix} \end{matrix}| \to \begin{matrix} \begin{matrix} t_{1} & t_{2} & t_{3} \end{matrix} \\ \begin{matrix} s_{1} & s_{2} & s_{3} \end{matrix} \end{matrix} \begin{matrix} \begin{matrix} r_{1} & r_{2} & r_{3} \end{matrix} \\ \{\begin{matrix} \begin{matrix} \begin{matrix} 1 & 2 & 6 \\ 2 & 3 & 3 \\ 3 & 5 & 2 \\ 4 & 1 & 5 \end{matrix} \end{matrix} \end{matrix}\} \\ \{\begin{matrix} \begin{matrix} 5 & 4 & 1 \end{matrix} \\ \begin{matrix} 6 & 6 & 4 \end{matrix} \end{matrix}\} \end{matrix} ⟶ \begin{matrix} a_{1} & a_{2} & a_{3} \\ 1 & 4 & • \\ 2 & 1 & 3 \\ 3 & 2 & 2 \\ 4 & • & • \\ • & 3 & 4 \\ • & • & 1 \end{matrix}$

We have denoted as $s_{i}$ the elements beyond position k in each ranking $r_{i}$ . The rankings $a_{i}$ are constructed by looking at $r_{i}$ from positions 1 to 4. Since the elements that do not belong to $t_{i}$ are in $s_{i}$ , we tag them as •.
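The conversion from Top k lists to incomplete rankings can be sketched in a few lines. This is a simplified version of the methodology above: since the order of the elements outside the Top k is stated to be unimportant, the sketch skips the intermediate vectors $r_{i}$ and $s_{i}$ and builds the $a_{i}$ directly from the tags, with `None` standing for •. The function name is hypothetical.

```python
def lists_to_rankings(lists):
    """Convert m Top-k lists into m incomplete rankings over n tagged items.

    Returns (tags, rankings): `tags` maps each item to its tag 1..n in
    order of first appearance; rankings[t][tag - 1] is the position of
    that item in list t, or None if the item is absent from list t.
    """
    tags = {}
    for v in lists:                      # first list first, top to bottom
        for item in v:
            tags.setdefault(item, len(tags) + 1)
    n = len(tags)
    rankings = []
    for v in lists:
        a = [None] * n                   # absent unless found in the list
        for position, item in enumerate(v, start=1):
            a[tags[item] - 1] = position
        rankings.append(a)
    return tags, rankings


# The three Top 4 lists of Example 11.
v1 = ["A", "B", "C", "D"]
v2 = ["B", "C", "E", "A"]
v3 = ["F", "C", "B", "E"]
tags, rankings = lists_to_rankings([v1, v2, v3])
```

For these lists the sketch gives $n = 6$ and the rankings `[1, 2, 3, 4, None, None]`, `[4, 1, 2, None, 3, None]` and `[None, 3, 2, None, 4, 1]`, which are the vectors $a_{1}, a_{2}, a_{3}$ of Example 11.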

7.2. Comparison of Two Series of Top 200 Rankings

From the site [49] we downloaded the series of Top 200 (Global) rankings corresponding to the following time intervals:

2019 Series: 18 weekly rankings ranging from 28 December 2018 to 3 May 2019.
2020 Series: 18 weekly rankings ranging from 27 December 2019 to 1 May 2020.

The term Global means that the charts were produced from streaming on Spotify from all over the world. By using the methodology explained in the previous section, we convert the 18 downloaded rankings to a series of incomplete rankings (with no ties) $a_{1}, \dots, a_{18}$ , and we compute our parameters. This is repeated for each considered year. The results are shown in Table 5.

In Table 5 we have denoted by $\langle {\bar{n}}_{i, i + 1} \rangle$ the average of ${\bar{n}}_{i, i + 1}$ for $i = 1, 2, \dots, 17$ , that is, the mean number of common elements in each pair of consecutive rankings. We see that the number of songs involved in the 2019 series is $n = 474$ , which is lower than the number for the 2020 series. This fact could indicate that there was more activity in the 2020 series, since more new songs appeared than in the previous year. By extension, we can also conclude that user activity on Spotify was higher in the 2020 series.

The same tendency is observed by looking at $N_{i n c}$ and ${\bar{n}}_{i, i + 1}$ . Our coefficients $N S^{•}$ and ${\hat{N S}}^{•}$ corroborate this intuition since they take higher values in the 2020 Series than in the 2019 Series. Analogously, by looking at $τ_{e v}^{•}$ and ${\hat{τ}}_{e v}^{•}$ , we see a decrease when comparing the 2019 Series with the 2020 Series. Recall that the coefficients $N S^{•}$ and ${\hat{N S}}^{•}$ introduced in this paper offer a measure of the movements in the rankings, since they take into account the number of crossings and, in this case, where we do not have ties, the effect of absent elements.

In the same manner as in Example 10, we can construct the projected graph corresponding to the crossings for each series. We show these graphs in Figure 3; they have been plotted with MATLAB using the “subspace” option.
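A weighted edge list for such a crossing graph can be sketched as follows. This is only an illustration under two assumptions that we state explicitly: a pair is examined only when both elements are ranked in both consecutive rankings, and the weight of an edge is the number of consecutive pairs of rankings in which the two elements swap their relative order. The name `crossing_edges` and the toy series are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple


def crossing_edges(series: List[Dict[str, int]]) -> Counter:
    """series: one dict per ranking, element -> position; absent elements are omitted."""
    weights: Counter = Counter()
    for prev, curr in zip(series, series[1:]):
        # Assumption: only elements ranked in both rankings can contribute a crossing.
        common = sorted(set(prev) & set(curr))
        for i, j in combinations(common, 2):
            # The relative order is reversed when the position differences change sign.
            if (prev[i] - prev[j]) * (curr[i] - curr[j]) < 0:
                weights[(i, j)] += 1
    return weights


# Toy series of five rankings without ties (hypothetical data).
series = [
    {"A": 1, "B": 2, "C": 3},
    {"A": 2, "B": 1, "C": 3},
    {"B": 1, "C": 2, "D": 3},
    {"B": 2, "C": 1, "D": 3},
    {"B": 1, "C": 2, "D": 3},
]
edges = crossing_edges(series)
print(dict(edges))  # {('A', 'B'): 1, ('B', 'C'): 2}
```

The resulting edge list can be passed to any graph library to extract the giant connected component and plot it.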

7.3. Comparison of Two Series of Viral-50 Rankings

From the site [50], we downloaded the series of Viral 50 (Global) weekly rankings corresponding to the following periods:

2019 Series: 18 weekly rankings ranging from 3 January 2019 to 2 May 2019.
2020 Series: 18 weekly rankings ranging from 2 January 2020 to 30 April 2020.

For each considered year, we convert the 18 downloaded rankings to a series of incomplete rankings (with no ties) $a_{1}, \dots, a_{18}$, and we compute again the aforementioned parameters. The results are shown in Table 6.

The number of songs involved in the 2019 series is $n = 315$, which is greater than the number involved in the 2020 series ($n = 300$). This could indicate that there was less viral activity in the 2020 series, since fewer new songs appeared than in the previous year. The same tendency is observed in $N_{i n c}$. This intuition is corroborated by our coefficients $N S^{•}$ and ${\hat{N S}}^{•}$, since they take lower values in the 2020 series than in the 2019 series. We also see an increase in $τ_{e v}^{•}$ and ${\hat{τ}}_{e v}^{•}$ when comparing the 2019 series with the 2020 series.

If we compare these results with those obtained in the previous section, we conclude that Spotify’s viral activity was negatively affected by the pandemic. This may seem reasonable, since many events that produce sharing in social networks, such as shows, new releases, and performances, were postponed during these months, as we have already discussed. We again plot the projected graph corresponding to the crossings for each series in Figure 4.

7.4. Comparison of a Series of Top 200 and a Series of Viral 50 Rankings

Given that our coefficients $τ_{e v}^{•}$, ${\hat{τ}}_{e v}^{•}$, $N S^{•}$, and ${\hat{N S}}^{•}$ are normalized, we can compare series of rankings of different types. Looking at Table 5 and Table 6, we conclude (e.g., looking at ${\hat{N S}}^{•}$) that the Viral-50 rankings present more activity than the Top 200 rankings. For example, in the 2019 series the value of ${\hat{N S}}^{•}$ is $0.1982$ for the Viral-50 rankings, and only $0.0730$ for the Top 200 rankings. This conclusion seems reasonable, taking into account that the Viral-50 rankings are constructed by looking at the behaviour of songs that may change rapidly, since they are viral phenomena.

7.5. Comparison of the Evolution of Two Series of Incomplete Rankings with Ties

Spotify Top 200 charts do not present ties, but we can construct incomplete rankings with ties if we take into account both the Top 200 ranking and the rest of the songs that appear in the whole studied interval. In detail, to obtain a series of incomplete rankings with ties from a Top 200 series on Spotify, we consider the whole list of tracks across the m rankings and focus on what happens in positions greater than 200. Using the terminology of Example 11, we consider the elements that appear in the rankings denoted as $s_{1}, s_{2}, \dots$. In these rankings we consider the following:

(i). All the tracks in $s_{1}$ are tied. That is, $a_{1} = [•_{1, 200} 1^{n - 200}]$, where $•_{1, 200}$ is a row vector of 200 entries of the type •, and $1^{n - 200}$ is the all-ones row vector with $n - 200$ entries, where n is the total number of different tracks in the m rankings.
(ii). For $i = 2, 3, \dots, m$, we consider that in $s_{i}$ we have (at most) two buckets of tied elements. In one bucket we have the elements (if any) that come from $t_{i - 1}$. In the other bucket, we consider the rest of the elements of $s_{i}$.

The next example with a series of $m = 7$ Top 4 charts illustrates this methodology.

Example 12.

Let us consider the series of seven Top 4 tracks $v_{i}$ with $n = 10$ elements $\{A, B, \dots, J\}$ given by the rankings

$\begin{matrix} A & A & F & G & G & G & J \\ B & B & E & H & H & I & C \\ C & E & A & C & C & B & A \\ D & F & C & E & E & A & H \end{matrix}$

From these rankings we construct the rankings $t_{i}$ and $s_{i}$ to obtain the rankings in the form

$\begin{array}{c|ccccccc} t_{i} & 1 & 1 & 6 & 7 & 7 & 7 & 10 \\ & 2 & 2 & 5 & 8 & 8 & 9 & 3 \\ & 3 & 5 & 1 & 3 & 3 & 2 & 1 \\ & 4 & 6 & 3 & 5 & 5 & 1 & 8 \\ \hline s_{i} & 5 & 3 & 2 & 6 & 6 & 8 & 7 \\ & 6 & 4 & 4 & 1 & 1 & 3 & 9 \\ & 7 & 7 & 7 & 2 & 2 & 5 & 2 \\ & 8 & 8 & 8 & 4 & 4 & 6 & 5 \\ & 9 & 9 & 9 & 9 & 9 & 4 & 6 \\ & 10 & 10 & 10 & 10 & 10 & 10 & 4 \end{array}$

Now, we consider the rankings $s_{i}$ as a series of incomplete rankings with ties with the convention explained above and we compute the corresponding $a_{i}$ vectors to obtain the rankings

$\begin{matrix} a_{1} & a_{2} & a_{3} & a_{4} & a_{5} & a_{6} & a_{7} \\ • & • & • & 1 & 1 & • & • \\ • & • & 1 & 2 & 1 & • & 1 \\ • & 1 & • & • & • & 1 & • \\ • & 1 & 2 & 2 & 1 & 2 & 2 \\ 1 & • & • & • & • & 1 & 2 \\ 1 & • & • & 1 & 1 & 2 & 2 \\ 1 & 2 & 2 & • & • & • & 1 \\ 1 & 2 & 2 & • & • & 1 & • \\ 1 & 2 & 2 & 2 & 1 & • & 1 \\ 1 & 2 & 2 & 2 & 1 & 2 & • \end{matrix}$

Note that, since there are at most two buckets, the entries of $a_{i}$ belong to the set $\{1, 2, •\}$. Note also that in $s_{5}$ there is only one bucket.
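The construction of Example 12 can be reproduced with a short Python sketch. The labelling convention used here is inferred from the example and should be read as an assumption: tracks that have just left the previous chart receive label 1, the remaining uncharted tracks receive label 2, and when only one bucket is occupied all its tracks receive label 1. The function name `rankings_with_ties` is hypothetical, and `None` plays the role of •.

```python
from typing import List, Optional


def rankings_with_ties(charts: List[List[str]], universe: List[str]) -> List[List[Optional[int]]]:
    """Return one vector a_i per chart: None (•) for charted elements, bucket labels otherwise."""
    vectors = []
    previous_top: set = set()
    for chart in charts:
        top = set(chart)
        outside = [x for x in universe if x not in top]
        # First bucket: elements that were charted in the previous ranking (empty for a_1).
        dropped = {x for x in outside if x in previous_top}
        if dropped:
            label = {x: (1 if x in dropped else 2) for x in outside}
        else:
            # Only one bucket is occupied, so all uncharted elements are tied with label 1.
            label = {x: 1 for x in outside}
        vectors.append([label.get(x) for x in universe])
        previous_top = top
    return vectors


# The seven Top 4 charts of Example 12, one list per ranking.
charts = [list("ABCD"), list("ABEF"), list("FEAC"), list("GHCE"),
          list("GHCE"), list("GIBA"), list("JCAH")]
universe = list("ABCDEFGHIJ")
vectors = rankings_with_ties(charts, universe)

# Print one row per element, as in the table of a_i vectors above.
for row, element in enumerate(universe):
    print(element, " ".join("•" if v[row] is None else str(v[row]) for v in vectors))
```

The printed rows coincide with the vectors $a_{1}, \dots, a_{7}$ listed in the example, including the single bucket in the fifth ranking.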

By using this methodology, we have converted the series of rankings studied in Section 7.2 to the corresponding series $a_{i}$ with ties. The parameters obtained are shown in Table 7.

If we look at n, $N_{i n c}$ , $< {\bar{n}}_{i, i + 1} >$ , and ${\hat{N S}}^{•}$ in Table 7, we conclude that there has been more activity in the 2020 Series than in the 2019 Series. However, by looking at ${N S}^{•}$ (and $τ_{e v}^{•}$ ), the conclusion seems to be the reverse. Here we see, therefore, that $τ_{e v}^{•}$ and ${\hat{τ}}_{e v}^{•}$ can present different tendencies. This is related to the form in which they are normalized, as we have commented in Remark 5 and in Section 6. These results provide an example of how the transformation from $τ_{e v}^{•}$ to ${\hat{τ}}_{e v}^{•}$ is not linear, since $τ_{e v}^{•}$ increases from 2019 to 2020 but ${\hat{τ}}_{e v}^{•}$ decreases in the same period.
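As an arithmetic check on Table 7, the tabulated values are consistent with the relations below. We read these relations off the values in Tables 4 to 7 as an observation; they are an assumption here and not a quotation of the definitions given earlier in the paper.

$N S^{•} = \frac{1 - τ_{e v}^{•}}{2}, \qquad {\hat{N S}}^{•} = \frac{1 - {\hat{τ}}_{e v}^{•}}{2}$

$\begin{array}{lll} \text{2019:} & \frac{1 - 0.2577}{2} = 0.37115, & \frac{1 - 0.8848}{2} = 0.0576 \\ \text{2020:} & \frac{1 - 0.3108}{2} = 0.3446, & \frac{1 - 0.8757}{2} = 0.06215 \end{array}$

These agree with the tabulated $0.3712$, $0.0576$, $0.3446$, and $0.0621$ up to rounding. Under these relations each $N S$ coefficient always moves in the opposite direction to its own $τ$ coefficient, so the disagreement observed in Table 7 is between the two normalizations (hatted versus unhatted), not between $τ$ and $N S$.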

In Figure 5, we show the plot of the giant component corresponding to the projected graph showing the interactions of the form tie to untie or vice versa. That is, there is a link between elements (nodes) i and j when the pair $\{i, j\}$ goes from tie to untie (or vice versa) in any pair of consecutive rankings $a_{k}$ and $a_{k + 1}$. We see many more interactions of this type in the 2020 series than in the 2019 series.

In Figure 6, we show the plot of the giant component corresponding to the projected graph showing the interactions of the form tie to tie. That is, there is a link between elements (nodes) i and j when the pair $\{i, j\}$ goes from tie to tie in any pair of consecutive rankings $a_{k}$ and $a_{k + 1}$. We also see many more interactions of this type in the 2020 series than in the 2019 series.
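The two kinds of links described for Figures 5 and 6 can be collected with the following sketch. It assumes that a pair is examined only when both elements are ranked (not •) in both consecutive rankings, and that the edge weight counts how many consecutive pairs of rankings show the interaction. The name `tie_transition_edges` and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple


def tie_transition_edges(series: List[Dict[str, int]]) -> Tuple[Counter, Counter]:
    """series: one dict per ranking, element -> bucket label; absent elements are omitted."""
    tie_untie: Counter = Counter()  # tie to untie, or untie to tie
    tie_tie: Counter = Counter()    # tied in both consecutive rankings
    for prev, curr in zip(series, series[1:]):
        # Assumption: only elements ranked in both rankings are compared.
        common = sorted(set(prev) & set(curr))
        for i, j in combinations(common, 2):
            tied_before = prev[i] == prev[j]
            tied_after = curr[i] == curr[j]
            if tied_before and tied_after:
                tie_tie[(i, j)] += 1
            elif tied_before != tied_after:
                tie_untie[(i, j)] += 1
    return tie_untie, tie_tie


# Toy series of three rankings with ties (hypothetical data).
series = [
    {"A": 1, "B": 1, "C": 1},
    {"A": 1, "B": 2, "C": 2},
    {"B": 1, "C": 1, "D": 2},
]
tie_untie, tie_tie = tie_transition_edges(series)
print(dict(tie_untie))  # {('A', 'B'): 1, ('A', 'C'): 1}
print(dict(tie_tie))    # {('B', 'C'): 2}
```

Each counter is a weighted edge list, so the giant component of either graph can be obtained with a standard connected-components routine.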

Therefore, and taking into account the values of Table 7, we can conclude (for this artificial model of incomplete ranking with ties) that there was more activity in the 2020 series than in the 2019 series.

We have shown the application of the new coefficients introduced in this work, as well as the utility of the visualizations based on the projected graph plots of the (evolutive) competitive graph associated with a series of incomplete rankings with or without ties.

8. Conclusions

We present the main conclusions of our work:

We provide a theoretical result that allows for understanding, in terms of the type of interactions between pairs of elements in a series of incomplete rankings with ties, two recently introduced coefficients, given in [4,12].
We have defined two new coefficients to characterize a series of incomplete rankings with ties in terms of the interactions mentioned above.
We have presented a methodology to treat Spotify charts (both Top 200 and Viral 50) as a series of incomplete rankings. This methodology allows us to obtain conclusions about the movements in the lists and, therefore, on the activity of the users of the app.
We have obtained an artificial series of incomplete rankings with ties based on Spotify Top 200 lists, to apply our coefficients and show the applicability of the method.
The main theoretical result (Theorem 1) may serve to define new coefficients by giving weight to the interactions between pairs of elements when going from one ranking to the next one. The applications can be of interest in other fields (neuroscience, sports, bioinformatics, etc.).

Author Contributions

All authors contributed equally to this paper and have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Spanish Government, Ministerio de Economía y Competitividad, grant number MTM2016-75963-P.

Acknowledgments

We thank the four anonymous reviewers for their constructive comments, which helped us to improve the readability of the manuscript.

Conflicts of Interest

The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures and Tables

Figure 1. Projected weighted graphs representing the pairs of elements that contribute to crossings (left panel) and the pairs corresponding to the case tie to untie or vice versa (right panel), occurring in Example 10.

Figure 2. Projected weighted graph representing the crossing-after-ties cases that occurred in Example 10.

Figure 3. Graph based on crossings corresponding to the giant connected component of Top 200 2019 Series (left panel, 360 nodes, 16,115 edges) and Top 200 2020 Series (right panel, 374 nodes, 16,564 edges).

Figure 4. Graph based on crossings corresponding to the giant connected component of Viral-50 2019 Series (left panel, 185 nodes, 1685 edges) and Viral-50 2020 Series (right panel, 186 nodes, 1447 edges).

Figure 5. Graph based on crossings of the type from tie to untie or vice versa corresponding to the giant connected component of 2019 Series (left, 377 nodes, 49,915 edges) and 2020 Series (right, 457 nodes, 82,051 edges). We also have 97 isolated nodes in the 2019 Series and 99 in the 2020 Series.

Figure 6. Graph based on crossings of the type from tie to tie corresponding to the giant connected component of 2019 Series (left, 382 nodes, 65,269 edges) and 2020 Series (right, 462 nodes, 101,101 edges).

Table 1

Number of pairs $\{i, j\}$ corresponding to each type for the complete cases.

Type	Number of Pairs
C.1	$n_{n c}$
C.2	s
C.3	$n_{t u}$
C.4	$n_{t t}$

Table 2

Number of pairs $\{i, j\}$ corresponding to each type for the incomplete cases.

Type	Number of Pairs $\{i, j\}$
I.1	$\binom{n_{• •}}{2}$
I.2	$n_{• •} (n_{* •} + n_{• *})$
I.3	$n_{• •} n_{* *} + n_{* •} n_{• *}$
I.4	$\sum_{i = 1}^{n_{a}} \binom{n_{i •}}{2} + \sum_{i = 1}^{n_{b}} \binom{n_{• i}}{2}$
I.5	$\sum_{i = 1}^{n_{a}} n_{i *} n_{i •} + \sum_{i = 1}^{n_{b}} n_{* i} n_{• i}$
I.6	$\binom{n_{* •}}{2} + \binom{n_{• *}}{2} - \sum_{i = 1}^{n_{a}} \binom{n_{i •}}{2} - \sum_{i = 1}^{n_{b}} \binom{n_{• i}}{2}$
I.7	$n_{* *} (n_{* •} + n_{• *}) - \sum_{i = 1}^{n_{a}} n_{i *} n_{i •} - \sum_{i = 1}^{n_{b}} n_{* i} n_{• i}$

Table 3

Number of pairs $\{i, j\}$ that have some •, corresponding to Example 2. Note that the sum of all the types is, by definition (11), $N_{i n c}$.

Type	Number of Pairs $\{i, j\}$
I.1	$\binom{n_{• •}}{2} = 1$
I.2	$n_{• •} (n_{* •} + n_{• *}) = 8$
I.3	$n_{• •} n_{* *} + n_{* •} n_{• *} = 11$
I.4	$\sum_{i = 1}^{n_{a}} \binom{n_{i •}}{2} + \sum_{i = 1}^{n_{b}} \binom{n_{• i}}{2} = 1$
I.5	$\sum_{i = 1}^{n_{a}} n_{i *} n_{i •} + \sum_{i = 1}^{n_{b}} n_{* i} n_{• i} = 2$
I.6	$\binom{n_{* •}}{2} + \binom{n_{• *}}{2} - \sum_{i = 1}^{n_{a}} \binom{n_{i •}}{2} - \sum_{i = 1}^{n_{b}} \binom{n_{• i}}{2} = 2$
I.7	$n_{* *} (n_{* •} + n_{• *}) - \sum_{i = 1}^{n_{a}} n_{i *} n_{i •} - \sum_{i = 1}^{n_{b}} n_{* i} n_{• i} = 14$

Table 4

Parameters for pairs of consecutive rankings. Example 10.

	$a_{1} \to a_{2}$	$a_{2} \to a_{3}$	$a_{3} \to a_{4}$	$a_{4} \to a_{5}$	$a_{5} \to a_{6}$
$n_{• •}$	1	0	0	0	0
$n_{• *}$	1	1	0	1	0
$n_{* •}$	0	0	1	0	3
$n_{* *}$	6	7	7	7	5
s	9	12	8	9	4
$n_{t u}$	2	0	2	1	1
$n_{t t}$	0	0	0	0	0
$N_{i n c}$	13	7	7	7	18
$\bar{n}$	6	7	7	7	5
${\hat{τ}}_{e v}^{•}$	−0.3333	−0.1429	0.1429	0.0952	0.1000
${\hat{N S}}^{•}$	0.6667	0.5714	0.4286	0.4524	0.4500
$τ_{e v}^{•}$	−0.1786	−0.1071	0.1071	0.0714	0.0357
$N S^{•}$	0.5893	0.5536	0.4464	0.4643	0.4821
$τ_{x}$	−0.1786	−0.1071	0.1071	0.0714	0.0357
${\hat{τ}}_{x}$	−0.3333	−0.1429	0.1429	0.0952	0.1000

Table 5

Parameters for two series of incomplete rankings obtained from Spotify Top 200 lists.

	2019 Series	2020 Series
n	474	556
$N_{i n c}$	$1.6 \times 10^{6}$	$2.4 \times 10^{6}$
$< {\bar{n}}_{i, i + 1} >$	182	175
$τ_{e v}^{•}$	0.1256	0.0836
$N S^{•}$	0.4372	0.4582
${\hat{τ}}_{e v}^{•}$	0.8540	0.8421
${\hat{N S}}^{•}$	0.0730	0.0789

Table 6

Parameters for two series of incomplete rankings obtained from Spotify Viral 50 lists.

	2019 Series	2020 Series
n	315	300
$N_{i n c}$	$8.3 \times 10^{5}$	$7.5 \times 10^{5}$
$< {\bar{n}}_{i, i + 1} >$	33.6	35
$τ_{e v}^{•}$	0.0067	0.0093
$N S^{•}$	0.4966	0.4954
${\hat{τ}}_{e v}^{•}$	0.6037	0.6922
${\hat{N S}}^{•}$	0.1982	0.1539

Table 7

Series of incomplete rankings with ties obtained from Spotify Top 200 charts.

	2019 Series	2020 Series
n	474	556
$N_{i n c}$	$1.4 \times 10^{6}$	$1.7 \times 10^{6}$
$< {\bar{n}}_{i, i + 1} >$	256	331
$τ_{e v}^{•}$	0.2577	0.3108
$N S^{•}$	0.3712	0.3446
${\hat{τ}}_{e v}^{•}$	0.8848	0.8757
${\hat{N S}}^{•}$	0.0576	0.0621

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

Abstract

Mathematical analysis of rankings is essential for a wide range of scientific, public, and industrial applications (e.g., group decision-making, organizational methods, R&D sponsorship, recommender systems, voter systems, sports competitions, grant proposals rankings, web searchers, Internet streaming-on-demand media providers, etc.). Recently, some methods for incomplete aggregate rankings (rankings in which not all the elements are ranked) with ties, based on the classic Kendall’s tau coefficient, have been presented. We are interested in ordinal rankings (that is, we can order the elements to be the first, the second, etc.) allowing ties between the elements (e.g., two elements may be in the first position). We extend a previous coefficient for comparing a series of complete rankings with ties to two new coefficients for comparing a series of incomplete rankings with ties. We make use of the newest definitions of Kendall’s tau extensions. We also offer a theoretical result to interpret these coefficients in terms of the type of interactions that the elements of two consecutive rankings may show (e.g., they preserve their positions, cross their positions, and they are tied in one ranking but untied in the other ranking, etc.). We give some small examples to illustrate all the newly presented parameters and coefficients. We also apply our coefficients to compare some series of Spotify charts, both Top 200 and Viral 50, showing the applicability and utility of the proposed measures.

Details

Title: Corrected Evolutive Kendall’s τ Coefficients for Incomplete Rankings with Ties: Application to Case of Spotify Lists
Authors: Pedroche, Francisco¹; Conejero, J Alberto²
¹ Institut de Matemàtica Multidisciplinària, Universitat Politècnica de València, Camí de Vera s/n, 46022 València, Spain
² Instituto Universitario de Matemática Pura y Aplicada, Universitat Politècnica de València, Camí de Vera s/n, 46022 València, Spain; [email protected]
First page: 1828
Publication year: 2020
Publication date: 2020
Publisher: MDPI AG
e-ISSN: 22277390
Source type: Scholarly Journal
Language of publication: English
DOI: https://doi.org/10.3390/math8101828
ProQuest document ID: 2548821253
