1. Introduction
We have been working since 2015 on the problem of testing the alignment of protein domain families proposed by expert biologists and bioinformaticians. We have found that selected entropy measures are well suited to testing the results published by those professionals and that they lend themselves to a rigorous ANOVA statistical analysis [1]. In order to reduce the search space of admissible values of these entropy measures, we have emphasized the need to work in the region of their parameter space associated with strict concavity. This study was undertaken in a previous work, and we present a summary of those developments in Section 2. In the present work, we aim to complement the results of a previous publication [2]: a further restriction of the parameter space has to be performed in order to guarantee the synergy of the probability distributions to be tested. Non-synergetic distributions are not worth working with, because they do not preserve the fundamental property that more information about amino acids is obtained from t-sets of columns than by summing up the information obtained from the individual columns. In Section 3, a brief digression introduces the Sharma–Mittal class of entropy measures. Section 4 emphasizes the synergy of the distributions and its consequences for the reduction of the parameter space of Sharma–Mittal entropies. In Section 5, we analyse the maximal extension of the parameter space and repeat the reduction process imposed by the requirement of fully synergetic distributions of Section 4. In Section 6, we study the relation between Hölder and generalized Khinchin–Shannon (GKS) inequalities, and Section 7 presents our concluding remarks.
2. The Construction of the Probabilistic Space
Let us consider a set of domains (rows) from a chosen family of protein domains. In order to associate a rectangular array with this family, to be taken as its representative in the probabilistic space we are constructing, we specify its number of columns. This means that we disregard all rows with fewer amino acids than the chosen number of columns and preserve the rows with at least that many amino acids, disregarding their surplus amino acids. We then choose m of the remaining rows to obtain rectangular arrays; there are as many such arrays as there are ways of choosing the m rows. Any one of them can be used as a representative of the domain family to be analysed in the statistical procedure to be implemented.
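Although no software accompanies the paper, a minimal Python sketch may help to make this construction concrete. The function below builds one representative rectangular array from a family of aligned sequences under the assumptions stated above; the function name, the sampling strategy, and the parameters are illustrative, not the authors' implementation.

```python
# A minimal sketch (not the authors' code) of the construction described above: keep only
# rows with at least n amino acids, truncate them to n columns, and sample m of them to form
# one representative m x n rectangular array. Function name, sampling and parameters are
# illustrative assumptions.
import random

def representative_array(sequences, n, m, seed=0):
    """Return one m x n array (a list of m strings of length n) sampled from the family."""
    eligible = [seq[:n] for seq in sequences if len(seq) >= n]   # discard short rows, trim long ones
    if len(eligible) < m:
        raise ValueError("fewer than m rows have at least n amino acids")
    random.seed(seed)
    return random.sample(eligible, m)   # one of the C(len(eligible), m) possible arrays
```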
The next step is to assign a joint probability of occurrence to each set of amino acids found in a chosen set of t columns, given by
$p_{j_1 j_2 \cdots j_t}(a_{j_1}, a_{j_2}, \ldots, a_{j_t}) = \dfrac{n_{j_1 j_2 \cdots j_t}(a_{j_1}, a_{j_2}, \ldots, a_{j_t})}{m}, \qquad (1)$
where $n_{j_1 j_2 \cdots j_t}(a_{j_1}, a_{j_2}, \ldots, a_{j_t})$ stands for the number of occurrences of the set of amino acids $(a_{j_1}, a_{j_2}, \ldots, a_{j_t})$ in the t columns $j_1, j_2, \ldots, j_t$ of the subarray of the representative array. The symbols $a_{j_1}, \ldots, a_{j_t}$ run over the letters of the one-letter code for the twenty amino acids: {A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y}. We then have
$\sum_{a_{j_1}} \sum_{a_{j_2}} \cdots \sum_{a_{j_t}} p_{j_1 j_2 \cdots j_t}(a_{j_1}, a_{j_2}, \ldots, a_{j_t}) = 1. \qquad (2)$
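The following sketch (again an illustration, not the authors' code) computes the empirical joint probabilities of Equation (1) for a chosen t-set of columns and checks the normalization of Equation (2); names and the 0-based indexing convention are assumptions.

```python
# A minimal sketch (not the authors' code) of Equations (1) and (2): empirical joint
# probabilities of the t-sets of amino acids found in the m rows of the representative
# array at a chosen set of t columns (0-based indices here).
from collections import Counter

def joint_probabilities(array, columns):
    """array: list of m equal-length strings; columns: the t column indices."""
    m = len(array)
    counts = Counter(tuple(row[j] for j in columns) for row in array)   # occurrence counts
    probs = {tset: c / m for tset, c in counts.items()}                 # Equation (1)
    assert abs(sum(probs.values()) - 1.0) < 1e-9                        # Equation (2)
    return probs
```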
We also introduce the conditional probabilities of occurrence, which are given implicitly by
$p_{j_1 \cdots j_t}(a_{j_1}, \ldots, a_{j_t}) = p_{j_1 \cdots j_{t-1} \mid j_t}(a_{j_1}, \ldots, a_{j_{t-1}} \mid a_{j_t})\, p_{j_t}(a_{j_t}), \qquad (3)$
where $p_{j_1 \cdots j_{t-1} \mid j_t}(a_{j_1}, \ldots, a_{j_{t-1}} \mid a_{j_t})$ is the probability of occurrence of the amino acids in the columns $j_1, \ldots, j_{t-1}$ if the distribution of amino acids in the $j_t$-th column is known a priori. Bayes' law for probabilities of occurrence [2,3] can then be written as
(4)
The equality of the first three right-side members, as well as the equality of the last three, corresponds to the application of Bayes' law [2,3]. The symmetries of the joint probability distribution are due to the ordering adopted for the columns of the distributions of amino acids. From this ordering of the columns, the values assumed by the column-index variables are respectively given by
(5)
We then have geometric objects of t columns, with one component for each of the $20^t$ possible sets of amino acids.
3. The Sharma–Mittal Class of Entropy Measures
As emphasized in Ref. [2], the introduction of functions of random variables, such as entropy measures associated with the probabilities of occurrence, is suitable for providing an analysis of the evolution of these probabilities through the regions of the parameter space of the entropies. The class of Sharma–Mittal entropy measures seems particularly well adapted to this task when related to the occurrence of amino acids in the geometric objects introduced above. The thermodynamic interpretation of the notion of entropy greatly helps to classify the distribution of its values associated with protein domain databases and to interpret its evolution through the Fokker–Planck equations to be treated in forthcoming articles in this line of research.
The two-parameter Sharma–Mittal class of entropy measures is usually given by
$(SM)_{r,s} = \dfrac{1}{1-r}\left[\left(\chi_s\right)^{\frac{1-r}{1-s}} - 1\right], \qquad (6)$
where
$\chi_s \equiv \sum_{a_{j_1}} \cdots \sum_{a_{j_t}} \left[p_{j_1 \cdots j_t}(a_{j_1}, \ldots, a_{j_t})\right]^{s}. \qquad (7)$
The parameters r, s must be restricted to a region of the parameter space corresponding to strict concavity. A necessary requirement to be satisfied [3] is
(8)
where $P_{j_1 \cdots j_t}(a_{j_1}, \ldots, a_{j_t})$ stands for the escort probability associated with the joint probability $p_{j_1 \cdots j_t}(a_{j_1}, \ldots, a_{j_t})$, or
$P_{j_1 \cdots j_t}(a_{j_1}, \ldots, a_{j_t}) = \dfrac{\left[p_{j_1 \cdots j_t}(a_{j_1}, \ldots, a_{j_t})\right]^{s}}{\chi_s}. \qquad (9)$
Equation (8) leads to
(10)
Some special cases of one-parameter entropies are commonplace in the scientific literature [3,4,5,6,7,8,9]. The $r = s$ region is the domain of the Havrda–Charvat [6] entropy measure,
$(HC)_s \equiv (SM)_{s,s} = \dfrac{1}{1-s}\left(\chi_s - 1\right). \qquad (11)$
The $r = 2 - s$ region will stand for the domain of the Landsberg–Vedral [7] entropy measure,
$(LV)_s \equiv (SM)_{2-s,\,s} = \dfrac{1}{s-1}\left(\dfrac{1}{\chi_s} - 1\right) = \dfrac{(HC)_s}{\chi_s}. \qquad (12)$
The Rényi [8] and the “non-extensive” Gaussian [9] entropy measures are obtained from limit processes:
$(R)_s \equiv \lim_{r \to 1} (SM)_{r,s} = \dfrac{\ln \chi_s}{1-s}, \qquad (G)_r \equiv \lim_{s \to 1} (SM)_{r,s}. \qquad (13)$
After using the definition of $\chi_s$, Equation (7), and from Equations (1) and (2), we get
$(G)_r = \dfrac{1}{1-r}\left[e^{(1-r)S} - 1\right], \qquad (14)$
where $S$ is the Gibbs–Shannon entropy measure
$S = -\sum_{a_{j_1}} \cdots \sum_{a_{j_t}} p_{j_1 \cdots j_t}(a_{j_1}, \ldots, a_{j_t}) \ln p_{j_1 \cdots j_t}(a_{j_1}, \ldots, a_{j_t}). \qquad (15)$
The Gibbs–Shannon entropy measure, Equation (15), is also obtained by taking the convenient limits of the special cases of Sharma–Mittal entropies, Equations (11)–(14):
$S = \lim_{s \to 1} (HC)_s = \lim_{s \to 1} (LV)_s = \lim_{s \to 1} (R)_s = \lim_{r \to 1} (G)_r. \qquad (16)$
We shall analyse in the next section the structure of the two-parameter space of Sharma–Mittal entropy by taking into consideration these special cases.
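As a hedged numerical illustration, the sketch below evaluates the two-parameter Sharma–Mittal entropy in its standard textbook form and recovers the special cases named above by taking the corresponding parameter values or limits. The placement of the parameters r and s is assumed to match the paper's notation; the toy distribution over t-sets is arbitrary.

```python
# A hedged numerical sketch (not the authors' code) of the two-parameter Sharma-Mittal
# entropy in its standard textbook form, together with the one-parameter special cases
# named above. The placement of r and s is assumed to match the paper's notation.
import numpy as np

def sharma_mittal(p, r, s):
    chi = np.sum(p ** s)                         # sum of the s-th powers of the probabilities
    return (chi ** ((1 - r) / (1 - s)) - 1) / (1 - r)

p = np.array([0.5, 0.25, 0.15, 0.10])
s, eps = 0.7, 1e-6
havrda_charvat   = sharma_mittal(p, s, s)        # r = s
landsberg_vedral = sharma_mittal(p, 2 - s, s)    # r = 2 - s
renyi    = sharma_mittal(p, 1 + eps, s)          # r -> 1 limit (numerical approximation)
gaussian = sharma_mittal(p, 0.5, 1 + eps)        # s -> 1 limit (numerical approximation)
gibbs_shannon = -np.sum(p * np.log(p))           # recovered when both r, s -> 1
print(havrda_charvat, landsberg_vedral, renyi, gaussian)
print(np.isclose(sharma_mittal(p, 1 + eps, 1 + 2 * eps), gibbs_shannon, atol=1e-3))   # True
```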
We now recall that, in the Gibbs–Shannon limit, a conditional entropy measure is defined [3] by
(17)
Analogously, we then have for the conditional Sharma–Mittal entropy measure [3]
(18)
A straightforward calculation shows that, analogously to Equation (16), we will have
(19)
From Equations (6), (7) and (18) and the application of Bayes' law, Equation (4), we can write
(20)
4. Aspects of Synergy and the Reduction of the Parameter Space for Fully Synergetic Distributions
For the Gibbs–Shannon entropy measure, the inequality written by A. Y. Khinchin [3,10] is
(21)
This inequality would be described by Khinchin as: “on the average, the a priori knowledge of the distribution on the $j_t$-th column can only decrease the uncertainty of the distribution on the columns $j_1, \ldots, j_{t-1}$”. We can write an analogous inequality for the Sharma–Mittal class of entropies:
(22)
From Equations (20) and (22), we then get
(23)
After iterating this inequality, we can also write
(24)
The inequalities in (21)–(24) are associated with what are called “synergetic conditions”. In this section, we also derive the fully synergetic conditions as GKS inequalities. After using Equations (7) and (9) in Equation (23), we get
(25)
and, after iteration and use of Equation (24),
(26)
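The synergy property is easiest to check numerically in the Gibbs–Shannon limit, where it reduces to the classical subadditivity of the Shannon entropy. The sketch below (not the authors' code) reuses the `joint_probabilities` helper introduced earlier; the toy array is arbitrary.

```python
# A minimal sketch (not the authors' code) of the synergy property in the Gibbs-Shannon
# limit: the joint entropy of a t-set of columns never exceeds the sum of the single-column
# entropies, so a t-set of columns carries more information than its columns taken
# separately. Reuses the joint_probabilities sketch given earlier.
import numpy as np

def shannon(probs):
    p = np.array(list(probs.values()))
    return float(-np.sum(p * np.log(p)))

def is_synergetic(array, columns):
    joint = shannon(joint_probabilities(array, columns))
    marginals = sum(shannon(joint_probabilities(array, [j])) for j in columns)
    return joint <= marginals + 1e-12            # iterated Khinchin-Shannon inequality

toy = ["ACDA", "ACDC", "AGDA", "ACEA", "TCDA"]   # 5 rows, 4 columns
print(is_synergetic(toy, [0, 1, 2]))             # always True in the Gibbs-Shannon limit
```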
The hatched region of strict concavity in the parameter space of Sharma–Mittal entropies is depicted in Figure 1. The special cases corresponding to the Havrda–Charvat ($r = s$), Landsberg–Vedral ($r = 2 - s$), Rényi ($r = 1$), and “non-extensive” Gaussian ($s = 1$) entropies are also represented.
We can identify three subregions in Figure 1. They will correspond to
(27)
(28)
(29)
where the ordering of the entropy terms has been obtained from Equation (26). The subregions R1 and R3 are what we call fully synergetic subregions, and the corresponding inequalities are the GKS inequalities [2]. The subregions R1, R2, and R3 are depicted in Figure 2a–c, respectively. The union of the subregions R1 and R3 is the fully synergetic Khinchin–Shannon restriction to be imposed on the strict concavity region of Figure 1; it is depicted in Figure 2d below.
5. The Maximal Extension of the Parameter Space and Its Reduction for Fully Synergetic Distributions
In Figure 1 and Figure 2d, we have depicted the structure of the strict concavity region for Sharma–Mittal entropy measures and its reduction to a subregion by the application of the requirement of fully synergetic distributions, respectively. Our analysis has used a coarse-grained approach to concavity, given by Equations (8) and (10). We now introduce some necessary refinements for characterizing the probabilities of occurrence in subarrays of m rows and t columns. For t columns, there are $20^t$ possible sets of amino acids, which can be a very large number; however, instead of counting over all of these possibilities, we may count the groups of equal t-sets of amino acids which actually appear in the m rows of the array. We characterize these groups by their number, from a single group (all t-sets in the m rows equal) up to m groups (all t-sets different), and by the number of equal t-sets belonging to each group.
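A small sketch of this grouping, using Python's `Counter` and assuming the array representation introduced earlier, is given below; it is illustrative only.

```python
# A minimal sketch (not the authors' code) of the grouping described above: among the m
# rows of the array, equal t-sets of amino acids are collected into groups, each
# characterized by its multiplicity (the number of equal t-sets it contains).
from collections import Counter

def tset_groups(array, columns):
    """Return {t-set: multiplicity} for the t-sets found in the m rows at the given columns."""
    return Counter(tuple(row[j] for j in columns) for row in array)

toy = ["ACDA", "ACDC", "AGDA", "ACEA", "ACDA"]
groups = tset_groups(toy, [0, 1, 2])
print(len(groups), sum(groups.values()))         # number of distinct groups, and m = 5
```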
In Equation (2), the sum runs over all the sets of amino acids that make up the geometric object of probabilities of occurrence defined in Equation (1). We can now perform the sum over the groups of equal t-sets instead and write
(30)
where the sum now runs over the groups of equal t-sets. We also have, from Equation (7),
(31)
From Equations (30) and (31), we can now proceed to the calculation of the Hessian matrix of the Sharma–Mittal entropy measures. For the first derivatives with respect to the probabilities of occurrence, we have
(32)
We then have, for a generic element of the Hessian matrix [2],
(33)
where the escort probability associated with the corresponding probability of occurrence appears, given by
(34)
The principal minors of the Hessian matrix are given by
(35)
and we have
(36)
according to Equation (31). From Equations (35) and (36), the requirement of strict concavity will lead to
(37)
We then have
(38)
This corresponds to the criterion of negative definiteness of the Hessian matrix for strict concavity of multivariate functions [11]. Each k-value is associated with a k-epigraph region, which is the k-extension of the strict concavity region presented in Figure 1. These regions are given by
(39)
The greatest lower bound of the sequence of k-curves is given by the limiting curve below. We then have
(40)
We can then write, for the maximal extended region of strict concavity,
(41)
The region corresponding to Equation (41) is depicted in Figure 3 below. We are now ready to undertake the application of the restrictions for fully synergetic distributions (validity of GKS inequalities) to the maximal strict concavity region of Figure 3.
We start by identifying two regions included in Figure 3. They will be given by
(42)
(43)
These regions are depicted in Figure 4a,b, respectively. In order to find the reduced region corresponding to Figure 3, analogously to what has been done for Figure 1, we also need the subregions R1 and R3, Equations (27) and (29): the resulting subregion of fully synergetic distributions is depicted in Figure 5.
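As a closing illustration for this section, the negative-definiteness criterion invoked in Equations (33)–(38) can be probed numerically: the sketch below builds a finite-difference Hessian of the Sharma–Mittal entropy with respect to the probabilities and checks that its eigenvalues are negative at a chosen point of the parameter space. It reuses the `sharma_mittal` helper sketched earlier; the point (r, s) and the toy distribution are assumptions.

```python
# A hedged numerical sketch (not the authors' derivation) of the negative-definiteness
# criterion: build a central finite-difference Hessian of the Sharma-Mittal entropy with
# respect to the probabilities (treated here as unconstrained positive variables) and
# check that all of its eigenvalues are negative. Reuses the sharma_mittal sketch above.
import numpy as np

def hessian(f, p, h=1e-5):
    n = len(p)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            pp = p.copy(); pp[i] += h; pp[j] += h
            pm = p.copy(); pm[i] += h; pm[j] -= h
            mp = p.copy(); mp[i] -= h; mp[j] += h
            mm = p.copy(); mm[i] -= h; mm[j] -= h
            H[i, j] = (f(pp) - f(pm) - f(mp) + f(mm)) / (4 * h * h)
    return H

p = np.array([0.5, 0.3, 0.2])
r, s = 1.3, 0.8                                  # one point of the parameter space to be tested
H = hessian(lambda q: sharma_mittal(q, r, s), p)
print(np.all(np.linalg.eigvalsh(H) < 0))         # True when the entropy is strictly concave there
```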
6. Hölder Inequalities and GKS Inequalities: A Possible Conjecture
In this section, we study the relation between GKS inequalities [2] and Hölder inequalities by using examples of distributions obtained from databases of protein domain families. To begin, some definitions and properties of the probabilistic space are in order.
Let us first introduce the definition of the conditional probability of occurrence associated with the escort probability of occurrence [12]. This is a simple application of Equation (3) to escort probabilities:
(44)
From the definition of escort probabilities, Equation (9), we can write
(45)
and
(46)
In Equations (44)–(46), the symbols assume the representative letters of the one-letter code for the 20 amino acids, {A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y}. After substituting Equations (45) and (46) into Equation (44), we get
(47)
and, from Equation (46),
(48)
We also write the definition of the escort probability of occurrence associated with the conditional probability of occurrence [12]:
(49)
We can check the definitions of Equations (48) and (49) from the equality of the two escort probabilities with the original conditional probability, for
(50)
We should note that the denominators of the right-hand sides of Equations (48) and (49), namely
(51)
and
(52)
will be equal if all the amino acids in the column are equal. If we have, for instance, the column given by
(53)
then the corresponding unit vectors of probabilities will also be equal and given by
(54)
This means that, for this special case of an event of rare occurrence, we also have the equality of the conditional of the escort probability and the escort of the conditional probability, i.e., of the left-hand sides of Equations (48) and (49), respectively. For a column with a generic distribution of amino acids, the denominators on the right-hand sides of Equations (48) and (49), given by Equations (51) and (52), will no longer be equal. An ordering of these denominators should be decided from the probabilities of amino acid occurrence in a chosen protein domain family.
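One plausible reading of the two constructions compared in this section can be sketched numerically as follows: the "conditional of the escort" divides the escort of the joint distribution by the escort of the single-column distribution, while the "escort of the conditional" renormalizes the s-th powers of the conditional probabilities. Both reduce to the ordinary conditional probability at s = 1, in the sense of Equation (50), and, as claimed above, they also coincide when the conditioning column contains a single amino acid. The definitions, names, and toy numbers below are assumptions, not the authors' formulas.

```python
# A sketch (not the authors' formulas) of one plausible reading of the two constructions:
# "conditional of the escort" vs. "escort of the conditional". Both reduce to p(a|b) at s = 1.
import numpy as np

def escort(p, s):
    q = p ** s
    return q / q.sum()

def conditional_of_escort(joint, s):
    """joint[a, b] = p(a, b); returns escort(joint) divided column-wise by escort of p(b)."""
    return escort(joint, s) / escort(joint.sum(axis=0), s)[np.newaxis, :]

def escort_of_conditional(joint, s):
    cond = joint / joint.sum(axis=0, keepdims=True)      # p(a|b)
    q = cond ** s
    return q / q.sum(axis=0, keepdims=True)

joint = np.array([[0.20, 0.10], [0.30, 0.05], [0.25, 0.10]])   # toy p(a, b)
print(np.allclose(conditional_of_escort(joint, 1.0), escort_of_conditional(joint, 1.0)))   # True
print(np.allclose(conditional_of_escort(joint, 0.8), escort_of_conditional(joint, 0.8)))   # False in general
```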
This study is undertaken with the help of the function Z of Equation (51), its analogue of Equation (52), and the functions J and U defined below:
(55)
(56)
Our method is then the comparison of pairs of these functions in order to search for the effect of fully synergetic distributions of amino acids.
There are six comparisons to study:
(I).
(57)
(II).
(58)
(III).
(59)
(IV).
(60)
where the quantity appearing above is defined by
(61)
(V).
(62)
(VI).
(63)
Equations (57)–(59) should each be multiplied by the corresponding probability of occurrence and then summed over the amino acids. We then have, respectively,
(64)
(65)
(66)
Equations (60), (62) and (63) can be written, respectively, as
(67)
(68)
(69)
Hölder's inequality, as applied to probabilities of occurrence [3], is written as
(70)
After multiplying by the corresponding probability of occurrence and summing over the amino acids, we get
(71)
We also define
(72)
(73)
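Since the orderings of Equations (65), (68), (70) and (71) rest on Hölder's inequality, a quick numerical sanity check of its textbook form may be useful. The sketch below verifies the generic inequality for two toy distributions over the 20 amino acids; the exponents actually used in the paper's Equations (70) and (71), which are tied to the entropic parameter s, are not reproduced here.

```python
# A minimal numerical sanity check (not the authors' code) of the textbook Hölder inequality
# for two probability vectors p, q and conjugate exponents a, b with 1/a + 1/b = 1:
#     sum_j p_j * q_j  <=  (sum_j p_j**a)**(1/a) * (sum_j q_j**b)**(1/b).
import numpy as np

def holder_holds(p, q, a):
    b = a / (a - 1.0)                            # conjugate exponent, a > 1
    lhs = np.sum(p * q)
    rhs = np.sum(p ** a) ** (1 / a) * np.sum(q ** b) ** (1 / b)
    return lhs <= rhs + 1e-12

rng = np.random.default_rng(0)
p = rng.dirichlet(np.ones(20))                   # two toy distributions over the 20 amino acids
q = rng.dirichlet(np.ones(20))
print(all(holder_holds(p, q, a) for a in (1.5, 2.0, 3.0)))   # True
```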
We then summarize the results obtained:
Equation (64) is only an identity.
Equations (65) and (68) can be ordered by Hölder’s inequality, Equations (70) and (71).
Equations (66) and (69) can be ordered by GKS inequalities, corresponding to fully synergetic distributions of amino acids.
Equation (67) cannot be ordered without additional experimental/phenomenological information on the probabilities of occurrence to be obtained from updated versions of protein domain family databases [13].
We now collect the formulae obtained from the analysis performed in this section. Equations (65) and (68) are ordered by Hölder's inequality. We write
(74)
Equations (66) and (69) are ordered by the GKS inequality. We write
(75)
After using Equation (73), we can write Equation (67) as
(76)
In Figure 6a,b, we have depicted the curves corresponding to the Hölder-ordered and Khinchin–Shannon-ordered functions for seven 3-sets of contiguous columns and 80 rows, chosen from the databases Pfam 27.0 and Pfam 35.0, respectively. There are also inset figures to show details of the curves.
In Figure 7a,b, we do the same for the corresponding differences. We emphasize that, for the 3-sets in which the ordering of Equation (76) is also verified, the GKS inequalities will result from the validity of Hölder's inequality. We have worked with the PF01926 protein domain family to perform all the calculations.
7. Concluding Remarks
Our first comment on the present work concerns the possibility of working in a region of the parameter space that preserves both the strict concavity and the fully synergetic structure of the Sharma–Mittal class of entropy measure distributions to be visited by the solutions of a new statistical mechanics approach. The usual work with Havrda–Charvat distributions describes the evolution along the boundary of the region that was correctly considered to correspond to strict concavity, but this boundary is also known to be non-synergetic on part of its range. We now have the opportunity to develop this statistical mechanics approach along an extended boundary, preserving strict concavity and allowing the study of the evolution of fully synergetic entropy distributions. A first sketch of these developments will be presented in a forthcoming publication.
With respect to Figures 6 and 7, we could hypothesize that, whenever the ordering of the relevant functions cannot be obtained, this is due to the poor alignment of some of the protein domain families we have been using; however, we are not yet confident enough to do so, because much more "in silico" information, obtained from many other protein domain families, would be needed. In other words, we expect that a good alignment of a protein domain family will result in this ordering, but we need to verify it in a large number of families from different Pfam versions before proposing a method to improve the Pfam database. This looks promising for good scientific work in the line of research we have aimed to introduce in Ref. [2] and in this contribution.
Author Contributions: Conceptualization, R.P.M. and S.C.d.A.N.; methodology, R.P.M. and S.C.d.A.N.; formal analysis, R.P.M. and S.C.d.A.N.; writing—original draft preparation, R.P.M.; writing—review and editing, R.P.M. and S.C.d.A.N.; visualization, R.P.M. and S.C.d.A.N.; supervision, R.P.M.; project administration, R.P.M. All authors have read and agreed to the published version of the manuscript.
Conflicts of Interest: The authors declare no conflict of interest.
The following abbreviation is used in this manuscript:
GKS: Generalized Khinchin–Shannon
Figure 1. The strict concavity region of the Sharma–Mittal class of entropy measures. It is the epigraph of the Havrda–Charvat curve (r = s), which is depicted in brown. The Landsberg–Vedral (r = 2 − s), Rényi (r = 1), and “non-extensive” Gaussian (s = 1) curves are depicted in green, blue, and red, respectively.
Figure 2. Subregions of the strict concavity region of the Sharma–Mittal class of entropy measures. (a) Khinchin–Shannon subregion R1 (fully synergetic); (b) the non-synergetic subregion R2; (c) Khinchin–Shannon subregion R3 (fully synergetic); (d) the union of the fully synergetic Khinchin–Shannon subregions, i.e., the reduction of the region of Figure 1 obtained by taking into consideration fully synergetic distributions only.
Figure 3. The maximal strict concavity region of the Sharma–Mittal class of entropy measures. The hatched region is the epigraph of the limiting curve, which is depicted in black. The Havrda–Charvat curve (r = s) is in brown. The Landsberg–Vedral (r = 2 − s), Rényi (r = 1), and “non-extensive” Gaussian (s = 1) curves are depicted in green, blue, and red, respectively.
Figure 4. Subregions of the maximal strict concavity region of the Sharma–Mittal class of entropy measures (Figure 3). (a) The fully synergetic Khinchin–Shannon subregion; (b) the non-synergetic subregion.
Figure 5. The reduction of the region of Figure 3 obtained by taking into consideration the fully synergetic distributions only.
Figure 6. Hölder-ordered distributions (dashed curves) and Khinchin–Shannon-ordered distributions (continuous curves) of the PF01926 protein domain family obtained from (a) Pfam 27.0 and (b) Pfam 35.0. The top-right inset shows details of the curves.
Figure 7. Differences between the Hölder-ordered and Khinchin–Shannon-ordered curves of the PF01926 protein domain family obtained from (a) Pfam 27.0 and (b) Pfam 35.0. The top-right inset shows details of the curves. When the two orderings hold together, the GKS inequalities follow from Hölder's inequality.
References
1. Mondaini, R.P.; de Albuquerque Neto, S.C. The Statistical Analysis of Protein Domain Family Distributions via Jaccard Entropy Measures. In Trends in Biomathematics: Modeling Cells, Flows, Epidemics, and the Environment; Mondaini, R.P., Ed.; Springer: Cham, Switzerland, 2020; pp. 169-207.
2. Mondaini, R.P.; de Albuquerque Neto, S.C. Alternative Entropy Measures and Generalized Khinchin-Shannon Inequalities. Entropy; 2021; 23, 1618. [DOI: https://dx.doi.org/10.3390/e23121618] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/34945924]
3. Mondaini, R.P.; de Albuquerque Neto, S.C. Khinchin–Shannon Generalized Inequalities for “Non-additive” Entropy Measures. In Trends in Biomathematics: Mathematical Modeling for Health, Harvesting, and Population Dynamics; Mondaini, R.P., Ed.; Springer: Cham, Switzerland, 2019; pp. 177-190.
4. Beck, C. Generalized Information and Entropy Measures in Physics. Contemp. Phys.; 2009; 50, pp. 495-510. [DOI: https://dx.doi.org/10.1080/00107510902823517]
5. Sharma, B.D.; Mittal, D.P. New Non-additive Measures of Entropy for Discrete Probability Distributions. J. Math. Sci.; 1975; 10, pp. 28-40.
6. Havrda, J.; Charvát, F. Quantification Method of Classification Processes. Concept of Structural α-entropy. Kybernetika; 1967; 3, pp. 30-35.
7. Landsberg, P.T.; Vedral, V. Distributions and Channel Capacities in Generalized Statistical Mechanics. Phys. Lett. A; 1998; 247, pp. 211-217. [DOI: https://dx.doi.org/10.1016/S0375-9601(98)00500-3]
8. Rényi, A. On Measures of Entropy and Information. In Contributions to the Theory of Statistics, Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, CA, USA, 20 June–30 July 1960; Neyman, J., Ed.; University of California Press: Berkeley, CA, USA, 1961; Volume 1, pp. 547-561.
9. Oikonomou, T. Properties of the “Non-extensive Gaussian” Entropy. Phys. A; 2007; 381, pp. 155-163. [DOI: https://dx.doi.org/10.1016/j.physa.2007.03.010]
10. Khinchin, A.Y. Mathematical Foundations of Information Theory; Dover Publications, Inc.: New York, NY, USA, 1957.
11. Marsden, J.E.; Tromba, A. Vector Calculus, 6th ed.; W. H. Freeman and Company Publishers: New York, NY, USA, 2012.
12. Mondaini, R.P.; de Albuquerque Neto, S.C. The Maximal Extension of the Strict Concavity Region on the Parameter Space for Sharma-Mittal Entropy Measures. In Trends in Biomathematics: Stability and Oscillations in Environmental Social and Biological Models; Mondaini, R.P., Ed.; Springer: Cham, Switzerland, 2022.
13. Mistry, J.; Chuguransky, S.; Williams, L.; Qureshi, M.; Salazar, G.A.; Sonnhammer, E.L.L.; Tosatto, S.C.E.; Paladin, L.; Raj, S.; Richardson, L.J. et al. Pfam: The Protein Families Database in 2021. Nucleic Acids Res.; 2021; 49, pp. D412-D419. [DOI: https://dx.doi.org/10.1093/nar/gkaa913]
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Abstract
In this contribution, we specify the conditions for assuring the validity of the synergy of the distributions of probabilities of occurrence. We also study the subsequent restriction on the maximal extension of the strict concavity region on the parameter space of Sharma–Mittal entropy measures, which was derived in a previous paper in this journal. The present paper is then a necessary complement to that publication. The techniques introduced here are applied to protein domain families (Pfam databases, versions 27.0 and 35.0). The results show evidence of their usefulness for testing the classification work performed with the alignment methods used by expert biologists.