Abstract
Many biological and medical questions can be modeled using time-to-event data in finite-state Markov chains, with the phase-type distribution describing intervals between events. We solve the inverse problem: given a phase-type distribution, can we identify the transition rate parameters of the underlying Markov chain? For a specific class of solvable Markov models, we show this problem has a unique solution up to finite symmetry transformations, and we outline a recursive method for computing symbolic solutions for these models across any number of states. Using the Thomas decomposition technique from computer algebra, we further provide symbolic solutions for any model. Interestingly, different models with the same state count but distinct transition graphs can yield identical phase-type distributions. To distinguish among these, we propose additional properties beyond just the time to the next event. We demonstrate the method’s applicability by inferring transcriptional regulation models from single-cell transcription imaging data.
Introduction
Continuous-time finite Markov chains have been widely used to model biological and medical phenomena involving multiple discrete states (Bharucha-Reid, 1997). For instance, the progression of a disease may have multiple stages, including varying severity grades, recovery after treatment, and potential relapse. Similarly, gene expression functions intermittently and often requires several prior non-productive and productive stages to prepare and initiate transcription or translation, or to pause these processes. In both examples, the number of states and the timescales of transitions between them are usually unknown.
A common dataset for such models consists of a sequence of specific events, such as the occurrence of an observed state. In renewal processes, relevant to many applications, the times separating successive events are independent, identically distributed variables, and the sequence is characterized by the distribution of these intervals (Feller, 1966). In simple cases, such as Poisson processes, this distribution is exponential. However, when multiple states are involved, this distribution deviates from the exponential. The renewal process scenario applies only when there is one observable state and the process resets to the same unobserved state after each observation. These scenarios are modeled by phase-type distributions, introduced by (Neuts, 1975), (see Fig. 1).
Phase-type distributions were used in biology and medicine for modeling transcription and translation bursting (Kumar et al., 2015; Soltani et al., 2016; Kumar and Kulkarni, 2019; Bokes et al., 2020; Tantale et al., 2021; Pimmett et al., 2021; Douaihy et al., 2023; Zhang et al., 2024), stochastic enzymatic reactions (Moffitt and Bustamante, 2014), ion channel dynamics (LaMar et al., 2011), drug kinetics (Faddy, 1993), population genetics (Hobolth et al., 2019; Hössjer et al., 2018), and public health issues (Fackrell, 2009; Liquet et al., 2012; Asanjarani et al., 2021; Stone et al., 2022).
In this paper, our aim is to determine the underlying Markov chain, the number of states, and the transition rates between states from the phase-type distribution. This inverse problem has previously been tackled using maximum likelihood approaches, where the data likelihood was computed for an arbitrarily chosen model and its maximization yielded optimal model parameters (Asanjarani et al., 2021; Stone et al., 2022).
In contrast to this direct inference approach, our method decomposes the inference problem into two parts. The first part involves regressing a parametric multi-exponential representation of the phase-type distribution (Dufresne, 2007; Douaihy et al., 2023). The second part of our approach is entirely algebraic. We formulate the inverse problem as a system of polynomial equations in the transition rate parameters and solve this system for different numbers of states and transition topologies. In doing so, we obtain explicit formulas relating transition rate parameters to phase-type distribution parameters for all Markov chains that generate the same multi-exponential distribution.
Other approaches to finding Markov chain representations of a phase-type distribution, using generating functions or the Laplace transform (Maier, 1991; Commault and Chemla, 1996), propose some, but not all, Markov chain representations and do not always offer explicit formulas for the transition rate parameters.
We provide a code implementation of our algorithms that formulate the inverse problem of any phase-type distribution as a system of polynomial equations and solve it using Thomas decomposition, a computer algebra method. Our approach also determines whether the model is solvable, meaning whether the solution to the inverse problem is unique or unique up to transformations by finite symmetries.
To demonstrate the relevance of our approach, we apply it to the context of transcriptional bursting, a well-documented phenomenon in gene expression research (Dufourt et al., 2018; Tantale et al., 2021; Pimmett et al., 2021; Bellec et al., 2022; Douaihy et al., 2023; Damour et al., 2023).
The structure of the paper is as follows: In Section 2, we discuss the phase-type distribution problem and find its solution in the non-degenerate case, specifically when the generator of the Markov chain has distinct eigenvalues. Section 3 introduces the symbolic inverse problem, which entails finding the transition rate parameters from the phase-type distribution parameters. We demonstrate that solving this inverse problem involves a system of polynomial equations symmetric in the phase-type distribution parameters. Additionally, we discuss solvable models in Section 3, where the inverse problem has a unique solution. In Section 4, we present an example of a solvable model with an arbitrary number of states. Section 5 utilizes Thomas decomposition to solve the inverse problem. Section 6 addresses model degeneracy issues. Finally, in Section 7, we apply our findings to analyze transcriptional bursting data.
Fig. 1
Phase-type distribution for a three-state Markov chain model with state 4 as the observable state. A typical dataset records the moments when 4 is reached. The chain returns instantly from 4 to 3 and restarts. The interval between successive observations, also defined as the time to reach 4 from 3, follows a phase-type distribution
Phase-Type Distribution, Direct Problem
Statistical Definition
Consider the successive observations of a state, as illustrated in Fig. 1. Following each observation, the system returns to a state s, so the interval between successive observations corresponds to the time required to reach the observed state from s. The distribution of this time is of phase type if it can be obtained from convolutions and mixtures (convex combinations) of exponential distributions (Commault and Mocanu, 2003). Alternatively, it can be defined as having a Laplace transform that is a rational function (see Theorem 1 in (Commault and Mocanu, 2003)). Phase-type distributions generalize the exponential distribution and include mixtures of exponential and Erlang distributions.
This paper considers multi-exponential distributions, characterized by a complementary cumulative distribution function (survival function) of the following type:
1
S(t) = \sum_{i=1}^{N} A_i e^{\lambda_i t}
where the exponents \lambda_i are all negative. The amplitudes A_i are not necessarily all positive, but are constrained by the requirement that the corresponding probability density remains non-negative for all t. This family of distributions includes mixtures of exponentials when all A_i are positive but is not restricted to this case (some A_i can be negative). The memoryless exponential distribution corresponds to the case N = 1, while cases with N > 1 correspond to phase-type distributions with memory. Focusing on multi-exponential distributions is not restrictive, as any phase-type distribution can be approximated with arbitrary precision by distributions of this type. Furthermore, as will be shown in the next section, finite Markov chains generically lead to multi-exponential distributions.
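As a concrete numerical illustration (not part of the original text), the following Python sketch evaluates a multi-exponential survival function of the form (1) for hypothetical parameter values and checks that it behaves like a survival function; all numbers are arbitrary examples.

```python
import numpy as np

# Hypothetical multi-exponential parameters (illustrative only):
# amplitudes A_i, not all positive, and negative exponents lambda_i.
A = np.array([1.4, -0.5, 0.1])
lam = np.array([-0.8, -2.0, -5.0])

def survival(t):
    """Multi-exponential survival function S(t) = sum_i A_i * exp(lambda_i * t)."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    return (A[None, :] * np.exp(np.outer(t, lam))).sum(axis=1)

t = np.linspace(0.0, 20.0, 2001)
S = survival(t)

assert np.isclose(S[0], A.sum())      # S(0) equals the sum of the amplitudes (here 1)
assert np.all(np.diff(S) <= 1e-12)    # non-increasing on the sampled grid
assert np.all(S >= -1e-12)            # non-negative on the sampled grid
print("S(0) =", S[0], "  S(5) =", survival(5.0)[0])
```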
Finite State Continuous Time Markov Chains
In this subsection, we introduce phase-type distributions using Markov chain representations and demonstrate the conditions under which we obtain the multi-exponential distributions described by (1). We assume that the reader is familiar with the basic theory of finite-state continuous-time Markov chains and refer to (Bharucha-Reid, 1997; Meyn and Tweedie, 2012) for excellent presentations of this subject. We also use the well known result from operational research that any phase-type distribution can be defined as the distribution of the absorption time of a finite-state, continuous-time Markov chain (Commault and Mocanu, 2003; Bladt, 2005).
To formalize the problem, we consider a continuous time Markov chain M(t), where the first N states are unobservable and the state is the observed state.
The chain is defined by its generator , an matrix with entries representing the transition rate from the state i to the state j for , and (zero row sum).
We are interested in the distribution of the first passage time to the observed state N + 1 starting from the state s, with s ≤ N, where M(t) is the state of the chain at time t.
Let us consider two similar models. In the model with return, the chain returns instantly to state s once it reaches the observed state N + 1, and the process restarts. Since s is the unique return state, the observation timings form a renewal process with inter-event times following the distribution of this first passage time (see (Feller, 1966) for an introduction to renewal processes). In the model with absorption, the state N + 1 is absorbing (all its outgoing transition rates are zero). In this case, the first passage time is the time to absorption starting from s.
The distribution of is the same for both models but suited to different applications: the absorption model applies when observations stop after the event, like in survival analysis or gene fixation, while the return model fits recurring events, such as transcription bursting or epidemiology. Even when the return model is contextually appropriate, we use the equivalent absorption model for distribution calculations due to technical considerations.
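To make the renewal-process picture concrete, here is a minimal Gillespie-type simulation sketch of the model with return for a hypothetical three-state chain (rates and topology chosen arbitrarily for illustration); the recorded inter-event times are samples from the corresponding phase-type distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical transition rates (illustrative only): states 1..3 unobservable,
# state 4 observable; rates[i][j] is the rate of the transition i -> j.
rates = {1: {2: 0.7}, 2: {1: 0.4, 3: 1.1}, 3: {2: 0.3, 4: 2.0}}
return_state = 3  # after each observation the chain restarts here

def sample_interevent_times(n_events):
    """Simulate the model with return and record the times between observations."""
    times, state, t, t_last = [], return_state, 0.0, 0.0
    while len(times) < n_events:
        out = rates[state]
        total = sum(out.values())
        t += rng.exponential(1.0 / total)            # sojourn time in the current state
        targets, probs = zip(*out.items())
        state = rng.choice(targets, p=np.array(probs) / total)
        if state == 4:                               # observation: record and reset
            times.append(t - t_last)
            t_last, state = t, return_state
    return np.array(times)

tau = sample_interevent_times(20000)
print("empirical mean inter-event time:", tau.mean())
```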
If several return states are possible, then the model with return is no longer a renewal process. The distribution of the inter-event times depends on the last return state and is no longer a phase-type distribution.
As examples of Markov models we have generated all the models having N unobservable states and satisfying the two conditions:
The matrix has 2N − 1 independent, non-zero, non-diagonal elements (kinetic parameters). This assumption is necessary to establish a one-to-one relation between the parameters of the distribution and the model’s kinetic parameters, ensuring a unique solution to the inverse problem.
Only the state N leads to the observed state N + 1, a simplifying assumption that fits the application discussed and aids in presentation. General results without this assumption are presented in Sect. 3.7.
With the absorption assumption, the generator of the Markov chain model M9 with N = 3, represented in Fig. 2, is
2
For the calculation of the distribution of the first passage time it is convenient to introduce the state probabilities for i = 1, …, N + 1. These variables satisfy the following system of linear differential equations (the master equation):
3
with the initial conditions given by the Kronecker symbol (the chain starts in the return state s), where the dual generator is defined as the transpose of the generator matrix. Because the last column of the dual generator is zero (the state N + 1 is absorbing), the first N variables satisfy an autonomous ODE system. Indeed, consider the matrix obtained by eliminating the last row and the last column of the dual generator. For the example considered, we have
4
Then the vector of the first N state probabilities satisfies
5
with initial conditions given by the Kronecker delta. Unless stated otherwise, we assume that the state N + 1 can be reached only from state N. Thus, the remaining variable satisfies
6
with a zero initial condition. Let us consider that the eigenvalues of the matrix satisfy the following non-degeneracy condition
7
Remark 1
It can be shown that the eigenvalues always have negative real parts. Indeed, the state probabilities are bounded for all t, which means that one cannot have an eigenvalue with positive real part: if some eigenvalue had positive real part, then there would be solutions of (5) that increase without bound as t increases. We will show later that strong connectedness of the transition graph of the Markov chain reduced to the unobservable vertices implies that none of the eigenvalues can be zero. One should note that, unlike the full generator, the reduced matrix is not a continuous-time Markov chain generator and does not satisfy the zero row sum rule.
Considering that the condition (7) is satisfied, the solutions of (5),(6) read
8
where, for i = 1, …, N, the exponents and the vectors are the eigenvalues and eigenvectors of the reduced matrix, respectively. Because the state N + 1 is absorbing, its occupation probability is the cumulative distribution function of the first passage time. We also define the survival function of this time as follows
9
where10
The constants satisfy
11
which follows from the initial condition at t = 0. This shows that in the non-degenerate case the distribution of the first passage time is multi-exponential, with 2N − 1 independent parameters (N eigenvalues and N amplitudes constrained by (11)).
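The eigenvalue/amplitude construction above can be checked numerically. The sketch below (an illustration, not code from the paper) builds the generator of a hypothetical three-state chain with an absorbing observed state, forms the reduced transposed matrix, and extracts the exponents and amplitudes of the survival function from its eigendecomposition; the rates and the start state are arbitrary choices.

```python
import numpy as np

# Hypothetical generator for unobservable states 1..3 and absorbing observed state 4
# (rates are illustrative only; each row sums to zero).
k12, k21, k23, k32, k34 = 0.7, 0.4, 1.1, 0.3, 2.0
Q = np.array([
    [-k12,         k12,          0.0,  0.0],
    [ k21, -(k21 + k23),         k23,  0.0],
    [ 0.0,          k32, -(k32 + k34), k34],
    [ 0.0,          0.0,          0.0, 0.0],
])
s = 2                                   # start/return state 3 (0-based index)

M = Q.T[:3, :3]                         # transpose of Q with the last row/column removed
lam, V = np.linalg.eig(M)               # eigenvalues and eigenvectors
c = np.linalg.solve(V, np.eye(3)[s])    # coefficients fixed by the initial condition
A = V.sum(axis=0) * c                   # amplitudes of S(t) = sum_i A_i exp(lam_i t)

print("eigenvalues:", np.sort(lam.real))
print("amplitudes :", np.real_if_close(A), "  sum =", A.sum().real)  # sum should be 1
```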
Fig. 2
Models with N = 3 and return to the state 3. We show only those that are ergodic (every state can be reached from any other state through a directed path) and not related by symmetries (permutations of states 1, 2, 3). All these models can generate exactly the same phase-type distribution
Inverse Problem
The inverse problem involves assuming that the survival function, and thus its parameters (the exponents and the amplitudes), are known. These can be estimated from data using least squares or maximum likelihood (Dufresne, 2007; Tantale et al., 2021; Douaihy et al., 2023; Liu, 2022). The goal is to identify which Markov chains can represent this phase-type distribution. A number of general results are known for this problem. We recall here the following important result, which follows from Theorem 12 in (Commault and Mocanu, 2003):
Proposition 1
A multi-exponential phase-type distribution with N exponentials can be represented by a Markov chain model of order N (the order is defined as the number of unobservable states).
In this section we introduce the problem of computing the transition rate parameters, which are the non-zero, non-diagonal elements of the matrix of a Markov chain of order N, from the parameters of the phase-type distribution. We are interested in models where this problem could have a unique solution; therefore the matrix has only 2N − 1 non-zero, non-diagonal elements. We arrange these elements in a (2N − 1)-dimensional vector. After reordering, we place the exit rate in the last position of this vector. We show below that the inverse problem consists in solving polynomial equations for these transition rates.
Vieta’s Formulas
The eigenvalues are the roots of the characteristic polynomial of the reduced matrix, defined as
12
The coefficients of the characteristic polynomial are polynomials with integer coefficients in the transition rates. A first set of equations relating eigenvalues to the kinetic parameters results from Vieta's formulas
13
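The following SymPy sketch (an illustration with a hypothetical two-state chain, not taken from the paper) shows the content of (12) and (13): the coefficients of the characteristic polynomial of the reduced matrix are integer-coefficient polynomials in the rates, and Vieta's formulas express the sum and product of the eigenvalues through them.

```python
import sympy as sp

k12, k21, k23 = sp.symbols('k12 k21 k23', positive=True)
lam = sp.symbols('lam')

# Reduced (transposed, truncated) generator of a hypothetical chain 1 <-> 2 -> 3.
M = sp.Matrix([[-k12,          k21],
               [ k12, -(k21 + k23)]])

charpoly = sp.expand((lam * sp.eye(2) - M).det())
print("characteristic polynomial:", charpoly)

# Vieta's formulas: elementary symmetric polynomials of the eigenvalues
# in terms of the transition rates.
print("lambda_1 + lambda_2 =", sp.expand(M.trace()))
print("lambda_1 * lambda_2 =", sp.expand(M.det()))
```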
Eigenvector Equations
The amplitude parameters of the survival function occur in (10) together with eigenvector components and solution coefficients . We need to relate the latter to the transition rate parameters .
First, we solve the eigenvector equation
14
We look for solutions of (14) in a suitably normalized form. In Sect. 3.6 we will see for which models this choice is possible. Because equations (14) have integer coefficients in the transition rates, the eigenvector components are rational functions of the transition rates and the eigenvalues.
The initial conditions satisfied by the variables and (8) provide a linear system of equations for the constants :
15
Because of the non-degeneracy condition (7), the system (15) has a unique solution. The solutions are rational functions of the transition rates and the eigenvalues. From (10) we obtain independent equations for the transition rates:
16
The inverse problem is defined by the system of equations formed by (13) and (16). When this system has solutions, the transition rates can be expressed as functions of the survival function parameters (the eigenvalues and the amplitudes).
Illustrative Example
As an example let us consider again the model M9 with the return state 3, represented in Fig. 2. For this model the characteristic polynomial is as displayed. The system (14) has the corresponding solution, and the coefficients satisfy
17
It follows that
18
The inverse problem for the model M9 consists of solving the following system of five equations:
19
20
21
22
23
Symmetrized Systems
There is a difference between systems (13) and (16). System (13) is entirely expressed using elementary symmetric polynomials in , whereas there is no obvious symmetry in system (16). This difference is clearly visible in the example of model M9: equations (19), (20), and (21) are symmetrized, while equations (22) and (23) are not symmetric in and .
We show here that the system formed by (13) and (16) is equivalent to a system symmetrized in both the eigenvalues and the amplitudes. The advantage of a symmetrized system over a non-symmetrized one is that it involves simpler formulas, decreasing the computational burden of the symbolic tools.
To this end we use the following identities (due to Jacobi–Trudi (Fulton and Harris, 1991)), valid for any N distinct numbers:
24
where the complete symmetric polynomials of the appropriate degrees in N variables appear. More precisely, the complete symmetric polynomials in the N variables have the form
25
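Since the explicit conversions between complete and elementary symmetric polynomials are used but not reproduced here, the following SymPy sketch verifies the standard identities h1 = e1, h2 = e1^2 − e2 and h3 = e1^3 − 2 e1 e2 + e3 for three variables; this is generic symmetric-function algebra, not a formula quoted from the paper.

```python
from itertools import combinations, combinations_with_replacement
import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')
xs = (x1, x2, x3)

def e(k):
    """Elementary symmetric polynomial e_k of x1, x2, x3."""
    return sp.Add(*[sp.Mul(*c) for c in combinations(xs, k)])

def h(k):
    """Complete homogeneous symmetric polynomial h_k of x1, x2, x3."""
    return sp.Add(*[sp.Mul(*c) for c in combinations_with_replacement(xs, k)])

# Conversions of the kind needed to rewrite the symmetrized equations:
assert sp.expand(h(1) - e(1)) == 0
assert sp.expand(h(2) - (e(1)**2 - e(2))) == 0
assert sp.expand(h(3) - (e(1)**3 - 2*e(1)*e(2) + e(3))) == 0
print("h1, h2, h3 expressed through e1, e2, e3: identities verified")
```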
In order to relate (16) and (24) we first relate the coefficients to the eigenvalues. Denote by the determinant of the submatrix of the matrix obtained by deleting its s-th row and i-th column. Because elements of this submatrix are linear combinations with integer coefficients of the eigenvalue and of the transition rates, the corresponding minors are polynomials with integer coefficients in these quantities.
From the definition of , it follows for that and that the (leading) coefficient of at the monomial equals 1, while for we have . Thus, we have
26
27
Lemma 1
Assume that for and that and for . Then
28
and
29
where the eigenvector components appear, and the coefficients form the unique solution of the system
Proof
If the relevant minor is non-zero, then, using Cramer's rule for the system (14), we find the eigenvector components. Therefore, the first displayed relation holds, while the second one follows, with the coefficients taken from (29), due to (24) and to the fact that the degree in the eigenvalue of the corresponding minor is smaller in the remaining cases.
The structure of the equations (16) suggests a natural way to symmetrize them. For we define to be the sum of . Using (16),(28) and (29) we obtain for this sum the formula
30
We can distinguish two cases.
i) In this case, according to (26), the relevant minor has degree N − 1 in the eigenvalue. Using (30) and the Jacobi–Trudi identities (24) we find the following symmetrized equations:
31
The inverse problem is thus equivalent to solving the symmetrized equations in (13) and (31) and computing the transition rates as functions of the symmetric polynomials of the eigenvalues and the amplitudes. We should note that the complete symmetric polynomials can be expressed using the elementary symmetric polynomials, and so on.
ii) In this case, according to (27), the relevant minor has degree at most N − 2. Using (30) and (24) it follows that
32
This means that the inverse problem is not well posed in this case. The parameters are no longer independent; they need to satisfy the constraint (32). Furthermore, we can no longer determine all the transition rate parameters uniquely. There remain equations that can be used to constrain the transition rate parameters:
33
In this case the symmetrized system made of (13), (33) can be used to compute the transition rates as functions of the symmetric polynomials and of one or several indeterminate transition rates. While the examples in this paper assume the first case, the second case is also relevant in practice. Here, additional data such as state occupancy probabilities and lifetimes help to constrain the kinetic parameters by eliminating the remaining indeterminate variables. Symmetrized equations provide further constraints that must be solved alongside those from the supplementary data.
Symmetrized System for the Illustrative Example
For the illustrative example of the model M9, the matrix reads as displayed. Considering the first choice of the return state, we obtain the corresponding matrix and the symmetrized equations. For the second choice of the return state, we obtain the corresponding matrix and symmetrized equations. Finally, for the third choice of the return state, we again obtain the corresponding matrix and symmetrized equations.
Fig. 3
Non-ergodic models with return. In both examples the state 2 is not in the strongly connected component of the return state 3. a) the state 2 is absorbing, so the return time can be infinite with non-zero probability, meaning that the distribution is not of the phase-type. b) the state 2 cannot be reached from the return state 3, allowing the ergodic chain obtained by eliminating the state 2 to be used to compute the phase-type distribution
Solvable Models
Different Markov chain models are distinguished by their transition graph defined as the directed graph with vertices and arcs .
Let us consider that this graph satisfies some conditions related to the underlying modeled process:
(C1) The observed state N + 1 is reachable only from N.
(C2) From N + 1 there is only one return arc, towards s, where s is one of the unobservable states.
For a directed graph G and its vertices v, w we denote v → w if there is a path in G from v to w. We say that v, w are equivalent if each can be reached from the other. Equivalence classes of nodes are called strongly connected components (SCC). If G consists of a single SCC then we call G strongly connected. A Markov chain model is called ergodic if and only if its transition graph is strongly connected, meaning that every node can be reached from any other node through a directed path. Let us consider the SCC of the return state s, i.e. the maximal strongly connected sub-graph containing s. Non-ergodic models contain nodes w outside this SCC. Two situations may arise for such nodes, as represented in Fig. 3. In Fig. 3a), w can be reached from s, but the converse is not true. With non-zero probability, the chain starting in s never returns to s. This situation does not lead to a phase-type distribution and will not be considered. In Fig. 3b), w cannot be reached from s. In such a case, an ergodic model with fewer states, limited to the SCC of s, generates the phase-type distribution. For these reasons we can restrict our analysis to ergodic models only. We would like to know for which ergodic models the inverse problem has a unique solution, possibly up to transformation by discrete symmetries.
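As a small illustration of the ergodicity condition (not from the original text), the sketch below checks strong connectivity of a candidate transition graph with two reachability searches; the edge list is a hypothetical three-state model with return to state 3.

```python
def reachable(adj, start):
    """Set of vertices reachable from start along directed edges (depth-first search)."""
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for w in adj.get(v, ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def is_ergodic(vertices, edges):
    """True iff the directed graph (vertices, edges) is strongly connected."""
    adj, radj = {}, {}
    for v, w in edges:
        adj.setdefault(v, []).append(w)
        radj.setdefault(w, []).append(v)
    v0 = next(iter(vertices))
    return reachable(adj, v0) == set(vertices) == reachable(radj, v0)

# Hypothetical 3-state model with return to state 3 (edges are illustrative only):
edges = [(3, 1), (1, 2), (2, 3), (3, 2)]
print(is_ergodic({1, 2, 3}, edges))  # True: every state is reachable from every other
```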
Definition 1
We say that a model is solvable if the following conditions are satisfied:
The transition graph is strongly connected.
The model has 2N − 1 transition rate parameters (non-zero, non-diagonal elements of the generator).
The system made of the equations (13) and (16), or equivalently the system made of the equations (13) and (31), or the system (13), (33), has unique solutions up to transformations by discrete symmetries, on an open domain of dimension 2N − 1.
It is very difficult to obtain general sufficient conditions for solvability but we can state a necessary condition.
Proposition 2
A solvable model necessarily satisfies s = N.
Proof
If s ≠ N, then (32) holds, so the survival function parameters are no longer independent. The inverse problem then involves fewer independent equations than kinetic parameters.
Remark 2
In Sect. 3.7 we relax this necessary condition to s ∈ pred(N + 1), where pred(N + 1) is the set of predecessors of the observed state N + 1 in the transition graph.
Ergodic models also satisfy the following property that has been used in condition (7) and is needed for writing the solution (8).
Proposition 3
All ergodic models satisfy the property that none of the eigenvalues of the reduced matrix is zero.
Proof
Suppose that some eigenvalue equals zero, so the corresponding eigenvector equation holds. Summing all the equations of that system term by term, we obtain that the component of the eigenvector corresponding to state N vanishes. Therefore, the eigenvector also satisfies the analogous system in which the exit rate from state N is set to zero.
Note that the resulting matrix satisfies the zero column-wise sum and is the dual generator of a reduced Markov chain obtained by deleting the state N + 1. Therefore the eigenvector is a steady state probability distribution of the reduced Markov chain with states 1, …, N and this dual generator.
The full model, and consequently the reduced chain, are ergodic. However, any ergodic Markov chain has a unique steady state distribution in which all states have non-zero probabilities, which contradicts the vanishing component found above.
The following Proposition guarantees that the system of equations (16), that are used to define the inverse problem, can be obtained for all ergodic models.
Proposition 4
Consider that G is strongly connected. Then the equation has solutions with for all .
Proof
With the notations of the Sect. 3.2, the polynomial has leading term and therefore is not identically zero (see (26)). Lemma 1 implies that there are eigenvector solutions with . Furthermore, for , are not identically zero because, otherwise, would also be zero. However, is an eigenvector of for the eigenvalue , representing a steady state of the Markov chain. Following the same argument as in the proof of Proposition 3, for ergodic chains, for all .
Figure 2 shows all the ergodic models with N = 3 satisfying the conditions C1 and C2 and having 2N − 1 = 5 non-zero transition rate parameters. The numbering of the models stems from the fact that there are nine models that satisfy the conditions, and among them, only five are ergodic.
Symmetrized Equations and Solvability Without the Condition C1
In this subsection, we lift the simplifying assumption that the observed state N + 1 is reachable only from N and allow multiple states to lead to N + 1. The independent kinetic parameters of the model are the non-zero, non-diagonal elements of the generator.
Let pred(N + 1) be the set of states that lead to N + 1 (the predecessors of N + 1 in the transition graph).
Then, (6), (10), and (30) become the displayed relations, where the quantities are defined as in Sect. 3.4.
The Vieta’s formulas (13), which provide the first N equations of the inverse problem, remain unchanged. The remaining symmetrized equations resulting from the eigenvectors are now more complex.
As in Sect. 3.4, we have two cases:
i) .
In this case , and the argument in Proposition 2 holds: the model is not solvable. The remaining symmetrized equations resulting from eigenvectors read
34
where the quantities are defined as in Sect. 3.4.
ii)
In this case we have equations resulting from the eigenvectors:
35
where the quantities are defined as in Sect. 3.4.
Solution of the Inverse Problem for the Unbranched Chain Model
Fig. 4
A solvable model with an arbitrarily large number of states: the unbranched chain model. For this model, the solution of the inverse problem can be computed symbolically for any number of states N
In this section, we present an example of a solvable model with an arbitrary number of states and propose a method to compute the solutions of the inverse problem.
Consider now a Markov chain with states arranged in a line and reversibly connected, as illustrated in Fig. 4. For this model the generator is as displayed. This model is a generalization, to arbitrary length N, of the model M2 represented in Fig. 2, which has N = 3. To ensure that the model is solvable, as stated in Proposition 2, we consider the return state to be N.
The solution of the direct problem presented in Sect. 2 provides an algebraic map
36
The goal of this section is to prove that f is invertible and that its inverse is a rational map. This also means that the unbranched chain model is solvable for any N. However, as we will see later, the inverse map is not a rational function for all solvable models; for instance, inversion formulas may involve radicals. From this perspective, the unbranched chain model is special and, in some ways, simpler. Let a vector be formed by the coefficients that solve (15).
Note that (15), (31) imply that
37
38
where the matrix formed by the eigenvector columns and the eigenvalues of the reduced matrix appear. The following lemmas will provide an algorithmic approach to constructing the solution to the inverse problem.
Lemma 2
For a suitable rational function with rational coefficients and with the indicated denominator, the following relation holds.
Proof
For each eigenvalue of the matrix, the displayed relation holds for the relevant indices, taking into account the stated condition.
This provides by recursion a rational function with rational coefficients such that
39
Moreover, the denominator of this rational function equals the stated expression. Then, for the relevant indices, it holds
40
where the symbol denotes a diagonal matrix. The corresponding coordinate of the middle vector in (40) equals
41
The same coordinate of the right vector in (40) equals
42
Observe that the j-th coordinate of the right vector in (40) vanishes for the remaining indices. Multiplying both sides of (39) by suitable factors and summing them up over the relevant index, we obtain that
43
for a suitable rational function with rational coefficients and with a denominator as indicated, taking into account the equality of (41) and of (42).
Lemma 3
For an appropriate rational function with rational coefficients and with a denominator it holds
44
Proof
The -th coordinate of the middle vector in (40) equals
45
The same coordinate of the right vector in (40) equals
46
for an appropriate polynomial with integer coefficients. Multiplying both sides of (39) by suitable factors and summing them up over the relevant index, we obtain the stated relation for a suitable rational function with rational coefficients and with the indicated denominator, taking into account the equality of (45) and of (46) (with both of the latter multiplied by the same factor).
Applying Lemma 3 and Lemma 2 alternately for consecutive indices, we conclude with the following main result of this section.
Theorem 1
There is an algorithm which produces rational functions with rational coefficients such that
Remark 3
Together with (38), Theorem 1 ensures the existence of the inverse rational map to f (see (36)) on the open dense subset determined by the stated non-degeneracy conditions;
the proof of Theorem 1 provides an algorithm which explicitly produces the rational functions by recursion.
Remark 4
In fact, for any model (not necessarily the unbranched chain model elaborated in this section) one can consider a rational map similar to the one defined by (36). Thus, the distribution parameters are rational functions in the kinetic parameters. Therefore, there is a field extension, where one field is generated by these rational functions over the field of complex numbers. It is known (see e.g. (Shafarevich, 1972), Ch. 1) that the degree of this extension equals the number of solutions of the system of rational equations at a generic point. Recall that the degree is defined as the dimension of the larger field viewed as a vector space over the smaller one. The degree can be infinite. Theorem 1 states that for the unbranched chain model the degree equals 1. This provides a rigorous algebraic reformulation of the statement that the inverse problem is in some ways simpler for the unbranched chain model.
We conjecture that for all other models the degree is greater than 1.
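To illustrate Theorem 1 in its smallest instance, the sketch below treats the unbranched chain with N = 2 (states 1 ↔ 2, exit 2 → 3, return state 2). The forward map from rates to eigenvalues and amplitudes is computed numerically, and the rates are then recovered through rational formulas derived here for this small case (they are an assumption of this illustration and are not quoted from the paper): the exit rate equals −(A1 λ1 + A2 λ2), the rate 1 → 2 equals λ1 λ2 divided by the exit rate, and the rate 2 → 1 is fixed by the trace.

```python
import numpy as np

# Hypothetical rates for the N = 2 unbranched chain 1 <-> 2 -> 3 (illustrative only).
k12, k21, k23 = 0.9, 0.5, 1.7

# Direct problem: reduced dual generator, eigenvalues and amplitudes,
# starting from the return state 2 (0-based index 1).
M = np.array([[-k12,          k21],
              [ k12, -(k21 + k23)]])
lam, V = np.linalg.eig(M)
c = np.linalg.solve(V, np.array([0.0, 1.0]))
A = V.sum(axis=0) * c                 # S(t) = A1*exp(lam1*t) + A2*exp(lam2*t)

# Inverse problem: rational recovery of the rates from (lam, A) for this model.
k23_rec = -(A @ lam)                  # exit rate
k12_rec = lam.prod() / k23_rec        # from the product of the eigenvalues
k21_rec = -lam.sum() - k12_rec - k23_rec   # from the sum of the eigenvalues

print(np.allclose([k12_rec, k21_rec, k23_rec], [k12, k21, k23]))  # True
```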
Using the Thomas Decomposition for Solving the Inverse Problem
Thomas decomposition is a computer algebra algorithm that decomposes systems of polynomial equations and inequations into simpler systems. These can be tackled more easily by iteratively solving univariate polynomial equations. We apply this technique to compute the solutions of the inverse problem. The method is illustrated by solving the inverse problem for all ergodic models with N = 3, but it can also be applied to larger N. Additionally, the method helps determine whether a model is solvable.
The Thomas Decomposition of an Algebraic System
In this section we introduce briefly the notion of the algebraic Thomas decomposition (see the appendix of (Lange-Hegermann et al., 2021) for a similar introduction). We will then apply the algebraic Thomas decomposition in the subsequent sections to our symmetrized systems to determine their solutions.
Let be a polynomial ring in n variables over the complex numbers . An algebraic system is defined as a finite set of polynomial equations and inequations, that is as the set
47
with polynomials in the ring and non-negative integers r, s. The solution set of the algebraic system (47) is defined as the set of all points satisfying the equations and inequations of the system. To fix ideas, we will use terminology from algebraic geometry, but this can be skipped without any loss by non-specialist readers. In algebraic geometry, Zariski closed sets, also called varieties, are sets of solutions of systems of polynomial equations. Zariski open sets are complements of closed sets (thus sets of solutions of systems of polynomial inequations) and Zariski locally closed sets are intersections between open and closed sets. Geometrically, the solution set is the difference of two varieties and so it is a Zariski locally closed subset of affine space.
In order to introduce the notion of an algebraic Thomas decomposition of a system we make the following definitions. On the variables of our polynomial ring we define a total ordering (sometimes also called a ranking) by setting for . With respect to this ranking the leader of a non-constant polynomial is defined as the greatest variable appearing in . In case is a constant polynomial, we set . If we consider every polynomial as a univariate polynomial in its leader, say , then the coefficients of as a polynomial in are polynomials in . The coefficient of the highest power of in is called the initial of which we denote by . The separant of a polynomial is defined as the partial derivative of with respect to its leader.
Definition 2
Let be the algebraic system of (47). Then is called a simple algebraic system with respect to a ranking, if the following conditions are satisfied:
The leaders of all equations and inequations are pairwise different. This property is called triangularity.
For every the equation has no solution in . We call this property non-vanishing initials.
For every the equation has no solution in . This is called square-freeness.
The advantage of a simple algebraic system is that one can obtain its solution set by iteratively solving univariate polynomials. This is a consequence of the triangularity of a simple algebraic system. Indeed, the triangularity implies that there is at most one equation or at most one inequation whose leader is the lowest-ranked variable. Note that such an equation or inequation is a univariate polynomial in that variable. The square-freeness implies that the number of its zeros is equal to its degree. In the case of an equation, any root can be chosen as the first coordinate of a solution of the system. In the case of an inequation, any value except for the roots can be chosen as the first coordinate of a solution. If there is no equation or inequation with this leader, then the first coordinate is free, that is, it can be chosen arbitrarily. Now we make the first iteration step. Again by triangularity there is at most one equation or inequation with the next variable as leader. If there is such an equation or inequation, then we substitute the chosen value for the first variable into it and so obtain a univariate polynomial in the second variable. Condition 2 of Definition 2 guarantees that the degree of the polynomial obtained in this way is independent of the choice of the first coordinate, and the square-freeness implies that the number of roots of this polynomial is equal to its degree. According to the three possible cases, that is, there is an equation, an inequation, or neither of them, we determine the second coordinate of a solution as described above. An iteration of this process yields successively a solution of the system. Moreover, any solution of the simple algebraic system can be obtained by this process.
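The back-substitution procedure described above can be imitated on a toy example. The SymPy sketch below solves a hypothetical, already-simple (triangular, square-free) system by solving one univariate polynomial per leader and substituting the roots; it illustrates only the solution step for a single simple system, not the Thomas decomposition algorithm itself.

```python
import sympy as sp

x1, x2, x3 = sp.symbols('x1 x2 x3')

# A hypothetical simple system with ranking x1 < x2 < x3:
# at most one equation or inequation per leader, all initials non-vanishing.
eq_x1 = x1**2 - 2          # equation with leader x1
eq_x2 = x2**2 - x1         # equation with leader x2
ineq_x3 = x3 - x1 * x2     # inequation with leader x3: require x3 - x1*x2 != 0

solutions = []
for r1 in sp.solve(eq_x1, x1):                    # roots for the lowest leader
    for r2 in sp.solve(eq_x2.subs(x1, r1), x2):   # substitute, then solve the next leader
        excluded = sp.solve(ineq_x3.subs({x1: r1, x2: r2}), x3)
        # x3 can be chosen freely except at the roots of the inequation
        solutions.append({'x1': r1, 'x2': r2, 'x3 excluded': excluded})

for sol in solutions:
    print(sol)
```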
Definition 3
A Thomas decomposition of an algebraic system as in (47) consists of finitely many simple algebraic systems such that is the disjoint union of .
It was proved by Thomas in (Thomas, 1962; Thomas, 1937) that any algebraic system has a Thomas decomposition which is in general not unique. A Thomas decomposition can be determined algorithmically (see (Bächler et al., 2012)) and there is an implementation in Maple. A description of the implementation can be found in (Bächler and Lange-Hegermann, 2008-2012; Gerdt et al., 2019).
We will compute in the subsequent sections a Thomas decomposition of the symmetrized systems for the models M2, M3, M4, M8 and M9 (N = 3) by applying the Maple implementation to them. To this end we need to define a ranking on the polynomial ring. To simplify notation we collect the kinetic variables and the distribution variables into two groups. Since we want to solve the symmetrized systems for the kinetic variables, we always rank them higher than the distribution variables. Among the kinetic variables we sometimes change the ranking for the different symmetrized systems. This is done to minimize the output of the Thomas decomposition and has no deeper meaning. The ranking of the variables implies that each simple algebraic system returned by the Thomas decomposition yields solutions for the kinetic variables which are only valid for values of the distribution variables satisfying the equation and inequation conditions involving only the latter.
A Thomas Decomposition for Model 2
The model studied here is a particular case of the one from Sect. 4 and Theorem 1. We are going to determine the solutions of the corresponding algebraic system of equations. To this end we compute with Maple an algebraic Thomas decomposition of this system with respect to a suitable ranking. Maple returns eleven simple systems. We will only compute here the solutions for the first simple system, since it is the most generic one, meaning that the solutions are valid for specializations of the variables satisfying only inequations, i.e. the solutions are valid over a Zariski open subset. The first simple system consists of the displayed equations and inequations. One easily checks that the last three inequations do not involve any kinetic variable. Thus the inequations
48
define the above described Zariski open subset. Solving now successively the remaining equations for the kinetic variables (they are all linear in their respective leaders), we obtain the solutions, which are only valid for parameter values satisfying the inequations in (48). If one is interested in a solution for parameter values which do not satisfy the inequations of (48), then there is another simple system in the decomposition, which one can solve successively as described in the previous subsection. The remaining simple systems can be found in Sect. A.1 of the appendix. The conditions (48) do not contain equalities; therefore the inverse problem has solutions on an open domain of dimension 5. According to Sect. 3.6 this means that the model M2 is solvable.
A Thomas Decomposition for Model 3
The symmetrized algebraic system for model 3 is as displayed, and we compute an algebraic Thomas decomposition for it with respect to a suitable ranking. Maple returns 10 simple systems. One easily checks, by comparing the number of equations only involving the distribution variables (see appendix A.2 for the remaining simple systems), that the first simple system is the most generic one. Here, even in the most generic case, the solutions for the kinetic variables are only valid for distribution parameters lying on a Zariski locally closed subset defined by
49
Solving again the remaining equations successively for the kinetic variables, we obtain the solutions, where the expressions for some of the rates depend on the choice made for the free variable. The conditions (49) contain one equality; therefore the inverse problem has solutions on an open domain of dimension 4. Furthermore, the set of solutions is infinite. According to Sect. 3.6 this means that the model M3 is not solvable.
A Thomas Decomposition for Model 4
In the case of model 4 the symmetrized algebraic system is as displayed. Using the ranking shown, the Maple implementation of algebraic Thomas decomposition returns 21 simple systems (see Sect. A.3 of the appendix for the remaining systems). The first simple system is the most generic one. Indeed, all inequations appearing in it involve only the distribution variables, and so the solutions for the kinetic variables are valid for all of the Zariski open subset defined by these inequations. We can now successively solve the remaining equations for the kinetic variables, where the equations are all linear in their respective leaders except for one, which is quadratic. Since we want to consider only real roots, we require that the discriminant is positive, that is, we change the corresponding inequation into a strict positivity condition. Solving the equations successively, using the quadratic formula for the quadratic one, we obtain the solutions subject to the conditions
50
Note that the solution for some of the rates depends on the choice of the root. The generic simple system provides a finite set of solutions of the inverse problem on the open domain of dimension 5 defined by (50). According to Sect. 3.6 this means that the model M4 is solvable.
A Thomas Decomposition for Model 8
In the case of model 8 the symmetrized algebraic system is as displayed. We compute the algebraic Thomas decomposition with respect to the ranking shown and obtain from Maple twelve simple systems. One easily checks, by comparing the number of equations whose leader is one of the kinetic variables, that the first simple system is the most generic one. Analogously to the case of model 4 one determines the solutions, subject to the conditions
51
Note that here the solutions for several rates depend on the choice of the root and that we changed the inequation for the discriminant into a strict positivity condition to guarantee that the roots are real. The remaining simple systems are presented in Sect. A.4 of the appendix. The generic simple system provides a finite set of solutions of the inverse problem on the open domain of dimension 5 defined by (51). According to Sect. 3.6 this means that the model M8 is solvable.
A Thomas Decomposition for Model 9
We compute an algebraic Thomas decomposition of the algebraic system
52
for model 9 with respect to the ranking shown. The Maple implementation returns 15 simple systems, where the most generic simple system is as displayed. Similarly to models 4 and 8, one determines the solutions
53
subject to the conditions
54
Note that the solutions for several rates depend on the choice of the root and, to guarantee real roots, we changed the inequation representing the discriminant into a strict positivity condition. The remaining 14 simple algebraic systems can be found in Sect. A.5 of the appendix. The generic simple system provides a finite set of solutions of the inverse problem on the open domain of dimension 5 defined by (54). According to Sect. 3.6 this means that the model M9 is solvable.
Model Degeneracy Effects
Several examples were used elsewhere (O’Cinneide, 1989) to illustrate the model degeneracy of the phase-type distribution. Indeed, given a phase-type distribution, one can have two or more distinct representations in terms of the same chain but with different rate parameters, and a Markov chain may provide representations of all distributions represented by a different chain. In this section, we explore the degeneracy of multi-exponential phase-type distributions and illustrate this phenomenon with examples significant in biological applications.
Discrimination Between Variant Models
For a fixed number of states N, there are several variant models that predict exactly the same phase-type distribution and fit the data equally well. The variant models can be generated automatically. For instance, in Fig. 2 we list, up to permutation symmetries of the transition graphs, all the models that produce three-exponential (N = 3) phase-type distributions. Among these models, M2, M4, M8, and M9 are solvable. One cannot discriminate between any two of the four solvable models by using only time-to-event data. The non-solvable model M3 does not generate an arbitrary multi-exponential phase-type distribution, but only those distributions whose parameters satisfy (49). Additionally, the inverse problem has an infinite set of solutions for the model M3. Although in practice we favor solvable models because in this case the inverse problem is well posed, ultimately only experimental results should be used to exclude non-solvable variants.
Mathematically, it is interesting to classify solvable models. One classification can be performed, based on the number of solutions. For instance in Fig. 2 the inverse problem has a unique solution for model M2, whereas it has two distinct solutions for models M4, M8, M9. Model M9 is obviously symmetric with respect to permutation of the vertices 1 and 2, which explains the two-fold degeneracy of the solutions, but M4 and M8 have no obvious permutation symmetries. It is therefore interesting to identify possible algebraic relations relating these three models together. These relations will be discussed in the next subsection.
To identify model variants experimentally, we aim to evaluate the potential of additional data, beyond time-to-event measurements, to differentiate between models. Two natural candidates for these data are the steady-state occupation probabilities of the Markov chain states and the lifetime of each state, defined as the average time spent in a state before transitioning. Depending on the application, experimental methods are available to estimate these quantities. We will discuss some of these methods in the Sect. 7 dedicated to transcriptional bursting.
Symmetries and Classification of Solvable Models
Solvable models that generate the same phase-type distribution must satisfy certain common conditions, which we outline in this section. These conditions allow us to identify mappings that relate the parameters of one solvable model to those of another.
Let us consider the matrix that is obtained from by setting to zero. This matrix is the dual generator of a reduced Markov chain describing the transitions between the states . Then the steady state probabilities of the states can be computed as solutions of the system:
55
Let the lifetime of the state i be considered, for i = 1, …, N. It can be calculated from the diagonal elements of the matrix:
56
The result of these calculations for N = 3 is given in Table 1.
Table 1. Reciprocal lifetimes and steady state occupation probability for the solvable models M2, M4, M8, M9
Model | ||||
|---|---|---|---|---|
M2 | ||||
M4 | ||||
M8 | ||||
M9 |
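A numerical sketch of the quantities behind Table 1 (an illustration only): for a hypothetical three-state return model it computes the steady-state occupation probabilities, treating the instantaneous excursion to the observed state as a return to the same state, and the lifetimes as reciprocal total exit rates. The rates and this particular reading of the matrices in (55)–(56) are assumptions of the sketch.

```python
import numpy as np

# Hypothetical rates of a three-state model with observation 3 -> 4 and instant return
# to state 3 (illustrative only).
k12, k21, k23, k32, k34 = 0.7, 0.4, 1.1, 0.3, 2.0
R = np.array([[0.0, k12, 0.0],
              [k21, 0.0, k23],
              [0.0, k32, 0.0]])        # rates between the unobservable states
exit_rate = np.array([0.0, 0.0, k34])  # rate of the observable transition 3 -> 4

# Occupancy balance: the 3 -> 4 -> 3 excursion brings the chain back to state 3
# instantly, so it does not enter the generator used for the steady state.
G = R - np.diag(R.sum(axis=1))

# Steady-state probabilities: pi @ G = 0 with sum(pi) = 1 (solved as a least-squares system).
lhs = np.vstack([G.T, np.ones(3)])
rhs = np.array([0.0, 0.0, 0.0, 1.0])
pi = np.linalg.lstsq(lhs, rhs, rcond=None)[0]

# Lifetimes: mean sojourn time before any transition, including 3 -> 4.
lifetimes = 1.0 / (R.sum(axis=1) + exit_rate)

print("steady-state probabilities:", pi)
print("lifetimes:", lifetimes)
```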
The steady state probabilities and lifetimes allow us to formulate some general constraints on solvable models as follows
Proposition 5
All solvable models that generate the same phase-type distribution have
the same value of ,
the same value of the lifetime of the state N,
the same value of the steady state probability of the state N.
Proof
(i) is a direct consequence of (31). From (31) and (13) the first displayed relation follows. From the definitions of the quantities involved and elementary properties of the determinant we find the second relation. Combining them implies (ii).
Let the mean value of the phase-type distribution be considered. The mean number of events on an interval [0, T] is T divided by this mean. The same number can also be written in terms of the steady state probability of the state N, because the model reaches the observed state only from the state N, with the corresponding intensity. It follows that the displayed relation holds for all models. We already know that the intensity of this transition is the same for all models, therefore (iii) follows.
As can be seen in Fig. 2, M9 is a symmetric model. The transformation exchanging the roles of the states 1 and 2, which is equivalent to a permutation of the vertex labels 1 and 2, leaves the inverse problem (52) invariant and therefore transforms any solution of the problem into another solution. As the square of this transformation is the identity e, the symmetry group is isomorphic to the group with two elements. Solutions that are not symmetric are then two-fold degenerate.
In order to see the symmetry of the solutions we rewrite (53) as
57
(57) shows that each of the two solutions of the inverse problem is not symmetric, whereas the set of two solutions is symmetric with respect to the transformation that maps one solution into the other. The models M4 and M8 have no symmetry of the transition graphs and no obvious algebraic symmetry of the inverse problem. However, the following property shows that these models are tightly related to M9.
Proposition 6
There are bijective rational maps relating the parameters of one model to the parameters of another (i.e. mapping solutions to solutions), where the two models are any pair in the list M4, M8, M9, and the maps are defined on open domains containing all the real positive parameters resulting from the inverse problem. The maps satisfy the following constraints
The parameter is the same for all the models.
The lifetimes of the states 1, 2, 3 are the same for all the models.
Under steady state conditions, the probability of being in the state 3 is the same for all the models.
Proof
Consider that there are mappings relating parameters of solvable models that have the same phase-type distribution, in particular the same parameters of the phase-type distributions. Then, (i) and (ii) follow from Proposition 5. The same Proposition implies that the steady state probability of the state 3 is the same for all models. Longer, but elementary, calculations using the solutions of the inverse problem obtained in the previous section allow us to check that the lifetimes of the states 1 and 2 are the same for all the models in the list M4, M8, M9. However, this is not true if we include M2 in the list (this model has different values of these lifetimes). Reciprocally, if we impose (i), (ii), (iii), one finds four equations relating the parameters of the two models. These equations, if they have solutions, define the map. As a matter of fact, solutions exist and are easy to compute. For instance, if we seek the parameters of M9 corresponding to the parameters of M8, then using Table 1 the conditions (ii) and (iii) read
58
The solutions of system (58) read
59
and define the bijective rational map. Similarly, solving
60
with respect to the parameters of the target model leads to the equations
61
that define the corresponding map.
Remark 5
Proposition 6 explains the two-fold degeneracy of the solutions of the inverse problem for models M4 and M8. Indeed, the solutions of M4 are obtained by applying the corresponding map to the two solutions of M9.
Remark 6
Proposition 6 allows us to compute the symmetries of the models M8 and M4. The symmetry groups are isomorphic to the group with two elements and are generated by the transformations obtained by conjugating the permutation symmetry of M9 with the corresponding maps, respectively.
Remark 7
Of course, there are also mappings that relate the parameters of models M4, M8, or M9 to those of model M2, resulting in the same phase-type distribution; however, these mappings are many-to-one. This is consistent with the remark in Sect. 4 that the degree of the field extension from the field generated by survival function parameters to the field generated by kinetic parameters is one for M2 and larger than one for the other models.
Fig. 5
Experimental setting for identifying phase-type distributions and Markov chain models of transcriptional bursting. A Transcription requires multiple preceding steps such as chromatin opening and formation of transcription complexes needed for activation of the polymerase. These steps can be modeled as transitions between discrete states. B Fluorescent tagging of mRNA structures allows observation of mRNA synthesis. The observed signal is deconvolved to obtain the transcription initiation events. C The phase-type distribution is used to infer the Markov chain model
Application to Transcriptional Bursting Models
Biological Considerations and Assumptions
Transcription is the fundamental molecular process that copies the DNA into RNA. The synthesis of RNA requires a dedicated molecular machinery, including an RNA polymerase (RNAP) enzyme that binds the DNA in specific regulatory regions able to activate this process, called promoter regions. Transcription requires multiple preceding steps such as chromatin opening and formation of transcription complexes needed for activation of the polymerase. Here we consider that transcription has a small number of limiting steps that can be modeled as transition between discrete states (Figure 5 A and Table 2). We classify these states in three categories: productive ON states that can initiate transcription, non-productive OFF states that are unable to initiate or resume transcription, and paused states PAUSE in which initiated transcription halts and can resume later or abort.
Table 2. Possible biological interpretation of the states in the models M2, M3, M4, M8, M9. For M4, M8, M9 pausing is always abortive
Possible state types | ||||
|---|---|---|---|---|
Model | state 1 | state 2 | state 3 | state 4 |
M2 | OFF | OFF | ON | EL |
M3 | OFF | OFF | ON | EL |
M4 | PAUSE /OFF | OFF | ON | EL |
M8 | PAUSE /OFF | OFF | ON | EL |
M9 | PAUSE /OFF | OFF/PAUSE | ON | EL |
Possible biological interpretation of state types | ||||
long OFF | Enhancer-Promoter disconnection / Chromatin closing | |||
short OFF | Transcription Factor unbinding / Pre-initiation Complex disassembly | |||
PAUSE | Promoter proximal pausing / Intrinsic pausing / Transcriptional roadblock | |||
The promoter starts transcription in the productive ON state, when it can trigger several departures of RNAP molecules. After ON, the RNAP can eventually stop in PAUSE or commit to irreversible productive elongation modeled by the state EL. Elongating polymerases move away from the promoter (to proceed into the gene body), thereby freeing the promoter that instantly enters either a new productive ON state or a non-productive state OFF.
Live imaging of transcription allows the detection of the synthesis of new mRNA molecules in real time and for each transcription site (Pichon et al., 2018). Using the tool BurstDeconv (Douaihy et al., 2023) we can deconvolve the live imaging signal and identify the time of each elongation event. This means that for each transcription site we observe the sequence of EL states (Figure 5B). The other states of the promoter are unobservable.
Strictly speaking, transcription initiation is not synonymous with the start of processive elongation, as transcription initiation can be abortive. Abortive transcription initiation events cannot be detected in our experimental settings. Therefore, in this paper, whenever we refer to transcription initiation, we consider the processive case.
We also consider that the state EL (denoted N + 1 in our abstract Markov chain model) can be reached only from a state that can be either ON or PAUSE, and that after reaching EL the promoter instantly switches to a unique return state s, which can be either productive or non-productive. The second assumption is needed for applying the phase-type distribution formalism. The first assumption, not applying to the case of multiple ON states, was lifted in Sect. 3.7. Using this extension of our approach we can also address scenarios in which transcription occurs simultaneously on duplicate DNA templates, as for example newly replicated sister chromatids.
Discriminating Between Variant Models Using State Probabilities and Lifetimes
As discussed in Sect. 6, several different models can lead to the same phase-type distribution. For instance, three exponential phase-type distribution data can have the same likelihood for models M2, M4, M8, M9. Furthermore, the states of these models can be interpreted in various ways. A possible interpretation is provided in the Table 2.
In this subsection we test the possibility of discriminating between solvable variant models using measurements of state lifetimes or probabilities. The probability that a given state is occupied can be estimated from experimental data such as single molecule DNA footprinting assays (Krebs, 2021; Kleinendorst et al., 2021).
Proposition 6 shows that state lifetimes cannot discriminate between models M4, M8, M9.
Additionally, Proposition 5 shows that for any N, the occupancy probability and the lifetime of the state N are the same for all models with N states and the same phase-type distribution.
After these considerations, for N = 3 we remain with three independent parameters as potential markers discriminating between variant solvable models, except for models M4, M8, M9, where the only discriminating parameter could be a state occupancy probability.
To evaluate the capacity of these markers to distinguish between variant models we generate a dataset consisting of random parameters of the phase-type distribution . For each set of phase-type distribution parameters we use the solutions of the inverse problem to compute the transition rate parameters of the models M2,M4,M8,M9, excluding cases leading to negative rate parameters. From the rate parameters we compute the values of , , and the corresponding differences , , between the maximum and minimum values among the four variant models. Finally, we use these differences to estimate the capacity of such marker values to discriminate between variants.
We found that can discriminate between variants in of the cases, whereas can discriminate between variants in of the cases, see Fig. 6. We have used random phase-type distribution parameters spanning domain values observed in studies of transcriptional bursting of mammalian and fruit-fly genes (Tantale et al., 2021; Pimmett et al., 2021). This suggests that is better for discriminating among variant models than and . This is confirmed also for the particular case of a HIV gene promoter in mammalian cells studied in (Tantale et al., 2021), where and are practically the same for all experimental conditions for the 3 variants, see Fig. 7 below.
Fig. 6
Testing the discrimination capacity of , , and : We generated 100,000 random parameter sets , where and are uniformly distributed in [0, 1], , , and are log-uniformly distributed in . From these, 61,411 sets were retained for which the inverse problem yielded positive rate parameters in at least one model (M2, M4, M8, or M9). Discrimination capacity for , , and is measured by the variations , , and across models with the same phase-type distribution parameters, where larger variations indicate better discrimination. The figures show frequency histograms of these variations, with no discrimination if the variation is zero. Results show that , , and fail to discriminate in , , and of the cases, respectively
Fig. 7
Values of and for three phase-type distributions corresponding to three experiments on cell lines infected with HIV. Increasing the level of the protein tat corresponds to decreasing the durations , of the OFF states (see (Tantale et al., 2021)). For these cells, under no-tat, low-tat, and high-tat conditions, the probabilities , can discriminate between variant models, but the lifetimes , cannot
Conclusion and Perspectives
In contrast to statistical inference, which treats one model at a time, using symbolic solutions of the inverse phase-type distribution problem allows us to simultaneously infer all solvable models. Once the parameters of the phase-type distribution are estimated, our formulas provide the transition rate parameters for all solvable models. Thomas decomposition can be used to solve the inverse problem symbolically for models with a small number of states (we provide results for small numbers of states and also tested the method for larger ones).
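As a purely illustrative aside (the actual computations use the Maple implementation of the Thomas decomposition cited in the Data Availability section), the flavour of the symbolic step can be conveyed with a toy two-rate system solved by a general-purpose computer algebra package; the equations below are generic and are not any of the numbered systems of this paper.

```python
import sympy as sp

# Toy system: recover two transition rates k1, k2 from two observed exponents
# l1, l2 via symmetric polynomial constraints (sum and product).  This only
# illustrates symbolic solving and the discrete swap symmetry of the solutions;
# it is not one of the paper's inverse-problem systems.
k1, k2, l1, l2 = sp.symbols("k1 k2 l1 l2", positive=True)
system = [sp.Eq(k1 + k2, l1 + l2), sp.Eq(k1 * k2, l1 * l2)]
solutions = sp.solve(system, [k1, k2], dict=True)
print(solutions)  # two solutions, exchanged by the symmetry k1 <-> k2
```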
The algebraic structure of the inverse problem for phase-type distributions leads us to a classification of Markov chain models compatible with the data. Some models are solvable, while others are not. Among the solvable three-state models, we discovered relationships stemming from the discrete symmetries of the inverse problem. Extending these results to an arbitrary number of states requires different mathematical tools, which we will address in future work.
We identified examples of solvable models with an arbitrary number of states and established necessary conditions for their solvability. While sufficient conditions for solvability remain to be found, this task presents an exciting challenge for future research.
In practical applications, Thomas decomposition provides formal solutions of the inverse problem. However, if a model has a solution set of dimension greater than zero, or if multiple models have solutions, degeneracy arises, preventing unique identification from the experimental phase-type distribution. In such cases, additional data, such as state occupancy probabilities or state lifetimes, are needed. To achieve unique solutions, equations (13) and (31), or (13) and (33), depending on the case, must be combined with equations (55) and (56), which relate state occupancies and lifetimes to kinetic parameters.
However, limitations may arise from the complexity of the problem. Although models and parameters can theoretically be distinguished, many combinations may produce similar results, leading to practical degeneracy and overfitting. Some practical recipes to penalize complexity and avoid overfitting have been proposed in (Douaihy et al., 2023).
We applied our approach to the study of transcriptional bursting in two different biological contexts, human cultured cells (Tantale et al., 2021) and Drosophila developing embryos (Pimmett et al., 2021).
By mapping the parameters of the phase-type distribution to kinetic model parameters, we can identify mechanisms impacting transcriptional bursting from imaging datasets. This concept led to the development of BurstDeconv, a tool for analyzing transcriptional bursting data (Douaihy et al., 2023). Utilizing Thomas decomposition, the computations presented here extend the range of models that BurstDeconv can analyze; its currently released version supports only the two-state random telegraph model and the three-state models M2 and M9. Additionally, we demonstrated that variant models indistinguishable by time-to-event data can be discriminated using state occupancy probabilities.
In contrast to BurstDeconv, which only addresses transcription models with a single ON state, the results in this paper also apply to models with multiple ON states. This broader approach enables the analysis of phenomena such as sister chromatids (identical DNA copies produced by replication, both of which can undergo transcription), where transcription sites are considered to have multiple active ON states. In this case the symmetrized equations to solve are (35) (when the return state is an ON state) and (34) (when the return state is not an ON state).
While our paper primarily centers on gene transcription applications, it is important to note that the phase-type distribution models discussed here have a broader range of potential uses. The most immediate of these concerns the subsequent steps of gene expression. Monitoring RNA translation initiation events as time-to-event data is feasible using techniques such as SunTag (Tanenbaum et al., 2014; Basyuk et al., 2021; Blake et al., 2024), and the method presented here can also be applied to reverse engineer translation regulation mechanisms. Applying our multi-exponential models in epidemiology and medical research could also prove fruitful, as it may help discern disease stages and progression trajectories (Liquet et al., 2012).
Acknowledgements
OR and ED are funded by ANRS (ANRS0068). MD is supported by a CNRS Prime80 grant to ML and OR. ML is sponsored by CNRS and ERC (SyncDev and LightRNA2Prot). We thank Werner Seiler for engaging and insightful discussions, and John Reinitz for critically reading the manuscript.
Data Availability
The implementation of the Thomas decomposition in Maple is available at https://github.com/Computational-Systems-Biology-LPHI/Thomas_Decomposition/.
Footnote: Although the rate parameters are real positive numbers, for formal reasons we define this map on complex numbers.
References
Asanjarani, A; Liquet, B; Nazarathy, Y. Estimation of semi-Markov multi-state models: a comparison of the sojourn times and transition intensities approaches. Int J Biostat; 2021; 18,
Bokes, P; Borri, A; Palumbo, P; Singh, A. Mixture distributions in a stochastic gene expression model with delayed feedback: a WKB approximation approach. J Math Biol; 2020; 81,
Bellec, M; Dufourt, J; Hunt, G; Lenden-Hasse, H; Trullo, A; ZineEl Aabidine, A; Lamarque, M; Gaskill, MM; Faure-Gautron, H; Mannervik, M; Harrison, MM; Andrau, J-C; Favard, C; Radulescu, O; Lagha, M. The control of transcriptional memory by stable mitotic bookmarking. Nat Commun; 2022; 13,
Blake LA, De La Cruz A, Wu B: Imaging spatiotemporal translation regulation in vivo. In: Seminars in cell and developmental biology, vol. 154, pp. 155–164 (2024). Elsevier
Bächler, T; Gerdt, VP; Lange-Hegermann, M; Robertz, D. Algorithmic Thomas decomposition of algebraic and differential systems. J Symbolic Comput; 2012; 47,
Bladt, M. A review on phase-type distributions and their use in risk theory. ASTIN Bull J IAA; 2005; 35,
Bächler T, Lange-Hegermann M (2008–2012) AlgebraicThomas and DifferentialThomas: Thomas decomposition for algebraic and differential systems. https://www.art.rwth-aachen.de/cms/MATHB/Forschung/Mathematische-Software/~lqnwi/
Bharucha-Reid, AT. Elements of the theory of Markov processes and their applications; 1997; Mineola, NY, Dover:
Basyuk E, Rage F, Bertrand E (2021) RNA transport from transcription to localized translation: a single molecule perspective. RNA Biol 18(9):1221–1237
Commault C, Chemla JP (1996) An invariant of representations of phase-type distributions and some applications. J Appl Prob 33(2):368–381. https://doi.org/10.2307/3215060
Commault C, Mocanu S (2003) Phase-type distributions and representations: some results and open problems for system theory. Int J Control 76(6):566–580. https://doi.org/10.1080/0020717031000114986
Damour, A; Slaninova, V; Radulescu, O; Bertrand, E; Basyuk, E. Transcriptional stochasticity as a key aspect of HIV-1 latency. Viruses; 2023; 15,
Dufourt, J; Trullo, A; Hunter, J; Fernandez, C; Lazaro, J; Dejean, M; Morales, L; Nait-Amer, S; Schulz, KN; Harrison, MM; Favard, C; Radulescu, O; Lagha, M. Temporal control of gene expression by the pioneer factor Zelda through transient interactions in hubs. Nat Commun; 2018; 9,
Douaihy, M; Topno, R; Lagha, M; Bertrand, E; Radulescu, O. BurstDECONV: a signal deconvolution method to uncover mechanisms of transcriptional bursting in live cells. Nucleic Acids Res; 2023; 51,
Dufresne, D. Fitting combinations of exponentials to probability distributions. Appl Stochast Mod Bus Ind; 2007; 23,
Fackrell, M. Modelling healthcare systems with phase-type distributions. Health Care Manage Sci; 2009; 12, pp. 11-26. [DOI: https://dx.doi.org/10.1007/s10729-008-9070-y]
Faddy, M. (1993) A structured compartmental model for drug kinetics. Biometrics, pp. 243–248
Feller, W. An introduction to probability theory and its applications; 1966; Hoboken, NJ, John Wiley & Sons:
Fulton, W; Harris, J. Representation theory a first course; 1991; New York, Springer:
Gerdt, VP; Lange-Hegermann, M; Robertz, D. The Maple package TDDS for computing Thomas decompositions of systems of nonlinear PDEs. Comp Phys Commun; 2019; 234, pp. 202-215. [DOI: https://dx.doi.org/10.1016/j.cpc.2018.07.025]
Hössjer O, Bechly G, Gauger A (2018) Phase-type distribution approximations of the waiting time until coordinated mutations get fixed in a population. In: Silvestrov S, Malyarenko A, Rančić M (eds) Stochastic processes and applications, pp 245–313. Springer, New York. https://doi.org/10.1007/978-3-030-02825-1_12
Hobolth, A; Siri-Jégousse, A; Bladt, M. Phase-type distributions in population genetics. Theor Popul Biol; 2019; 127, pp. 16-32. [DOI: https://dx.doi.org/10.1016/j.tpb.2019.02.001]
Kleinendorst, RW; Barzaghi, G; Smith, ML; Zaugg, JB; Krebs, AR. Genome-wide quantification of transcription factor binding at single-DNA-molecule resolution using methyl-transferase footprinting. Nat Protocols; 2021; 16,
Kumar N, Kulkarni RV (2019) Constraining the complexity of promoter dynamics using fluctuations in gene expression. Phys Biol 17(1):015001. https://doi.org/10.1088/1478-3975/ab4e57
Krebs, AR. Studying transcription factor function in the genome at molecular resolution. Trends Genet; 2021; 37,
Kumar N, Singh A, Kulkarni RV (2015) Transcriptional bursting in gene expression: analytical results for general stochastic models. PLOS Comput Biol 11(10):1004292. https://doi.org/10.1371/journal.pcbi.1004292
Lange-Hegermann, M; Robertz, D; Seiler, WM; Seiß, M. Singularities of algebraic differential equations. Adv Appl Math; 2021; 131, 102266. [DOI: https://dx.doi.org/10.1016/j.aam.2021.102266]
Liu Y (2022) Non-parametric Bayesian inference with application to system biology. PhD thesis, Department of Statistics, University of Chicago. https://doi.org/10.6082/uchicago.3931
LaMar, DM; Kemper, P; Smith, GD. Reduction of calcium release site models via moment fitting of phase-type distributions. Phys Biol; 2011; 8,
Liquet B, Timsit J-F, Rondeau V (2012) Investigating hospital heterogeneity with a multi-state frailty model: application to nosocomial pneumonia disease in intensive care units. BMC Med Res Methodol 12(1):1–14
Maier RS (1991) The algebraic construction of phase-type distributions. Commun Stat Stochast Mod 7(4):573–602. https://doi.org/10.1080/15326349108807207
Moffitt JR, Bustamante C (2014) Extracting signal from noise: kinetic mechanisms from a Michaelis-Menten-like expression for enzymatic fluctuations. FEBS J 281(2):498–517. https://doi.org/10.1111/febs.12545
Meyn, SP; Tweedie, RL. Markov chains and stochastic stability; 2012; New York, Springer:
Neuts MF (1975) Probability distributions of phase type. Liber Amicorum Prof. Emeritus H. Florin
O’Cinneide CA (1989) On non-uniqueness of representations of phase-type distributions. Commun Stat Stochast Mod 5(2):247–259. https://doi.org/10.1080/15326348908807108
Pimmett, VL; Dejean, M; Fernandez, C; Trullo, A; Bertrand, E; Radulescu, O; Lagha, M. Quantitative imaging of transcription in living Drosophila embryos reveals the impact of core promoter motifs on promoter state dynamics. Nat Commun; 2021; 12,
Pichon, X; Lagha, M; Mueller, F; Bertrand, E. A growing toolbox to image gene expression in single cells: sensitive approaches for demanding challenges. Mol Cell; 2018; 71,
Shafarevich, I. Basic algebraic geometry 1; 1972; Heidelberg, Springer:
Soltani M, Vargas-Garcia CA, Antunes D, Singh A (2016) Intercellular variability in protein levels from stochastic expression and noisy cell cycle processes. PLOS Comput Biol 12(8):1004972. https://doi.org/10.1371/journal.pcbi.1004972
Stone, K; Zwiggelaar, R; Jones, P; Mac Parthaláin, N. A systematic review of the prediction of hospital length of stay: towards a unified framework. PLOS Digital Health; 2022; 1,
Tantale, K; Garcia-Oliver, E; Robert, M-C; L’Hostis, A; Yang, Y; Tsanov, N; Topno, R; Gostan, T; Kozulic-Pirher, A; Basu-Shrivastava, M; Mukherjee, K; Slaninova, V; Andrau, J-C; Mueller, F; Basyuk, E; Radulescu, O; Bertrand, E. Stochastic pausing at latent HIV-1 promoters generates transcriptional bursting. Nat Commun; 2021; 12,
Tanenbaum, ME; Gilbert, LA; Qi, LS; Weissman, JS; Vale, RD. A protein-tagging system for signal amplification in gene expression and fluorescence imaging. Cell; 2014; 159,
Thomas, JM. Differential systems; 1937; New York, Colloquium Publications XXI. American Mathematical Society: [DOI: https://dx.doi.org/10.1090/coll/021]
Thomas, JM. Systems and roots; 1962; Richmond, VA, W. Byrd Press:
Zhang J, Chen A, Qiu H, Zhang J, Tian T, Zhou T (2024) Exact results for gene-expression models with general waiting-time distributions. Phys Rev E 109(2):024119. https://doi.org/10.1103/PhysRevE.109.024119