1. Introduction
The change-point problem was first proposed by Page [1]. It concerns a model in which the distribution of the observed data changes abruptly at some point in time, a situation common in biology [2], finance [3], literature [4] and epidemiology [5]. Change-point detection can be employed as a tool in time series segmentation; a representative reference is [6]. Once a change point is detected in a data sequence, it is used to split the sequence into two segments that are modeled separately. From a practical point of view, behavior and policies can also be adjusted based on changes in events of interest. It is therefore important to perform change-point detection.
There are mainly two problems in the change-point model: checking the existence of change points and estimating their positions. These issues have been studied in a substantial literature. For example, see Sen and Srivastava [7] for the mean change in a normal distribution and Worsley [8] for a change in an exponential family using the maximum likelihood ratio method. Others include Bai [9] for the least squares estimation of a mean shift in linear processes, Vexler [10] for the change-point problem in a linear regression model and Gombay [11] for change points in autoregressive time series. See [12,13,14] for details.
Most work on change-point problems has been done for continuous data [14]. In real life, however, many data are observed on a discrete scale; common discrete distributions include the binomial, multinomial and Poisson distributions. In this article, we consider the change-point problem in a multinomial sequence, a problem that originated in the study of the transcription of the Gospels [15]. The Lindisfarne Gospels were divided into several sections, under the assumptions that only one author contributed to the writing of any section and that the sections written by any one author were contiguous. The goal was to test whether a single author wrote the Gospels; the data may be the frequencies of vocabulary or grammatical forms used by the author of each section. In general, suppose that we observe K independent multinomial variables, where at each time point there are a number of experiments with m possible outcomes and the observation records the frequencies of the m outcomes. We want to test
H_0: p_1 = p_2 = ⋯ = p_K versus H_1: p_1 = ⋯ = p_{k*} ≠ p_{k*+1} = ⋯ = p_K, (1)

where k* is the true change-point, 1 ≤ k* < K, and p_k denotes the probability vector of the k-th observation. If H_0 is rejected, we further estimate k*. To solve this problem, Wolfe and Chen [16] proposed several statistics based on the cumulative sum (CUSUM) method. Horváth and Serbinowska [17] used the maximum likelihood ratio and maximum chi-square statistics to test for the existence of change points and derived their limiting distributions. Batsidis et al. [18] extended this work and proposed a family of phi-divergence tests that includes a broad class of statistics. Riba and Ginebra [19] performed a graphical exploration of a sequence of multinomial observations and found a break point. Note that they all assumed that the number of categories m is fixed.
In recent years, the rise of big data has made the high-dimensional change-point problem more important. It thus becomes necessary to consider high-dimensional multinomial data, since the number of categories can be quite large in practice, such as the types of stores selling a certain item on a shopping platform or the types of illnesses seen in an outpatient clinic during a day. In this paper, we consider problem (1) with m tending to infinity. Recently, Wang et al. [20] proposed a procedure based on Pearson's chi-square test under this scenario. Their idea is to pre-divide the categories into two groups based on their probability magnitudes and to use the original and a modified Pearson's chi-square statistic for the large and small categories, respectively. This pre-classification can balance sparse and dense signals, resulting in good statistical performance. Here, we use the pre-classification idea to construct a test statistic for problem (1) with m tending to infinity.
Another tool used in this article is information entropy. Entropy, originally a concept in statistical physics, was introduced into information theory by Shannon [21] and has been widely applied in change-point problems. Unakafov and Keller [22] used the conditional entropy of ordinal patterns to detect change points. Ma and Sofronov [23] proposed a cross-entropy algorithm to estimate the number and positions of change points. Vexler and Gurevich [24] applied an empirical likelihood method to change-point detection, in which the essence of the empirical likelihood estimation is a density-based entropy estimate. Mutual information (MI), computed as the difference between entropy and conditional entropy, is popular in deep learning; see, for example, [25,26]. In machine learning, MI is closely related to information gain, which is often used to measure the goodness of a step in an algorithm, such as the selection of node splits in a tree. MI is therefore a natural metric in event detection problems; relevant works include [27,28]. We utilize the MI between the data and their positions in this paper, given that a large value of MI indicates a high probability that a change point occurs.
In this paper, we consider the offline change-point problem. We propose a test statistic based on mutual information for the at most one change-point (AMOC) problem (1), with m tending to infinity as the sample size tends to infinity, adopting the pre-classification idea of [20]. The optimal change-point position can also be estimated by MI. We show that the proposed statistic has an asymptotic normal distribution under the null hypothesis and that the power of the test converges to one under the alternative hypothesis. We also point out the relationship between MI and the likelihood ratio; in fact, the proposed statistic is based on the likelihood ratio method. Although there is in general no uniformly most powerful test for change-point detection [29,30], tests based on the likelihood ratio structure have high power [31]. Simulation studies demonstrate the excellent power of the proposed test as well as the high accuracy of the estimation. Our main innovation is to replace the Pearson chi-square statistic in Wang et al. [20] with mutual information, which achieves better performance in terms of power and estimation accuracy compared to their method.
The remaining structure of this paper is as follows. In Section 2, we present the proposed test statistic and the estimation method of a change point. In Section 3, we provide simulation results. In Section 4, we illustrate the method with an example based on physical examination data. In Section 5, we conclude the paper with some remarks. The proofs of the theorems are given in Appendix A.
2. Methods
2.1. Entropy and Mutual Information
We first briefly introduce some concepts about entropy and mutual information.
Definition 1. Suppose that x_1, …, x_u are the possible values taken by a random variable X, where u can be infinite, and let p_i be the probability that X = x_i. The Shannon entropy of X is defined as

H(X) = − Σ_{i=1}^{u} p_i log p_i, (2)

where p_i log p_i is defined to be 0 when p_i = 0.

Definition 2. Let Y be a random variable that takes values in {y_1, …, y_v}, where v can be infinite. The conditional entropy of X given Y is defined as

H(X | Y) = − Σ_{j=1}^{v} Σ_{i=1}^{u} p(x_i, y_j) log p(x_i | y_j), (3)

where p(x_i, y_j) and p(x_i | y_j) are the joint probability of X and Y and the conditional probability of X given Y = y_j, respectively.

Definition 3. Assume that X and Y are the same as in Definitions 1 and 2. The mutual information (MI) of X relative to Y is defined as

MI(X; Y) = Σ_{i=1}^{u} Σ_{j=1}^{v} p(x_i, y_j) log [ p(x_i, y_j) / (p(x_i) p(y_j)) ]. (4)
The entropy value is larger when the data distribution is closer to uniform; conversely, when the data are skewed, the entropy is small [32]. The conditional entropy measures how much uncertainty about X remains after observing Y. Obviously, mutual information can be written as the difference between entropy and conditional entropy, that is, MI(X; Y) = H(X) − H(X|Y). It represents the average amount of information about X that can be gained, i.e., the reduction of uncertainty in X, by observing Y. MI(X; Y) ≥ 0, and it equals zero if and only if X and Y are independent.
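These relations can be checked numerically. The following sketch (written in Python for illustration, while the paper's own code is in R) estimates entropies from samples and computes MI through the equivalent identity MI(X; Y) = H(X) + H(Y) − H(X, Y):

```python
import math
from collections import Counter

def entropy(xs):
    """Shannon entropy (in nats) of a sample; 0 * log(0) is treated as 0."""
    n = len(xs)
    return -sum((c / n) * math.log(c / n) for c in Counter(xs).values())

def mutual_information(xs, ys):
    """MI(X; Y) = H(X) + H(Y) - H(X, Y), the sample analogue of (4)."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

# Independent X and Y give MI = 0; Y = X gives MI = H(X).
xs = [0, 0, 1, 1] * 3
ys_indep = [0, 1, 0, 1] * 3
```

With these toy samples, `mutual_information(xs, ys_indep)` is 0 and `mutual_information(xs, xs)` equals `entropy(xs)`, matching the two boundary cases described above.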
2.2. Pre-Classification
For multinomial data, when the number of categories m is large, it is sometimes unrealistic to treat all categories equally. For example, of all the cities in China, only a few account for half of the economy, which means that the remaining cities have a small average share. The well-known Pareto principle [33], that 20% of the population owns 80% of the wealth in society, also illustrates this phenomenon. Therefore, it is reasonable to separate categories of different orders of magnitude.
Consider problem (1), i.e.,
Denote the common parameter vector under H_0 and the pre- and post-change parameter vectors under H_1 as in (1). Similar to Wang et al. [20], let the categories whose probabilities exceed a threshold (satisfying some conditions as the sample size tends to infinity) form one subset, and let the remaining categories, obtained via the complement operator, form the other. Then, the m categories are divided into those of large and small orders of magnitude, denoted by A and B, respectively. A change in the parameters might occur either in A or in B.
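As a purely illustrative sketch of such a split (the threshold form c0·log(m)/N below is our own placeholder, not the condition used in the paper), categories whose empirical probability exceeds the threshold form the large group and the rest form the small group:

```python
import math

def pre_classify(counts, c0=5.0):
    """Split category indices into a 'large' set A and a 'small' set B.

    The threshold c0 * log(m) / N is a placeholder assumption for
    illustration; the paper only requires a threshold sequence
    satisfying certain asymptotic conditions.
    """
    N = sum(counts)
    m = len(counts)
    thresh = c0 * math.log(m) / N
    A = [j for j, c in enumerate(counts) if c / N > thresh]
    B = [j for j in range(m) if j not in A]
    return A, B

# A few dominant categories and a long tail of rare ones:
counts = [400, 350, 120, 5, 3, 2, 1, 1]
A, B = pre_classify(counts)
```

Here the first three categories end up in A and the five rare ones in B, mirroring the large/small split described above.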
Let the components of each observation restricted to A, and the corresponding parameter components, be defined accordingly, and similarly for B. Then, the marginal distributions of the restricted observations under the null assumption are
(5)
and

(6)
In the next subsection, we construct a statistic built on the marginal distributions (5) and (6).
Here are some additional notations. Denote , , as the number of experiments in total, before and after time k, and , , as the number of successful trials in total, before and after time k. Let , , and be the corresponding frequencies.
For the data in A, let , , be the number of successful trials in total, before and after time k. Define , , and as the sum of successful trials in B of total, before, and after k. Let , , , , , , be the corresponding frequencies. Subscript S denotes the sum of frequencies. Similarly, we define , , , , , , , , , , , , . We illustrate some of the above notations in Table 1 in a more structured fashion.
2.3. Test Statistic
We use MI between the data and the location of the data to construct the statistic. For the data in A, the entropy is
(7)
The entropies in A before and after k are
(8)
and

(9)
respectively. Denote the indicator function of the position of a sample relative to k, i.e., whether the location of the sample is before k. Note that, given the observations, the decomposition below holds by independence. By Section 2.1, the MI between X and this indicator in A is
(10)
where the second term is the conditional entropy of X given the indicator. The MI in B is defined similarly, with the corresponding quantities defined as in (7)–(9). The uncertainty of X given the indicator reaches its largest reduction when k is at the true break point; hence, either the MI in A or the MI in B should be large there. On the contrary, if the sequence is stable, the MI should be small for any k.
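As a Python illustration of this idea (the sequence and sizes are made up; the paper's simulations use R), the MI between the observations and the before/after-k indicator can be scanned over all candidate split points, and the profile peaks at the true break point:

```python
import math
from collections import Counter

def entropy(zs):
    """Shannon entropy (in nats) of a sample."""
    n = len(zs)
    return -sum((c / n) * math.log(c / n) for c in Counter(zs).values())

def mi_at_split(seq, k):
    """MI between the observed category and the indicator 1{position <= k}."""
    labels = [int(t <= k) for t in range(len(seq))]
    return entropy(seq) + entropy(labels) - entropy(list(zip(seq, labels)))

# A categorical sequence whose distribution changes abruptly after position 9:
seq = [0] * 10 + [1] * 10
profile = [mi_at_split(seq, k) for k in range(len(seq) - 1)]
k_hat = max(range(len(profile)), key=lambda k: profile[k])
```

For this sequence the profile is maximized at k = 9, the last index of the first regime, where the MI attains its theoretical ceiling log 2.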
Since A and B are unknown, in light of Wang et al. [20], we estimate A by thresholding the empirical category probabilities, where the threshold involves some constant. As shown in [20], this yields a consistent estimator of A if the threshold satisfies certain assumptions. Construct the test statistic
(11)
for (1). Summation and maximization over k are conducted for the MI terms of A and of B, respectively. The first term is a weighted log-likelihood ratio estimate, as pointed out after Lemma 1. The second term is based on the maximum norm of the MI; it is widely acknowledged that max-norm tests are more suitable for sparse and strong signals, see [34,35]. A threshold, chosen as a large number, restricts the maximization and ensures that the second term converges to zero under H_0. Note that the statistic in [20] is based on the Pearson chi-square statistic; since in reality the frequencies of small categories might be zero, the Pearson chi-square statistic for the small categories is modified there. The statistic presented here does not need to account for zero frequencies, since by the definition of entropy, p log p = 0 when p = 0. In order to study the properties of the proposed statistic better, we first give a lemma relating the MI terms to log-likelihood ratios; the same relation holds when A is replaced by B in all the subscripts (Lemma 1).
Note that the two quantities in Lemma 1 are estimates of minus twice the log-likelihood ratios for the data in A and in B when the change point is at k. Therefore, the problem based on MI can be transformed into one based on likelihood ratios.
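This relationship is the classical identity between the likelihood-ratio (G) statistic and empirical mutual information, which can be checked numerically. The sketch below (Python, with made-up counts, not the paper's data) verifies that the G statistic for independence in a two-way table equals 2N times the empirical MI in nats:

```python
import math
from collections import Counter

def entropy(zs):
    """Shannon entropy (in nats) of a sample."""
    n = len(zs)
    return -sum((c / n) * math.log(c / n) for c in Counter(zs).values())

# Made-up 2x2 table of joint observations (counts are illustrative):
pairs = [(0, 0)] * 30 + [(0, 1)] * 10 + [(1, 0)] * 15 + [(1, 1)] * 25
N = len(pairs)
xs = [p[0] for p in pairs]
ys = [p[1] for p in pairs]

# Empirical mutual information (in nats):
mi = entropy(xs) + entropy(ys) - entropy(pairs)

# Likelihood-ratio (G) statistic for independence: 2 * sum O * log(O / E),
# where E is the expected cell count under independence.
row, col, cell = Counter(xs), Counter(ys), Counter(pairs)
G = 2 * sum(o * math.log(o / (row[i] * col[j] / N))
            for (i, j), o in cell.items())
```

Up to floating-point error, G equals 2·N·mi, which is why limit theory for likelihood ratios transfers to the MI terms.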
By Lemma 1, the second term in (11) can be expressed through likelihood ratios, and hence the existing limit theorems on likelihood ratios can be applied to it directly. The first term in (11) has the form of a weighted log-likelihood ratio estimate. In Appendix A, we show by a Taylor expansion that it differs from a CUSUM statistic [36] only by an asymptotically negligible term, and we then prove the asymptotic distribution of the proposed statistic from related results.
The unweighted sum of the MI terms is closely related to the Shiryayev–Roberts procedure [37,38], which uses the sum of the likelihood ratios over candidate change points k as a statistic and is widely applied to determine the best stopping criterion in sequential change-point monitoring (see, e.g., [39]). However, replacing the unknown parameters with their maximum likelihood estimates, as done in this paper, would make the asymptotic analysis of the unweighted sum complex [40]. So, here we use the weighted version instead.
Theorem 1. Let |A| denote the cardinality of any set A, with a corresponding notation for the maximal cardinality. Assume that, as the total number of trials N tends to infinity: (i) a rate condition ensuring the consistency of the estimated category split holds; (ii) the threshold diverges sufficiently fast; and (iii) each per-time sample size is much smaller than N. Then, under H_0, the proposed statistic converges in distribution to the standard normal.
Theorem 1 shows that the proposed statistic is asymptotically normally distributed under the null hypothesis. Condition (i) in Theorem 1 ensures the consistency of the estimated set, which was also assumed in Theorem 1 of [20]. Condition (ii) requires the threshold to be large enough to guarantee that the second term of the statistic converges to zero with probability one under the null hypothesis. Condition (iii) means that every per-time sample size is much less than N. Next, we focus on the properties of the statistic under the alternative hypothesis.
Theorem 2. Assume that conditions (i)–(iii) in Theorem 1 hold. Further assume that, as N tends to infinity: (i) the magnitude of the change tends to infinity at a suitable rate; (ii) the sample sizes before and after the change point are comparable; and (iii)–(iv) additional rate conditions hold for some constants. Then the power of the test converges to one, where the critical value is that of the standard normal distribution at level α.
Theorem 2 establishes the consistency of the test under certain conditions when the probabilities in A or B change. Condition (i) in Theorem 2 means that the change magnitude tends to infinity at a certain rate, which ensures that the statistic tends to infinity when the parameters in A change. Condition (ii) requires comparable sample sizes before and after the change point. The proofs of Theorems 1 and 2 are provided in Appendix A.
Once H_0 is rejected, we further use MI to estimate k*: the estimate is the value of k that maximizes the corresponding MI term, over A or B according to which attains the larger value. Numerical studies in the next section show that the power of the new statistic increases rapidly as the difference between the alternative and null hypotheses increases. At the same time, the precision of the estimate using pre-classification is also satisfactory.
3. Simulation
We conduct simulation experiments to assess the performance of the test procedures in empirical size, power and estimation in finite samples. All results are based on 1000 replications. We use R to obtain simulation results. The necessary R code is given in Appendix B.
To analyze the empirical size, we simulate multinomial data under the null hypothesis of no break, with reference to [20]. The probability vector is chosen so that the first d probabilities are much greater than the remaining ones. Following [20], the threshold is based on the sorted values of the estimated probabilities. We consider different situations with the sample size K ranging from 50 to 500. For simplicity, the remaining tuning constants are fixed according to the conditions in the previous section. The simulation results for various combinations of ( , d) are reported in Table 2. We observe that the empirical size of the test is 4.5–6.7%, which is around the nominal 5% level in the different situations. We also performed simulations for other settings and found empirical sizes slightly higher than 5% (data not shown).
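The null-model data generation just described can be sketched as follows (pure-standard-library Python, while the paper's code is in R; the probability weights, seed and sizes below are illustrative choices, not the paper's):

```python
import random
from collections import Counter

random.seed(0)
K, m, d, n_trials = 200, 50, 5, 30

# First d categories carry much more mass than the remaining m - d.
weights = [m] * d + [1] * (m - d)

def draw_multinomial():
    """One multinomial observation: n_trials draws over m categories."""
    draws = random.choices(range(m), weights=weights, k=n_trials)
    cnt = Counter(draws)
    return [cnt.get(j, 0) for j in range(m)]

# K independent multinomial observations under the null (no break):
data = [draw_multinomial() for _ in range(K)]
```

Under the alternative hypotheses below, one would instead switch to a perturbed weight vector after the chosen break point.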
To evaluate the power of the test, the alternative hypotheses stipulate a single break in the data sequence. We first consider parameters of two forms:
(i) the large probabilities change while the small probabilities remain the same;
(ii) the small probabilities change while the large probabilities remain the same.
The weighted maximum likelihood ratio statistic L proposed by Horváth and Serbinowska [17];
The statistic Q in [20], with its tuning parameters set as in the simulations there.
The results are summarized in Figure 1 for the 5% level. The empirical size of L is on the high side, as seen from the curves at small s in Figure 1. The new test is very powerful, as evidenced by the rapid convergence of its power to 1 as s increases. In most cases, the empirical power of G is larger than that of the other two statistics under alternative hypothesis (i). Under alternative hypothesis (ii), the three statistics perform equally well. These results further show that our test has higher power to detect a change located in the middle of the sample than at the beginning, although the power in the latter case is still high.
We also briefly investigate how well the change-point location is approximated by the proposed estimator, for two choices of the change-point location. In Table 3 and Table 4, we report the mean and standard deviation of the absolute errors for the different choices of s and m under alternative hypotheses (i) and (ii), respectively. We compare our estimate with the maximum likelihood ratio estimate and the estimate in [20].
The absolute errors in Table 3 and Table 4 underscore the considerable precision of the proposed estimator, which improves as s increases from 0.3 to 0.8. Under alternative hypothesis (i), in which the large probabilities change while the small probabilities remain the same, our estimator is better than the two competitors in almost all situations; small changes (for example, s = 0.3) are detected with greater difficulty by the competitors, while the precision of our estimator remains high. This is probably because entropy, as a non-linear function, can amplify the difference between frequencies, an effect that is more pronounced when the difference is small (e.g., s = 0.3). Under alternative hypothesis (ii), in which the small probabilities change while the large probabilities remain the same, two of the estimators have similar performance, and both are slightly better than the third.
Finally, we simulate the power and estimation precision for alternative hypothesis (iii):
where the parameters in A and B change simultaneously, a case not considered in [20]. We compare our test statistic and estimator with Q and the corresponding estimator in [20] in this case. The results are displayed in Figure 2, Table 5 and Table 6, from which we see that the power of our test is slightly higher than that of Q, and the precision of our estimator is clearly higher than that of the estimator in [20].

4. Example
In this section, we use a data set to demonstrate the applicability of our method. The data concern the medical examination results of people working in Hefei's financial sector (including banks and insurance companies) from 27 September 2017 to 25 August 2021 and include each person's age, gender, date of examination and the diseases detected. From the perspective of health analysis and disease prevention, it is important to understand how often each disease is detected.
Our goal is to test whether the proportions of people diagnosed with the various diseases change over time. After removing gender-specific diseases, we keep 210 diseases. Because no examinations took place in some weeks, we eliminate those weeks and keep 173 weeks. For each week, let the observation be a 210-dimensional vector whose components record the frequencies of the diseases detected during that week. Then, there are K = 173 vectors of dimension m = 210.
Figure 3 shows the numbers of the top 30 diseases detected. The weekly sample sizes are provided in Figure 4. We find from Figure 3 that the numbers of the first six diseases, Fatty Liver (FL), Overweight (OW), Thyroid Nodule (TN), Pulmonary Nodule (PN), Hepatic Cyst (HC) and Thyroid Cyst (TC), were much higher than those of the other diseases. By calculation, their proportions were, respectively, 0.088, 0.086, 0.06, 0.06, 0.038 and 0.032, which together accounted for 35.8% of all the detected diseases. Hence, we choose d = 6. The value of the statistic exceeds the critical value, and hence the null hypothesis that there is no change in the proportions of diseases detected is rejected.
The estimated change point corresponds to 27 December 2017. This suggests that the proportions of diseases detected differ before and after 2018. Table 7 displays the proportions of the first six diseases before and after 2018. The proportions of Overweight and Thyroid Nodule were the highest before 2018. After 2018, however, the proportion of Fatty Liver became the highest, and the proportion of Pulmonary Nodule also increased significantly. A possible explanation is that some unexpected events led to changes in people's lifestyles, which in turn led to changes in the proportions of the population suffering from different diseases. For example, the start of the Sino–US trade war in early February 2018 led to a continuous decline in the price of China's A-shares, which may have triggered the change in the lifestyle of financial practitioners after 2018. Studying the proportions of people with different diseases in the financial sector can reveal which diseases are on the rise in this sector, and hence proper recommendations can be made for disease prevention.
5. Conclusions
This paper develops a change-point test based on MI for multinomial data when the number of categories is comparable to the sample size. We show that under certain conditions, the proposed statistic is asymptotically normal under the null hypothesis and consistent under the alternative hypothesis. The simulation results suggest that the test based on the proposed statistic has a high power. The proposed inference procedures are used to analyze the change in proportions of diseases detected in physical examination data during a period.
Conceptualization and methodology, B.J., X.X. and Y.W.; software and writing—original draft preparation, X.X.; writing—review and editing, B.J. and Y.W.; supervision, B.J. All authors have read and agreed to the published version of the manuscript.
The authors are grateful to the referees for their insightful comments in revising this paper.
The authors declare no conflict of interest.
The following abbreviations are used in this manuscript:
MI | Mutual Information |
FL | Fatty Liver |
OW | Overweight |
TN | Thyroid Nodule |
PN | Pulmonary Nodule |
HC | Hepatic Cyst |
TC | Thyroid Cyst |
Figure 1. Empirical power of the three statistics at the 5% level. (a) Power under alternative hypothesis (i). (b) Power under alternative hypothesis (ii). G denotes the proposed statistic; L is the weighted maximum likelihood ratio statistic in [17]; Q is the statistic in [20].
Figure 2. Empirical power of the proposed statistic G and Q under alternative hypothesis (iii).
Table 1. Explanation of some notations: the numbers of experiments, the numbers of successful trials, and the corresponding frequencies, given in total and before/after time k, separately for the category sets A and B.
Table 2. Empirical sizes of the test.

| ( , d) | m = 50 | 100 | 200 | 300 | 500 |
|---|---|---|---|---|---|
| (0.3, 5) | 0.046 | 0.053 | 0.052 | 0.045 | 0.053 |
| (0.3, 6) | 0.052 | 0.058 | 0.059 | 0.052 | 0.052 |
| (0.3, 10) | 0.053 | 0.039 | 0.054 | 0.060 | 0.057 |
| (0.5, 6) | 0.050 | 0.034 | 0.047 | 0.066 | 0.067 |
| (0.5, 8) | 0.053 | 0.045 | 0.054 | 0.062 | 0.062 |
| (0.5, 10) | 0.053 | 0.051 | 0.061 | 0.066 | 0.054 |
Table 3. Mean and standard deviation (in parentheses) of the absolute estimation errors under alternative hypotheses (i) and (ii).

| m | s | Alternative Hypothesis (i) | | | Alternative Hypothesis (ii) | | |
|---|---|---|---|---|---|---|---|
200 | 0.3 | 1.88(3.35) | 15.95(27.15) | 25.97(34.74) | 0.64(1.16) | 0.64(1.18) | 0.66(1.20) |
0.4 | 0.77(1.32) | 2.52(5.02) | 3.94(6.90) | 0.20(0.50) | 0.20(0.50) | 0.21(0.53) | |
0.5 | 0.49(0.91) | 1.11(1.93) | 2.11(3.16) | 0.06(0.25) | 0.06(0.25) | 0.06(0.27) | |
0.6 | 0.31(0.67) | 0.53(0.96) | 1.67(2.59) | 0.01(0.11) | 0.01(0.11) | 0.01(0.11) | |
0.7 | 0.16(0.44) | 0.29(0.63) | 1.12(1.94) | 0 | 0 | 0 | |
0.8 | 0.16(0.52) | 0.16(0.48) | 0.87(1.63) | 0 | 0 | 0 | |
500 | 0.3 | 1.57(2.32) | 10.17(28.95) | 6.40(9.60) | 0.57(1.11) | 0.56(1.10) | 0.57(1.07) |
0.4 | 0.89(1.49) | 2.24(3.41) | 3.52(4.59) | 0.16(0.46) | 0.16(0.46) | 0.18(0.48) | |
0.5 | 0.50(0.92) | 1.08(1.68) | 2.18(3.15) | 0.07(0.26) | 0.07(0.26) | 0.07(0.27) | |
0.6 | 0.31(0.67) | 0.51(1.06) | 1.61(2.46) | 0.01(0.11) | 0.01(0.10) | 0.02(0.13) | |
0.7 | 0.15(0.40) | 0.31(0.67) | 1.10(1.83) | 0 | 0 | 0 | |
0.8 | 0.09(0.32) | 0.14(0.42) | 0.88(1.39) | 0 | 0 | 0 |
Table 4. Mean and standard deviation (in parentheses) of the absolute estimation errors under alternative hypotheses (i) and (ii).

| m | s | Alternative Hypothesis (i) | | | Alternative Hypothesis (ii) | | |
|---|---|---|---|---|---|---|---|
200 | 0.3 | 10.01(31.28) | 33.16(48.90) | 66.82(58.79) | 0.86(1.71) | 0.87(1.70) | 0.89(1.87) |
0.4 | 0.95(1.61) | 8.21(23.85) | 19.98(41.21) | 0.24(0.59) | 0.25(0.62) | 0.25(0.60) | |
0.5 | 0.73(0.26) | 1.82(5.59) | 2.45(3.94) | 0.07(0.27) | 0.08(0.28) | 0.09(0.33) | |
0.6 | 0.71(1.29) | 0.75(1.35) | 1.62(2.49) | 0.02(0.15) | 0.02(0.15) | 0.03(0.18) | |
0.7 | 0.45(0.88) | 0.33(0.70) | 1.2(2.17) | 0 | 0 | 0 | |
0.8 | 0.30(0.66) | 0.20(0.49) | 0.93(1.73) | 0 | 0 | 0 | |
500 | 0.3 | 1.77(2.73) | 52.01(105.27) | 54.39(110.28) | 0.80(1.29) | 0.80(1.26) | 0.83(1.46) |
0.4 | 0.94(1.48) | 3.42(6.47) | 3.82(5.64) | 0.22(0.54) | 0.23(0.56) | 0.24(0.57) | |
0.5 | 0.80(1.34) | 1.40(2.27) | 2.52(3.88) | 0.07(0.28) | 0.07(0.28) | 0.07(0.28) | |
0.6 | 0.68(1.22) | 0.73(1.22) | 1.56(2.27) | 0.01(0.09) | 0.01(0.09) | 0.02(0.13) | |
0.7 | 0.40(0.82) | 0.36(0.75) | 1.09(1.67) | 0.01(0.08) | 0.01(0.08) | 0 | |
0.8 | 0.31(0.66) | 0.20(0.48) | 0.82(1.29) | 0 | 0 | 0 |
Table 5. Mean and standard deviation (in parentheses) of the absolute estimation errors under alternative hypothesis (iii).

| s | | | | |
|---|---|---|---|---|
0.3 | 1.82(2.63) | 4.01(6.68) | 1.62(2.68) | 5.59(8.32) |
0.4 | 0.82(1.42) | 3.42(4.76) | 0.82(0.82) | 3.06(4.49) |
0.5 | 0.47(0.89) | 2.24(3.37) | 0.46(1.07) | 2.33 (3.33) |
0.6 | 0.24(0.58) | 1.52(2.48) | 0.25(0.57) | 1.37(2.15) |
0.7 | 0.17(0.48) | 1.10(1.72) | 0.12(0.41) | 1.16(1.77) |
0.8 | 0.19(0.51) | 0.83(1.31) | 0.10(0.36) | 0.84(1.42) |
Table 6. Mean and standard deviation (in parentheses) of the absolute estimation errors under alternative hypothesis (iii).

| s | | | | |
|---|---|---|---|---|
0.3 | 1.67(2.61) | 1.70(3.69) | 1.80(2.85) | 4.89(7.62) |
0.4 | 0.98(1.58) | 3.22(6.08) | 0.85(1.42) | 3.59(5.02) |
0.5 | 0.84(1.41) | 2.44(3.78) | 0.55(1.11) | 2.54(3.80) |
0.6 | 0.54(1.01) | 1.54(2.41) | 0.69(1.18) | 1.67(2.45) |
0.7 | 0.42(0.91) | 1.28(2.12) | 0.44(0.87) | 1.16(1.91) |
0.8 | 0.29(0.65) | 0.88(1.40) | 0.28(0.62) | 0.87(1.47) |
Table 7. Proportions of the first six diseases before and after the estimated change point.

| Disease | FL | OW | TN | PN | HC | TC |
|---|---|---|---|---|---|---|
| Proportion before | 0.078 | 0.095 | 0.100 | 0.021 | 0.035 | 0.045 |
| Proportion after | 0.090 | 0.084 | 0.051 | 0.062 | 0.038 | 0.029 |
Appendix A
Lemma A1. (i) Under H_0, the stated convergence holds as N → ∞ for any k; (ii) under H_1, the corresponding convergence holds as N → ∞ for any k, where the limit is as defined in the conditions of Theorem 2.
See the proof of Theorem 1 in Wang et al. [20].
By Lemma 1 and Lemma A1, it suffices to deduce the distribution of
We first show that under
Because
Then
Let
First assume that
Now, assume that
Appendix B
Necessary R code related to this article can be found online at
References
1. Page, E.S. Continuous inspection schemes. Biometrika; 1954; 41, pp. 100-115. [DOI: https://dx.doi.org/10.1093/biomet/41.1-2.100]
2. Fletcher, R.J.; Robertson, E.P.; Poli, C.; Dudek, S.; Gonzalez, A.; Jeffery, B. Conflicting nest survival thresholds across a wetland network alter management benchmarks for an endangered bird. Biol. Conserv.; 2021; 253, 108893. [DOI: https://dx.doi.org/10.1016/j.biocon.2020.108893]
3. Fryzlewicz, P. Wild binary segmentation for multiple change-point detection. Ann. Stat.; 2014; 42, pp. 2243-2281. [DOI: https://dx.doi.org/10.1214/14-AOS1245]
4. Ross, G.J.; Chevalier, A.; Sharples, L. Tracking the evolution of literary style via Dirichlet-multinomial change point regression. J. R. Stat. Soc. Ser. A-Stat. Soc.; 2019; 183, pp. 149-167. [DOI: https://dx.doi.org/10.1111/rssa.12492]
5. Jiang, F.; Zhao, Z.; Shao, X. Time series analysis of COVID-19 infection curve: A change-point perspective. J. Econom.; 2020; 232, pp. 1-17. [DOI: https://dx.doi.org/10.1016/j.jeconom.2020.07.039]
6. Palivonaite, R.; Lukoseviciute, K.; Ragulskis, M. Algebraic segmentation of short nonstationary time series based on evolutionary prediction algorithms. Neurocomputing; 2013; 121, pp. 354-364. [DOI: https://dx.doi.org/10.1016/j.neucom.2013.05.013]
7. Sen, A.K.; Srivastava, M.S. On tests for detecting change in mean. Ann. Stat.; 1975; 3, pp. 98-108. [DOI: https://dx.doi.org/10.1214/aos/1176343001]
8. Worsley, K.J. Confidence regions and tests for a change-point in a sequence of exponential family of random variables. Biometrika; 1986; 73, pp. 91-104. [DOI: https://dx.doi.org/10.1093/biomet/73.1.91]
9. Bai, J. Least squares estimation of a shift in linear processes. J. Time Ser. Anal.; 1994; 15, pp. 453-472. [DOI: https://dx.doi.org/10.1111/j.1467-9892.1994.tb00204.x]
10. Vexler, A. Guaranteed testing for epidemic changes of a linear regression model. J. Stat. Plan. Inference; 2006; 136, pp. 3101-3120. [DOI: https://dx.doi.org/10.1016/j.jspi.2004.11.010]
11. Gombay, E. Change detection in autoregressive time series. J. Multivar. Anal.; 2008; 99, pp. 451-464. [DOI: https://dx.doi.org/10.1016/j.jmva.2007.01.003]
12. Truong, C.; Oudre, L.; Vayatis, N. Selective review of offline change point detection methods. Signal Process.; 2020; 167, 107299. [DOI: https://dx.doi.org/10.1016/j.sigpro.2019.107299]
13. Aue, A.; Horváth, L. Structural breaks in time series. J. Time Ser. Anal.; 2013; 34, pp. 1-16. [DOI: https://dx.doi.org/10.1111/j.1467-9892.2012.00819.x]
14. Chen, J.; Gupta, A.K. Parametric Statistical Change Point Analysis; Birkhäuser: Boston, MA, USA, 2000.
15. Ross, A.S.C. Philological probability problems. J. R. Stat. Soc. Ser. B-Stat. Methodol.; 1950; 12, pp. 19-59. [DOI: https://dx.doi.org/10.1111/j.2517-6161.1950.tb00040.x]
16. Wolfe, D.A.; Chen, Y.S. The changepoint problem in a multinomial sequence. Commun.-Stat.-Simul. Comput.; 1990; 19, pp. 603-618. [DOI: https://dx.doi.org/10.1080/03610919008812877]
17. Horváth, L.; Serbinowska, M. Testing for changes in multinomial observations: The Lindisfarne scribes problem. Scand. J. Stat.; 1995; 22, pp. 371-384.
18. Batsidis, A.; Horváth, L.; Martín, N.; Pardo, L.; Zografos, K. Change-point detection in multinomial data using phi-divergence test statistics. J. Multivar. Anal.; 2013; 118, pp. 53-66. [DOI: https://dx.doi.org/10.1016/j.jmva.2013.03.008]
19. Riba, A.; Ginebra, J. Change-point estimation in a multinomial sequence and homogeneity of literary style. J. Appl. Stat.; 2005; 32, pp. 61-74. [DOI: https://dx.doi.org/10.1080/0266476052000330295]
20. Wang, G.H.; Zou, C.L.; Yin, G.S. Change-point detection in multinomial data with a large number of categories. Ann. Stat.; 2018; 46, pp. 2020-2044. [DOI: https://dx.doi.org/10.1214/17-AOS1610]
21. Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J.; 1948; 27, pp. 379-432. [DOI: https://dx.doi.org/10.1002/j.1538-7305.1948.tb01338.x]
22. Unakafov, A.M.; Keller, K. Change-Point Detection Using the Conditional Entropy of Ordinal Patterns. Entropy; 2018; 20, 709. [DOI: https://dx.doi.org/10.3390/e20090709] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/33265798]
23. Ma, L.J.; Sofronov, G. Change-point detection in autoregressive processes via the Cross-Entropy method. Algorithms; 2020; 13, 128. [DOI: https://dx.doi.org/10.3390/a13050128]
24. Vexler, A.; Gurevich, G. Density-based empirical likelihood ratio change point detection policies. Commun. Stat. Simul. Comput.; 2010; 39, pp. 1709-1725. [DOI: https://dx.doi.org/10.1080/03610918.2010.512692]
25. Kamimura, R. Supposed maximum mutual information for improving generalization and interpretation of multi-layered neural networks. J. Artif. Intell. Soft Comput. Res.; 2019; 9, pp. 123-147. [DOI: https://dx.doi.org/10.2478/jaiscr-2018-0029]
26. Liu, L.X. Image multi-threshold method based on fuzzy mutual information. Comput. Eng. Appl.; 2009; 45, pp. 166-168, 197.
27. Oh, B.S.; Sun, L.; Ahn, C.S.; Yeo, Y.K.; Yang, Y.; Liu, N.; Lin, Z.P. Extreme learning machine based mutual information estimation with application to time-series change-points detection. Neurocomputing; 2017; 261, pp. 204-216. [DOI: https://dx.doi.org/10.1016/j.neucom.2015.11.138]
28. Kopylova, Y.; Buell, D.A.; Huang, C.T.; Janies, J. Mutual information applied to anomaly detection. J. Commun. Netw.; 2008; 10, pp. 89-97. [DOI: https://dx.doi.org/10.1109/JCN.2008.6388332]
29. Gurevich, G. Retrospective parametric tests for homogeneity of data. Commun. Stat. Theory Methods; 2007; 36, pp. 2841-2862. [DOI: https://dx.doi.org/10.1080/03610920701386968]
30. James, B.; James, K.L.; Siegmund, D. Tests for a change-point. Biometrika; 1987; 74, pp. 71-83. [DOI: https://dx.doi.org/10.1093/biomet/74.1.71]
31. Lai, T.L. Sequential changepoint detection in quality control and dynamical systems. J. R. Stat. Soc. Ser. B Stat. Methodol.; 1995; 57, pp. 613-658. [DOI: https://dx.doi.org/10.1111/j.2517-6161.1995.tb02052.x]
32. Lee, W. A Data Mining Framework for Constructing Features and Models for Intrusion Detection Systems. Ph.D. Thesis; Columbia University: New York, NY, USA, 1999.
33. Pareto, V. Cours d’Economie Politique; Droz: Geneva, Switzerland, 1896.
34. Chen, S.X.; Qin, Y.L. A two-sample test for high-dimensional data with applications to gene-set testing. Ann. Stat.; 2010; 38, pp. 808-835. [DOI: https://dx.doi.org/10.1214/09-AOS716]
35. Fan, J.; Liao, Y.; Yao, J. Power enhancement in high-dimensional cross-sectional tests. Econometrica; 2015; 83, pp. 1497-1541. [DOI: https://dx.doi.org/10.3982/ECTA12749] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/26778846]
36. Aue, A.; Hörmann, S.; Horváth, L.; Reimherr, M. Break detection in the covariance structure of multivariate time series models. Ann. Stat.; 2009; 37, pp. 4046-4087. [DOI: https://dx.doi.org/10.1214/09-AOS707]
37. Shiryayev, A.N. On optimum methods in quickest detection problems. Theory Probab. Its Appl.; 1963; 8, pp. 22-46. [DOI: https://dx.doi.org/10.1137/1108002]
38. Roberts, S.W. A comparison of some control chart procedures. Technometrics; 1966; 8, pp. 411-430. [DOI: https://dx.doi.org/10.1080/00401706.1966.10490374]
39. Krieger, A.M.; Pollak, M.; Yakir, B. Surveillance of a Simple Linear Regression. J. Am. Stat. Assoc.; 2003; 98, pp. 456-469. [DOI: https://dx.doi.org/10.1198/016214503000233]
40. Vexler, A.; Gregory, G. Average most powerful tests for a segmented regression. Commun. Stat. Theory Methods; 2009; 38, pp. 2214-2231. [DOI: https://dx.doi.org/10.1080/03610920802521208]
41. Csörgő, M.; Horváth, L. Limit Theorems in Change-Point Analysis; Wiley: New York, NY, USA, 1997.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Abstract
Time-series data often exhibit an abrupt structural change at an unknown location. This paper proposes a new statistic to test for the existence of a change-point in a multinomial sequence in which the number of categories is comparable to the sample size as it tends to infinity. To construct the statistic, a pre-classification step is performed first; the statistic is then defined in terms of the mutual information between the data and the segment labels obtained from the pre-classification. The statistic can also be used to estimate the position of the change-point. Under certain conditions, the proposed statistic is asymptotically normal under the null hypothesis and consistent under the alternative. Simulation results show that the test based on the proposed statistic has high power and that the resulting estimator is highly accurate. The method is further illustrated with a real example of physical examination data.
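The core idea described in the abstract, locating a change-point in a sequence of multinomial counts by scanning candidate split points and maximizing the mutual information between the segment label (before/after the split) and the category counts, can be illustrated with a minimal sketch. This is not the authors' exact statistic (which involves a pre-classification step and asymptotic normalization); it is a simplified mutual-information scan under assumed inputs: a hypothetical `K x m` count matrix `X` whose row `i` holds the category frequencies of the `i`-th observation.

```python
import numpy as np

def mutual_information(counts):
    """Mutual information (in nats) of a 2 x m contingency table of counts."""
    n = counts.sum()
    joint = counts / n
    px = joint.sum(axis=1, keepdims=True)   # segment (row) marginals
    py = joint.sum(axis=0, keepdims=True)   # category (column) marginals
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = joint * np.log(joint / (px * py))
    return np.nansum(terms)                  # 0 * log 0 terms drop out as NaN

def scan_change_point(X):
    """Scan all candidate split points of a K x m count matrix X and return
    the split (and its mutual information) that maximizes the dependence
    between the before/after segment label and the observed categories."""
    K = X.shape[0]
    best_k, best_mi = None, -np.inf
    for k in range(1, K):                    # split between rows k-1 and k
        table = np.vstack([X[:k].sum(axis=0), X[k:].sum(axis=0)])
        mi = mutual_information(table)
        if mi > best_mi:
            best_k, best_mi = k, mi
    return best_k, best_mi
```

With counts drawn from one cell-probability vector before the split and a different one after it, the scan peaks at the true change-point; under the paper's framework, the maximized statistic would additionally be centered and scaled to obtain its limiting null distribution.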
Details
1 School of Management, University of Science and Technology of China, Hefei 230026, China
2 Department of Mathematics and Statistics, York University, Toronto, ON M3J 1P3, Canada