A Bayesian Approach to Multistage Fitting of the Variation of the Skeletal Age Features

Dong Hua 1 and Dechang Chen 2 and Fang Liu 3 and Abdou Youssef 1

Recommended by Zhenqiu Liu

1, Department of Computer Science, The George Washington University, 801 22nd Street NW, Washington, DC 20052, USA
2, Division of Epidemiology and Biostatistics, Uniformed Services University of the Health Sciences, 4301 Jones Bridge Road, Bethesda, MD 20814, USA
3, Department of Computer Science, University of Texas-Pan American, 1201 W. University Drive, Edinburg, TX 78539, USA

Received 1 December 2008; Accepted 17 March 2009

1. Introduction

The hand X-ray shown in Figure 1 is commonly used for skeletal age assessment in pediatric radiology. A discrepancy between skeletal maturity and chronological age may indicate an abnormality in skeletal growth. Such abnormalities have been linked to various diseases, including endocrine disorders [1], metabolic/growth abnormalities [2], malformations and bone dysplasias [3], and gonadal dysgenesis [4]. The assessment of skeletal maturity has therefore become increasingly important clinically, and the accuracy of the assessment is the primary concern.

Figure 1: Hand X-ray used in skeletal age assessment.

[figure omitted; refer to PDF]

Features encoded in ossification centers form the basis for assessment. If we knew exactly how these features behave at different stages of age, the assessment could be made as accurate as possible. In practice, one needs a mechanism to capture these characteristics. Given data on a feature with respect to skeletal age, a simple and common approach is to fit a line or a curve, which is then used for prediction on new patients or to help radiologists understand how the feature varies.

For instance, Figure 2(a) shows the variation of a ratio feature [5, 6] (vertical axis) against increasing skeletal age (horizontal axis) for boys from newborn to 19 years old. (More details on this ratio are provided in Section 3.2.) In that panel, a single line is fitted to the values of the feature. Clearly, a line is not enough to capture the behavior of the feature. A quadratic curve, shown in Figure 2(c), does not do a good job either. Fitting a more complex curve is not a feasible alternative, for two reasons: often only a small amount of data is available, which restricts the learning of complex curves, and local properties of the feature (with respect to time) are often lost when fitting a global complex curve, which leads to inaccurate prediction.

Figure 2: Examples of fitting the variation of the ratio feature. The horizontal axis represents the skeletal age and the vertical axis corresponds to the values of the feature.

[figure panels omitted; refer to PDF]

In this paper, we propose to fit the variation of features with skeletal age via a multistage fitting approach. We divide the skeletal age axis into several stages or phases, and within each stage a relatively simple model (line or curve) is used for fitting. Usually, the variation of a feature does not follow a single simple rule as skeletal age increases. Instead, it often shows different variation patterns in different stages of age. As shown in Figures 2(b) and 2(d), multistage fitting not only captures the overall pattern of feature variation but also preserves the local properties with respect to skeletal age. A critical question is then how to determine the appropriate positions at which to separate the stages. The Bayesian cut proposed in this paper provides an answer via a Bayesian approach.

The rest of the paper is organized as follows. In Section 2, we describe our models for fitting and introduce the Bayesian cut. In Section 3, we present our experimental results on multistage fitting for artificial and real data. We conclude the paper in Section 4.

2. The Proposed Method

In this section, we first describe our proposed method for a simple case and then extend it to a general scenario.

Given a sequence of values $f_1, f_2, \dots, f_n$ denoting the skeletal age $f$ in ascending order, consider the linear relationship between $f$ and one feature $y$ found in the hand X-ray (e.g., the length of a digit). Usually, such a linear relationship varies as the skeletal age increases. That is, a linear form established for one interval of the skeletal age may not hold for the next interval, where a different linear form should be used. The time at which two linear forms differ is called a change point. Our model, which takes into account linear relationships and change points, is stated as follows:
$$y_i = \beta_{j1} + \beta_{j2} f_i + \varepsilon_{ji}, \qquad t_{j-1} < i \le t_j, \quad j = 1, \dots, k, \tag{1}$$
where $t_0 = 0$, $t_k = n$, $t_1, \dots, t_{k-1}$ (correspondingly, the ages $f_{t_1}, \dots, f_{t_{k-1}}$) indicate the sequential change points, $t_j - t_{j-1} \ge 3$ ($j = 1, \dots, k$), the $\varepsilon_{ji}$ (for all $i$) are independent $N(0, \sigma_j^2)$, and the $\varepsilon_{ji}$ (for all $i$, $j$) are independent of each other. In the model, the parameters $\beta_{j1}$, $\beta_{j2}$, $\sigma_j^2$, and $t_j$ are all unknown and are estimated from the given data. The interval $(t_{j-1}, t_j]$ represents the $j$th stage or phase, denoted by $ph_j$. The main task here is to estimate the times $t_j$. Given the estimates of $t_j$, the linear forms and the associated parameters can be obtained through traditional regression. We note that the requirement $t_j - t_{j-1} \ge 3$ ($j = 1, \dots, k$) is needed for estimation of the regression lines. When $k = 2$, the model reduces to the two-phase regression with a single change point in [7].

The above model, which uses only one independent variable $f$, can be generalized to include multiple independent variables. This generalization leads to the following model:
$$y_i = \mathbf{f}_i^T \boldsymbol{\beta}_j + \varepsilon_{ji}, \qquad t_{j-1} < i \le t_j, \quad j = 1, \dots, k, \tag{2}$$
where $\mathbf{f}_i$ is a $p$-dimensional vector of variables, $\boldsymbol{\beta}_j$ ($j = 1, \dots, k$) is a $p$-dimensional vector of parameters, $t_j - t_{j-1} \ge p + 1$, and the $\varepsilon_{ji}$ are the same as before. We refer to $p$ as the cardinality of the input vector $\mathbf{f}_i$, denoted by $C(\mathbf{f}_i)$, and to the number of sample points in $ph_j$ as the cardinality of $(t_{j-1}, t_j]$, denoted by $C(ph_j)$. We note that although linear regression is used for each phase in model (2), the model also encompasses forms that are nonlinear in $f$, such as polynomials, because the entries of $\mathbf{f}_i$ may be powers of $f_i$.

We now describe a Bayesian approach to estimate the change points. Denote $(\mathbf{f}_{t_{j-1}+1}, \dots, \mathbf{f}_{t_j})^T$ by $F_j$, $(F_1^T, \dots, F_k^T)^T$ by $F$, $(y_{t_{j-1}+1}, \dots, y_{t_j})^T$ by $y_j$, $(y_1^T, \dots, y_k^T)^T$ by $y$, and $(t_1, \dots, t_{k-1})$ by $t$. For simplicity, we assume the noninformative or uniform prior for $\boldsymbol{\beta}_j$ ($j = 1, \dots, k$), $\ln(\sigma_j^2)$, and $t$. Noninformative priors are used when information about parameters is completely unknown or when proper priors such as conjugate priors do not apply. (For a rigorous discussion on the choice of priors, see [8].) We can show the following main result (see the Appendix). Given the data $y$ and the uniform prior for $\boldsymbol{\beta}_j$ ($j = 1, \dots, k$), $\ln(\sigma_j^2)$, and $t$, where the number $k$ is predetermined, the posterior probability that change points occur at $t$ is
$$p(t \mid y) = J \, 2^{(n-kp)/2} \prod_{j=1}^{k} |F_j^T F_j|^{-1/2} \, \Gamma\!\left(\frac{t_j - t_{j-1} - p}{2}\right) S_j^{-(t_j - t_{j-1} - p)/2}, \tag{3}$$
where $J = \left( \sum_{t} 2^{(n-kp)/2} \prod_{j=1}^{k} |F_j^T F_j|^{-1/2} \, \Gamma\!\left(\frac{t_j - t_{j-1} - p}{2}\right) S_j^{-(t_j - t_{j-1} - p)/2} \right)^{-1}$ and $S_j = (y_j - F_j \hat{\boldsymbol{\beta}}_j)^T (y_j - F_j \hat{\boldsymbol{\beta}}_j)$, with $\hat{\boldsymbol{\beta}}_j = (F_j^T F_j)^{-1} F_j^T y_j$ denoting the least-squares estimator of $\boldsymbol{\beta}_j$. Using this result, we estimate $t$ by the $t^*$ at which $p(t \mid y)$ attains its maximum, that is, $t^* = \arg\max_t p(t \mid y)$. We call $t^*$ the Bayesian cut, and the value $2^{(n-kp)/2} \prod_{j=1}^{k} |F_j^T F_j|^{-1/2} \, \Gamma\!\left(\frac{t_j - t_{j-1} - p}{2}\right) S_j^{-(t_j - t_{j-1} - p)/2}$ the proportional posterior (pp).
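As an illustration of how the proportional posterior could be evaluated and maximized, the following is a minimal pure-Python sketch for the straight-line case ($p = 2$, input vector $(1, f_i)^T$). The function names `stage_terms`, `log_pp`, and `bayesian_cut`, the exhaustive search over all admissible cuts, and the toy data are assumptions made for this example; the paper does not prescribe a search strategy. The computation is done in log space to avoid overflow, and it assumes each stage has a nonzero residual sum of squares.

```python
import math
from itertools import combinations

def stage_terms(f, y):
    """Return (log|F^T F|, S) for a least-squares line y = b1 + b2*f on one stage."""
    n = len(f)
    sf, sy = sum(f), sum(y)
    sff = sum(v * v for v in f)
    sfy = sum(a * b for a, b in zip(f, y))
    det = n * sff - sf * sf              # determinant of F^T F for rows (1, f_i)
    b2 = (n * sfy - sf * sy) / det       # least-squares slope
    b1 = (sy - b2 * sf) / n              # least-squares intercept
    s = sum((yi - b1 - b2 * fi) ** 2 for fi, yi in zip(f, y))
    return math.log(det), s

def log_pp(f, y, cuts, p=2):
    """Logarithm of the proportional posterior for change points `cuts`."""
    n = len(f)
    bounds = [0, *cuts, n]               # t_0 = 0, t_1, ..., t_{k-1}, t_k = n
    k = len(bounds) - 1
    total = 0.5 * (n - k * p) * math.log(2.0)
    for lo, hi in zip(bounds, bounds[1:]):
        nu = (hi - lo - p) / 2.0         # (t_j - t_{j-1} - p) / 2
        log_det, s = stage_terms(f[lo:hi], y[lo:hi])
        total += -0.5 * log_det + math.lgamma(nu) - nu * math.log(s)
    return total

def bayesian_cut(f, y, k, p=2):
    """Exhaustive search for the cut maximizing the proportional posterior."""
    n = len(f)
    best_score, best_cuts = None, None
    for cuts in combinations(range(1, n), k - 1):
        bounds = [0, *cuts, n]
        # every stage needs at least p + 1 sample points
        if any(hi - lo < p + 1 for lo, hi in zip(bounds, bounds[1:])):
            continue
        score = log_pp(f, y, cuts, p)
        if best_score is None or score > best_score:
            best_score, best_cuts = score, cuts
    return best_cuts

# Toy data (made up): the line changes after the 6th point.
f = list(range(1, 13))
noise = [0.1, -0.05, 0.08, -0.1, 0.03, -0.06] * 2
y = [(1 + 2 * x if x <= 6 else 30 - 1.5 * x) + e for x, e in zip(f, noise)]
cut = bayesian_cut(f, y, k=2)
```

The exhaustive search visits every combination of $k - 1$ cut positions, which is workable for data sets of the size considered here but grows quickly with $n$ and $k$.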

3. Experiments

In this section, we apply the Bayesian cut to two data sets, one synthesized and one real. We use the synthesized data to evaluate performance in terms of recovery of change points. The real data are used to find the Bayesian cut and to describe the feature in a multistage way, which predicts the skeletal age more accurately than fitting a single line or curve. Both linear and nonlinear regression are used for comparison. For convenience, we call fitting with a single line or curve the single fitting and fitting with the Bayesian cut the Bayesian cut fitting.

3.1. Synthesized Data

We consider five cases or models describing the relationship between the dependent and independent variables. These are shown in Table 1, where the input vector $\mathbf{f}_i$ for models $m_1$, $m_2$, $m_3$, $m_4$, and $m_5$ is $(1, f_i)^T$, $(1, f_i, f_i^2)^T$, $(1, f_i, f_i^2, f_i^3)^T$, $(1, f_i, f_i^2, f_i^3, f_i^4)^T$, and $(1, f_i, f_i^2, f_i^3, f_i^4, f_i^5)^T$, respectively. The data are generated according to the setting given in Table 2. Specifically, $\beta_{ji}$ is randomly chosen from $(-5.0, 5.0)$, and $\varepsilon_{ji}$ is generated from a normal distribution with mean 0 and variance $\sigma_j^2$ randomly selected from $(0, 5^{C(\mathbf{f}_i)-1})$. The number of sample points of the $j$th phase, $C(ph_j)$, is randomly selected from the set $\{C(\mathbf{f}_i)+1, \dots, C(\mathbf{f}_i)+1+s\}$, where $s$ is predetermined. $f_i$ takes the value $i$ for $i = 1, 2, \dots, t_k$. Note that the bound for $\sigma_j^2$ varies with the model in order to take into account the influence of the highest degree of the polynomial. Also, the number of sample points varies from phase to phase, which introduces unbalance and scalability factors so that the performance evaluation is more objective.

To quantify the performance of the Bayesian cut, we use the metric absolute deviation (AD), defined as [figure omitted; refer to PDF] where $t_j^*$ represents the $j$th element of $t^*$ (the Bayesian cut). Intuitively, the smaller AD is, the closer the Bayesian cut $t^*$ is to the true change points $t$.
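The defining formula for AD is not reproduced in this copy of the text. The sketch below assumes that AD is the mean absolute difference between estimated and true change points, $\frac{1}{k-1}\sum_{j=1}^{k-1} |t_j^* - t_j|$. This is an assumption, although it is consistent with Table 3, where the 50-trial averages for $k = 4$ are all multiples of 1/150. The function name `absolute_deviation` is hypothetical.

```python
def absolute_deviation(t_est, t_true):
    """Mean absolute difference between estimated and true change points.

    Assumed definition (the original formula is omitted in this copy):
    AD = (1 / (k - 1)) * sum_j |t*_j - t_j|.
    """
    if len(t_est) != len(t_true) or not t_true:
        raise ValueError("both cuts must hold the same, nonzero number of change points")
    return sum(abs(a - b) for a, b in zip(t_est, t_true)) / len(t_true)

# Example with k = 3 (two change points): one cut is off by one position.
example_ad = absolute_deviation((6, 11), (6, 12))
```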

Table 1: Models for testing the performance of the Bayesian cut.

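Model	Definition
$m_1$	$y_i = \beta_{j1} + \beta_{j2} f_i + \varepsilon_{ji}$, $t = (t_1, \dots, t_{k-1})$
$m_2$	$y_i = \beta_{j1} + \beta_{j2} f_i + \beta_{j3} f_i^2 + \varepsilon_{ji}$, $t = (t_1, \dots, t_{k-1})$
$m_3$	$y_i = \beta_{j1} + \beta_{j2} f_i + \beta_{j3} f_i^2 + \beta_{j4} f_i^3 + \varepsilon_{ji}$, $t = (t_1, \dots, t_{k-1})$
$m_4$	$y_i = \beta_{j1} + \beta_{j2} f_i + \beta_{j3} f_i^2 + \beta_{j4} f_i^3 + \beta_{j5} f_i^4 + \varepsilon_{ji}$, $t = (t_1, \dots, t_{k-1})$
$m_5$	$y_i = \beta_{j1} + \beta_{j2} f_i + \beta_{j3} f_i^2 + \beta_{j4} f_i^3 + \beta_{j5} f_i^4 + \beta_{j6} f_i^5 + \varepsilon_{ji}$, $t = (t_1, \dots, t_{k-1})$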

Table 2: Experimental setting.

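Quantity	Setting
$\beta_{ji}$	$(-5.0, 5.0)$
$\varepsilon_{ji}$	$\sim N(0, \sigma_j^2)$, $\sigma_j^2 \in (0, 5^{C(\mathbf{f}_i)-1})$
$k$	2, 3, 4
$C(ph_j)$	$C(\mathbf{f}_i)+1, \dots, C(\mathbf{f}_i)+1+s$
$s$ (scale)	$1, \dots, 10$
$t_0$	0
$t_j$	$t_{j-1} + C(ph_j)$
$f_i$	$1, \dots, t_k$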
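The setting in Table 2 can be turned into a small data generator. The sketch below is one possible reading of that setting, not the authors' code: the function name `generate_dataset`, the use of Python's `random.Random`, and the choice of uniform sampling for $\beta_{ji}$, $\sigma_j^2$, and $C(ph_j)$ are assumptions, since the text only says these quantities are "randomly chosen".

```python
import random

def generate_dataset(k, s, degree, rng):
    """Draw one synthetic data set following our reading of Table 2.

    degree is the polynomial degree of the model (1 for m1, ..., 5 for m5).
    Returns (f, y, cuts), where cuts holds the true change points t_1..t_{k-1}.
    """
    p = degree + 1                       # C(f_i): length of (1, f, ..., f^degree)
    # C(ph_j) drawn from {p + 1, ..., p + 1 + s}; randint includes both ends
    sizes = [rng.randint(p + 1, p + 1 + s) for _ in range(k)]
    f, y, ends, t = [], [], [], 0
    for size in sizes:
        beta = [rng.uniform(-5.0, 5.0) for _ in range(p)]
        variance = rng.uniform(0.0, 5.0 ** (p - 1))   # sigma_j^2, bound grows with p
        sigma = math_sqrt(variance)
        for i in range(t + 1, t + size + 1):          # f_i = i
            mean = sum(b * i ** d for d, b in enumerate(beta))
            f.append(i)
            y.append(rng.gauss(mean, sigma))
        t += size                                     # t_j = t_{j-1} + C(ph_j)
        ends.append(t)
    return f, y, tuple(ends[:-1])

def math_sqrt(x):
    """Square root helper kept local so the block needs only `random`."""
    return x ** 0.5

rng = random.Random(0)                   # fixed seed for reproducibility
f_syn, y_syn, true_cuts = generate_dataset(k=3, s=2, degree=1, rng=rng)
```

Repeating this draw 50 times for each combination of $k$, $s$, and model would mirror the evaluation protocol described next.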

Table 3 shows the AD values. They are obtained by varying $k$ from 2 to 4 and $s$ from 1 to 10. For given $k$, $s$, and model, 50 trials are performed to generate data, leading to 50 datasets $\{(F, y)\}$. We find the Bayesian cut $t^*$ for each $(F, y)$ under the given model. The final AD score is the average over the 50 runs.

Table 3: AD scores for models in Table 1.

k	s	$m_1$	$m_2$	$m_3$	$m_4$	$m_5$
2	1	0.280	0.340	0.320	0.080	0.180
2	2	0.300	0.460	0.360	0.200	0.100
2	3	0.260	0.400	0.320	0.100	0.100
2	4	0.640	0.380	0.260	0.180	0.180
2	5	0.480	0.680	0.480	0.100	0.060
2	6	0.380	0.300	0.560	0.220	0.100
2	7	0.540	0.520	0.340	0.280	0.100
2	8	0.900	0.520	0.440	0.120	0.020
2	9	0.740	0.340	0.080	0.040	0.020
2	10	0.740	0.720	0.160	0.200	0.020
3	1	0.230	0.360	0.210	0.240	0.090
3	2	0.440	0.390	0.190	0.080	0.060
3	3	0.590	0.340	0.210	0.220	0.060
3	4	0.820	0.590	0.260	0.060	0.010
3	5	0.970	0.690	0.530	0.020	0.090
3	6	0.670	0.580	0.120	0.060	0.070
3	7	1.220	0.750	0.160	0.080	0.190
3	8	1.260	0.680	0.650	0.040	0.030
3	9	1.210	0.860	0.370	0.380	0.010
3	10	1.340	0.360	0.680	0.020	0.020
4	1	0.333	0.300	0.133	0.040	0.053
4	2	0.440	0.433	0.227	0.060	0.033
4	3	0.867	0.480	0.113	0.080	0.033
4	4	0.780	0.513	0.093	0.080	0.133
4	5	1.020	0.887	0.453	0.133	0.173
4	6	1.360	0.760	0.193	0.093	0.180
4	7	1.007	0.593	0.353	0.047	0.040
4	8	0.727	0.587	0.453	0.093	0.113
4	9	1.080	1.240	0.867	0.360	0.087
4	10	1.213	0.873	0.333	0.120	0.140

Our findings can be summarized as follows. For both linear and nonlinear regression, the Bayesian cut performs well: every AD score in Table 3 is below 1.4, and most scores for the higher-degree models $m_4$ and $m_5$ are at most 0.2. Introducing the unbalance and scalability factors does not deteriorate the performance of the Bayesian cut significantly. The Bayesian cut scales well when the number of change points increases.

3.2. Real Data

In this part, we apply the Bayesian cut fitting to real data from our database, shown in Table 4. The table lists feature values against increasing skeletal age, from newborn to 19-year-old boys (column 1), as labeled by radiology experts. To obtain features independent of the size and length of the digits, two ratio features are used, following [5]. One is $L_1/L_2$, the ratio of the length of the distal phalanx $L_1$ to that of the middle phalanx $L_2$ of the middle digit, and the other is $L_2/L_3$, the ratio of the length of the middle phalanx $L_2$ to that of the proximal phalanx $L_3$. See Figure 3 for an illustration of $L_1$, $L_2$, and $L_3$. These two features correspond to columns 2 and 3, which are generated using the algorithm in [6]. Columns 4 and 5 contain the normalized values of $L_1/L_2$ and $L_2/L_3$, respectively. The normalization is $(x - \mu)/\sigma$, where $\mu$ is the expectation of $x$ and $\sigma$ is the variance. In our experiments, only normalized values are used. Figure 4 shows some of the Bayesian cut fittings, where the features are $n(L_1/L_2)$ and $n(L_2/L_3)$, the models describing the relationship between the feature and the skeletal age are $m_1$ and $m_2$ from Table 1, and $k$ takes the values 2, 3, and 4. In Figure 4, the horizontal axis represents the age and the vertical axis indicates the feature. For model $m_1$, the blue straight line across the entire age range is from the single (line) fitting. For model $m_2$, the blue curve across the entire age range is from the single (quadratic) fitting. All red (broken) lines are from the Bayesian cut fitting.

Table 4: Some features of the skeletal age.

Age (yr)	_L1 /_L2	_L2 /_L3	n(_L1 /_L2 )	n(_L2 /_L3 )
0	0.6795	0.7016	41.8212	51.1987
3	0.6307	0.5853	6.4071	-17.6281
3.5	0.6220	0.6298	0.1020	8.6933
4.0	0.6060	0.5993	-11.4491	-9.3140
4.5	0.6111	0.5708	-7.7721	-26.1616
5.0	0.6172	0.5070	-3.3303	-63.8970
6.0	0.5675	0.5924	-39.3612	-13.4245
7.0	0.5947	0.6626	-19.6939	28.0937
8.0	0.5820	0.6097	-28.9032	-3.1878
9.0	0.5939	0.5968	-20.2149	-10.7828
10.0	0.5680	0.6643	-39.0383	29.1323
11.0	0.5776	0.6696	-32.0541	32.2560
11.5	0.5845	0.6550	-27.0602	23.6424
12.5	0.5979	0.6266	-17.3472	6.8003
13.0	0.6292	0.5670	5.3295	-28.4227
13.5	0.6000	0.6219	-15.8024	4.0436
14.0	0.6436	0.6065	15.7982	-5.0842
15.0	0.6703	0.6319	35.1558	9.9431
15.5	0.6843	0.5937	45.2891	-12.6564
16.0	0.6746	0.5843	38.2966	-18.2156
17.0	0.6632	0.6153	30.0081	0.1412
18.0	0.6589	0.6236	26.8770	5.0546
19.0	0.6452	0.6316	16.9420	9.7754
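The normalization described for columns 4 and 5 can be checked against Table 4. The sketch below applies $(x - \mu)/\sigma$ to the $L_1/L_2$ column, taking $\mu$ as the sample mean and $\sigma$ as the sample variance (denominator $n - 1$). The choice of the sample rather than the population variance is our assumption; it appears to reproduce column 4 to within the rounding of the four-decimal inputs. The names `ratio_l1_l2` and `normalize` are hypothetical.

```python
import statistics

# Column 2 of Table 4 (L1/L2), in age order.
ratio_l1_l2 = [
    0.6795, 0.6307, 0.6220, 0.6060, 0.6111, 0.6172, 0.5675, 0.5947,
    0.5820, 0.5939, 0.5680, 0.5776, 0.5845, 0.5979, 0.6292, 0.6000,
    0.6436, 0.6703, 0.6843, 0.6746, 0.6632, 0.6589, 0.6452,
]

def normalize(values):
    """Return (x - mean) / variance for each x, as described in the text.

    Note: the divisor is the variance, not the standard deviation, and the
    sample variance (n - 1 denominator) is an assumption of this sketch.
    """
    mu = statistics.mean(values)
    var = statistics.variance(values)    # sample variance
    return [(x - mu) / var for x in values]

normalized = normalize(ratio_l1_l2)
```

Dividing by the variance rather than the standard deviation explains why the normalized values in Table 4 span roughly -40 to 45 instead of the few units a conventional z-score would give.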

Figure 3: Illustration of $L_1$, $L_2$, and $L_3$.

[figure omitted; refer to PDF]

Figure 4: Illustration of the Bayesian cut fitting applied to the real data on features of the skeletal age.

[figure panels omitted; refer to PDF]

4. Conclusion

In this paper, we propose the Bayesian cut fitting to describe features in response to the skeletal age. In the semantic space derived by our approach, the axis of skeletal age is divided into meaningful stages, within each of which the variation pattern of a feature is consistent, so that a traditional regression technique can be applied to model the relationship between the skeletal age and the feature. Our approach is inspired by the observation that the variation pattern of a feature can differ in different periods of the skeletal age. A critical issue is to determine the times, or change points, at which the variation pattern of a feature changes. This is handled by the Bayesian cut proposed in this paper. Simulations demonstrate the effectiveness of the Bayesian cut fitting in recovering change points. The experiments on real data show that, given a type of relationship (e.g., linear or quadratic) between the skeletal age and a feature, the Bayesian cut fitting surpasses the traditional single fitting when the consistency of the variation pattern of the feature over the entire skeletal age range is in doubt. One major issue not addressed in this paper is the determination of $k$, the number of stages. The selection of $k$ depends on the given data and the practical need. We leave this for future research.

Acknowledgments

Dechang Chen was partially supported by the National Science Foundation grant CCF-0729080.

References

[1] D. B. Darling, Radiography of Infants and Children, chapter 6, Charles C. Thomas, Springfield, Ill, USA, 1st edition, 1979.

[2] A. K. Poznanski, S. M. Garn, J. M. Nagy, J. C. Gall Jr., "Metacarpophalangeal pattern profiles in the evaluation of skeletal malformations," Radiology, vol. 104, no. 1, pp. 1-11, 1972.

[3] D. R. Kirks, Practical Pediatric Imaging: Diagnostic Radiology of Infants and Children, chapter 6, Little, Brown, Boston, Mass, USA, 1st edition, 1984.

[4] J. Kosowicz, "The roentgen appearance of the hand and wrist in gonadal dysgenesis," The American Journal of Roentgenology, Radium Therapy and Nuclear Medicine, vol. 93, pp. 354-361, 1965.

[5] E. Pietka, M. F. McNitt-Gray, M. L. Kuo, H. K. Huang, "Computer-assisted phalangeal analysis in skeletal age assessment," IEEE Transactions on Medical Imaging, vol. 10, no. 4, pp. 616-620, 1991.

[6] E. Pietka, A. Gertych, S. Pospiech, F. Cao, H. K. Huang, V. Gilsanz, "Computer-assisted bone age assessment: image preprocessing and epiphyseal/metaphyseal ROI extraction," IEEE Transactions on Medical Imaging, vol. 20, no. 8, pp. 715-729, 2001.

[7] D. Chen, M. Fries, J. M. Lyon, "A statistical method of detecting bioremediation," Journal of Data Science, vol. 1, no. 1, pp. 27-41, 2003.

[8] G. E. P. Box, G. C. Tiao, Bayesian Inference in Statistical Analysis, John Wiley & Sons, New York, NY, USA, 1992.

Appendix

A. Derivation of (3)

Proof.

According to the Pythagorean theorem, we have the following likelihood [figure omitted; refer to PDF] where $S_j = (y_j - F_j \hat{\boldsymbol{\beta}}_j)^T (y_j - F_j \hat{\boldsymbol{\beta}}_j)$ and $\hat{\boldsymbol{\beta}}_j = (F_j^T F_j)^{-1} F_j^T y_j$. Since the $\varepsilon_{ji}$ are independent of each other, the likelihood function of $\boldsymbol{\beta}_1, \dots, \boldsymbol{\beta}_k$, $\sigma_1^2, \dots, \sigma_k^2$, $t$ is then [figure omitted; refer to PDF] Due to the assumption of the uniform prior for $\boldsymbol{\beta}_j$, $\ln(\sigma_j^2)$, and $t$, we have [figure omitted; refer to PDF] Using (A.2) and (A.6), we have [figure omitted; refer to PDF]

Note that [figure omitted; refer to PDF] This equation exploits the fact [figure omitted; refer to PDF] from the normal density for the $p$-dimensional random vector $X$ [figure omitted; refer to PDF] where $\boldsymbol{\mu}$ is the expected value of $X$ and $\Sigma$ is the variance-covariance matrix of $X$.

Substituting (A.10) into (A.7), we have [figure omitted; refer to PDF]

In addition, we have [figure omitted; refer to PDF] from the probability density function of $X = aU$ [figure omitted; refer to PDF] where the constant $a > 0$ and $U^{-1} \sim \chi_m^2$.

By applying (A.9) to (A.11), we get [figure omitted; refer to PDF] where $J = \left( \sum_{t} 2^{(n-kp)/2} \prod_{j=1}^{k} |F_j^T F_j|^{-1/2} \, \Gamma\!\left(\frac{t_j - t_{j-1} - p}{2}\right) S_j^{-(t_j - t_{j-1} - p)/2} \right)^{-1}$. This completes the proof.
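Because the displayed equations of the proof are omitted in this copy, the following is a hedged sketch of the standard identities on which such a derivation typically rests, written in the notation of Section 2 with $n_j = t_j - t_{j-1}$. The layout and intermediate constants are a reconstruction under the stated priors and do not reproduce the authors' numbered equations (A.1) onward.

```latex
% Decomposition of the stage-j sum of squares (the "Pythagorean" step):
\[
(y_j - F_j\boldsymbol{\beta}_j)^T (y_j - F_j\boldsymbol{\beta}_j)
  = S_j + (\boldsymbol{\beta}_j - \hat{\boldsymbol{\beta}}_j)^T F_j^T F_j
          (\boldsymbol{\beta}_j - \hat{\boldsymbol{\beta}}_j).
\]
% Gaussian integral over beta_j (from the p-dimensional normal density):
\[
\int_{\mathbb{R}^p}
  \exp\!\Big(-\tfrac{1}{2\sigma_j^2}
     (\boldsymbol{\beta}_j - \hat{\boldsymbol{\beta}}_j)^T F_j^T F_j
     (\boldsymbol{\beta}_j - \hat{\boldsymbol{\beta}}_j)\Big)\,
  d\boldsymbol{\beta}_j
  = (2\pi\sigma_j^2)^{p/2}\, |F_j^T F_j|^{-1/2}.
\]
% A uniform prior on ln(sigma_j^2) corresponds to a density proportional to
% 1/sigma_j^2, so the remaining integral over sigma_j^2 is
\[
\int_0^\infty (\sigma_j^2)^{-(n_j - p)/2 - 1}
  \exp\!\Big(-\frac{S_j}{2\sigma_j^2}\Big)\, d\sigma_j^2
  = \Gamma\!\Big(\frac{n_j - p}{2}\Big)\, 2^{(n_j - p)/2}\, S_j^{-(n_j - p)/2}.
\]
% Multiplying over j = 1, ..., k and using sum_j (n_j - p) = n - kp gives
\[
p(t \mid y) \;\propto\; 2^{(n - kp)/2} \prod_{j=1}^{k}
  |F_j^T F_j|^{-1/2}\, \Gamma\!\Big(\frac{n_j - p}{2}\Big)\, S_j^{-(n_j - p)/2}.
\]
```

Factors that do not depend on $t$, such as powers of $2\pi$, cancel when the expression is normalized by $J$.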

Copyright © 2009 Dong Hua et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Accurate assessment of skeletal maturity is important clinically. Skeletal age assessment is usually based on features encoded in ossification centers. Therefore, it is critical to design a mechanism that captures as many characteristics of the features as possible. We have observed that, for a given feature, there exist stages of the skeletal age such that the variation pattern of the feature differs between these stages. Based on this observation, we propose a Bayesian cut fitting to describe features in response to the skeletal age. With our approach, appropriate positions for stage separation are determined automatically by a Bayesian approach, and a model is used to fit the variation of a feature within each stage. Our experimental results show that the proposed method surpasses the traditional fitting using only one line or one curve, not only in the efficiency and accuracy of fitting but also in global and local feature characterization.

Details

Title: A Bayesian Approach to Multistage Fitting of the Variation of the Skeletal Age Features
Author: Dong, Hua; Chen, Dechang; Liu, Fang; Abdou Youssef
Pages: 623853
Publication year: 2009
Publication date: 2009
Publisher: John Wiley & Sons, Inc.
ISSN: 11107243
e-ISSN: 11107251
Source type: Scholarly Journal
Language of publication: English
DOI: https://doi.org/10.1155/2009/623853
ProQuest document ID: 856044312
