ABSTRACT
The most widely used model in the multivariate analysis of survival data is the proportional hazards model proposed by Cox. While the results of the model are easy to obtain and interpret, its basic assumption is that the effects of the independent variables remain constant throughout the observation period. The model can give biased results when this assumption is violated. One method for modelling the hazard ratio when the proportional hazards assumption is not met is to add a time-dependent variable representing the interaction between the predictor variable and a parametric function of time. In this study, we investigate the factors that affect the survival time of firms, and the time dependence of these factors, using Cox regression with time-varying variables. The firm data come from the Business Development Centers (İŞGEM), prominent business incubation centers operating in Turkey.
JEL Codes: C41, C24, M13
KEYWORDS
Survival Analysis, Cox Regression Model, Proportional Hazard Assumption, New Firms
ARTICLE HISTORY
Submitted: 22 June 2012
Resubmitted: 03 January 2013
Accepted: 25 March 2013
(ProQuest: ... denotes formulae omitted.)
Introduction
Survival analysis deals with the probability that a given event occurs at particular points in a time interval (Cox and Oakes, 1984; Sertkaya, Ata and Sözer, 2005). In the small business and entrepreneurship literature, survival analysis has been used to track start-ups over the years. A typical survival analysis reports hazard rates, hazard ratios and survival curves while relating a set of independent variables to a specific event. The survival curve of a cohort of newly established firms reports what percentage of the cohort is still alive at each point in time since its inception, and thus how many firms have failed over the years (Karaöz and Albeni, 2011). Many survival studies examine whether certain variables or risk factors affect survival. The Cox proportional hazards (PH) model is the most widely preferred model for investigating the effect of variables on survival time. The key assumption of the Cox model is that the hazard ratio between different levels of a factor is constant throughout the follow-up period (Başar, 2006). When the PH assumption is violated, additional measures are required to obtain unbiased results from Cox regression. In this paper, Cox regression is applied to investigate the survival of newly established firms under incubation. Violation of the PH assumption is tested, and further Cox regressions are performed that take into account the time-varying effects of the independent variables on survival.
Survival Analysis
Survival analysis is a collection of statistical procedures for analysing data in which the outcome variable of interest is the time until an event occurs (Harrell, 2001). This event may be a failure, and for this reason the analysis of such data is often referred to as survival analysis (Bellera et al., 2010). The main objectives of survival analysis are i) to estimate and interpret survival characteristics: Kaplan-Meier plots, median estimation and confidence intervals (CI); ii) to compare survival in different groups: the log-rank test; iii) to assess the relationship of explanatory variables to survival time: the Cox regression model (Yay, Çoker and Uysal, 2007).
In a survival analysis, the time variable is usually referred to as survival time, because it gives the time that an individual has "survived" over some follow-up period (Geiss et al., 2009). The event is typically referred to as a failure, because the event of interest is usually death, disease incidence, or some other negative individual experience (Kleinbaum and Klein, 2005).
When survival time (T) is defined as a random variable with cumulative distribution function F(t) = Pr(T < t) and probability density function f(t) = dF(t)/dt, the survival function S(t) is given by Equation (2.1) (Yay, Çoker and Uysal, 2007);
... (2.1)
The survival function S(t) gives the probability that the random variable T exceeds the specified time t (Kleinbaum and Klein, 2005). All survival functions share the following characteristics: i) they are nonincreasing, that is, they head downward as t increases; ii) at time t = 0, S(t) = S(0) = 1, that is, at the start of the study, since no one has experienced the event yet, the probability of surviving past time 0 is one; iii) at time t = ∞, S(t) = S(∞) = 0, that is, theoretically, if the study period increased without limit, eventually nobody would survive, so the survival curve must eventually fall to zero (Kleinbaum and Klein, 2005).
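The properties listed above can be seen on an empirical survival curve. The following is a minimal sketch of the Kaplan-Meier estimator in plain Python; the function name and the toy durations are illustrative assumptions and are not taken from the study's data.

```python
from collections import Counter


def kaplan_meier(durations, events):
    """Return [(t, S(t))] at each distinct failure time, starting from (0, 1.0).

    durations: observed times (e.g. months under observation)
    events:    1 if the failure was observed at that time, 0 if censored
    """
    failures = Counter(t for t, e in zip(durations, events) if e == 1)
    curve = [(0, 1.0)]
    survival = 1.0
    for t in sorted(failures):
        # Units still under observation just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        survival *= 1.0 - failures[t] / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical toy data (six units, two of them censored).
durations = [5, 8, 8, 12, 20, 20]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(durations, events)
for t, s in curve:
    print(f"t={t:>3}  S(t)={s:.3f}")
```

The estimated curve starts at 1 and can only stay level or step down at failure times; censored units leave the risk set without causing a step.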
The hazard function h(t), the counterpart of the survival function S(t), is given by Equation (2.2), where Δt denotes a small interval of time (Kleinbaum and Klein, 2005);
... (2.2)
The hazard function h(t) gives the instantaneous potential per unit time for the event to occur, given that the individual has survived up to time t (Tabatabai et al., 2007). In contrast to the survival function, which focuses on not failing, the hazard function focuses on failing, that is, on the event occurring. Thus, in some sense, the hazard function can be considered as giving the opposite side of the information given by the survival function (Kleinbaum and Klein, 2005).
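Since Equations (2.1) and (2.2) are omitted in this copy, the standard textbook forms of the two functions are sketched here for orientation. They are assumed, not confirmed, to agree with the original equations; F(t) denotes the cumulative distribution function and f(t) the density of a continuous survival time T.

```latex
\begin{align}
S(t) &= \Pr(T > t) = 1 - F(t), \\
h(t) &= \lim_{\Delta t \to 0}
        \frac{\Pr(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}
      = \frac{f(t)}{S(t)}, \\
S(t) &= \exp\!\left(-\int_0^t h(u)\,du\right).
\end{align}
```

The last line makes the "opposite sides" remark concrete: either function determines the other.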
The Cox Proportional Hazards Model
The Cox PH model is usually written in terms of the hazard model formula shown in Equation (2.3). This model gives an expression for the hazard at time t for an individual with a given specification of a set of explanatory variables denoted by X. That is, X represents a collection of predictor variables that is being modeled to predict an individual's hazard (Kleinbaum and Klein, 2005);
... (2.3)
The Cox model formula says that the hazard at time t is the product of two quantities. The first of these, h0(t), is called the baseline hazard function. The second quantity is the exponential expression e to the linear sum of βiXi, where the sum is over the p explanatory X variables (Kleinbaum and Klein, 2005).
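Equation (2.3) is omitted in this copy. Read literally, the verbal description above corresponds to the usual way of writing the Cox PH model, sketched below under the assumption that the original used the same notation.

```latex
\begin{equation}
h(t, X) = h_0(t)\,\exp\!\left(\sum_{i=1}^{p} \beta_i X_i\right)
\quad\Longleftrightarrow\quad
\ln\frac{h(t, X)}{h_0(t)} = \sum_{i=1}^{p} \beta_i X_i .
\end{equation}
```

The right-hand form shows why the covariate effects are described as log-linear: each coefficient shifts the log of the hazard relative to the baseline.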
In general, a hazard ratio (HR) is defined as the hazard for one individual divided by the hazard for a different individual. The two individuals being compared can be distinguished by their values for the set of predictors, that is, the Xs. Hazard ratio is shown by the following formula, where X* denotes the set of predictors for one individual, and X denotes the set of predictors for the other individual (Kleinbaum and Klein, 2005);
... (2.4)
X* = (X1*, X2*, ..., Xp*) and X = (X1, X2, ..., Xp) denote the sets of X's for the two individuals.
Once the model is fitted and the values for X* and X are specified, the value of the exponential expression for the estimated hazard ratio is a constant, which does not depend on time. If we denote this constant by θ̂, then the hazard ratio can be written as shown below (Kleinbaum and Klein, 2005);
... (2.5)
If the hazard ratio is greater than 1, the group coded 1 on the variable is more likely to experience the event of interest than the group coded 0. If the hazard ratio is equal to 1, the two groups have the same risk of closing; if it is between 0 and 1, the group coded 1 has a lower risk of closing than the group coded 0.
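To illustrate why the hazard ratio under the Cox PH model does not depend on time, the sketch below computes the hazards of two profiles for several baseline hazard values and shows that their ratio is unchanged. The coefficients and covariate profiles are hypothetical and are not estimates from this study.

```python
import math


def cox_hazard(baseline_hazard, betas, x):
    """Hazard h0(t) * exp(sum_i beta_i * x_i) for one value of the baseline hazard."""
    return baseline_hazard * math.exp(sum(b * v for b, v in zip(betas, x)))


def hazard_ratio(betas, x_star, x):
    """exp(sum_i beta_i * (x*_i - x_i)): hazard of profile x_star relative to x."""
    return math.exp(sum(b * (a - c) for b, a, c in zip(betas, x_star, x)))


# Hypothetical coefficients: one binary covariate and one continuous covariate.
betas = [math.log(2.0), -0.5]
x_star = [1, 3.0]  # profile coded 1 on the binary covariate
x = [0, 3.0]       # profile coded 0, same value of the continuous covariate

hr = hazard_ratio(betas, x_star, x)

# The baseline hazard cancels, so the ratio is the same whatever h0(t) is.
ratios = [
    cox_hazard(h0, betas, x_star) / cox_hazard(h0, betas, x)
    for h0 in (0.01, 0.05, 0.30)
]
print(round(hr, 6), [round(r, 6) for r in ratios])
```

Because the baseline hazard appears in both numerator and denominator, only the difference between the two covariate profiles enters the ratio.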
The basic assumptions of the Cox regression model are as follows (Yay, Çoker and Uysal, 2007): i) the effects of the independent variables on the hazard function are log-linear; ii) the relationship between the log-linear function of the independent variables and the hazard function is multiplicative; iii) in addition to these two assumptions, observations should be independent of each other and the hazard ratio should remain unchanged over time, i.e., be constant. This last assumption is known as the proportional hazards assumption.
A key reason for the popularity of the Cox model is that, even though the baseline hazard is not specified, reasonably good estimates of regression coefficients, hazard ratios of interest, and adjusted survival curves can be obtained for a wide variety of data situations. Another way of saying this is that the Cox PH model is a "robust" model, so that the results from using the Cox model will closely approximate the results for the correct parametric model (Kleinbaum and Klein, 2005).
In addition to the general "robustness" of the Cox model, the specific form of the model is attractive for several reasons (Kleinbaum and Klein, 2005). First, the exponential part of hazard model ensures that the fitted model will always give estimated hazards that are non-negative. Another tempting property of the Cox model is that, even though the baseline hazard part of the model is unspecified, it is still possible to estimate the ß's in the exponential part of the model. Lastly, it is preferred over the logistic model when survival time information is available and there is censoring. That is, the Cox model uses more information (the survival times) than the logistic model, which considers a (0,1) outcome and ignores survival times and censoring.
Evaluating the Proportional Hazards Assumption
For variables that do not satisfy the proportionality assumption, the power of the corresponding tests is reduced; that is, a significant effect is less likely to be detected when one actually exists. If the hazard ratio increases over time, the coefficient estimated under PH overestimates the effect at first and underestimates it later on. For the variables in the model that do have a constant hazard ratio, the power of the tests is also reduced as a consequence of the inferior fit of the model (Bellera et al., 2010).
There are three general approaches to assessing the PH assumption: 1) graphical approaches: Kaplan-Meier and log-log plots, and observed versus expected plots; 2) goodness-of-fit (GOF) tests; 3) statistical methods: Schoenfeld residuals, the log-rank test and time-dependent covariates.
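The idea behind the log-log plot can be sketched numerically. Under proportional hazards, the curves ln(-ln S(t)) of two groups differ by a constant (the log hazard ratio), so they appear parallel. The example below uses Weibull survival functions chosen purely for illustration; the parameter values are assumptions and have no connection to the firm data.

```python
import math


def weibull_survival(t, scale, shape):
    """S(t) = exp(-(t / scale) ** shape) for a Weibull survival time."""
    return math.exp(-((t / scale) ** shape))


def log_minus_log(s):
    """The ln(-ln S) transform used in log-log plots."""
    return math.log(-math.log(s))


def loglog_gaps(times, surv_a, surv_b):
    """Vertical distance between the two log-log curves at each time point."""
    return [log_minus_log(surv_a(t)) - log_minus_log(surv_b(t)) for t in times]


times = [6, 12, 24, 48, 96]  # months

# Same shape parameter: hazards are proportional, so the gap is constant.
ph_gaps = loglog_gaps(
    times,
    lambda t: weibull_survival(t, scale=30, shape=1.5),
    lambda t: weibull_survival(t, scale=60, shape=1.5),
)

# Different shape parameters: hazards are not proportional, so the gap drifts.
non_ph_gaps = loglog_gaps(
    times,
    lambda t: weibull_survival(t, scale=30, shape=0.8),
    lambda t: weibull_survival(t, scale=60, shape=1.5),
)

print([round(g, 3) for g in ph_gaps])
print([round(g, 3) for g in non_ph_gaps])
```

In practice the survival functions would be replaced by Kaplan-Meier estimates per group, and the judgement of parallelism is made visually, which is why the graphical approach is usually complemented by a formal test.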
Extension of the Cox Proportional Hazards Model
An important feature of the Cox PH formula, which concerns the PH assumption, is that the baseline hazard is a function of t but does not involve the X's. The X's in the formula are called time-independent X's (Kleinbaum and Klein, 2005). It is possible, nevertheless, to consider X's which do involve t. Such X's are called time-dependent variables. If time-dependent variables are considered, the Cox model form may still be used, but such a model no longer satisfies the PH assumption, and is called the extended Cox model (Kleinbaum and Klein, 2005).
When some explanatory variables are time-dependent, the Cox regression model is extended to include both the time-independent variables and the products of variables with functions of time. With X1, X2, ..., Xp1 denoting the time-independent variables and X1(t), X2(t), ..., Xp2(t) the time-dependent variables, the set of explanatory variables is written as (Sertkaya, Ata and Sözer, 2005);
...
Accordingly, with β and δ denoting the coefficient vectors of the explanatory variables, the Cox regression model is written as (Sertkaya, Ata and Sözer, 2005);
... (2.6)
where g(t) is a function of time. The choice of g(t) depends on the nature of the variables used and on the researcher's prior knowledge. It is usually taken to be t, log(t), ln(t) or a step function (Sertkaya, Ata and Sözer, 2005).
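Equation (2.6) is omitted in this copy. A commonly used way of writing an extended Cox model with coefficient vectors β and δ and a time function g(t) is sketched below; the indexing and the product form of the time-dependent terms are assumptions and may differ from the original equation.

```latex
\begin{equation}
h\bigl(t, X(t)\bigr) = h_0(t)\,
\exp\!\left(\sum_{i=1}^{p_1} \beta_i X_i + \sum_{j=1}^{p_2} \delta_j X_j(t)\right),
\qquad X_j(t) = X_j\, g_j(t).
\end{equation}
```

Setting every δ_j to zero recovers the ordinary Cox PH model, so a test of δ_j = 0 is a test of the PH assumption for the corresponding variable.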
The general hazard ratio formula for the extended Cox model is shown below (Kleinbaum and Klein, 2005);
... (2.7)
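As a rough illustration of how the hazard ratio becomes a function of time in the extended model, the sketch below evaluates exp(β + δ·g(t)) for a single variable that enters the model both directly and through a product with g(t). The coefficient values and the helper names are hypothetical.

```python
import math


def g_identity(t):
    """g(t) = t."""
    return t


def g_log(t):
    """g(t) = ln(t); requires t > 0."""
    return math.log(t)


def make_step(cut):
    """g(t) = 1 from time `cut` onwards, 0 before it."""
    return lambda t: 1.0 if t >= cut else 0.0


def time_varying_hr(beta, delta, g, t):
    """Hazard ratio at time t for a one-unit difference in X
    when the linear predictor contains beta * X + delta * X * g(t)."""
    return math.exp(beta + delta * g(t))


beta, delta = 0.7, -0.2  # hypothetical coefficients
months = (1, 12, 36)

hr_log = [time_varying_hr(beta, delta, g_log, t) for t in months]
hr_step = [time_varying_hr(beta, delta, make_step(36), t) for t in (12, 35, 36, 60)]
hr_constant = [time_varying_hr(beta, 0.0, g_log, t) for t in months]

print([round(v, 3) for v in hr_log])
print([round(v, 3) for v in hr_step])
print([round(v, 3) for v in hr_constant])
```

With δ = 0 the ratio is constant, which is the PH case; with a step function the ratio takes one value before the cut point and another after it.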
An Application Into New Firm Survival Under Incubation
Although survival analysis has been used extensively in medical research on individuals, it has recently become popular in research on business success and survival. Thus, rather than individuals, in this paper we apply Cox regression to investigate the survival of newly established firms under incubation. There are studies applying survival analysis. Violation of the PH assumption is tested, and further Cox regressions are performed that take into account the time-varying effects of the independent variables on survival. Our 414 observations on firm characteristics were acquired from 12 different incubators (İŞGEMs) located across Turkey, in Zonguldak, Tarsus, Ereğli, Eskişehir, Adana, Mersin, Van, Avanos, Samsun, Elazığ, Yozgat and Diyarbakır provinces. The data include almost all firms that currently reside in İŞGEMs and firms that resided there in the past but left through graduation or failure. The survey data consists of the total.
A business incubator can be defined as an organization which mentors the development of newly founded firms through specialized services such as office space, specialized staff, machinery, equipment, facilities and business assistance (Aernoudt, 2004). Thus a business incubator is a framework organization which contains a collection of newly established firms. İŞGEMs are among the significant business incubation initiatives operating in Turkey.
Variables Used in the Analysis
For our analysis, the factors affecting the initial success of young enterprises can be summarized as i) human capital characteristics of the new enterprise's owner, such as education level and sector experience; ii) firm characteristics such as scale, age and human capital; iii) industry characteristics such as market growth rate and entry barriers; iv) incubation features; v) other external factors such as macroeconomic fluctuations, regional factors and public policies (Hackett and Dilts, 2004; Aernoudt, 2004). All of the data and variables used in our analysis are taken from Karaöz and Albeni (2011), and descriptive statistics and definitions are presented in Table 3.1. The entrepreneur's age, gender, education, professional career history and experience, and family environment are the main factors discussed in the literature on firm survival (Karaöz and Albeni, 2011).
The (exit) variable is used as the dependent variable. It takes the value 1 if the firm closed either during incubation or after graduating from incubation, and 0 otherwise. In addition to (exit), exit time (incubage) is the other main variable in our survival analysis. As seen in Table 3.1, the firms' average life expectancy in our dataset is 41.52 months. The maximum observed survival time is 158 months. Some of the firms failed either during incubation or some time after leaving the incubator, while others continue their activity either inside or outside the incubator. The survival curve of the firms is presented in Figure 3.1. According to the figure, the share of survivors falls to about 20% after 158 months.
Results
All Cox regression results, with and without time effects, are presented in Table 3.2. The (gender), (lnentage), (family), (export), (lnempini), (advert), (brand), (comserv), (sector), (compete) and (cycle) variables are insignificant in Model 1, in which time-dependent effects are not taken into account. According to the Model 1 estimates, the entrepreneur's gender and age, and whether s/he is influenced by the family environment; initial firm size; whether the firm exports, advertises or owns a brand; whether the firm takes advantage of the common services offered by incubators; the sector in which the firm operates; the intensity of competition in the sector; and whether the firm experienced a macroeconomic crisis have no significant effect on firms' survival times. Our tests indicate that further estimation with time-dependent variables is necessary. We therefore produce further estimates and present the two most relevant models in Table 3.2.
Model 2 includes the variables in Model 1 together with all of the interaction terms created by multiplying each of these variables by a function of time, in order to capture variable-time interactions. Model 3 is obtained by removing from the model the interaction terms of the (lnempini), (innova), (enteduuni), (whenest), (export), (brand), (gender), (sector), (advert), (networking), (entexp), (income), (onlyloan), (partner), (family), (lnentage), (comserv), (compete) and (cycle) variables. Model 3 is the best model that takes time-dependent effects into account. The (incubsize) and (prorank) variables are found to be time-dependent.
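One common way to prepare data for such variable-time interaction terms is to split each firm's follow-up into episodes and attach the product of the variable and the time function to each episode. The sketch below assumes g(t) = ln(t) purely for illustration, since the function used in the study is not legible in this copy; the function name, field names and numbers are hypothetical.

```python
import math


def split_episodes(duration, event, x, event_times, g=math.log):
    """Split one firm's follow-up at the supplied event times.

    Returns one row per episode (start, stop] with the fixed covariate x
    and the interaction term x * g(stop). Only the last episode can carry
    the firm's own event indicator.
    """
    cuts = sorted(t for t in set(event_times) if t < duration) + [duration]
    rows = []
    start = 0
    for stop in cuts:
        rows.append({
            "start": start,
            "stop": stop,
            "event": event if stop == duration else 0,
            "x": x,
            "x_gt": x * g(stop),
        })
        start = stop
    return rows


# Hypothetical firm: closed at month 20, covariate value 1.0.
# event_times are the months at which any firm in the sample failed.
rows = split_episodes(duration=20, event=1, x=1.0, event_times=[5, 12, 20, 40])
for row in rows:
    print(row)
```

The resulting start-stop rows are the layout expected by routines that fit Cox models with time-varying covariates; the interaction column then receives its own coefficient alongside the original variable.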
Considering Models 2 and 3, we obtain various results regarding the variables. The risk of failure of firms whose owners depend only on the earnings of the newborn firm is about 6 times higher than that of other firms. Entrepreneurs with income from other sources are thus more likely to succeed in a start-up business. Interestingly, firms whose owners are university graduates have about twice the risk of failure of other firms. Yet there is a plausible explanation. Most incubator residents specialize in low-technology industries, which have a higher likelihood of failure. University graduates who realize that the new business has little prospect close the firm immediately and return to looking for a job related to their career. University graduates have a better chance of finding a well-paying job than non-graduates. By the same token, non-graduates seem to strive harder to keep the new business alive. An increase in the number of partners in the firm reduces the risk of failure to about 20% of its previous level. It is interesting to see that the failure risk of firms whose founding capital consists entirely of loans is only about 15% of that of other firms, whose initial capital is partially or fully self-financed. If an entrepreneur collaborates with stakeholders within and outside the incubator, the survival probability of the firm becomes approximately 5 times higher. Moreover, the estimates show that innovation activity increases a new firm's chance of survival approximately 12 times. Brand ownership also increases the firm's chance of survival. Establishing a firm in an incubation center that is within its first 3 years (36 months) increases the survival probability. Finally, firms that experience a macroeconomic crisis are nearly twice as likely to fail as others.
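Statements such as "about 6 times higher" or "about 15% of" are readings of hazard ratios, that is, of exponentiated coefficients. The helper below is a small sketch of that conversion; it uses the rounded figures quoted in the text as inputs, not the actual estimates from Table 3.2, and the function name is hypothetical.

```python
import math


def describe_hazard_ratio(hr):
    """Express a hazard ratio as a percentage change in hazard,
    together with the Cox coefficient ln(HR) that produces it."""
    pct = (hr - 1.0) * 100.0
    direction = "higher" if pct > 0 else "lower"
    return (
        f"HR={hr:.2f}: hazard {abs(pct):.0f}% {direction} "
        f"(coefficient {math.log(hr):+.3f})"
    )


# Rounded hazard ratios as quoted in the text (illustrative inputs only).
for hr in (6.0, 2.0, 0.20, 0.15):
    print(describe_hazard_ratio(hr))
```

A hazard ratio describes relative failure risk at any given moment; it is not the same quantity as a ratio of survival probabilities, which depends on the baseline hazard and on the time horizon considered.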
Conclusions
The Cox proportional hazards model rests, among other assumptions, on the proportional hazards assumption that the effects of the independent variables do not vary with time. When the PH assumption is violated, Cox regression estimates become biased. The estimates can then be corrected by including time-varying effects in the analysis. Identifying and estimating time-dependent effects also reveals time patterns that would otherwise remain unseen.
In our analysis, the Cox regression was initially performed under the assumption that the effects of all explanatory variables are constant over time. Then, extended Cox regression models were estimated by including time-dependent explanatory variables in the model. The results of our extended model show that it is useful to estimate the Cox regression with time-varying explanatory variables included in the analysis. Both the time-independent and the time-dependent variables have significant effects on the survival probability of the İŞGEM firms.
Overall, our estimates suggest that entrepreneurial experience acquired before starting a business at an İŞGEM, a higher number of partners in the firm, financing the firm's capital entirely by loans, collaboration with stakeholders within and outside the incubator, innovative activities in the firm, starting the new business within the first 36 months of an incubator (in a young incubator), a higher number of office spaces, establishing the firm in an economically larger province, and the intensity of competition in the sector have a positive impact on the survival probability of newborn firms within the incubator. Firms whose entrepreneurs depend on the young firm as their only source of income, hold a college diploma, own a brand, or experience a macroeconomic crisis are more likely to fail.
(*) This research paper is an extension of the findings of the scientific research project "The Factors Affecting Survival and Growth Performance of Newly Established Enterprises in Business Incubators: A Survey on the KOSGEB Business Development Centers (İŞGEM)", 109K139, which was funded by a grant from TÜBİTAK (The Scientific and Technological Research Council of Turkey). We also acknowledge the administrative support given to the project by the Turkish Small and Medium Enterprises Development Organisation (KOSGEB).
References
Aernoudt, R. (2004). Incubators: Tool for Entrepreneurship?. Small Business Economics, 23, 127-35.
Başar, E. (2006, May). Orantılı Olmayan Hazard Üzerine Bir Çalışma. Paper presented at 5. İstatistik Günleri Sempozyumu, Antalya, pp. 111-16.
Bellera, C.A., MacGrogan, G., Debled, M., De Lara, C.T., Brouste, V., & Mathoulin-Pélissier, S. (2010). Variables with time-varying effects and the Cox model: Some statistical concepts illustrated with a prognostic factor study in breast cancer. BMC Medical Research Methodology, 10:20.
Cox, D.R., & Oakes, D. (1984). Analysis of Survival Data. London: Chapman and Hall.
Demirgil, H. (2008). Firmaların Hayatta Kalma ve Büyüme Performanslarını Belirleyen Faktörler: Göller Bölgesi Üzerine Bir Araştırma. Ph.D. Thesis. Department of Economics, Süleyman Demirel University, Isparta.
Geiss, K., Meyer, M., Radespiel-Tröger, M., & Gefeller, O. (2009). SURVSOFT-Software for nonparametric survival analysis. Computer Methods and Programs in Biomedicine, Elsevier Ireland LTD., 96, 63-71.
Hackett, M., & Dilts, D.M. (2004). A Systematic Review of Business Incubation Research. Journal of Technology Transfer, 29, 55-82.
Karaöz, M., & Albeni, M. (2011). İş Kuluçkalarında Yeni Kurulan Girişimlerin Hayatta Kalma ve Büyüme Performansını Etkileyen Faktörler: KOSGEB İş Geliştirme Merkezleri (İŞGEM) Üzerine Bir Araştırma. The Scientific and Technological Research Council of Turkey (TÜBİTAK). (Issue Brief No. 109K139).
Kleinbaum, D.G., & Klein, M. (2005). Survival Analysis: A Self Learning Text (2nd Ed.). New York: Springer.
Scheike, T. H. (2004). Time-Varying Effects in Survival Analysis. In: N. Balakrishnan & C.R. Rao. (Ed.), Advances in Survival Analysis, 61-85. Amsterdam: Elsevier North-Holland.
Sertkaya, D., Ata, N., & Sözer, M. T. (2005). Yaşam çözümlemesinde zamana bağlı açıklayıcı değişkenli Cox regresyon modeli. Ankara Üniversitesi Tıp Fakültesi Mecmuası, 58, 153-58.
Tabatabai, M. A., Bursac, Z., Williams, D. K., & Singh, K. P. (2007). Hypertabastic survival model. Theoretical Biology and Medical Modelling, 4:40.
Yay, M., Çoker, E., & Uysal, O. (2007). Yaşam Analizinde Cox Regresyon Modeli ve Artıkların İncelenmesi. Cerrahpaşa Tıp Dergisi, 38, 139-45.
Aygul ANAVATAN
Akdeniz University, Faculty of Economics and Administrative Sciences,
Department of Econometrics, 07058, Antalya, Turkey
aygulanavatan@akdeniz.edu.tr
Murat KARAOZ
Akdeniz University, Faculty of Economics and Administrative Sciences,
Department of Econometrics, 07058, Antalya, Turkey
mkaraoz@akdeniz.edu.tr
Copyright International Burch University Fall 2013