An extension of model selection curves framework to accelerated failure time models
Thesis posted on 2022-03-28, 09:20, authored by Md Jamil Hasan Karami
Most existing model selection criteria pair a penalty function with a constant penalty multiplier. A criterion based on a single value of the penalty multiplier, such as the Akaike information criterion (AIC) or the Bayesian information criterion (BIC), can be "unstable": a different model may be selected if the penalty multiplier changes by even a small arbitrary amount. This thesis extends a recently developed model selection approach for (generalised) linear models, known as model selection curves (MSC), to accelerated failure time (AFT) models for survival data. This approach considers the penalty multiplier over a predetermined range rather than at a single value. Model selection criteria based on it are therefore considered more stable, because the selected model is the one most likely to remain selected when the penalty multiplier changes. In addition to the two recently introduced criteria, the longest cathetus criterion and the longest hypotenuse criterion, this thesis proposes a new one, the triangle area criterion. Under some conditions, these three criteria are consistent in selecting a specified AFT model, as BIC is. Simulations suggest that this consistency result holds even when the sample size is only reasonably large. A model selection framework that includes these three MSC-based criteria, together with BIC and AIC, is proposed for AFT models of survival data. The framework was investigated through simulations using survival data of various sizes and censoring proportions generated from different specified models. The performance of the MSC-based criteria was also compared with that of AIC and BIC. The results indicate that the MSC-based criteria have the potential to outperform AIC and BIC in selecting the correct model. The framework has also been applied to several real-world survival data sets.
An R tool was developed to visualise the results of applying the framework.
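The dependence of the selected model on the penalty multiplier can be illustrated with a minimal sketch. It is not the thesis's R tool and it does not implement the longest cathetus, longest hypotenuse, or triangle area criteria, whose definitions are not given in this abstract. It assumes a penalised loss of the form -2 × log-likelihood + multiplier × number of parameters, which gives AIC at multiplier 2 and BIC at multiplier log(n). The candidate models, their log-likelihoods, and the function names are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical candidates: (name, maximised log-likelihood, number of parameters).
# These values are invented for illustration; they do not come from the thesis.
CANDIDATES = [
    ("M1", -520.0, 2),
    ("M2", -512.0, 4),
    ("M3", -510.5, 7),
]


def penalised_loss(loglik, n_params, multiplier):
    """Return -2 * log-likelihood + multiplier * number of parameters."""
    return -2.0 * loglik + multiplier * n_params


def select_model(candidates, multiplier):
    """Return the name of the candidate with the smallest penalised loss."""
    best = min(candidates, key=lambda m: penalised_loss(m[1], m[2], multiplier))
    return best[0]


def selection_path(candidates, multipliers):
    """Return (multiplier, selected model) pairs over a grid of multipliers."""
    return [(lam, select_model(candidates, lam)) for lam in multipliers]


if __name__ == "__main__":
    n = 200  # hypothetical sample size, used only for the BIC multiplier
    print("AIC choice:", select_model(CANDIDATES, 2.0))
    print("BIC choice:", select_model(CANDIDATES, math.log(n)))
    # Scan a predetermined range of multipliers instead of a single value.
    for lam, name in selection_path(CANDIDATES, [0.5, 2.0, 5.0, 10.0]):
        print(f"multiplier={lam:4.1f} -> {name}")
```

With these invented numbers the selection moves from the largest model to the smallest as the multiplier grows, which is the kind of sensitivity that motivates looking at a range of multipliers rather than one fixed value.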