Bayesian model averaging approach to development of prognostic models

Regression models are widely used in medical research to identify important risk factors for morbidity and mortality. When a regression analysis involves a large number of candidate variables, especially when the sample size is small or the event rate is low, deciding which variables to include in the predictive model is problematic. The most widely used approaches to variable selection are stepwise procedures, which may be implemented automatically or manually. The automatic technique sequentially adds and deletes variables, guided by approximate asymptotic likelihood ratio tests, and leads to the construction of a single "optimal" model. Inferences about the associations of the retained variables, as risk factors for an outcome, are then made as if the selected model were the ideal choice.

With stepwise selection procedures, variables without statistically significant associations with the outcome are excluded. The statistical penalty of this inclusion/exclusion approach is that the uncertainty of the final model is underestimated, because uncertainty about the model form itself is ignored. The conventional approach is mainly concerned with imprecision in the estimated strengths of association between the prognostic variables and events; little or no attention is given to the imprecision arising from the variable selection procedure, since the final single model is implicitly assumed to be "optimal".
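As an illustration of how model averaging spreads belief over several candidate models instead of committing to one, the sketch below converts BIC values into approximate posterior model probabilities and then into posterior inclusion probabilities for each variable. It assumes equal prior probabilities for all models and uses the common BIC approximation to the marginal likelihood; the variable names and BIC values are hypothetical and are not results from this project.

```python
import math


def bma_weights(bic_by_model):
    """Approximate posterior model probabilities from BIC values.

    Assumes equal prior model probabilities, so each weight is
    proportional to exp(-BIC / 2).
    """
    best = min(bic_by_model.values())
    # Subtract the smallest BIC before exponentiating for numerical stability.
    raw = {m: math.exp(-0.5 * (b - best)) for m, b in bic_by_model.items()}
    total = sum(raw.values())
    return {m: w / total for m, w in raw.items()}


def inclusion_probabilities(weights):
    """Sum the weights of every model that contains each variable."""
    probs = {}
    for model, w in weights.items():
        for var in model:
            probs[var] = probs.get(var, 0.0) + w
    return probs


# Hypothetical BIC values for four candidate models (illustrative only).
candidate_bic = {
    ("age",): 212.4,
    ("age", "smoking"): 210.1,
    ("age", "smoking", "bmi"): 211.0,
    ("smoking",): 218.9,
}

weights = bma_weights(candidate_bic)
inclusion = inclusion_probabilities(weights)

for model, w in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"{' + '.join(model):<22} weight = {w:.3f}")
for var, p in sorted(inclusion.items()):
    print(f"P({var} in model) = {p:.3f}")
```

In this toy setting the lowest-BIC model receives the largest weight but not all of it, which is the model uncertainty that a single stepwise-selected model leaves out.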

Where does the project lie on the Translational Pathway?

T2 – Human/Clinical Research & T3 – Evidence into Practice

Expected Outputs

Academic publications that may influence current practice in the design and analysis of clinical research.

Training Opportunities

The advisory board will provide training in the following subjects: the Bayesian model averaging approach, development of prognostic models, and Monte Carlo simulation. The student may also attend short courses if necessary.

Skills Required

The ideal candidate should have: (1) an MSc in medical statistics or a related subject; (2) some experience in the design and analysis of clinical trials and complex datasets; (3) a good command of at least one statistical package such as Stata, R or SAS; (4) some experience of writing academic reports; (5) excellent communication skills; and (6) knowledge of clinical trials methodology, medical statistics and epidemiology.

Key Publications associated with this project

Madigan D, Raftery AE. Model Selection and Accounting for Model Uncertainty in Graphical Models Using Occam's Window. Journal of the American Statistical Association 1994; 89:1535-1546.

Kass RE, Raftery AE. Bayes factors. Journal of the American Statistical Association 1995; 90:773–79.

Wang D, Zhang W, Bakhai A. Comparison of Bayesian Model Averaging and Stepwise Methods for Model Selection in Logistic Regression. Statistics in Medicine 2004; 23:3451-3467.

Wang D, Bakhai A. Clinical Trials: A Practical Guide to Design, Analysis and Reporting. Remedica Publishing, London and Chicago, 2006, 496pp.

Wang D, Lertsithichai P, Nanchahal K, Yousufuddin M. Risk factors of coronary heart disease: A Bayesian model averaging approach. Journal of Applied Statistics 2003; 10(7):813-826.

LSTM Themes and Topics – Key Words

Maternal and newborn health