This PhD opportunity is being offered as part of the LSTM and Lancaster University Doctoral Training Partnership.
| Abstract | Tuberculosis (TB) remains one of the world’s deadliest infectious diseases, responsible for over 10 million new cases and 1.3 million deaths each year. Despite effective diagnostics and treatment, a persistent challenge is the so-called “missing millions”: people with TB who remain undiagnosed and untreated. Closing this diagnostic gap is a global health priority (see our team's article: https://bmcglobalpublichealth.biomedcentral.com/articles/10.1186/s44263-024-00063-4). The Start4All project (www.lstmed.ac.uk/start4all), funded by Unitaid, is working to bring TB screening and diagnosis closer to the point of need in high-burden settings. In Phase 1, our international consortium recruited ~15,000 participants (adults and children) across seven diverse countries (Bangladesh, Brazil, Cameroon, Kenya, Malawi, Nigeria, and Viet Nam). Participants contributed an unprecedented multimodal dataset including:
This combination of social, clinical, and diagnostic data represents one of the most comprehensive TB screening datasets ever assembled. To date, it has enabled impactful publications and diagnostic modelling studies. However, important opportunities remain untapped, in particular harnessing these diverse data sources to build predictive models that accurately identify the individuals at greatest risk of prevalent TB. This PhD project will seize that opportunity. The student will work at the interface of epidemiology, biostatistics, and infectious disease modelling, supervised by Dr Tom Wingfield (Liverpool School of Tropical Medicine, LSTM) and Dr Jonathan Read (Lancaster University), who bring complementary expertise and a strong track record of collaboration at PhD and postdoctoral level. Together, they have successfully supervised and supported PhD students through to completion and career progression, underpinned by a robust supervisory and peer support structure across both institutions.
Aims and Approach
The overarching aim is to develop, validate, and evaluate a TB risk prediction tool that combines sociodemographic, clinical, and diagnostic screening data and could guide decision-making in real-world, resource-limited settings. The project will proceed in three integrated stages.
Stage 1: Development and Internal Validation
Using Start4All Phase 1 data, the student will develop one or more risk prediction models for prevalent TB. Candidate predictors will include social and behavioural variables (age, occupation, household TB exposure, substance use), clinical information (BMI, prior TB, diabetes), and diagnostic screening results (CRP, CAD scores, WHO four-symptom screen, sputum Xpert Ct values). State-of-the-art statistical and machine learning methods (e.g. penalised regression, random forests, gradient boosting), tuned and assessed with cross-validation, will be applied with careful attention to interpretability and clinical utility.
Internal validation will assess discrimination, calibration, and clinical decision-analytic performance, overall and across subgroups (e.g. adults vs children, HIV-positive vs HIV-negative).
Stage 2: External Validation and Extension
The best-performing tool(s) will be externally validated in Start4All Phase 2, which is underway in five countries. Phase 2 also incorporates large-scale near point-of-care (NPOC) mouth swab testing, offering the opportunity to extend and update the risk tool to integrate this novel diagnostic modality. Analyses will consider settings where symptom-based screening may be unreliable, examining the tool's applicability to both symptomatic and asymptomatic individuals.
Stage 3: Modelling Added Value and Impact
The student will lead modelling studies comparing the risk prediction tool with current decision rules, such as CAD thresholds alone, for determining the next diagnostic step (no test, pooled Xpert, or individual Xpert). These analyses will quantify incremental benefit in terms of sensitivity, specificity, and case detection yield. With support from our health economics team, the student will also explore costs and cost-effectiveness from provider and patient perspectives, generating evidence with immediate policy relevance.
Training Environment and Support
The student will be jointly based at LSTM and Lancaster, benefiting from:
LSTM’s global health expertise and network: direct links to Start4All study teams in Africa, Asia, and Latin America, and access to ongoing Phase 2 data collection.
Lancaster’s world-class methodological training: Dr Jonathan Read and colleagues in the CHICAS group (Centre for Health Informatics, Computing, and Statistics) offer deep expertise in statistical modelling, infectious disease dynamics, and applied machine learning.
Proven supervision track record: Dr Wingfield and Dr Read have successfully co-supervised PhD students through timely completion, publication in leading journals, and transition into postdoctoral and academic positions. Students benefit from structured progression monitoring, joint supervisory meetings, and integration into vibrant PhD cohorts at both institutions.
The student will also have access to the broader Start4All investigator group, including statisticians (Dr Marc Henrion), clinicians, and health economists, providing multidisciplinary input. Short-term placements with partner sites in high-burden countries will be encouraged, offering first-hand exposure to TB diagnostics in real-world settings.
Candidate Profile
This project is well suited to candidates with backgrounds in epidemiology, biostatistics, data science, public health, or related fields. Essential skills include statistical programming (e.g. R, Python, or Stata) and enthusiasm for working with large, complex datasets. Previous experience in infectious diseases, global health, or machine learning is desirable but not essential, as tailored training will be provided.
Impact
This PhD will generate a practical, evidence-based TB risk prediction tool with the potential to directly influence diagnostic algorithms in high-burden settings. By integrating social, clinical, and screening data into one accessible model, the project will help move from “one-size-fits-all” approaches to more efficient, targeted case finding. In doing so, it directly supports the WHO End TB Strategy goal of finding the “missing millions” and reducing TB-related morbidity and mortality. The student will also gain advanced skills in predictive modelling, validation, and impact evaluation, preparing them for leadership careers in global health research, biostatistics, or policy. |
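Stage 3's comparison of decision rules can be sketched in a few lines. The Python below is a minimal illustration under stated assumptions, not the project's method: it assumes a hypothetical three-tier rule (scores below a lower cut-off get no test, scores between the cut-offs go to pooled Xpert, scores at or above the upper cut-off go to individual Xpert), and the ten participants, their scores, and the cut-offs are all invented. The same rule is applied once to CAD scores and once to model-predicted risks, reporting sensitivity and specificity for being sent for any Xpert test plus the number of people on each pathway.

```python
"""Toy comparison of two triage rules (CAD score vs predicted TB risk).

All data and cut-offs below are invented for illustration only.
"""
from collections import Counter
from dataclasses import dataclass


@dataclass
class RuleSummary:
    sensitivity: float       # TB cases sent for any Xpert / all TB cases
    specificity: float       # people without TB given no test / all people without TB
    pathway_counts: Counter  # number of people allocated to each next step


def next_step(score, low, high):
    """Hypothetical three-tier rule mapping a score to the next diagnostic step."""
    if score < low:
        return "no test"
    if score < high:
        return "pooled Xpert"
    return "individual Xpert"


def summarise_rule(scores, has_tb, low, high):
    """Apply the rule to everyone and summarise accuracy and workload."""
    steps = [next_step(score, low, high) for score in scores]
    tested = [step != "no test" for step in steps]
    pairs = list(zip(tested, has_tb))
    tp = sum(1 for t, tb in pairs if t and tb)
    fn = sum(1 for t, tb in pairs if not t and tb)
    tn = sum(1 for t, tb in pairs if not t and not tb)
    fp = sum(1 for t, tb in pairs if t and not tb)
    return RuleSummary(tp / (tp + fn), tn / (tn + fp), Counter(steps))


# Ten invented participants: 1 = TB confirmed by the reference standard, 0 = no TB.
has_tb = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
cad_scores = [70, 65, 20, 40, 55, 10, 30, 80, 62, 15]  # CAD abnormality score, 0-100
risk_scores = [0.30, 0.12, 0.02, 0.22, 0.05, 0.01, 0.04, 0.40, 0.06, 0.03]  # predicted risk

cad_rule = summarise_rule(cad_scores, has_tb, low=50, high=75)
risk_rule = summarise_rule(risk_scores, has_tb, low=0.10, high=0.25)

for name, summary in [("CAD threshold", cad_rule), ("Risk score", risk_rule)]:
    print(f"{name}: sensitivity={summary.sensitivity:.2f}, "
          f"specificity={summary.specificity:.2f}, pathways={dict(summary.pathway_counts)}")
```

With these invented numbers, the risk-score rule sends all three cases for testing with one unnecessary referral, while the CAD rule misses one case and refers three people without TB. A real analysis would use out-of-sample predictions and model the logistics of pooled testing explicitly.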
| Where does this project lie in the translational pathway? | T2 - Human/Clinical Research, T3 - Evidence into Practice, T4 - Practice to Policy/Population |
| Methodological Aspects | This PhD will employ a suite of advanced quantitative and interdisciplinary methods to develop, validate, and evaluate a TB risk prediction tool using the Start4All dataset, one of the most comprehensive TB screening datasets ever assembled. The methodological elements can be grouped into three domains (predictive modelling, validation and performance evaluation, and impact modelling), underpinned by an interdisciplinary and reproducible approach.
1. Predictive Modelling
The first phase of the project will focus on developing the risk tool using Start4All Phase 1 data (~15,000 participants). Predictors will include sociodemographic (e.g. age, occupation, education, rural/urban residence, household TB exposure), behavioural (smoking, alcohol, drugs), clinical (BMI, diabetes, prior TB), and diagnostic screening variables (CAD chest X-ray scores, CRP, symptom screen, urine LAM).
Variable selection and preprocessing: Standard approaches (e.g. multiple imputation for missing data, normalisation of continuous variables, categorisation where needed for clinical interpretability) will be applied. Correlated variables will be handled using penalised regression or dimensionality reduction techniques where appropriate.
Model development: A comparative framework will test classical regression (multivariable logistic regression with penalisation methods such as LASSO or elastic net for variable selection), potentially alongside modern machine learning approaches (e.g. random forests, gradient boosting, support vector machines), depending on discussions and student preference. This dual approach balances interpretability (essential for clinical uptake) against predictive accuracy.
Model optimisation: Internal validation with cross-validation and bootstrapping will guard against overfitting. Feature importance and partial dependence plots will be used to interpret complex models and ensure clinical plausibility.
2. Validation and Performance Evaluation
The next step will be external validation using Start4All Phase 2 data, collected prospectively across five countries and incorporating additional diagnostics such as near point-of-care mouth swabs.
Discrimination and calibration: Standard measures will include ROC curves, AUC, sensitivity, specificity, positive/negative predictive values, and calibration plots. Decision curve analysis will assess net clinical benefit across risk thresholds. Statistician Dr Marc Henrion, who has worked on Start4All since its inception, will support this process, drawing on lessons already learned. His statistical analysis plan (SAP) for Start4All Phase 1 is available at https://github.com/gitMarcH/Start-4-All.
Subgroup analysis: Validation will explicitly test model performance across key subgroups (adults vs children, HIV-positive vs HIV-negative, symptomatic vs asymptomatic) to explore generalisability and to highlight where recalibration or subgroup-specific models may be needed.
Model updating and extension: Depending on Phase 1 performance, the tool may be extended to incorporate new predictors (e.g. mouth swab results) using model updating techniques such as recalibration or transfer learning.
3. Modelling Impact and Added Value
Beyond predictive performance, the student will lead quantitative modelling of how the risk tool could be deployed in practice.
Comparative decision analysis: The risk score will be evaluated against existing approaches, such as CAD thresholds alone or symptom-based screening, in terms of yield and efficiency in selecting individuals for further diagnostic testing (pooled or individual Xpert).
Incremental value analysis: Net reclassification improvement and integrated discrimination improvement will quantify gains from incorporating new predictors.
Cost-effectiveness modelling: In collaboration with the Start4All health economics team, the student will use decision-analytic models (e.g. decision trees or Markov models) to estimate provider and patient costs, cost per case detected, and potential system-level savings.
4. Interdisciplinary and Statistical Robustness
The quantitative work will be embedded in an interdisciplinary supervisory structure. At Lancaster University, Dr Jonathan Read and colleagues will provide expertise in advanced statistical modelling, computational methods, and infectious disease dynamics. At LSTM, Dr Tom Wingfield and the Start4All consortium will provide clinical, epidemiological, and global health context, ensuring that the models address real-world needs and that feasibility and scalability are considered using implementation science approaches. Statistical reproducibility will be emphasised, with fully documented code (R/Python) and transparent reporting following the TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis) guidelines.
In summary, this PhD will offer outstanding quantitative training. The student will master statistical modelling, machine learning, internal and external validation techniques, and decision-analytic and cost-effectiveness modelling. They will gain experience in managing large, multi-country datasets and applying robust methods to translate quantitative findings into practical diagnostic tools. These methodological elements not only align with the MRC’s quantitative and interdisciplinary skills priorities but also provide a clear pathway to translational impact in global health. |
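The discrimination and calibration measures in the Methodological Aspects have simple definitions. The plain-Python sketch below, on invented data, computes the c-statistic (equal to the area under the ROC curve) by comparing every case with every non-case, and calibration-in-the-large as the ratio of observed to expected cases. It is an illustration of the definitions only, not the project's analysis code.

```python
"""Toy discrimination and calibration summaries for a TB risk model (invented data)."""
from itertools import product


def c_statistic(risks, outcomes):
    """Share of (case, non-case) pairs in which the case has the higher predicted risk.

    Ties count as half-concordant. This equals the area under the ROC curve.
    The pairwise loop is O(cases x non-cases): fine for illustration, slow for large data.
    """
    cases = [r for r, y in zip(risks, outcomes) if y == 1]
    non_cases = [r for r, y in zip(risks, outcomes) if y == 0]
    score = 0.0
    for case_risk, non_case_risk in product(cases, non_cases):
        if case_risk > non_case_risk:
            score += 1.0
        elif case_risk == non_case_risk:
            score += 0.5
    return score / (len(cases) * len(non_cases))


def observed_expected_ratio(risks, outcomes):
    """Calibration-in-the-large: observed cases divided by the sum of predicted risks.

    A ratio near 1 means the model predicts the right number of cases on average;
    above 1 means it under-predicts, below 1 means it over-predicts.
    """
    return sum(outcomes) / sum(risks)


outcomes = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
risks = [0.30, 0.25, 0.02, 0.20, 0.05, 0.01, 0.04, 0.40, 0.06, 0.03]

print(f"c-statistic: {c_statistic(risks, outcomes):.3f}")
print(f"O/E ratio:   {observed_expected_ratio(risks, outcomes):.2f}")
```

Here one non-case (predicted risk 0.25) outranks one case (0.20), so 20 of the 21 case/non-case pairs are concordant (c-statistic about 0.95), while the O/E ratio of about 2.2 shows that these invented risks under-predict the three observed cases.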
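Decision curve analysis, mentioned under discrimination and calibration, rests on the standard net-benefit formula: net benefit = TP/n - (FP/n) × pt/(1 - pt), where pt is the risk threshold above which someone would be referred for testing. The Python sketch below applies that formula to invented data and compares the model with the default strategies of testing everyone and testing no one (whose net benefit is 0). The thresholds and data are assumptions chosen for illustration.

```python
"""Toy decision curve analysis: net benefit of referring people above a risk threshold."""


def net_benefit(risks, outcomes, threshold):
    """Net benefit of referring everyone whose predicted risk is >= threshold.

    False positives are weighted by the odds at the threshold, pt / (1 - pt),
    which reflects the harm of an unnecessary referral relative to finding a case.
    """
    n = len(outcomes)
    referred = [(r >= threshold, y) for r, y in zip(risks, outcomes)]
    tp = sum(1 for ref, y in referred if ref and y == 1)
    fp = sum(1 for ref, y in referred if ref and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_test_all(outcomes, threshold):
    """Net benefit of referring everyone, regardless of predicted risk."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


outcomes = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
risks = [0.30, 0.25, 0.02, 0.20, 0.05, 0.01, 0.04, 0.40, 0.06, 0.03]

for pt in (0.05, 0.10, 0.20):
    print(f"threshold {pt:.2f}: model {net_benefit(risks, outcomes, pt):.3f}, "
          f"test all {net_benefit_test_all(outcomes, pt):.3f}, test none 0.000")
```

With these invented numbers the model has a higher net benefit than testing everyone at each threshold shown. Plotting net benefit over a range of clinically plausible thresholds gives the decision curve.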
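Net reclassification improvement, one of the incremental-value measures in the Methodological Aspects, can be computed from the risk categories assigned before and after a new predictor is added. The plain-Python sketch below implements the categorical NRI. The three risk categories, the idea that the added predictor is something like a mouth swab result, and the data are all hypothetical.

```python
"""Toy categorical net reclassification improvement (NRI)."""


def categorical_nri(old_categories, new_categories, outcomes):
    """Return (total NRI, NRI among cases, NRI among non-cases).

    Categories are ordered integers, e.g. 0 = low, 1 = intermediate, 2 = high risk.
    Under a better model, cases should move up and non-cases should move down.
    """
    up_cases = down_cases = up_non_cases = down_non_cases = 0
    for old, new, y in zip(old_categories, new_categories, outcomes):
        moved_up, moved_down = new > old, new < old
        if y == 1:
            up_cases += moved_up  # bools add as 0/1
            down_cases += moved_down
        else:
            up_non_cases += moved_up
            down_non_cases += moved_down
    n_cases = sum(outcomes)
    n_non_cases = len(outcomes) - n_cases
    nri_cases = (up_cases - down_cases) / n_cases
    nri_non_cases = (down_non_cases - up_non_cases) / n_non_cases
    return nri_cases + nri_non_cases, nri_cases, nri_non_cases


outcomes = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
before = [1, 1, 0, 0, 1, 0, 0, 2, 1, 0]  # categories from the original model
after = [2, 0, 0, 1, 1, 0, 0, 2, 2, 0]   # categories after adding a hypothetical predictor

nri_total, nri_cases, nri_non_cases = categorical_nri(before, after, outcomes)
print(f"NRI total {nri_total:.2f} (cases {nri_cases:.2f}, non-cases {nri_non_cases:.2f})")
```

Here two of the three cases move up a category and the non-cases' movements cancel out (one up, one down), giving an NRI of 2/3. The categorical NRI depends on the chosen category cut-offs, so these would need to be prespecified.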
| Expected Outputs | This PhD is designed to deliver high-impact scientific, translational, and career outputs.
Publications
The primary academic outputs will be a series of high-quality, peer-reviewed publications in leading journals. We anticipate at least three core papers:
1. Development and internal validation of a TB risk prediction tool using multi-country Start4All Phase 1 data.
2. External validation and extension of the tool in Start4All Phase 2 data, including novel near point-of-care mouth swab testing.
3. Modelling of the incremental diagnostic and cost-effectiveness benefits of risk-based screening compared with current approaches.
These publications will build on our track record of successful outputs from Start4All, which has already generated impactful papers (including forthcoming publications in The Lancet Respiratory Medicine). Given the quality and novelty of the work, we expect it to be published in high-impact infectious disease and global health journals. The candidate will also present their work orally at international conferences, as our team is doing this year, having been selected to present our Phase 1 findings at the prestigious "US-CDC/Union Late Breaker Session" at the International Union Against TB and Lung Disease Annual Meeting in Copenhagen, Denmark.
Policy and Global Health Impact
A distinctive strength of this project is its direct connection to policy. The supervisory team and consortium have active links with the World Health Organization (WHO), including guideline development groups on TB diagnosis, screening, and paediatric TB. Our Start4All pooled sputum data have already contributed to WHO recommendations, demonstrating tangible impact. The outputs from this PhD are expected to inform upcoming WHO guidelines, handbooks, and operational manuals, particularly on targeted TB screening strategies.
Networks and Collaborations
The student will benefit from integration into world-class TB research networks:
These networks will provide not only collaborative opportunities but also visibility and platforms for dissemination at global TB conferences and policy fora.
Funding and Career Development
The PhD will lay the groundwork for postdoctoral fellowship applications and independent funding to implement and scale up the risk tool. Likely pathways include:
The clear translational pathway, from data-driven model development to WHO-relevant tools, will be attractive to funders prioritising implementation science and global health impact.
Science Communication and Public Engagement
A further distinctive output will be engagement with diverse audiences. Dr Wingfield has an established record in science communication, including an Award Lecture at the British Science Festival in 2025, the RCP Linacre Lecture, The Lancet Young Investigator Award Lecture, and contributions to The Conversation and national broadsheets. The student will be supported to contribute to similar public-facing outputs, ensuring their research reaches patients, communities, policymakers, and the public, and enhancing its visibility, reach, and real-world relevance.
In sum, the expected outputs of this PhD include:
At least three peer-reviewed publications in high-impact journals.
This combination of academic excellence, policy relevance, collaborative strength, and public engagement ensures the PhD will have enduring scientific and societal impact. |
| Training Opportunities | The student will join a uniquely supportive and interdisciplinary environment that combines world-class methodological training with direct links to clinical and policy impact.
Quantitative and Methodological Training
Advanced statistical and machine learning methods (multivariable regression, penalised regression, random forests, gradient boosting, validation techniques).
Predictive modelling best practice following TRIPOD guidelines, with emphasis on model calibration, discrimination, and clinical utility.
Large-scale data management and cleaning, using real-world multi-country datasets of >15,000 individuals.
Application of health impact and cost-effectiveness modelling in collaboration with the Start4All health economics team.
Training in reproducible research practices, coding (R, Python, Stata), version control, and transparent reporting.
Interdisciplinary and Translational Training
Immersion in a multidisciplinary supervisory team: Dr Tom Wingfield (LSTM), with expertise in TB medicine, epidemiology, and global health, and Dr Jonathan Read (Lancaster), with expertise in statistical and infectious disease modelling.
Access to additional mentors and collaborators, including statisticians (Dr Marc Henrion), health economists, diagnostic scientists, and international site investigators.
Opportunities to contribute directly to WHO guideline development and operational handbooks, offering first-hand exposure to the translation of research into policy.
Engagement with TB research networks (UKAPTB, SSHIFTB, LIV-TB, and the LSTM Centre for TB Research), providing opportunities for collaboration, peer learning, and visibility.
Professional and Career Development
Access to structured doctoral training programmes at Lancaster and LSTM, including courses in epidemiology, advanced biostatistics, health economics, and scientific writing.
Presentation of research at leading international conferences (e.g. the Union World Conference on Lung Health), with training in oral and poster presentation skills.
Opportunities to contribute to high-impact publications in leading journals, ensuring strong academic outputs for career progression.
Support and mentoring to apply for postdoctoral fellowships and independent research funding (Wellcome Discovery, MRC, NIHR), building a clear career pathway.
Science Communication and Public Engagement
Tailored support from Dr Wingfield, who has a strong track record of public engagement (British Science Festival, The Conversation, broadsheet features).
Opportunities to develop communication skills and engage wider audiences, ensuring the student can translate complex quantitative science into accessible messages for patients, communities, and the public.
Global Health Experience
Potential for short placements with partner sites in Africa, Asia, or Latin America, offering direct experience of TB diagnostics in real-world, resource-limited settings.
Training in the ethical, logistical, and cultural aspects of conducting collaborative international research. |
| Skills Required | The student should have a strong background in a relevant discipline such as epidemiology, biostatistics, data science, public health, or a related quantitative field. Essential skills include:
Competence in statistical programming (e.g. R, Python, or Stata).
Experience working with quantitative data, ideally health-related.
Strong analytical and problem-solving skills, with an aptitude for handling large and complex datasets.
Desirable experience includes familiarity with infectious diseases, global health, or machine learning methods, though tailored training will be provided (see Training Opportunities). Equally important are aptitudes: enthusiasm for interdisciplinary collaboration, willingness to engage with international partners, and motivation to translate quantitative findings into practical tools with real-world impact. Strong communication skills and openness to science communication and public engagement (as highlighted above) will be an asset. |
| Subject Areas | Lung Health and Tuberculosis, Health Economics, Epidemiology |
| Key Publications associated with this project |
https://bmcglobalpublichealth.biomedcentral.com/articles/10.1186/s44263-024-00063-4
https://pubmed.ncbi.nlm.nih.gov/39100507/
https://pubmed.ncbi.nlm.nih.gov/37217868/ |