Doctorate

Novel efficient statistical methods for biobank-scale prediction from brain imaging

Project Supervisors

Dr Baptiste Couvy-Duchesne

Senior Research Officer

Background

The field of neuroimaging is at a turning point, owing to the availability of several large datasets such as the UK Biobank, which comprises more than 50,000 volunteers from the general population with deep phenotyping, multimodal MRI and genotyping data. In comparison, clinical samples currently comprise a few thousand individuals at most, though larger samples should become available soon. Datasets of this size promise a finer understanding of how the brain is associated with disorders, as well as improved risk prediction, though they also raise computational and methodological challenges.


Aim

This project aims to build prediction algorithms that can handle a number of features that still far exceeds the number of participants. This requires efficient algorithms and models that can scale with the data (the UKB sample is expected to grow to 100,000 participants) and that can combine information from different samples. Prediction often relies on penalised regression or convolutional neural networks (CNNs), which become extremely costly to train or update on large samples, lack interpretability (black boxes), and often require pooling raw data from different studies. Summary statistics such as brain association maps are a condensed, meaningful and de-identified set of information that could facilitate the analysis of big data.


Project Potential

Summary statistics are a promising way to jointly analyse multiple datasets (federated learning) without having to share or pool the individual-level data, which often poses significant ethical and legal difficulties, particularly in the case of clinical data. In addition, methods based on summary statistics can be computationally efficient: unlike most analyses performed on individual-level data, they do not require large amounts of memory or long computations.
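
To make the idea of combining datasets through summary statistics concrete, here is a minimal sketch in Python. It is not the project's method. It assumes that each site shares only a brain association map, taken here to mean a per-feature regression slope and its standard error, and that the maps are combined with a standard fixed-effect, inverse-variance-weighted meta-analysis. The function names `association_map` and `combine_maps`, the simulated data and the sample sizes are all hypothetical.

```python
import numpy as np


def association_map(X, y):
    """Marginal linear regression of y on each column of X (one feature at a time).

    Returns the per-feature slope estimates and their standard errors.
    This is an illustrative definition of a "brain association map".
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)          # centre each feature
    yc = y - y.mean()                # centre the phenotype
    sxx = (Xc ** 2).sum(axis=0)      # per-feature sum of squares
    beta = Xc.T @ yc / sxx           # per-feature slope
    resid_ss = (yc ** 2).sum() - beta ** 2 * sxx
    se = np.sqrt(resid_ss / (n - 2) / sxx)
    return beta, se


def combine_maps(betas, ses):
    """Fixed-effect inverse-variance-weighted meta-analysis across sites.

    betas, ses: sequences of arrays, one array per site, all of equal length.
    """
    betas = np.asarray(betas)
    weights = 1.0 / np.asarray(ses) ** 2
    beta_meta = (weights * betas).sum(axis=0) / weights.sum(axis=0)
    se_meta = np.sqrt(1.0 / weights.sum(axis=0))
    return beta_meta, se_meta


# Simulated example: two sites, 1,000 imaging features, one associated feature.
rng = np.random.default_rng(0)
n_features = 1000
true_effect = np.zeros(n_features)
true_effect[0] = 0.5                 # hypothetical effect on the first feature

site_maps = []
for n_participants in (300, 500):    # hypothetical site sample sizes
    X = rng.standard_normal((n_participants, n_features))
    y = X @ true_effect + rng.standard_normal(n_participants)
    site_maps.append(association_map(X, y))   # only this leaves the site

beta_meta, se_meta = combine_maps(
    [beta for beta, _ in site_maps],
    [se for _, se in site_maps],
)
```

Each site passes on two vectors of length `n_features` rather than an `n_participants` by `n_features` matrix, which is where the memory saving and the de-identification come from in this simplified setting. Marginal maps ignore correlation between features, so a real prediction method would need to account for that.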