My current research focuses on the mechanistic interpretability of machine learning models, specifically using sparse autoencoders. This work combines my interests in computational modelling and complex systems, now applied to understanding the inner workings of AI.
I was previously a postdoc at the Cancer Institute at University College London. My research there combined biology and computer science, using computational modelling to predict cancer evolution and to plan treatment programmes that avert or overcome resistance.
I completed my PhD at the University of Cambridge, looking at how computational network models could be used to find more effective combination treatments for breast cancer. As a postdoc at the Fisher Lab in the UCL Cancer Institute, I built upon this work to predict resistance mechanisms to radiotherapy and to find the most effective patient-specific treatments to overcome them.
I am keen to share my knowledge and expertise with others. As a mentor to postdocs, PhD students, Masters students, and undergraduates, I take pride in helping to guide and inspire the next generation of scientists. If you have any questions about my work, please get in touch using the contact form below.
I am continuing my work on triple-negative breast cancer, non-small cell lung cancer and the automation of network generation using message-passing graph neural networks in the Jasmin Fisher lab.
Responsibilities included:
As part of the Jasmin Fisher lab, I continued my work on DNA damage repair, applying it to breast cancer and lung cancer. I also developed new methods for studying cancer evolution. During the pandemic, my colleagues and I demonstrated how our work on cancer could be applied to infectious disease, predicting drug combinations for repurposing as a rapid response to the COVID-19 pandemic.
Responsibilities included:
I supported Promatix Ltd in accelerating the hunt for oncology therapeutics as part of its collaboration with the Jasmin Fisher lab.
Responsibilities included:
With funding from CRUK RadNet, I worked in the Jasmin Fisher lab on modelling the DNA damage response pathway to find radiosensitising drug treatments. I also continued my work on breast cancer, focusing on the triple-negative subtype (TNBC) as part of a collaboration with the Mark Foundation for Cancer Research and the PARTNER clinical trial. With Jasmin Fisher, I published a review on the opportunities and challenges for executable modelling.
Responsibilities included:
Thanks to generous funding from the Glover Fund, I was able to build upon my work on the evolution of cancer, extending it to understand how the behaviour of blood cancers is determined by the order in which they acquire mutations, even when their final mutational profiles are identical. This work was part of a collaboration between the Jasmin Fisher lab and the Hall and Kent labs.
Responsibilities included:
As part of the Wellcome Trust Mathematical Genomics and Medicine PhD programme, I studied as part of the University of Cambridge Computational Biology MPhil while also undertaking rotations in different labs. For my PhD I was supervised by Jasmin Fisher in the University of Cambridge Department of Biochemistry and Microsoft Research Cambridge, and my advisor was Trevor Littlewood. During my PhD, I developed a network model of breast cancer, as well as novel methods to study combination treatment and the evolution of cancers. Using these, in collaboration with the Gerard Evan lab, we identified and validated a novel combination treatment for breast cancer that exploits the heterogeneity observed in Myc-driven breast cancers. My PhD examiners were Bertie Göttgens and Francesca Buffa.
Responsibilities included:
The disease burden from non-small cell lung cancer (NSCLC) adenocarcinoma is substantial, with around a million new cases diagnosed globally each year and a 5-year survival rate of less than 20%. The lack of therapeutic options personalized to individual patient genetics, together with the rapid emergence of resistance to existing targeted therapies, leads to high variation in survival. Patient stratification combined with greater personalisation of therapies has the potential to improve outcomes; however, the wide variation in mutations found in NSCLC adenocarcinoma patients means that experimentally determining suitable treatment combinations is time-consuming and expensive. Here we present an in silico model encompassing key tumour-intrinsic oncogenic signalling pathways, including EGFR, AKT, JAK/STAT and WNT, for efficiently predicting rational drug-drug and drug-radiotherapy combination therapies in NSCLC. Using this model, we simulate diverse genetic profiles and test over 10,000 therapeutic strategies to identify optimal strategies to overcome resistance mechanisms specific to genetic profiles and p53 status. Our in silico model reproduces drug additivity experiments, predicts radio-sensitising genes validated in a CRISPR screen, and identifies 53BP1 as a potential drug target that improves the therapeutic window during radiotherapy, as well as the potential of ATM inhibitors to overcome radiotherapy resistance driven by p53 loss of function. We further use the in silico model to identify a 19-gene signature to stratify patients most likely to benefit from radiotherapy and validate it using TCGA data. These results further demonstrate the utility of in silico mechanistic modelling and present a bespoke computational resource for large-scale screening of personalised therapies applied to NSCLC.
Sparse AutoEncoder (SAE) latents show promise as a method for extracting interpretable features from large language models (LMs), but their overall utility for mechanistic understanding of LMs remains unclear. Ideal features would be linear and independent, but we show that there exist SAE latents in GPT2-Small and Gemma-2-2b that display non-independent behaviour, especially in small SAEs. Rather, latents co-occur in clusters that map out interpretable subspaces, leading us to ask how independent these latents are, and how this depends on the SAE. We find that these subspaces show latents acting compositionally, as well as being used to resolve ambiguity in language, though SAE latents remain largely independently interpretable within these contexts despite this behaviour. Latent clusters decrease in both size and prevalence as SAE width increases, suggesting this is a phenomenon of small SAEs with coarse-grained latents. Our findings suggest that a better understanding of how LMs process information can, in some cases, be achieved by treating SAE latents as able to form functional units that are interpreted as a whole.
The COVID-19 pandemic has pushed healthcare systems globally to a breaking point. The urgent need for effective and affordable COVID-19 treatments calls for repurposing combinations of approved drugs. The challenge is to identify which combinations are likely to be most effective and at what stages of the disease. Here, we present the first disease-stage executable signalling network model of SARS-CoV-2-host interactions used to predict effective repurposed drug combinations for treating early- and late-stage severe disease. Using our executable model, we performed in silico screening of 9870 pairs of 140 potential targets and identified nine new drug combinations. Camostat and Apilimod were predicted to be the most promising combination for suppressing viral replication in the early stages of severe disease, and this was validated experimentally in human Caco-2 cells. Our study further demonstrates the power of executable mechanistic modelling to enable rapid pre-clinical evaluation of combination therapies tailored to disease progression. It also presents a novel resource and expandable model system that can respond to further needs in the pandemic.
Making decisions on how best to treat cancer patients requires the integration of different data sets, including genomic profiles, tumour histopathology, radiological images, proteomic analysis and more. This wealth of biological information calls for novel strategies to integrate such information in a meaningful, predictive and experimentally verifiable way. In this Perspective we explain how executable computational models meet this need. Such models provide a means for comprehensive data integration, can be experimentally validated, are readily interpreted both biologically and clinically, and have the potential to predict effective therapies for different cancer types and subtypes. We explain what executable models are and how they can be used to represent the dynamic biological behaviours inherent in cancer, and demonstrate how such models, when coupled with automated reasoning, facilitate our understanding of the mechanisms by which oncogenic signalling pathways regulate tumours. We explore how executable models have impacted the field of cancer research and argue that extending them to represent a tumour in a specific patient (that is, an avatar) will pave the way for improved personalized treatments and precision medicine. Finally, we highlight some of the ongoing challenges in developing executable models and stress that effective cross-disciplinary efforts are key to forward progress in the field.
Cells with higher levels of Myc proliferate more rapidly and supercompetitively eliminate neighboring cells. Nonetheless, tumor cells in aggressive breast cancers typically exhibit significant and stable heterogeneity in their Myc levels, which correlates with refractoriness to therapy and poor prognosis. This suggests that Myc heterogeneity confers some selective advantage on breast tumor growth and progression. To investigate this, we created a traceable MMTV-Wnt1–driven in vivo chimeric mammary tumor model comprising an admixture of low-Myc– and reversibly switchable high-Myc–expressing clones. We show that such tumors exhibit interclonal mutualism wherein cells with high-Myc expression facilitate tumor growth by promoting protumorigenic stroma yet concomitantly suppress Wnt expression, which renders them dependent for survival on paracrine Wnt provided by low-Myc–expressing clones. To identify any therapeutic vulnerabilities arising from such interdependency, we modeled Myc/Ras/p53/Wnt signaling cross talk as an executable network for low-Myc clones, for high-Myc clones, and for the two together. This executable mechanistic model replicated the observed interdependence of high-Myc and low-Myc clones and predicted a pharmacological vulnerability to coinhibition of COX2 and MEK, which was confirmed experimentally. Our study illustrates the power of executable models in elucidating mechanisms driving tumor heterogeneity and offers an innovative strategy for identifying combination therapies tailored to the oligoclonal landscape of heterogeneous tumors.