Bioinformatics Service
I established and lead the phenotypic screening facility’s bioinformatics service, providing bespoke computational support to academic and industry users. This chargeable service supports a wide range of life science disciplines, including cancer biology, immunology, neuroscience, developmental biology, and skin research.
I specialise in deriving insights from complex imaging and omics datasets and in developing scalable, reproducible analysis pipelines to advance cutting-edge research.
I also promote best practice and version control through the facility GitHub.
Machine Learning (ML) and Artificial Intelligence (AI)
Techniques:
- Supervised learning: Logistic regression, Lasso regularisation, Elastic Net, Support Vector Machines (SVM), Random Forest, Multiple Discriminant Analysis, Neural Networks
- Unsupervised learning: Hierarchical clustering, k-means clustering, Principal Component Analysis (PCA), Exploratory Factor Analysis (EFA)
- SAMP-Score: Ensemble machine learning model for senescence detection to support drug discovery in cancer - Project Link
- SenPred: Senescence classification model for single-cell RNA sequencing transcriptomic data developed from 3D in vitro cell culture models - Project Link
- Prognostic multiplexed immunofluorescence model utilising spatial assessment in oral epithelial dysplasia (manuscript in preparation)
High-Content Image Analysis
Techniques:
- Analysis: z-scores, cluster analysis, IC50s, batch normalisation
- Visualisations: Heatmaps, frequency distributions, dimensionality reduction (UMAP/t-SNE)
- Unsupervised characterisation of senescence via phenotypic assessment of morphology - Project Link
- Phenocopying methodology comparing a genome-wide siRNA screen to a novel compound for target identification (in evaluation phase with commercial partner) - Press Release
- Developed reusable pipelines and HTML guides for users to standardise imaging data analysis - GitHub
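As an illustration of the z-score and batch normalisation techniques listed above, here is a minimal Python sketch. The plate names, well values, and the choice to normalise each plate against its own negative-control wells are assumptions made for the example; it is not the facility's pipeline.

```python
from statistics import mean, stdev


def plate_z_scores(sample_values, control_values):
    """Z-score each sample well against the plate's negative-control wells."""
    mu = mean(control_values)
    sigma = stdev(control_values)  # sample standard deviation of the controls
    if sigma == 0:
        raise ValueError("control wells have zero variance")
    return [(value - mu) / sigma for value in sample_values]


# Hypothetical readouts: plate_B is systematically brighter than plate_A.
plates = {
    "plate_A": {"controls": [10.0, 12.0, 14.0], "samples": [12.0, 16.0]},
    "plate_B": {"controls": [20.0, 24.0, 28.0], "samples": [24.0, 32.0]},
}

# Normalising per plate puts both plates on the same scale.
normalised = {
    name: plate_z_scores(plate["samples"], plate["controls"])
    for name, plate in plates.items()
}
```

Scoring each plate against its own controls is one simple way to make wells comparable across batches; a robust variant would swap the mean and standard deviation for the median and median absolute deviation.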
Spatial Biology
I develop bespoke, sophisticated analysis pipelines for users of our Cell DIVE and Phenocycler Fusion multiplex immunofluorescence platforms using the HALO digital pathology software. I have also mined published spatial transcriptomics datasets for users as a service.
Techniques:
- Cluster-based cell phenotyping utilising dimensionality reduction and subject knowledge to assign cell types
- k-nearest neighbour analysis to determine cell type tissue distribution
- Cell neighbourhood analysis to determine cell type enrichment within tissue
- Spatial distance and proximity assessments between cells and tissue architecture
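The k-nearest neighbour analysis listed above can be sketched in a few lines of Python. The cell coordinates, cell type labels, and the function name `knn_composition` are hypothetical, and the brute-force distance search is chosen for clarity, not for the scale of real multiplex images.

```python
from collections import Counter
from math import dist


def knn_composition(cells, k):
    """For each cell, count the cell types among its k nearest neighbours."""
    compositions = []
    for i, (xy, _) in enumerate(cells):
        # Distance from this cell to every other cell, paired with its type.
        others = [
            (dist(xy, other_xy), other_type)
            for j, (other_xy, other_type) in enumerate(cells)
            if j != i
        ]
        others.sort(key=lambda pair: pair[0])
        compositions.append(Counter(cell_type for _, cell_type in others[:k]))
    return compositions


# Hypothetical cell centroids (x, y) with assigned phenotypes.
cells = [
    ((0.0, 0.0), "tumour"),
    ((1.0, 0.0), "tumour"),
    ((0.0, 2.0), "T cell"),
    ((10.0, 10.0), "T cell"),
    ((11.0, 10.0), "T cell"),
]

neighbours = knn_composition(cells, k=2)
```

Here `neighbours[i]` holds the neighbour-type counts for the i-th cell; aggregating these counts per cell type gives a simple view of which types tend to sit next to each other. For whole-slide data a spatial index such as a k-d tree would normally replace the pairwise search.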
High Performance Computing
I have experience analysing both my own and users' projects on the university's High Performance Computing (HPC) cluster, via both the web interface and the command line.
Genomic / Proteomic Analysis
Projects:
Formal Courses
- DataCamp “Data Scientist with R” track (88-hour online course).
- Biochemical Society “R for Biochemists 101” (5-week online introduction to R).
- LinkedIn Learning “Become a Data Scientist” (17-hour introductory course).
- DataCamp “Data Analyst with Python” track (36-hour online course).
- DataCamp “Machine Learning Fundamentals with Python” track (16-hour online course).