Chibuikem Nwizu to Defend Dissertation on Single-Cell Data

The Center for Computational Molecular Biology (CCMB) is pleased to announce Chibuikem Nwizu’s dissertation defense on Monday, November 3, at 9 a.m. in the Data Science Institute’s Seminar Space!

Nwizu is advised by Dr. Lorin Crawford. Here is a summary of the dissertation:

Understanding how cells adopt and transition between transcriptional states is central to deciphering tissue organization, disease progression, and therapeutic response. Single-cell RNA sequencing (scRNA-seq) provides a powerful platform for characterizing this heterogeneity at unprecedented resolution. However, current computational methods for identifying and interpreting cell states face several major challenges. They often rely on clustering heuristics that assume a fixed number of states, separate the inference of states from the discovery of their defining gene markers, and fail to account for how cellular state composition varies across biological contexts such as time, perturbation, or patient background. These limitations hinder the interpretability and actionability of inferred states, constraining their potential as biomarkers in precision medicine.

This dissertation develops a family of scalable, interpretable, and probabilistically grounded models for the unsupervised characterization of transcriptomic cell states in single-cell data. The proposed framework addresses three key methodological needs: (i) the joint inference of cell states and their associated gene markers, (ii) the relaxation of assumptions regarding the number of latent states, and (iii) the probabilistic modeling of state persistence and context-dependent variation over biological time. The models build on Bayesian nonparametric principles—particularly the Dirichlet and dependent Dirichlet process mixture frameworks—and employ sparse priors to identify gene-level markers that distinguish cell states within high-dimensional expression space. Variational inference algorithms are derived to enable scalable estimation on datasets containing millions of cells.
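
As a rough illustration of how a Dirichlet process prior relaxes the need to fix the number of latent states in advance, the sketch below draws mixture weights with the standard truncated stick-breaking construction. It is not the dissertation's model or the NCLUSION implementation; the function name, the concentration value `alpha=2.0`, and the truncation level of 50 are assumptions chosen only for illustration.

```python
import random


def stick_breaking_weights(alpha, truncation, rng):
    """Draw truncated stick-breaking weights for a Dirichlet process.

    Hypothetical helper for illustration only. Each step breaks off a
    Beta(1, alpha) fraction of whatever stick length remains.
    """
    weights = []
    remaining = 1.0
    for _ in range(truncation):
        fraction = rng.betavariate(1.0, alpha)
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    # `remaining` is the mass not yet assigned to any component.
    return weights, remaining


rng = random.Random(0)  # fixed seed so the draw is repeatable
weights, leftover = stick_breaking_weights(alpha=2.0, truncation=50, rng=rng)

# Count components that carry a non-negligible share of the mass.
# The 0.01 threshold is an arbitrary illustrative cutoff.
effective_states = sum(1 for w in weights if w > 0.01)
print(f"components above 1% weight: {effective_states}")
print(f"unassigned mass after truncation: {leftover:.3e}")
```

Although 50 components are available, most of the mass typically concentrates on a handful of them, so the number of occupied states is driven by the prior and the data rather than set by hand. Smaller values of `alpha` concentrate mass on fewer components; larger values spread it across more.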

In the first part of the thesis, a sparse nonparametric Bayesian clustering model is developed that infers both the number of cell states and their characteristic gene expression patterns directly from expression data, without requiring prior cluster specification. The second part introduces NCLUSION, a scalable variational implementation that jointly performs clustering and marker selection, providing statistically robust and biologically interpretable results. The final part presents i-NCLUSION, a hierarchical extension that integrates experimental structure via a directed acyclic graph representation, allowing for dynamic modeling of cell state preferences across related contexts such as time points or patient samples. Applied to large-scale datasets—including an 11-million-cell mouse developmental time series and longitudinal profiles of human breast milk—these methods outperform structure-agnostic baselines and uncover biologically consistent patterns of cellular adaptation.

Collectively, this work contributes a unified probabilistic framework for defining, interpreting, and contextualizing cell states as actionable biomarkers. By combining statistical rigor, scalability, and biological interpretability, the models developed here lay the foundation for more transparent and clinically relevant analyses of single-cell data, advancing the broader goals of precision medicine and systems-level understanding of cellular behavior.

Congratulations, Chib!

Alan Bidart
Graduate Student in Chemistry