Understanding the behavior of a cell requires that its molecular constituents, such as mRNA or protein levels, be profiled quantitatively. Typically, these measurements are performed in bulk and represent values aggregated from thousands of cells. Insights from such data can be very useful, but the loss of single-cell resolution can prove misleading for heterogeneous tissues and in diseases like cancer.
Recently, technological advances have allowed us to profile multiple cellular parameters simultaneously at single-cell resolution, for thousands to millions of cells. While this provides an unprecedented opportunity to learn new biology, analyzing such massive and high-dimensional data requires efficient and accurate computational tools to extract the underlying biological phenomena. Such methods must take into account biological properties such as non-linear dependencies between measured parameters.
In this dissertation, I contribute to the development of tools from harmonic analysis and computational geometry to study the shape and geometry of single-cell data collected using mass cytometry and single-cell RNA sequencing (scRNA-seq). In particular, I focus on diffusion maps, which can learn the underlying structure of the data by modeling cells as lying on a low-dimensional phenotype manifold embedded in high dimensions. Diffusion maps allow a non-linear transformation of the data into a low-dimensional Euclidean space, in which pairwise distances robustly represent distances in the high-dimensional space. In addition to the underlying geometry, this work also studies the shape of the data using archetype analysis. Archetype analysis characterizes extreme states in the data and complements traditional approaches such as clustering. It facilitates analysis at the boundary of the data, enabling potentially novel insights about the system.
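As a rough illustration of the diffusion-map idea described above, the following Python sketch builds a Markov transition matrix from a Gaussian kernel over cells and embeds the cells using its leading non-trivial eigenvectors. It is a minimal textbook-style construction, not the implementation used in the dissertation; the function name `diffusion_map`, the fixed bandwidth `sigma`, and the diffusion time `t` are assumptions made for the example.

```python
import numpy as np

def diffusion_map(X, sigma=1.0, n_components=2, t=1):
    """Embed rows of X (cells x features) with a basic diffusion map.

    Hypothetical helper for illustration; assumes a fixed-bandwidth
    Gaussian kernel, which real single-cell pipelines often replace
    with adaptive or k-nearest-neighbor kernels.
    """
    # Pairwise squared Euclidean distances between cells.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)

    # Gaussian affinities and their row sums (degrees).
    K = np.exp(-d2 / (2.0 * sigma ** 2))
    deg = K.sum(axis=1)

    # Symmetric matrix A = D^{-1/2} K D^{-1/2}; it has the same
    # eigenvalues as the Markov matrix P = D^{-1} K.
    A = K / np.sqrt(np.outer(deg, deg))
    vals, vecs = np.linalg.eigh(A)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]

    # Right eigenvectors of P are D^{-1/2} times eigenvectors of A.
    psi = vecs / np.sqrt(deg)[:, None]

    # Drop the trivial constant eigenvector; scale by eigenvalue^t.
    idx = slice(1, n_components + 1)
    embedding = psi[:, idx] * (vals[idx] ** t)
    return embedding, vals
```

Euclidean distances between rows of the returned embedding approximate diffusion distances at time `t`, truncated to the first `n_components` non-trivial coordinates.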
I use these tools to study how the negative costimulatory molecules Ctla4 and Pdcd1 affect T-cell differentiation. Negative costimulatory molecules play a vital role in attenuating T-cell activation, in order to maintain activity within a desired physiological range and prevent autoimmunity. However, their potential role in T-cell differentiation remains unknown. In this work, I analyze mass cytometry data profiling T cells from control and Ctla4- or Pdcd1-deficient mice and compare them using the tools above. I find that genetic loss of Ctla4 constrains CD4+ T-cell differentiation states, whereas loss of Pdcd1 subtly constrains CD8+ T-cell differentiation states. I propose that negative costimulatory molecules place limits on maximal protein expression levels to restrain differentiation states.
I use similar approaches to study breast cancer cells, which are profiled using scRNA-seq as they undergo the pathological epithelial-to-mesenchymal transition (EMT). For this work, I introduce Markov Affinity based Graph Imputation of Cells (MAGIC), a novel algorithm designed in our lab to denoise and impute sparse single-cell data. The mRNA content of each cell is currently massively undersampled by scRNA-seq, resulting in 'zero' expression values for the majority of genes in a large fraction of cells. MAGIC circumvents this problem by using a diffusion process along the data to share information between similar cells and thereby denoise and impute expression values. In addition to MAGIC, I apply archetype analysis to study various cellular stages during EMT, and I find novel biological processes in the previously unstudied intermediate states.
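The diffusion-based sharing of information between similar cells described above can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the MAGIC algorithm itself: it assumes a fixed-bandwidth Gaussian kernel on the raw counts and omits any further steps a full implementation would include. The function name `markov_impute` and the parameters `sigma` and `t` are hypothetical.

```python
import numpy as np

def markov_impute(X, sigma=1.0, t=3):
    """Smooth a cells x genes matrix by diffusing over a cell-cell graph.

    Illustrative only: builds a row-stochastic Markov matrix M from
    Gaussian affinities between cells, raises it to the power t, and
    replaces each cell's profile with a weighted average of the
    profiles of the cells it diffuses to.
    """
    # Pairwise squared Euclidean distances between cells.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)

    # Gaussian affinities, row-normalized into transition probabilities.
    K = np.exp(-d2 / (2.0 * sigma ** 2))
    M = K / K.sum(axis=1, keepdims=True)

    # t steps of the random walk; rows of M^t still sum to 1.
    Mt = np.linalg.matrix_power(M, t)

    # Each imputed profile is a convex combination of observed profiles.
    return Mt @ X, Mt
```

Because every row of `Mt` is a probability distribution over cells, each imputed value stays within the range of the observed values for that gene, and a gene recorded as zero in one cell takes on a non-zero value when similar cells express it.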
The work presented here introduces a mathematical modeling framework and advanced geometric tools to analyze single-cell data. These ideas can be applied generally to a range of biological systems. Here, I apply them to answer important biological questions in T-cell differentiation and EMT. The knowledge obtained has applications in our basic understanding of EMT and T-cell biology, and in cancer treatment.
Identifier | oai:union.ndltd.org:columbia.edu/oai:academiccommons.columbia.edu:10.7916/d8-cvyg-kx02 |
Date | January 2019 |
Creators | Sharma, Roshan |
Source Sets | Columbia University |
Language | English |
Detected Language | English |
Type | Theses |