Nature Biotechnology, advance online publication (Analysis). The ability to analyze multiple single-cell parameters is critical for understanding cellular heterogeneity. Despite recent advances in measurement technology, methods for analyzing high-dimensional single-cell data are often subjective, labor intensive and require prior knowledge of the biological system. To objectively uncover cellular heterogeneity from single-cell measurements, we present a versatile computational approach, spanning-tree progression analysis of density-normalized events (SPADE). We applied SPADE to flow cytometry data of mouse bone marrow and to mass cytometry data of human bone marrow. In both cases, SPADE organized cells in a hierarchy of related phenotypes that partially recapitulated well-described patterns of hematopoiesis. We demonstrate that SPADE is robust to measurement noise and to the choice of cellular markers. SPADE facilitates the analysis of cellular heterogeneity, the identification of cell types and comparison of functional markers in response to perturbations.
We perform network inference ('reverse-engineering') on phospho-specific multi-dimensional flow cytometry measurements of signaling molecules in human T cells using Bayesian networks. Inferred networks are found to have good agreement with known pathways derived from the literature.
Characterization of patient-specific disease features at a molecular level is an important emerging field. Patients may be characterized by differences in the level and activity of relevant biomolecules in diseased cells. When high throughput, high dimensional data is available, it becomes possible to characterize differences not only in the level of the biomolecules, but also in the molecular interactions among them. We propose here a novel approach to characterize patient specific signaling, which augments high throughput single cell data with state nodes corresponding to patient and disease states, and learns a Bayesian network based on this data. Features distinguishing individual patients emerge as downstream nodes in the network. We illustrate this approach with a six phospho-protein, 30,000 cell-per-patient dataset characterizing three comparably diagnosed follicular lymphoma, and show that our approach elucidates signaling differences among them.
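The data-augmentation step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `augment_with_state_nodes`, the column names `patient` and `disease_state`, and the marker names are all hypothetical, and the Bayesian network learning step itself is not shown. The sketch only shows how each single-cell record might gain extra discrete variables so that a structure learner can treat patient and disease state as ordinary nodes.

```python
def augment_with_state_nodes(cells_by_patient, disease_state):
    """Attach patient and disease-state variables to every single-cell record.

    cells_by_patient: dict mapping a patient label to a list of per-cell
        measurement dicts (hypothetical layout).
    disease_state: dict mapping a patient label to a disease-state label.
    Returns a flat list of rows, one per cell, with two added columns.
    """
    rows = []
    for patient, cells in cells_by_patient.items():
        for cell in cells:
            row = dict(cell)  # copy so the input records stay untouched
            row["patient"] = patient
            row["disease_state"] = disease_state[patient]
            rows.append(row)
    return rows


# Toy input with made-up marker names and values (illustration only).
cells = {
    "patient_1": [{"marker_1": 0.2, "marker_2": 1.1}],
    "patient_2": [{"marker_1": 0.9, "marker_2": 0.4},
                  {"marker_1": 0.7, "marker_2": 0.5}],
}
states = {"patient_1": "state_a", "patient_2": "state_b"}
augmented = augment_with_state_nodes(cells, states)
```

The resulting rows could then be passed to any Bayesian network structure learner, with the two added columns treated as candidate parent nodes.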
Flow cytometric measurement of signaling protein abundances has proved particularly useful for elucidation of signaling pathway structure. The single cell nature of the data ensures a very large dataset size, providing a statistically robust dataset for structure learning. Moreover, the approach is easily scaled to many conditions in high throughput. However, the technology suffers from a dimensionality constraint: at the cutting edge, only about 12 protein species can be measured per cell, far from sufficient for most signaling pathways. Because the structure learning algorithm (in practice) requires that all variables be measured together simultaneously, this restricts structure learning to the number of variables that constitute the flow cytometer's upper dimensionality limit. To address this problem, we present here an algorithm that enables structure learning for sparsely distributed data, allowing structure learning beyond the measurement technology's upper dimensionality limit for simultaneously measurable variables. The algorithm assesses pairwise (or n-wise) dependencies, constructs “Markov neighborhoods” for each variable based on these dependencies, measures each variable in the context of its neighborhood, and performs structure learning using a constrained search.
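The first two steps of the algorithm (assessing pairwise dependencies and building a neighborhood per variable) can be sketched as follows. This is an illustration under stated assumptions, not the published method: absolute Pearson correlation stands in for the dependency measure, which the abstract does not specify, and the names `pearson`, `markov_neighborhoods`, and `max_panel_size` are hypothetical. The follow-up measurement of each neighborhood and the constrained structure search are not shown.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def markov_neighborhoods(pairwise_data, max_panel_size):
    """Pick, for each variable, its most strongly dependent partners.

    pairwise_data: dict mapping (var_a, var_b) to a pair of value lists that
        were measured together on the same cells (hypothetical layout).
    max_panel_size: assumed limit on simultaneously measurable variables, so
        each neighborhood holds at most max_panel_size - 1 partners.
    """
    scores = {}
    for (a, b), (xa, xb) in pairwise_data.items():
        s = abs(pearson(xa, xb))  # assumed dependency score
        scores.setdefault(a, {})[b] = s
        scores.setdefault(b, {})[a] = s
    k = max_panel_size - 1
    # Sort by descending score, breaking ties by name for determinism.
    return {
        v: sorted(nbrs, key=lambda u: (-nbrs[u], u))[:k]
        for v, nbrs in scores.items()
    }


# Toy pairwise measurements with made-up values (illustration only).
pairs = {
    ("A", "B"): ([1, 2, 3, 4], [2, 4, 6, 8]),
    ("A", "C"): ([1, 2, 3, 4], [1, -1, 1, -1]),
    ("B", "C"): ([2, 4, 6, 8], [8, 6, 5, 1]),
}
neighborhoods = markov_neighborhoods(pairs, max_panel_size=2)
```

Capping each neighborhood at the panel size minus one reflects the constraint described above: a variable and its whole neighborhood must fit into a single measurement.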
Papers by Karen Sachs