Global ETD Search


High-Throughput Data Analysis: Application to Micronuclei Frequency and T-cell Receptor Sequencing

The advent of high-throughput sequencing has produced an unprecedented amount of research data, and analytical methodology has not kept pace. Two assays are considered here, ImmunoSEQ and the cytokinesis-block micronucleus (CBMN) assay, both of which produce count data and have few methods available for their analysis.
ImmunoSEQ is a sequencing assay that measures the beta T-cell receptor (TCR) repertoire. The ImmunoSEQ assay was used to describe the TCR repertoires of patients who had undergone hematopoietic stem cell transplantation (HSCT). Several methods for spectratype analysis were extended to the TCR sequencing setting and then applied to these data to demonstrate different ways the data set can be analyzed. The methods include CDR3 distribution perturbation, Oligoscores, Simpson's diversity, Shannon diversity, Kullback-Leibler divergence, a non-parametric method, and a proportion logit transformation method. Herein we also demonstrate adapting compositional data analysis methods to the TCR sequencing setting. The methods were compared on a set of 13 subjects who underwent HSCT: the eight subjects who developed graft-versus-host disease were compared to the five who did not. There was little overlap in the results of the different methods, showing that researchers must choose the method appropriate to their research question.
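As an illustration of three of the summary measures named above, the following minimal sketch computes Shannon diversity, Simpson's diversity, and Kullback-Leibler divergence from vectors of clonotype counts. It is not taken from the thesis: the function names are hypothetical, Simpson's diversity is assumed to be the Gini-Simpson form (1 minus the sum of squared proportions), natural logarithms are assumed, and the pseudocount used to avoid zero proportions in the divergence is an illustrative choice.

```python
import math


def proportions(counts):
    """Convert non-negative counts to proportions that sum to 1."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must have a positive total")
    return [c / total for c in counts]


def shannon_diversity(counts):
    """Shannon entropy H = -sum(p * ln p), skipping empty categories."""
    return -sum(p * math.log(p) for p in proportions(counts) if p > 0)


def simpson_diversity(counts):
    """Gini-Simpson index 1 - sum(p^2) (assumed form; other variants exist)."""
    return 1.0 - sum(p * p for p in proportions(counts))


def kl_divergence(counts_p, counts_q, pseudocount=0.5):
    """KL divergence D(P || Q) between two count vectors of equal length.

    A pseudocount (an assumption of this sketch) is added to every
    category so that no proportion is exactly zero.
    """
    if len(counts_p) != len(counts_q):
        raise ValueError("count vectors must have the same length")
    p = proportions([c + pseudocount for c in counts_p])
    q = proportions([c + pseudocount for c in counts_q])
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

For example, a perfectly even repertoire of four clonotypes has Shannon diversity ln 4 and Gini-Simpson diversity 0.75, while a repertoire dominated by a single clonotype scores close to 0 on both.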
The CBMN assay measures the rate of micronuclei (MN) formation in a sample of cells and can be paired with gene expression or methylation assays to assess the association between MN formation and other genetic markers. Herein we extended the generalized monotone incremental forward stagewise (GMIFS) method to the situation where the response is count data and there are more independent variables than samples. Our Poisson GMIFS method was compared to a popular alternative, glmpath, using simulations and by applying both methods to real data. Simulations showed that the two methods perform similarly in selecting the truly significant variables; however, glmpath appears to overfit compared to our GMIFS method. Finally, when both methods were applied to two data sets, GMIFS appeared to be more stable than glmpath.
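The general idea of an incremental forward stagewise fit for a Poisson response can be sketched as follows. This is a simplified illustration, not the thesis's implementation: the function names, the fixed step size `epsilon`, and the fixed number of steps are assumptions, the predictors are assumed to be standardized, and the response is assumed to have a positive total. At each step the intercept is set to its maximum-likelihood value given the current coefficients, the log-likelihood gradient is computed for every predictor, and only the coefficient with the largest absolute gradient is moved by a small amount.

```python
import math


def _ml_intercept(X, y, beta):
    """Closed-form ML intercept for a log-link Poisson model given beta."""
    exp_eta = [math.exp(sum(b * x for b, x in zip(beta, row))) for row in X]
    return math.log(sum(y) / sum(exp_eta)), exp_eta


def poisson_forward_stagewise(X, y, epsilon=0.01, n_steps=500):
    """Incremental forward stagewise sketch for Poisson regression.

    X is a list of rows (each a list of p predictor values) and y a list
    of counts. Returns (intercept, beta). The stopping rule here is a
    fixed number of steps, which is an assumption of this sketch.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_steps):
        intercept, exp_eta = _ml_intercept(X, y, beta)
        mu = [math.exp(intercept) * e for e in exp_eta]
        # Gradient of the Poisson log-likelihood with respect to each beta_j.
        grad = [sum(X[i][j] * (y[i] - mu[i]) for i in range(n))
                for j in range(p)]
        # Move only the coefficient with the steepest gradient.
        j = max(range(p), key=lambda k: abs(grad[k]))
        beta[j] += epsilon * math.copysign(1.0, grad[j])
    intercept, _ = _ml_intercept(X, y, beta)
    return intercept, beta
```

Because each step changes a single coefficient by a small amount, the fit can proceed even when there are more predictors than samples. Coefficients that are never selected stay exactly at zero, which is what gives this family of methods its variable-selection behavior.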

GMIFS

Micronuclei

high-throughput sequencing

TCR

hematopoietic stem cell transplantation

gene expression

Biostatistics

Genetic Processes

Other Immunology and Infectious Disease

Identifier	oai:union.ndltd.org:vcu.edu/oai:scholarscompass.vcu.edu:etd-4966
Date	01 January 2015
Creators	Makowski, Mateusz
Publisher	VCU Scholars Compass
Source Sets	Virginia Commonwealth University
Detected Language	English
Type	text
Format	application/pdf
Source	Theses and Dissertations
Rights	© The Author

