Poisson multiscale methods for high-throughput sequencing data

In this dissertation, we focus on the problem of analyzing data from high-throughput sequencing experiments. With the emergence of more capable hardware and more efficient software, these sequencing data provide information at an unprecedented resolution. However, statistical methods developed for such data rarely analyze them at such high resolution, and often make approximations that hold only under certain conditions. We propose a model-based approach to such data, starting from a single sample. By taking into account the inherent structure present in the data, our model can accurately capture important genomic regions. We also present the model in a way that makes it easily extensible to more complicated and biologically interesting scenarios. Building upon the single-sample model, we then turn to the statistical question of detecting differences between multiple samples. Such questions often arise in the context of expression data, where much emphasis has been put on the problem of detecting differential expression between two groups. By extending the framework for a single sample to incorporate additional group covariates, our model provides a systematic approach to estimating and testing for such differences. We then apply our method to several empirical datasets, and discuss the potential for further applications to other biological tasks. We also address a different statistical question, where the goal is to perform exploratory analysis to uncover hidden structure within the data. We incorporate the single-sample framework into a commonly used clustering scheme, and show that our enhanced clustering approach is superior to the original clustering approach in many ways. We then apply our clustering method to a few empirical datasets and discuss our findings.
Finally, we apply the shrinkage procedure used within the single-sample model to tackle a completely different statistical issue: nonparametric regression with heteroskedastic Gaussian noise. We propose an algorithm that accurately recovers both the mean and variance functions given a single set of observations, and demonstrate its advantages over state-of-the-art methods through extensive simulation studies.

http://pqdtopen.proquest.com/#viewpdf?dispub=10195268

Biostatistics|Genetics|Statistics

Identifier	oai:union.ndltd.org:PROQUEST/oai:pqdtoai.proquest.com:10195268
Date	21 December 2016
Creators	Xing, Zhengrong
Publisher	The University of Chicago
Source Sets	ProQuest.com
Language	English
Detected Language	English
Type	thesis
