31 |
A Medicago Sativa Draft Genome using Next Generation Sequencing Reads from Reduced Representation Libraries
Yang, Le 26 March 2012
Medicago sativa (alfalfa) is an important agricultural plant for animal forage and nitrogen fixation, and has potential value in lignocellulosic energy production. To better understand the plant, I generated a draft genome sequence of M. sativa via two reduced representation sequencing approaches: methylation-dependent filtration and high CoT filtration. Libraries created from each approach were sequenced on an Illumina next-generation sequencing platform, yielding approximately 2.5 Gb of raw data. The draft genome was assembled using a combination of reference-based assembly, with the closely related species Medicago truncatula as the reference, and de novo assembly. The reference-based assembly generated 312,011 contigs with a weighted median contig length (N50) of 247 bases, whereas de novo assembly produced 547,304 contigs with an N50 of 275 bases. The M. sativa draft genome is vital for downstream functional analyses such as genome-wide gene mining and gene expression profiling.
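The N50 statistic quoted in the abstract above can be illustrated with a minimal sketch. It uses the usual definition: the contig length at which the running total of lengths, sorted longest first, reaches half of the assembly size. The function name `n50` and the contig lengths are made up for illustration and are not taken from the thesis.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)   # longest contigs first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running total to at least
        # half of the assembly size defines the N50.
        if running >= half_total:
            return length


# Hypothetical contig lengths in bases (not data from the thesis).
contigs = [100, 200, 300, 400, 500]
print(n50(contigs))  # prints 400
```

Because N50 is weighted by length, a large number of very short contigs lowers it far less than it would lower the plain median.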
|
32 |
Acceleration of Coevolution Detection for Predicting Protein Interactions
Rodionov, Alexandr 25 August 2011
Protein function is the ultimate expression of the genetic code of every organism, and determining which proteins interact helps reveal their functions. MatrixMatchMaker (MMM) is a computational method of predicting protein-protein interactions that works by detecting co-evolution between pairs of proteins. Although MMM has several advanced features compared to other co-evolution-based methods, these come at a high computational cost, so the goal of this research is to improve the performance of MMM. First, we redefine the computational problem posed by the method and then develop a new algorithm to solve it, achieving a total speedup of 570x over the existing MMM algorithm for a biologically meaningful data set. We also develop hardware which has not yet succeeded in further improving the performance of MMM, but which could serve as a platform for further gains.
|
33 |
Characterization of Friable1-like Homologues in Arabidopsis using Bioinformatics and Reverse Genetics
Hsieh, Chih-Cheng Sherry 10 August 2009
The FRIABLE1 (FRB1) gene has been identified as encoding a novel glycosyltransferase involved in cell adhesion, based on reverse genetics and immunocytochemistry studies. A total of 31 FRB1 paralogues were found in Arabidopsis thaliana using a bioinformatics approach. Subsequent expression analysis revealed 6 FRB1 paralogues to be pollen-specific. The T-DNA insertional knockout of one pollen-specific FRB1 paralogue, At1g14970, exhibits longer siliques than Columbia wild-type plants when exposed to a higher-than-normal temperature of 28 °C. This may be due to the loss of temperature sensing and continuously stimulated pollen tube cell wall growth, or to the up-regulation of genes that encode other glycosyltransferases. Thus, the identification of FRB1 paralogues and homologues in both rice and poplar may have tremendous potential to increase their yield under global warming, for agricultural and industrial benefit.
|
34 |
Bayesian Hidden Markov Models for finding DNA Copy Number Changes from SNP Genotyping Arrays
Kowgier, Matthew 31 August 2012
DNA copy number variations (CNVs), which involve the deletion or duplication of subchromosomal segments of the genome, have become a focus of genetics research. This dissertation develops Bayesian HMMs for finding CNVs from single nucleotide polymorphism (SNP) arrays.
A Bayesian framework to reconstruct the DNA copy number sequence from the observed sequence of SNP array measurements is proposed. A Markov chain Monte Carlo (MCMC) algorithm, with a forward-backward stochastic algorithm for sampling DNA copy number sequences, is developed for estimating model parameters. Numerous versions of Bayesian HMMs are explored, including a discrete-time model and different models for the instantaneous transition rates of change among copy number states of a continuous-time HMM. The most general model proposed makes no restrictions and assumes the rate of transition depends on the current state, whereas the nested model fixes some of these rates by assuming that the rate of transition is independent of the current state. Each model is assessed using a subset of the HapMap data. More general parameterizations of the transition intensity matrix of the continuous-time Markov process produced more accurate inference with respect to the length of CNV regions. The observed SNP array measurements are assumed to be stochastic with distribution determined by the underlying DNA copy number. Copy-number-specific distributions, including a non-symmetric distribution for the 0-copy state (homozygous deletions) and mixture distributions for 2-copy state (normal), are developed and shown to be more appropriate than existing implementations which lead to biologically implausible results.
Compared to existing HMMs for SNP array data, this approach is more flexible in that model parameters are estimated from the data rather than set to a priori values. Measures of uncertainty, computed as simulation-based probabilities, can be determined for putative CNVs detected by the HMM. Finally, the dissertation concludes with a discussion of future work, with special attention given to model extensions for multiple sample analysis and family trio data.
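The forward-backward stochastic sampling step mentioned in the abstract above can be sketched for the simplest case, a discrete-time HMM with a fixed transition matrix. This is only an illustration of forward filtering followed by backward sampling, not the dissertation's implementation. The function name `sample_state_path`, the Gaussian emission model, and every numeric value are assumptions made for the example.

```python
import math
import random


def sample_state_path(obs_lik, trans, init, rng):
    """Draw one hidden-state path from p(states | observations).

    obs_lik[t][k]: likelihood of observation t under state k
    trans[j][k]:   probability of moving from state j to state k
    init[k]:       probability of starting in state k
    """
    n_steps, n_states = len(obs_lik), len(init)

    # Forward pass: filtered probabilities p(state_t | obs_1..t),
    # normalised at each step to avoid numerical underflow.
    filtered = []
    for t in range(n_steps):
        if t == 0:
            prior = init
        else:
            prev = filtered[-1]
            prior = [sum(prev[j] * trans[j][k] for j in range(n_states))
                     for k in range(n_states)]
        unnorm = [prior[k] * obs_lik[t][k] for k in range(n_states)]
        total = sum(unnorm)
        filtered.append([u / total for u in unnorm])

    # Backward pass: sample the last state, then each earlier state
    # conditional on the state already drawn for the next time step.
    path = [0] * n_steps
    path[-1] = rng.choices(range(n_states), weights=filtered[-1])[0]
    for t in range(n_steps - 2, -1, -1):
        weights = [filtered[t][j] * trans[j][path[t + 1]]
                   for j in range(n_states)]
        path[t] = rng.choices(range(n_states), weights=weights)[0]
    return path


def gaussian_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


# Hypothetical intensity means for 1, 2 and 3 copies, and a made-up signal.
means = [-0.6, 0.0, 0.4]
signal = [0.02, -0.05, -0.58, -0.63, -0.55, 0.01, 0.03]
obs_lik = [[gaussian_pdf(x, m, 0.1) for m in means] for x in signal]
trans = [[0.90, 0.08, 0.02], [0.05, 0.90, 0.05], [0.02, 0.08, 0.90]]
init = [0.1, 0.8, 0.1]

path = sample_state_path(obs_lik, trans, init, random.Random(0))
copy_numbers = [state + 1 for state in path]  # state index 0 means 1 copy
print(copy_numbers)
```

Inside an MCMC scheme, a draw like this would alternate with updates of the emission and transition parameters. Repeating the draw many times gives simulation-based probabilities for each putative CNV.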
|
35 |
NAViGaTing the Micronome: A Systematic Study of both the External Effects of MicroRNAs on Gene Repression networks, and the Contribution of microRNA Terminal Loops to MicroRNA Function
Shirdel, Elize Astghik 07 January 2013
The first aim of this thesis is to examine relationships between microRNAs targeting gene networks, combining knowledge from microRNA prediction databases into our microRNA Data Integration Portal (mirDIP). Modeling the microRNA:transcript interactome (referred to as the micronome) to build microRNA interaction networks of signalling pathways, we find genes within signalling pathways to be co-targeted by common microRNAs, suggesting an unexpected level of transcriptional control. We identify two distinct classes of microRNAs: universe microRNAs, which are involved in many signalling pathways, and intra-pathway microRNAs, which target multiple genes within one signalling pathway. We find universe microRNAs to have more targets, to be more studied and to be more involved in cancer signalling than their intra-pathway counterparts.
The second aim was to undertake a more focused view, analyzing the characteristics of microRNAs within the micronome itself, beginning with the under-examined microRNA terminal loop, to determine whether this region of the microRNA structure might contribute to microRNA functioning. We have identified two main classes of microRNAs based on loop structure, perfect and occluded, which show biological relevance. We found regulatory motifs within microRNA terminal loops and found a large number of Frequently Occurring Words (FOWs) significantly overrepresented across the micronome. Set analysis of in vitro secreted microRNAs, microRNA expression across a panel of normal tissues, and microRNAs shown to be secreted in lung cancer shows that specific microRNA loop motifs within these groups are significantly overrepresented, suggesting that microRNA terminal loops harbour sequences bearing microRNA processing and localization signals.
|
36 |
Dynamic Structures of Protein Interaction Networks Predict Complex Phenotypes of Biological Systems
Taylor, Ian 28 February 2013
This work focuses on the use of network graph theory in biological networks. I explore how network graph theory informs our understanding of biological networks such as protein interaction networks. I show that the human protein interaction network forms dynamic, modular structures that organize cell signaling pathways into higher order units. The misregulation of the dynamic, modular structure of the protein interaction network in breast cancer tumours is associated with outcome of the disease, suggesting that the altered structure of the protein interaction network is directly related to the phenotype of the tumour. I also demonstrate that the human protein interaction network is fractal in nature and thus forms self-similar structures within the network. The fractal skeletons of the protein interaction network contain critical information and therefore can be used alone in determining the phenotype of breast cancer tumours by examining the disruption of dynamic network structures. The self-similar fractal backbones deconvolve the protein interaction network into layers of independent function, resulting in improved description of breast cancer outcome using the dynamic network modularity algorithm. Finally, I discuss how the discoveries and technologies described within can be improved and how these discoveries can lead to a network-based modality of medicine.
|
37 |
Inferring the Binding Preferences of RNA-binding Proteins
Hilal, Kazan 17 December 2012
Post-transcriptional regulation is carried out by RNA-binding proteins (RBPs) that bind to specific RNA molecules and control their processing, localization, stability and degradation. Experimental studies have successfully identified RNA targets associated with specific RBPs. However, because the locations of the binding sites within the targets are unknown and because RBPs recognize both sequence and structure elements in their binding sites, identification of RBP binding preferences from these data remains challenging.
The unifying theme of this thesis is to identify RBP binding preferences from experimental data. First, we propose a protocol to design a complex RNA pool that represents diverse sets of sequence and structure elements to be used in an in vitro assay to efficiently measure RBP binding preferences. This design has been implemented in the RNAcompete method, and applied genome-wide to human and Drosophila RBPs. We show that RNAcompete-derived motifs are consistent with established binding preferences.
We developed two computational models to learn binding preferences of RBPs from large-scale data. Our first model, RNAcontext, uses a novel representation of secondary structure to infer both sequence and structure preferences of RBPs, and is optimized for use with in vitro binding data on short RNA sequences. We show that including structure information improves the prediction accuracy significantly. Our second model, MaLaRKey, extends RNAcontext to fit motif models to sequences of arbitrary length, and to incorporate a richer set of structure features to better model in vivo RNA secondary structure. We demonstrate that MaLaRKey infers detailed binding models that accurately predict binding of full-length transcripts.
|
38 |
High Throughput Prediction of Critical Protein Regions Using Correlated Mutation Analysis
Xu, Yongbai 29 July 2010
Correlated mutation analysis (CMA) is an effective approach for predicting functional and structural residue interactions from protein multiple sequence alignments. A prediction pipeline over the Pfam database was developed to predict residue contacts within protein domains. Cross-referencing with the PDB showed that these contacts are spatially close. Furthermore, we found our predictions to be biochemically reasonable and to correspond closely with known contact matrices. This large-scale search for coevolving regions within protein domains revealed that if two sites in an alignment covary, then neighboring sites in the alignment typically covary as well, resulting in clusters of covarying residues. The program PatchD was developed to measure the covariation between disconnected sequence clusters to reveal patch covariation. Patches that exhibited strong covariation identified multiple residues that were generally nearby in the protein structures, suggesting that the detection of covarying patches can be used in addition to traditional CMA approaches to reveal functional interaction partners.
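To make the idea of covarying alignment columns concrete, the sketch below scores a pair of columns with mutual information. Mutual information is one commonly used covariation measure. The abstract does not state which statistic the pipeline or PatchD uses, so this is a stand-in and not the author's method. The function name and the toy alignment are hypothetical, and gap handling is ignored.

```python
import math
from collections import Counter


def column_mutual_information(alignment, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    pairs = [(seq[i], seq[j]) for seq in alignment]
    n = len(pairs)
    count_i = Counter(a for a, _ in pairs)   # residue counts in column i
    count_j = Counter(b for _, b in pairs)   # residue counts in column j
    count_ij = Counter(pairs)                # joint residue-pair counts
    mi = 0.0
    for (a, b), c in count_ij.items():
        p_ab = c / n
        # p_ab / (p_a * p_b), written with raw counts.
        mi += p_ab * math.log2(p_ab * n * n / (count_i[a] * count_j[b]))
    return mi


# Toy alignment (made up): columns 0 and 1 change together, column 2 is constant.
toy_alignment = ["AKL", "AKL", "GEL", "GEL"]
print(column_mutual_information(toy_alignment, 0, 1))  # prints 1.0
print(column_mutual_information(toy_alignment, 0, 2))  # prints 0.0
```

A fully conserved column always scores zero, so in practice such scores are usually corrected for column entropy and phylogenetic background before they are interpreted as contacts.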
|
39 |
Mechanism of MicroRNA miR-520g Pathogenesis in CNS-PNET
Shih, J. H. David 25 August 2011
We recently discovered a high-level amplicon spanning the chr19q13.41 microRNA cluster in CNS Primitive Neuroectodermal Tumour, which results in striking upregulation of miR-520g. Constitutive over-expression of miR-520g in untransformed human neural stem cells enhanced cell growth, restricted differentiation down the neuronal lineage, and promoted expression of neural stem/progenitor cell markers. We thus hypothesize that ectopic miR-520g expression promotes tumourigenesis in part by inhibiting cellular differentiation. Consistent with this proposition, miR-520g is silenced upon embryonic stem cell differentiation and its expression is absent from most adult tissues. Moreover, expression analysis of miR-520g overexpressing cells revealed significant dysregulation of developmental signalling pathways. Further efforts focused on elucidating mechanisms of miR-520g function led to the identification of a cell cycle inhibitor, p21, as an important candidate target. These findings collectively suggest that miR-520g may modulate differentiation by regulating developmental signalling pathways and cell cycle exit of neural stem/progenitor cells.
|
40 |
Identifying Tissue Specific Distal Regulatory Sequences in the Mouse Genome
Chen, Chih-yu 06 December 2011
Epigenetic modifications, transcription factor (TF) availability and chromatin conformation influence how a genome is interpreted by the transcriptional machinery responsible for gene expression. Enhancers buried in non-coding regions are associated with significant differences in histone marks between different cell types. In contrast, gene promoters show more uniform modifications across cell types. In this report, enhancer identification is first carried out using an enhancer associated feature in mouse erythroid cells. Taking advantage of public domain ChIP-Seq data sets in mouse embryonic stem cells, an integrative model is then used to assess features in enhancer prediction, and subsequently locate enhancers. Significant associations with multiple TF bound loci, higher expression in the closest genes, and active enhancer marks support functionality and tissue-specificity of these enhancers. Motif enrichment analysis further determines known and novel TFs regulating the target cell type. Furthermore, the features identified can facilitate more accurate enhancer prediction in other cell types.
|