Global ETD Search

301	Petal - A New Approach to Construct and Analyze Gene Co-Expression Networks in R Petereit, Julia 17 February 2017 <p> <b>petal</b> is a network analysis method that draws on and takes advantage of precise Mathematics, Statistics, and Graph Theory, but remains practical for the life scientist. <b>petal</b> is built upon the assumption that large complex systems follow a scale-free and small-world network topology. One main intention of creating this program is to eliminate unnecessary noise and imprecision introduced by the user. Consequently, no user input parameters are required, and the program is designed to allow the two structural properties, scale-free and small-world, to govern the construction of network models. </p><p> The program is implemented in the statistical language <b>R</b> and is freely available as a package for download. The package includes several simple <b>R</b> functions that the researcher can use to construct co-expression networks and extract gene groupings from a biologically meaningful network model. More advanced <b>R</b> users may use other functions for further downstream analyses, if desired. </p><p> The <b>petal</b> algorithm is discussed and its application demonstrated on several datasets. <b>petal</b> results show that the technique is capable of detecting biologically meaningful network modules from co-expression networks. That is, scientists can use this technique to identify groups of genes with possibly similar functions based on their expression information. </p><p> While this approach is motivated by whole-system gene expression data, the fundamental components of the method are transparent and can be applied to large datasets of many types and sizes, stemming from various fields. </p>
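The abstract does not spell out how petal lets the scale-free property govern network construction. As a rough illustration of the general idea only, the Python sketch below (standard library only) hard-thresholds a gene-gene correlation matrix and scores each candidate threshold by the R² of a straight-line fit of log10 p(k) against log10 k, a scale-free fit criterion popularized by WGCNA. The helper names `degrees_at_threshold`, `scale_free_fit` and `best_threshold` are hypothetical and are not petal's API; the small-world property that petal also relies on is ignored here.

```python
import math
from collections import Counter


def degrees_at_threshold(corr, threshold):
    """Node degrees after keeping only edges with |correlation| >= threshold."""
    n = len(corr)
    return [sum(1 for j in range(n) if j != i and abs(corr[i][j]) >= threshold)
            for i in range(n)]


def scale_free_fit(degrees):
    """Return (R^2, slope) of a least-squares line through (log10 k, log10 p(k)).

    Isolated nodes (k = 0) are dropped because log10(0) is undefined.
    A power law p(k) ~ k^-gamma appears as a straight line with slope -gamma.
    """
    nonzero = [k for k in degrees if k > 0]
    counts = sorted(Counter(nonzero).items())
    if len(counts) < 2:
        return 0.0, 0.0  # a line needs at least two distinct degrees
    total = len(nonzero)
    xs = [math.log10(k) for k, _ in counts]
    ys = [math.log10(c / total) for _, c in counts]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    # A perfectly flat degree distribution is not scale-free, so it scores 0.
    r2 = 0.0 if syy == 0 else sxy * sxy / (sxx * syy)
    return r2, slope


def best_threshold(corr, thresholds):
    """Return the candidate threshold whose network best fits a power law."""
    return max(thresholds, key=lambda t: scale_free_fit(degrees_at_threshold(corr, t))[0])
```

With very few distinct degrees the fit is trivially good (two points always give R² = 1), so the score only becomes informative for networks with many genes.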
302	Discovering driver somatic mutations, copy number alterations and methylation changes using Markov Chain Monte Carlo Yahya, Bokhari 11 December 2013 Nowadays we have a tremendous amount of genetic data that needs to be interpreted. Somatic mutations, copy number variations and methylation are examples of the genetic data we are dealing with. Discovering driver mutations from these combined data types is challenging: mutations are unpredictable and highly heterogeneous, which makes this goal hard to accomplish. Many methods have been proposed to solve the mystery of cancer genetics. In this project we work with the data types mentioned above and chose to use, and modify, an existing method based on Markov Chain Monte Carlo (MCMC). The method introduced two properties, coverage and exclusivity. We obtained the data from The Cancer Genome Atlas (TCGA). We applied the MCMC method to three cancer types: Glioblastoma Multiforme (GBM) with 214 patients, Breast Invasive Carcinoma (BRCA) with 474 patients and Colon Adenocarcinoma (COAD) with 233 patients. Bioinformatics Life Sciences
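The abstract names coverage and exclusivity but not the exact scoring function or sampler. A well-known way to combine the two is a Dendrix-style weight W(M) = 2|Γ(M)| - Σ_g |Γ(g)|, where Γ(g) is the set of patients with an alteration in gene g and Γ(M) is the union over a gene set M: W grows with coverage and shrinks when alterations overlap. The Python sketch below assumes that weight and a Metropolis-Hastings chain that swaps one gene at a time. Whether the thesis used this exact weight, proposal, or temperature parameter `beta` is not stated, and all names are hypothetical.

```python
import math
import random


def weight(gene_set, mutated):
    """Dendrix-style weight W(M) = 2*|coverage(M)| - sum of per-gene coverage.

    `mutated` maps each gene to the set of patient IDs altered in that gene.
    High coverage raises W; overlapping alterations (low exclusivity) lower it.
    """
    covered = set()
    total = 0
    for g in gene_set:
        covered |= mutated[g]
        total += len(mutated[g])
    return 2 * len(covered) - total


def mcmc_gene_set(mutated, k, iterations=10000, beta=1.0, seed=0):
    """Metropolis-Hastings over gene sets of size k; returns the best set visited."""
    genes = sorted(mutated)
    if not 0 < k < len(genes):
        raise ValueError("k must be between 1 and the number of genes minus 1")
    rng = random.Random(seed)
    current = set(rng.sample(genes, k))
    current_w = weight(current, mutated)
    best, best_w = set(current), current_w
    for _ in range(iterations):
        # Propose swapping one gene in the set for one outside it (a symmetric proposal).
        out_gene = rng.choice(sorted(current))
        in_gene = rng.choice([g for g in genes if g not in current])
        proposal = (current - {out_gene}) | {in_gene}
        proposal_w = weight(proposal, mutated)
        # Always accept improvements; accept worse sets with probability exp(beta * change).
        if proposal_w >= current_w or rng.random() < math.exp(beta * (proposal_w - current_w)):
            current, current_w = proposal, proposal_w
            if current_w > best_w:
                best, best_w = set(current), current_w
    return best, best_w
```

Because the swap proposal is symmetric, the acceptance rule depends only on the weight difference; a larger `beta` makes the chain greedier.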
303	Gene set enrichment and projection: A computational tool for knowledge discovery in transcriptomes Stamm, Karl D. 18 August 2016 <p> Explaining the mechanism behind a genetic disease involves two phases: collecting and analyzing data associated with the disease, then interpreting those data in the context of biological systems. The objective of this dissertation was to develop a method of integrating complementary datasets surrounding any single biological process, with the goal of presenting the response to a signal in terms of a set of downstream biological effects. This dissertation specifically tests the hypothesis that computational projection methods overlaid with domain expertise can direct research towards relevant systems-level signals underlying complex genetic disease. To this end, I developed a software algorithm named Geneset Enrichment and Projection Displays (GSEPD) that can visualize multidimensional gene expression data to identify the biologically relevant gene sets that are altered in response to a biological process. </p><p> This dissertation highlights a problem of data interpretation facing the medical research community, and shows how computational sciences can help. By bringing annotation and expression datasets together, a new analytical and software method was produced that helps unravel complicated experimental and biological data. </p><p> The dissertation presents four coauthored studies in which experts in their fields sought to assign functional significance to the results of a gene-centric experiment. GSEPD shows inherently high-dimensional data as a simple colored graph, using a subspace vector projection to calculate directly how closely each sample resembles the test conditions. The end-user medical researcher understands their data as a series of somewhat independent subsystems, and GSEPD provides a dimensionality reduction for high-throughput experiments of limited sample size. 
Gene Ontology analyses are accessible on a sample-to-sample level, and this work highlights not just the expected biological systems, but also many annotated results available in vast online databases.</p> Bioinformatics\|Computer science
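GSEPD's exact projection is not given in the abstract. One simple reading of "a subspace vector projection" that measures how each sample behaves relative to the test conditions is sketched below in Python, assuming that a sample's expression over one gene set is projected onto the line joining the centroids of two reference conditions. The coordinate `alpha` is 0 at the first condition and 1 at the second, and `distance` measures how far the sample lies off that axis. The function names are hypothetical; this is not GSEPD's code.

```python
import math


def centroid(samples):
    """Per-gene mean of equal-length expression vectors."""
    return [sum(col) / len(samples) for col in zip(*samples)]


def project_sample(x, group_a, group_b):
    """Project sample x onto the axis joining the centroids of two conditions.

    Returns (alpha, distance): alpha is 0 at condition A's centroid and 1 at
    condition B's; distance is the length of the component orthogonal to the axis.
    """
    a, b = centroid(group_a), centroid(group_b)
    axis = [bi - ai for ai, bi in zip(a, b)]
    axis_sq = sum(v * v for v in axis)
    if axis_sq == 0:
        raise ValueError("condition centroids coincide; the projection is undefined")
    offset = [xi - ai for xi, ai in zip(x, a)]
    alpha = sum(o * v for o, v in zip(offset, axis)) / axis_sq
    residual = [o - alpha * v for o, v in zip(offset, axis)]
    return alpha, math.sqrt(sum(r * r for r in residual))
```

Values of `alpha` below 0 or above 1 indicate a sample that lies beyond either reference condition along the axis, while a large `distance` flags a sample that resembles neither.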
304	Development and application of spectral databases and mathematical models in the study of plant natural products biosynthesis Johnson, Sean Robert 25 October 2016 <p> Plant natural products are useful for many different applications, including medicines, flavors and fragrances, and industrial uses. Two important aspects of plant natural products research are the identification of compounds in their source plants, and the characterization of the processes involved in their biosynthesis. To aid in the identification of plant natural products, we developed the Spektraris family of databases. These databases include high-performance liquid chromatography mass spectrometry data, and <sup>13</sup>C and <sup>1</sup>H nuclear magnetic resonance data, which are searchable through an online interface. The utility of Spektraris was validated by using it to identify compounds in plant extracts and as part of a workflow to elucidate the structure of a previously undescribed compound. </p><p> Mints have a long history of use as model systems for studying the processes of terpene natural products biosynthesis in specialized plant tissues. The mint family (Lamiaceae) synthesizes and stores volatile terpenes in glandular trichomes. Using a comparative transcriptomic approach, we identified differences in gene expression of monoterpene biosynthetic genes among mint species with different oil profiles. We also assembled the genome of a mint species, <i>Mentha longifolia</i>. The genome assembly will be valuable for future mint research. </p><p> To further investigate biosynthetic processes in mint, I developed a detailed mathematical model of the metabolism of peppermint glandular trichomes. The model incorporates multiple sources of data, including transcriptome data, metabolite data, enzymatic data from the peppermint literature, and previously developed models of plant metabolism. 
A new metabolic modeling software package, called YASMEnv, was created to facilitate construction of the model. Model-based simulated reaction knockouts using flux balance analysis revealed that fermentation may be important for ATP regeneration in secretory-phase glandular trichomes. Follow-up experiments confirmed high levels of alcohol dehydrogenase activity in isolated secretory-phase trichomes. Simulations also supported an essential role for ferredoxin and ferredoxin-NADP reductase. Transcriptome analysis revealed the presence of an isoform of ferredoxin in trichomes distinct from the one expressed in roots. The presence of a distinct ferredoxin isoform in trichomes supports the hypothesis that selection pressure for efficient natural products biosynthesis may also act on the enzymes of primary metabolism.</p>
305	Dinoflagellate genomic organization and phylogenetic marker discovery utilizing deep sequencing data Mendez, Gregory Scott 01 October 2016 <p> Dinoflagellates possess large genomes in which most genes are present in many copies. This has made studies of their genomic organization and phylogenetics challenging. Recent advances in sequencing technology have made deep sequencing of dinoflagellate transcriptomes feasible. This dissertation investigates the genomic organization of dinoflagellates to better understand the challenges of assembling dinoflagellate transcriptomic and genomic data from short-read sequencing methods, and develops new techniques that utilize deep sequencing data to identify orthologous genes across a diverse set of taxa. To better understand the genomic organization of dinoflagellates, a genomic cosmid clone of the tandemly repeated gene Alcohol Dehydrogenase (ADH) was sequenced and analyzed. The organization of this clone was found to be counter to prevailing hypotheses of genomic organization in dinoflagellates. Further, a new non-canonical splicing motif was described that could greatly improve the automated modeling and annotation of genomic data. A custom phylogenetic marker discovery pipeline, incorporating methods that leverage the statistical power of large data sets, was written. A case study on Stramenopiles was undertaken to test its utility in resolving relationships between known groups as well as the phylogenetic affinity of seven unknown taxa. The pipeline generated a set of 373 genes useful as phylogenetic markers that successfully resolved relationships among the major groups of Stramenopiles, and placed all unknown taxa on the tree with strong bootstrap support. This pipeline was then used to discover 668 genes useful as phylogenetic markers in dinoflagellates. Phylogenetic analysis of 58 dinoflagellates, using this set of markers, produced a phylogeny with good support for all branches. 
The <i>Suessiales</i> were found to be sister to the <i>Peridinales</i>. The <i>Prorocentrales</i> formed a monophyletic group with the <i>Dinophysiales</i> that was sister to the <i>Gonyaulacales</i>. The <i>Gymnodinales</i> were found to be paraphyletic, forming three monophyletic groups. While this pipeline was used to find phylogenetic markers, it will likely also be useful for finding orthologs of interest for other purposes, for the discovery of horizontally transferred genes, and for the separation of sequences in metagenomic data sets.</p> Biology\|Molecular biology\|Bioinformatics
306	Characterizing Gene Networks and RNA-Mediated Gene Regulation in Maize Unknown Date Controlling spatiotemporal gene expression patterns is a fundamental task for maize growth and development. With the emergence of massively parallel sequencing, genome-wide expression data production has reached an unprecedented level. This abundance of data has greatly facilitated maize research, but may not be amenable to traditional analysis techniques that were optimized for other data types. In one project, using publicly available data, a Gene Co-expression Network (GCN) was constructed and used for gene function prediction, candidate gene selection and improving understanding of regulatory pathways. To build an optimal GCN from plant RNA-Seq data, parameters for expression data normalization and network inference were evaluated. The effect of these two parameters, and of a rank aggregation strategy, on network performance was evaluated comprehensively using libraries from 1266 maize samples. Three normalization methods (VST, CPM, RPKM) and ten inference methods, including six correlation and four mutual information (MI) methods, were tested. The three normalization methods had very similar performance. For network inference, correlation methods performed better than MI methods for some genes. Increasing sample size also had a positive effect on GCN performance. Aggregating single networks together resulted in improved performance compared to single networks. In another project, a maize mutant, transgene reactivated 9-1 (tgr9-1) in the transcriptional gene silencing (TGS) pathway, was cloned. B-A translocation lines were used to map tgr9-1 to chromosome 3, and this result was consistent with molecular markers. To further locate tgr9-1, next-generation sequencing (NGS) combined with bulk segregant analysis was applied to the tgr9-1 mapping population. 
Coexpression analysis indicates that a maize dicer-like3a (Zmdcl3a) gene is a high-confidence candidate gene for tgr9. Zmdcl3a is involved in the RNA-directed DNA methylation (RdDM) pathway. This pathway is driven by two plant-specific DNA-dependent RNA polymerases, Polymerase IV (Pol IV) and Polymerase V (Pol V). Several kinds of non-coding RNAs are involved, including long single-stranded RNAs, double-stranded RNAs, and small interfering RNAs. The identification of tgr9-1 uncovered the role of non-coding RNAs in TGS and revealed the diversity of TGS pathways in maize. One primary approach to studying gene regulation is to study transcription factors (TFs), proteins that can bind to DNA sequences and regulate gene expression. Many TFs are master regulators in cells that contribute to tissue-specific and cell-type-specific gene expression patterns in eukaryotes. Little is known about tissue-specific gene regulation through TFs in maize. In a third project, a network approach was applied to elucidate gene regulatory networks (GRNs) in four tissues (leaf, root, shoot apical meristem and seed) in maize. We used the GENIE3 machine-learning algorithm combined with the large quantity of RNA-Seq expression data to construct four tissue-specific GRNs. Although many TFs were expressed across multiple tissues, a multi-tiered analysis predicted tissue-specific regulatory functions for many transcription factors. Some well-studied TFs emerged within the four tissue-specific GRNs, and the GRN predictions matched expectations based upon published results for many of these examples. The GRNs were also validated with ChIP-Seq datasets (KN1, FEA4, and O2). Key TFs were identified for each tissue and matched expectations for key regulators in each tissue, including GO enrichment and identity with known regulatory factors for that tissue. 
/ A Dissertation submitted to the Department of Biological Science in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Spring Semester 2018. / April 2, 2018. / GENE EXPRESSION, MAIZE, NETWORK, RDDM, SMALL RNA, TRANSCRIPTION FACTOR / Includes bibliographical references. / Karen M. McGinnis, Professor Directing Dissertation; Alan R. Lemmon, University Representative; Kathryn M. Jones, Committee Member; Brian P. Chadwick, Committee Member; Jonathan H. Dennis, Committee Member. Biology Bioinformatics Biology--Classification
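The GCN workflow in the maize abstract (normalize, infer edges, aggregate networks) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the dissertation's pipeline: it uses log2(CPM + 1) as the normalization, absolute Pearson correlation as the single inference method, and mean edge rank as the aggregation rule. The helper names `cpm`, `pearson`, `edge_ranks` and `aggregate` are hypothetical.

```python
import math


def cpm(counts):
    """log2(counts-per-million + 1), normalizing each sample (column) by its library size.

    `counts` is a genes x samples list of lists of raw read counts.
    """
    n_samples = len(counts[0])
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    return [[math.log2(row[j] / lib_sizes[j] * 1e6 + 1) for j in range(n_samples)]
            for row in counts]


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def edge_ranks(expr):
    """Rank every gene pair by |Pearson r| across samples (rank 1 = strongest edge)."""
    n = len(expr)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    ordered = sorted(pairs, key=lambda p: -abs(pearson(expr[p[0]], expr[p[1]])))
    return {pair: rank for rank, pair in enumerate(ordered, start=1)}


def aggregate(rankings):
    """Order edges by their mean rank across several networks (best first)."""
    edges = list(rankings[0])
    return sorted(edges, key=lambda e: sum(r[e] for r in rankings) / len(rankings))
```

Each ranking passed to `aggregate` is assumed to cover the same genes, for example rankings from different inference methods or sample subsets; averaging ranks rather than raw scores keeps methods with different score scales comparable.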
307	Discovery and interpretation of genetic variation with next‐generation sequencing technologies Quinlan, Aaron Ryan January 2008 Thesis advisor: Gabor T. Marth / Improvements in molecular and computational technologies have driven and will continue to drive advances in our understanding of genetic variation and its relationship to phenotypic diversity. Over the last three years, several new DNA sequencing technologies have been developed that greatly improve upon the cost and throughput of the capillary DNA sequencing technologies that were used to sequence the first human genome. The economy of these so‐called “next‐generation” technologies has enabled researchers to conduct genome‐wide studies in genetic variation that were previously intractable or too expensive. However, because the new technologies employ novel molecular techniques, the resulting sequence data is quite different from the capillary sequences to which the genomics field is accustomed. Moreover, the vast amounts of sequence data that these technologies produce present novel statistical and computational challenges to making even the simplest observations. The focus of my dissertation has been the development of novel computational and analytical methods that facilitate genome‐wide studies in genetic variation with traditional capillary sequencers and with new sequencing technologies. I present a novel method that produces more accurate error estimates for sequence data from one of these next‐generation sequencing technologies. I also present two studies that illustrate the utility of two such technologies for genome‐wide polymorphism discovery studies in Drosophila melanogaster and Caenorhabditis elegans. These studies accurately estimate the degree of genetic diversity in the fruit fly and nematode, respectively. I later describe how new sequencing approaches can be used to accelerate the mapping of causal genetic mutations in forward genetic screens. 
Lastly, I remark on where I believe these technologies will lead future studies in human genetic variation and describe their relevance to several of my future research interests. / Thesis (PhD) — Boston College, 2008. / Submitted to: Boston College. Graduate School of Arts and Sciences. / Discipline: Biology. genomics genetics bioinformatics
308	Enhancing Space and Time Efficiency of Genomics in Practice through Sophisticated Applications of the FM-Index Muggli, Martin D. 22 January 2019 <p> Genomic sequence data has become so easy to obtain that the computation to process it has become a bottleneck in the advancement of biological science. A data structure known as the FM-Index both compresses data and allows efficient querying, and thus can be used to implement more efficient processing methods. In this work we apply advanced formulations of the FM-Index to existing problems and show that our methods exceed the performance of competing tools. </p> Bioinformatics\|Computer science
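The abstract does not describe its "advanced formulations", but the core FM-index query, backward search, can be shown compactly. The Python sketch below is a textbook, uncompressed version: the Burrows-Wheeler transform is built by sorting rotations, the occurrence table is stored in full, and `count` narrows a range of sorted rotations one pattern character at a time, from right to left. Production FM-indexes build the BWT from a suffix array and sample or compress the occurrence table; the class and function names here are illustrative.

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (quadratic memory; demo only)."""
    text += "$"  # sentinel, assumed to sort before every symbol in the text
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)


class FMIndex:
    def __init__(self, text):
        self.last = bwt(text)
        # C[ch]: number of symbols in the text (plus sentinel) strictly smaller than ch.
        counts = {}
        for ch in self.last:
            counts[ch] = counts.get(ch, 0) + 1
        self.C, total = {}, 0
        for ch in sorted(counts):
            self.C[ch] = total
            total += counts[ch]
        # occ[ch][i]: occurrences of ch in last[:i] (stored in full, not sampled).
        self.occ = {ch: [0] * (len(self.last) + 1) for ch in counts}
        for i, sym in enumerate(self.last):
            for ch, table in self.occ.items():
                table[i + 1] = table[i] + (1 if ch == sym else 0)

    def count(self, pattern):
        """Number of occurrences of pattern in the text, via backward search."""
        lo, hi = 0, len(self.last)
        for ch in reversed(pattern):
            if ch not in self.C:
                return 0
            lo = self.C[ch] + self.occ[ch][lo]
            hi = self.C[ch] + self.occ[ch][hi]
            if lo >= hi:
                return 0
        return hi - lo
```

For example, `FMIndex("ACGTACGT").count("ACG")` counts both occurrences without scanning the text, because each step only consults the `C` and `occ` tables.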
309	Algorithms for Reconstruction of Gene Regulatory Networks from High-Throughput Gene Expression Data Deng, Wenping 15 February 2019 <p> Understanding gene interactions in complex living systems is one of the central tasks in systems biology. With the availability of microarray and RNA-Seq technologies, a multitude of gene expression datasets has been generated, enabling novel biological knowledge discovery through statistical analysis and reconstruction of gene regulatory networks (GRNs). Reconstruction of GRNs can reveal the interrelationships among genes and identify the hierarchies of genes and hubs in networks. The new algorithms I developed in this dissertation specifically focus on reconstructing GRNs with increased accuracy from microarray and RNA-Seq high-throughput gene expression data sets. </p><p> The first algorithm (Chapter 2) focuses on modeling the transcriptional regulatory relationships between transcription factors (TFs) and pathway genes. Multiple linear regression and its regularized versions, such as Ridge regression and LASSO, are common tools for modeling the relationship between predictor variables and a dependent variable. To handle outliers in gene expression data and the group effect of TFs in regulation, and to improve statistical efficiency, the Huber function is proposed as the loss function and the Berhu function as the penalty function for modeling the relationships between a pathway gene and many or all TFs. A proximal gradient descent algorithm was developed to solve the corresponding optimization problem. This algorithm is much faster than the general convex optimization solver CVX. This Huber-Berhu regression was then embedded into a partial least squares (PLS) framework to deal with the high dimensionality and multicollinearity of gene expression data. The results showed that this method can identify the true regulatory TFs for each pathway gene with high efficiency. 
</p><p> The second algorithm (Chapter 3) focuses on building multilayered hierarchical gene regulatory networks (ML-hGRNs). A backward elimination random forest (BWERF) algorithm was developed for constructing an ML-hGRN operating above a biological pathway or a biological process. The algorithm first divided construction of an ML-hGRN into multiple regression tasks, each involving a regression between a pathway gene and all TFs. Random forest models with backward elimination were used to determine the importance of each TF to a pathway gene. The importance of a TF to the whole pathway was then computed by aggregating the importance values of that TF across the individual pathway genes. Next, an expectation maximization algorithm was used to cut the TFs to form the first layer of direct regulatory relationships. The upper layers of the GRN were constructed in the same way, replacing the pathway genes with the newly cut TFs. Tests on both simulated and real gene expression data demonstrated the accuracy and efficiency of the method. </p><p> The third algorithm (Chapter 4) focuses on Joint Reconstruction of Multiple Gene Regulatory Networks (JRmGRN) using gene expression data from multiple tissues or conditions. The formulation assumes hub genes shared across different tissues or conditions. Under the framework of the Gaussian graphical model, the JRmGRN method constructs the GRNs by maximizing a penalized log-likelihood function. This was formulated as a convex optimization problem and solved with an alternating direction method of multipliers (ADMM) algorithm. On both simulated and real gene expression data, JRmGRN performed better than existing methods.</p>
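A minimal Python sketch of the first algorithm's core ingredients, assuming the standard definitions: the Huber loss, whose derivative is the residual clipped to [-delta, delta], and the Berhu (reverse Huber) penalty B_c(b) = |b| for |b| ≤ c and (b² + c²)/(2c) otherwise, which acts like the lasso on small coefficients and like ridge on large ones. Proximal gradient descent alternates a gradient step on the loss with the closed-form proximal operator of the penalty. The constant step size (one over the squared Frobenius norm of X), the parameter names and the helper functions are assumptions for illustration; the PLS embedding described in the abstract is omitted.

```python
import math


def huber_grad(r, delta):
    """Derivative of the Huber loss at residual r: r itself, clipped to [-delta, delta]."""
    return max(-delta, min(delta, r))


def berhu_prox(v, step, c):
    """Proximal operator of step * B_c, with B_c(b) = |b| if |b| <= c else (b*b + c*c) / (2c)."""
    if abs(v) <= c + step:
        # Lasso-like region: soft-thresholding.
        return math.copysign(max(abs(v) - step, 0.0), v)
    # Ridge-like region: proportional shrinkage.
    return v / (1.0 + step / c)


def huber_berhu(X, y, lam, delta=1.0, c=1.0, iterations=2000):
    """Proximal gradient descent for sum_i Huber(y_i - x_i . b) + lam * sum_j B_c(b_j)."""
    n, p = len(X), len(X[0])
    # The loss gradient is Lipschitz with constant at most ||X||_2^2 <= ||X||_F^2.
    lipschitz = sum(v * v for row in X for v in row)
    if lipschitz == 0:
        return [0.0] * p
    t = 1.0 / lipschitz
    beta = [0.0] * p
    for _ in range(iterations):
        residuals = [yi - sum(xij * bj for xij, bj in zip(row, beta)) for row, yi in zip(X, y)]
        psi = [huber_grad(r, delta) for r in residuals]
        # Gradient of the Huber loss with respect to beta is -X^T psi.
        grad = [-sum(X[i][j] * psi[i] for i in range(n)) for j in range(p)]
        beta = [berhu_prox(b - t * g, t * lam, c) for b, g in zip(beta, grad)]
    return beta
```

Because the Huber derivative is bounded by `delta`, a single outlying expression value can shift the gradient by at most `delta` times its row of `X`, which is the robustness to outliers that motivates the Huber loss.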
310	On Being 'Fitter, Happier, and More Productive': The Impact of Implicit Goals in Affective Personal Informatics Hollis, Victoria 16 February 2019 <p> Personal informatics (PI) technologies offer unprecedented opportunities to track and analyze complex data about ourselves. However, a concern is that these technologies can make normative assumptions about user goals and ideal outcomes. Such assumptions could be especially problematic for Affective PI, as there is a risk that technologies which reflect implicit goals for users to be more positive or to reduce stress could ironically decrease well-being (Mauss et al., 2011). Furthermore, users could actively avoid PI data if they feel unable to meet the demands of the system (Duval & Wicklund, 1972), running counter to the view that users will engage with data for beneficial insights (Kersten-van Dijk et al., 2017). We tested whether Affective PI systems that reflect goals for particular emotion outcomes (Improvement) have counterproductive effects on well-being and user engagement. These outcomes were contrasted against systems that instead reflect goals for Self-Knowledge, a top user interest (Hollis et al., 2018). Study 1 examined the effects of implicit goals in the context of an automatic stress detection and feedback system used during an exam. Participants either viewed instructions describing the system goal as stress reduction (Improvement), stress reduction with a relaxation strategy (Self-Efficacy), or accurate self-knowledge (Self-Knowledge), or saw no system goal (Control). Study 2 was a 21-day field study during which participants either used a manual emotion-tracking web app that emphasized a goal of increased positivity (Improvement) or a goal of accurate self-knowledge (Self-Knowledge), or only completed pre-post surveys (Control). For each study, participants completed measures of well-being and engagement with the experimental systems. 
Across both studies, there were no significant condition differences in well-being. However, participants in the Self-Knowledge conditions of both studies considered themselves significantly more successful at achieving the system goals. As a result, Self-Knowledge participants were also more engaged with the stress- and emotion-tracking systems. Unlike prior work showing the ironic effects of emotional positivity goals, we show that such negative impacts did not occur in this real-world context. We discuss the design implications of these results for self-tracking systems and deepen theoretical understanding of how users engage with PI.</p> Cognitive psychology\|Bioinformatics
