91 |
Genetic analysis of differentiation of T-helper lymphocytes
Wang, Qixin. 28 November 2013
<p> In the human immune system, T-helper cells are able to differentiate into two lymphocyte subsets: Th1 and Th2. The intracellular signaling pathways of differentiation form a dynamic regulation network by secreting distinctive types of cytokines, while differentiation is regulated by two major gene loci: T-bet and GATA-3. We developed a system dynamics model to simulate the differentiation and re-differentiation process of T-helper cells, based on gene expression levels of T-bet and GATA-3 during differentiation of these cells. The model has three ultimate states, and we conclude that cell differentiation potential exists as long as the system is at an unstable equilibrium point; the T-helper cells no longer have the potential to differentiate once the model reaches a stable equilibrium point. In addition, the time lag caused by expression of transcription factors can lead to oscillations in the secretion of cytokines during differentiation.</p>
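The link between equilibrium stability and differentiation potential can be illustrated with a minimal sketch. It assumes a generic, symmetric two-gene mutual-inhibition model with Hill-type repression and linear decay; the equations, the parameter values (`alpha`, `n`), and the function name `simulate` are illustrative assumptions and are not the model developed in the thesis.

```python
def simulate(tbet, gata3, alpha=4.0, n=2, dt=0.01, steps=20000):
    """Integrate a toy mutual-inhibition model with forward Euler.

    Assumed (hypothetical) dynamics:
        d(tbet)/dt  = alpha / (1 + gata3**n) - tbet
        d(gata3)/dt = alpha / (1 + tbet**n)  - gata3
    """
    for _ in range(steps):
        d_tbet = alpha / (1.0 + gata3 ** n) - tbet
        d_gata3 = alpha / (1.0 + tbet ** n) - gata3
        tbet += dt * d_tbet
        gata3 += dt * d_gata3
    return tbet, gata3


# Perfectly balanced start: the state stays on the symmetric equilibrium.
balanced = simulate(1.0, 1.0)

# Slight excess of T-bet: the state leaves the symmetric point and
# settles where one factor dominates the other.
biased = simulate(1.01, 1.0)

print("balanced start:", balanced)
print("biased start:  ", biased)
```

In this toy model the balanced equilibrium is unstable, so any small bias is amplified until one factor dominates. That mirrors, in simplified form, the abstract's statement that differentiation potential persists only while the system sits at an unstable equilibrium.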
|
92 |
Genetic analysis of 100 loci for coronary artery disease and associated phenotypes in a founder population
Paré, Guillaume. January 2006
Coronary artery disease (CAD) is a major health concern for both developed and developing countries. With a heritability estimated at around 50%, there is a strong rationale to better define the genetic contribution to CAD. To do so, my thesis project consists of the genetic analysis of over 1400 individuals from the Saguenay Lac St-Jean region using 1536 single nucleotide polymorphisms in 103 candidate genes for CAD. Using these data, suggestive linkage for HDL cholesterol was found on chromosome 1, and several significant associations were observed with lipoprotein-related traits as well as adiponectin plasma concentration, including two novel associations.
|
93 |
A data-intensive assessment of the species-abundance distribution
Baldridge, Elita. 13 May 2015
<p> The hollow curve species abundance distribution describes the pattern of large numbers of rare species and a small number of common species in a community. The species abundance distribution is one of the most ubiquitous patterns in nature and many models have been proposed to explain the mechanisms that generate this pattern. While there have been numerous comparisons of species abundance distribution models, most of these comparisons only use a small subset of available models, focus on a single ecosystem or taxonomic group, and fail to use the most appropriate statistical methods. This makes it difficult to draw general conclusions about which, if any, models provide the best empirical fit to species abundance distributions. I compiled data from the literature to significantly expand the available data for underrepresented taxonomic groups, and combined this with other macroecological datasets to perform comprehensive model comparisons for the species abundance distribution. A multiple model comparison showed that most available models for the species abundance distribution fit the data equivalently well across a diverse array of ecosystems and taxonomic groups. In addition, a targeted comparison of the species abundance distribution predicted by a major ecological theory, the unified neutral theory of biodiversity (neutral theory), against a non-neutral model of species abundance, demonstrates that it is difficult to distinguish between these two classes of theory based on patterns in the species abundance distribution. In concert, these studies call into question the potential for using the species abundance distribution to infer the processes operating in ecological systems.</p>
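A likelihood-based comparison of species abundance models can be sketched as follows. This is a minimal illustration, assuming just two candidate models (Fisher's log-series and a geometric distribution on abundances of 1 or more) compared by AIC; the abundance counts are made up, and the function names are hypothetical rather than taken from the thesis. Both fits assume a mean abundance greater than 1.

```python
import math


def logseries_logpmf(n, p):
    """Log-probability of abundance n (n >= 1) under Fisher's log-series."""
    return n * math.log(p) - math.log(n) - math.log(-math.log(1.0 - p))


def fit_logseries(abundances):
    """Maximum-likelihood p: solve mean = -p / ((1 - p) * ln(1 - p)) by bisection."""
    mean = sum(abundances) / len(abundances)
    lo, hi = 1e-12, 1.0 - 1e-12
    for _ in range(200):
        mid = (lo + hi) / 2.0
        model_mean = -mid / ((1.0 - mid) * math.log(1.0 - mid))
        if model_mean < mean:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def geometric_logpmf(n, q):
    """Log-probability of abundance n (n >= 1) under a geometric distribution."""
    return math.log(q) + (n - 1) * math.log(1.0 - q)


def fit_geometric(abundances):
    """Maximum-likelihood q for the geometric model is 1 / mean abundance."""
    return len(abundances) / sum(abundances)


def aic(log_likelihood, n_params=1):
    """Akaike information criterion; lower values indicate a better trade-off."""
    return 2 * n_params - 2 * log_likelihood


# Made-up abundances with many rare species and a few common ones.
ABUNDANCES = [1, 1, 1, 1, 1, 2, 2, 3, 4, 6, 10, 25, 60]

p_hat = fit_logseries(ABUNDANCES)
q_hat = fit_geometric(ABUNDANCES)
aic_logseries = aic(sum(logseries_logpmf(n, p_hat) for n in ABUNDANCES))
aic_geometric = aic(sum(geometric_logpmf(n, q_hat) for n in ABUNDANCES))
print(f"log-series AIC: {aic_logseries:.2f}, geometric AIC: {aic_geometric:.2f}")
```

When AIC values from competing models differ only slightly across many communities, the models fit about equally well. That is the situation the abstract reports for most available models.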
|
94 |
Assembling improved gene annotations in Clostridium acetobutylicum with RNA sequencing
Ralston, Matthew T. 25 March 2015
<p> The <i>C. acetobutylicum</i> genome annotation has been markedly improved by integrating bioinformatic predictions with RNA sequencing (RNA-seq) data. Samples were acquired under butanol, butyrate, and unstressed treatments across various growth stages to sample the transcriptome from a range of physiologically relevant conditions. Analysis of an initial assembly revealed errors due to technical and biological background signals, challenges with few solutions. Hurdles for RNA-seq transcriptome mapping research include optimizing library complexity and sequencing depth, yet most studies in bacteria report low depth and ignore the effect of ribosomal RNA abundance and other sources on the effective sequencing depth. </p><p> In this work, workflows were established to address type I and II errors associated with these challenges. An integrative analysis method was developed to combine motif predictions, single-nucleotide resolution sequencing depth, and library complexity to resolve these errors during assembly curation. This contextualization minimized false-positive errors and determined gene boundaries, in some cases, to the exact base pair of prior studies. Curation of the pSOL1 megaplasmid reconciled transcriptome assembly statistics with findings from <i>E. coli</i>. </p><p> The resulting annotation can be readily explored and downloaded through a customized genome browser, enabling future genomic and transcriptomic research in this organism. This work demonstrates the first strand-specific transcriptome assembly in a <i>Clostridium</i> organism. This method can improve the precision of transcript boundary estimates in bacterial transcriptome mapping studies.</p>
|
95 |
A web semantic for SBML merge
Thavappiragasam, Mathialakan. 05 November 2014
<p> The manipulation of XML-based relational representations of biological systems (BioML for Bioscience Markup Language) is a major challenge in systems biology. The needs of biologists, such as the translational study of biological systems, make this challenge greater given the material produced by next-generation sequencing. Among these BioMLs, SBML is the de facto standard file format for the storage and exchange of quantitative computational models in systems biology, supported by more than 257 software packages to date. The SBML standard is used by several biological systems modeling tools and several databases for representation and knowledge sharing. Several subsystems are integrated in order to construct a complex biological system. The issue of combining biological subsystems by merging SBML files has been addressed in several algorithms and tools. However, it remains impossible to build an automatic merge system that implements reusability, flexibility, scalability and sharability. Existing algorithms rely on name-based component comparisons. This approach does not allow integration into a Workflow Management System (WMS) to build pipelines, and it does not include the mapping of quantitative data needed for a good analysis of the biological system. In this work, we present a deterministic merging algorithm that is consumable in a given WMS engine and is designed using a novel biological model similarity algorithm. This model merging system integrates four submodules: SBMLChecker, SBMLAnot, SBMLCompare, and SBMLMerge, for model quality checking, annotation, comparison, and merging, respectively. The tools are integrated into the BioExtract server, leveraging iPlant collaborative resources to support users by allowing them to process large models and design workflows. These tools are also embedded into a user-friendly online version, SW4SBMLm.</p>
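The name-based component comparison that the abstract attributes to existing merge tools can be sketched as follows. This is a minimal illustration using Python's standard `xml.etree.ElementTree` on two tiny, made-up SBML Level 3 documents; the helper names `species_by_name` and `merge_species` are hypothetical, and the sketch is not the similarity algorithm or the SBMLMerge module described in the thesis.

```python
import xml.etree.ElementTree as ET

# Namespace of SBML Level 3 Version 1 core documents.
SBML_NS = {"sbml": "http://www.sbml.org/sbml/level3/version1/core"}

MODEL_A = """<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" level="3" version="1">
  <model id="a"><listOfSpecies>
    <species id="s1" name="Glucose" compartment="c"/>
    <species id="s2" name="ATP" compartment="c"/>
  </listOfSpecies></model></sbml>"""

MODEL_B = """<sbml xmlns="http://www.sbml.org/sbml/level3/version1/core" level="3" version="1">
  <model id="b"><listOfSpecies>
    <species id="x1" name="atp" compartment="c"/>
    <species id="x2" name="ADP" compartment="c"/>
  </listOfSpecies></model></sbml>"""


def species_by_name(sbml_text):
    """Map a normalized species name (or id, if unnamed) to its id."""
    root = ET.fromstring(sbml_text)
    species = {}
    for element in root.findall(".//sbml:species", SBML_NS):
        label = element.get("name") or element.get("id")
        species[label.strip().lower()] = element.get("id")
    return species


def merge_species(text_a, text_b):
    """Union the species of two models, treating equal names as one species."""
    species_a = species_by_name(text_a)
    species_b = species_by_name(text_b)
    shared = sorted(set(species_a) & set(species_b))
    merged = {**species_b, **species_a}  # on a name clash, model A's id wins
    return merged, shared


merged, shared = merge_species(MODEL_A, MODEL_B)
print("merged species:", sorted(merged))
print("matched by name:", shared)
```

Matching on names alone ignores quantitative attributes such as initial amounts or units. That is one of the limitations of this style of comparison that the abstract points out.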
|
96 |
Evaluating Transcriptional Regulation Through Multiple Lenses
Grubisic, Ivan. 19 November 2014
<p> Scientific research, especially within the space of translational research, is becoming increasingly multidisciplinary. With the development of each new method there is not only a need for a broad fundamental understanding of all the sciences and mathematics, but also an acute awareness of how errors propagate across methods, the limitations of the methods, and what contextual frameworks need to be used for the interpretation. The ability to understand transcriptional mechanisms and the effect that subtle changes in equilibrium may have on cell fate decisions has been greatly advanced by next generation sequencing and the tools subsequently developed for it. Bioinformatic techniques can serve multiple roles. They fundamentally provide a global picture of what is happening within an experimental condition, which can then be used either to confirm individual experimental findings as globally relevant or to discover new insights to inform the next iteration of experiments. Many of the experiments are done in <i>in vitro</i> conditions and therefore I have also focused energy on trying to understand how the mechanical inputs, largely not representative of what is occurring <i>in vivo,</i> from these methods affect transcriptional regulation. Much of this research requires the switching of frameworks to understand how results from disparate data sources can be correlated. I then applied a similar thought process to the development of <i>Lens.</i> Without an effective means of communicating research findings in an elegant and streamlined manner, we are slowing researchers' ability to learn new frameworks to efficiently approach the next research questions. In addition to better methods of communicating, we also need more modular and simplified tools that can be applied to various experimental systems to increase the speed and efficiency of translational research.</p>
|
97 |
Taxonomic assignment of gene sequences using hidden Markov models
Huang, Huanhua. 16 October 2014
<p> Our ability to study communities of microorganisms has been vastly improved by the development of high-throughput DNA sequencing technologies. These technologies, however, can only sequence short fragments of organisms' genomes at a time, which introduces many challenges in translating sequencing results to biological insight. The field of bioinformatics has arisen in part to address these problems. </p><p> One bioinformatics problem is assigning a genetic sequence to a source organism. It is now common to use high-throughput, short-read sequencing technologies, such as the Illumina MiSeq, to sequence the 16S rRNA gene from a community of microorganisms. Researchers use this information to generate a profile of the different microbial organisms (i.e., the taxonomic composition) present in an environmental sample. There are a number of approaches for assigning taxonomy to genetic sequences, but all suffer from problems with accuracy. The methods that have been most widely used are pairwise alignment methods, like BLAST, UCLUST, and RTAX, and probability-based methods, such as RDP and MOTHUR. These methods can classify microbial sequences with high accuracy when sequences are long (e.g., thousands of bases); however, accuracy decreases as sequences get shorter. Current high-throughput sequencing technologies generate sequences between about 150 and 500 bases in length. </p><p> In my thesis I have developed new software for assigning taxonomy to short DNA sequences using profile Hidden Markov Models (HMMs). HMMs have been applied in related areas, such as assigning biological functions to protein sequences, and I hypothesize that they might be useful for achieving high-accuracy taxonomic assignments from 16S rRNA gene sequences. My method builds models of 16S rRNA sequences for different taxonomic groups (kingdom, phylum, class, order, family, genus and species) using the Greengenes 16S rRNA database. 
Given a sequence with unknown taxonomic origin, my method searches each kingdom model to determine the most likely kingdom. It then searches all of the phyla within the highest scoring kingdom to determine the most likely phylum. This iterative process continues until the sequence cannot be assigned at a taxonomic level with a user-defined confidence level, or until a species-level assignment is made that meets the user-defined confidence level. </p><p> I next evaluated this method on both artificial and real microbial community data, with both qualitative and quantitative metrics of method performance. The evaluation results showed that in the qualitative analyses (specificity and sensitivity) my method is not as good as the previously existing methods. However, the accuracy in the quantitative analysis was better than some other pre-existing methods. This suggests that my current implementation is sensitive to false positives, but is better at classifying more sequences than the other methods. </p><p> I present my method, my evaluations, and suggestions for next steps that might improve the performance of my HMM-based taxonomic classifier.</p>
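The top-down search described in the abstract can be sketched as follows. This is a minimal illustration, assuming a nested-dictionary taxonomy and a caller-supplied `score` function that stands in for a profile HMM score. The confidence measure used here (the best score's share of the total at each level) and all names and numbers are hypothetical, not taken from the thesis software.

```python
def classify(sequence, taxonomy, score, min_confidence=0.8):
    """Descend the taxonomy level by level while the best taxon is confident.

    taxonomy: nested dict mapping each taxon name to a dict of its children.
    score:    function (sequence, taxon_name) -> non-negative score
              (a stand-in for a profile HMM score; assumed, not the real one).
    """
    lineage = []
    children = taxonomy
    while children:
        scores = {name: score(sequence, name) for name in children}
        total = sum(scores.values())
        best = max(scores, key=scores.get)
        confidence = scores[best] / total if total > 0 else 0.0
        if confidence < min_confidence:
            break  # stop at the last level that met the threshold
        lineage.append(best)
        children = children[best]
    return lineage


# Toy taxonomy and made-up scores, for illustration only.
TAXONOMY = {
    "Bacteria": {"Firmicutes": {}, "Proteobacteria": {}},
    "Archaea": {},
}
TOY_SCORES = {"Bacteria": 9.0, "Archaea": 1.0,
              "Firmicutes": 6.0, "Proteobacteria": 4.0}


def toy_score(sequence, taxon):
    return TOY_SCORES[taxon]


print(classify("ACGT", TAXONOMY, toy_score, min_confidence=0.8))
print(classify("ACGT", TAXONOMY, toy_score, min_confidence=0.5))
```

A stricter threshold stops the descent earlier and yields a shallower but more cautious assignment. A looser threshold assigns deeper levels at the cost of more potential false positives.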
|
98 |
Ambiguous fragment assignment for high-throughput sequencing experiments
Roberts, Adam. 28 May 2014
<p> As the cost of short-read, high-throughput DNA sequencing continues to fall rapidly, new uses for the technology have been developed aside from its original purpose in determining the genomes of various species. Many of these new experiments use the sequencer as a digital counter for measuring biological activities such as gene expression (RNA-Seq) or protein binding (ChIP-Seq). </p><p> A common problem faced in the analysis of these data is that of sequenced fragments that are "ambiguous", meaning they resemble multiple loci in a reference genome or other sequence. In early analyses, such ambiguous fragments were ignored or were assigned to loci using simple heuristics. However, statistical approaches using maximum likelihood estimation have been shown to greatly improve the accuracy of downstream analyses and have become widely adopted. Optimization based on the expectation-maximization (EM) algorithm is often employed by these methods to find the optimal sets of alignments, with frequent enhancements to the model. Nevertheless, these improvements increase complexity, which, along with an exponential growth in the size of sequencing datasets, has led to new computational challenges. </p><p> Herein, we present our model for ambiguous fragment assignment for RNA-Seq, which includes the most comprehensive set of parameters of any model introduced to date, as well as various methods we have explored for scaling our optimization procedure. These methods include the use of an online EM algorithm and a distributed EM solution implemented on the Spark cluster computing system. 
Our advances have resulted in the first efficient solution to the problem of fragment assignment in sequencing.</p><p> Furthermore, we are the first to create a fully generalized model for ambiguous fragment assignment and present details on how our method can provide solutions for additional high-throughput sequencing assays including ChIP-Seq, Allele-Specific Expression (ASE), and the detection of RNA-DNA Differences (RDDs) in RNA-Seq.</p>
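The core idea of EM-based fragment assignment can be sketched as follows. This is a minimal batch EM under strong simplifying assumptions: every compatible target is equally likely to generate a fragment apart from its abundance, with no length, bias, or error terms. The function name and the toy data are hypothetical. It is not the model, the online EM, or the Spark implementation presented in the thesis.

```python
def em_abundances(reads, targets, iterations=200):
    """Estimate relative target abundances from possibly ambiguous reads.

    reads:   list of tuples, each listing the targets a fragment is compatible with.
    targets: list of all target names.
    """
    theta = {t: 1.0 / len(targets) for t in targets}
    for _ in range(iterations):
        # E-step: split each fragment among its compatible targets
        # in proportion to the current abundance estimates.
        counts = {t: 0.0 for t in targets}
        for compatible in reads:
            norm = sum(theta[t] for t in compatible)
            for t in compatible:
                counts[t] += theta[t] / norm
        # M-step: renormalize the expected counts into abundances.
        total = sum(counts.values())
        theta = {t: counts[t] / total for t in targets}
    return theta


# Toy data: 3 fragments unique to A, 1 unique to B, 4 compatible with both.
READS = [("A",)] * 3 + [("B",)] + [("A", "B")] * 4
theta = em_abundances(READS, ["A", "B"])
print(theta)
```

In this toy case the ambiguous fragments end up shared in the same 3:1 proportion as the unique evidence. Ignoring them, or splitting them evenly, would not recover that proportion.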
|
99 |
Significant distinct branches of hierarchical trees: A framework for statistical analysis and applications to biological data
Sun, Guoli. 10 March 2015
<p> One of the most common goals of hierarchical clustering is finding those branches of a tree that form quantifiably distinct data subtypes. Achieving this goal in a statistically meaningful way requires (a) a measure of distinctness of a branch and (b) a test to determine the significance of the observed measure, applicable to all branches and across multiple scales of dissimilarity. </p><p> We formulate a method termed Tree Branches Evaluated Statistically for Tightness (TBEST) for identifying significantly distinct tree branches in hierarchical clusters. For each branch of the tree a measure of distinctness, or tightness, is defined as a rational function of heights, both of the branch and of its parent. A statistical procedure is then developed to determine the significance of the observed values of tightness. We test TBEST as a tool for tree-based data partitioning by applying it to five benchmark datasets, one of them synthetic and the other four each from a different area of biology. With each of the five datasets, there is a well-defined partition of the data into classes. In all test cases TBEST performs on par with or better than the existing techniques. </p><p> One dataset uses Cores Of Recurrent Events (CORE) to select features. CORE was developed with my participation in the course of this work. An R language implementation of the method is available from the Comprehensive R Archive Network: cran.r-project.org/web/packages/CORE/index.html. </p><p> Based on our benchmark analysis, TBEST is a tool of choice for detection of significantly distinct branches in hierarchical trees grown from biological data. An R language implementation of the method is available from the Comprehensive R Archive Network: cran.r-project.org/web/packages/TBEST/index.html.</p>
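A branch-level distinctness measure built from node heights can be sketched as follows. The abstract states only that tightness is a rational function of the heights of a branch and its parent. The specific form used here, `(parent_height - height) / parent_height`, is an assumption chosen for illustration, and the tree layout and function name are hypothetical. The significance test that TBEST applies to such values is not reproduced.

```python
def branch_tightness(node, parent_height=None, results=None):
    """Collect an illustrative tightness score for each internal, non-root branch.

    node: dict with "name", "height", and an optional list of "children".
    """
    if results is None:
        results = {}
    children = node.get("children", [])
    if children and parent_height:
        # Assumed measure: relative gap between parent and branch heights.
        results[node["name"]] = (parent_height - node["height"]) / parent_height
    for child in children:
        branch_tightness(child, node["height"], results)
    return results


# Toy dendrogram: one branch merges far below its parent, the other just below.
TREE = {
    "name": "root", "height": 10.0, "children": [
        {"name": "tight", "height": 1.0, "children": [
            {"name": "a", "height": 0.0}, {"name": "b", "height": 0.0}]},
        {"name": "loose", "height": 8.0, "children": [
            {"name": "c", "height": 0.0}, {"name": "d", "height": 0.0}]},
    ],
}

scores = branch_tightness(TREE)
print(scores)
```

Under this assumed measure, a branch whose members merge at a height far below that of its parent scores close to 1, while a branch that merges just below its parent scores close to 0.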
|
100 |
Phenotypic alterations in Borrelia burgdorferi and implications for the persister cell hypothesis
Caskey, John Russell. 13 February 2015
<p> Lyme disease is the most commonly reported vector-borne disease in the United States. The causative agent of Lyme disease, <i>Borrelia burgdorferi</i>, can alter gene expression to enable survival in a diverse set of conditions, including the tick midgut and the mammalian host. External environmental changes can trigger gene expression in <i>B. burgdorferi,</i> and the data demonstrate that <i>B. burgdorferi</i> can similarly alter gene expression as a stress response when it is treated with the antibiotic doxycycline. After treatment with the minimum bactericidal concentration (MBC) of doxycycline, a subpopulation can alter its phenotype to survive antibiotic treatment, and to host-adapt and successfully infect a mammalian host. Furthermore, our data demonstrate that if a population is treated with the MBC of doxycycline, a subpopulation may alter its phenotype to adopt a state of dormancy until the removal of the antibiotic, whereupon the subpopulation can regrow. We demonstrate that the chance of regrowth increases as a population reaches stationary phase, and we present a mathematical model for predicting the probability of a persister subpopulation within a larger population and for ascertaining the size of that subpopulation. To determine which genes are expressed as stress-response genes, RNA sequencing (RNASeq) analysis was performed on treated, untreated, and treated and regrown <i>B. burgdorferi</i> samples. The results suggest that the expression of several genes differed significantly in the treated group compared to the untreated group, and in the treated and regrown group compared to the untreated group, including BB_0786, which encodes a 50S ribosomal stress-response protein. The appendices discuss the theory and methods that were used in the RNASeq analysis, and provide an overview of the database that was created for the <i>B. burgdorferi</i> transcriptome. 
Additional studies may demonstrate further how persister subpopulations form, and which genes can trigger a persister state in <i>B. burgdorferi.</i></p>
|