261 |
Statistical Methods and Analyses for Next-generation Sequencing Data
Yu, Xiaoqing 02 September 2014
No description available.
|
262 |
The role of RNA-editing in viral mediated pathogenesis
Frederick, Madeline Rose 11 May 2018
No description available.
|
263 |
New Facets of the Balanced Minimal Evolution Polytope
Keefe, Logan 10 June 2016
No description available.
|
264 |
DISCOVERING DRIVER MUTATIONS IN BIOLOGICAL DATA
Bokhari, Yahya 01 January 2018
Background
Somatic mutations accumulate in human cells throughout life. Some have no adverse consequences, but others may lead to cancer. A cancer genome is typically unstable, and thus more mutations can accumulate in the DNA of cancer cells. An ongoing problem is to determine which mutations are drivers (they play a role in oncogenesis) and which are passengers (they do not). One way of addressing this question is to inspect somatic mutations in the DNA of cancer samples from a cohort of patients and detect patterns that differentiate driver from passenger mutations.
Results
We propose QuaDMutEx and QuaDMutNetEx. QuaDMutEx is a method that incorporates three novel elements: a new gene set penalty that includes non-linear penalization of multiple mutations in putative sets of driver genes, an ability to adjust the method to handle slow- and fast-evolving tumors, and a computationally efficient method for finding gene sets that minimize the penalty, through a combination of heuristic Monte Carlo optimization and exact binary quadratic programming.
QuaDMutNetEx is our proposed method that combines protein-protein interaction networks with the elements of QuaDMutEx. In particular, QuaDMutEx incorporates three novel elements: a non-linear penalization of multiple mutations in putative sets of driver genes, an ability to adjust the method to handle slow- and fast-evolving tumors, and a computationally efficient method for finding gene sets that minimize the penalty. In the new method, we incorporate a quadratic reward term that favors solution gene sets that are connected with respect to protein-protein interaction networks. Compared to existing methods, the proposed algorithm finds sets of putative driver genes that show higher coverage and lower excess coverage in eight sets of cancer samples coming from brain, ovarian, lung, and breast tumors.
Conclusions
Their superior ability to improve on both coverage and excess coverage across different types of cancer shows that QuaDMutEx and QuaDMutNetEx are tools that should be part of a state-of-the-art toolbox in the driver gene discovery pipeline. They can detect genes harboring rare driver mutations that may be missed by existing methods.
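The abstract compares gene sets by coverage and excess coverage but does not define them. The sketch below assumes the definitions commonly used in mutual-exclusivity methods: coverage counts patients with at least one mutated gene in the set, and excess coverage counts the additional mutated genes per covered patient. The function name, the input format, and the toy data are hypothetical, and this is not the QuaDMutEx penalty itself.

```python
from typing import Dict, Set, Tuple


def coverage_and_excess(mutations: Dict[str, Set[str]],
                        gene_set: Set[str]) -> Tuple[int, int]:
    """Return (coverage, excess coverage) of gene_set over a patient cohort.

    mutations maps a patient ID to the set of genes mutated in that patient
    (an assumed toy format, not the thesis's input format).
    """
    covered = 0
    excess = 0
    for mutated_genes in mutations.values():
        hits = len(mutated_genes & gene_set)
        if hits > 0:
            covered += 1        # patient is explained by the gene set
            excess += hits - 1  # every hit beyond the first is "excess"
    return covered, excess


# Made-up cohort of four patients.
toy_cohort = {
    "p1": {"TP53"},
    "p2": {"TP53", "EGFR"},
    "p3": {"PTEN"},
    "p4": {"BRCA1"},
}
result = coverage_and_excess(toy_cohort, {"TP53", "EGFR", "PTEN"})
```

Under these assumed definitions, a good candidate driver set has high coverage and low excess coverage, which is the direction of improvement the abstract reports.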
|
265 |
Clustering biological data using a hybrid approach: Composition of clusterings from different features
Keller, Jens January 2008
Clustering of data is a well-researched topic in computer science. Many approaches have been designed for different tasks. In biology many of these approaches are hierarchical, and the result is usually represented as a dendrogram, e.g. a phylogenetic tree. However, many non-hierarchical clustering algorithms are also well established in biology. The approach in this thesis is based on such common algorithms. The algorithm implemented as part of this thesis uses a non-hierarchical graph clustering algorithm to compute a hierarchical clustering in a top-down fashion. It performs the graph clustering iteratively, taking a previously computed cluster as its input set. The innovation is that each step focuses on a different feature of the data and clusters the data according to that feature. Common hierarchical approaches in biology cluster, for example, a set of genes according to the similarity of their sequences. The clustering then reflects a partitioning of the genes according to their sequence similarity. The approach introduced in this thesis uses many features of the same objects. These features can vary; in biology they include, for instance, similarities of sequences, of gene expression, or of motif occurrences in the promoter region. As part of this thesis, not only was the algorithm itself implemented and evaluated, but also a complete software package that provides a graphical user interface. The software was implemented as a framework providing the basic functionality, with the algorithm as a plug-in extending the framework. The software is meant to be extended in the future, integrating a set of algorithms and analysis tools related to the process of clustering and analysing data not necessarily related to biology.

The thesis deals with topics in biology, data mining and software engineering and is divided into six chapters. The first chapter gives an introduction to the task and the biological background. It gives an overview of common clustering approaches and explains the differences between them. Chapter two presents the idea behind the new clustering approach and points out differences and similarities between it and common clustering approaches. The third chapter discusses the software, including the algorithm. It illustrates the architecture and analyses the clustering algorithm. After implementation the software was evaluated; this is described in the fourth chapter, which points out observations made when using the new algorithm. This chapter also discusses differences and similarities to related clustering algorithms and software. The thesis ends with the last two chapters, namely conclusions and suggestions for future work. Readers who are interested in repeating the experiments made as part of this thesis can contact the author via e-mail to obtain the relevant evaluation data, scripts or source code.
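The top-down scheme described in this abstract can be sketched as a recursion that clusters on one feature per level. The abstract does not name the graph clustering algorithm, so the sketch assumes a simple stand-in: connected components of a graph whose edges join pairs judged similar. All names, thresholds, and data below are hypothetical.

```python
from typing import Callable, List, Sequence

Similar = Callable[[str, str], bool]


def threshold_components(items: Sequence[str], similar: Similar) -> List[List[str]]:
    """Stand-in graph clustering: connected components of the similarity graph."""
    clusters = []
    unvisited = list(items)
    while unvisited:
        seed = unvisited.pop(0)
        component, frontier = [seed], [seed]
        while frontier:
            current = frontier.pop()
            neighbours = [x for x in unvisited if similar(current, x)]
            for x in neighbours:
                unvisited.remove(x)
            component.extend(neighbours)
            frontier.extend(neighbours)
        clusters.append(component)
    return clusters


def cluster_top_down(items: Sequence[str], feature_tests: Sequence[Similar]) -> list:
    """Split items on the first feature, then recurse into each cluster
    using the next feature. Leaves are plain lists of item names."""
    if not feature_tests or len(items) <= 1:
        return list(items)
    first, rest = feature_tests[0], feature_tests[1:]
    return [cluster_top_down(c, rest) for c in threshold_components(items, first)]


# Made-up features per gene: (sequence score, expression level).
genes = {"g1": (0.1, 5.0), "g2": (0.2, 9.0), "g3": (0.9, 5.1), "g4": (0.15, 5.2)}


def seq_similar(a: str, b: str) -> bool:
    return abs(genes[a][0] - genes[b][0]) < 0.2


def expr_similar(a: str, b: str) -> bool:
    return abs(genes[a][1] - genes[b][1]) < 1.0


tree = cluster_top_down(["g1", "g2", "g3", "g4"], [seq_similar, expr_similar])
```

In this toy run the first level groups genes by sequence score and the second level splits each group by expression level, so each depth of the resulting tree reflects a different feature.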
|
266 |
Phylogenetic and Comparative Genomic Analyses of Heterocystous Cyanobacteria
Howard-Azzeh, Joseph Mohammad 24 October 2014
Heterocyst-forming cyanobacterial species encompass the orders Nostocales and Stigonematales. These orders are currently differentiated based solely upon morphological characteristics (such as the formation of true or false branches), which may be unrelated to phylogeny. Thus, these bacteria do not form distinct monophyletic groups in the 16S rRNA tree, and no reliable molecular markers have yet been identified that allow species of these two orders to be distinguished from each other or from other organisms. Using published genome sequences for these species, we have investigated the relationship of species from these two orders. We describe here detailed phylogenetic analyses based on concatenated protein sequences for 45 proteins from 48 cyanobacterial species/strains whose genomes are now available. In addition, we have performed comprehensive comparative genomic analyses on eight available Nostocales and Stigonematales genomes to identify conserved signature indels (CSIs) and conserved signature proteins (CSPs) that are specifically present in all Nostocales/Stigonematales or any of their subclades. These analyses have yielded >100 CSIs and CSPs that are specific for distinct coherent clades of Nostocales/Stigonematales. Seventeen of these CSIs and twelve CSPs are present in all sequenced Nostocales and Stigonematales, supporting a distinct relationship of the species from these two orders of heterocyst-forming bacteria. Fifty-four CSIs and forty-one CSPs are specific for different subclades of Nostocales, and many others are diagnostic of individual species/strains. The newly identified CSIs and CSPs, which are specific for different subclades of Nostocales/Stigonematales, may provide novel means for identifying previously unknown members of these orders, for assigning unsequenced members of heterocystous cyanobacteria to the identified subclades, and for detecting individual strains from environmental samples by employing CSIs/CSPs as diagnostic tools. / Master of Science (MSc)
|
267 |
A Parallel, High-Throughput Framework for Discovery of DNA Motifs
Kurz, Kyle W. 27 July 2010
No description available.
|
268 |
Lateral Gene Transfer in Operons and Its Effects on Neighbouring Genes
Pasha, Asher 10 1900
Prokaryotes evolve, in part, by lateral gene transfer (LGT). This transfer of genetic material is likely important in the evolution of operons, groups of genes that are transcribed as a single mRNA. Genes that are transferred may then be integrated into genomes by homologous recombination. In this thesis, it was proposed that homologous recombination is the mechanism of integration of laterally transferred genes into operons. To investigate this proposal, a phylogenetic tree of Bacillus was inferred using DNA sequence alignments. LGT was inferred using a parsimony algorithm, and operons were inferred using OperonDB. Homologous recombination breakpoints were identified by permutation tests, GENECONV and the maximum chi-square algorithm. The results indicate that there is evidence for integration of functionally annotated genes into operons by homologous recombination. Several laterally transferred genes have recombination breakpoints before the start codon or after the stop codon. It was also proposed in this thesis that LGT causes an increase in the rate of evolution of genes that are neighbours of laterally transferred genes. To investigate this proposal, genes that are neighbours of laterally transferred genes in Bacillus were identified. These genes were classified as upstream or downstream of the LGT event. Genes that are not neighbours of laterally transferred genes were also identified as a control. Selection and the rate of evolution were studied using maximum likelihood models implemented in CodeML of PAML. Genes under positive selection were inferred using likelihood ratio tests. The results indicate that only a few neighbouring genes were under positive selection, and the rate of evolution of the neighbouring genes was slightly higher than that of the non-neighbouring genes. The high rates of evolution of the neighbouring genes are likely due to relaxed selection on the neighbouring genes. / Master of Biological Science (MBioSci)
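The likelihood ratio tests mentioned in this abstract compare the log-likelihoods of a null model and a more general alternative. The sketch below shows the generic calculation only, assuming the test statistic is referred to a chi-square distribution with degrees of freedom equal to the difference in free parameters. The log-likelihood values are made up, the function names are hypothetical, and the closed-form tail probabilities cover just 1 or 2 degrees of freedom to stay within the standard library.

```python
import math


def chi2_sf(stat: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable (df = 1 or 2 only)."""
    if stat <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch only supports df = 1 or df = 2")


def likelihood_ratio_test(lnl_null: float, lnl_alt: float, df: int):
    """Return (test statistic, p-value) for nested models.

    lnl_null and lnl_alt are maximised log-likelihoods, e.g. values read
    from the output of a maximum likelihood program.
    """
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf(stat, df)


# Hypothetical log-likelihoods for one gene under a null and an alternative model.
stat, p_value = likelihood_ratio_test(lnl_null=-1234.5, lnl_alt=-1230.0, df=2)
```

A small p-value here would favour the alternative model for that gene; when many genes are tested, a multiple-testing correction is usually applied on top of this.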
|
269 |
Evaluating PrediXcan’s Ability to Predict Differential Expression Between Alcoholics and Non-Alcoholics
Drake, John E, Jr 01 January 2019
PrediXcan is a recent software tool for the imputation of gene expression from genotype data alone. Using an overlapping set of transcriptome datasets from postmortem brain tissues of donors with alcohol use disorder and neurotypical controls, which were generated on two different platforms (Arraystar and Affymetrix), and an additional unrelated transcriptome dataset from lung tissue, we sought to evaluate PrediXcan’s ability to impute gene expression and identify differentially expressed genes. On the Arraystar platform, 1.3% of matched genes between the measured and imputed expression had a Pearson correlation ≥ 0.5. Our attempt to replicate this finding using the expression data from the Affymetrix platform also led to a similarly poor outcome (2.7%). Our third attempt, using the transcriptome data from lung tissue, produced similar results (1.1%), but performance improved markedly after filtering out genes with a low predicted R², a model metric provided by the PrediXcan authors. For example, filtering out genes with a predicted R² below 0.6 left 16 genes and a Pearson correlation of 0.365 between the measured and imputed expression. We were unable to reproduce similar performance gains by filtering the Arraystar or Affymetrix alcohol use disorder datasets. Given that PrediXcan can impute only a narrow portion of the transcriptome, which is further reduced significantly by filtering, we believe caution is warranted in interpreting results derived from PrediXcan.
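The headline figures in this abstract are percentages of matched genes whose measured and imputed expression reach a Pearson correlation of at least 0.5. A minimal sketch of that calculation follows, assuming each dataset is a mapping from gene name to a list of per-sample expression values in the same sample order. The gene names, values, and function names are hypothetical and do not come from PrediXcan or the thesis.

```python
import math
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation; returns NaN if either vector is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def fraction_well_imputed(measured: Dict[str, List[float]],
                          imputed: Dict[str, List[float]],
                          threshold: float = 0.5) -> Tuple[float, Dict[str, float]]:
    """Fraction of genes present in both datasets with correlation >= threshold."""
    shared = sorted(set(measured) & set(imputed))
    if not shared:
        return 0.0, {}
    correlations = {g: pearson(measured[g], imputed[g]) for g in shared}
    # NaN compares as False, so constant genes never count as passing.
    passing = [g for g, r in correlations.items() if r >= threshold]
    return len(passing) / len(shared), correlations


# Made-up expression values for four samples.
measured = {"GENE_A": [1, 2, 3, 4], "GENE_B": [1, 2, 3, 4], "GENE_C": [2, 2, 2, 2]}
imputed = {"GENE_A": [2, 4, 6, 8], "GENE_B": [4, 3, 2, 1], "GENE_D": [1, 0, 1, 0]}
fraction, correlations = fraction_well_imputed(measured, imputed)
```

Only genes present in both inputs are scored, which mirrors the abstract's restriction to matched genes; filtering on a model metric such as predicted R² would shrink that shared set further before the fraction is taken.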
|
270 |
Using sequence similarity to predict the function of biological sequences.
Jones, Craig E. January 2007
In this thesis we examine issues surrounding the development of software that predicts the function of biological sequences using sequence similarity. There is a pressing need for high-throughput software that can annotate protein or DNA sequences with functional information, owing to the exponential growth in sequence data. In Chapter 1 we briefly introduce the molecular biology and bioinformatics that is assumed knowledge, and the objectives for the research presented here. In Chapter 2 we discuss the development of a method of comparing competing designs for software annotators, using precision and recall metrics, and a benchmark method referred to as Best BLAST. From this we conclude that data-mining approaches may be useful in the development of annotation algorithms, and that any new annotator should demonstrate its effectiveness against other approaches before being adopted. As any new annotator that utilises sequence similarity to predict the function of a sequence will rely on the quality of existing annotations, we examine the error rate of existing sequence annotations in Chapter 3. We develop a new method that allows for the estimation of annotation error rates. This involves adding annotation errors at known rates to a sample of reference sequence annotations that were found to be similar to query sequences. The precision at each error rate treatment is determined, and linear regression is then used to find the error rate at estimated values for the maximum precision possible, given assumptions concerning the impact of semantic variation on precision. We found that the error rate of curated annotations based on sequence similarity (ISS) is far higher than that of annotations that use other forms of evidence (49% versus 13-18%, respectively). As such we conclude that software annotators should avoid basing predictions on ISS annotations where possible.
In Chapter 4 we detail the development of GOSLING, Gene Ontology Similarity Listing using Information Graphs, a software annotator with a design based on the principles discovered in previous chapters. Chapter 5 concludes the thesis by discussing the major findings from the research presented. / http://library.adelaide.edu.au/cgi-bin/Pwebrecon.cgi?BBID=1280882 / Thesis (M.Sc.(M&CS)) -- School of Computer Science, 2007
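The precision and recall metrics used to compare annotators in Chapter 2 can be sketched for a single query sequence as set overlap between predicted and reference annotation terms. This assumes exact term matching, which ignores the semantic variation between related terms that the abstract mentions. The term labels and function name below are hypothetical placeholders.

```python
from typing import Set, Tuple


def precision_recall(predicted: Set[str], reference: Set[str]) -> Tuple[float, float]:
    """Precision and recall of predicted annotation terms against a reference set."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Placeholder term labels for one query sequence.
predicted_terms = {"term_a", "term_b", "term_c"}
reference_terms = {"term_a", "term_b", "term_d", "term_e"}
precision, recall = precision_recall(predicted_terms, reference_terms)
```

Averaging these two values over many query sequences gives one way to rank competing annotators against a baseline, under the exact-match assumption stated above.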
|