271 |
Using sequence similarity to predict the function of biological sequences.Jones, Craig E. January 2007 (has links)
In this thesis we examine issues surrounding the development of software that predicts the function of biological sequences using sequence similarity. There is a pressing need for high throughput software that can annotate protein or DNA sequences with functional information due to the exponential growth in sequence data. In Chapter 1 we briefly introduce the molecular biology and bioinformatics that is assumed knowledge, and the objectives for the research presented here. In Chapter 2 we discuss the development of a method of comparing competing designs for software annotators, using precision and recall metrics, and a benchmark method referred to as Best BLAST. From this we conclude that data-mining approaches may be useful in the development of annotation algorithms, and that any new annotator should demonstrate its effectiveness against other approaches before being adopted. As any new annotator that utilises sequence similarity to predict the function of a sequence will rely on the quality of existing annotations, we examine the error rate of existing sequence annotations in Chapter 3. We develop a new method that allows for the estimation of annotation error rates. This involves adding annotation errors at known rates to a sample of reference sequence annotations that was found to be similar to query sequences. The precision at each error rate treatment is determined, and linear regression then used to find the error rate at estimated values for the maximum precision possible given assumptions concerning the impact of semantic variation on precision. We found that the error rate of curated annotations based on sequence similarity (ISS) is far higher than those that use other forms of evidence (49% versus 13-18%, respectively). As such we conclude that software annotators should avoid basing predictions on ISS annotations where possible. In Chapter 4 we detail the development of GOSLING, Gene Ontology Similarity Listing using Information Graphs, a software annotator with a design based on the principles discovered in previous chapters. Chapter 5 concludes the thesis by discussing the major findings from the research presented. / http://library.adelaide.edu.au/cgi-bin/Pwebrecon.cgi?BBID=1280882 / Thesis (M.Sc.(M&CS)) -- School of Computer Science, 2007
|
272 |
Grid and High-Performance Computing for Applied BioinformaticsAndrade, Jorge January 2007 (has links)
The beginning of the twenty-first century has been characterized by an explosion of biological information. The avalanche of data grows daily and arises as a consequence of advances in the fields of molecular biology and genomics and proteomics. The challenge for nowadays biologist lies in the de-codification of this huge and complex data, in order to achieve a better understanding of how our genes shape who we are, how our genome evolved, and how we function. Without the annotation and data mining, the information provided by for example high throughput genomic sequencing projects is not very useful. Bioinformatics is the application of computer science and technology to the management and analysis of biological data, in an effort to address biological questions. The work presented in this thesis has focused on the use of Grid and High Performance Computing for solving computationally expensive bioinformatics tasks, where, due to the very large amount of available data and the complexity of the tasks, new solutions are required for efficient data analysis and interpretation. Three major research topics are addressed; First, the use of grids for distributing the execution of sequence based proteomic analysis, its application in optimal epitope selection and in a proteome-wide effort to map the linear epitopes in the human proteome. Second, the application of grid technology in genetic association studies, which enabled the analysis of thousand of simulated genotypes, and finally the development and application of a economic based model for grid-job scheduling and resource administration. The applications of the grid based technology developed in the present investigation, results in successfully tagging and linking chromosomes regions in Alzheimer disease, proteome-wide mapping of the linear epitopes, and the development of a Market-Based Resource Allocation in Grid for Scientific Applications. / QC 20100622
|
273 |
Revealing Microbial Responses to Environmental Dynamics: Developing Methods for Analysis and Visualization of Complex Sequence Datasets.January 2017 (has links)
abstract: The greatest barrier to understanding how life interacts with its environment is the complexity in which biology operates. In this work, I present experimental designs, analysis methods, and visualization techniques to overcome the challenges of deciphering complex biological datasets. First, I examine an iron limitation transcriptome of Synechocystis sp. PCC 6803 using a new methodology. Until now, iron limitation in experiments of Synechocystis sp. PCC 6803 gene expression has been achieved through media chelation. Notably, chelation also reduces the bioavailability of other metals, whereas naturally occurring low iron settings likely result from a lack of iron influx and not as a result of chelation. The overall metabolic trends of previous studies are well-characterized but within those trends is significant variability in single gene expression responses. I compare previous transcriptomics analyses with our protocol that limits the addition of bioavailable iron to growth media to identify consistent gene expression signals resulting from iron limitation. Second, I describe a novel method of improving the reliability of centroid-linkage clustering results. The size and complexity of modern sequencing datasets often prohibit constructing distance matrices, which prevents the use of many common clustering algorithms. Centroid-linkage circumvents the need for a distance matrix, but has the adverse effect of producing input-order dependent results. In this chapter, I describe a method of cluster edge counting across iterated centroid-linkage results and reconstructing aggregate clusters from a ranked edge list without a distance matrix and input-order dependence. Finally, I introduce dendritic heat maps, a new figure type that visualizes heat map responses through expanding and contracting sequence clustering specificities. Heat maps are useful for comparing data across a range of possible states. However, data binning is sensitive to clustering cutoffs which are often arbitrarily introduced by researchers and can substantially change the heat map response of any single data point. With an understanding of how the architectural elements of dendrograms and heat maps affect data visualization, I have integrated their salient features to create a figure type aimed at viewing multiple levels of clustering cutoffs, allowing researchers to better understand the effects of environment on metabolism or phylogenetic lineages. / Dissertation/Thesis / Chapter 2 Excel file of transcriptome responses / Chapter 2 Perl scripts / Chapter 3 Cluster Aggregation Perl script / Chapter 4 Example of the top-down clustering method used to construct dendritic heat maps / Chapter 4Perl scripts and dendritic heat map images / Chapter 4 Perl scripts and dendritic heat map images / Doctoral Dissertation Geological Sciences 2017
|
274 |
Method for recognizing local descriptors of protein structures using Hidden Markov ModelsBjörkholm, Patrik January 2008 (has links)
Being able to predict the sequence-structure relationship in proteins will extend the scope of many bioinformatics tools relying on structure information. Here we use Hidden Markov models (HMM) to recognize and pinpoint the location in target sequences of local structural motifs (local descriptors of protein structure, LDPS) These substructures are composed of three or more segments of amino acid backbone structures that are in proximity with each other in space but not necessarily along the amino acid sequence. We were able to align descriptors to their proper locations in 41.1% of the cases when using models solely built from amino acid information. Using models that also incorporated secondary structure information, we were able to assign 57.8% of the local descriptors to their proper location. Further enhancements in performance was yielded when threading a profile through the Hidden Markov models together with the secondary structure, with this material we were able assign 58,5% of the descriptors to their proper locations. Hidden Markov models were shown to be able to locate LDPS in target sequences, the performance accuracy increases when secondary structure and the profile for the target sequence were used in the models.
|
275 |
Inferring Clonal Heterogeneity in Chronic Lymphocytic Leukemia From High-Throughput DataZucker, Mark Raymond 11 July 2019 (has links)
No description available.
|
276 |
Iterative Visual Analytics and its Applications in BioinformaticsYou, Qian 20 March 2012 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / You, Qian. Ph.D., Purdue University, December, 2010. Iterative Visual Analytics and its Applications in Bioinformatics. Major Professors: Shiaofen Fang and Luo
Si.
Visual Analytics is a new and developing field that addresses the challenges of
knowledge discoveries from the massive amount of available data. It facilitates
humans‘ reasoning capabilities with interactive visual interfaces for exploratory data analysis tasks, where automatic data mining methods fall short due to the lack of the pre-defined objective functions. Analyzing the large volume of data sets for biological discoveries raises similar challenges. The domain knowledge of biologists and bioinformaticians is critical in the hypothesis-driven discovery tasks. Yet developing visual analytics frameworks for bioinformatic applications is
still in its infancy.
In this dissertation, we propose a general visual analytics framework – Iterative Visual Analytics (IVA) – to address some of the challenges in the current research. The framework consists of three progressive steps to explore data sets with the increased complexity: Terrain Surface Multi-dimensional Data
Visualization, a new multi-dimensional technique that highlights the global
patterns from the profile of a large scale network. It can lead users‘ attention to characteristic regions for discovering otherwise hidden knowledge; Correlative Multi-level Terrain Surface Visualization, a new visual platform that provides the overview and boosts the major signals of the numeric correlations among nodes in interconnected networks of different contexts. It enables users to gain
critical insights and perform data analytical tasks in the context of multiple correlated networks; and the Iterative Visual Refinement Model, an innovative process that treats users‘ perceptions as the objective functions, and guides the users to form the optimal hypothesis by improving the desired visual patterns. It is a formalized model for interactive explorations to converge to optimal solutions.
We also showcase our approach with bio-molecular data sets and demonstrate
its effectiveness in several biomarker discovery applications.
|
277 |
Automated Alignment of RNA 3D StructuresRahrig, Ryan Robert 16 August 2010 (has links)
No description available.
|
278 |
Phosphoproteomic Characterization of Systems-Wide Differential Signaling Induced by Small Molecule PP2A ActivationWiredja, Danica 02 February 2018 (has links)
No description available.
|
279 |
A Scalable, Memory Efficient Multicore TEIRESIAS ImplementationNau, Lee J. 26 July 2011 (has links)
No description available.
|
280 |
Elucidating the Effects of Developmental Pyrethroid Pesticide Exposure in Mouse Brain Using a Multiomics ApproachCurtis, Melissa Ann January 2021 (has links)
No description available.
|
Page generated in 0.0833 seconds