Global ETD Search

661	Structure-based Subfamily Classification of Homeodomains Tsai, Jennifer Ming-Jiun 30 July 2008 (has links) Eukaryotic DNA-binding proteins mediate many important steps in embryonic development and gene regulation. Consequently, a better understanding of these proteins would hopefully allow a more complete picture of gene regulation to be determined. In this study, a structure-based subfamily classification of the homeodomain family of DNA-binding proteins was undertaken in order to determine whether sub-groupings of a protein family could be identified that corresponded to differences in specific function, and identification of subfamily-determining residues was performed in order to gain some insight on functional differences via analysis of the residue properties. Subfamilies appear to have different specific DNA binding properties, according to DNA profiles obtained from TRANSFAC [1] and other sources in the literature. Subfamily-specific residues appear to be frequently associated with the protein-DNA interface and may influence DNA binding via interactions with the DNA phosphate backbone; these residues form a conserved profile uniquely identifying each subfamily. subfamily classification bioinformatics homeodomains 0715
663	A symmetry preserving singular value decomposition Shah, Mili January 2007 (has links) This thesis concentrates on the development, analysis, implementation, and application of a symmetry preserving singular value decomposition (SPSVD). This new factorization enhances the singular value decomposition (SVD)---a powerful method for calculating a low rank approximation to a large data set---by producing the best symmetric low rank approximation to a matrix with respect to the Frobenius norm and matrix-2 norm. Calculating an SPSVD is a two-step process. In the first step, a matrix representation for the symmetry of a given data set must be determined. This process is presented as a novel iterative reweighting method: a scheme which is rapidly convergent in practice and seems to be extremely effective in ignoring outliers of the data. In the second step, the best approximation that maintains the symmetry calculated from the first step is computed. This approximation is designated the SPSVD of the data set. In many situations, the SPSVD needs efficient updating. For instance, if new data is given, then the symmetry of the set may change and an alternative matrix representation has to be formed. A modification in the matrix representation also alters the SPSVD. Therefore, proficient methods to address each of these issues are developed in this thesis. This thesis applies the SPSVD to molecular dynamic (MD) simulations of proteins and to face analysis. Symmetric motions of a molecule may be lost when the SVD is applied to MD trajectories of proteins. This loss is corrected by implementing the SPSVD to create major modes of motion that best describe the symmetric movements of the protein. Moreover, the SPSVD may reduce the noise that often occurs on the side chains of molecules. In face analysis, the SVD is regularly used for compression. Because faces are nearly symmetric, applying the SPSVD to faces creates a more efficient compression. 
This efficiency is a result of having to store only half the picture for the SPSVD. Therefore, it is apparent that the SPSVD is an effective method for calculating a symmetric low rank approximation for a set of data. Mathematics Biology, Bioinformatics Computer Science
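The second step described in this abstract can be sketched as follows, as a minimal illustration rather than the thesis's algorithm. It assumes the symmetry is already known and is a reflection that reverses the row order (the function `mirror` is a hypothetical stand-in for the matrix representation found in the first step). For such a reflection W, the matrix (I + W)/2 is an orthogonal projector, so the best symmetric low-rank approximation in the Frobenius norm is the truncated SVD of the symmetrized data. The sketch computes only a rank-1 approximation, by power iteration instead of a full SVD routine.

```python
def mirror(rows):
    """Assumed symmetry W: reverse the order of the rows."""
    return rows[::-1]


def symmetrize(X):
    """Project X onto the W-invariant subspace: (X + W X) / 2."""
    return [[(a + b) / 2 for a, b in zip(r, s)] for r, s in zip(X, mirror(X))]


def rank1(A, iters=200):
    """Rank-1 approximation (A v) v^T, with v found by power iteration on A^T A."""
    m, n = len(A), len(A[0])
    v = [1.0] * n
    for _ in range(iters):
        u = [sum(a * x for a, x in zip(row, v)) for row in A]            # u = A v
        w = [sum(A[i][j] * u[i] for i in range(m)) for j in range(n)]    # w = A^T u
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    u = [sum(a * x for a, x in zip(row, v)) for row in A]                # sigma * left vector
    return [[ui * vj for vj in v] for ui in u]


def symmetric_rank1(X):
    """Symmetry-preserving rank-1 approximation under the assumed reflection."""
    return rank1(symmetrize(X))


# Hypothetical, nearly mirror-symmetric data: 4 points in 2 dimensions.
X = [[1.0, 2.0], [0.9, 2.2], [1.1, 1.8], [1.2, 2.1]]
Y = symmetric_rank1(X)
```

Because row i of `Y` equals row 3 - i, only the first two rows need to be stored, which is the storage saving the abstract mentions for face images.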
664	Genomic and phylogenetic assessment of sea lamprey (Petromyzon marinus) Hox genes and analysis of Hox genes in association with myomeres across multiple lamprey genera Childs, Darcy 22 August 2013 (has links) Lampreys are an important model for the study of early vertebrate development due to their unique evolutionary position as one of only two extant jawless vertebrates. In this study, 12 new putative Hox gene fragments were identified within the recently available Petromyzon marinus (sea lamprey) genome. These and the other previously-identified Hox genes were analyzed phylogenetically, which enabled the assignment of many of the new sequences to distinct paralogous gene clusters and showed distinctions between gnathostome and lamprey Hox sequences. An examination of Hox genes in other lamprey species was conducted using genomic PCR-based detection methods and identified 26 putative Hox gene homeobox fragments from multiple Hox genes across nine lamprey species. A study of Hox10 coding sequences in different lamprey species failed to find any correlation with variable numbers of trunk myomeres in lampreys, which suggests that other sequences or factors regulate the number of myomeres in different species. Molecular Lamprey Bioinformatics Phylogenetics Hox
665	Dynamically and partially reconfigurable hardware architectures for high performance microarray bioinformatics data analysis Hussain, Hanaa Mohammad January 2012 (has links) The field of Bioinformatics and Computational Biology (BCB) is a multidisciplinary field that has emerged due to the computational demands of current state-of-the-art biotechnology. BCB deals with the storage, organization, retrieval, and analysis of biological datasets, which have grown in size and complexity in recent years, especially after the completion of the Human Genome Project. The advent of Microarray technology in the 1990s introduced the high-throughput experiment, in which the expression profiles of thousands of genes are measured simultaneously. As such, Microarray analysis requires high computational power to extract biological relevance from high-dimensional data. Current general purpose processors (GPPs) have been unable to keep up with the increasing computational demands of Microarrays and have reached a limit in terms of clock speed. Consequently, Field Programmable Gate Arrays (FPGAs) have been proposed as a viable low-power solution to overcome the computational limitations of GPPs and other methods. The research presented in this thesis harnesses current state-of-the-art FPGAs and tools to accelerate some of the most widely used data mining methods for the analysis of Microarray data, in an effort to investigate the viability of the technology as an efficient, low-power, and economic solution. Three widely used methods have been selected for the FPGA implementations: one is the unsupervised K-means clustering algorithm, while the other two are supervised classification methods, namely K-Nearest Neighbour (K-NN) and Support Vector Machines (SVM). These methods are thought to benefit from parallel implementation.
This thesis presents detailed designs and implementations of these three BCB applications on FPGA, captured in Verilog HDL, whose performance is compared with equivalent implementations running on GPPs. In addition to acceleration, the benefits of the dynamic partial reconfiguration (DPR) capability of modern Xilinx FPGAs are investigated with reference to the aforementioned data mining methods. Implementing K-means clustering on FPGA using a non-DPR design flow outperformed equivalent implementations on GPP and GPU in speed by two orders and one order of magnitude, respectively, while being eight times more power efficient than the GPP and four times more than the GPU implementation. As for energy efficiency, the FPGA implementation was 615 times more energy efficient than GPPs and 31 times more than GPUs. Moreover, the FPGA implementation continued to outperform the GPP and GPU implementations in speed as the dimensionality of the Microarray data increased. Additionally, the DPR implementations of K-means clustering showed speed-ups in partial reconfiguration time of ~5x and 17x over full chip reconfiguration for single-core and eight-core implementations, respectively. Two architectures of the K-NN classifier, A1 and A2, have been implemented on FPGA. The K-NN implementation based on the A1 architecture achieved a speed-up of ~76x over an equivalent GPP implementation, whereas the A2 architecture achieved a ~68x speed-up. Furthermore, the FPGA implementation outperformed the equivalent GPP implementation when the dimensionality of the data was increased. In addition, the DPR implementations of the K-NN classifier achieved speed-ups in reconfiguration time of between ~4x and 10x over full chip reconfiguration when reconfiguring a portion of the classifier or the complete classifier.
Similar to K-NN, two architectures of the SVM classifier were implemented on FPGA; the first outperformed an equivalent GPP implementation by ~61x and the second by ~49x. As for the DPR implementation of the SVM classifier, it showed a speed-up of ~8x in reconfiguration time when reconfiguring the complete core or when exchanging it with a K-NN core to form a multi-classifier. The aforementioned implementations clearly show FPGAs to be an efficacious, efficient, and economic solution for bioinformatics Microarray data analysis. 572.8 FPGA ; microarray ; DPR ; bioinformatics
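To make concrete why a classifier such as K-NN is thought to benefit from parallel hardware, here is a minimal software sketch. It is not the A1 or A2 architecture from the thesis; the function names and the toy expression values are hypothetical. The distance from the query to each training sample is computed independently of all the others, which is the part a parallel implementation can spread across processing elements.

```python
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_classify(train, labels, query, k=3):
    """Label the query by majority vote among its k nearest training samples."""
    # Each distance below depends only on one training sample and the query,
    # so all of them could be evaluated at the same time.
    order = sorted(range(len(train)),
                   key=lambda i: squared_distance(train[i], query))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-gene expression profiles for six samples.
train = [[0.1, 0.2], [0.0, 0.3], [0.2, 0.1],
         [1.0, 1.1], [0.9, 1.2], [1.1, 0.9]]
labels = ["low", "low", "low", "high", "high", "high"]

prediction = knn_classify(train, labels, [0.15, 0.15])
```

The cost of the distance step grows with the number of genes per profile, which is consistent with the abstract's observation that the FPGA advantage held as data dimensionality increased.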
666	Comparative genomics of microsatellite abundance: a critical analysis of methods and definitions Jentzsch, Iris Miriam Vargas January 2009 (has links) This PhD dissertation is focused on short tandemly repeated nucleotide patterns, called microsatellites, which occur extremely often across DNA sequences. The main characteristic of microsatellites, and probably the reason why they are so abundant across genomes, is the extremely high frequency of specific replication errors occurring within their sequences, which usually cause addition or deletion of one or more complete tandem repeat units. Due to these errors, frequent fluctuations in the number of repetitive units can be observed among cellular and organismal generations. The molecular mechanisms as well as the consequences of these microsatellite mutations, on both a generational and an evolutionary scale, have sparked debate and controversy among the scientific community. Furthermore, the bioinformatic approaches used to study microsatellites and the ways microsatellites are referred to in the general literature are often not rigorous, leading to misinterpretations and inconsistencies among studies. As an introduction to this complex topic, in Chapter I, I present a review of the knowledge accumulated on microsatellites during the past two decades. A major part of this chapter has been published in the Encyclopedia of Life Sciences in a chapter about microsatellite evolution (see Publication 1 in Appendix II). The ongoing controversy about the rates and patterns of microsatellite mutation was evident to me before I started this PhD thesis. However, the subtler problems inherent to the computational analyses of microsatellites within genomes only became apparent when retrieving information on microsatellite distribution and abundance for the design of comparative genomic analyses.
There are numerous publications analyzing the microsatellite content of genomes but, in most cases, the results presented can neither be reliably compared nor reproduced, mainly due to the lack of details on the microsatellite search process (particularly the program’s algorithm and the search parameters used) and because the results are expressed in terms that are relative to the search process (i.e. measures based on the absolute number of microsatellites). Therefore, in Chapter II, I present a critical review of all available software tools designed to scan DNA sequences for microsatellites. My aim in undertaking this review was to assess the comparability of search results among microsatellite programs, and to identify the programs most suitable for the generation of microsatellite datasets for a thorough and reproducible comparative analysis of microsatellite content among genomic sequences. Using sequence data where the number and types of microsatellites were empirically known, I compared the ability of 19 programs to accurately identify and report microsatellites. I then chose the two programs which, based on the algorithm and its parameters as well as the informativeness of the output, offered the information most suitable for biological interpretation, while also reflecting as closely as possible the microsatellite content of the test files. From the analysis of microsatellite search results generated by the various programs available, it became apparent that the program’s search parameters, which are specified by the user in order to define the microsatellite characteristics to the program, dramatically influence the resulting datasets. This is especially true for programs suited to allow imperfections within tandem repeats, because imperfect repetitions cannot be defined as accurately as perfect ones, and because several different algorithms have been proposed to address this problem.
The detection of approximate microsatellites is, however, essential for the study of microsatellite evolution and for comparative analyses based on microsatellites. It is now well accepted that small deviations from perfect tandem repeat structure are common within microsatellites and larger repeats, and a number of different algorithms have been developed to confront the challenge of finding and registering microsatellites with all expectable kinds of imperfection. However, biologists have yet to apply these tools to their full potential. In biological analyses, single tandem repeat hits are consistently interpreted as isolated and independent repeats. This interpretation also depends on the search strategy used to report the microsatellites in DNA sequences and, therefore, I was particularly interested in the capacity of repeat finding programs to report imperfect microsatellites in a way that allows biologically useful interpretations. After analyzing a series of tandem repeat finding programs, I optimized my microsatellite searches to yield the best possible datasets for assessing and comparing the degree of imperfection of microsatellites among different genomes (Chapter III). During the program comparisons performed in Chapter II, I show that the most critical search parameter influencing microsatellite search results is the minimum length threshold. Biologically speaking, there is no consensus with respect to the minimum length beyond which a short tandem repeat is expected to become prone to microsatellite-like mutations. Usually, a single absolute value of ~12 nucleotides is assigned irrespective of motif length. In other cases thresholds are assigned in terms of number of repeat units (i.e. 3 to 5 repeats or more), which are better applied individually for each motif. The variation in these thresholds is considerable and not always justifiable.
In addition, any current minimum length measures are likely naïve because it is clear that different microsatellite motifs undergo replication slippage at different length thresholds. Therefore, in Chapter III, I apply two probabilistic models to predict the minimum length at which microsatellites of varying motif types become overrepresented in different genomes based on the individual oligonucleotide frequency data of these genomes. Finally, after a range of optimizations and critical analyses, I performed a preliminary analysis of microsatellite abundance among 24 high quality complete eukaryotic genomes, also including 8 prokaryotic and 5 archaeal genomes for contrast. The availability of the methodologies and the microsatellite datasets generated in this project will allow informed formulation of questions for more specific genome research, either about microsatellites, or about other genomic features microsatellites could influence. These datasets are what I would have needed at the beginning of my PhD to support my experimental design, and are essential for the adequate interpretation of microsatellite data in the context of the major evolutionary units: chromosomes and genomes. microsatellites microsatellite evolution bioinformatics genomics
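The abstract's point about the minimum length threshold can be illustrated with a minimal sketch of a perfect-repeat search. This is not one of the 19 programs reviewed in the thesis; the function names, the test sequence, and the restriction to perfect repeats of a single motif length are all assumptions made for illustration. The same sequence yields different microsatellite counts depending only on the threshold the user chooses.

```python
import re


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'AA')."""
    n = len(motif)
    return not any(n % d == 0 and motif == motif[:d] * (n // d)
                   for d in range(1, n))


def find_perfect_repeats(seq, motif_len, min_total_len):
    """Return (start, motif, copies) for perfect tandem repeats in seq.

    Only repeats spanning at least min_total_len nucleotides are reported.
    """
    pattern = re.compile(r"([ACGT]{%d})\1+" % motif_len)
    hits = []
    for match in pattern.finditer(seq):
        motif = match.group(1)
        length = match.end() - match.start()
        if is_primitive(motif) and length >= min_total_len:
            hits.append((match.start(), motif, length // motif_len))
    return hits


# Hypothetical sequence: (CA)6 and (GA)3 embedded in flanking bases.
seq = "TTG" + "CA" * 6 + "TCC" + "GA" * 3 + "TT"

strict = find_perfect_repeats(seq, motif_len=2, min_total_len=12)
lenient = find_perfect_repeats(seq, motif_len=2, min_total_len=6)
```

With the ~12 nucleotide threshold only the (CA)6 repeat is reported; lowering the threshold to 6 nucleotides also reports (GA)3. Counts produced under different thresholds are therefore not directly comparable, which is the reproducibility problem the abstract describes.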
667	Module-Based Analysis for "Omics" Data Wang, Zhi 24 March 2015 (has links) This thesis focuses on methodologies and applications of module-based analysis (MBA) in omics studies to investigate the relationships of phenotypes and biomarkers, e.g., SNPs, genes, and metabolites. As an alternative to traditional single-biomarker approaches, MBA may increase the detectability and reproducibility of results because biomarkers tend to have moderate individual effects but significant aggregate effect; it may improve the interpretability of findings and facilitate the construction of follow-up biological hypotheses because MBA assesses biomarker effects in a functional context, e.g., pathways and biological processes. Finally, for exploratory “omics” studies, which usually begin with a full scan of a long list of candidate biomarkers, MBA provides a natural way to reduce the total number of tests, and hence relax the multiple-testing burdens and improve power. The first MBA project focuses on genetic association analysis that assesses the main and interaction effects for sets of genetic (G) and environmental (E) factors rather than for individual factors. We develop a kernel machine regression approach to evaluate the complete effect profile (i.e., the G, E, and G-by-E interaction effects separately or in combination) and construct a kernel function for the gene-environment (GE) interaction directly from the genetic kernel and the environmental kernel. We use simulation studies and real data applications to show improved performance of the Kernel Machine (KM) regression method over the commonly adopted PC regression methods across a wide range of scenarios.
The largest gain in power occurs when the underlying effect structure involves complex GE interactions, suggesting that the proposed method could be a useful and powerful tool for performing exploratory or confirmatory analyses in GxE-GWAS. In the second MBA project, we extend the kernel machine framework developed in the first project to model biomarkers with network structure. A network summarizes the functional interplay among biological units; incorporating network information can more precisely model the biological effects, enhance the ability to detect true signals, and facilitate our understanding of the underlying biological mechanisms. In this work, we develop two kernel functions to capture different network structure information. Through simulations and a metabolomics study, we show that the proposed network-based methods can have markedly improved power over the approaches ignoring network information. Metabolites are the end products of cellular processes and reflect the ultimate responses of biological systems to genetic variations or environmental exposures. Because of the unique properties of metabolites, pharmacometabolomics aims to understand the underlying signatures that contribute to individual variations in drug responses and to identify biomarkers that can be helpful for predicting response. To facilitate mining pharmacometabolomic data, we establish an MBA pipeline that has great practical value in detection and interpretation of signatures, which may potentially indicate a functional basis for the drug response. We illustrate the utility of the pipeline by investigating two scientific questions in an aspirin study: (1) which metabolite changes can be attributed to aspirin intake, and (2) what are the metabolic signatures that can be helpful in predicting aspirin resistance. Results show that the MBA pipeline enables us to identify metabolic signatures that are not found in preliminary single-metabolite analysis.
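The first project's idea of building an interaction kernel from a genetic kernel and an environmental kernel can be sketched as below. The abstract does not state the exact construction, so this sketch assumes one common choice: linear kernels for G and E, combined by an elementwise (Hadamard) product. By the Schur product theorem, the elementwise product of two positive semidefinite matrices is again positive semidefinite, so the result is a valid kernel. All names and data values are hypothetical.

```python
def linear_kernel(rows):
    """Gram matrix of inner products between subjects (rows)."""
    return [[sum(a * b for a, b in zip(r, s)) for s in rows] for r in rows]


def hadamard(K1, K2):
    """Elementwise product of two kernel matrices of the same shape."""
    return [[a * b for a, b in zip(r1, r2)] for r1, r2 in zip(K1, K2)]


# Hypothetical data for three subjects.
G = [[0, 1], [1, 2], [2, 0]]      # minor-allele counts at two SNPs
E = [[0.5], [1.0], [1.5]]         # one environmental exposure

K_G = linear_kernel(G)            # similarity in genotype
K_E = linear_kernel(E)            # similarity in exposure
K_GE = hadamard(K_G, K_E)         # assumed form of the GE interaction kernel
```

Under this assumed construction, two subjects are similar in the interaction kernel only if they are similar in both genotype and exposure, and an effect profile can then be tested on `K_G`, `K_E`, and `K_GE` separately or in combination.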
668	Identification of candidate genes involved in fin/limb development and evolution using bioinformatic methods Mastick, Kellen J. 05 November 2014 (has links) Key to understanding the transition that vertebrates made from water to land is determining the developmental and genomic bases for the changes. New bioinformatic tools provide an opportunity to automate the discovery, broaden the number of, and provide an evidence-based ranking for potential candidate genes. I sought to explore this potential for the fin/limb transition, using the substantial genetic and phenotypic data available in model organism databases. Model organism data was used to hypothesize candidate genes for the fin/limb transition. In addition, 131 fin/limb candidate genes from the literature were extracted and used as a basis for comparison with candidates from the model organism databases. Additionally, seven genes specific to limb and 24 genes specific to fin were identified as future fin/limb transition candidates.
670	Constructing Mathematical Models of Gene Regulatory Networks for the Yeast Cell Cycle and Other Periodic Processes Deckard, Anastasia January 2014 (has links) We work on constructing mathematical models of gene regulatory networks for periodic processes, such as the cell cycle in budding yeast, using biological data sets and applying or developing analysis methods in the areas of mathematics, statistics, and computer science. We identify genes with periodic expression and then the interactions between periodic genes, which define the structure of the network. This network is then translated into a mathematical model, using Ordinary Differential Equations (ODEs), to describe these entities and their interactions. The models currently describe gene regulatory interactions, but we are expanding to capture other events, such as phosphorylation and ubiquitination. To model the behavior, we must then find appropriate parameters for the mathematical model that allow its dynamics to approximate the biological data. This pipeline for model construction is not focused on a specific algorithm or data set for each step, but instead on leveraging several sources of data and analysis from several algorithms. For example, we are incorporating data from multiple time series experiments, genome-wide binding experiments, computationally predicted binding, and regulation inference to identify potential regulatory interactions. These approaches are designed to be applicable to various periodic processes in different species. While we have worked most extensively on models for the cell cycle in Saccharomyces cerevisiae, we have also begun working with data sets for the metabolic cycle in S. cerevisiae, and the circadian rhythm in Mus musculus. / Dissertation Bioinformatics Computational Biology Systems Biology
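The step of translating a network into ODEs can be sketched with a toy two-gene network. This is an illustration only: the network (gene x activates gene y, gene y represses gene x), the Hill-function form, the parameter values, and the forward Euler integrator are all assumptions, not the models or methods of the dissertation.

```python
def hill_activation(s, K, n):
    """Fraction of maximal production when regulator s activates its target."""
    return (s / K) ** n / (1.0 + (s / K) ** n)


def hill_repression(s, K, n):
    """Fraction of maximal production when regulator s represses its target."""
    return 1.0 / (1.0 + (s / K) ** n)


def simulate(x0, y0, alpha=2.0, K=1.0, n=4, dt=0.01, steps=2000):
    """Integrate dx/dt = alpha*rep(y) - x, dy/dt = alpha*act(x) - y by forward Euler."""
    x, y = x0, y0
    trajectory = [(x, y)]
    for _ in range(steps):
        dx = alpha * hill_repression(y, K, n) - x   # production minus degradation
        dy = alpha * hill_activation(x, K, n) - y
        x, y = x + dt * dx, y + dt * dy
        trajectory.append((x, y))
    return trajectory


# Hypothetical initial expression levels.
trajectory = simulate(x0=0.1, y0=0.1)
```

Parameter fitting, the last step the abstract mentions, would amount to adjusting values such as `alpha`, `K`, and `n` until the simulated trajectory approximates measured time series. Periodic behaviour is not guaranteed for the illustrative values used here.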
