911 |
Microarray analysis using pattern discovery
Bainbridge, Matthew Neil 10 December 2004
Analysis of gene expression microarray data has traditionally been conducted using hierarchical clustering. However, such analysis has many known disadvantages, and pattern discovery (PD) has been proposed as an alternative technique. In this work, three related PD algorithms, Teiresias, Splash, and Genes@Work, were benchmarked for time and memory efficiency on a small yeast cell-cycle data set. Teiresias was found to be the fastest and the best overall program, although Splash was more memory efficient. This work also investigated the performance of four methods of discretizing microarray data: sign-of-the-derivative, K-means, pre-set value, and Genes@Work stratification. The first three methods were evaluated on their predisposition to group together biologically related genes. On a yeast cell-cycle data set, the sign-of-the-derivative method yielded the most biologically significant patterns, followed by the pre-set value and K-means methods. K-means, pre-set value, and Genes@Work were also compared on their ability to classify tissue samples from diffuse large B-cell lymphoma (DLBCL) into two subtypes determined by standard techniques. The Genes@Work stratification method produced the best patterns for discriminating between the two subtypes of lymphoma. However, the results from the second-best method, K-means, call into question the accuracy of the classification by the standard technique. Finally, a number of recommendations for improvement of pattern discovery algorithms and discretization techniques are made.
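As a rough illustration of the sign-of-the-derivative idea named in the abstract, the sketch below turns an expression time series into a string of symbols that a pattern discovery program could consume. The function name, the three-symbol alphabet, and the `tolerance` parameter are assumptions made for this example; the abstract does not specify how the thesis implements the method.

```python
def sign_of_derivative(profile, tolerance=0.0):
    """Discretize an expression time series by the direction of change.

    Each consecutive pair of measurements becomes '+' (rising),
    '-' (falling), or '0' (change within the tolerance).
    The alphabet and tolerance are illustrative assumptions.
    """
    symbols = []
    for previous, current in zip(profile, profile[1:]):
        delta = current - previous
        if delta > tolerance:
            symbols.append("+")
        elif delta < -tolerance:
            symbols.append("-")
        else:
            symbols.append("0")
    return "".join(symbols)


# Hypothetical expression values for one gene across five time points.
print(sign_of_derivative([0.1, 0.5, 0.4, 0.4, 0.9]))
```

Genes whose strings share a substring then rise and fall together over the same time points, regardless of absolute expression level.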
|
912 |
The Sequence and Function Relationship of Elastin: How Repetitive Sequences can Influence the Physical Properties of Elastin
He, David 09 January 2012
Elastin is an essential extracellular protein that is a key component of elastic fibres, providing elasticity to cardiac, dermal, and arterial tissues. During the development of the human cardiovascular system, elastin self-assembles before being integrated into fibres, undergoing no significant turnover during the human lifetime. Abnormalities in elastin can adversely affect its self-assembly, and may lead to malformed elastic fibres. Due to the longevity required of these fibres, even minor abnormalities may have a large cumulative effect over the course of a lifetime, leading to late-onset vascular diseases. This thesis project has identified over-represented repetitive elements in elastin which are believed to be important for the self-assembly and elastomeric properties of elastin. Initial studies of single nucleotide polymorphisms (SNPs) from the HapMap project and dbSNP resulted in a set of genetic variation sites in the elastin gene. Based on these studies, glycine to serine and lysine to arginine substitutions were introduced in elastin-like polypeptides. The self-assembly properties of the resulting elastin-like polypeptides were observed under a microscope and measured using absorbance at 440 nm. Assembled polypeptides were also cross-linked to form thin membranes whose mechanical and physical properties were measured and compared. These mutations resulted in markedly different behavior from that of wild-type elastin-like proteins, suggesting that mutations in the repetitive elements of the elastin sequence can lead to adverse changes in the physical and functional properties of the resulting protein. Using next-generation sequencing, patients with thoracic aortic aneurysms are being genotyped to discover polymorphisms which may adversely affect the self-assembly properties of elastin, providing a link between genetic variation in elastin and cardiovascular disease.
|
913 |
Machine Learning Approaches to Biological Sequence and Phenotype Data Analysis
Min, Renqiang 17 February 2011
To understand biology at a system level, this thesis presents novel machine learning algorithms that reveal the underlying mechanisms by which genes and their products function at different biological levels. Specifically, at the sequence level, based on kernel Support Vector Machines (SVMs), I proposed a learned random-walk kernel and a learned empirical-map kernel to identify protein remote homology solely from sequence data, and I proposed a discriminative motif discovery algorithm to identify sequence motifs that characterize protein sequences' remote homology membership. The proposed approaches significantly outperform previous methods, especially on some challenging protein families. At the expression and protein level, using hierarchical Bayesian graphical models, I developed the first high-throughput computational predictive model to filter sequence-based predictions of microRNA targets by incorporating the proteomic data of putative microRNA target genes, and I proposed another probabilistic model to explore the underlying mechanisms of microRNA regulation by combining the expression profile data of messenger RNAs and microRNAs. At the cellular level, I further investigated how yeast genes manifest their functions in cell morphology by performing gene function prediction from the morphology data of yeast temperature-sensitive alleles. The developed prediction models enable biologists to select essential yeast genes of interest and study their predicted novel functions.
|
915 |
Genetic Analysis of Stem Composition Variation in Sorghum Bicolor
Evans, Joseph August 2012
Sorghum (Sorghum bicolor [L.] Moench) is the world's fifth most economically important cereal crop, grown worldwide as a source of food for both humans and livestock. Sorghum is a C4 grass that is well adapted to hot and arid climes and is popular for cultivation on lands of marginal quality. Recent interest in development of biofuels from lignocellulosic biomass has drawn attention to sorghum, which can be cultivated in areas not suitable for more traditional crops, and is capable of generating plant biomass in excess of 40 tons per acre. While the quantity of biomass and low water consumption make sorghum a viable candidate for biofuels growth, the biomass composition is enriched in lignin, which is problematic for enzymatic and chemical conversion techniques.
The genetic basis for stem composition was analyzed in sorghum populations using a combination of genetic, genomic, and bioinformatics techniques. Utilizing acetyl bromide extraction, the variation in stem lignin content was quantified across several sorghum cultivars, confirming that lignin content varied considerably among sorghum cultivars. Previous work identifying sorghum reduced-lignin lines has involved the monolignol biosynthetic pathway; all steps in the pathway were putatively identified in the sorghum genome using sequence analysis.
A bioinformatics toolkit was constructed to allow for the development of genetic markers in sorghum populations, and a database and web portal were generated to allow users to access previously developed genetic markers. Recombinant inbred lines were analyzed for stem composition using near infrared reflectance spectroscopy (NIR), and genetic maps were constructed using restriction site-linked polymorphisms, revealing 34 quantitative trait loci (QTL) for stem composition variation in a BTx642 x RTx7000 population, and six QTL for stem composition variation in an SC56 x RTx7000 population.
Sequencing the genomes of BTx642 and RTx7000 to a depth of ~11x using Illumina sequencing revealed approximately 1.4 million single nucleotide polymorphisms (SNPs) and 1 million SNPs, respectively. These polymorphisms can be used to identify putative amino acid changes in genes within these genotypes, and can also be used for fine mapping. Plotting the density of these SNPs revealed patterns of genetic inheritance from shared ancestral lines both between the newly sequenced genotypes and relative to the reference genotype BTx623.
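A SNP density profile of the kind mentioned above can be approximated by counting SNPs in fixed-width windows along a chromosome. The sketch below is only an illustration: the function name, the 1 Mb default window, and the example positions are assumptions, not details taken from the thesis.

```python
from collections import Counter


def snp_density(positions, window=1_000_000):
    """Count SNPs per fixed-width window along one chromosome.

    positions: iterable of 0-based SNP coordinates (an assumption).
    Returns a dict mapping each window's start coordinate to its SNP
    count; windows that contain no SNPs are omitted.
    """
    counts = Counter(position // window for position in positions)
    return {index * window: count for index, count in sorted(counts.items())}


# Hypothetical SNP coordinates on a single chromosome.
print(snp_density([10, 999_999, 1_000_000, 2_500_000, 2_500_001]))
```

Long runs of near-empty windows between two genotypes would be consistent with a segment inherited from a shared ancestral line, which is the kind of pattern the abstract describes.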
|
916 |
On computational strategies for regulatory element and regulatory polymorphism detection
Montgomery, Stephen 11 1900
Identification of the mechanisms by which genes are regulated in eukaryotes is one of the principal challenges of modern biology. The emergence of genome sequencing has facilitated the marked expansion of experimental and computational approaches designed to address this challenge. Integrating and assessing this information remains a major scientific endeavor that requires new and innovative application of technology. Furthermore, our limited understanding of the mechanisms of gene regulation in eukaryotes has undermined our ability to understand the role of genetics in gene regulation. Regulatory variants are thought to be responsible for a considerable amount of the heterogeneity within our population and to be fundamental determinants of health. New experimental approaches offer the opportunity to effectively identify markers of disease susceptibility in gene regulatory regions, but the discovery of the molecular mechanism of dysregulation remains difficult and time-consuming. It is here that computational approaches are required to prioritize candidate regulatory variants. Doing so requires the development of an extensive control set from which characteristic signals can be identified.
This thesis introduces novel approaches for discovering, utilizing, comparing and visualizing regulatory element predictions in completed genomes. This thesis also introduces novel bioinformatics infrastructure for curating regulatory element and variant datasets, and introduces the largest-available, open-access dataset of functional regulatory variants hand-curated from literature. This dataset is used to identify signals which discriminate functional variants from other variants in the promoter regions of human genes using regulatory and population genetics-based computational approaches.
|
918 |
Tracing the molecular and evolutionary determinants of novel functions in protein families
Doxey, Andrew Charles January 2010
This thesis explores the limits of homology-based inference of protein function and evolution, where overall similarity between sequences can be a poor indicator of functional similarity or evolutionary relationships. Each case presented has undergone different patterns of evolutionary change due to differing selective pressures. Surface adaptations and regulatory (e.g., gene expression) divergence are examined as molecular determinants of novel functions whose patterns are easily missed by assessments of overall sequence similarity. Following this, internal repeats and mosaic sequences are investigated as cases in which key evolutionary events involving fragments of protein sequences are masked by overall comparison. Lastly, virulence factors, which cannot be unified based on sequence, are predicted by analysis of elevated host-mimicry patterns in pathogenic versus non-pathogenic bacterial genomes. These patterns have resulted from unique co-evolutionary pressures that apply to bacterial pathogens, but may be lacking in their close relatives. A recurring theme in the proteins/genes/genomes analyzed is an involvement in microbial pathogenesis or pathogen-defense. Due to the ongoing "evolutionary arms race" between hosts and pathogens, virulence and defense proteins have undergone—and will likely continue to generate—evolutionary novelties. Thus, they demonstrate the necessity to look beyond overall sequence comparison, and assess multiple dimensions of functional innovation in proteins.
|
919 |
Algorithms for DNA Sequence Assembly and Motif Search
Dinh, Hieu Trung 10 January 2013
|
920 |
Computational discovery and analysis of metabolic pathways
January 2010
Finding novel or non-standard metabolic pathways, possibly spanning multiple species, has important applications in fields such as metabolic engineering, metabolic network analysis, and metabolic network reconstruction. Traditionally, this has been a manual process, but the large volume of metabolic data now available has created a need for computational tools to automatically identify biologically relevant pathways. This thesis presents new algorithms for automatically finding biologically meaningful linear and branched metabolic pathways in multi-genome scale metabolic networks. These algorithms utilize atom mapping data, which provides the correspondence between atoms in the substrates and atoms in the products of a chemical reaction, to find pathways which conserve a given number of atoms between desired start and target compounds.
The first algorithm presented identifies atom-conserving linear pathways by explicitly tracking atoms during an exploration of a graph structure constructed from the atom mapping data. The explicit tracking of atoms enables finding branched pathways because it provides automatic identification of the reactions and compounds through which atoms are lost or gained. The thesis then describes two algorithmic approaches for identifying branched metabolic pathways based upon atom-conserving linear pathways. One approach takes one linear pathway at a time and attempts to add branches that connect loss and gain compounds. The other approach takes a group of linear pathways and attempts to merge pathways that move mutually exclusive sets of atoms from the start to the target compounds. Comparisons to known metabolic pathways demonstrate that atom tracking causes the algorithms to avoid many unrealistic connections, often found in previous approaches, and return biologically meaningful pathways. While the theoretical complexity of finding even linear atom-conserving pathways is high, by choosing the appropriate representations and heuristics, and perhaps due to the structure of the underlying data, the algorithms in this thesis have practical running times on real data. The results also demonstrate the potential of the algorithms to find novel or non-standard pathways that may span multiple organisms.
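The idea of tracking atoms along a linear pathway can be sketched with a small depth-first search. This is a toy illustration under stated assumptions, not the thesis's algorithm: the reaction tuple layout `(name, substrate, product, atom_map)`, the function name, and the `max_steps` bound are all hypothetical, and each reaction is simplified to a single substrate and a single product.

```python
def find_atom_conserving_paths(reactions, start, target, start_atoms,
                               min_conserved, max_steps=6):
    """Enumerate linear pathways that carry enough atoms from start to target.

    reactions: list of (name, substrate, product, atom_map), where atom_map
    maps an atom index in the substrate to its index in the product
    (a simplified, hypothetical encoding of atom mapping data).
    Returns a list of (reaction_names, atoms_conserved) tuples.
    """
    paths = []

    def explore(compound, tracked, path, visited):
        # tracked holds the indices, in the current compound, of atoms
        # that originated in the start compound.
        if compound == target and path:
            paths.append((list(path), len(tracked)))
            return
        if len(path) == max_steps:
            return
        for name, substrate, product, atom_map in reactions:
            if substrate != compound or product in visited:
                continue
            # Follow each tracked atom through the reaction; atoms without
            # a mapping are lost to other products.
            carried = {atom_map[atom] for atom in tracked if atom in atom_map}
            if len(carried) < min_conserved:
                continue  # prune: too few start atoms survive this step
            path.append(name)
            explore(product, carried, path, visited | {product})
            path.pop()

    explore(start, set(start_atoms), [], {start})
    return paths


# Hypothetical network: A -> B -> D keeps two atoms, A -> C -> D keeps one.
toy_reactions = [
    ("R1", "A", "B", {0: 0, 1: 1, 2: 2}),
    ("R2", "B", "D", {0: 0, 1: 1}),
    ("R3", "A", "C", {0: 0}),
    ("R4", "C", "D", {0: 0}),
]
print(find_atom_conserving_paths(toy_reactions, "A", "D", {0, 1, 2}, min_conserved=2))
```

Pruning on the number of surviving atoms is what removes routes that connect compounds only through a small shared fragment, the kind of unrealistic connection the abstract says atom tracking avoids.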
|