Global ETD Search

11	Improving computational predictions of Cis-regulatory binding sites in genomic data
Rezwan, Faisal Ibne January 2011
Cis-regulatory elements are the short regions of DNA to which specific regulatory proteins bind; these interactions then influence the level of transcription of the associated genes by inhibiting or enhancing the transcription process. It is known that much of the genetic change underlying morphological evolution takes place in these regions, rather than in the coding regions of genes. Identifying these sites in a genome is a non-trivial problem. Experimental (wet-lab) methods for finding binding sites exist, but all have limitations regarding their applicability, accuracy, availability or cost. Computational methods for predicting the position of binding sites, on the other hand, are less expensive and faster. Unfortunately, these algorithms perform rather poorly, some missing most binding sites and others over-predicting their presence. The aim of this thesis is to develop and improve computational approaches for the prediction of transcription factor binding sites (TFBSs) by integrating the results of computational algorithms and other sources of complementary biological evidence. Previous related work involved the use of machine learning algorithms for integrating predictions of TFBSs, with particular emphasis on the use of the Support Vector Machine (SVM). This thesis has built upon, extended and considerably improved this earlier work. Data from two organisms were used here. First, the relatively simple genome of yeast was used. In yeast, the binding sites are fairly well characterised and they are normally located near the genes that they regulate. The techniques used on the yeast genome were also tested on the more complex genome of the mouse. 
It is known that the regulatory mechanisms of the mouse, a eukaryotic species, are considerably more complex, and it was therefore interesting to investigate the techniques described here on such an organism. The initial results were, however, not particularly encouraging: although a small improvement on the base algorithms could be obtained, the predictions were still of low quality. This was the case for both the yeast and mouse genomes. However, when the negatively labeled vectors in the training set were changed, a substantial improvement in performance was observed. The first change was to choose regions in the mouse genome a long way from a gene (distal regions, over 4000 base pairs away) as regions not containing binding sites. This produced a major improvement in performance. The second change was simply to use randomised training vectors, which contained no meaningful biological information, as the negative class. This gave some improvement for the yeast genome, but had a very substantial benefit for the mouse data, considerably improving on the aforementioned distal negative training data. In fact, the resulting classifier found over 80% of the binding sites in the test set, and moreover 80% of its predictions were correct. The final experiment used an updated version of the yeast dataset, with more state-of-the-art algorithms and more recent TFBS annotation data. Here it was found that using randomised or distal negative examples once again gave very good results, comparable to those obtained on the mouse genome. Another source of negative data was tried for this yeast data, namely vectors taken from intronic regions. Interestingly, this gave the best results.
572.072
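The randomised negative class described in this abstract can be illustrated with a short Python sketch. The abstract does not say how the vectors were randomised, so the uniform per-feature sampling below is an assumption, and `randomised_negatives` and the toy scores are hypothetical names and values, not taken from the thesis.

```python
import random


def randomised_negatives(positives, n, seed=0):
    """Return n synthetic negative vectors.

    Assumption: each feature is drawn uniformly at random between the
    smallest and largest value seen for that feature in `positives`,
    so the vectors carry no biological signal.
    """
    rng = random.Random(seed)
    columns = list(zip(*positives))  # one tuple of observed values per feature
    return [
        [rng.uniform(min(col), max(col)) for col in columns]
        for _ in range(n)
    ]


# Toy positive vectors: one row per site, one column per base predictor
# (hypothetical scores, for illustration only).
positives = [
    [0.9, 0.8, 0.7],
    [0.6, 0.9, 0.8],
    [0.8, 0.7, 0.9],
]
negatives = randomised_negatives(positives, n=5)
```

In a setup like this, the generated vectors would be labelled as the negative class and combined with the positively labelled vectors before training a classifier such as an SVM.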
12	The de novo Prediction of Functionally Significant Sequence Motifs in Arabidopsis thaliana
Austin, Ryan 18 February 2010
This thesis performs de novo predictions of functionally significant sequence motifs in the Arabidopsis genome in two separate contexts. Each study uses genomic positional information, statistical over-representation and several biologically contextual filters to maximize the visibility of biological signal in the prediction results. Numerous literature-supported motifs are prevalent in the results of both studies, and a number of novel motif patterns have strong potential for in planta significance. The first study examines the statistical over-representation of C-terminal tripeptides as a means of identifying conserved eukaryotic protein targeting signatures. Comparative genomics is applied to the analysis of tripeptide frequencies in the C-terminus of 7 eukaryotic proteomes, while biological signal is maximized through the filtering of both simple sequences and homologous sequences present across protein families. The second study introduces a methodology for the effective prediction of transcription factor binding sites in Arabidopsis. A collection of motif prediction algorithms and a novel enumerative strategy are applied to the prediction of cis-acting regulatory elements within the promoters of genes found to be coexpressed within distinct tissues and under specific abiotic stress treatments. Overall, the analysis identifies 4 known motifs in expected contexts, 5 known motifs in novel contexts and 7 novel motifs with a high potential for biological function.
arabidopsis transcription cis elements regulatory sequences c-terminus bioinformatics genomics microarray motif abiotic stress tissue-specific transcription factor binding site 0715
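The idea of statistical over-representation of C-terminal tripeptides can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis's method: the enrichment ratio (C-terminal frequency divided by frequency over all tripeptide windows), the function names and the toy sequences are all hypothetical, and no filtering of simple or homologous sequences is attempted.

```python
from collections import Counter


def c_terminal_tripeptides(proteins):
    """Count the last three residues of each protein sequence."""
    return Counter(p[-3:] for p in proteins if len(p) >= 3)


def background_tripeptides(proteins):
    """Count every overlapping tripeptide window in every sequence."""
    counts = Counter()
    for p in proteins:
        for i in range(len(p) - 2):
            counts[p[i:i + 3]] += 1
    return counts


def enrichment(proteins):
    """Ratio of C-terminal frequency to background frequency per tripeptide.

    Every C-terminal tripeptide is also a background window, so the
    background count is never zero for the keys returned here.
    """
    cterm = c_terminal_tripeptides(proteins)
    background = background_tripeptides(proteins)
    n_cterm = sum(cterm.values())
    n_background = sum(background.values())
    return {
        tri: (count / n_cterm) / (background[tri] / n_background)
        for tri, count in cterm.items()
    }


# Toy sequences, for illustration only.
proteins = ["MKTAYSKL", "MGGSKL", "MAAAA"]
scores = enrichment(proteins)
```

A ratio well above 1 marks a tripeptide that occurs at the C-terminus more often than its overall frequency would suggest; a real analysis would also need a significance test and the filters mentioned in the abstract.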
14	Computational and experimental approaches to regulatory genetic variation
Andersen, Malin January 2007
Genetic variation is a strong risk factor for many human diseases, including diabetes, cancer, cardiovascular disease, depression, autoimmunity and asthma. Most of the disease genes identified so far alter the amino acid sequences of encoded proteins. However, a significant number of genetic variants affecting complex diseases may alter the regulation of gene transcription. The map of the regulatory elements in the human genome is still largely unknown, and it remains a challenge to separate functional regulatory genetic variations from linked neutral variations. The objective of this thesis was to develop methods for identifying genetic variation with the potential to affect the transcriptional regulation of human genes, and to analyze potential regulatory polymorphisms in the CD36 glycoprotein, a candidate gene for cardiovascular disease. An in silico tool for the prediction of regulatory polymorphisms in human genes was implemented and is available at www.cisreg.ca/RAVEN. The tool was evaluated using experimentally verified regulatory single nucleotide polymorphisms (SNPs) collected from the scientific literature, and tested in combination with experimental detection of allele-specific expression of target genes (allelic imbalance). Regulatory SNPs were shown to be located in evolutionarily conserved regions more often than background SNPs, but predicted transcription factor binding sites were unable to enrich for regulatory SNPs unless additional information linking transcription factors with the target genes was available. The in silico tool was applied to the CD36 glycoprotein. Potential regulatory SNPs in the alternative promoters of this gene were identified and evaluated in vitro and in vivo using a clinical study for coronary artery disease. 
We observed an association with the plasma concentrations of inflammation markers (serum amyloid A protein and C-reactive protein) in myocardial infarction patients, which highlights the need for further analyses of potential regulatory polymorphisms in this gene. Taken together, this thesis describes an in silico approach to identifying putative regulatory polymorphisms, which can be useful for directing limited laboratory resources to the polymorphisms most likely to have a phenotypic effect.
Molecular biology Genetics single nucleotide polymorphism (SNP) regulatory SNP transcription factor binding site phylogenetic footprinting allelic imbalance EMSA CD36 cardiovascular disease. Genetic engineering incl. functional genomics
15	Transcriptional regulatory networks in the mouse hippocampus
MacPherson, Cameron Ross January 2007
Magister Scientiae - MSc / Neurological diseases are socially disabling and often fatal. To combat these diseases efficiently, a deep understanding of the cellular processes, gene functions and anatomy involved is required. However, differential regulation of genes across anatomy is not sufficiently well understood. This study utilized large-scale gene expression data to define the regulatory networks of genes expressed in the hippocampus, with which multiple disease pathologies may be associated. Specific aims were to identify key regulatory transcription factors (TFs) responsible for observed gene expression patterns, reconstruct transcription regulatory networks, and prioritize likely TFs responsible for anatomically restricted gene expression. Most of the analysis was restricted to the CA3 sub-region of Ammon’s horn within the hippocampus. We identified 155 core genes expressed throughout the CA3 sub-region and predicted corresponding TF binding site (TFBS) distributions. Our analysis shows plausible transcription regulatory networks for twelve clusters of co-expressed genes. We demonstrated the validity of the predictions by re-clustering genes based on TFBS distributions and found that genes tend to be correctly assigned to groups of previously identified co-expressed genes, with a sensitivity of 67.74% and a positive predictive value of 100%. Taken together, this study represents one of the first to merge anatomical architecture, expression profiles and transcription regulatory potential on such a large scale in hippocampal sub-anatomy. / South Africa
Brain / Neuro-anatomy Hippocampus Ammon's horn Neurodegenerative disease Alzheimer's disease Gene expression Transcription regulation Regulatory potential Transcription factor Promoter Transcription factor binding site Transcription regulatory network Allen Brain Atlas
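For readers unfamiliar with the two metrics quoted in this abstract, the Python sketch below shows how sensitivity and positive predictive value are computed from counts. The counts used (21 true positives, 10 false negatives, 0 false positives) are hypothetical: they were chosen only because they reproduce the reported percentages, and the abstract does not state the actual numbers.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of actual members that were correctly assigned."""
    return true_pos / (true_pos + false_neg)


def positive_predictive_value(true_pos, false_pos):
    """Fraction of assignments that were correct."""
    return true_pos / (true_pos + false_pos)


# Hypothetical counts, not taken from the thesis.
true_pos, false_neg, false_pos = 21, 10, 0

sens_percent = round(100 * sensitivity(true_pos, false_neg), 2)
ppv_percent = round(100 * positive_predictive_value(true_pos, false_pos), 2)
```

A positive predictive value of 100% alongside a lower sensitivity means that no gene was assigned to a wrong group, while some genes that belonged to a group were left unassigned.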
