Global ETD Search

11	Efficient network based approaches for pattern recognition and knowledge discovery from large and heterogeneous datasets Zhu, Cheng 25 October 2013 (has links) No description available. Computer Science Network approaches pattern recognition heterogeneous datasets rare orphan disease drug repositioning gene prediction
12	A Multimodal Graph Convolutional Approach to Predict Genes Associated with Rare Genetic Diseases Sahasrabudhe, Dhruva Shrikrishna 11 September 2020 (has links) There exist a large number of rare genetic diseases in humans. Our knowledge of the specific gene variants whose presence in the genome of a person predisposes them towards developing a disease, called gene associations, is incomplete. Computational tools that can predict genes that may be associated with a rare disease have great utility in healthcare. However, a majority of existing prediction algorithms require a set of already known "seed genes" to further discover novel associations for a disease. This drawback becomes more serious for rare genetic diseases, since a large proportion do not have any known gene associations. In this work, we develop an approach for disease-gene association prediction that overcomes the reliance on seed genes. Our approach uses the similarity of the observable biological characteristics of diseases (i.e., phenotypes) along with a global map of direct and indirect human protein interactions, to transfer associations from diseases whose gene associations have been discovered to diseases with no known gene associations. We formulate disease-gene association prediction over a multimodal network of diseases and genes, and develop an approach based on graph convolutional networks. We show how our model design considerations impact prediction performance. We demonstrate that our approach outperforms simpler graph machine learning and traditional machine learning approaches, as well as a competitive network propagation based approach for the task of predicting disease-gene associations. / Master of Science / There exist a large number of rare genetic diseases in humans. Our knowledge of the specific gene variants whose presence in the genome of a person predisposes them towards developing a disease, called gene associations, is incomplete. 
Computational tools that can predict genes that may be associated with a rare disease have great utility in healthcare. However, a majority of existing prediction algorithms require a set of already known "seed genes" to further discover novel associations for a disease. This drawback becomes more serious for rare genetic diseases, since a large proportion do not have any known gene associations. In this work, we develop an approach for disease-gene association prediction that overcomes the reliance on seed genes. Our approach uses the similarity of the observable biological characteristics of diseases (i.e., disease phenotypes) along with a global map of direct and indirect human protein interactions, to transfer gene associations from diseases whose gene associations have been discovered to diseases with no known associations. We implement an approach based on the field of graph machine learning, namely graph convolutional networks, to predict the genes associated with rare genetic diseases. We show how our predictor performs compared to other approaches, and analyze some of the choices made in the design of the predictor, along with some properties of its outputs. Graph Machine Learning Disease Gene Prediction Graph Convolutional Networks Link Prediction Multimodal Networks
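As a rough illustration of how a graph convolution can pass information from a disease with known gene associations to one without, the sketch below applies a generic propagation step with symmetric normalization and self-loops. It is not the thesis's model: the three-node toy graph, the identity feature and weight matrices, and the function names are all assumptions made for illustration.

```python
import math


def normalized_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a symmetric 0/1 adjacency matrix."""
    n = len(adj)
    # Add self-loops so each node keeps part of its own signal.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain-Python matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(adj, features, weights):
    """One propagation step: ReLU(A_norm @ H @ W)."""
    propagated = matmul(matmul(normalized_adjacency(adj), features), weights)
    return [[max(0.0, v) for v in row] for row in propagated]


# Hypothetical toy graph: node 0 = disease A, node 1 = disease B, node 2 = gene G.
# Edges: A-B (phenotype similarity) and A-G (known association).
adj = [[0, 1, 1],
       [1, 0, 0],
       [1, 0, 0]]
identity = [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, 0.0, 1.0]]

out1 = gcn_layer(adj, identity, identity)  # one hop
out2 = gcn_layer(adj, out1, identity)      # two hops
```

After one layer, disease B carries no signal from gene G because the two are not adjacent. After two layers it does, via disease A, which is the intuition behind transferring associations along phenotype-similarity edges.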
13	Gene Prediction with a Hidden Markov Model / Genvorhersage mit einem Hidden-Markow-Modell Stanke, Mario 21 January 2004 (has links) No description available. Mathematics and Computer Science Genvorhersage Hidden-Markow Eukaryoten 004 Informatik 54.80 31.80 AH
14	Prediction of mammalian essential genes based on sequence and functional features Kabir, Mitra January 2017 (has links) Essential genes are those whose presence is imperative for an organism's survival, whereas the functions of non-essential genes may be useful but not critical. Abnormal functionality of essential genes may lead to defects or death at an early stage of life. Knowledge of essential genes is therefore key to understanding development, maintenance of major cellular processes and tissue-specific functions that are crucial for life. Existing experimental techniques for identifying essential genes are accurate, but most of them are time consuming and expensive. Predicting essential genes using computational methods, therefore, would be of great value as they circumvent experimental constraints. Our research is based on the hypothesis that mammalian essential (lethal) and non-essential (viable) genes are distinguishable by various properties. We examined a wide range of features of Mus musculus genes, including sequence, protein-protein interactions, gene expression and function, and found 75 features that were statistically discriminative between lethal and viable genes. These features were used as inputs to create a novel machine learning classifier, allowing the prediction of a mouse gene as lethal or viable with the cross-validation and blind test accuracies of ∼91% and ∼93%, respectively. The prediction results are promising, indicating that our classifier is an effective mammalian essential gene prediction method. We further developed the mouse gene essentiality study by analysing the association between essentiality and gene duplication. Mouse genes were labelled as singletons or duplicates, and their expression patterns over 13 developmental stages were examined. We found that lethal genes originating from duplicates are considerably lower in proportion than singletons. 
At all developmental stages a significantly higher proportion of singletons and lethal genes are expressed than duplicates and viable genes. Lethal genes were also found to be more ancient than viable genes. In addition, we observed that duplicate pairs with similar patterns of developmental co-expression are more likely to be viable; lethal gene duplicate pairs do not have such a trend. Overall, these results suggest that duplicate genes in mouse are less likely to be essential than singletons. Finally, we investigated the evolutionary age of mouse genes across development to see if the morphological hourglass pattern exists in the mouse. We found that in mouse embryos, genes expressed in early and late stages are evolutionarily younger than those expressed in mid-embryogenesis, thus yielding an hourglass pattern. However, the oldest genes are not expressed at the phylotypic stage stated in prior studies, but instead at an earlier time point - the egg cylinder stage. These results question the application of the hourglass model to mouse development. 572
15	Bioremediation of Toxic Metals for Protecting Human Health and the Ecosystem Rahman, Aminur January 2016 (has links) Heavy metal pollutants, discharged into the ecosystem as waste by anthropogenic activities, contaminate drinking water for millions of people and animals in many regions of the world. Long-term exposure to these metals leads to several lethal diseases such as cancer, keratosis, gangrene, diabetes and cardiovascular disorders. Therefore, removal of these pollutants from soil, water and the environment is of great importance for human welfare. One possible eco-friendly solution to this problem is the use of microorganisms that can accumulate heavy metals from contaminated sources, thereby reducing the pollutant content to a safe level. In this thesis, an arsenic-resistant bacterium, Lysinibacillus sphaericus B1-CDA, a chromium-resistant bacterium, Enterobacter cloacae B2-DHA, and a nickel-resistant bacterium, Lysinibacillus sp. BA2, were isolated and studied. The minimum inhibitory concentration values of these isolates are 500 mM sodium arsenate, 5.5 mM potassium chromate and 9 mM nickel chloride, respectively. Time-of-flight secondary ion mass spectrometry and inductively coupled plasma mass spectrometry analyses revealed that after 120 h of exposure, the intracellular accumulation of arsenic in B1-CDA and of chromium in B2-DHA was 5.0 mg/g dwt and 320 μg/g dwt of cell biomass, respectively. However, the arsenic and chromium contents in the liquid medium were reduced to 50% and 81%, respectively. The adsorption value of BA2 when exposed to nickel for 6 h was 238.04 mg of Ni(II) per gram of dead biomass, indicating that BA2 can reduce the nickel content in the solution to 53.89%. Scanning electron micrographs depicted the effect of these metals on the cellular morphology of the isolates. The genetic composition of B1-CDA and B2-DHA was studied in detail by whole-genome sequencing. 
All genes of B1-CDA and B2-DHA predicted to be associated with resistance to heavy metals were annotated. The findings of this study accentuate the significance of these bacteria in removing toxic metals from contaminated sources. The genetic mechanisms by which these isolates absorb and thus remove toxic metals could be used as vehicles to cope with the metal toxicity of contaminated effluents discharged into nature by industries and other human activities. Heavy Metals Pollution Accumulation Remediation Human Health Bacteria Genome Sequencing de novo Assembly Gene Prediction Other Biological Topics Annan biologi
17	DEVELOPMENT OF COMPUTATIONAL APPROACHES FOR MEDICAL IMAGE RETRIEVAL, DISEASE GENE PREDICTION, AND DRUG DISCOVERY Chen, Yang 03 September 2015 (has links) No description available. Computer Science Biomedical Research Computer Engineering computational approach translational biomedical research image retrieval disease gene prediction drug repositioning
18	Improving algorithms of gene prediction in prokaryotic genomes, metagenomes, and eukaryotic transcriptomes Tang, Shiyuyun 27 May 2016 (has links) Next-generation sequencing has generated an enormous amount of DNA and RNA sequences that potentially carry volumes of genetic information, e.g. protein-coding genes. The thesis is divided into three main parts describing i) GeneMarkS-2, ii) GeneMarkS-T, and iii) MetaGeneTack. In prokaryotic genomes, ab initio gene finders can predict genes with high accuracy. However, the error rate is not negligible and largely species-specific. Most errors in gene prediction are made in genes located in genomic regions with atypical GC composition, e.g. genes in pathogenicity islands. We describe a new algorithm, GeneMarkS-2, that uses local GC-specific heuristic models for scoring individual ORFs in the first step of analysis. Predicted atypical genes are retained and serve as ‘external’ evidence in subsequent runs of self-training. GeneMarkS-2 also controls the quality of the training process by effectively selecting optimal orders of the Markov chain models as well as duration parameters in the hidden semi-Markov model. GeneMarkS-2 has shown significantly improved accuracy compared with other state-of-the-art gene prediction tools. Massively parallel sequencing of RNA transcripts by next-generation technology (RNA-Seq) provides a large amount of RNA reads that can be assembled into a full transcriptome. We have developed a new tool, GeneMarkS-T, for ab initio identification of protein-coding regions in RNA transcripts. Unsupervised estimation of the algorithm's parameters makes several steps in conventional gene prediction protocols unnecessary, most importantly the manually curated preparation of training sets. 
We have demonstrated that GeneMarkS-T self-training is robust with respect to the presence of errors in assembled transcripts, and that the accuracy of GeneMarkS-T in identifying protein-coding regions and, particularly, in predicting gene starts compares favorably to other existing methods. Frameshift (FS) prediction is important for analysis and biological interpretation of metagenomic sequences. Reads in metagenomic samples are prone to sequencing errors. Insertion and deletion errors that change the coding frame impair the accurate identification of protein-coding genes. Accurate frameshift prediction requires a sufficient amount of data to estimate parameters of species-specific statistical models of protein-coding and non-coding regions. However, such data are not available; all we have are metagenomic sequences of unknown origin. The challenge of ab initio FS detection is, therefore, twofold: (i) to find a way to infer the necessary model parameters and (ii) to identify positions of frameshifts (if any). We describe a new tool, MetaGeneTack, which uses a heuristic method to estimate parameters of sequence models used in the FS detection algorithm. It was shown on several test sets that the performance of MetaGeneTack FS detection is comparable to or better than that of the earlier program FragGeneScan. Gene prediction Genome annotation Prokaryotic genomes Ribosomal binding site Hidden Markov models Adaptive training Unsupervised self-training Heuristic models RNA-Seq RNA transcripts Frameshift prediction Metagenomics
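To make the notions of an ORF and its local GC composition concrete, here is a minimal sketch that scans the forward strand for ATG-to-stop open reading frames and reports the GC fraction of each. It is not GeneMarkS-2 and does no model-based scoring. The function names, the `min_codons` threshold, and the toy sequence are assumptions for illustration only.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def gc_content(seq):
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    return sum(1 for base in seq.upper() if base in "GC") / len(seq)


def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) for ATG...stop ORFs on the forward strand.

    `end` is exclusive and includes the stop codon; `min_codons` counts the
    start and stop codons as well.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens an ORF
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs


# Hypothetical toy sequence containing a single short ORF.
sequence = "CCATGGCGGCCTAAGG"
orfs = find_orfs(sequence)
orf_gc = [gc_content(sequence[start:end]) for start, end, _ in orfs]
```

A gene finder that uses GC-specific models would use a value like `orf_gc` to choose which set of model parameters to score each ORF with. That selection step is not shown here.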
19	MYOP/ToPS/SGEval: Um ambiente computacional para estudo sistemático de predição de genes / MYOP/ToPS/SGEval: A computational framework for gene prediction Kashiwabara, André Yoshiaki 10 February 2012 (has links) O desafio de encontrar corretamente genes eucarióticos codificadores de proteínas nas sequências genômicas é um problema em aberto. Neste trabalho, implementamos uma plataforma, com o objetivo de melhorar a forma com que preditores de genes são implementados e avaliados. Três novas ferramentas foram implementadas: ToPS (Toolkit of Probabilistic Models of Sequences) foi o primeiro arcabouço orientado a objetos que fornece ferramentas para implementação, manipulação, e combinação de modelos probabilísticos para representar sequências de símbolos; MYOP (Make Your Own Predictor) é um sistema que tem como objetivo facilitar a construção de preditores de genes; e SGEval utiliza grafos de splicing para comparar diferentes anotações com eventos de splicing alternativos. Utilizamos nossas ferramentas para o desenvolvimento de preditores de genes em onze genomas distintos: A. thaliana, C. elegans, Z. mays, P. falciparum, D. melanogaster, D. rerio, M. musculus, R. norvegicus, O. sativa, G. max e H. sapiens. Com esse desenvolvimento, estabelecemos um protocolo para implementação de novos preditores. Além disso, utilizando a nossa plataforma, desenvolvemos um fluxo de trabalho para predição de genes no projeto do genoma da cana de açúcar, que já foi utilizado em 109 sequências de BAC geradas pelo BIOEN (FAPESP Bioenergy Program). / The challenge of correctly identifying eukaryotic protein-coding genes in genomic sequences is an open problem. In this work, we implemented a platform with the aim of improving the way that gene predictors are implemented and evaluated. 
ToPS (Toolkit of Probabilistic Models of Sequence) was the first object-oriented framework that provides tools for implementation, manipulation, and combination of probabilistic models that represent sequences of symbols. MYOP (Make Your Own Predictor) facilitates the construction of gene predictors. SGEval (Splicing Graph Evaluation) uses splicing graphs to compare different annotations with alternative splicing events. We used our platform to develop gene finders in eleven distinct genomes: A. thaliana, C. elegans, Z. mays, P. falciparum, D. melanogaster, D. rerio, M. musculus, R. norvegicus, O. sativa, G. max and H. sapiens. With this development, we established a protocol for implementing new gene predictors. In addition, using our platform, we developed a pipeline to find genes in the 109 sugarcane BAC sequences produced by BIOEN (FAPESP Bioenergy Program). ab initio gene prediction Bioinformática. bioinformatics. cadeia de Markov oculta generalizada generalized hidden Markov models modelos probabilísticos predição ab initio de genes probabilistic models
20	Probabilistic Methods for Computational Annotation of Genomic Sequences / Probabilistische Methoden für computergestützte Genom-Annotation Keller, Oliver 26 January 2011 (has links) No description available. Genvorhersage Protein-Klassifikation Hidden-Markov-Modelle semi-Markov-Ketten Genomannotation gene prediction protein classification hidden Markov models semi-Markov chains genome annotation

Search results