1 |
A Simple and Fast Homology-Based Gene Prediction in Mitochondrial Genomes
Hajianpour, Amirhossein, 21 December 2021
With the abundance of genomic data produced since the Human Genome Project, the need to analyze and annotate these data has arisen. Annotating genomes helps us understand the function of different parts of the genomes of various species. In this thesis, we propose a simple and fast homology-based gene prediction method called Exon Hunter (EH) that achieves performance comparable with state-of-the-art methods on mitochondrial genomes. Mitochondria are crucial for a eukaryotic cell, and mutations in mitochondrial DNA have been linked to disorders such as Alzheimer's disease and cancer. We used hidden Markov model (HMM) protein profiles of a number of genes to search for protein-coding genes in different genomes. Our method forms every subset of the hit set and calculates a score for each subset according to an objective function. It then chooses the subset with the highest score. Finally, we analyze the codon usage bias of our dataset and discuss how it can help improve the prediction. ExonHunter is written in Python and is publicly available at github.com/amirh-hajianpour/ExonHunter.
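The subset-selection step described in this abstract can be sketched as follows. This is a minimal illustration, not ExonHunter's actual code: the `Hit` record, the overlap penalty, and the form of `objective` are all assumptions made for the example, since the abstract does not state the objective function.

```python
from itertools import combinations
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical profile-HMM hit on the genome (half-open interval)."""
    start: int
    end: int
    score: float


def overlap(a: Hit, b: Hit) -> int:
    """Number of genome positions shared by two hits."""
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def objective(subset, overlap_penalty=1.0):
    """Assumed objective: summed hit scores minus a penalty per overlapping base."""
    total = sum(h.score for h in subset)
    penalty = sum(overlap(a, b) for a, b in combinations(subset, 2))
    return total - overlap_penalty * penalty


def best_subset(hits):
    """Score every non-empty subset of the hit set and keep the best one."""
    best, best_score = (), 0.0
    for size in range(1, len(hits) + 1):
        for subset in combinations(hits, size):
            score = objective(subset)
            if score > best_score:
                best, best_score = subset, score
    return list(best), best_score
```

Enumerating every subset costs on the order of 2^n evaluations for n hits, so a search of this kind is practical only when the hit set is small.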
|
2 |
Meta State Generalized Hidden Markov Model for Eukaryotic Gene Structure Identification
Baribault, Carl, 20 December 2009
Using a generalized-clique hidden Markov model (HMM) as the starting point for a eukaryotic gene finder, the objective here is to strengthen the signal information at the transitions between coding and non-coding (c/nc) regions. This is done by enlarging the primitive hidden states associated with individual base labeling (as exon, intron, or junk) to substrings of primitive hidden states, or footprint states. Moreover, the allowed footprint transitions are restricted to those that include either one c/nc transition or none at all, which effectively imposes a minimum length on exons and the other regions. These footprint states allow the c/nc transitions to be seen sooner and have their contributions to the gene-structure identification weighted more heavily. That weighting is a natural one, determined by the HMM itself according to the training data, rather than by an artificial gain parameter tuned on major transitions. The selection of the generalized HMM is interpolated to the highest Markov order on emission probabilities, and to the highest Markov order (subsequence length) on the footprint states. The former is accomplished via simple count cutoff rules, the latter via an identification of anomalous base statistics near the major transitions using Shannon entropy. Preliminary indications, from applications to the C. elegans genome, are that the sensitivity/specificity (SN/SP) results for both the individual state and full exon predictions are greatly enhanced using the generalized-clique HMM when compared to the standard HMM. Here the standard HMM is represented by the choice of the smallest size of footprint state in the generalized-clique HMM. Even with these improvements, we observe that both extremely long and short exon and intron segments would go undetected without an explicit model of the duration of state.
The key contributions of this effort are the full derivation and experimental confirmation of a rudimentary, yet powerful and competitive, gene-finding method based on a higher-order hidden Markov model. With suitable extensions, this method is expected to provide superior gene-finding capability, not only in the context of pre-conditioned data sets as in the evaluations cited but also in the wider context of less preconditioned and/or raw genomic data.
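The footprint-state construction in this abstract can be illustrated with a short sketch. The label alphabet (`e`, `i`, `j`), the function names, and the rule that intron and junk labels are never adjacent are assumptions made for the example; the abstract only states that footprints are substrings of primitive labels and that allowed footprint transitions contain at most one c/nc transition.

```python
from itertools import product

LABELS = "eij"  # assumed primitive labels: exon, intron, junk


def cnc_transitions(labels: str) -> int:
    """Count coding/non-coding changes between adjacent primitive labels."""
    return sum((a == "e") != (b == "e") for a, b in zip(labels, labels[1:]))


def has_intron_junk_adjacency(labels: str) -> bool:
    """Assumption for this sketch: an intron never directly borders junk."""
    return any({a, b} == {"i", "j"} for a, b in zip(labels, labels[1:]))


def is_valid(labels: str) -> bool:
    return cnc_transitions(labels) <= 1 and not has_intron_junk_adjacency(labels)


def footprint_states(k: int):
    """All length-k label strings with at most one c/nc transition."""
    candidates = ("".join(t) for t in product(LABELS, repeat=k))
    return [s for s in candidates if is_valid(s)]


def allowed_transition(src: str, dst: str) -> bool:
    """dst must be src shifted by one base, and the joined window must stay valid."""
    if src[1:] != dst[:-1]:
        return False
    return is_valid(src + dst[-1])
```

Under these assumptions a footprint of length 3 yields 11 states, and because two c/nc transitions can never fall inside one transition window, every region is forced to span at least the footprint length.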
|
3 |
Análise gênica de comorbidades a partir da integração de dados epidemiológicos / Comorbidities genetic analysis from epidemiological data integration
Ferraz Néto, Karla, 01 December 2014
The identification of genes responsible for human diseases can provide knowledge about pathological and physiological mechanisms that are essential for the development of new diagnostics and therapeutics. A disease is rarely the consequence of an abnormality in a single gene; rather, it reflects disorders of a complex intra- and intercellular network. Many methodologies known in bioinformatics are able to prioritize genes related to a particular disease, and some approaches can also validate whether or not these genes are relevant to the disease under study. One approach to prioritizing genes is to investigate diseases that affect patients at the same time, i.e. comorbidities.
There are many sources of biomedical data that can be used to collect comorbidities. From them we can collect pairs of diseases that form epidemiological comorbidities and analyse the genes of each disease. This analysis serves to expand the list of candidate genes for each disease and to justify the genetic relationship between the comorbidities. The main objective of this project is the integration of epidemiological and genetic data to predict disease-causing genes through the study of the comorbidity of these diseases.
|
4 |
Applying Bioinformatic Techniques to Identify Cold-associated Genes in Oat
Thorburn, Henrik, January 2002
As the interest in biological sequence analysis increases, more efficient techniques to sequence, map and analyse genome data are needed. One frequently used technique is EST sequencing, which has proven to be a fast and cheap method to extract genome data. EST sequencing generates large numbers of low-quality sequences which have to be managed and analysed further.
Performing complete searches and finding guaranteed results is very time consuming. This dissertation project presents a method that can be used to perform rapid gene prediction of function-specific genes in EST data, together with the results and an estimate of the accuracy of the method.
The project applies various methods and techniques to actual data, attempting to identify genes involved in cold-associated processes in plants. The presented method consists of three steps. First, a database of genes known to have cold-associated properties is assembled. These genes are extracted from other, already sequenced and analysed organisms. Second, this database is used to identify homologues in an unanalysed EST dataset, generating a candidate list of cold-associated genes. Last, each of the identified candidate cold-associated genes is verified, both to estimate the accuracy of the rapid gene prediction and to support the removal of candidates that are not cold-associated.
The method was applied to a previously unanalysed Avena sativa EST dataset and was able to identify 135 candidate genes from approximately 9500 ESTs. Of these, 103 were verified as cold-associated genes.
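The three steps in this abstract can be sketched roughly as below. The shared k-mer similarity is a crude stand-in chosen for the example and is an assumption, as are the function names and the threshold; the abstract does not say which homology search tool or cutoff was used. The last function only reproduces the reported arithmetic: 103 verified out of 135 candidates.

```python
def kmers(seq: str, k: int = 4) -> set:
    """Set of all length-k substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a: str, b: str, k: int = 4) -> float:
    """Fraction of the smaller k-mer set shared by both sequences (toy measure)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / min(len(ka), len(kb))


def find_candidates(reference: dict, ests: dict, threshold: float = 0.5) -> dict:
    """Step 2: map each EST to its best-matching reference gene, if similar enough."""
    candidates = {}
    for est_id, est_seq in ests.items():
        best_score, best_ref = 0.0, None
        for ref_id, ref_seq in reference.items():
            score = similarity(est_seq, ref_seq)
            if score > best_score:
                best_score, best_ref = score, ref_id
        if best_ref is not None and best_score >= threshold:
            candidates[est_id] = best_ref
    return candidates


def verified_fraction(verified: int, candidates: int) -> float:
    """Step 3 summary: share of candidates confirmed on verification."""
    return verified / candidates
```

With the figures reported in the abstract, `verified_fraction(103, 135)` is about 0.76, i.e. roughly three in four candidates were confirmed.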
|
7 |
MYOP: um arcabouço para predição de genes ab initio / MYOP: A framework for building ab initio gene predictors
Kashiwabara, Andre Yoshiaki, 23 March 2007
The demand for efficient approaches to recognizing the structure of each gene in a genomic sequence has motivated the implementation of a large number of gene prediction programs. We analyzed successful programs that take a probabilistic approach and found similarities among their implementations: most of them use the generalized hidden Markov model (GHMM) as the gene model. Many predictors have the GHMM architecture hard-coded in their source, which makes it difficult to investigate new approaches. Because of this difficulty and the similarities among current programs, we implemented the MYOP (Make Your Own Predictor) framework, whose objective is to provide a flexible environment that allows each gene model to be evaluated rapidly.
We demonstrate the utility of the tool through the implementation and evaluation of 96 gene models, in which each model consists of a set of states and each state has a duration distribution and another probabilistic model. We found that a more sophisticated probabilistic model does not always yield a better predictor, which shows the relevance of experimentation and the importance of a system such as MYOP.
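The idea of a gene model built from states that each carry a duration distribution and a second probabilistic model can be sketched as follows. This is not MYOP's interface: the `State` class, the helper names, and the two-state example are all hypothetical and only illustrate how an architecture can be described as data instead of being hard-coded.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class State:
    """Hypothetical GHMM state: pluggable duration and emission models."""
    name: str
    duration: Callable[[random.Random], int]
    emit: Callable[[random.Random, int], str]
    transitions: Dict[str, float]


def geometric(p: float):
    """Duration model: number of trials up to and including the first success."""
    def sample(rng: random.Random) -> int:
        length = 1
        while rng.random() > p:
            length += 1
        return length
    return sample


def fixed(length: int):
    """Duration model: always the same length."""
    return lambda rng: length


def iid(weights: Dict[str, float]):
    """Emission model: independent bases drawn with the given weights."""
    bases, probs = list(weights), list(weights.values())
    return lambda rng, length: "".join(rng.choices(bases, weights=probs, k=length))


def generate(model: Dict[str, State], start: str, n_segments: int, seed: int = 0):
    """Sample (state name, emitted sequence) segments from the model."""
    rng = random.Random(seed)
    state = model[start]
    segments = []
    for _ in range(n_segments):
        length = state.duration(rng)
        segments.append((state.name, state.emit(rng, length)))
        names, probs = list(state.transitions), list(state.transitions.values())
        state = model[rng.choices(names, weights=probs, k=1)[0]]
    return segments
```

Swapping `geometric` for `fixed`, or `iid` for a higher-order emission model, changes the gene model without touching `generate`, which is the kind of flexibility the abstract argues for.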
|
8 |
Improvement of ab initio methods of gene prediction in genomic and metagenomic sequences
Zhu, Wenhan, 06 April 2010
A metagenome obtained by shotgun sequencing of a microbial community is a heterogeneous mixture of rather short sequences. A vast majority of the microbial species in a given community (99%) are likely to be non-cultivable. Many protein-coding regions in a new metagenome are likely to code for barely detectable homologs of already known proteins. Therefore, an ab initio method that accurately identifies the new genes is a vitally important tool of metagenomic sequence analysis. A heuristic model method for finding genes in short prokaryotic sequences of anonymous origin was proposed in 1999, prior to the advent of metagenomics. With hundreds of new prokaryotic genomes available, it is now possible to enhance the original approach and to use direct polynomial and logistic approximations of oligonucleotide frequencies. The idea was to bypass traditional ways of parameter estimation, such as supervised training on a set of validated genes or unsupervised training on an anonymous sequence assumed to contain a large enough number of genes. The codon frequencies, critical for the model parameterization, could be derived from the frequencies of nucleotides observed in the short sequence. This method could be further applied to initialize the algorithms for iterative parameter estimation in prokaryotic as well as eukaryotic gene finders.
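To make the last idea concrete, the sketch below derives approximate codon frequencies from the nucleotide frequencies of a short sequence. It assumes the three codon positions are independent, which is a deliberately cruder stand-in than the polynomial and logistic approximations named in the abstract; the function names are also assumptions.

```python
from collections import Counter
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}


def nucleotide_frequencies(seq: str) -> dict:
    """Relative frequency of A, C, G and T in the sequence."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: counts[base] / total for base in "ACGT"}


def approximate_codon_frequencies(seq: str) -> dict:
    """Codon frequencies as products of nucleotide frequencies (independence assumed)."""
    p = nucleotide_frequencies(seq)
    raw = {}
    for triple in product("ACGT", repeat=3):
        codon = "".join(triple)
        if codon not in STOP_CODONS:
            raw[codon] = p[triple[0]] * p[triple[1]] * p[triple[2]]
    norm = sum(raw.values())
    # Renormalize over the 61 sense codons so the values sum to one.
    return {codon: value / norm for codon, value in raw.items()}
```

No training set is needed: the only input is the fragment itself, which is what makes this family of approaches usable on short anonymous metagenomic reads.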
|
9 |
Genome-wide analysis of mutually exclusive splicing
Hatje, Klas, 29 January 2013
No description available.
|