1 |
Inferences On The Function Of Proteins And Protein-Protein Interactions Using Large Scale Sequence And Structure Analysis
Krishnadev, O, 05 1900
No description available.
|
2 |
Computational Methods for Cis-Regulatory Module Discovery
Liang, Xiaoyu, January 2010
No description available.
|
3 |
COVID-19 Variant Analyzer through Genomic Sequences and Jaccard Similarities
Bharadwaj, Atul Narasimha Murthy, 26 March 2025
The COVID-19 pandemic has underscored the urgent need for efficient genomic surveillance to track the emergence and spread of SARS-CoV-2 variants. This study developed a novel computational framework that enhances variant detection through a database-driven approach to genomic sequence analysis. The framework uses a MySQL database in which each variant is stored in its own table, enabling rapid comparison and classification of new variants through Jaccard similarity calculations.
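Jaccard similarity between two genomes is commonly computed on their sets of k-mers (length-k substrings): J(A, B) = |A ∩ B| / |A ∪ B|, which is 1.0 for identical sets and 0.0 for disjoint ones. The minimal sketch below computes it exactly on full k-mer sets; the function names, k = 4, and the toy sequences are illustrative assumptions, not values or code from the thesis.

```python
from __future__ import annotations


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings (k-mers) of a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Exact Jaccard similarity |A & B| / |A | B|; 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


if __name__ == "__main__":
    # Toy sequences differing by one substitution (illustrative only).
    ref = "ACGTACGTTAGC"
    new = "ACGTACGATAGC"
    print(jaccard(kmers(ref, 4), kmers(new, 4)))
```

A single substitution changes up to k overlapping k-mers, so these two sequences share 5 of 12 distinct 4-mers (J = 5/12) even though they differ at only one position; real genomes use much larger k.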
The main innovation of this research lies in its database structure and classification method. Unlike traditional clustering approaches, the system creates an individual table for each variant, allowing dynamic updates and efficient comparisons. When a new variant is introduced, the framework calculates Jaccard similarity scores between it and the existing variant tables, and automatically creates a new table for a potentially novel variant whose scores fall below the established similarity thresholds. This approach enables real-time variant tracking and classification that adapts to the evolving virus.
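The table-per-variant classification loop can be sketched as follows. This is a minimal, hypothetical illustration, not the thesis's code or schema: Python's built-in `sqlite3` module stands in for MySQL, each hypothetical `variant_N` table holds one variant's k-mers, and a single `threshold` parameter decides when a new table is created.

```python
from __future__ import annotations

import sqlite3


def kmers(seq: str, k: int) -> set[str]:
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class VariantStore:
    """Hypothetical table-per-variant store (sqlite3 used in place of MySQL)."""

    def __init__(self, k: int = 4) -> None:
        self.k = k
        self.conn = sqlite3.connect(":memory:")

    def variant_tables(self) -> list[str]:
        # GLOB treats '_' literally, so only tables named variant_* are listed.
        rows = self.conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name GLOB 'variant_*' ORDER BY name"
        ).fetchall()
        return [name for (name,) in rows]

    def load(self, table: str) -> set[str]:
        # Table names are generated internally, never taken from user input.
        return {kmer for (kmer,) in self.conn.execute(f"SELECT kmer FROM {table}")}

    def create_variant(self, kmer_set: set[str]) -> str:
        table = f"variant_{len(self.variant_tables()) + 1}"
        self.conn.execute(f"CREATE TABLE {table} (kmer TEXT PRIMARY KEY)")
        self.conn.executemany(
            f"INSERT INTO {table} (kmer) VALUES (?)", ((km,) for km in kmer_set)
        )
        self.conn.commit()
        return table

    def classify(self, seq: str, threshold: float) -> tuple[str, float, bool]:
        """Return (table, best score, whether a new table was created)."""
        query = kmers(seq, self.k)
        best_table, best_score = None, 0.0
        for table in self.variant_tables():
            score = jaccard(query, self.load(table))
            if score > best_score:
                best_table, best_score = table, score
        if best_table is None or best_score < threshold:
            # No sufficiently similar variant: store this one as a new table.
            return self.create_variant(query), best_score, True
        return best_table, best_score, False


if __name__ == "__main__":
    store = VariantStore(k=4)
    for seq in ["ACGTACGTTAGC", "ACGTACGTTAGC", "ACGTACGATAGC"]:
        print(store.classify(seq, threshold=0.8))
```

On these toy inputs, the first call creates `variant_1`, the repeat matches it with a score of 1.0, and the single-substitution sequence (score 5/12) falls below 0.8 and is stored in a new `variant_2` table. A per-variant table makes adding a variant a single `CREATE TABLE` rather than a re-clustering step.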
The system uses established bioinformatics tools, including sourmash for signature generation and NumPy for numerical analysis, alongside Python-MySQL connectors for database interactions. It applies a similarity threshold of 0.817 for primary classification and 0.867 for secondary validation to determine variant group membership. Whole-genome data were analyzed to evaluate the framework's effectiveness in identifying variants of concern, and the database structure was designed to accommodate genomic data.
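sourmash builds compact signatures by hashing k-mers and keeping only a fraction of the hash values, then estimates Jaccard similarity from those signatures. The sketch below mimics that "scaled" idea with Python's `hashlib.blake2b` as a stand-in hash; it is not sourmash's API or hash function. The abstract does not say how the two thresholds combine, so the `membership` rule (at least 0.867 confirms, 0.817 to 0.867 is provisional, below 0.817 is a candidate novel variant) is a hypothetical interpretation.

```python
from __future__ import annotations

import hashlib

MAX_HASH = 2**64  # hashes are 64-bit unsigned integers

# Thresholds quoted in the abstract; the rule in membership() is an assumption.
PRIMARY_THRESHOLD = 0.817
SECONDARY_THRESHOLD = 0.867


def hash_kmer(kmer: str) -> int:
    """64-bit k-mer hash (blake2b stand-in, not sourmash's hash function)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def frac_minhash(seq: str, k: int, scaled: int) -> set[int]:
    """Keep only k-mer hashes below MAX_HASH // scaled (about 1/scaled of them)."""
    cutoff = MAX_HASH // scaled
    seq = seq.upper()
    hashes = {hash_kmer(seq[i:i + k]) for i in range(len(seq) - k + 1)}
    return {h for h in hashes if h < cutoff}


def sketch_jaccard(a: set[int], b: set[int]) -> float:
    """Estimate Jaccard from two sketches built with the same k and scaled."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def membership(score: float) -> str:
    """Hypothetical two-stage decision using the abstract's thresholds."""
    if score >= SECONDARY_THRESHOLD:
        return "confirmed member"
    if score >= PRIMARY_THRESHOLD:
        return "provisional member"
    return "candidate novel variant"
```

With `scaled=1` every hash is kept and the estimate equals the exact k-mer Jaccard (barring hash collisions); larger `scaled` values shrink signatures at the cost of estimation noise, which matters most for short sequences.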
The results demonstrated the framework's ability to accurately detect and classify SARS-CoV-2 variants with high sensitivity and specificity. The study highlighted the potential of partial-genome sequences as a cost-effective alternative for variant detection in resource-limited settings, while also revealing their limitations compared with whole-genome analysis. This research contributes to global genomic surveillance by providing scalable database tools for rapid variant identification, supporting public health strategies, vaccine development, and therapeutic interventions. / Master of Science / The COVID-19 pandemic has shown how important it is to track changes in SARS-CoV-2, the virus that causes COVID-19. This study focused on developing better ways to find and classify new versions of the virus (variants) by analyzing its genetic material. Using bioinformatics tools, the research aimed to make it easier and faster to identify these variants and understand how they are related.
The project compared virus genomes and grouped similar ones to study how the virus evolves. It also tested whether analyzing only part of the virus's genetic material could be as effective as analyzing the whole genome. These techniques helped identify patterns in the virus's mutations and group them into meaningful categories.
This work is important because it provides tools that can help scientists quickly spot new or dangerous variants of the virus. These findings can guide public health decisions and support the development of more effective vaccines and treatments. By making these methods scalable and accessible, this research supports global efforts to manage the ongoing pandemic and prepare for future outbreaks.
|
4 |
De l'intérêt des modèles grammaticaux pour la reconnaissance de motifs dans les séquences génomiques / Interest of grammatical models for pattern matching in genomic sequences
Antoine-Lorquin, Aymeric, 01 December 2016
This bioinformatics thesis studies the value of using grammars to search for patterns in genomic sequences. Since the 1980s, work initiated notably by David Searls has shown that, in theory, high-level grammars are expressive enough to describe complex biological patterns, in particular through a new class of grammar dedicated to biology: String Variable Grammars (SVG). This formalism gave rise to Logol, a grammatical language and analysis tool developed by the Dyliss team, where this thesis was carried out. Logol is designed to be flexible enough to express the wide range of patterns that can be encountered in biology. The fact that grammars remain unused for pattern matching raises a question: is the grammatical formalism really relevant for modeling biological patterns? This thesis attempts to answer that question through an exploratory approach. We study the relevance of grammatical models, through Logol, on six different genomic pattern-matching applications. By solving concrete biological problems, we highlighted several characteristics of grammatical models. One limitation is that their use comes at a cost in performance. One strength is that their expressiveness covers a broad spectrum of biological patterns, unlike alternative methods; indeed, some patterns modeled by grammars have no existing alternative. In particular, for some complex patterns, such as those combining sequence and structure, the grammatical approach is the most suitable.
Finally, one conclusion of this thesis is that the different approaches are not really in competition; rather, there is everything to gain from fruitful cooperation between them.
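The key idea of string variable grammars is that a variable binds to whatever substring it matches and can be reused later in the pattern, possibly transformed. A typical sequence-plus-structure pattern is an RNA hairpin, written informally as X, then a loop, then the reverse complement of X. The brute-force Python sketch below illustrates only that idea; it is not Logol syntax or Logol's engine, the function names and length ranges are hypothetical, and it allows exact Watson-Crick pairs only (no G-U wobble, no mismatches).

```python
from __future__ import annotations

from collections.abc import Iterator

COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Watson-Crick reverse complement of an RNA string."""
    return rna.translate(COMPLEMENT)[::-1]


def find_hairpins(
    seq: str, stem: tuple[int, int] = (4, 6), loop: tuple[int, int] = (3, 8)
) -> Iterator[tuple[int, str, str]]:
    """Yield (start, X, loop) for every match of X . loop . revcomp(X)."""
    seq = seq.upper().replace("T", "U")
    for start in range(len(seq)):
        for s_len in range(stem[0], stem[1] + 1):
            x = seq[start:start + s_len]  # the string variable X is bound here
            if len(x) < s_len:
                break  # longer stems cannot fit either
            target = reverse_complement(x)  # X is reused, transformed
            for l_len in range(loop[0], loop[1] + 1):
                close = start + s_len + l_len
                if seq[close:close + s_len] == target:
                    yield start, x, seq[start + s_len:close]


if __name__ == "__main__":
    print(list(find_hairpins("GCAUGAAAAUGC", stem=(4, 4), loop=(3, 8))))
```

On `GCAUGAAAAUGC`, the hairpin `GCAU` + `GAAA` + `AUGC` is reported once, at position 0. A regular expression cannot express "the reverse complement of whatever X matched", which is the kind of expressiveness the thesis attributes to grammatical models, at the performance cost of this exhaustive search.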
|