711 |
A method to identify the non-coding RNA gene for U1 RNA in species in which it has not yet been found. Mathew, Sumi, January 2007
Background: Non-coding RNAs are RNA molecules that do not code for proteins but play structural, catalytic or regulatory roles in the organisms in which they are found. These RNAs generally conserve their secondary structure more strongly than their primary sequence. Protein-coding genes can be located using sequence signals such as promoters, terminators, and start and stop codons; this is not the case for non-coding RNAs, since these signals are only weakly conserved in them, which makes the search more challenging. A protocol was therefore devised to identify U1 RNA in species not previously known to have it. Results: Covariance models are sufficient to identify non-coding RNAs, but they are very slow, so a filtering step is needed before applying them in order to reduce the search space. The protocol for identifying U1 RNA genes uses the pattern matcher RNABOB, which can conduct secondary-structure pattern searches, as the filter. The descriptor for RNABOB is generated automatically so that it can also represent bulges and interior loops in RNA helices. The protocol was compared with the Rfam and Weinberg & Ruzzo approaches and identified new U1 RNA homologues in the Apicomplexa, where the gene had not previously been found. Conclusions: The method has been used to identify the U1 RNA gene in species in which it had not been detected previously. The identified genes can be further analysed with wet-laboratory techniques to confirm their existence.
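A minimal sketch of the filter-then-verify idea is given below; the pattern, window size and scoring function are illustrative placeholders (the real protocol uses an automatically generated RNABOB descriptor as the filter and covariance models for verification):

```python
import re

# Illustrative stand-in for a secondary-structure pattern filter (the real
# protocol uses RNABOB with an automatically generated descriptor).
def rough_filter(genome, motif=r"CAGG[ACGT]{2,8}CCTG"):
    """Yield candidate windows whose sequence matches a crude proxy pattern;
    real filtering is structure-aware, not a plain regular expression."""
    for m in re.finditer(motif, genome):
        start = max(0, m.start() - 50)
        yield start, min(len(genome), m.end() + 50)

def cm_score(window_seq):
    """Placeholder for covariance-model scoring (e.g. Infernal's cmsearch);
    returns a fake score here so the sketch runs stand-alone."""
    return window_seq.count("GC")  # hypothetical proxy score

def find_u1_candidates(genome, threshold=10):
    hits = []
    for start, end in rough_filter(genome):
        score = cm_score(genome[start:end])
        if score >= threshold:                 # only windows passing the cheap
            hits.append((start, end, score))   # filter reach the slow model
    return hits
```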
|
712 |
Design and Development of a Database for the Classification of Corynebacterium glutamicum Genes, Proteins, Mutants and Experimental Protocols. Muhammad, Ashfaq, January 2006
Coryneform bacteria are widely distributed in nature; they are rod-like, aerobic soil bacteria capable of growing on a variety of sugars and organic acids. Corynebacterium glutamicum is a non-pathogenic species of Coryneform bacteria used for the industrial production of amino acids. There are three main publicly available genome annotations for C. glutamicum: Cg, Cgl and NCgl. The three annotations differ in their numbers of protein-coding genes and in how many similar genes overlap between them. The original data are only available as text files. In this format it was not easy to search and compare the data among the different annotations, and it was impossible to make an extensive, multidimensional, customized formal search against different protein parameters. With the data inconsistent and redundant across the various publicly available biological database servers, it was also not possible to compare all genome annotations for the construction of deletion and over-expression mutants, or to generate graphical representations of genome information such as gene locations, neighbouring genes, orientation (direct or complementary strand), overlapping genes and gene lengths, or graphical output relating structure to function by comparison of predicted trans-membrane domains (TMD) and functional protein domains and motifs. There was therefore a need for a system for managing the data on mutants and experimental setups. Although the genome sequence is known, no databank providing such a complete set of information has been available until now. We solved these problems by developing a standalone relational database application covering data processing, protein and DNA sequence extraction, and management of lab data. The result of the study is CORYNEBASE, a software application that meets our aims and objectives.
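The kind of relational structure such an application needs can be sketched with a few tables; the schema below is a hypothetical simplification in SQLite (table and column names are illustrative, not CORYNEBASE's actual design):

```python
import sqlite3

# Hypothetical, simplified schema for storing three genome annotations
# (Cg, Cgl, NCgl) side by side and recording overlaps and lab mutants.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE annotation (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL                     -- 'Cg', 'Cgl' or 'NCgl'
);
CREATE TABLE gene (
    id            INTEGER PRIMARY KEY,
    annotation_id INTEGER REFERENCES annotation(id),
    locus_tag     TEXT,
    strand        TEXT CHECK (strand IN ('+', '-')),
    start_pos     INTEGER,
    end_pos       INTEGER,
    protein_seq   TEXT
);
CREATE TABLE gene_overlap (                -- same gene found in two annotations
    gene_a INTEGER REFERENCES gene(id),
    gene_b INTEGER REFERENCES gene(id)
);
CREATE TABLE mutant (                      -- lab data: deletion/over-expression
    id       INTEGER PRIMARY KEY,
    gene_id  INTEGER REFERENCES gene(id),
    kind     TEXT,                         -- 'deletion', 'over-expression'
    protocol TEXT
);
""")

# Example query: genes annotated in Cgl with no recorded counterpart elsewhere.
query = """
SELECT g.locus_tag FROM gene g
JOIN annotation a ON a.id = g.annotation_id
WHERE a.name = 'Cgl'
  AND g.id NOT IN (SELECT gene_a FROM gene_overlap);
"""
print(conn.execute(query).fetchall())
```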
|
713 |
Construction of Evolutionary Tree Models for Oncogenesis of Endometrial Adenocarcinoma. Chen, Lei, January 2005
Endometrial adenocarcinoma (EAC) is the fourth most common carcinoma in women worldwide, but not much is known about the genetic factors involved in this complex disease. During the EAC process, it is well known that losses and gains of chromosomal regions do not occur completely at random, but partly through some flow of causality. In this work, we used three different algorithms based on the frequency of genomic alterations to construct 27 tree models of oncogenesis. So far, no study applying pathway models to microsatellite marker data had been reported. Data from genome-wide scans with microsatellite markers were classified into 9 data sets according to two biological approaches (solid tumor cells and corresponding tissue culture) and three different genetic backgrounds obtained by intercrossing the susceptible BDII rat strain with two normal rat strains. In line with a previous study, the tree models suggest that three main regions (I, II and III) and two subordinate regions (IV and V) are likely to be involved in EAC development. The tree models also provided further information about these regions, such as their likely order and relationships. The high consistency among the tree models, and the relationship among the p19, Tp53 and Tp53-inducible protein genes, provide supporting evidence for the reliability of the results.
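As a toy illustration of frequency-based tree building, the sketch below attaches each chromosomal region to the more frequent region it co-occurs with most often across tumours; the thesis used three more elaborate algorithms, so this is only a conceptual stand-in:

```python
from itertools import combinations

def build_frequency_tree(samples, regions):
    """samples: list of sets of altered regions observed in tumours.
    Returns a parent mapping forming a tree rooted at a virtual root."""
    n = len(samples)
    freq = {r: sum(r in s for s in samples) / n for r in regions}
    cooc = {frozenset((a, b)): sum(a in s and b in s for s in samples) / n
            for a, b in combinations(regions, 2)}
    parent = {}
    for r in sorted(regions, key=lambda x: -freq[x]):
        earlier = [q for q in regions if freq[q] > freq[r]]  # putative earlier events
        if not earlier:
            parent[r] = "root"
            continue
        parent[r] = max(earlier, key=lambda q: cooc.get(frozenset((q, r)), 0.0))
    return parent

# toy example: regions I and II tend to precede III
samples = [{"I"}, {"I", "II"}, {"I", "II", "III"}, {"I", "III"}, {"II"}]
print(build_frequency_tree(samples, ["I", "II", "III"]))
```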
|
714 |
Improvements and extensions of a web-tool for finding candidate genes associated with rheumatoid arthritis. Dodda, Srinivasa Rao, January 2005
Quantitative trait locus (QTL) mapping is a statistical method used to restrict genomic regions contributing to specific phenotypes. To further localize genes in such regions, a web tool called "Candidate Gene Capture" (CGC) was developed by Andersson et al. (2005). The CGC tool was based on the textual descriptions of genes in the human phenotype database OMIM. Although the CGC tool works well, it was limited by a number of inconsistencies in the underlying database structure, by static web pages, and by gene descriptions in OMIM without a properly defined function. In this work the CGC tool was therefore improved by redesigning its database structure, adding dynamic web pages and improving the prediction of unknown gene function using exon analysis. The changes to the database structure reduced the number of tables considerably, eliminated redundancies and made data retrieval more efficient. A new method for predicting gene function was proposed, based on the assumption that similarity between exon sequences is associated with biochemical function. Using BLAST with 20,380 exon protein sequences and a threshold E-value of 0.01, 639 exon groups were obtained, with an average of 11 exons per group. When estimating the functional similarity, it was found that on average 72% of the exons in a group had at least one Gene Ontology (GO) term in common.
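The exon-grouping step can be pictured as single-linkage clustering over BLAST hits below the E-value cutoff, followed by a check of GO-term agreement within each group; the sketch below assumes the hits and GO annotations are already parsed into plain Python structures:

```python
from collections import defaultdict

def group_exons(blast_hits, e_threshold=0.01):
    """blast_hits: iterable of (exon_a, exon_b, e_value) tuples.
    Single-linkage grouping: exons are joined whenever a hit passes the cutoff."""
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]      # path compression
            x = parent[x]
        return x
    for a, b, e in blast_hits:
        if e <= e_threshold:
            parent[find(a)] = find(b)
    groups = defaultdict(set)
    for x in list(parent):
        groups[find(x)].add(x)
    return list(groups.values())

def shared_go_fraction(group, go_terms):
    """Fraction of exons in the group sharing at least one GO term with
    another member; go_terms maps exon id -> set of GO ids."""
    hits = 0
    for exon in group:
        others = (set().union(*(go_terms.get(e, set()) for e in group if e != exon))
                  if len(group) > 1 else set())
        if go_terms.get(exon, set()) & others:
            hits += 1
    return hits / len(group)

# toy example
groups = group_exons([("e1", "e2", 1e-5), ("e2", "e3", 1e-4), ("e4", "e5", 0.5)])
print(groups, shared_go_fraction(groups[0], {"e1": {"GO:1"}, "e2": {"GO:1"}, "e3": {"GO:2"}}))
```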
|
715 |
Numerical methods for mapping of multiple QTL. Ljungberg, Kajsa, January 2003
This thesis concerns numerical methods for mapping of multiple quantitative trait loci (QTL). Interactions between multiple genetic loci influencing important traits, such as growth rate in farm animals and predisposition to cancer in humans, make it necessary to search for several QTL simultaneously. Simultaneous search for n QTL involves solving an n-dimensional global optimization problem, where each evaluation of the objective function consists of solving a generalized least squares problem. In Paper A we present efficient algorithms, mainly based on updated QR factorizations, for evaluating the objective functions of different parametric QTL mapping methods. One of these algorithms reduces the computational work required for an important function class by one order of magnitude compared with the best of the methods used by other authors. In Paper B previously utilized techniques for finding the global optimum of the objective function are compared with a new approach based on the DIRECT algorithm of Jones et al. The new method gives accurate results in one order of magnitude less time than the best of the formerly employed algorithms. Using the algorithms presented in Papers A and B, simultaneous search for at least three QTL, including computation of the relevant empirical significance thresholds, can be performed routinely.
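A minimal numerical sketch of the inner computation is given below, assuming a simple linear model of the phenotype with one column per candidate locus; the thesis reuses and updates QR factorizations rather than refactorizing, and replaces the brute-force grid shown here with DIRECT:

```python
import numpy as np

def rss_at(loci, genotype_probs, phenotype):
    """Residual sum of squares for a linear model with an intercept and one
    genotype column per candidate locus (a simplified objective function)."""
    X = np.column_stack([np.ones(len(phenotype))] +
                        [genotype_probs[:, l] for l in loci])
    Q, R = np.linalg.qr(X)                    # least squares via QR
    beta = np.linalg.solve(R, Q.T @ phenotype)
    resid = phenotype - X @ beta
    return float(resid @ resid)

def grid_search_two_qtl(genotype_probs, phenotype):
    """Exhaustive two-locus scan; placeholder for the DIRECT-based search."""
    n_loci = genotype_probs.shape[1]
    best = (np.inf, None)
    for i in range(n_loci):
        for j in range(i + 1, n_loci):
            rss = rss_at((i, j), genotype_probs, phenotype)
            best = min(best, (rss, (i, j)))
    return best

# toy data: 100 individuals, 50 loci, phenotype driven by loci 3 and 17
rng = np.random.default_rng(0)
G = rng.random((100, 50))
y = 2 * G[:, 3] - 1.5 * G[:, 17] + rng.normal(0, 0.1, 100)
print(grid_search_two_qtl(G, y))
```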
|
716 |
Inferring Gene Regulatory Networks in Cold-Acclimated Plants by Combinatorial Analysis of mRNA Expression Levels and Promoter Regions. Chawade, Aakash, January 2006
Understanding the cold acclimation process in plants may help us develop genetically engineered plants that are resistant to cold. The key to understanding this process is to study the genes, and thus the gene regulatory network, involved in cold acclimation. Most existing approaches [1-8] for deriving regulatory networks rely only on gene expression data. Since expression data are usually noisy and sparse, the networks generated by these approaches are often incoherent and incomplete. A new approach is therefore proposed here that analyses the promoter regions together with the expression data when inferring regulatory networks. In this approach, genes are grouped into sets if they contain similar over-represented motifs or motif pairs in their promoter regions and if their expression pattern follows that of the regulating gene. The derived network is evaluated using known literature evidence, functional annotations and statistical tests.
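The combination step can be pictured as keeping a gene as a putative target only if it both carries a shared over-represented promoter motif and tracks the regulator's expression profile; the sketch below is a simplified stand-in using plain Pearson correlation (gene and motif names are illustrative):

```python
import numpy as np

def putative_targets(regulator, genes, expression, motifs,
                     required_motif, min_corr=0.8):
    """regulator: gene id; expression: dict gene -> 1D array of levels;
    motifs: dict gene -> set of over-represented promoter motifs.
    Returns genes that carry the motif and track the regulator's profile."""
    reg_profile = expression[regulator]
    targets = []
    for g in genes:
        if g == regulator or required_motif not in motifs.get(g, set()):
            continue
        r = np.corrcoef(reg_profile, expression[g])[0, 1]
        if abs(r) >= min_corr:                 # correlated or anti-correlated
            targets.append((g, round(float(r), 2)))
    return targets

# toy example with three time points
expr = {"CBF1": np.array([1.0, 3.0, 5.0]),
        "COR15a": np.array([0.9, 2.8, 5.2]),
        "GENE_X": np.array([5.0, 1.0, 3.0])}
motif_hits = {"COR15a": {"CRT/DRE"}, "GENE_X": {"CRT/DRE"}}
print(putative_targets("CBF1", list(expr), expr, motif_hits, "CRT/DRE"))
```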
|
717 |
The Paladin Suite: Multifaceted Characterization of Whole Metagenome Shotgun Sequences. Westbrook, Anthony, 14 March 2018
Whole metagenome shotgun sequencing is a powerful approach for assaying many aspects of microbial communities, including the functional and symbiotic potential of each contributing community member. The research community currently lacks tools that efficiently align DNA reads against protein references, the technique necessary for constructing functional profiles. This thesis details the creation of PALADIN, a novel modification of the Burrows-Wheeler Aligner that provides orders-of-magnitude improved efficiency by directly mapping in protein space. In addition to performance considerations, utilizing PALADIN and associated tools as the foundation of metagenomic pipelines also allows for novel characterization and downstream analysis. The accuracy and efficiency of PALADIN were compared against existing applications that employ nucleotide or protein alignment algorithms. Using both simulated and empirically obtained reads, PALADIN consistently outperformed all compared alignment tools across a variety of metrics, mapping reads nearly 8,000 times faster than the widely utilized protein aligner, BLAST. A variety of analysis techniques were demonstrated using this data, including detecting horizontal gene transfer, performing taxonomic grouping, and generating declustered references.
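The core idea of mapping in protein space can be illustrated with a six-frame translation step: each read is translated in all reading frames before being compared against protein references. The sketch below uses Biopython only to show the concept; PALADIN itself performs this inside a modified BWA index and aligner:

```python
from Bio.Seq import Seq

def six_frame_translations(read):
    """Translate a DNA read in all six reading frames (3 forward, 3 reverse),
    trimming each frame to whole codons before translation."""
    seq = Seq(read)
    frames = []
    for strand in (seq, seq.reverse_complement()):
        for offset in range(3):
            sub = strand[offset:]
            sub = sub[: len(sub) - len(sub) % 3]    # whole codons only
            frames.append(str(sub.translate()))
    return frames

def best_frame(read):
    """Crude proxy for frame selection: pick the translation with the longest
    stretch free of stop codons; a real aligner scores against references."""
    frames = six_frame_translations(read)
    return max(frames, key=lambda p: max(len(x) for x in p.split("*")))

print(six_frame_translations("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGA"))
print(best_frame("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGA"))
```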
|
718 |
Identifying esophageal atresia associated variants from whole genome sequencing data. Mattisson, Jonas, January 2018
Knowing the underlying cause of a genetic disorder could not only further our understanding of the disease itself and of the otherwise healthy mechanism that is disrupted; it could potentially improve people's lives. Even though whole genome sequencing has drastically improved the chances of discovering the cause, a comparison of the genomes of two unrelated individuals will find several million sequence variants. While most variants have no significant impact, a single variant that functionally impacts a gene is enough to cause a genetic disorder. This project therefore focused on filtering the variants, from lists of several million possible causes down to a stage where they could feasibly be analysed manually one by one. Single-nucleotide variants, indels and structural variants were filtered, based on a dataset in which single-nucleotide variants and indels had already been called. The more difficult process of structural variant discovery was also performed, but it required four different tools to minimise the drawbacks of each separate discovery technique. The same three filtering approaches were applied to all variants: intersecting datasets that should contain the same variant, removing variants shared with the general population, and selecting variants predicted to impact gene function. Each approach proved to be an efficient filtering step, and their combination reduced each list to only a couple of variants out of the original five million. Due to the lower accuracy and sensitivity of the structural variant analysis, those data will likely require more extensive manual analysis.
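The three filtering approaches can be expressed as set operations over variant keys; the sketch below assumes variants have already been parsed from VCF-like records into (chrom, pos, ref, alt) tuples with a predicted-impact label, which is a simplification of the real annotations:

```python
def filter_variants(caller_sets, population_variants, impact_of,
                    damaging=frozenset({"HIGH", "MODERATE"})):
    """caller_sets: list of sets of (chrom, pos, ref, alt) from datasets that
    should all contain the causal variant.
    population_variants: set of variants common in the general population.
    impact_of: dict variant -> predicted functional impact category."""
    # 1. keep only variants present in every dataset that should contain them
    shared = set.intersection(*caller_sets)
    # 2. remove variants shared with the general population
    rare = shared - population_variants
    # 3. keep variants predicted to affect gene function
    return {v for v in rare if impact_of.get(v, "MODIFIER") in damaging}

# toy example
a = {("chr2", 100, "A", "T"), ("chr7", 55, "G", "C")}
b = {("chr2", 100, "A", "T"), ("chr9", 10, "C", "G")}
common = {("chr9", 10, "C", "G")}
impacts = {("chr2", 100, "A", "T"): "HIGH"}
print(filter_variants([a, b], common, impacts))
```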
|
719 |
A novel method for integrative biological studies. Al Watban, Abdullatif Sulaiman, January 2016
DNA microarray technology has been used extensively in the biomedical field and has become a standard for identifying gene expression signatures for disease diagnosis/prognosis and pharmaceutical practice. Although cancer research has benefited from this technology, challenges such as large data sizes, few replicates and complex heterogeneous data types remain; as a consequence of molecular heterogeneity, the biomarkers identified by different studies show only a small proportion of overlap. It is nonetheless desirable in cancer research to find robust and consistent biomarkers for drug development as well as for diagnosis/prognosis. Although cancer is a highly heterogeneous disease, some mechanism common to developing cancers is believed to exist, and integrating datasets from multiple experiments increases the accuracy of predictions, because a larger sample size improves biomarker detection. Integrative study is therefore required to compile multiple cancer data sets when searching for the common mechanism leading to cancer. Despite the many successful methods that have been introduced, some critical challenges of integrative analysis remain. Few methods are able to work on data sets with different dimensionalities, and, more seriously, when the number of replicates is small, most existing algorithms cannot deliver robust predictions in an integrative study. In fact, as modern high-throughput technology matures to provide increasingly precise data, and with well-designed experiments, the variance across replicates is believed to be small enough for us to consider a mean pattern model. This model assumes that all the genes (or metabolites, proteins or DNA copies) are random samples from a hidden (mean pattern) model. The study implements this model using a hierarchical modelling structure. As the primary component of the system, a multi-scale Gaussian (MSG) model, designed to identify robust differentially expressed genes to be integrated, was developed for predicting differentially expressed genes from microarray expression data with small replicate numbers. To verify the validity of the mean pattern hypothesis, a bimodality detection method that is a revision of the bimodality index was proposed.
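As one concrete piece of such a pipeline, a bimodality index can be estimated by fitting a two-component Gaussian mixture to a gene's expression values; the sketch below follows the commonly used definition of the bimodality index, not the revised version proposed in the thesis:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def bimodality_index(values):
    """BI = sqrt(p * (1 - p)) * |mu1 - mu2| / sigma, estimated from a
    two-component equal-variance Gaussian mixture fit to the values."""
    x = np.asarray(values, dtype=float).reshape(-1, 1)
    gm = GaussianMixture(n_components=2, covariance_type="tied",
                         random_state=0).fit(x)
    mu1, mu2 = gm.means_.ravel()
    sigma = float(np.sqrt(gm.covariances_.ravel()[0]))
    p = float(gm.weights_[0])
    return np.sqrt(p * (1 - p)) * abs(mu1 - mu2) / sigma

# toy example: clearly bimodal vs. unimodal expression values
rng = np.random.default_rng(1)
bimodal = np.concatenate([rng.normal(0, 1, 50), rng.normal(5, 1, 50)])
unimodal = rng.normal(0, 1, 100)
print(round(bimodality_index(bimodal), 2), round(bimodality_index(unimodal), 2))
```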
|
720 |
Identification of unique common sequences in coagulase-negative staphylococci. Mattisson, Jonas; Gräsberg, Sofia; Rydberg Öhrling, Sara; Al-Jaff, Mohammed; Molin, Iris; Sandström, Eric, January 2016
In this report we describe how we approached the problem of finding common sequences in a set of coagulase-negative staphylococci (CoNS), in order, among other things, to distinguish them from their relative Staphylococcus aureus (S. aureus). The problem originates from the project client, Q-linea, which wants to be able to identify infecting bacteria in cases of the bloodstream infection sepsis. Unfortunately, we could not find sequences that worked for all of our selected staphylococci. However, we did find several sequences that, used in pairs, distinguished the staphylococcus group from S. aureus. To perform all the comparisons, we designed and implemented a bioinformatics pipeline with a three-part optimization method to speed up the heavy computations.
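One simple way to frame the underlying comparison is as set arithmetic over k-mers: subsequences shared by all coagulase-negative genomes but absent from S. aureus are marker candidates. The sketch below is a toy version of that idea, not the group's actual pipeline with its three-part optimization:

```python
def kmers(sequence, k=21):
    """All k-mers of a genome sequence (single uppercase string)."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}

def candidate_markers(cons_genomes, aureus_genome, k=21):
    """k-mers present in every coagulase-negative genome but not in
    S. aureus; real marker selection would also check uniqueness against
    other species, probe length, melting temperature, etc."""
    shared = set.intersection(*(kmers(g, k) for g in cons_genomes))
    return shared - kmers(aureus_genome, k)

# toy example with short made-up sequences and a small k
cons = ["ACGTACGTTTGCA", "ACGTACGTTAGCA", "ACGTACGTTCGCA"]
aureus = "TTTTACGTACGAAA"
print(candidate_markers(cons, aureus, k=8))
```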
|