41.
Improvements and extensions of a web-tool for finding candidate genes associated with rheumatoid arthritis
Dodda, Srinivasa Rao, January 2005
<p>Quantitative Trait Locus (QTL) analysis is a statistical method used to delimit genomic regions contributing to specific phenotypes. To further localize genes in such regions, a web tool called “Candidate Gene Capture” (CGC) was developed by Andersson et al. (2005). The CGC tool was based on the textual descriptions of genes in the human phenotype database OMIM. Although the CGC tool works well, it was limited by a number of inconsistencies in the underlying database structure, static web pages, and gene descriptions in OMIM lacking a properly defined function. Hence, in this work the CGC tool was improved by redesigning its database structure, adding dynamic web pages and improving the prediction of unknown gene function by exon analysis. The changes in database structure reduced the number of tables considerably, eliminated redundancies and made data retrieval more efficient. A new method for prediction of gene function was proposed, based on the assumption that similarity between exon sequences is associated with biochemical function. Using Blast with 20380 exon protein sequences and a threshold E-value of 0.01, 639 exon groups were obtained with an average of 11 exons per group. When estimating the functional similarity, it was found that on average 72% of the exons in a group had at least one Gene Ontology (GO) term in common.</p>
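The grouping step described in this abstract (exons linked by Blast hits below an E-value cutoff, then GO-term agreement within each group) could be sketched as below; the hit tuple format and the helper names are assumptions for illustration, not taken from the CGC tool itself:

```python
from collections import defaultdict

def group_exons(hits, evalue_cutoff=0.01):
    """Union-find grouping of exons connected by hits below the E-value cutoff.
    `hits` is an iterable of (exon_a, exon_b, evalue) tuples (hypothetical format)."""
    parent = {}
    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    for a, b, evalue in hits:
        if evalue <= evalue_cutoff:
            union(a, b)
    groups = defaultdict(set)
    for x in parent:
        groups[find(x)].add(x)
    return [g for g in groups.values() if len(g) > 1]

def go_agreement(group, go_terms):
    """Fraction of exons in a group that share at least one GO term
    with some other member of the same group."""
    shared = 0
    for exon in group:
        others = set().union(*(go_terms.get(o, set()) for o in group if o != exon))
        if go_terms.get(exon, set()) & others:
            shared += 1
    return shared / len(group)
```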
42.
Shape Analysis and Measurement for the HeLa cell classification of cultured cells in high throughput screening
Huque, Enamul, January 2006
<p>Feature extraction by digital image analysis and cell classification is an important task for cell culture automation. In High Throughput Screening (HTS), where thousands of data points are generated and processed at once, features are extracted and cells classified to decide whether the cell culture is proceeding smoothly; the culture is restarted if a problem is detected. In this thesis project HeLa cells, which are human epithelial cancer cells, were selected for the experiment. The purpose is to classify two types of HeLa cells in culture: cells in cleavage, which are round floating cells (stressed or dead cells are also round and floating), and normally growing cells, which are attached to the substrate. Since the number of cells in cleavage is always smaller than the number of normally growing cells, the count of attached cells should exceed that of round cells. Five different HeLa cell images were used. For each image, every single cell was obtained by image segmentation and isolation, and different mathematical features were computed for each cell. The feature set was chosen so that the features are robust, discriminative and generalise well for classification. Almost all the features presented in this thesis are rotation, translation and scale invariant, so they are expected to discriminate objects or cells well under any classification algorithm. Some new features were added that are believed to improve the classification result. The feature set is considerably broader than the restricted sets used in previous work. The features are implemented against a common interface, so the library can be extended and integrated into other applications. They are fed into a machine learning algorithm, Linear Discriminant Analysis (LDA), for classification. Cells are then classified as ‘Cells attached to the substrate’ (Cell Class A) or ‘Cells in cleavage’ (Cell Class B). Features are selected for LDA by stepwise removal and addition of shape features for increased performance. On average, an accuracy higher than ninety-five percent is obtained in the classification result, which is validated by visual classification.</p>
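A minimal sketch of the kind of invariant feature plus two-class LDA described above, in plain Python on synthetic two-feature "cells"; the feature values and class layout are invented for illustration and are not taken from the thesis:

```python
import math

def circularity(area, perimeter):
    """4*pi*A / P^2: equals 1.0 for a perfect circle; rotation,
    translation and scale invariant."""
    return 4.0 * math.pi * area / perimeter ** 2

def fit_lda(class_a, class_b):
    """Two-class Fisher discriminant in 2-D: w = Sw^-1 (mB - mA),
    decision threshold midway between the projected class means."""
    def mean(rows):
        n = len(rows)
        return [sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n]
    def scatter(rows, m):
        sxx = sum((r[0] - m[0]) ** 2 for r in rows)
        syy = sum((r[1] - m[1]) ** 2 for r in rows)
        sxy = sum((r[0] - m[0]) * (r[1] - m[1]) for r in rows)
        return sxx, sxy, syy
    ma, mb = mean(class_a), mean(class_b)
    axx, axy, ayy = scatter(class_a, ma)
    bxx, bxy, byy = scatter(class_b, mb)
    sxx, sxy, syy = axx + bxx + 1e-6, axy + bxy, ayy + byy + 1e-6
    det = sxx * syy - sxy * sxy
    d = [mb[0] - ma[0], mb[1] - ma[1]]
    # w = Sw^-1 d, via the explicit 2x2 inverse
    w = [(syy * d[0] - sxy * d[1]) / det, (sxx * d[1] - sxy * d[0]) / det]
    c = (w[0] * (ma[0] + mb[0]) + w[1] * (ma[1] + mb[1])) / 2.0
    return w, c

def classify(w, c, x):
    """0 = 'attached' (Class A), 1 = 'in cleavage' (Class B)."""
    return 1 if w[0] * x[0] + w[1] * x[1] > c else 0
```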
43.
Representation of Biochemical Pathway Models: Issues relating conversion of model representation from SBML to a commercial tool
Naswa, Sudhir, January 2005
<p>Background: Computational simulation of complex biological networks lies at the heart of systems biology, since it can confirm the conclusions drawn by experimental studies of biological networks and guide researchers to produce fresh hypotheses for further experimental validation. Since this iterative process helps in the development of more realistic system models, a variety of computational tools have been developed. In the absence of a common format for model representation, these tools used different formats and so became unable to exchange models, which led to the development of SBML, a standard exchange format for computational models of biochemical networks. Here the format of SBML and that of one commercial systems biology tool are compared, to study the issues that may arise during conversion between the two. A tool, StoP, has been developed to convert the SBML format to the format of the selected tool.</p><p>Results: The basic SBML representation, in the form of listings of the various elements of a biochemical reaction system, differs from the representation of the selected tool, which is location oriented. In spite of this difference, the various components of biochemical pathways, including multiple compartments, global parameters, reactants, products, modifiers, reactions, kinetic formulas and reaction parameters, could be converted from the SBML representation to that of the selected tool. The MathML representation of a kinetic formula in an SBML model can be converted to the string format of the selected tool. Some features of SBML are not present in the selected tool. Similarly, the ability of the selected tool to declare parameters for locations, which are global to those locations and their children, is not present in SBML.</p><p>Conclusions: Differences in representations of pathway models may include differences in terminology, basic architecture, software capabilities, and the adoption of different standards for similar concepts. But the overall similarity of the domain of pathway models enables us to interconvert these representations. The selected tool should develop support for unit definitions, events and rules. Development of a facility for parameter declaration at compartment level in SBML, and of a facility for function declaration in the selected tool, is recommended.</p>
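The list-based to location-oriented conversion could be sketched roughly as below; the dictionary schema and the rule for placing a reaction in the compartment of its first reactant are simplifying assumptions for illustration, not the actual SBML or StoP formats:

```python
def to_location_oriented(sbml_like):
    """Regroup a flat, list-based model (SBML-style listings of compartments,
    species and reactions) under its compartments (location-oriented view)."""
    model = {c["id"]: {"species": {}, "reactions": []}
             for c in sbml_like["compartments"]}
    for s in sbml_like["species"]:
        model[s["compartment"]]["species"][s["id"]] = s.get("initialAmount", 0.0)
    for r in sbml_like["reactions"]:
        # place the reaction in the compartment of its first reactant
        # (a deliberate simplification of the real placement problem)
        loc = next(s["compartment"] for s in sbml_like["species"]
                   if s["id"] == r["reactants"][0])
        model[loc]["reactions"].append(r["id"])
    return model
```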
44.
Algorithmic aspects of some combinatorial problems in bioinformatics
Bongartz, Dirk, January 2006 (PDF)
Dissertation, Technische Hochschule Aachen, 2006.
45.
BacIL - En Bioinformatisk Pipeline för Analys av Bakterieisolat / BacIL - A Bioinformatic Pipeline for Analysis of Bacterial Isolates
Östlund, Emma, January 2019
Listeria monocytogenes and Campylobacter spp. are bacteria that can sometimes cause severe illness in humans. Both can be found as contaminants in food that has been produced, stored or prepared improperly, which is why it is important to ensure that food is handled correctly. The National Food Agency (Livsmedelsverket) is the Swedish authority responsible for food safety. One important task is to, in collaboration with other authorities, track and prevent food-related disease outbreaks. For this purpose bacterial samples are regularly collected at border control, at food production facilities and retail, as well as from suspected food items and drinking water during outbreaks, and epidemiological analyses are employed to determine the type of bacteria present and whether they can be linked to a common source. One part of these epidemiological analyses involves bioinformatic analysis of the bacterial DNA. This includes determination of sequence type and serotype, as well as calculation of similarities between samples. Such analyses require data processing in several different steps, usually performed by a bioinformatician using different computer programs. Currently the National Food Agency outsources most of these analyses to other authorities and companies, and the purpose of this project was to develop a pipeline that would allow these analyses to be performed in-house. The result was a pipeline named BacIL - Bacterial Identification and Linkage, which automatically performs sequence typing, serotyping and SNP analysis of Listeria monocytogenes as well as sequence typing and SNP analysis of Campylobacter jejuni, C. coli and C. lari. The result of the SNP analysis is used to create clusters which can be used to identify related samples. The pipeline decreases the number of programs that have to be manually started from more than ten to two.
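The clustering step, grouping samples whose pairwise SNP distance falls under a threshold, might look like the following single-linkage sketch; the pair-distance input format and the threshold value are assumptions for illustration, not BacIL's actual interface:

```python
from collections import defaultdict, deque

def snp_clusters(distances, threshold):
    """Single-linkage clustering: samples within `threshold` SNPs of each
    other (directly or transitively) fall into the same cluster.
    `distances` maps (sample_a, sample_b) pairs to SNP counts."""
    graph = defaultdict(set)
    samples = set()
    for (a, b), d in distances.items():
        samples.update((a, b))
        if d <= threshold:
            graph[a].add(b)
            graph[b].add(a)
    clusters, seen = [], set()
    for s in sorted(samples):          # BFS over connected components
        if s in seen:
            continue
        comp, queue = set(), deque([s])
        while queue:
            x = queue.popleft()
            if x in comp:
                continue
            comp.add(x)
            seen.add(x)
            queue.extend(graph[x] - comp)
        clusters.append(comp)
    return sorted(clusters, key=len, reverse=True)
```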
46.
Computational analysis of metagenomic data: delineation of compositional features and screens for desirable enzymes / Computergestützte Analyse von Metagenomdaten: Beschreibung von kompositionellen Eigenschaften und Suchen nach gewünschten Enzymen
Förstner, Konrad Ulrich, January 2008 (PDF)
The topic of my doctoral research was the computational analysis of metagenomic data. A metagenome comprises the genomic information of all the microorganisms within a certain environment. The currently available metagenomic data sets cover only parts of these usually huge metagenomes, owing to the high technical and financial effort of such sequencing endeavours. During my thesis I developed bioinformatic tools and applied them to analyse genomic features of different metagenomic data sets and to search those sequence collections for enzymes of importance for biotechnological or pharmaceutical applications. In these studies nine published metagenomic projects (with up to 41 subsamples) were analysed. The samples originated from diverse environments such as farm soil, acid mine drainage, microbial mats on whale bones, marine water, fresh water, water treatment sludges and the human gut flora. Additionally, data sets of conventionally retrieved sequence data were taken into account and compared with each other.
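Compositional features of the kind delineated here, such as GC content and tetranucleotide frequencies, can be computed with a short sketch like this; these are the standard textbook definitions, not the thesis's specific tooling:

```python
from collections import Counter
from itertools import product

def gc_content(seq):
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def tetranucleotide_freqs(seq):
    """Normalised 4-mer frequencies over all 256 ACGT 4-mers,
    a standard compositional signature for comparing sequence sets."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts.values())
    return {"".join(k): counts["".join(k)] / total
            for k in product("ACGT", repeat=4)}
```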
47.
Taking advantage of phylogenetic trees in comparative genomics
Åkerborg, Örjan, January 2008
Phylogenomics can be regarded as evolution and genomics in co-operation. Various kinds of evolutionary studies, gene family analysis among them, demand access to genome-scale datasets. But it is also clear that many genomics studies, such as assignment of gene function, are much improved by evolutionary analysis. The work leading to this thesis is a contribution to the phylogenomics field. We have used phylogenetic relationships between species in genome-scale searches for two intriguing genomic features, namely functional pseudogenes and A-to-I RNA editing. In the first case we used pairwise species comparisons, specifically human-mouse and human-chimpanzee, to infer the existence of functional mammalian pseudogenes. In the second case we profited from recent years' rapid growth in the number of sequenced genomes, and used 17-species multiple sequence alignments. In both studies we used non-genomic data, among these gene expression data and synteny relations, to verify predictions. In the A-to-I editing project we used 454 sequencing for experimental verification. We have further contributed a maximum a posteriori (MAP) method for fast and accurate dating of speciations and other evolutionary events. This work follows recent years' trend of leaving the strict molecular clock when performing phylogenetic inference. We discretised the time interval from the leaves to the root of the tree, and used a dynamic programming (DP) algorithm to optimally factorise branch lengths into substitution rates and divergence times. We analysed two biological datasets and compared our results with recent MCMC-based methodologies. The dating point estimates that our method delivers were found to be of high quality, while the gain in speed was dramatic. Finally we applied the DP strategy in a new setting, this time using a grid laid out on a species tree instead of on an interval. Together with speciation times, the discretisation gives a common timeframe for a gene tree and the corresponding species tree. This is the key to integrating the sequence evolution process and the gene evolution process. Out of several potential application areas we chose gene tree reconstruction. We performed genome-wide analysis of yeast gene families and found that our methodology performs very well.
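The DP factorisation of branch lengths into rates and times can be illustrated on a single root-to-leaf path; the exponential rate prior, the unit time interval and the grid size are simplifying assumptions standing in for the thesis's richer model, not a reimplementation of it:

```python
import math

def map_times(branch_lengths, n_points=20, rate_mean=1.0):
    """MAP dating sketch for one root-to-leaf path: discretise time on [0, 1]
    (leaf = 0.0, root = 1.0) and factorise each branch length into
    rate * duration, scoring each implied rate with an exponential prior."""
    grid = [i / n_points for i in range(n_points + 1)]

    def branch_score(bl, dt):
        # log of an exponential prior on rate bl/dt, up to a constant
        return -math.inf if dt <= 0 else -(bl / dt) / rate_mean

    n = len(branch_lengths)
    best = {(0, n_points): 0.0}      # node 0 is the root, pinned to time 1.0
    back = {}
    for k in range(1, n + 1):
        positions = [0] if k == n else range(1, n_points)  # leaf pinned to 0.0
        for j in positions:
            cands = [(s + branch_score(branch_lengths[k - 1], grid[i] - grid[j]), i)
                     for (kk, i), s in best.items() if kk == k - 1]
            score, i = max(cands)
            if score > -math.inf:
                best[(k, j)] = score
                back[(k, j)] = i
    times, j = [], 0                 # trace back from the leaf
    for k in range(n, 0, -1):
        times.append(grid[j])
        j = back[(k, j)]
    times.append(grid[j])
    return times[::-1]               # node times from root (1.0) down to leaf (0.0)
```

With two equal branch lengths the MAP solution places the internal node midway, as the symmetric rates are jointly least penalised there.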
48.
Automated annotation of protein families / Automatiserad annotering av proteinfamiljer
Elfving, Eric, January 2011
Introduction: The great challenge in bioinformatics is data integration. The amount of available data is always increasing, and there are no common unified standards for where, or how, the data should be stored. The aim of this work is to build an automated tool to annotate the different member families within the protein superfamily of medium-chain dehydrogenases/reductases (MDR), by finding common properties among the member proteins. The goal is to increase the understanding of the MDR superfamily as well as of the different member families. This will add to the amount of knowledge gained for free when a new, unannotated protein is matched as a member to a specific MDR member family. Method: The different types of data available all needed different handling. Textual data was mainly compared as strings, while numeric data needed special handling such as statistical calculations. Ontological data was handled as tree nodes, where ancestry between terms had to be considered. This was implemented as a plugin-based system to make the tool easy to extend with additional data sources of different types. Results: The biggest challenge was data incompleteness, yielding little (or no) results for some families and thus decreasing the statistical significance of the results. Results show that all the human and mouse MDR members have a Pfam ADH domain (ADH_N and/or ADH_zinc_N) and take part in an oxidation-reduction process, often with NAD or NADP as cofactor. Many of the proteins contain zinc and are expressed in liver tissue. Conclusions: A Python-based tool for automatic annotation has been created to annotate the different MDR member families. The tool is easily extendable to new databases, and much of its output agrees with information found in the literature. The utility and necessity of this system, as well as the quality of its results, are expected to increase over time, even if no additional extensions are produced, as the system itself is able to make further and more detailed inferences as more data become available.
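A plugin-based system for heterogeneous data types, as described in the Method section, might be organised along these lines; the class names, the per-type summary rules and the record format are hypothetical, for illustration only:

```python
class AnnotationPlugin:
    """Base class: each data-source type registers how to summarise its values."""
    registry = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        cls.registry[cls.field_type] = cls()   # auto-register each plugin

class TextPlugin(AnnotationPlugin):
    field_type = "text"
    def summarise(self, values):
        # report a value only if every family member agrees on it
        return values[0] if len(set(values)) == 1 else None

class NumericPlugin(AnnotationPlugin):
    field_type = "numeric"
    def summarise(self, values):
        return sum(values) / len(values)       # mean across members

def annotate_family(records):
    """records: one dict per member protein, mapping field -> (type, value)."""
    out = {}
    for field in records[0]:
        ftype = records[0][field][0]
        plugin = AnnotationPlugin.registry[ftype]
        out[field] = plugin.summarise([r[field][1] for r in records])
    return out
```

New data sources are added by defining another subclass; nothing in `annotate_family` needs to change, which is the point of the plugin design.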
49.
Method for recognizing local descriptors of protein structures using Hidden Markov Models
Björkholm, Patrik, January 2008
<p>Being able to predict the sequence-structure relationship in proteins will extend the scope of many bioinformatics tools relying on structure information. Here we use Hidden Markov models (HMMs) to recognize and pinpoint the location in target sequences of local structural motifs (local descriptors of protein structure, LDPS). These substructures are composed of three or more segments of the amino acid backbone that are in proximity with each other in space but not necessarily along the amino acid sequence. We were able to align descriptors to their proper locations in 41.1% of the cases when using models built solely from amino acid information. Using models that also incorporated secondary structure information, we were able to assign 57.8% of the local descriptors to their proper locations. Further enhancement in performance was achieved by threading a profile through the Hidden Markov models together with the secondary structure; with this material we were able to assign 58.5% of the descriptors to their proper locations. Hidden Markov models were thus shown to be able to locate LDPS in target sequences, and the accuracy increases when secondary structure and the profile of the target sequence are used in the models.</p>
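Locating a motif with an HMM can, under simplifying assumptions (uniform transition probabilities, a single occurrence, a flat background state), be reduced to finding the best log-odds window over the sequence; the emission tables below are invented for illustration and are not the thesis's trained models:

```python
import math

def locate_descriptor(seq, match_emissions, background):
    """Best placement of a descriptor segment in `seq`.
    `match_emissions` is a per-position dict of residue -> emission probability
    for match states M1..ML; `background` is the null-state distribution.
    With uniform transitions, Viterbi over B -> M1..ML -> B reduces to the
    highest-scoring log-odds window, which is what is computed here."""
    L = len(match_emissions)
    best_start, best_score = None, -math.inf
    for start in range(len(seq) - L + 1):
        score = sum(math.log(match_emissions[k].get(seq[start + k], 1e-6))
                    - math.log(background.get(seq[start + k], 1e-6))
                    for k in range(L))
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_score
```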
50.
Predicting function of genes and proteins from sequence, structure and expression data
Hvidsten, Torgeir R., January 2004
Dissertation (summary), Uppsala University, 2004. / Together with 6 papers.