Global ETD Search

61	Exact, constraint-based structure prediction in simple protein models Will, Sebastian. Unknown Date University, Diss., 2005--Jena.
62	Evaluation and Development of Methods for Identification of Biochemical Networks / Evaluering och utveckling av metoder för identifiering av biokemiska nätverk Jauhiainen, Alexandra January 2005 Systems biology is concerned with understanding biology at the systems level, with a focus on the structure and dynamics of the system. Knowledge about the structure and dynamics of biological systems is fundamental to understanding cells and the interactions within them, and it plays an increasingly important role in medical applications. System identification deals with the problem of constructing a model of a system from data; an extensive theory exists, particularly for the identification of linear systems. This master's thesis in systems biology treats the identification of biochemical systems. Methods based on local parameter perturbation data and on time series data were tested and evaluated in silico. Methods using local parameter perturbation data have the advantage of demanding less complex data, but their drawbacks are the reduced information content of such data and their sensitivity to noise. Methods employing time series data are generally more robust to noise, but the lack of available data limits their use. The work was conducted at the Fraunhofer-Chalmers Research Centre for Industrial Mathematics in Göteborg and at the Division of Computational Biology, Department of Physics and Measurement Technology, Biology, and Chemistry, Linköping University, during the autumn of 2004. Bioinformatics Systems Biology System Identification Biochemical Networks Bioinformatik
63	Tools for functional genomics applied to Staphylococci, Listeriae, Vaccinia virus and other organisms Liang, Chunguang January 2009 Genome sequence analysis: A combination of genome analysis applications was established during this project. It offers an efficient platform to interactively compare similar genome regions and reveal differences between loci. Genes and operons can be rapidly analyzed and local collinear blocks (LCBs) categorized according to their function. Features of interest are parsed, recognized, and clustered into reports. Phylogenetic relationships, such as the evolution of critical factors or of a highly conserved region, can be readily examined. The resulting platform-independent software packages (GENOVA and inGeno) have proven to be efficient and easy to handle in a number of projects. The capabilities of the software allowed the investigation of virulence factors (e.g., rsbU), the strains' biological design, and in particular the storage and management of pathogenicity features. We have successfully investigated the genomes of Staphylococcus aureus strains (COL, N315, 8325, RN1HG, Newman), Listeria spp. (welshimeri, innocua and monocytogenes), E. coli strains (O157:H7 and MG1655) and Vaccinia strains (WR, Copenhagen, Lister, LIVP, GLV-1h68 and parental strains). Metabolic network analysis: Our YANAsquare package offers a workbench to rapidly establish genome-scale metabolic networks, such as that of Staphylococcus aureus, as well as specific metabolic networks of interest, such as the murine phagosome lipid signalling network. YANAsquare recruits reactions from online databases using an integrated KEGG browser, which reduces the effort of building large metabolic networks. The calculation routines involved (a METATOOL-derived wrapper or a native Java implementation) readily obtain all possible flux modes (EM/EP) for metabolite fluxes within the network. 
Advanced layout algorithms visualize the topological structure of the network. In addition, the generated structure can be dynamically modified in the graphical interface. The generated network and the manipulated layout can be validated and stored as an XML file following the SBML level-2 scheme. This format can be further parsed and analyzed by other systems biology software, such as CellDesigner. Moreover, the integrated robustness-evaluation routine can examine how each single mutation affects synthesis rates throughout the whole network. We have successfully applied the method to simulate single and multiple gene knockouts, comprehensively revealing the affected fluxes. Recently we applied the method to proteomic data and extracellular metabolite data of Staphylococci to study the physiological changes in the flux distribution. Calculations at different time points, including different conditions such as hypoxia or stress, show a good fit to experimental data. Moreover, using proteomic data (enzyme amounts) calculated from 2D-Gel-EP experiments, our study provides a way to compare the fluxome and enzyme expression. Oncolytic vaccinia virus (VACV): We investigated the genetic differences between the de novo sequence of the recombinant oncolytic GLV-1h68 and other related VACVs, including function predictions for all genome differences found. Our phylogenetic analysis indicates that GLV-1h68 is closest to Lister strains but has lost several ORFs present in its parental LIVP strain, including genes encoding CrmE and a viral Golgi anti-apoptotic protein, v-GAAP. Comparing viral genes in the Lister, WR and COP strains showed that their functions were strain-specific, tissue-specific or host-specific. This helps to rationally design more optimized oncolytic virus strains to benefit cancer therapy in human patients. 
Identified differences from the comparison in open reading frames (ORFs) include genes for host-range selection, virulence and immune modulation proteins, e.g. ankyrin-like proteins, serine proteinase inhibitor SPI-2/CrmA, tumor necrosis factor (TNF) receptor homolog CrmC, semaphorin-like and interleukin-1 receptor homolog proteins. The contribution of foreign gene expression cassettes in the therapeutic and oncolytic virus GLV-1h68 was studied, including the F14.5L, J2R and A56R loci. The contribution of F14.5L inactivation to the reduced virulence is demonstrated by comparing the virulence data of GLV-1h68 with its F14.5L-null and revertant viruses. The comparison suggests that insertion of a foreign gene expression cassette in a nonessential locus in the viral genome is a practical way to attenuate VACVs, especially if the nonessential locus itself contains a virulence gene. This reduces the virulence of the virus without compromising too much the replication competency of the virus, the key to its oncolytic activity. The reduced pathogenicity of GLV-1h68 was confirmed by our experimental collaboration partners in male mice bearing C6 rat glioma and in immunocompetent mice bearing B16-F10 murine melanoma. In conclusion, bioinformatics and experimental data show that GLV-1h68 is a promising engineered VACV variant for anticancer therapy with tumor-specific replication, reduced pathogenicity and benign tissue tropism. / Genom Sequenz Analyse Im Zuge der vorliegenden Doktorarbeit wurden verschiedene Programme zur Genomanalyse kombiniert, um eine effiziente Plattform zum interaktiven Vergleich lokaler Ähnlichkeiten bzw. Unterschiede in Genomen bereitzustellen. Damit können Gene und Operons schnell untersucht und “local collinear blocks” entsprechend ihrer Funktion kategorisiert werden. Phylogenetische Beziehungen, wie beispielsweise die Evolution spezifischer Elemente oder stark konservierter Regionen können leicht überprüft werden. 
Die hierfür entwickelte plattformunabhängige Software (GENOVA und inGeno) hat sich in mehreren Projekten als effizient und leicht handhabbar bewährt. Die Programme erlauben die Untersuchung von Virulenzfaktoren auf Sequenz- oder Annotationsebene. Während der vorliegenden Doktorarbeit konnten so die Genome von verschiedenen Staphylococcus aureus, Listeria spp., Escherichia coli und Vaccinia Stämmen untersucht werden. Metabolische Netzwerk Analyse Unser “YANAsquare” Programmpaket bietet eine Oberfläche um schnell metabolische Netzwerke vom genomweiten Anzatz bis hinunter zum Einzelnetzwerk zu analysieren. Dafür greift YANA mit Hilfe des integrierten KEGG-Browsers auf Onlinedatenbanken zu, um die notwendigen Informationen zum metabolischen Reaktionsweg bereitzustellen und reduziert so maßgeblich den Arbeitsaufwand beim Beschreiben von Netzwerke. Die implementierten Methoden zur Berechnung (METATOOL, eigene Implementation in Java) des Netzwerkes liefern exakt alle die möglichen Elementarmoden (EM/EP) für die Metabolite zurück. Durch den Einsatz von fortgeschrittenen Layout Algorithmen wird anschliessend die Darstellung der Netzwerktopologie möglich. Außerdem kann in der grafischen Darstellung das generierte Netzwerklayout dynamisch verändert werden. Das Speichern der Daten erfolgt im XML (SBML level-2) Format und erlaubt so die Weiterverwendung in anderen systembiologischen Programmen, wie dem “CellDesigner”. Mit Hilfe einer gen-Knockout Simulations Methode kann der Einfluss von einzelnen Mutationen im gesamten Netzwerk auf die Syntheseraten untersucht werden. Wir konnten mit dieser Methode Einzel- sowie Mehrfachgenknockouts und deren Effekte auf die Elementarmoden analysieren. Die Methode wurde ebenfalls auf Proteomdaten und extrazelluläre Metabolite von Staphylokokken angewandt, um Änderungen bezüglich der Flussverteilung zu untersuchen. 
Die Simulationen zu verschieden Zeitpunkten und unter verschiedenen Stessbedingungen zeigen große Übereinstimmung mit experimentell erhobenen Daten. Onkolytischer Vaccinia Virus (VACV) Wir haben die genetischen Unterschiede zwischen der de novo Sequenz des rekombinanten onkolytischen Virus GLV-1h68 und anderen VACVs untersucht und gefundene Unterschiede funktionell charakterisiert. Die phylogenetische Analyse zeigt das GLV-1h68 mit dem Lister Stamm am nächsten verwandt ist. Auffällig ist dabei der Verlust von einigen open reading frames (ORFs), die noch im Eltern LIVP Stamm vorhanden sind (CrmE, v-GAAP). Beim Vergleich der Funktion viraler Gene aus Lister, WR und COP Stämmen treten stamm-, gewebe- und wirtsspezifische Gene auf. Diese Tatsache ermöglicht die Optimierung der onkolytischen Virusstämme für den Einsatz bei humanen Krebstherapien. Die beim Vergleich identifizierten Unterschiede zwischen den ORFs enthalten Gene für die Wirtsselektion, Virulenz und immunmodulierende Proteine (Ankyrin ähnliche Proteine, Serine-Proteinasen Inhibitor SPI-2/CrmA, Tumor Nekrose Faktor (TNF) Rezeptorhomolog CrmC, semaphorinähnliche und Interleukin-1 rezeptorhomologe Proteine). An den Loki F14.5L, J2R und A56R des GLV-1h68 Virus wurden die Vorteile der eingesetzten fremden Genexpressionskassetten untersucht. So zeigt GLV-1h68 mit F14.5L-Inaktivierung gegenüber der F14.5L-Revertanten Viren eine reduzierte Virulenz. Das erlaubt die Schlussfolgerung, dass die Insertion von fremden Genexpressionskassetten in nicht-essentielle Loki zur Verminderung der Virulenz von VACVs führt, besonders, wenn der nicht-essentielle Lokus selbst ein Virulenzgen enthält. Das Replikationsvermögen, welches ausschlaggebend für die onkolytische Aktivität des Virus ist, wird trotz der verminderten Virulenz nicht eingeschränkt. 
Die reduzierte Pathogenität des GLV-1h68 Virus wurde durch experimentelle Daten unserer Kollaborationspartner in männlichen Mäusen mit Ratten C6 Gliom und in immunokompetenten Mäusen mit B16-F10 Mausmelanom nachgewiesen. Zusammenfassend zeigen experimentelle und bioinformatisch gewonnene Daten, dass GLV-1h68 eine vielversprechende VACV Variante für die Krebstherapie mit tumorspezifischer Replikation, verringerter Pathogenität und hoher Gewebsspezifität ist. Genanalyse Bioinformatik Systembiologie ddc:570
64	Applying microarray‐based techniques to study gene expression patterns: a bio‐computational approach / Anwendung von Mikroarrayanalysen um Genexpressionsmuster zu untersuchen: Ein bioinformatischer Ansatz Vainshtein, Yevhen January 2010 The regulation and maintenance of iron homeostasis is critical to human health. As a constituent of hemoglobin, iron is essential for oxygen transport, and significant iron deficiency leads to anemia. Eukaryotic cells require iron for survival and proliferation. Iron is part of hemoproteins, iron-sulfur (Fe-S) proteins, and other proteins with functional groups that require iron as a cofactor. At the cellular level, iron uptake, utilization, storage, and export are regulated at different molecular levels (transcriptional, mRNA stability, translational, and posttranslational). Iron regulatory proteins (IRPs) 1 and 2 post-transcriptionally control mammalian iron homeostasis by binding to iron-responsive elements (IREs), conserved RNA stem-loop structures located in the 5'- or 3'-untranslated regions of genes involved in iron metabolism (e.g. FTH1, FTL, and TFRC). To identify novel IRE-containing mRNAs, we integrated biochemical, biocomputational, and microarray-based experimental approaches. Gene expression studies greatly contribute to our understanding of complex relationships in gene regulatory networks. However, the complexity of array design, production and manipulation is a limiting factor that affects data quality. The use of customized DNA microarrays improves overall data quality in many situations, but only if analysis tools are available for these specifically designed microarrays. Methods: In this project, the response to iron treatment was examined under different conditions using bioinformatic methods in order to improve our understanding of the iron regulatory network. For this purpose we used microarray gene expression data. 
To identify novel IRE-containing mRNAs, biochemical, biocomputational, and microarray-based experimental approaches were integrated. IRP/IRE messenger ribonucleoproteins were immunoselected and their mRNA composition was analysed using an IronChip microarray enriched for genes predicted computationally to contain IRE-like motifs. Analysis of IronChip microarray data requires a specialized tool that can exploit all the advantages of a customized microarray platform. A novel decision-tree-based algorithm was implemented in Perl in the IronChip Evaluation Package (ICEP). Results: IRE-like motifs were identified from genomic nucleic acid databases by an algorithm combining primary nucleic acid sequence and RNA structural criteria. Depending on the choice of constraining criteria, such computational screens tend to generate a large number of false positives. To refine the search and reduce the number of false positive hits, additional constraints were introduced. The refined screen yielded 15 IRE-like motifs. A second approach made use of a reported list of 230 IRE-like sequences obtained from screening UTR databases. We selected 6 of these 230 entries based on the ability of the lower IRE stem to form at least 6 out of 7 bp. Corresponding ESTs were spotted onto the human or mouse versions of the IronChip and the results were analysed using ICEP. Our data show that the immunoselection/microarray strategy is a feasible approach for screening bioinformatically predicted IRE genes and for detecting novel IRE-containing mRNAs. In addition, we identified a novel IRE-containing gene, CDC14A (Sanchez M, et al. 2006). The IronChip Evaluation Package (ICEP) is a collection of Perl utilities and an easy-to-use data evaluation pipeline for the analysis of microarray data with a focus on data quality of custom-designed microarrays. 
The package has been developed for the statistical and bioinformatical analysis of the custom cDNA microarray IronChip, but can be easily adapted for other cDNA or oligonucleotide-based designed microarray platforms. ICEP uses decision tree-based algorithms to assign quality flags and performs robust analysis based on chip design properties regarding multiple repetitions, ratio cut-off, background and negative controls (Vainshtein Y, et al., 2010). / Die Regulierung und Aufrechterhaltung der Eisen-Homeostase ist bedeutend für die menschliche Gesundheit. Als Bestandteil des Hämoglobins ist es wichtig für den Transport von Sauerstoff, ein Mangel führt zu Blutarmut. Eukaryotische Zellen benötigen Eisen zum Überleben und zum Proliferieren. Eisen ist am Aufbau von Hämo- und Eisenschwefelproteinen (Fe-S) beteiligt und kann als Kofaktor dienen. Die Aufnahme, Nutzung, Speicherung und der Export von Eisen ist zellulär auf verschiedenen molekularen Ebenen reguliert (Transkription, mRNA-Level, Translation, Protein-Level). Die iron regulatory proteins (IRPs) 1 und 2 kontrollieren die Eisen-Homeostase in Säugetieren posttranslational durch die Bindung an Iron-responsive elements (IREs). IREs sind konservierte RNA stem-loop Strukturen in den 5' oder 3' untranslatierten Bereichen von Genen, die im Eisenmetabolismus involviert sind (z.B. FTH1, FTL und TFRC). In dieser Arbeit wurden biochemische und bioinformatische Methoden mit Microarray-Experimenten kombiniert, um neue mRNAs mit IREs zu identifizieren. Genexpressionsstudien verbessern unser Verständnis über die komplexen Zusammenhänge in genregulatorischen Netzwerken. Das komplexe Design von Microarrays, deren Produktion und Manipulation sind dabei die limitierenden Faktoren bezüglich der Datenqualität. Die Verwendung von angepassten DNA Microarrays verbessert häufig die Datenqualität, falls entsprechende Analysemöglichkeiten für diese Arrays existieren. 
Methoden Um unser Verständnis von eisenregulierten Netzwerken zu verbessern, wurde im Rahmen dieses Projektes die Auswirkung einer Behandlung mit Eisen bzw. von Knockout Mutation unter verschiedenen Bedingungen mittels bioinformatischer Methoden untersucht. Hierfür nutzen wir Expressionsdaten aus Microarray-Experimenten. Durch die Verknüpfung von biochemischen, bioinformatischen und Microarray Ansätzen können neue Proteine mit IREs identifiziert werden. IRP/IRE messenger Ribonucleoproteine wurden immunpräzipitiert. Die Zusammensetzung der enthaltenen mRNAs wurde mittels einem IronChip Microarray analysiert: Für diesen Chip wurden bioinformatisch Gene vorhergesagt, die IRE-like Motive aufweisen. Der Chip wurde mit solchen Oligonucleotiden beschichtet und durch Hybridisierung überprüft, ob die präzipitierten mRNA sich hieran binden. Die Analyse der erhaltenen Daten erfordert ein spezialisiertes Werkzeug um von allen Vorteilen der angepassten Microarrays zu profitieren. Ein neuer Entscheidungsbaum-basierter Algorithmus wurde in Perl im IronChip Evaluation Package (ICEP) implementiert. Ergebnisse Aus großen Sequenz-Datenbanken wurden IRE-like Motive identifiziert. Dazu kombiniert der Algorithmus, insbesondere RNA-Primärsequenz und RNA-Strukturdaten. Solche Datenbankanalysen tendieren dazu, eine große Anzahl falsch positiver Treffer zu generieren. Daher wurden zusätzliche Bedingungen formuliert, um die Suche zu verfeinern und die Anzahl an falsch positiven Treffer zu reduzieren. Die angepassten Suchkriterien ergaben 15 IRE-like Motive. In einem weiteren Ansatz verwendeten wir eine Liste von 230 IRE-like Sequenzen aus UTR-Datenbanken. Daraus wurden 6 Sequenzen ausgewählt, die auch im unteren Teil stabil sind (untere Helix über 6 bp stabil). Die korrespondierenden Expressed Sequence Tags (ESTs) wurden auf die humane oder murine Version des IronChips aufgetragen. Die Microarray Ergebnisse wurden mit dem ICEP Programm ausgewertet. 
Unsere Ergebnisse zeigen, dass die Immunpräzipitation mit anschließender Microarrayanalyse ein nützlicher Ansatz ist, um bioinformatisch vorhergesagte IRE-Gene zu identifizieren. Darüber hinaus ermöglicht uns dieser Ansatz die Detektion neuer mRNAs, die IREs enthalten, wie das von uns gefundene Gen CDC14A (Sanchez et al., 2006). ICEP ist ein optimiertes Programmpaket aus Perl Programmen (Vainshtein et al., BMC Bioinformatics, 2010). Es ermöglicht die einfache Auswertung von Microarray Daten mit dem Fokus auf selbst entwickelten Microarray Designs. ICEP diente für die statistische und bioinformatische Analyse von selbst entwickelten IronChips, kann aber auch leicht an die Analyse von oligonucleotidbasierten oder cDNA Microarrays adaptiert werden. ICEP nutzt einen Entscheidungsbaum-basierten Algorithmus um die Qualität zu bewerten und führt eine robuste Analyse basierend auf Chipeigenschaften, wie mehrfachen Wiederholungen, Signal/Rausch Verhältnis, Hintergrund und Negativkontrollen durch. Microarray Genexpression Bioinformatik ddc:570
65	Comparative metagenomic analysis of the human intestinal microbiota / Vergleichende metagenomische Analyse des menschlichen Darmflora Arumugam, Manimozhiyan January 2010 The human gut is home to thousands of microbes that are important for human life. As most of these cannot be cultivated, metagenomics is a key means of understanding this community. To perform comparative metagenomic analysis of the human gut microbiome, I have developed SMASH (Simple metagenomic analysis shell), a computational pipeline. SMASH can also be used to assemble and analyze single genomes, and has been successfully applied to the bacterium Mycoplasma pneumoniae and the fungus Chaetomium thermophilum. In the context of the MetaHIT (Metagenomics of the human intestinal tract) consortium, in which our group participates, I used SMASH to validate the assembly and to estimate the assembly error rate of 576.7 Gb of metagenome sequence obtained using Illumina Solexa technology from fecal DNA of 124 European individuals. I also estimated the completeness of the gene catalogue containing 3.3 million open reading frames obtained from these metagenomes. Finally, I used SMASH to analyze human gut metagenomes of 39 individuals from 6 countries encompassing a wide range of host properties such as age, body mass index and disease states. We find that the variation in the gut microbiome is not continuous but stratified into enterotypes. Enterotypes are complex host-microbial symbiotic states that are not explained by host properties, nutritional habits or possible technical biases. The concept of enterotypes might have far-reaching implications, for example, to explain different responses to diet or drug intake. 
We also find several functional markers in the human gut microbiome that correlate with a number of host properties such as body mass index, highlighting the need for functional analysis and raising hopes for the application of microbial markers as diagnostic or even prognostic tools for microbiota-associated human disorders. / Der menschliche Darm beheimatet tausende Mikroben, die für das menschliche Leben wichtig sind. Da die meisten dieser Mikroben nicht kultivierbar sind, ist „Metagenomics“ ein wichtiges Werkzeug zum Verständnis dieser wichtigen mikrobiellen Gemeinschaft. Um vergleichende Metagenomanalysen durchführen zu können, habe ich das Computerprogramm SMASH (Simple metagenomic analysis shell) entwickelt. SMASH kann auch zur Assemblierung und Analyse von Einzelgenomen benutzt werden und wurde erfolgreich auch das Bakterium Mycoplasma pneumoniae und den Pilz Chaetomium thermophilum angewandt. Im Zusammenhang mit der Beteiligung unserer Arbeitsgruppe am MetaHIT (Metagenomics of the human intestinal tract) Konsortium, habe ich SMASH benutzt um die Assemblierung zu validieren und die Fehlerrate der Assemblierung von 576.7 Gb Metagenomsequenzen, die mit der Illumina Solexa Technologie aus der fäkalen DNS von 124 europäischen Personen gewonnen wurde, zu bestimmen. Des Weiteren habe ich die Vollständigkeit des Genkatalogs dieser Metagenome, der 3.3 Millionen offene Leserahmen enthält, geschätzt. Zuletzt habe ich SMASH benutzt um die Darmmetagenome von 39 Personen aus 6 Ländern zu analysieren. Hauptergebnis dieser Analyse war, dass die Variation der Darmmikrobiota nicht kontinuierlich ist. Anstatt dessen fanden wir so genannte Enterotypen. Enterotypen sind komplexe Zustände der Symbiose zwischen Wirt und Mikroben, die sich nicht durch Wirteigenschaften, wie Alter, Body-Mass-Index, Erkrankungen und Ernährungseigenschaften oder ein mögliches technisches Bias erklären lassen. Das Konzept der Enterotypen könnte weitgehende Folgen haben. 
Diese könnten zum Beispiel die unterschiedlichen Reaktionen auf Diäten oder Medikamenteneinahmen erklären. Weiterhin konnten wir eine Anzahl an Markern im menschlichen Darmmikrobiome finden, die mit unterschiedlichen Wirtseigenschaften wie dem Body-Mass-Index korrelieren. Dies hebt die Wichtigkeit dieser Analysemethode hervor und erweckt Hoffnungen auf Anwendung mikrobieller Marker als diagnostisches oder sogar prognostisches Werkzeug für menschliche Erkrankungen in denen das Mikrobiom eine Rolle spielt. Darmflora Metagenom Bioinformatik ddc:570
66	Coverage Analysis in Clinical Next-Generation Sequencing Odelgard, Anna January 2019 With the advent of next-generation sequencing (NGS), new tools had to be developed to work with new data formats, to handle data sizes larger than those of previous techniques, and to check the accuracy of the data. Coverage analysis is an important quality control for NGS data: the coverage indicates how many times each base pair has been sequenced and thus how trustworthy each base call is. For clinical purposes, every base of interest must be quality controlled, as one wrong base call could affect the patient negatively. Software tools that perform coverage analysis with enough accuracy and detail for clinical applications are scarce. Several tools, such as Samtools, can calculate coverage values but do not process this information further to produce a QC report for each base pair of interest. My master's thesis has therefore been to create a new coverage analysis report tool, named CAR tool, that extracts the coverage values from Samtools and uses these data to produce a report consisting of tables, lists and figures. CAR tool was created to replace the currently used tool, ExCID, at the Clinical Genomics facility at SciLifeLab in Uppsala and was developed to meet the needs of the bioinformaticians and clinicians. CAR tool is written in Python and launched from a terminal window. The main function of the tool is to display coverage breadth values for each region of interest and to extract all subregions below a chosen coverage depth threshold. The low-coverage regions are then reported together with region name, start and stop positions, length and mean coverage value. To make the tool useful to as many users as possible, several settings can be selected by passing different flags when calling the tool. 
Such settings include generating pie charts of each region's coverage values, filtering reads and bases by quality, or writing a custom entry that Samtools uses for the coverage calculation. The tool has been shown to find these low-coverage regions very well. Most low-coverage regions found are also found by ExCID, the currently used tool; some differences did occur, however, and every such region was verified in IGV. The coverage values shown in IGV coincided with those found by CAR tool. CAR tool is written to find all low-coverage regions, even those only one base pair long, whereas ExCID instead seems to generate larger low-coverage regions without taking very short ones into account. To read more about the functions and how to use CAR tool, I refer to the user instructions in the appendix and on GitHub at the repository anod6351 Bioinformatics (Computational Biology) Bioinformatik (beräkningsbiologi)
67	Implementation of an automatic quality control of derived data files for NONMEM Sandström, Eric January 2019 A pharmacometric analysis must be based on correct data to be valid. Source clinical data is rarely ready to be modelled as is, but rather needs to be reprogrammed to fit the format required by the pharmacometric modelling software. The reprogramming steps include selecting the subsets of data relevant for modelling, deriving new information from the source and adjusting units and encoding. Sometimes, the source data may also be flawed, containing vague definitions and missing or confusing values. In either setting, the source data needs to be reprogrammed to remedy this, followed by extensive quality control to capture any errors or inconsistencies produced along the way. The quality control is a lengthy task which is often performed manually, either by the scientists conducting the pharmacometric study or by independent reviewers. This project presents an automatic data quality control intended to aid the data curation process and to reduce the number of errors that would otherwise have to be detected by the manual quality control. The automatic quality control is implemented as an R package and is specifically tailored to the needs of Pharmetheus. Bioinformatics (Computational Biology) Bioinformatik (beräkningsbiologi)
68	Integrated functional analysis of biological networks / Integrierte funktionelle Analyse biologischer Netzwerke Beisser, Daniela January 2011 (has links) (PDF) In recent years high-throughput experiments provided a vast amount of data from all areas of molecular biology, including genomics, transcriptomics, proteomics and metabolomics. Its analysis using bioinformatics methods has developed accordingly, towards a systematic approach to understand how genes and their resulting proteins give rise to biological form and function. They interact with each other and with other molecules in highly complex structures, which are explored in network biology. The in-depth knowledge of genes and proteins obtained from high-throughput experiments can be complemented by the architecture of molecular networks to gain a deeper understanding of biological processes. This thesis provides methods and statistical analyses for the integration of molecular data into biological networks and the identification of functional modules, as well as its application to distinct biological data. The integrated network approach is implemented as a software package, termed BioNet, for the statistical language R. The package includes the statistics for the integration of transcriptomic and functional data with biological networks, the scoring of nodes and edges of these networks as well as methods for subnetwork search and visualisation. The exact algorithm is extensively tested in a simulation study and outperforms existing heuristic methods for the calculation of this NP-hard problem in accuracy and robustness. The variability of the resulting solutions is assessed on perturbed data, mimicking random or biased factors that obscure the biological signal, generated for the integrated data and the network. An optimal, robust module can be calculated using a consensus approach, based on a resampling method. 
It summarizes optimally an ensemble of solutions in a robust consensus module with the estimated variability indicated by confidence values for the nodes and edges. The approach is subsequently applied to two gene expression data sets. The first application analyses gene expression data for acute lymphoblastic leukaemia (ALL) and differences between the subgroups with and without an oncogenic BCR/ABL gene fusion. In a second application gene expression and survival data from diffuse large B-cell lymphomas are examined. The identified modules include and extend already existing gene lists and signatures by further significant genes and their interactions. The most important novelty is that these genes are determined and visualised in the context of their interactions as a functional module and not as a list of independent and unrelated transcripts. In a third application the integrative network approach is used to trace changes in tardigrade metabolism to identify pathways responsible for their extreme resistance to environmental changes and endurance in an inactive tun state. For the first time a metabolic network approach is proposed to detect shifts in metabolic pathways, integrating transcriptome and metabolite data. Concluding, the presented integrated network approach is an adequate technique to unite high-throughput experimental data for single molecules and their intermolecular dependencies. It is flexible to apply on diverse data, ranging from gene expression changes over metabolite abundances to protein modifications in a combination with a suitable molecular network. The exact algorithm is accurate and robust in comparison to heuristic approaches and delivers an optimal, robust solution in form of a consensus module with confidence values. By the integration of diverse sources of information and a simultaneous inspection of a molecular event from different points of view, new and exhaustive insights into biological processes can be acquired. 
/ In recent years, high-throughput experiments have delivered enormous amounts of molecular biological data, beginning with the first sequenced genome of Haemophilus influenzae in 1995 and the human genome in 2001. The resulting data now cover transcriptomics, proteomics and metabolomics in addition to genomics. The analysis of these data with bioinformatics methods has changed and developed accordingly. Novel systems biology approaches try to understand how genes and the proteins resulting from them give rise to biological form and function. Genes and proteins interact with each other and with other molecules in highly complex structures, which are investigated with new approaches from network biology. The in-depth knowledge of individual molecules made available by high-throughput technologies can be complemented by the architecture and dynamic interactions of molecular networks, enabling a more comprehensive understanding of biological processes. This dissertation presents methods and statistical analyses for the integration of molecular data into biological networks, the identification of robust functional subnetworks, and their application to a wide variety of biological data. The integrative network approach was implemented as a software package, BioNet, in the statistical programming language R. The package contains statistical procedures for the integration of transcriptomic and functional data, the weighting of nodes and edges in biological networks, and methods for the search for significant regions (modules) and their visualisation. The exact algorithm is tested extensively in a simulation study and outperforms heuristic methods for solving this NP-complete problem in accuracy and robustness. 
The variability of the resulting solutions is determined using perturbed integrated data and perturbed networks, which represent random and biasing influences that add noise to the data. An optimal, robust module can be determined by a consensus approach. Based on repeated resampling of the integrated data, an ensemble of solutions is generated, from which the robust and optimal consensus module can be calculated. In addition, this approach allows the variability of the consensus module to be estimated and confidence values to be calculated for nodes and edges. The approach is subsequently applied to two gene expression data sets. The first application examines gene expression data for acute lymphoblastic leukaemia (ALL) and analyses differences between subgroups with and without the BCR/ABL gene fusion. The second application evaluates gene expression and survival data for diffuse large B-cell lymphomas (DLBCL), based on molecular differences between two DLBCL subtypes of differing malignancy. In a third application, the integrated network approach is used to detect changes in the metabolism of tardigrades and to identify pathways responsible for their extreme adaptability to changing environmental conditions and their survival in an inactive tun state. For the first time, a metabolic network approach is proposed for this purpose that determines metabolic changes by integrating metabolic and transcriptomic data. In conclusion, the presented integrated network analysis is an adequate technique for linking experimental data from high-throughput methods, each specialised in one type of molecule, with their intermolecular interactions and dependencies. 
It can be applied flexibly to a wide variety of data, from the analysis of gene expression changes and metabolite abundances to protein modifications, in combination with a suitable molecular network. The exact algorithm is accurate and robust in comparison to heuristic methods and delivers an optimal, robust solution in the form of a consensus module with assigned confidence values. By integrating a wide variety of information sources and simultaneously examining a biological event from different points of view, new and more complete insights into physiological processes can be gained. Bioinformatik differenzielle Genexpression ddc:570
69	Automatisierte Klassifizierung und Viabilitätsanalyse von Phytoplankton / Automated classification and viability analysis for phytoplankton Schulze, Katja January 2014 (has links) (PDF) The central aim of this work was to use methods of microscopy, image processing and image recognition to characterise various phytoplankters, in order to improve and simplify their analysis. The first focus of the work was the analysis of phytoplankton communities, which serve as markers in the monitoring of freshwater quality. The conventional analysis is very laborious, since it is still carried out entirely by hand and requires specially trained personnel. The aim was to build a system for automatic recognition in order to simplify the analysis. With the help of automated microscopy, it was possible to capture plankters of different extent more effectively in a single image by integrating several focal planes. In addition, various fluorescence properties were integrated into the analysis. A plugin created for ImageJ separates organisms from the image background and calculates a large number of features. Training neural networks makes it possible to distinguish between different groups of plankton taxa. Further taxa can also be easily integrated into the analysis, extending the recognition. The first analysis of mixed samples consisting of 10 different taxa showed an average recognition rate of 94.7% and an average false-positive rate of 5.5%. Compared with existing systems, the recognition rate was improved and the false-positive rate was markedly reduced. When the data set was extended to 22 taxa, care was taken to use species that pass through different stages during their growth or that show greater similarity to the species already present, in order to reveal possible 
weaknesses of the system. This yielded a good recognition rate (86.8%), with the exclusion of non-planktonic particles (11.9%) still improved. The comparison with further classification methods showed that neural networks are superior to other methods for this problem. Similarly good classification rates were achieved with support vector machines. However, these were clearly inferior to the neural network in distinguishing unknown particles. The second section presents the development of a simple method for the viability analysis of cyanobacteria that requires no further treatment of the samples. It uses the red chlorophyll autofluorescence as a marker for living cells and a green unspecific fluorescence as a marker for dead cells. The assay was established and validated with the model organism Synechocystis sp. PCC 6803. The selection of a suitable filter set makes it possible to excite and observe both signals simultaneously and thus to distinguish directly between living and dead cells. The results of establishing the assay were confirmed by plating, chlorophyll determination and determination of the absorption spectrum. The use of automated microscopy and a newly created ImageJ plugin enabled a very accurate and fast analysis of the samples. Its use in monitoring a mutagenised culture for increasing temperature tolerance gave accurate and timely insights into the state of the culture. Further results indicate that combination with absorption spectra may provide better insights into the vitality of the culture. / The central goal of this work was to improve and simplify the characterization of different phytoplankters through the use of automated microscopy, image processing and image analysis. 
The first part of the work dealt with the analysis of phytoplankton communities, which are used as a marker for the determination of fresh water quality. The current routine analysis is very time consuming and expensive, as it is carried out manually by trained personnel. Thus the goal of this work was to develop a system for automating the analysis. With the use of automated microscopy, different focal planes could be integrated into one image, which made it possible to image plankters lying at different focus levels simultaneously. Additionally, it allowed the integration of different fluorescence characteristics into the analysis. An image processing routine, developed in ImageJ, allows the segmentation of organisms from the image background and the calculation of a large range of features. Neural networks are then used for the classification of previously defined groups of plankton taxa. The program allows easy integration of additional taxa and expansion of the recognition targets. The analysis of samples containing 10 different taxa showed an average recognition rate of 94.7% and an average error rate of 5.5%. The obtained recognition rate was better than those of existing systems and the exclusion of non-plankton particles could be greatly improved. After extending the data set to 22 different classes of (more demanding) taxa, a still good recognition rate (86.9%) and a still improved error rate (11.9%) were obtained. This extended set was specifically selected in order to target potential weaknesses of the system. It contained mainly taxa that showed strong similarities to each other or taxa that go through various different morphological stages during their growth. The obtained recognition rates were comparable to or better than those of existing systems and the exclusion of non-plankton particles could be greatly improved. 
A comparison of different classification methods showed that neural networks are superior to all other investigated methods when used for this specific task. While similar recognition rates could be achieved with support vector machines, these were vastly inferior at differentiating unknown particles. The second part focused on the development of a simple live-dead assay for unicellular cyanobacteria that requires no sample preparation. The assay uses red chlorophyll fluorescence, corresponding to viable cells, and an unspecific green autofluorescence that can only be observed in non-viable cells. The assay was established and validated for the model organism Synechocystis sp. PCC 6803. With the selection of a suitable filter set, both signals could be excited and observed simultaneously, allowing a direct classification of viable and non-viable cells. The results were confirmed by plating/colony count, absorption spectra and chlorophyll measurements. The use of an automated fluorescence microscope and an ImageJ-based image analysis plugin allows a very precise and fast analysis. The monitoring of a randomly mutagenized culture undergoing selection for improved temperature tolerance allowed an accurate and prompt insight into the condition of the culture. Further results indicate that a combination of the new assay with absorption spectra or chlorophyll concentration measurements allows the estimation of the vitality of cells. Bilderkennung Bioinformatik Phytoplankton ddc:570
70	New Proteomics Methods and Fundamental Aspects of Peptide Fragmentation / Nya Proteomik Metoder och Fundamentala Aspekter av Peptid Fragmentering Savitski, Mikhail January 2007 (has links) The combination of collision-activated dissociation (CAD) and electron capture dissociation (ECD) yielded a 125% increase in protein identification. The S-score was developed for measuring the information content in MS/MS spectra. This measure made it possible to single out good quality spectra that were not identified by a search engine. Poor quality MS/MS data were filtered out, streamlining the identification process. A proteomics-grade de novo sequencing approach was developed, making it possible to almost completely sequence 19% of all MS/MS data with 95% reliability in a typical proteomics experiment. A new tool, Modificomb, was developed for identifying all types of modifications in a fast, reliable way. New types of modifications have been discovered, and the extent of modifications in gel-based proteomics turned out to be greater than expected. PhosTShunter was developed for sensitive identification of all phosphorylated peptides in an MS/MS dataset. Application of these programs to human milk samples led to the identification of a previously unreported and potentially biologically important phosphorylation site. Peptide fragmentation has been studied. It was shown clearly on a dataset of 15,000 MS/MS spectra that CAD and ECD have different cleavage preferences with respect to the amino acid context. Hydrogen rearrangement involving z• species has been investigated. Clear trends have been unveiled. This information elucidated the mechanism of hydrogen transfer. Partial side-chain losses in ECD have been studied. The potential of these ions for reliably distinguishing Leu/Iso residues was shown. Partial side-chain losses occurring far away from the cleavage site have been detected. 
A strong correlation was found between the propensity of amino acids towards peptide bond cleavage under CAD and their propensity to accept backbone-backbone H-bonds in solution and form stable motifs. This indicated that the same parameter governs the formation of secondary structures in solution and directs fragmentation in peptide ions by CAD. Bioinformatics Proteomics Peptide fragmentation Bioinformatik
