Global ETD Search

141	Filtering of Clinical NGS Data to Improve Low Allele Frequency Variant Calling Cumlin, Tomas January 2022 Massively parallel sequencing (NGS) is useful for detecting and later classifying somatic driver mutations in cancer tumours. False-positive variants arise in the NGS workflow and may be mistaken for low-frequency somatic cancer mutations in a patient sample. Reducing the noise rate in the NGS workflow is therefore needed, since it may improve the detection of low allele frequency variants, in particular cancer mutations. In this project, the aim was to reduce the level of false-positive variants in an NGS workflow. The scope was limited to substitution errors and their neighbouring nucleotides. The project also sought to understand how different types of substitution errors are distributed in the data, whether their frequencies are affected by neighbouring nucleotides, and how data processing may affect these substitution rates. A bioinformatic pipeline was set up in which a commercially available genomic DNA sample with known variants was subjected to different trimming and filtering settings. The goal was to reduce the substitution error rate as much as possible without removing any true variants from the data. The optimised settings were trimming 5 bp from the tail of the sequencing reads and filtering out sequencing reads that contained 5 or more substitutions. Three additional samples, two clinical and one commercial, were tested with these settings. The results showed that in all samples, C:G>T:A substitutions had a higher frequency than the other substitution types. For all samples, A:T>C:G substitutions where the neighbouring nucleotide was a C or a G on each side had a higher frequency than A:T>C:G substitutions with other neighbouring nucleotides on both sides. Those substitution types were especially targeted by the trimming. 
For the two commercial samples, substitutions that resulted in the nucleotide combinations >XAA or >XTT had a higher frequency than the same substitution types that did not result in those nucleotide combinations. Filtering out reads with 5 or more substitutions particularly targeted these substitution types. Consequently, filtering had a greater effect on the commercial samples than on the clinical samples. Overall, trimming and filtering reduced transversions more than transitions, increasing the transition/transversion ratio after processing. The results suggest that trimming and filtering can be a useful way to computationally reduce the transversion errors introduced in an NGS workflow, but that they reduce transition errors, in particular A:T>G:C transitions, to a lesser extent. To confirm these findings, more samples should be tested using this methodology. To better understand the effect of trimming and filtering on variant calling, the scope could in the future be expanded to include small insertions and deletions. Next-generation sequencing Variant calling Clinical sequencing Substitution error Cancer sequencing Error rate substitution Somatic mutation Bioinformatics and Systems Biology Bioinformatik och systembiologi
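The trimming and filtering settings described in entry 141 can be illustrated with a minimal Python sketch. It is not the thesis pipeline, whose tools are not named in the abstract. The function names are hypothetical, and reads are assumed to be ungapped and already placed at a known 0-based position on the reference. The sketch trims 5 bp from the read tail, drops reads with 5 or more substitutions, and computes the transition/transversion ratio of the remaining substitutions.

```python
# Illustrative only: helper names are hypothetical, and reads are assumed to
# be ungapped and aligned to the reference at a known 0-based start position.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def trim_tail(read, n=5):
    """Remove n bases from the 3' end (tail) of a read."""
    return read[:-n] if n > 0 else read


def substitutions(read, reference, start):
    """Return (reference_base, read_base) pairs where the read differs."""
    window = reference[start:start + len(read)]
    return [(r, b) for r, b in zip(window, read) if r != b]


def process(reads, reference, trim=5, max_subs=5):
    """Trim each read, then drop reads with max_subs or more substitutions."""
    kept = []
    for start, seq in reads:
        trimmed = trim_tail(seq, trim)
        subs = substitutions(trimmed, reference, start)
        if len(subs) < max_subs:
            kept.append((start, trimmed, subs))
    return kept


def ts_tv_ratio(kept):
    """Transition/transversion ratio over all substitutions in kept reads."""
    pairs = [pair for _, _, subs in kept for pair in subs]
    ts = sum(1 for pair in pairs if pair in TRANSITIONS)
    tv = len(pairs) - ts
    return ts / tv if tv else float("inf")


reference = "ACGTACGTACGTACGTACGT"
reads = [
    (0, "ACATACGTACGTACG"),  # one G>A transition
    (0, "TGCATCGTACGTACG"),  # five substitutions, removed by the filter
    (0, "AAGTACGTACGTACG"),  # one C>A transversion
]
kept = process(reads, reference)
print(len(kept), ts_tv_ratio(kept))
```

A real workflow would read alignments from BAM files and handle indels and soft clipping, which this sketch deliberately ignores.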
142	Statistical and machine learning methods to analyze large-scale mass spectrometry data The, Matthew January 2016 Like many other fields, biology is faced with enormous amounts of data that contain valuable information yet to be extracted. The field of proteomics, the study of proteins, has the luxury of large repositories of data from tandem mass spectrometry experiments that are readily accessible to anyone who is interested. At the same time, there is still a lot to discover about proteins as the main actors in cell processes and cell signaling. In this thesis, we explore several methods to extract more information from the available data using methods from statistics and machine learning. In particular, we introduce MaRaCluster, a new method for clustering mass spectra on large-scale datasets. This method uses statistical methods to assess similarity between mass spectra, followed by the conservative complete-linkage clustering algorithm. The combination of these two resulted in up to 40% more peptide identifications on its consensus spectra compared to the state-of-the-art method. Second, we attempt to clarify and promote protein-level false discovery rates (FDRs). Frequently, studies fail to report protein-level FDRs even though the proteins are actually the entities of interest. We provided a framework in which to discuss protein-level FDRs in a systematic manner to open up the discussion and take away potential hesitance. We also benchmarked some scalable protein inference methods and included the best one in the Percolator package. Furthermore, we added functionality to the Percolator package to accommodate the analysis of studies in which many runs are aggregated. This reduced the run time for a recent study regarding a draft human proteome from almost a full day to just 10 minutes on a commodity computer, resulting in a list of proteins together with their corresponding protein-level FDRs. 
/ QC 20160412. mass spectrometry - LC-MS/MS statistical analysis data processing and analysis protein inference large-scale studies simulation Bioinformatics and Systems Biology Bioinformatik och systembiologi
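Entry 142 describes complete-linkage clustering as conservative. The following Python sketch shows generic complete-linkage agglomerative clustering with a distance threshold, to illustrate why. It is not MaRaCluster itself: the distance matrix and the threshold are made-up inputs, whereas the thesis derives similarity between spectra statistically.

```python
def complete_linkage(dist, threshold):
    """Agglomerative clustering with complete linkage.

    dist is a symmetric matrix of pairwise distances (an assumed input here).
    Two clusters are merged only while the LARGEST distance between any pair
    of their members stays below threshold, which keeps clusters tight.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Complete linkage: distance between the two farthest members.
        return max(dist[i][j] for i in a for j in b)

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d >= threshold:
            break  # no remaining pair of clusters is close enough to merge
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]


# Item 2 is close to item 1 but far from item 0, so it is not merged
# into the cluster {0, 1}.
dist = [
    [0.0, 0.1, 0.9, 1.0],
    [0.1, 0.0, 0.2, 1.0],
    [0.9, 0.2, 0.0, 1.0],
    [1.0, 1.0, 1.0, 0.0],
]
result = complete_linkage(dist, threshold=0.5)
print(result)
```

Single linkage would have chained item 2 onto the first cluster through item 1. Complete linkage refuses because items 0 and 2 are far apart.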
143	Characterisation of Potential Inhibitors of Calmodulin from Plasmodium falciparum Iversen, Alexandra, Nordén, Ebba, Bjers, Julia, Wickström, Filippa, Zhou, Martin, Hassan, Mohamed January 2020 Each year countless lives are affected and about half a million people die from malaria, a disease caused by parasites of the genus Plasmodium. The most virulent species of the parasite is Plasmodium falciparum (P. falciparum). Calmodulin (CaM) is a small, 148 amino acid long, highly conserved and essential protein in all eukaryotic cells. Previous studies have determined that CaM is important for the reproduction of P. falciparum and its invasion of host cells. The primary structures of human CaM (CaMhum) and CaM from P. falciparum (CaMpf) differ in merely 16 positions, which makes differences in their structures and ligand affinity interesting to study, especially since inhibitors that are selective for CaMpf over CaMhum could, by extension, give rise to new malaria treatments. Some antagonists, functioning as inhibitors of CaM, have already been analysed in previous studies. However, there are also compounds that have not yet been studied as possible antagonists of CaM. This study considers three known antagonists, trifluoperazine (TFP), calmidazolium (CMZ) and artemisinin (ART), and three recently created fentanyl derivatives, 3-OH-4-OMe-cyclopropylfentanyl (ligand 1), 4-OH-3OMe-4F-isobutyrylfentanyl (ligand 2) and 3-OH-4-OMe-isobutyrylfentanyl (ligand 3). Bioinformatic methods, such as modelling and docking, were used to compare the structures of CaMhum and CaMpf and to observe the interaction of the six ligands with CaM from both species. In addition to the differences in primary structure, identified with ClustalW, disparities in tertiary structure were observed. Structure analysis of CaMhum and CaMpf in PyMOL revealed a more open conformation as well as a larger, more defined hydrophobic cleft in CaMhum compared to CaMpf. 
Simulated binding of the six ligands to CaM from both species, using Autodock 4.2, indicated that TFP and ART bind with higher affinity to CaMhum, which is expected. Ligand 2 and ligand 3 also bound with higher affinity to CaMhum, which is reasonable since their docking is based on how TFP binds to CaM. However, ligand 1 and CMZ both bound to CaMpf with higher affinity. Despite promising results for ligand 1 and CMZ, no decisive conclusion can be drawn from bioinformatic studies alone. To gain a better understanding of the protein-ligand interactions of the six ligands with CaMhum and CaMpf, further studies using e.g. circular dichroism and fluorescence would be advantageous. Future studies on the binding of CMZ and ligand 1 to CaM, as well as of ligands with similar characteristics, would be especially valuable, because the results of this study suggest that they may be better inhibitors of CaMpf than of CaMhum and could thereby function as antimalarial drugs. Plasmodium falciparum Calmodulin Malaria Trifluoperazine Calmidazolium Artemisinin Fentanyl derivatives Modelling Docking Bioinformatics Engineering and Technology Teknik och teknologier Bioinformatics and Systems Biology Bioinformatik och systembiologi Biochemistry and Molecular Biology Biokemi och molekylärbiologi
144	Computational prediction of cell-cell interactions in the brain-tumour microenvironment Camargo Romera, Paula January 2023 Glioblastoma is the fastest-growing and most common malignant brain tumour in adults. It is normally treated with surgery and radio- or chemotherapy, but life expectancy is approximately 15 months, with a high probability of the cancer recurring. Therefore, there is a need to decrease its severity. Bulk and single-cell RNA sequencing allow the identification of cellular states in tumours affected by cell-intrinsic and extrinsic factors. Four different cellular states have been identified in glioblastoma: neural progenitor-like, oligodendrocyte progenitor-like, astrocyte-like, and mesenchymal-like. As glioblastoma is an immunosuppressive tumour, it can alter the immune system and increase the tumour's immune escape by secreting immunosuppressive factors or interacting with the brain microenvironment. Two datasets were used in this study to explore whether the localization of the tumour in the brain microenvironment and the tendency of glioblastomas to activate microglial cells are due to particular ligand-receptor interactions. Data quality control was applied to both datasets, and the SingleCellSignalR and CellphoneDB packages were used to predict the possible interactions. A total of seven experiments were designed for this study. The first dataset, GBmap, allowed a comparison between tumour cells and microglia, between tumour cells and other cell types in the brain, and between the four cellular states of glioblastoma and microglia and macrophages. Next, healthy microglia from GBmap were compared with the tumour bulk data from the second dataset, HGCC. 
The bootstrap technique was used to compare bulk data with single-cell data, and a comparison between tumour cells and microglia or other cell types was analysed. Results showed specific and shared interactions between cell types or cellular states, revealing that the different localization of the tumour cells depends on the expressed ligand-receptor pairs. Also, a total of four patterns of interactions with different tendencies to activate microglial cells were found in the 50 samples. These are promising results for further exploring drugs that interfere with these interactions, or how the interactions relate to patient survival. Furthermore, although glioblastoma is a heterogeneous disease, more interactions were predicted with microglial/macrophage cells, without a uniform pattern between patients. This study is therefore a starting point, and further in vitro studies are needed to examine the predicted interactions as potential targets to stop the progression of this type of cancer. Cancer Glioblastoma Computational Biology Bioinformatics Microenvironment Brain tumour Interactions Systems Biology Bioinformatics (Computational Biology) Bioinformatik (beräkningsbiologi) Bioinformatics and Systems Biology Bioinformatik och systembiologi Cancer and Oncology Cancer och onkologi
145	Identifying structural variants from plant short-read sequencing data Buinovskaja, Greta January 2022 No description available. bioinformatics structural variant association study bean structural variant detection pipeline NGS population genetics Bioinformatics and Systems Biology Bioinformatik och systembiologi
146	Isolation of the native chloroplast proteome from plant for identification of protein-metabolite interactions / Isolering av det nativa kloroplastproteomet från planta i syfte att identifiera protein-metabolitinteraktioner Strandberg, Linnéa January 2021 In order to feed a growing population, crop yields need to be increased. One way to do this is to optimise photosynthetic activity in the plant, which includes improving carbon fixation. To succeed with this, knowledge of the regulation of key proteins in the chloroplast is required. The aim of this project is to identify possible regulatory protein-metabolite interactions in chloroplasts from Arabidopsis thaliana. The target proteins are the 11 enzymes of the Calvin-Benson-Bassham cycle. The metabolites of interest are 3PGA, ATP, FBP and GAP, which are intermediates or co-factors of the cycle; 2PG, which is a product of a competing reaction in the cycle; and finally G6P, citrate and sucrose, which are central metabolites in other vital reactions in the cell. Before the experiments with Arabidopsis, spinach was used as a test organism to evaluate the proposed protocols. First, chloroplasts were isolated from leaves. When the integrity of the chloroplasts had been validated, the proteins were extracted. Interactions between the metabolites and the extracted proteins were analyzed with limited proteolysis-small molecule mapping. This method, which combines limited proteolysis with mass spectrometry, detected several protein-metabolite interactions. In Arabidopsis, all enzymes except for FBPase, PPE and TIM had at least one interaction. In spinach, interactions were seen with FBA, GAPDH, PGK, PRK, RuBisCO, TIM and TK. The results highlight potential regulatory events, which could be used to target bottlenecks in carbon fixation. This could provide a pathway to increase the flux in the Calvin-Benson-Bassham cycle, and thereby improve carbon fixation in plants. Chloroplast metabolism proteomics Calvin cycle Arabidopsis Kloroplast metabolism proteomik Calvincykel Arabidopsis Biochemistry and Molecular Biology Biokemi och molekylärbiologi Bioinformatics and Systems Biology Bioinformatik och systembiologi
147	Epidemiological and statistical basis for detection and prediction of influenza epidemics Spreco, Armin January 2017 A large number of emerging infectious diseases (including influenza epidemics) have been identified during the last century. The emergence and re-emergence of infectious diseases have a negative impact on global health. Influenza epidemics alone cause between 3 and 5 million cases of severe illness annually, and between 250,000 and 500,000 deaths. In addition to the human suffering, influenza epidemics also impose heavy demands on the health care system. For example, hospitals and intensive care units have limited excess capacity during infectious disease epidemics. Therefore, it is important that increased influenza activity is noticed early at local levels to allow time to adjust primary care and hospital resources that are already under pressure. Algorithms for the detection and prediction of influenza epidemics are essential components to achieve this. Although a large number of studies have reported algorithms for detection or prediction of influenza epidemics, outputs that fulfil standard criteria for operational readiness are seldom produced. Furthermore, in the light of the rapidly growing availability of “Big Data” from both diagnostic and prediagnostic (syndromic) data sources in health care and public health settings, a new generation of epidemiologic and statistical methods, using several data sources, is desired for reliable analyses and modeling. The rationale for this thesis was to inform the planning of local response measures and adjustments to health care capacity during influenza epidemics. The overall aim was to develop a method for detection and prediction of influenza epidemics. Before developing the method, three preparatory studies were performed. 
In the first of these studies, the associations (in terms of correlation) between diagnostic and pre-diagnostic data sources were examined, with the aim of investigating the potential of these sources for use in influenza surveillance systems. In the second study, a literature study of detection and prediction algorithms used in the field of influenza surveillance was performed. In the third study, the algorithms found in the previous study were compared in a prospective evaluation study. In the fourth study, a method for nowcasting of influenza activity was developed using electronically available data for real-time surveillance in local settings followed by retrospective application on the same data. This method includes three functions: detection of the start of the epidemic at the local level and predictions of the peak timing and the peak intensity. In the fifth and final study, the nowcasting method was evaluated by prospective application on authentic data from Östergötland County, Sweden. In the first study, correlations with large effect sizes between diagnostic and pre-diagnostic data were found, indicating that pre-diagnostic data sources have potential for use in influenza surveillance systems. However, it was concluded that further longitudinal research incorporating prospective evaluations is required before these sources can be used for this purpose. In the second study, a meta-narrative review approach was used in which two narratives for reporting prospective evaluation of influenza detection and prediction algorithms were identified: the biodefence informatics narrative and the health policy research narrative. As a result of the promising performances of one detection algorithm and one prediction algorithm in the third study, it was concluded that both further evaluation research and research on methods for nowcasting of influenza activity were warranted. 
In the fourth study, the performance of the nowcasting method was promising when applied to retrospective data, but it was concluded that thorough prospective evaluations are necessary before recommending the method for broader use. In the fifth study, the performance of the nowcasting method was promising when prospectively applied to authentic data, implying that the method has potential for routine use. In future studies, the validity of the nowcasting method must be investigated by application and further evaluation in multiple local settings, including large urban areas. Biomedical Laboratory Science/Technology Bioinformatics and Systems Biology Bioinformatik och systembiologi Computer Science Datavetenskap (datalogi)
148	Introducing quality assessment and efficient management of cellular thermal shift assay mass spectrometry data Hellner, Joakim January 2017 Recent advances in molecular biology have led to the discovery of many new potential drugs. However, difficulties with in situ analysis of ligand binding prevent quick advancement in clinical trials, which stresses the need for better direct methods. A relatively new methodology, called Cellular Thermal Shift Assay (CETSA), allows for detection of ligand binding in a cell's natural environment and can be used in combination with Mass Spectrometry (MS) for readout. With help from the Pelago Bioscience team, I developed a pipeline for processing CETSA MS data and a web-based system for viewing the results. The system, called CETSA Analytics, also evaluates the relevance of the results and helps its users locate information efficiently. CETSA Analytics is currently being tested by Pelago Bioscience AB as a tool for experimental data distribution. Cellular thermal shift assay Information management system Quality Assessment Melt curve Target engagement Mass spectrometry Information Systems Bioinformatics and Systems Biology Bioinformatik och systembiologi Engineering and Technology Teknik och teknologier
149	Development of an API for creating and editing openEHR archetypes Klasson, Filip, Väyrynen, Patrik January 2009 Archetypes are used to standardize a way of creating, presenting and distributing health care data. In this master thesis project the open specifications of openEHR were followed. The objective of this master thesis project has been to develop a Java-based API for creating and editing openEHR archetypes. The API is a programming toolbox that can be used when developing archetype editors. Another purpose has been to implement validation functionality for archetypes. An important aspect is that the functionality of the API is well documented, which makes the system easier for future developers to understand. The result was a Java-based API that is a platform for future archetype editors. The API kernel has optional immutability, so developed archetypes can be locked against modification by making them immutable. The API is compatible with the openEHR specifications 1.0.1 and can load and save archetypes in ADL (Archetype Definition Language) format. There is also a validation feature that verifies that the archetype follows the right structure with respect to predefined reference models. This master thesis report also presents a basic GUI proposal. openEHR archetype archetypes API validation immutability mutability mutable immutable ADL AOM LinkEHR AM IM RM dADL cADL Computer and Information Sciences Data- och informationsvetenskap Bioinformatics and Systems Biology Bioinformatik och systembiologi Computer Engineering Datorteknik Software Engineering Programvaruteknik
150	Global functional association network inference and crosstalk analysis for pathway annotation Ogris, Christoph January 2017 Cell functions are steered by complex interactions of gene products, such as forming a temporary or stable complex, altering gene expression or catalyzing a reaction. Mapping these interactions is the key to understanding biological processes and is therefore the focus of numerous experiments and studies. Small-scale experiments deliver high-quality data but lack coverage, whereas high-throughput techniques cover thousands of interactions but can be error-prone. Unfortunately, all of these approaches can only focus on one type of interaction at a time. This makes experimental mapping of the genome-wide network a cost- and time-intensive procedure. To overcome these problems, different computational approaches have been suggested that integrate multiple data sets and/or different evidence types. This widens the stringent definition of an interaction and introduces a more general term - functional association. FunCoup is a database for genome-wide functional association networks of Homo sapiens and 16 model organisms. FunCoup distinguishes between five different functional associations: co-membership in a protein complex, physical interaction, participation in the same signaling cascade, participation in the same metabolic process and, for prokaryotic species, co-occurrence in the same operon. For each class, FunCoup applies naive Bayesian integration of ten different evidence types to predict novel interactions. It further uses orthologs to transfer interaction evidence between species. This considerably increases coverage and allows inference of comprehensive networks even for poorly studied organisms. BinoX is a novel method for pathway analysis and for determining the relation between gene sets, using functional association networks. 
Traditionally, pathway annotation has been done using gene overlap only, but these methods capture only a small part of the whole picture. Placing the gene sets in the context of a network provides additional evidence for pathway analysis, revealing a global picture based on the whole genome. PathwAX is a web server based on the BinoX algorithm. A user can input a gene set and get online network-crosstalk-based pathway annotation. PathwAX uses the FunCoup networks and 280 pre-defined pathways. Most runs take just a few seconds, and the results are summarized in an interactive chart that the user can manipulate to gain further insights into the gene set's pathway associations. / At the time of the doctoral defense, the following paper was unpublished and had a status as follows: Paper 2: Manuscript. biological networks global gene association networks gene networks protein networks functional association functional coupling network biology pathway analysis pathway annotation pathway enrichment network-based enrichment enrichment Bioinformatics and Systems Biology Bioinformatik och systembiologi
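Entry 150 mentions naive Bayesian integration of several evidence types. As a rough illustration of that general technique, the Python sketch below combines per-evidence likelihood ratios under the naive independence assumption. It is not FunCoup's actual scoring scheme: the function name, the prior and the likelihood ratios are all hypothetical values chosen for the example.

```python
import math


def posterior_link_probability(likelihood_ratios, prior):
    """Combine independent evidence types with naive Bayes.

    likelihood_ratios: for each evidence type, the assumed ratio
        P(evidence | genes functionally associated) /
        P(evidence | genes not associated).
    prior: assumed prior probability that a gene pair is associated.
    Returns the posterior probability of association.
    """
    # Work in log-odds so that independent evidence simply adds up.
    log_odds = math.log(prior / (1.0 - prior))
    log_odds += sum(math.log(lr) for lr in likelihood_ratios)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical gene pair: three evidence types support the link and one
# speaks weakly against it.
p = posterior_link_probability([3.0, 2.0, 1.5, 0.8], prior=0.01)
print(round(p, 4))
```

Summing log likelihood ratios is what makes the approach "naive": each evidence type is treated as independent of the others given the link status.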
