Global ETD Search

131	Structural Heterogeneity Analysis of Crystallographic Fragment Screening Data: A Statistical Approach Strid Holmertz, Ylva January 2024 Understanding and exploring macromolecular dynamics is crucial for drug development, studying biological functions, and protein engineering, and thus remains a core focus in structural biology. However, revealing partially occupied states in X-ray crystallography is challenging because of the inherent averaging over a population of proteins in different states, which obscures low-occupancy states in experimentally determined electron densities. One way forward is offered by analysing multiple datasets together, in particular the variations in occupancy found between equivalent datasets, i.e. data collected from different crystals grown in equivalent experimental conditions. In this thesis, I consider real datasets as realisations of a distribution of possible occupancy sets, and try to fit a model to these data using maximum likelihood methods. The resulting network outputs maps that resemble electron density maps and contain small differences, but it does not yet reveal underlying states for the residues. This is believed to be caused by a local minimum, as the model optimises the maps to replicate an average of the real data. Bioinformatics Machine learning Statistics Maximum likelihood Structural Biology Crystallography Bioinformatik Statistik Maskininlärning Strukturbiologi Bioinformatics and Systems Biology Bioinformatik och systembiologi
132	Novel resources enabling comparative regulomics in forest tree species / Nya verktyg för komparativ regulomik i skogsträd Sundell, David January 2017 Lignocellulosic plants are the most abundant source of terrestrial biomass and are one of the potential sources of renewable energy that can replace the use of fossil fuels. For a country such as Sweden, where the forest industry accounts for 10% of total exports, there would be large economic benefits associated with increased biomass yield. The availability of research on wood development conducted in conifer tree species, which represent the majority of the forestry in Sweden, is limited, and the majority of research has been conducted in model angiosperm species such as Arabidopsis thaliana. However, the large evolutionary distance between angiosperms and gymnosperms limits the possibility of identifying orthologous genes and regulatory pathways by comparing sequence similarity alone. At such large evolutionary distances, the identification of gene similarity is, in most cases, not sufficient and additional information is required for functional annotation. In this thesis, two high-spatial resolution datasets profiling wood development were processed: one from the angiosperm tree Populus tremula and the other from the conifer species Picea abies. These datasets were each published together with a web resource including tools for the exploration of gene expression, co-expression and functional enrichment of gene sets. One developed resource allows interactive, comparative co-expression analysis between species to identify conserved and diverged co-expression modules. These tools make it possible to identify conserved regulatory modules that can focus downstream research and provide biologists with a resource to identify regulatory genes for targeted trait improvement.
/ Lignocellulose is the most abundant source of terrestrial biomass and is one of the renewable energy sources that could potentially replace the use of fossil fuels. For a country such as Sweden, where the forest industry accounts for 10% of total exports, increased biomass production could therefore bring large economic benefits. Research on conifers, which make up the majority of Swedish forest, is limited, and most plant research has been carried out in model organisms belonging to the angiosperms, such as Arabidopsis thaliana. However, the evolutionary distance between angiosperms (flowers and trees) and gymnosperms (spruce and pine) limits the possibility of identifying regulatory systems across these groups. At such large evolutionary distances, identifying a gene in a model organism is not enough; additional information, such as gene expression data, is required. In this thesis, two high-resolution experiments profiling wood development were examined: one from the angiosperm tree Populus tremula and the other from the gymnosperm (conifer) tree Picea abies. The processed data have been published together with websites offering several tools for, among other things, displaying gene expression, viewing correlations in gene expression, and testing for enrichment of functional genes in a group. One of the resources developed allows interactive comparison of correlations between species in order to identify modules (groups of genes) that have been conserved or have diverged between species over time. Identifying such conserved modules can help focus future research and give biologists a way to identify regulatory genes for targeted improvement of tree traits. Comparative genomics Web resource Wood development RNA-Seq Forestry Lignocellulose Regulomics High-spatial resolution Populus tremula Picea abies Orthology. Bioinformatics and Systems Biology Bioinformatik och systembiologi
133	Microbial DNA Sequencing in Environmental Studies Hu, Yue January 2017 The field of microbial ecology has just entered a new era of rapid technological development and generation of big data. The high-throughput sequencing techniques presently available provide an opportunity to extensively inventory the blueprints of life. Now, millions of microbes of natural microbial communities can be studied simultaneously without prior cultivation. New species and new functions (genes) can be discovered just by mining sequencing data. However, a tremendous number of microorganisms remain unexamined, as do the ecosystem functions they carry out. Modern genomic technologies can contribute to solving environmental problems and help us understand ecosystems, but to do so most efficiently, methods need to be continuously optimised. During my Ph.D. studies, I developed a method to survey eukaryotic microbial diversity with higher accuracy, and applied various sequencing-based approaches in an attempt to answer questions of importance in environmental research and ecology. In PAPER-I, we developed a set of 18S rRNA gene PCR primers with high taxonomic coverage, meeting the requirements of currently popular sequencing technologies and matching the richness of 18S rRNA reference sequences accumulated so far. In PAPER-II, we conducted the first sequencing-based spatial survey of the combined eukaryotic and bacterial planktonic community in the Baltic Sea to uncover the relationship between microbial diversity and environmental conditions. Here, the 18S primers designed in PAPER-I and a pair of broad-coverage 16S primers were employed to target the rRNA genes of protists and bacterioplankton for amplicon sequencing.
In PAPER-III, we integrated metagenomic, metabarcoding, and metatranscriptomic data in an effort to scrutinise the protein synthesis potential (i.e., activity) of microbes in the sediment at a depth of 460 m in the Baltic Sea, thereby disclosing microbial diversity and possible ecological functions within such an extreme environment. Lastly, in PAPER-IV, we compared the performance of E. coli culturing, high-throughput sequencing, and portable real-time sequencing in tracking wastewater contamination in an urban stormwater system. From the aspects of cost, mobility and accuracy, we evaluated the usage of sequencing-based approaches in civil engineering and, for the first time, validated the real-time sequencing device for use in water quality monitoring. In summary, these studies demonstrate how DNA sequencing of microbial communities can be applied in environmental monitoring and ecological research. / Yue Hu was supported by a scholarship from the China Scholarship Council (CSC #201206950024). Yue Hu has been publishing papers under the name "Yue O. O. Hu". QC 20170403 Microbiology Mikrobiologi Civil Engineering Samhällsbyggnadsteknik Ecology Ekologi Bioinformatics and Systems Biology Bioinformatik och systembiologi
134	Deciphering the ontogeny of unmutated and mutated subsets of Chronic Lymphocytic Leukemia Mohamed, Ahmed January 2019 Chronic Lymphocytic Leukemia (CLL) is a type of cancer that affects the B cells of the immune system, causing problems in the process of producing antibodies. It can be sorted into mutated and unmutated CLL based on the percentage of somatic mutations in the Immunoglobulin Heavy chain Variable region (IgHV). The B cells of healthy individuals can be sorted into three groups: CD27dull memory B cells (MBCs), CD27bright MBCs and naïve B cells. The hypothesis for the project was that the unmutated CLL subset originates from CD27dull MBCs and the mutated CLL subset originates from CD27bright MBCs. RNA-sequencing data from healthy individuals were acquired from a collaboration partner in Rome, and data from CLL patients were collected from public datasets available online. Several bioinformatic tools were used to analyze the data. First, the quality of the data files was checked, then adapter sequences from the sequencing process and low-quality bases were removed (trimming). Good quality of the files was confirmed after the trimming. Secondly, these files were mapped against the human reference genome (GRCh38/hg38) for alignment, then the resulting data were used to check for genes that showed differential expression between the different groups. Results were analyzed and visualized using Venn diagrams, Principal Component Analysis (PCA), heatmap plots and random forest. A list of 85 genes was generated based on the different comparisons and was used in one PCA plot that showed clear separation between the different groups. The SWAP70 gene was analyzed for single nucleotide polymorphisms (SNPs). The study identified five genes that could be used as biomarkers for CLL and the diagnosis of its subtypes, some of which have been discussed in previous studies.
Also, the mutated CLL subset showed behavior similar to that of the healthy individuals, which could validate the original hypothesis and justify the better disease prognosis for this subtype. CLL NGS mutated CLL unmutated CLL CD27 bright CD27 dull memory B cell RNA-sequencing Bioinformatics and Systems Biology Bioinformatik och systembiologi Immunology Immunologi
135	Evaluating the biological relevance of disease consensus modules : An in silico study of IBD pathology using a bioinformatics approach Ströbaek, Joel January 2019 Inflammatory bowel disease encompasses a variety of heterogeneous chronic inflammatory diseases that affect the gastrointestinal tract, where Crohn’s disease and ulcerative colitis are the principal examples. The etiology of these, and many other, complex human diseases remains largely unknown, and the diseases therefore pose relevant targets for novel research strategies. One such strategy is the in silico application of network theory derived methods to data sourced from publicly available repositories of e.g. gene expression data. Specifically, methods generating graphs of interconnected elements enriched by differentially expressed genes (disease modules) were applied to data available through the Gene Expression Omnibus. Based on a previous method, the current project aimed to evaluate disease modules, combined from stand-alone inferential methods, in disease consensus modules: representing pathophenotypical motifs for the diseases of interest. The modules found to be significantly enriched by genome-wide association study inferred single-nucleotide polymorphisms, as validated using the Pathway Scoring Algorithm, were subsequently subjected to further analysis using Kyoto Encyclopedia of Genes and Genomes pathway enrichment and literature searches. The results of this study adhere to previous findings relating to the employed method, but lack any novelty pertaining to the diseases of interest. However, the results substantiate the preceding method’s conclusions by including parameters that increase statistical validity. In addition, the study contributed peripheral results concerning both the methodology of consensus module methods and the elucidation of inflammatory bowel disease etiology and disease subtype differentiation, which pose interesting subjects for future investigation.
Bioinformatics Disease module Consensus module Disease consensus module Network medicine Inflammatory bowel disease Crohn's disease Ulcerative colitis S2B NSC Tetralith MODifieR Bioinformatics and Systems Biology Bioinformatik och systembiologi
136	Identification of personalized multi-omic disease modules in asthma Martínez Enguita, David January 2018 Asthma is a respiratory syndrome associated with airflow limitation, bronchial hyperresponsiveness and inflammation of the airways in the lungs. Despite the ongoing research efforts, the outstanding heterogeneity displayed by the multiple forms in which this condition presents often hampers the attempts to determine and classify the phenotypic and endotypic biological structures at play, even when considering a limited assembly of asthmatic subjects. To increase our understanding of the molecular mechanisms and functional pathways that govern asthma from a systems medicine perspective, a computational workflow focused on the identification of personalized transcriptomic modules from the U-BIOPRED study cohorts, by the use of the novel MODifieR integrated R package, was designed and applied. A feature selection of candidate asthma biomarkers was implemented, accompanied by the detection of differentially expressed genes across sample categories, the production of patient-specific gene modules and the subsequent construction of a set of core disease modules of asthma, which were validated with genomic data and analyzed for pathway and disease enrichment. The results indicate that the approach utilized is able to reveal the presence of components and signaling routes known to be crucially involved in asthma pathogenesis, while simultaneously uncovering candidate genes closely linked to the latter. The present project establishes a valuable pipeline for the module-driven study of asthma and other related conditions, which can provide new potential targets for therapeutic intervention and contribute to the development of individualized treatment strategies.
modules asthma personalized systems biology molecular biology transcriptomics genomics networks disease biomarker microarray individualized endotype phenotype Bioinformatics and Systems Biology Bioinformatik och systembiologi
137	Proteus : A new predictor for protean segments Söderquist, Fredrik January 2015 The discovery of intrinsically disordered proteins has led to a paradigm shift in protein science. Many disordered proteins have regions that can transform from a disordered state to an ordered one. Those regions are called protean segments. Many intrinsically disordered proteins are involved in diseases, including Alzheimer's disease, Parkinson's disease and Down's syndrome, which makes them prime targets for medical research. As protean segments often are the functional part of the proteins, it is of great importance to identify those regions. This report presents Proteus, a new predictor for protean segments. The predictor uses Random Forest (a decision tree ensemble classifier) and is trained on features derived from amino acid sequence and conservation data. Proteus compares favourably to state-of-the-art predictors and performs better than the competition on all four metrics: precision, recall, F1 and MCC. The report also looks at the differences between protean and non-protean regions and how they differ between the two datasets that were used to train the predictor. bioinformatics protein machine learning predictor protean segments molecular recognition feature intrinsically disordered proteins proteus Bioinformatics and Systems Biology Bioinformatik och systembiologi Bioinformatics (Computational Biology) Bioinformatik (beräkningsbiologi)
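The four metrics named in the Proteus abstract (precision, recall, F1 and MCC) can all be computed from the counts of a binary confusion matrix. The Python sketch below is illustrative only and is not taken from the thesis; the function name and the example counts are assumptions made for this listing.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Return precision, recall, F1 and MCC from confusion-matrix counts.

    tp/fp/tn/fn are the numbers of true positives, false positives,
    true negatives and false negatives. Undefined ratios (zero
    denominators) are reported as 0.0, which is a convention chosen
    for this sketch.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    # Matthews correlation coefficient: uses all four cells of the matrix.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "mcc": mcc}


if __name__ == "__main__":
    # Hypothetical counts, e.g. residues predicted as protean vs. not.
    print(classification_metrics(tp=8, fp=2, tn=85, fn=5))
```

MCC is often reported alongside F1 for imbalanced problems because it also takes the true negatives into account; whether that was the motivation in the thesis is not stated in the abstract.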
138	Model-Based Hypothesis Testing in Biomedicine : How Systems Biology Can Drive the Growth of Scientific Knowledge Johansson, Rikard January 2017 The utilization of mathematical tools within biology and medicine has traditionally been less widespread compared to other hard sciences, such as physics and chemistry. However, an increased need for tools such as data processing, bioinformatics, statistics, and mathematical modeling has emerged due to advancements during the last decades. These advancements are partly due to the development of high-throughput experimental procedures and techniques, which produce ever increasing amounts of data. For all aspects of biology and medicine, these data reveal a high level of inter-connectivity between components, which operate on many levels of control, and with multiple feedbacks both between and within each level of control. However, the availability of these large-scale data is not synonymous with a detailed mechanistic understanding of the underlying system. Rather, a mechanistic understanding is gained only when we construct a hypothesis and test its predictions experimentally. Identifying interesting predictions that are quantitative in nature generally requires mathematical modeling. This, in turn, requires that the studied system can be formulated as a mathematical model, such as a series of ordinary differential equations, where different hypotheses can be expressed as precise mathematical expressions that influence the output of the model. Within specific sub-domains of biology, the utilization of mathematical models has had a long tradition, such as the modeling done on electrophysiology by Hodgkin and Huxley in the 1950s. However, it is only in recent years, with the arrival of the field known as systems biology, that mathematical modeling has become more commonplace.
The somewhat slow adoption of mathematical modeling in biology is partly due to historical differences in training and terminology, as well as to a lack of awareness of showcases illustrating how modeling can make a difference, or even be required, for a correct analysis of the experimental data. In this work, I provide such showcases by demonstrating the universality and applicability of mathematical modeling and hypothesis testing in three disparate biological systems. In Paper II, we demonstrate how mathematical modeling is necessary for the correct interpretation and analysis of dominant negative inhibition data in insulin signaling in primary human adipocytes. In Paper III, we use modeling to determine transport rates across the nuclear membrane in yeast cells, and we show how this technique is superior to traditional curve-fitting methods. We also demonstrate the issue of population heterogeneity and the need to account for individual differences between cells and the population at large. In Paper IV, we use mathematical modeling to reject three hypotheses concerning the phenomenon of facilitation in pyramidal nerve cells in rats and mice. We also show how one surviving hypothesis can explain all data and adequately describe independent validation data. Finally, in Paper I, we develop a method for model selection and discrimination using parametric bootstrapping and the combination of several different empirical distributions of traditional statistical tests. We show how the empirical log-likelihood ratio test is the best combination of two tests and how this can be used, not only for model selection, but also for model discrimination. In conclusion, mathematical modeling is a valuable tool for analyzing data and testing biological hypotheses, regardless of the underlying biological system.
Further development of modeling methods and applications is therefore important, since these will in all likelihood play a crucial role in all future aspects of biology and medicine, especially in dealing with the burden of increasing amounts of data that are made available with new experimental techniques. / The use of mathematical tools in biology and medicine has traditionally been less widespread than in other natural sciences, such as physics and chemistry. An increased need for tools such as data processing, bioinformatics, statistics and mathematical modeling has emerged thanks to advances during recent decades. These advances are partly a result of the development of large-scale data collection techniques. In all areas of biology and medicine, these data have revealed a high level of interconnectivity between components, which operate on many levels of control and with multiple feedbacks both between and within each level of control. Access to large-scale data is, however, not synonymous with a detailed mechanistic understanding of the underlying system. Rather, a mechanistic understanding is achieved only when we build a hypothesis whose predictions we can test experimentally. Identifying interesting predictions that are quantitative in nature generally requires mathematical modeling. This in turn requires that the studied system can be formulated as a mathematical model, such as a series of ordinary differential equations, in which different hypotheses can be expressed as precise mathematical expressions that influence the output of the model. Within certain sub-domains of biology, the use of mathematical models has had a long tradition, such as the modeling of electrophysiology by Hodgkin and Huxley in the 1950s. It is, however, only in recent years, with the arrival of the field of systems biology, that mathematical modeling has become commonplace. The somewhat slow adoption of mathematical modeling in biology is due, among other things,
to historical differences in training and terminology, and to a lack of awareness of examples illustrating how modeling can make a difference and is in fact often required for a correct analysis of experimental data. In this work, I provide such examples and demonstrate the universality and applicability of mathematical modeling and hypothesis testing in three different biological systems. In Paper II, we show how mathematical modeling is necessary for a correct interpretation and analysis of dominant negative inhibition data in insulin signaling in primary human adipocytes. In Paper III, we use modeling to determine transport rates across the nuclear membrane in yeast cells, and we show how this technique is superior to traditional curve-fitting methods. We also demonstrate the issue of population heterogeneity and the need to account for individual differences between cells and the population as a whole. In Paper IV, we use mathematical modeling to reject three hypotheses about how the phenomenon of facilitation arises in pyramidal nerve cells in rats and mice. We also show how one surviving hypothesis can describe all data, including independent validation data. Finally, in Paper I, we develop a method for model selection and model discrimination using parametric bootstrapping and the combination of different empirical distributions of traditional statistical tests. We show how the empirical log-likelihood ratio test is the best combination of two tests and how the test is applicable not only to model selection but also to model discrimination. In summary, mathematical modeling is a valuable tool for analyzing data and testing biological hypotheses, regardless of the underlying biological system.
Further development of modeling methods and applications is therefore important, since these will probably play a decisive role in the future of biology and medicine, especially when it comes to handling the burden of the increasing amounts of data that become available with new experimental techniques. systems biology modeling ODE hypothesis testing falsificationism insulin signaling yeast population heterogeneity cell-to-cell variation facilitation pyramidal synaptic bootstrapping personalized medicine omics Bioinformatics and Systems Biology Bioinformatik och systembiologi
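The abstract of entry 138 mentions parametric bootstrapping of a log-likelihood ratio test. The general idea can be sketched in Python as follows. This is a toy illustration under loudly stated assumptions, not the method of the thesis: the ODE models are swapped for two nested Gaussian models (null: mean 0, alternative: free mean, both with unit variance), only a single test statistic is bootstrapped rather than a combination of tests, and all function names are hypothetical.

```python
import random
import statistics


def log_likelihood_ratio(data):
    """Likelihood ratio statistic 2 * (logL_alt - logL_null).

    Toy models (an assumption of this sketch): the null is N(0, 1) and
    the alternative is N(mu, 1) with mu fitted as the sample mean.
    For these two models the statistic simplifies to n * mean**2.
    """
    n = len(data)
    mean = statistics.fmean(data)
    return n * mean * mean


def bootstrap_p_value(data, n_boot=2000, seed=0):
    """Empirical p-value from a parametric bootstrap under the null.

    Datasets of the same size are simulated from the null model, the
    statistic is recomputed on each, and the observed value is compared
    with this empirical distribution instead of a chi-square table.
    """
    rng = random.Random(seed)
    observed = log_likelihood_ratio(data)
    n = len(data)
    exceed = 0
    for _ in range(n_boot):
        simulated = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if log_likelihood_ratio(simulated) >= observed:
            exceed += 1
    # The +1 terms keep the estimate strictly positive.
    return (exceed + 1) / (n_boot + 1)


if __name__ == "__main__":
    rng = random.Random(1)
    sample = [rng.gauss(0.5, 1.0) for _ in range(30)]  # hypothetical data
    print(bootstrap_p_value(sample))
```

In a modeling setting the simulation step would draw synthetic data from the fitted null model, and both candidate models would be refitted to every simulated dataset; that refitting is the expensive part and is omitted here because the toy models have closed-form fits.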
139	ARG-MATEE Automated Pipeline for Detection of Antimicrobial Resistance in WGS Data Collected from Pig Farms and Surrounding Communities / Tracking Antimicrobial Resistance at Pig Farms Halstead, Holly January 2020 As part of recognizing the interconnected nature of different sectors in relation to health, AMR (antimicrobial resistance) has emerged as an issue of high global importance. E. coli isolates were taken from pig farms in Thailand, which serves as a point of interest in the study of ARGs (antimicrobial resistance genes) in emerging economies. Fecal samples were collected from pigs, humans who came in contact with the pigs, and humans who did not have contact with pigs, to be analyzed for ARGs, virulence genes, and plasmids. Data was analyzed with an automated pipeline in the form of ARG-MATEE, the Antimicrobial Resistance Gene Multi-Analysis Tool for Enteric E. coli, a tool designed in this study to be used here and in future investigations. ARG-MATEE regulates and records internal software versions in a produced report, which also includes data tables for all non-phylogeny results in Boyce–Codd normal form and data visualizations for plasmids, ARGs, virulence genes, and phylogeny. Through the use of ARG-MATEE, the iss virulence gene was seen to differ significantly between testing groups, as it is present only in the human testing groups, suggesting loss of function of the iss gene in pigs and thus host specialization. Bioinformatics Microbiology Pipeline Snakemake ARG-MATEE Antimicrobial Resistance Pig Farms Conda iss virulence gene Thailand surveillance escherichia coli E. coli klebsiella pneumoniae One Health Bioinformatics and Systems Biology Bioinformatik och systembiologi
140	Development of a DNA barcode for species identification of tuna Nordquist, Clara, Edwall, Jonathan, Eriksson, Leonora, Mäkinen, Nelly, Sayehban, Minna, Styfberg, Matilda January 2022 Today, DNA-barcoding with the gene COI is regularly used in the identification of fish. However, this is not an adequate way of identifying species of tuna due to COI lacking sufficient interspecies divergence. This is problematic since fraud and mislabeling are a major concern within the fish and tuna industries. Thus, there is a need for a new genetic barcode region when identifying the 15 tuna species within the tribe Thunnini. This study has considered six mitochondrial genetic regions (16S, ATP8, COII, CR, CytB, and ND2) and their potential as barcodes in comparison to COI. To be of practical use, the barcode has to be able to differentiate between all 15 tuna species, as well as contain conserved primer binding sites and be approximately 400 bp, or shorter. Analyses of the regions were made through Multiple Sequence Alignments built using ClustalW in Mega 11.0. The candidates were first evaluated through neighbor-joining trees and plots of inter- and intraspecies variation, and then analyzed further in search of conserved regions for primer binding, flanking a segment of approximately 400 bp (or shorter). This resulted in two possible barcode candidates with corresponding primers from the CR and ND2 genes. As a final step, these two were analyzed for specificity using BLAST, to evaluate their actual utility in differentiating the tuna species. The results show that they both can identify the different tuna species, but that ND2 is superior with 100% identification accuracy. In addition to the theoretical analysis, the ability of the primers was measured through a real PCR amplification. Unfortunately, only the CR barcode could be evaluated, but the results show it to be practically useful.
Even though the utility of ND2 in PCR could not be analyzed, it is highly recommended as a region for further investigations. Given the strong theoretical support, it definitely shows promise as a new barcode for species identification of tuna. Thunnini D-loop mitochondrial control region CR 16S ATP8 COII Cox2 CytB COI Cox1 ND2 NADH dehydrogenase 2 Bioinformatics and Systems Biology Bioinformatik och systembiologi
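The comparison of inter- and intraspecies variation described in entry 140 can be illustrated with a small Python sketch. It is a sketch under assumptions, not the analysis performed in the thesis (which used Mega 11.0): the distance measure is taken to be the uncorrected p-distance, the sequences are invented toy data, and the function names are hypothetical. A candidate region separates species well when the smallest between-species distance exceeds the largest within-species distance.

```python
from itertools import combinations


def p_distance(a, b):
    """Proportion of differing sites between two aligned sequences.

    Alignment columns containing a gap ('-') in either sequence are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def barcode_gap(seqs_by_species):
    """Return (max intraspecies distance, min interspecies distance)."""
    labelled = [(sp, s) for sp, seqs in seqs_by_species.items() for s in seqs]
    intra, inter = [], []
    for (sp1, s1), (sp2, s2) in combinations(labelled, 2):
        (intra if sp1 == sp2 else inter).append(p_distance(s1, s2))
    if not inter:
        raise ValueError("need sequences from at least two species")
    return max(intra, default=0.0), min(inter)


if __name__ == "__main__":
    # Invented toy alignment; real barcodes would be roughly 400 bp.
    toy = {
        "species_A": ["ACGTACGT", "ACGTACGA"],
        "species_B": ["ACGGTCGT", "ACGGTCGT"],
    }
    max_intra, min_inter = barcode_gap(toy)
    print(max_intra, min_inter, min_inter > max_intra)
```

Model-corrected distances would change the numbers but not the logic of comparing the two distributions.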
