131 |
Structural Heterogeneity Analysis of Crystallographic Fragment Screening Data: A Statistical Approach
Strid Holmertz, Ylva January 2024
Understanding and exploring macromolecular dynamics is crucial for drug development, studying biological functions, and protein engineering, and thus remains a core focus in structural biology. However, revealing partially occupied states in X-ray crystallography is challenging because of the inherent averaging over a population of proteins in different states, which obscures low-occupancy states in experimentally determined electron densities. One way forward is offered by analysing multiple datasets together, in particular the variations in occupancy found between equivalent datasets, i.e. data collected from different crystals grown in equivalent experimental conditions. In this thesis, I consider real datasets as realisations of a distribution of possible occupancy sets, and try to fit a model against this data using maximum likelihood methods. The resulting network outputs maps that resemble electron density maps and contain small differences, but it does not yet reveal underlying states for the residues. This is believed to be caused by a local minimum, as the model optimises the maps to replicate an average of the real data.
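The averaging problem described in this abstract can be illustrated with a toy example. The sketch below is not the thesis model: it assumes a hypothetical one-dimensional density grid and two made-up states, and only shows how an occupancy-weighted average leaves a low-occupancy state as a faint feature in the observed map.

```python
def averaged_density(state_maps, occupancies):
    """Occupancy-weighted average of per-state density maps (toy 1D grids)."""
    if len(state_maps) != len(occupancies):
        raise ValueError("one occupancy is needed per state map")
    if abs(sum(occupancies) - 1.0) > 1e-9:
        raise ValueError("occupancies must sum to 1")
    n_points = len(state_maps[0])
    return [
        sum(occ * density[i] for occ, density in zip(occupancies, state_maps))
        for i in range(n_points)
    ]


# Hypothetical densities: state A peaks at grid point 1, state B at grid point 3.
state_a = [0.0, 1.0, 0.0, 0.0, 0.0]
state_b = [0.0, 0.0, 0.0, 1.0, 0.0]

# A crystal in which 90% of the molecules are in state A and 10% in state B.
observed = averaged_density([state_a, state_b], [0.9, 0.1])
print(observed)
```

In this toy setting the minor state contributes only a tenth of the peak height of the major state, which is the kind of signal that is easily lost in noise when a single dataset is analysed on its own.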
|
132 |
Novel resources enabling comparative regulomics in forest tree species / Nya verktyg för komparativ regulomik i skogsträd
Sundell, David January 2017
Lignocellulosic plants are the most abundant source of terrestrial biomass and are one of the potential sources of renewable energy that can replace the use of fossil fuels. For a country such as Sweden, where the forest industry accounts for 10% of total exports, there would be large economic benefits associated with increased biomass yield. Research on wood development in conifer tree species, which represent the majority of forestry in Sweden, is limited, and most research has been conducted in model angiosperm species such as Arabidopsis thaliana. However, the large evolutionary distance between angiosperms and gymnosperms limits the possibility of identifying orthologous genes and regulatory pathways by comparing sequence similarity alone. At such large evolutionary distances, the identification of gene similarity is, in most cases, not sufficient and additional information is required for functional annotation. In this thesis, two high-spatial-resolution datasets profiling wood development were processed: one from the angiosperm tree Populus tremula and the other from the conifer species Picea abies. These datasets were each published together with a web resource including tools for the exploration of gene expression, co-expression and functional enrichment of gene sets. One developed resource allows interactive, comparative co-expression analysis between species to identify conserved and diverged co-expression modules. These tools make it possible to identify conserved regulatory modules that can focus downstream research and provide biologists with a resource to identify regulatory genes for targeted trait improvement. / Lignocellulose is the most abundant source of terrestrial biomass and is one of the renewable energy sources that could potentially replace the use of fossil fuels. For a country such as Sweden, where the forest industry accounts for 10% of total exports, increased biomass production could therefore bring large economic benefits. Research on conifers, which make up the majority of Swedish forest, is limited, and most plant research has been carried out in model organisms belonging to the angiosperms, such as Arabidopsis thaliana. The evolutionary distance between angiosperms (flowers and trees) and gymnosperms (spruce and pine), however, limits the ability to identify regulatory systems shared between these groups. At such large evolutionary distances, identifying a gene in a model organism is not enough; additional information, such as gene expression data, is required. In this thesis, two high-resolution experiments profiling wood development were examined: one from the angiosperm tree Populus tremula and the other from the gymnosperm (conifer) tree Picea abies. The processed data were published together with websites offering several tools for, among other things, displaying gene expression, viewing gene expression correlations, and testing for functional enrichment in a group of genes. One of the resources developed allows interactive comparison of correlations between species in order to identify modules (groups of genes) that have been conserved or have diverged between species over time. Identifying such conserved modules can help focus future research and give biologists a way to identify regulatory genes for targeted improvement of tree traits.
|
133 |
Microbial DNA Sequencing in Environmental Studies
Hu, Yue January 2017
The field of microbial ecology has just entered a new era of rapid technological development and generation of big data. The high-throughput sequencing techniques presently available provide an opportunity to extensively inventory the blueprints of life. Now, millions of microbes of natural microbial communities can be studied simultaneously without prior cultivation. New species and new functions (genes) can be discovered just by mining sequencing data. However, a tremendous number of microorganisms, and the ecosystem functions they carry out, have yet to be examined. Modern genomic technologies can contribute to solving environmental problems and help us understand ecosystems, but to do so most efficiently, methods need to be continuously optimised. During my PhD studies, I developed a method to survey eukaryotic microbial diversity with higher accuracy, and applied various sequencing-based approaches in an attempt to answer questions of importance in environmental research and ecology. In PAPER-I, we developed a set of 18S rRNA gene PCR primers with high taxonomic coverage, meeting the requirements of currently popular sequencing technologies and matching the richness of 18S rRNA reference sequences accumulated so far. In PAPER-II, we conducted the first sequencing-based spatial survey of the combined eukaryotic and bacterial planktonic community in the Baltic Sea to uncover the relationship between microbial diversity and environmental conditions. Here, the 18S primers designed in PAPER-I and a pair of broad-coverage 16S primers were employed to target the rRNA genes of protists and bacterioplankton for amplicon sequencing.
In PAPER-III, we integrated metagenomic, metabarcoding, and metatranscriptomic data in an effort to scrutinise the protein synthesis potential (i.e., activity) of microbes in the sediment at a depth of 460 m in the Baltic Sea and thus to disclose microbial diversity and its possible ecological functions within such an extreme environment. Lastly, in PAPER-IV, we compared the performance of E. coli culturing, high-throughput sequencing, and portable real-time sequencing in tracking wastewater contamination in an urban stormwater system. From the aspects of cost, mobility and accuracy, we evaluated the use of sequencing-based approaches in civil engineering and, for the first time, validated the real-time sequencing device for use in water quality monitoring. In summary, these studies demonstrate how DNA sequencing of microbial communities can be applied in environmental monitoring and ecological research. / Yue Hu was supported by a scholarship from the China Scholarship Council (CSC #201206950024). Yue Hu has been publishing papers under the name "Yue O. O. Hu". QC 20170403
|
134 |
Deciphering the ontogeny of unmutated and mutated subsets of Chronic Lymphocytic Leukemia
Mohamed, Ahmed January 2019
Chronic Lymphocytic Leukemia (CLL) is a type of cancer that affects the B cells of the immune system, causing problems in the process of producing antibodies. It can be sorted into mutated and unmutated CLL based on the percentage of somatic mutations in the Immunoglobulin Heavy chain Variable region (IgHV). The B cells of healthy individuals can be sorted into three groups: CD27dull memory B cells (MBCs), CD27bright MBCs and naïve B cells. The hypothesis for the project was that the unmutated CLL subset originates from CD27dull MBCs and the mutated CLL subset originates from CD27bright MBCs. RNA-sequencing data from healthy individuals were acquired from a collaboration partner in Rome, and data from CLL patients were collected from public datasets available online. Several bioinformatic tools were used to analyze the data. First, the quality of the data files was checked, then adapter sequences from the sequencing process and low-quality bases were removed (trimming). Good quality of the files was confirmed after the trimming. Secondly, these files were mapped against the human reference genome (GRCh38/hg38) for alignment, then the resulting data were used to check for genes that showed differential expression between the different groups. Results were analyzed and visualized using Venn diagrams, Principal Component Analysis (PCA), heatmap plots and random forest. A list of 85 genes was generated based on the different comparisons and was used in one PCA plot that showed clear separation between the different groups. The SWAP70 gene was analyzed for single nucleotide polymorphisms (SNPs). The study identified five genes that could be used as biomarkers for CLL and the diagnosis of its subtypes, some of which have been discussed in previous studies. Also, the mutated CLL subset behaved similarly to the healthy individuals, which could validate the original hypothesis and justify the better disease prognosis for this subtype.
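The trimming step mentioned in this abstract can be sketched in a few lines. This is only an illustration of the idea, not the tool used in the thesis (which the abstract does not name): the function names, the exact-match adapter search, the quality threshold of 20 and the example read are all assumptions made for this sketch.

```python
def remove_adapter(sequence, adapter):
    """Cut the read at the first exact occurrence of the adapter, if any."""
    index = sequence.find(adapter)
    return sequence if index == -1 else sequence[:index]


def trim_low_quality_tail(sequence, qualities, min_quality=20):
    """Drop trailing bases whose Phred quality score is below min_quality."""
    if len(sequence) != len(qualities):
        raise ValueError("sequence and qualities must have the same length")
    end = len(sequence)
    while end > 0 and qualities[end - 1] < min_quality:
        end -= 1
    return sequence[:end], qualities[:end]


# Hypothetical read with an adapter fragment at its 3' end.
read = "ACGTACGTAGATCGG"
read_qualities = [30, 32, 35, 31, 28, 25, 12, 8, 30, 30, 30, 30, 30, 30, 30]
adapter = "AGATCGG"  # made-up adapter sequence for illustration

# First cut off the adapter, then trim the low-quality tail that remains.
without_adapter = remove_adapter(read, adapter)
kept_qualities = read_qualities[: len(without_adapter)]
trimmed, trimmed_qualities = trim_low_quality_tail(without_adapter, kept_qualities)
print(without_adapter, trimmed)
```

Real trimming tools also handle partial and mismatched adapter occurrences and sliding-window quality filters; the sketch keeps only the two simplest cases so the logic stays visible.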
|
135 |
Evaluating the biological relevance of disease consensus modules : An in silico study of IBD pathology using a bioinformatics approach
Ströbaek, Joel January 2019
Inflammatory bowel disease encompasses a variety of heterogeneous chronic inflammatory diseases that affect the gastrointestinal tract, where Crohn’s disease and ulcerative colitis are the principal examples. The etiology of these, and many other complex human diseases, remains largely unknown and therefore poses a relevant target for novel research strategies. One such strategy is the in silico application of network theory derived methods to data sourced from publicly available repositories of e.g. gene expression data. Specifically, graphs of interconnected elements enriched by differentially expressed genes (disease modules) were inferred from data available through the Gene Expression Omnibus. Based on a previous method, the current project aimed to evaluate disease modules, combined from stand-alone inferential methods, in disease consensus modules that represent pathophenotypical motifs for the diseases of interest. The modules found to be significantly enriched by single-nucleotide polymorphisms inferred from genome-wide association studies, as validated using the Pathway Scoring Algorithm, were subsequently subjected to further analysis using Kyoto Encyclopedia of Genes and Genomes pathway enrichment and literature searches. The results of this study adhere to previous findings relating to the employed method, but lack any novelty pertaining to the diseases of interest. However, the results substantiate the preceding method's conclusions by including parameters that increase statistical validity. In addition, the study contributed peripheral results concerning both the methodology of consensus module methods and the elucidation of inflammatory bowel disease etiology and disease subtype differentiation, which pose interesting subjects for future investigation.
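Pathway enrichment of a gene module, as mentioned in this abstract, is often assessed with an over-representation test. The sketch below uses a one-sided hypergeometric test, which is a common choice but an assumption here: the abstract does not state which test the thesis applied, and the function name and the numbers in the example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_pvalue(population, annotated, module_size, overlap):
    """P(X >= overlap) when module_size genes are drawn without replacement.

    population  -- number of genes in the background set
    annotated   -- background genes that belong to the pathway
    module_size -- number of genes in the module
    overlap     -- module genes that belong to the pathway
    """
    max_overlap = min(annotated, module_size)
    total = comb(population, module_size)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, module_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical numbers: 10 background genes, 5 in the pathway,
# a module of 2 genes that both fall in the pathway.
p_value = hypergeom_enrichment_pvalue(10, 5, 2, 2)
print(round(p_value, 4))
```

Because the sum uses exact integer binomial coefficients, the result does not suffer from the rounding problems that factorial-based floating point formulas can show for large gene sets.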
|
136 |
Towards systems pharmacology models of druggable targets and disease mechanisms
Knight-Schrijver, Vincent January 2019
The development of essential medicines is being slowed by a lack of efficiency in drug development, as ninety per cent of drugs fail at some stage during clinical evaluation. This attrition is not caused by a reduction in pharmaceutical research expenditure, nor by a declining understanding of biology; if anything, both are increasing. Instead, drugs are failing because we are unable to effectively predict how they will work before they are given to patients. This is due to limitations of the current methods used to evaluate a drug's toxicity and efficacy prior to its development. Quite simply, these methods do not account for the full complexity of biology in humans. Systems pharmacology models are a likely candidate for increasing the efficiency of drug discovery as they seek to comprehensively model the fundamental biology of disease mechanisms in a quantitative manner. They are computational models, designed and hailed as a strategy for making well-informed and cost-effective decisions on drug viability and target druggability, and therefore attempt to reduce this time-consuming and costly attrition. Using text mining and text classification, I present a growing landscape of systems pharmacology models in the literature, growing from humble roots because of step-wise increases in our understanding of biology. Furthermore, I develop a case for the capability of systems pharmacology models in making predictions by constructing a model of interleukin-6 signalling for rheumatoid arthritis. This model shows that druggable target selection is not necessarily an intuitive task, as it results in an emergent but unanswered hypothesis for safety concerns in a monoclonal antibody. Finally, I show that predictive classification models can also be used to explore gene expression data in a novel workflow by attempting to predict patient response classes to an influenza vaccine.
|
137 |
Characterisation of the tumour microenvironment in ovarian cancer
Jiménez Sánchez, Alejandro January 2019
The tumour microenvironment comprises the non-cancerous cells present in the tumour mass (fibroblasts, endothelial, and immune cells), as well as signalling molecules and extracellular matrix. Tumour growth, invasion, metastasis, and response to therapy are influenced by the tumour microenvironment. Therefore, characterising the cellular and molecular components of the tumour microenvironment, and understanding how they influence tumour progression, represent a crucial aim for the success of cancer therapies. High-grade serous ovarian cancer provides an excellent opportunity to systematically study the tumour microenvironment due to its clinical presentation of advanced disseminated disease and debulking surgery being standard of care. This thesis first presents a case report of a long-term survivor (>10 years) of metastatic high-grade serous ovarian cancer who exhibited concomitant regression/progression of the metastatic lesions (5 samples). We found that progressing metastases were characterized by immune cell exclusion, whereas regressing metastases were infiltrated by CD8+ and CD4+ T cells. Through a T cell - neoepitope challenge assay we demonstrated that predicted neoepitopes were recognised by the CD8+ T cells obtained from blood drawn from the patient, suggesting that regressing tumours were subjected to immune attack. Immune excluded tumours presented a higher expression of immunosuppressive Wnt signalling, while infiltrated tumours showed a higher expression of the T cell chemoattractant CXCL9 and evidence of immunoediting. These findings suggest that multiple distinct tumour immune microenvironments can co-exist within a single individual and may explain in part the heterogeneous fates of metastatic lesions often observed in the clinic post-therapy.
Second, this thesis explores the prevalence of intra-patient tumour microenvironment heterogeneity in high-grade serous ovarian cancer at diagnosis (38 samples from 8 patients), as well as the effect of chemotherapy on the tumour microenvironment (80 paired samples from 40 patients). Whole transcriptome analysis and image-based quantification of T cells from treatment-naive tumours revealed highly prevalent variability in immune signalling and distinct immune microenvironments co-existing within the same individuals at diagnosis. ConsensusTME, a method that generates consensus immune and stromal cell gene signatures by intersecting state-of-the-art deconvolution methods that predict immune cell populations using bulk RNA data was developed. ConsensusTME improved accuracy and sensitivity of T cell and leukocyte deconvolutions in ovarian cancer samples. As previously observed in the case report, Wnt signalling expression positively correlated with immune cell exclusion. To evaluate the effect of chemotherapy on the tumour microenvironment, we compared site-matched and site-unmatched tumours before and after neoadjuvant chemotherapy. Site-matched samples showed increased cytotoxic immune activation and oligoclonal expansion of T cells after chemotherapy, unlike site-unmatched samples where heterogeneity could not be accounted for. In addition, low levels of immune activation pre-chemotherapy were found to be correlated with immune activation upon chemotherapy treatment. These results corroborate that the tumour-immune interface in advanced high-grade serous ovarian cancer is intrinsically heterogeneous, and that chemotherapy induces an immunogenic effect mediated by cytotoxic cells. Finally, the different deconvolution methods were benchmarked along with ConsensusTME in a pan-cancer setting by comparing deconvolution scores to DNA-based purity scores, leukocyte methylation data, and tumour infiltrating lymphocyte counts from image analysis.
In so far as it has been benchmarked, unlike the other methods, ConsensusTME performs consistently among the top three methods across cancer-related benchmarks. Additionally, ConsensusTME provides a dynamic and evolvable framework that can integrate newer deconvolution tools and benchmark their performance against itself, thus generating an ever updated version. Overall, this thesis presents a systematic characterisation of the tumour microenvironment of high grade serous ovarian cancer in treatment-naive and chemotherapy treated samples, and puts forward the development of an integrative computational method for the systematic analysis of the tumour microenvironment of different tumour types using bulk RNA data.
|
138 |
Identification of personalized multi-omic disease modules in asthma
Martínez Enguita, David January 2018
Asthma is a respiratory syndrome associated with airflow limitation, bronchial hyperresponsiveness and inflammation of the airways in the lungs. Despite the ongoing research efforts, the outstanding heterogeneity displayed by the multiple forms in which this condition presents often hampers the attempts to determine and classify the phenotypic and endotypic biological structures at play, even when considering a limited assembly of asthmatic subjects. To increase our understanding of the molecular mechanisms and functional pathways that govern asthma from a systems medicine perspective, a computational workflow focused on the identification of personalized transcriptomic modules from the U-BIOPRED study cohorts, by the use of the novel MODifieR integrated R package, was designed and applied. A feature selection of candidate asthma biomarkers was implemented, accompanied by the detection of differentially expressed genes across sample categories, the production of patient-specific gene modules and the subsequent construction of a set of core disease modules of asthma, which were validated with genomic data and analyzed for pathway and disease enrichment. The results indicate that the approach utilized is able to reveal the presence of components and signaling routes known to be crucially involved in asthma pathogenesis, while simultaneously uncovering candidate genes closely linked to the latter. The present project establishes a valuable pipeline for the module-driven study of asthma and other related conditions, which can provide new potential targets for therapeutic intervention and contribute to the development of individualized treatment strategies.
|
139 |
Proteus : A new predictor for protean segments
Söderquist, Fredrik January 2015
The discovery of intrinsically disordered proteins has led to a paradigm shift in protein science. Many disordered proteins have regions that can transform from a disordered state to an ordered. Those regions are called protean segments. Many intrinsically disordered proteins are involved in diseases, including Alzheimer's disease, Parkinson's disease and Down's syndrome, which makes them prime targets for medical research. As protean segments often are the functional part of the proteins, it is of great importance to identify those regions. This report presents Proteus, a new predictor for protean segments. The predictor uses Random Forest (a decision tree ensemble classifier) and is trained on features derived from amino acid sequence and conservation data. Proteus compares favourably to state of the art predictors and performs better than the competition on all four metrics: precision, recall, F1 and MCC. The report also looks at the differences between protean and non-protean regions and how they differ between the two datasets that were used to train the predictor.
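The four metrics named in this abstract all follow from the counts in a binary confusion matrix. The sketch below shows the standard definitions; the function name and the example counts are hypothetical and are not results from the thesis.

```python
from math import sqrt


def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, F1 and Matthews correlation coefficient (MCC)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally reported as 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "mcc": mcc}


# Made-up counts: residues predicted as protean vs. annotated as protean.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

MCC is useful alongside F1 for this kind of problem because it also takes true negatives into account, which matters when protean residues are a small minority of all residues.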
|
140 |
Model-Based Hypothesis Testing in Biomedicine : How Systems Biology Can Drive the Growth of Scientific Knowledge
Johansson, Rikard January 2017
The utilization of mathematical tools within biology and medicine has traditionally been less widespread compared to other hard sciences, such as physics and chemistry. However, an increased need for tools such as data processing, bioinformatics, statistics, and mathematical modeling has emerged due to advancements during the last decades. These advancements are partly due to the development of high-throughput experimental procedures and techniques, which produce ever increasing amounts of data. For all aspects of biology and medicine, these data reveal a high level of inter-connectivity between components, which operate on many levels of control, and with multiple feedbacks both between and within each level of control. However, the availability of these large-scale data is not synonymous with a detailed mechanistic understanding of the underlying system. Rather, a mechanistic understanding is gained only when we construct a hypothesis and test its predictions experimentally. Identifying interesting predictions that are quantitative in nature generally requires mathematical modeling. This, in turn, requires that the studied system can be formulated into a mathematical model, such as a series of ordinary differential equations, where different hypotheses can be expressed as precise mathematical expressions that influence the output of the model. Within specific sub-domains of biology, the utilization of mathematical models has had a long tradition, such as the modeling done on electrophysiology by Hodgkin and Huxley in the 1950s. However, it is only in recent years, with the arrival of the field known as systems biology, that mathematical modeling has become more commonplace.
The somewhat slow adoption of mathematical modeling in biology is partly due to historical differences in training and terminology, as well as to a lack of awareness of showcases illustrating how modeling can make a difference, or even be required, for a correct analysis of the experimental data. In this work, I provide such showcases by demonstrating the universality and applicability of mathematical modeling and hypothesis testing in three disparate biological systems. In Paper II, we demonstrate how mathematical modeling is necessary for the correct interpretation and analysis of dominant negative inhibition data in insulin signaling in primary human adipocytes. In Paper III, we use modeling to determine transport rates across the nuclear membrane in yeast cells, and we show how this technique is superior to traditional curve-fitting methods. We also demonstrate the issue of population heterogeneity and the need to account for individual differences between cells and the population at large. In Paper IV, we use mathematical modeling to reject three hypotheses concerning the phenomenon of facilitation in pyramidal nerve cells in rats and mice. We also show how one surviving hypothesis can explain all data and adequately describe independent validation data. Finally, in Paper I, we develop a method for model selection and discrimination using parametric bootstrapping and the combination of several different empirical distributions of traditional statistical tests. We show how the empirical log-likelihood ratio test is the best combination of two tests and how this can be used, not only for model selection, but also for model discrimination. In conclusion, mathematical modeling is a valuable tool for analyzing data and testing biological hypotheses, regardless of the underlying biological system.
Further development of modeling methods and applications is therefore important since these will in all likelihood play a crucial role in all future aspects of biology and medicine, especially in dealing with the burden of increasing amounts of data that are made available with new experimental techniques. / The use of mathematical tools in biology and medicine has traditionally been less widespread than in other natural sciences, such as physics and chemistry. An increased need for tools such as data processing, bioinformatics, statistics, and mathematical modeling has emerged thanks to advances during recent decades. These advances are partly a result of the development of large-scale data collection techniques. In all areas of biology and medicine, these data have revealed a high level of interconnectivity between components, which operate on many levels of control and with multiple feedbacks both between and within each level of control. Access to large-scale data is, however, not synonymous with a detailed mechanistic understanding of the underlying system. Rather, a mechanistic understanding is reached only when we build a hypothesis whose predictions we can test experimentally. Identifying interesting predictions that are quantitative in nature generally requires mathematical modeling. This in turn requires that the studied system can be formulated as a mathematical model, such as a series of ordinary differential equations, in which different hypotheses can be expressed as precise mathematical expressions that influence the model's output. Within certain sub-areas of biology, the use of mathematical models has a long tradition, such as the modeling done in electrophysiology by Hodgkin and Huxley in the 1950s. It is, however, only in recent years, with the arrival of the field of systems biology, that mathematical modeling has become commonplace. The somewhat slow adoption of mathematical modeling in biology is due, among other things, to historical differences in training and terminology, and to a lack of awareness of examples that illustrate how modeling can make a difference and is in fact often required for a correct analysis of experimental data. In this work I provide such examples and demonstrate the generality and applicability of mathematical modeling and hypothesis testing in three different biological systems. In Paper II, we show how mathematical modeling is necessary for a correct interpretation and analysis of dominant negative inhibition data in insulin signaling in primary human adipocytes. In Paper III, we use modeling to determine transport rates across the nuclear membrane in yeast cells, and we show how this technique is superior to traditional curve-fitting methods. We also demonstrate the issue of population heterogeneity and the need to account for individual differences between cells and the population as a whole. In Paper IV, we use mathematical modeling to reject three hypotheses about how the phenomenon of facilitation arises in pyramidal nerve cells in rats and mice. We also show how one surviving hypothesis can describe all data, including independent validation data. Finally, in Paper I, we develop a method for model selection and model discrimination using parametric bootstrapping and the combination of different empirical distributions of traditional statistical tests. We show how the empirical log-likelihood ratio test is the best combination of two tests and how the test is applicable not only to model selection but also to model discrimination. In summary, mathematical modeling is a valuable tool for analyzing data and testing biological hypotheses, regardless of the underlying biological system. Further development of modeling methods and applications is therefore important, since these will probably play a crucial role in the future of biology and medicine, especially in handling the burden of growing amounts of data made available by new experimental techniques.
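The idea of an empirical log-likelihood ratio test built by parametric bootstrapping, which this abstract describes for Paper I, can be sketched on a deliberately simple problem. The example below is not the method from the thesis: it assumes Gaussian data with known unit variance, a null model with mean fixed at zero and an alternative with a free mean, and all function names and data values are hypothetical.

```python
import random


def log_likelihood(data, mean, sigma=1.0):
    """Gaussian log-likelihood, dropping a constant that cancels in ratios."""
    return -sum((x - mean) ** 2 for x in data) / (2 * sigma ** 2)


def llr_statistic(data):
    """Twice the log-likelihood gain of the free-mean model over mean = 0."""
    mean_hat = sum(data) / len(data)  # maximum likelihood estimate of the mean
    return 2.0 * (log_likelihood(data, mean_hat) - log_likelihood(data, 0.0))


def bootstrap_p_value(data, n_boot=2000, seed=0):
    """Empirical p-value from datasets simulated under the null model."""
    rng = random.Random(seed)
    observed = llr_statistic(data)
    exceed = 0
    for _ in range(n_boot):
        simulated = [rng.gauss(0.0, 1.0) for _ in data]
        if llr_statistic(simulated) >= observed:
            exceed += 1
    # The +1 terms keep the estimate away from an impossible p-value of 0.
    return (exceed + 1) / (n_boot + 1)


# Made-up measurements whose mean is clearly away from zero.
shifted = [2.1, 1.8, 2.4, 1.9, 2.2, 2.0, 1.7, 2.3, 2.1, 1.9]
print(bootstrap_p_value(shifted))
```

The point of the bootstrap is that the reference distribution of the statistic is generated from the fitted null model itself, so the test does not rely on the asymptotic chi-squared approximation, which can be poor for small datasets or nonlinear models.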
|