81 |
Neural networks for imputation of missing genotype data : An alternative to the classical statistical methods in bioinformatics
Andersson, Alfred, January 2020
In this project, two machine learning models were tested in an attempt to impute missing genotype data from patients on two different panels. As the privacy of the patients had to be protected, initial training was done on data simulated from the 1000 Genomes Project. The first model consisted of two convolutional variational autoencoders, and the latent representations of the networks were shuffled to force the networks to find the same patterns in the two datasets. This model was unfortunately unsuccessful at imputing the missing data. The second model was based on a UNet structure and was more successful at the task of imputation. This model had one encoder for each dataset, making each encoder specialized at finding patterns in its own data. Further improvements are required for the model to be fully capable of imputing the missing data.
|
82 |
Streamlining user processes for a general data repository for life science in accordance with the FAIR principles
Asklöf, Anna, January 2021
With the increasing amounts of data generated in life science, methods for data storage and sharing are being developed and implemented. Online data repositories are increasingly used for data sharing. The national Swedish platform Science for Life Laboratory (SciLifeLab) has decided to use an institutional data repository as a means to address the increasing amounts of data generated at the platform. In this project, the system used for the institutional repository at SciLifeLab was studied and compared to implementations of the same system at other institutions to create user documentation for the repository. This documentation was created with the FAIR principles as guidance. Feedback on the guidelines was then sought from users and, based on the received feedback, the user documentation was improved. Using a FAIR evaluation tool called FAIR evaluation services, items published on the repository were evaluated. These results and their correlation to the items' records on the repository were then investigated. Out of ten evaluated datasets, all except one scored exactly the same on the FAIR evaluation services tests. This could indicate that the test used does not evaluate the aspects needed to capture the differences between these published items. Based on this, conclusions as to what extent user documentation can increase the FAIRness of data cannot be drawn.
|
83 |
Predicting safe drug combinations with Graph Neural Networks (GNN)
Amanzadi, Amirhossein, January 2021
Many people, especially in old age, consume multiple drugs for the treatment of complex or co-existing diseases. Identifying side effects caused by polypharmacy is crucial for reducing mortality and morbidity among patients, which will lead to improvement in their quality of life. Since the space of possible drug combinations is immense, it is infeasible to examine all of them in the lab. In silico models can offer a convenient solution; however, due to the lack of a sufficient amount of homogeneous data, it is difficult to develop models that are both reliable and scalable in their ability to accurately predict polypharmacy side effects. Recent advances in the field of representation learning have utilized the power of graph networks to harmonize information from heterogeneous biological databases and interactomes. This thesis takes advantage of those techniques and combines them with state-of-the-art Graph Neural Network algorithms to implement a deep learning pipeline capable of predicting the adverse drug reactions of any given paired drug combination.
|
84 |
Evolutionary evidence of chromosomal rearrangements through SNAP : Selection during Niche AdaPtation
Mota Merlo, Marina, January 2021
The Selection during Niche AdaPtation (SNAP) hypothesis aims to explain how the gene order in bacterial chromosomes can change as the result of bacteria adapting to a new environment. It starts with a duplication of a chromosomal segment that includes some genes providing a fitness advantage. The duplication of these genes is preserved by positive selection. However, the rest of the duplicated segment accumulates mutations, including deletions. This results in a rearranged gene order. In this work, we develop a method to identify SNAP in bacterial chromosomes. The method was tested in Salmonella and Bartonella genomes. First, each gene was assigned an orthologous group (OG). For each genus, single-copy panorthologs (SCPos), the OGs that were present in most of the genomes as one copy, were targeted. If these SCPos were present twice or more in a genome, they were used to build duplicated regions within said genome. The resulting regions were visualized and their possible compatibility with the SNAP hypothesis was discussed. Even though the method proved to be effective on Bartonella genomes, it was less efficient on Salmonella. In addition, no strong evidence of SNAP was detected in Salmonella genomes.
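The SCPo selection step described in this abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the `min_fraction` cutoff used to interpret "most of the genomes", and the toy genomes are all assumptions made for illustration.

```python
from collections import Counter

def find_scpos(genomes, min_fraction=0.8):
    """Return OGs present as exactly one copy in at least min_fraction of genomes.

    `genomes` maps a genome name to the list of OG ids of its genes.
    The 0.8 cutoff is an assumed reading of "most of the genomes".
    """
    single_copy = Counter()
    for ogs in genomes.values():
        for og, n in Counter(ogs).items():
            if n == 1:
                single_copy[og] += 1
    return {og for og, n in single_copy.items()
            if n / len(genomes) >= min_fraction}

def duplicated_scpos(genomes, scpos):
    """For each genome, list the SCPos that occur two or more times in it."""
    result = {}
    for name, ogs in genomes.items():
        counts = Counter(ogs)
        dups = sorted(og for og in scpos if counts[og] >= 2)
        if dups:
            result[name] = dups
    return result

# Hypothetical toy data: OG2 is single-copy in four of five genomes
# and duplicated in g5.
genomes = {
    "g1": ["OG1", "OG2", "OG3"],
    "g2": ["OG1", "OG2", "OG3"],
    "g3": ["OG1", "OG2", "OG3"],
    "g4": ["OG1", "OG2", "OG3"],
    "g5": ["OG1", "OG2", "OG2", "OG3", "OG4"],
}
scpos = find_scpos(genomes)
dups = duplicated_scpos(genomes, scpos)
```

In this toy case the duplicated SCPo in `g5` would be the seed for building a duplicated region; how regions are extended around such seeds is not specified in the abstract and is left out here.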
|
85 |
Characterization of the Recombination Landscape in Red-Breasted and Taiga Flycatchers
Vilhelmsson Sinclair, Bella, January 2019
Between closely related species there are genomic regions with a higher level of differentiation compared to the rest of the genome. For a time it was believed that these regions harbored loci important for speciation, but it has now been shown that these patterns can arise from other mechanisms, like recombination. The aim of this project was to estimate the recombination landscape for red-breasted flycatcher (Ficedula parva) and taiga flycatcher (F. albicilla) using patterns of linkage disequilibrium. For the analysis, 15 red-breasted and 65 taiga individuals were used. Scaffolds on autosomes were phased using fastPHASE and the population recombination rate was estimated using LDhelmet. To investigate the accuracy of the phasing, two re-phasings were done for one scaffold. The correlation between the re-phasings was weak at the fine scale, and strong between means in 200 kb windows. 2,176 recombination hotspots were detected in red-breasted flycatcher and 2,187 in taiga flycatcher. Of those, 175 hotspots were shared, more than what was expected by chance if the species were completely independent (31 hotspots). Both species showed a small increase in the rate at hotspots unique to the other species. The low number of shared hotspots might indicate that the recombination landscape is less conserved between red-breasted and taiga flycatchers than found between collared and pied flycatchers. However, the investigation of the phasing step indicates that the fine-scale estimation, on which hotspots are found, might not be reliable. For future analysis, it is important to use high-quality data and carefully choose methods.
|
86 |
Dual RNA-seq analysis of host-pathogen interaction in Eimeria infection of chickens
Sigurðarson Sandholt, Arnar Kári, January 2020
Eimeria tenella is a eukaryotic, intracellular parasite that, along with six other Eimeria species, causes coccidiosis in chickens. This disease can result in weight loss or even death and is estimated to cause 2 billion euros in damages to the chicken industry each year. While much is known of the life cycle of E. tenella in the chicken, less is known about molecular mechanisms of infection and the chicken immune response. In this study, we produced a pipeline for dual RNA-sequencing analysis of a mixed chicken and E. tenella dataset. We then carried out an analysis on an in vitro infection of the chicken macrophage HD-11 cell line. This was followed by a differential expression analysis across six time points, 2, 4, 12, 24, 48, and 72 hours post-infection, in order to elucidate these mechanisms. The results showed clear patterns of expression for the chicken immune genes, with strong down-regulation of genes across the immune system at 24 hours and a repetition of early patterns at 72 hours, indicating that reinfection by a second generation of parasite cells was occurring. Several genes that may have important roles in the immune reaction of the chicken were identified, such as MRC2, ITGB3 and ITGA9, along with genes with known roles, such as TLR15. The expression of surface antigen genes in E. tenella was also examined, showing a clear upregulation in the late stages of merogony, suggesting important roles for merozoites. Finally, a co-expression analysis was carried out, showing considerable co-expression between the two organisms. One of the gene co-expression networks identified appeared to be enriched with both infection-specific genes from E. tenella and chicken immune genes. These results, along with the pipeline, will be used in further studies on E. tenella infections and bring us closer to the eventual goal of a vaccine for coccidiosis.
|
87 |
Causality in Coexpression
Barros, Carolina, January 2020
One of the main goals of genetics has been to understand the link between genotype and phenotype. Using yeast (Saccharomyces cerevisiae) as our model organism, we take a closer look at the connection between genetic variation and gene expression to learn more about the mechanisms of gene regulation. We propose an algorithm based on ANOVA to detect causal relationships between coexpressed genes. We first identify expression quantitative trait loci (eQTLs) with strong effects on gene expression. The algorithm then uses these eQTLs with strong effects and the expression of all genes to identify how genes are affecting each other. This is done by analysing coexpressed gene pairs where both genes have an eQTL and finding if the eQTL of one gene affects the expression of the other. Genes that were found to affect the expression of other genes were named “causal genes”. We evaluate our method by comparing its results with known causal genes and conclude that it is a good predictor of known interactions. Using this algorithm, we found 741 genes having causal effects on gene expression, many of which affected the gene expression of many other genes across the genome (2278 total affected genes). Some of the causal genes clustered at six hotspot regions in the genome. Genes in hotspot regions were found to have lower heritability than genes outside these regions. We hypothesize that hotspot regions may be enriched for essential and/or fitness related genes.
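The core test described in this abstract, whether the eQTL of one gene affects the expression of a coexpressed partner, can be sketched as a one-way ANOVA of the partner's expression grouped by genotype at that eQTL. The following Python sketch is an illustration only, not the thesis's algorithm: the function names, the 0/1 allele coding, and the toy values are assumptions, and turning the F statistic into a p-value or significance call is left out.

```python
def split_by_genotype(genotypes, expression):
    """Group expression values of one gene by the allele carried at an eQTL."""
    groups = {}
    for allele, value in zip(genotypes, expression):
        groups.setdefault(allele, []).append(value)
    return list(groups.values())

def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups of expression values."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    # Variation explained by genotype group versus variation within groups.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical toy data: genotype of six segregants at gene A's eQTL,
# and the expression of the coexpressed gene B in the same segregants.
eqtl_a_genotypes = [0, 0, 0, 1, 1, 1]
expression_b = [1.0, 2.0, 3.0, 5.0, 6.0, 7.0]

f_stat = anova_f(split_by_genotype(eqtl_a_genotypes, expression_b))
```

A large F statistic here would suggest that the eQTL of gene A is associated with the expression of gene B; running the same test in the other direction is one way to ask which gene of the pair is the candidate "causal gene".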
|
88 |
Voxel-wise Longitudinal Analysis of Weight Gain from Different Dietary Fats using Image Registration-Based "Imiomics" Analysis
Andersson, Vendela, January 2022
There is an emerging global epidemic of obesity and related complications, such as type 2 diabetes (T2D). Alterations in body composition (adipose tissue, muscle volume and fat contents) are known to be associated with an increased metabolic risk. Understanding of the underlying mechanisms is key for development of novel intervention strategies. One study investigating the effect on body composition by different diets is Lipogain1. In this study, it was found that a small weight gain induced by polyunsaturated fats (PUFA, n=19) or saturated fats (SFA, n=20) had very different effects on body fat, liver fat and lean tissue mass respectively. The SFA group gained more liver fat and fat mass in general, while the PUFA group gained more muscle mass. These results were determined by magnetic resonance imaging. The goal of this project was to visualize the results from Lipogain1 by utilizing the novel technique Imiomics. Imiomics is a method for statistical analysis of whole-body medical images. By utilizing image registration, all images are transformed to a common reference space. This enables point-wise comparisons between all images included in the analysis. In this project, mean images of the alterations in fat content and local volume change of the two groups were created. These were used to visualize the alterations in body composition from the study. Additionally, statistical tests were used to visualize statistically significant differences between the groups. Differences between the groups could be seen in the mean images. Mainly a higher fat content increase was seen in SFA in comparison to PUFA. There was also a larger volume expansion in fat tissue in SFA than in PUFA, while PUFA instead had a larger volume expansion in muscles. An unexpected result was also found; the liver had expanded in PUFA but not in SFA. Unfortunately, few significant differences could be visualized between the groups when the statistical test was performed.
The conclusion was that this method is promising for visualization of these kinds of studies, especially due to the potential of finding new, unexpected results. However, a somewhat larger cohort and possibly larger alterations in body composition might be needed to be able to visualize and quantify statistically significant differences between the groups on a voxel-wise level.
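Once all images share a common reference space, the point-wise comparison described in this abstract reduces to per-voxel arithmetic. The Python sketch below illustrates that idea under simplifying assumptions: each registered image is a flat list of voxel values, the function names and toy values are hypothetical, and the registration step and the statistical tests are not shown.

```python
def mean_image(images):
    """Voxel-wise mean over a group of registered images of equal length."""
    n = len(images)
    return [sum(voxels) / n for voxels in zip(*images)]

def difference_image(group_a, group_b):
    """Voxel-wise difference between the mean images of two groups."""
    mean_a = mean_image(group_a)
    mean_b = mean_image(group_b)
    return [a - b for a, b in zip(mean_a, mean_b)]

# Hypothetical toy data: per-subject change in fat content at three voxels.
sfa_change = [[2.0, 0.0, 1.0], [4.0, 0.0, 3.0]]
pufa_change = [[1.0, 1.0, 1.0], [1.0, 3.0, 1.0]]

diff = difference_image(sfa_change, pufa_change)
```

Positive values in `diff` mark voxels where the toy SFA group changed more than the toy PUFA group, and negative values the opposite.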
|
89 |
Classification of Neuronal Subtypes in the Striatum and the Effect of Neuronal Heterogeneity on the Activity Dynamics / Klassificering av neuronala subtyper i striatum och effekten av neuronal heterogenitet på aktivitetsdynamiken
Bekkouche, Bo, January 2016
Clustering of single-cell RNA sequencing data is often used to show what states and subtypes cells have. Using this technique, striatal cells were clustered into subtypes using different clustering algorithms. Previously known subtypes were confirmed and new subtypes were found. One of them is a third medium spiny neuron subtype. As a second task, this project uses the observed heterogeneity to ask whether differences in individual neurons have an impact on the network dynamics. Clustering spiking activity from a neural network model gave inconclusive results. Both algorithms indicated low heterogeneity, but when the quantity of a subtype was altered between a low and a high number and the network activity was clustered in each case, the results indicated an increase in heterogeneity. This project presents a list of potential striatal subtypes and gives reasons to keep paying attention to biologically observed heterogeneity.
|
90 |
Statistical Analysis of PAR-CLIP data
Golumbeanu, Monica, January 2013
From its creation to its degradation, the RNA molecule is acted upon by many binding proteins with different roles in regulation and RNA metabolism. Since these proteins are involved in a large number of processes, a variety of diseases are related to abnormalities occurring within the binding mechanisms. One of the experimental methods for detecting the binding sites of these proteins is PAR-CLIP, which is built on next-generation sequencing technology. Due to its size and intrinsic noise, PAR-CLIP data requires appropriate pre-processing and thorough statistical analysis. The present work has two main goals. First, to develop a modular pipeline for pre-processing PAR-CLIP data and extracting the signals necessary for further analysis. Second, to devise a novel statistical model in order to carry out inference about the presence of protein binding sites based on the signals extracted in the pre-processing step.
|