Global ETD Search

151	Hybrid Variational Autoencoder for Clustering of Single-Cell RNA-seq Data : Introducing HybridVI, a Variational Autoencoder with two Latent Spaces / Hybrid Variational autoencoder för analys av enkelcells RNA-sekvensering data Narrowe Danielsson, Sarah January 2022 Single-cell analysis means analyzing cells at the individual level. Analysis at this level improves both the investigation of heterogeneity among cells and the classification of individual cells. Single-cell analysis is a broad term and can include various measurements. This thesis uses single-cell RNA sequence data, which measures RNA sequences representing genes for individual cells. This data is often high-dimensional, with tens of thousands of RNA sequences measured for each cell. Dimension reduction is therefore necessary when analyzing the data. One proposed dimension reduction method is the unsupervised machine learning method of variational autoencoders. The scVI framework has previously implemented a variational autoencoder for analyzing single-cell RNA sequence data. The scVI variational autoencoder has one latent space with a Gaussian distribution. Several extensions have been made to the scVI framework since its creation. This thesis proposes an additional extension consisting of a variational autoencoder with two latent spaces, called hybridVI. One of these latent spaces has a Gaussian distribution and the other a von Mises-Fisher distribution. The data is split between these two latent spaces, meaning that some of the genes go through one latent space and the rest go through the other. In this thesis, the cell cycle genes go through the von Mises-Fisher latent space and the rest of the genes go through the Gaussian latent space. The motivation behind the von Mises-Fisher latent space is that cell cycle genes are believed to follow a circular distribution. 
Putting these genes through a von Mises-Fisher latent space instead of a Gaussian latent space could provide additional insights into the data. The main focus of this thesis was to analyze the impact of this separation. The analysis consisted of comparing the performance of the hybridVI model with that of the original scVI variational autoencoder. The comparison used three annotated datasets: one peripheral blood mononuclear cell dataset, one cortex cell dataset, and one B cell dataset collected by the Henriksson lab at Umeå University. The evaluation metrics were the adjusted Rand index and normalized mutual information, and a Wilcoxon signed-rank test was used to determine whether the results were statistically significant. The results indicate that the size of the dataset was essential for achieving robust and statistically significant results. For the two datasets that yielded statistically significant results, the scVI model performed better than the hybridVI model. However, more research analyzing biological aspects is necessary to determine the hybridVI model’s effect on the biological interpretation of the results. / Single-cell analysis is a relatively new method that enables the study of cells at the individual level. This thesis analyzes RNA sequence data in which RNA sequences are specified for individual cells. This kind of data is often high-dimensional, with several thousand genes recorded for each cell. Analyzing such data requires some form of dimension reduction. One proposed method is the unsupervised machine learning method of variational autoencoders. A framework, scVI, has developed a variational autoencoder designed to handle this kind of data. This model has only one latent space, with a normal distribution. This thesis proposes an extension of this framework: a variational autoencoder with two latent spaces, one normally distributed and the other following a von Mises-Fisher distribution. 
The motivation for such a distribution is that cell cycle genes are assumed to follow a circular distribution. The cell cycle genes in the data can therefore be handled by the circular latent space. The main focus of this study is to investigate whether this separation of genes can improve the model's ability to find correct clusters. The experiment was carried out on three annotated datasets: one consisting of peripheral blood mononuclear cells, one consisting of cortex cells, and one consisting of B cells collected by the Henriksson group at Umeå University. The model from the scVI framework was compared with the new method with two latent spaces, hybridVI. The measures used to assess the models were the adjusted Rand index and normalized mutual information, and a Wilcoxon signed-rank test was used to assess the statistical significance of the results. The results show that both models perform better and more consistently on larger datasets. Two datasets gave statistically significant results and showed that the scVI model performed better than the hybrid model. However, a biological analysis of the results is needed to investigate which model's results have the most biological relevance. Bioinformatics scRNAseq Variational Autoencoder Single-Cell Analysis Bioinformatik scRNAseq Variational Autoencoder individuell cellanalys Bioinformatics (Computational Biology) Bioinformatik (beräkningsbiologi) Computer and Information Sciences Data- och informationsvetenskap
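The adjusted Rand index named in the abstract above compares a predicted clustering with annotated labels while correcting for chance agreement. The following is a minimal pure-Python sketch of that metric, not the thesis's evaluation code; the function name and the toy labels are assumptions made for illustration. In practice, library implementations such as scikit-learn's `adjusted_rand_score` and SciPy's `wilcoxon` would be the usual choice.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two flat clusterings (equal-length label lists)."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")
    n = len(labels_true)
    if n < 2:
        raise ValueError("at least two items are needed to form a pair")

    # Count item pairs that fall together in each contingency cell,
    # in each annotated class, and in each predicted cluster.
    cells = Counter(zip(labels_true, labels_pred))
    pairs_in_cells = sum(comb(c, 2) for c in cells.values())
    pairs_in_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    pairs_in_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = pairs_in_true * pairs_in_pred / comb(n, 2)  # chance agreement
    maximum = (pairs_in_true + pairs_in_pred) / 2
    if maximum == expected:
        # Degenerate case, e.g. both clusterings put everything in one cluster.
        return 1.0
    return (pairs_in_cells - expected) / (maximum - expected)


# Hypothetical toy labels: cluster names differ, but the partition is identical.
annotated = ["B", "B", "T", "T", "NK", "NK"]
predicted = [2, 2, 0, 0, 1, 1]
print(adjusted_rand_index(annotated, predicted))
```

Because the index depends only on which items are grouped together, renaming the clusters does not change the score.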
152	Deep learning prediction of Quantmap clusters Parakkal Sreenivasan, Akshai January 2021 The hypothesis that similar chemicals exert similar biological activities has been widely adopted in the field of drug discovery and development. Quantitative Structure-Activity Relationship (QSAR) models have been used ubiquitously in drug discovery to understand the function of chemicals in biological systems. A common QSAR modeling method calculates similarity scores between chemicals to assess their biological function. However, because some chemicals can be structurally similar and yet have different biological activities, or conversely can be structurally different yet have similar biological functions, various methods have instead been developed to quantify chemical similarity at the functional level. Quantmap is one such method, which utilizes biological databases to quantify the biological similarity between chemicals. Quantmap uses quantitative molecular network topology analysis to cluster chemical substances based on their bioactivities. By itself, unfortunately, this method cannot assign new chemicals (those which may not yet have biological data) to the derived clusters. Because biological data are lacking for many chemicals, this project explored deep learning models with respect to their ability to correctly assign unknown chemicals to Quantmap clusters. The deep learning methods explored included both convolutional and recurrent neural networks. Transfer learning/pretraining-based approaches and data augmentation methods were also investigated. The best-performing model among those considered was the Seq2seq model (a recurrent neural network containing two joint networks, a perceiver and an interpreter network) without pretraining but with data augmentation. 
Deep Learning Machine Learning Deep Neural Network Convolutional Neural Network Recurrent Neural Network Drug classification Drug-biological function Pharmaceutical Biotechnology Läkemedelsbioteknik Bioinformatics (Computational Biology) Bioinformatik (beräkningsbiologi)
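The abstract above mentions QSAR methods that score the structural similarity between chemicals. One widely used score of this kind is the Tanimoto (Jaccard) coefficient on substructure fingerprints; the abstract does not say which score was meant, so the choice here is an assumption. The sketch below represents each fingerprint as a set of hypothetical feature indices rather than using a real cheminformatics toolkit.

```python
def tanimoto(fingerprint_a, fingerprint_b):
    """Tanimoto coefficient between two fingerprints given as sets of feature indices."""
    a, b = set(fingerprint_a), set(fingerprint_b)
    union = a | b
    if not union:
        # Convention chosen for this sketch: two empty fingerprints score 0.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical fingerprints: indices of the substructure features present.
compound_x = {1, 2, 3, 4}
compound_y = {3, 4, 5, 6}
print(tanimoto(compound_x, compound_y))  # 2 shared features out of 6 in total
```

A score like this captures structural overlap only, which is the limitation the abstract points to: two compounds can score high here and still differ in biological activity.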
153	Investigating the impact of dose banding and oral formulations of paracetamol in pediatrics: A pharmacokinetic simulation-based safety assessment study / Formulerings- och doseringeringseffekter på paracetamol i barn: en farmakokinetisk simuleringsstudie Rosenqvist, Julia January 2024 Paracetamol is a commonly used drug with analgesic and antipyretic effects. The drug is available in several dosage forms and strengths, both over the counter and in hospital care. The aim of this project was to investigate the impact of alternative, off-label dosing of paracetamol in pediatric care using physiologically based pharmacokinetic (PBPK) modeling. The model was first developed for an adult population by integrating in vitro, in vivo and in silico data for paracetamol. Concentration curves were then extrapolated to a pediatric population using ontogeny information. The model was validated in both adults and children and was reliable for both oral and intravenous dosing. After validation, simulations were performed for nine different age groups based on the recommended dosing protocols in Sweden. The simulations showed that oral tablet dosing was comparable to the solution formulation, with similar maximum concentrations and area under the curve (AUC) for exposure. The rate of gastric emptying influenced maximum concentrations but not AUC. In addition, the model's ability to predict plasma concentrations after paracetamol overdose was tested. These predictions worked better when the drug-metabolizing enzymes were left unchanged or slightly increased in activity. Finally, the developed PBPK model can be used to safely investigate different dosing protocols and to design pediatric clinical studies. / Paracetamol, a widely used analgesic and antipyretic drug, can be found in various formulations and doses for both home and hospital use. 
The aim of this study was to investigate the impact of off-label dosing of paracetamol in pediatric clinical practice using physiologically based pharmacokinetic (PBPK) modeling. The model was initially developed for adults by integrating relevant in vitro, in vivo and in silico data on paracetamol, after which it was extrapolated to pediatrics by adding ontogeny information. The model was successfully validated in both adult and pediatric populations, and it was accurate for both oral and intravenous administration routes. After validation, simulations were conducted across nine different age groups following the recommended doses in Sweden. These simulations showed that tablet dosing is comparable to solution dosing, resulting in nearly identical maximum concentrations and area under the curve (AUC) values. Furthermore, gastric emptying time, which reflects the fed state of individuals, significantly influenced the maximum concentration, with longer gastric emptying times resulting in lower and delayed peak concentrations. However, the gastric emptying time had no effect on the AUC values. Lastly, the model’s performance on overdose data was evaluated, and it performed better when liver enzyme activity was left unchanged or only slightly elevated. Finally, the developed PBPK model can further be used as a safe and effective way of exploring dose banding and designing clinical trials in pediatrics. paracetamol dose banding formulation off-label paracetamol läkemedelsdosering farmaceutisk formulering off label-användning Bioinformatics (Computational Biology) Bioinformatik (beräkningsbiologi)
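The finding that slower absorption lowers and delays the peak concentration without changing the AUC can be illustrated with a much simpler model than the PBPK model of the thesis. The sketch below uses a one-compartment model with first-order absorption and elimination, which is a deliberate simplification. The absorption rate constant `ka` stands in for gastric emptying, and the dose, volume and rate constants are illustrative assumptions, not fitted paracetamol parameters.

```python
import math


def concentration(t, dose, ka, ke, volume, bioavailability=1.0):
    """Plasma concentration at time t: one compartment, first-order absorption (ka) and elimination (ke)."""
    if ka == ke:
        raise ValueError("this closed form requires ka != ke")
    scale = bioavailability * dose * ka / (volume * (ka - ke))
    return scale * (math.exp(-ke * t) - math.exp(-ka * t))


def time_of_peak(ka, ke):
    """Time at which the concentration curve reaches its maximum."""
    return math.log(ka / ke) / (ka - ke)


def auc_trapezoid(dose, ka, ke, volume, t_end=100.0, step=0.01):
    """Area under the concentration curve from 0 to t_end, by the trapezoidal rule."""
    n_steps = int(round(t_end / step))
    total = 0.0
    previous = concentration(0.0, dose, ka, ke, volume)
    for i in range(1, n_steps + 1):
        current = concentration(i * step, dose, ka, ke, volume)
        total += 0.5 * (previous + current) * step
        previous = current
    return total


# Illustrative values only (assumptions, not paracetamol estimates).
DOSE_MG, VOLUME_L, KE_PER_H = 500.0, 50.0, 0.3
for label, ka in (("fast absorption", 2.0), ("slow absorption", 0.5)):
    t_peak = time_of_peak(ka, KE_PER_H)
    c_peak = concentration(t_peak, DOSE_MG, ka, KE_PER_H, VOLUME_L)
    auc = auc_trapezoid(DOSE_MG, ka, KE_PER_H, VOLUME_L)
    print(f"{label}: t_max={t_peak:.2f} h, C_max={c_peak:.2f} mg/L, AUC={auc:.1f} mg*h/L")
```

In this simplified model the AUC equals dose times bioavailability divided by clearance (`volume * ke`), so it does not depend on `ka`, whereas the time and height of the peak do.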
154	Protein-drug binding affinity prediction with machine learning : Assessing the impact of features from molecular dynamic simulations Guttormsson, Guðmundur Andri, Le Gallo, Léa January 2024 The development of medicine is generally a long and costly process, and one big factor is estimating the affinity of protein-drug binding. Leveraging machine learning in this field is a promising approach, as it can streamline the prediction process and reduce the need for expensive experimental methods. Machine learning methods have already enabled significant advances in predicting protein-drug binding affinity, yet there remains room for improvement. The primary challenge is the quality of the data used for these machine learning models. In this work, two ensemble machine learning models, Random Forest and Extreme Gradient Boosting Machine, were tested and compared using a recent database of protein-ligand complex features calculated from molecular dynamics simulations. Additional features were also extracted from the PDB database through PLIP (Protein-Ligand Interaction Profiler), aiming to improve the predictions further. The results indicate that while the features from the PDB database provided strong predictive power, including features from molecular dynamics simulations did not improve the models’ performance. machine learning ensemble models binding affinity molecular dynamics simulations scoring function Bioinformatics (Computational Biology) Bioinformatik (beräkningsbiologi) Biochemistry and Molecular Biology Biokemi och molekylärbiologi Computer Sciences Datavetenskap (datalogi)
155	Horseshoe RuleFit : Learning Rule Ensembles via Bayesian Regularization Nalenz, Malte January 2016 This work proposes Hs-RuleFit, a learning method for regression and classification, which combines rule ensemble learning based on the RuleFit algorithm with Bayesian regularization through the horseshoe prior. To this end, theoretical properties and potential problems of this combination are studied. A second step is the implementation, which utilizes recent sampling schemes to make Hs-RuleFit computationally feasible. Additionally, changes to the RuleFit algorithm are proposed, such as decision rule post-processing and the use of decision rules generated via Random Forest. Hs-RuleFit addresses the problem of finding highly accurate and yet interpretable models. The method proves capable of finding compact sets of informative decision rules that give good insight into the data. Through the careful choice of prior distributions, the horseshoe prior proves superior to the Lasso in this context. In an empirical evaluation on 16 real data sets, Hs-RuleFit shows excellent performance in regression and outperforms the popular methods Random Forest, BART and RuleFit in terms of prediction error. The interpretability is demonstrated on selected data sets. This makes Hs-RuleFit a good choice for science domains in which interpretability is desired. Problems are found in classification, regarding the use of the horseshoe prior and rule ensemble learning in general. A simulation study is performed to isolate the problems, and potential solutions are discussed. Arguments are presented that the horseshoe prior could be a good choice in other machine learning areas, such as artificial neural networks and support vector machines. 
Bayesian Statistics Regularization Ensemble Learning Decision Rules Horseshoe prior Machine Learning Knowledge Discovery Probability Theory and Statistics Sannolikhetsteori och statistik Computer Sciences Datavetenskap (datalogi) Bioinformatics (Computational Biology) Bioinformatik (beräkningsbiologi) Other Computer and Information Science Annan data- och informationsvetenskap
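The horseshoe prior used for regularization in the entry above gives each coefficient its own half-Cauchy local scale on top of a shared global scale. The sketch below samples from that prior in its standard form and is not the Hs-RuleFit sampler. The function names, the fixed global scale `tau` and the unit noise variance behind the shrinkage weight are assumptions made for illustration.

```python
import math
import random


def sample_half_cauchy(rng):
    """Draw from a standard half-Cauchy: absolute value of an inverse-CDF Cauchy draw."""
    return abs(math.tan(math.pi * (rng.random() - 0.5)))


def sample_horseshoe(n, tau=1.0, seed=0):
    """Return n (coefficient, shrinkage weight) pairs drawn from a horseshoe prior.

    coefficient ~ Normal(0, (local_scale * tau)^2), local_scale ~ half-Cauchy(0, 1).
    The shrinkage weight 1 / (1 + (local_scale * tau)^2) assumes unit noise variance.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        local_scale = sample_half_cauchy(rng)
        coefficient = rng.gauss(0.0, 1.0) * local_scale * tau
        shrinkage = 1.0 / (1.0 + (local_scale * tau) ** 2)
        draws.append((coefficient, shrinkage))
    return draws


draws = sample_horseshoe(10_000)
weights = [shrinkage for _, shrinkage in draws]
near_zero = sum(w < 0.1 for w in weights) / len(weights)  # almost no shrinkage
near_one = sum(w > 0.9 for w in weights) / len(weights)   # almost total shrinkage
print(f"weights below 0.1: {near_zero:.2f}, weights above 0.9: {near_one:.2f}")
```

The shrinkage weights pile up near 0 and near 1, which is the horseshoe shape that gives the prior its name: a coefficient tends to be either left almost untouched or shrunk almost to zero.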
156	DNA methylation correlation networks in overweight and normal-weight adolescents reveal differential coordination Bringeland, Nathalie January 2013 Multiple health issues are associated with obesity, and numerous factors are causative of the disease. The role of genetic factors is well established, as is the knowledge that dietary and sedentary behavior promotes weight gain. Although there is strong suspicion that epigenetics is a driving force toward disease, this field remains largely unexplored in the context of obesity. DNA methylation correlation networks were profiled from blood samples of 69 adolescents of two distinct weight classes: obese (n=35) and normal-weight (n=34). The network analysis revealed major differences in the organization of the networks, where the network of the obese had less modularity compared to that of the normal-weight. This is manifested by more and smaller clusters in the obese, pertaining to genes of related functions and pathways, than in the network of the normal-weight. Consequently, this suggests that biological pathways have a lower order of coordination between each other in terms of DNA methylation in obese than in normal-weight individuals. Analysis of highly connected genes, hubs, in the two networks suggests that the difference in coordination between biological pathways may be driven by changes in the methylation pattern of these hubs; highly connected genes in one network had an intriguingly low connectivity in the other. In conclusion, the results suggest differential regulation of transcription through changes in the coordination of DNA methylation in overweight and normal-weight individuals. The findings of this study are a major step towards understanding the role of DNA methylation in obesity and provide potential biomarkers for diagnosing and predicting obesity. 
DNA methylation network DNA methylation network methylation network correlation network coordination differential coordination overweight obese obesity normal-weight lean adolescents children methylome complex disease comorbidities enrichment analysis Bioinformatics (Computational Biology) Bioinformatik (beräkningsbiologi)
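A correlation network like those described in the entry above can be built by linking two methylation sites whenever their profiles across samples are strongly correlated; hubs are then the sites with many links. The sketch below is a minimal pure-Python version of that idea, not the thesis's pipeline. The site names, methylation values and correlation threshold are invented for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (neither may be constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_network(profiles, threshold):
    """Link two sites when the absolute correlation of their profiles reaches the threshold.

    profiles maps a site name to its values across samples; the result maps
    each site name to the set of its neighbours.
    """
    names = list(profiles)
    graph = {name: set() for name in names}
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if abs(pearson(profiles[first], profiles[second])) >= threshold:
                graph[first].add(second)
                graph[second].add(first)
    return graph


# Hypothetical methylation levels for four sites across five samples.
profiles = {
    "siteA": [0.1, 0.2, 0.3, 0.4, 0.5],
    "siteB": [0.2, 0.4, 0.6, 0.8, 1.0],
    "siteC": [0.5, 0.4, 0.3, 0.2, 0.1],
    "siteD": [0.3, 0.1, 0.4, 0.1, 0.3],
}
graph = correlation_network(profiles, threshold=0.8)
degree = {name: len(neighbours) for name, neighbours in graph.items()}
print(degree)
```

Comparing the degree of the same site in two such networks, one per weight class, is the kind of hub comparison the abstract refers to.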
157	Computational Studies of Protein Synthesis on the Ribosome and Ligand Binding to Riboswitches Lind, Christoffer January 2017 The ribosome is a macromolecular machine that produces proteins in all kingdoms of life. The proteins, in turn, control the biochemical processes within the cell. It is thus of extreme importance that the machine that makes the proteins works with high precision. By using three-dimensional structures of the ribosome and homology modelling, we have applied molecular dynamics simulations and free-energy calculations to study the codon specificity of protein synthesis in initiation and termination at an atomistic level. In addition, we have examined the binding of small molecules to riboswitches, which can change the expression of an mRNA. The relative affinities on the ribosome of the eukaryotic initiator tRNA for the AUG start codon and six near-cognate codons were determined. The free-energy calculations show that the initiator tRNA has a strong preference for the start codon, but requires assistance from initiation factors 1 and 1A to uphold discrimination against near-cognate codons. When instead a stop codon (UAA, UGA or UAG) is positioned in the ribosomal A-site, a release factor binds and terminates protein synthesis by hydrolyzing the nascent peptide chain. However, vertebrate mitochondria have been thought to have four stop codons, namely AGA and AGG in addition to the standard UAA and UAG codons. Furthermore, two release factors have been identified, mtRF1 and mtRF1a. Free-energy calculations were used to determine whether either of these two factors could bind to the two non-standard stop codons and thereby terminate protein synthesis. Our calculations showed that the mtRFs have a stop codon specificity similar to that of bacterial RF1 and that it is highly unlikely that the mtRFs are responsible for terminating at the AGA and AGG stop codons. 
The eukaryotic release factor 1, eRF1, on the other hand, can read all three stop codons single-handedly. We show that eRF1 exerts a high discrimination against near-cognate codons, while having little preference among the different cognate stop codons. We also found an energetic mechanism for avoiding misreading of the UGG codon and could identify a conserved cluster of hydrophobic amino acids which prevents excess solvent molecules from entering the codon binding site. The linear interaction energy (LIE) method was used to examine binding of small molecules to the purine riboswitch, and the free energy perturbation (FEP) method was employed to explicitly calculate the LIE β-parameters. We show that the purine riboswitches have a remarkably high degree of electrostatic preorganization for their cognate ligands, which is fundamental for discriminating against different purine analogs. Binding free energy Ribosome Codon reading Translation initiation Translation termination Mitochondrial translation Release factor Purine riboswitch Molecular Dynamics Free Energy Perturbation Biochemistry and Molecular Biology Biokemi och molekylärbiologi Bioinformatics (Computational Biology) Bioinformatik (beräkningsbiologi)
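The linear interaction energy method mentioned above estimates a binding free energy from the change in the ligand's average interaction energies between the bound and the free state. In its usual form, the van der Waals difference is scaled by a coefficient alpha and the electrostatic difference by a coefficient beta, with an optional constant gamma. The sketch below evaluates that expression; the energies and coefficients are made-up illustrative numbers, not values from the thesis.

```python
def lie_binding_free_energy(vdw_bound, vdw_free, el_bound, el_free,
                            alpha, beta, gamma=0.0):
    """Linear interaction energy estimate of a binding free energy.

    vdw_* and el_* are average ligand-surrounding van der Waals and
    electrostatic energies from simulations of the bound and free states;
    alpha, beta and gamma are the LIE coefficients (all in the same energy unit).
    """
    delta_vdw = vdw_bound - vdw_free
    delta_el = el_bound - el_free
    return alpha * delta_vdw + beta * delta_el + gamma


# Hypothetical averages in kcal/mol, with illustrative coefficients.
estimate = lie_binding_free_energy(
    vdw_bound=-30.0, vdw_free=-15.0,
    el_bound=-50.0, el_free=-40.0,
    alpha=0.18, beta=0.5,
)
print(f"estimated binding free energy: {estimate:.1f} kcal/mol")
```

Since the electrostatic term is weighted by beta, the estimate is sensitive to that coefficient, which is presumably why the thesis computes it explicitly with FEP.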
158	Modelling Low Dimensional Neural Activity / Modellering av lågdimensionell neural aktivitet Wärnberg, Emil January 2016 A number of recent studies have shown that the dimensionality of the neural activity in the cortex is low. However, what network structures are capable of producing such activity is not theoretically well understood. In this thesis, I discuss a few possible solutions to this problem, and demonstrate that a network with a multidimensional attractor can give rise to such low dimensional activity. The network is created using the Neural Engineering Framework, and exhibits several biologically plausible features, including a log-normal distribution of the synaptic weights. / A number of recently published studies have shown that the dimensionality of neural activity is low. However, it is not clear which network structures can produce this type of activity. In this thesis, I discuss possible solutions and demonstrate that a network with a multidimensional attractor gives rise to low-dimensional activity. The network is created using the Neural Engineering Framework and exhibits several biologically plausible properties. In particular, the synaptic weights follow a log-normal distribution. neural networks low dimensional neural activity neural engineering framework NEF nef artificial neural network ann dimensionality log-normal log normal distribution synaptic geometric connectivity Bioinformatics (Computational Biology) Bioinformatik (beräkningsbiologi)
159	Intersecting Graph Representation Learning and Cell Profiling : A Novel Approach to Analyzing Complex Biomedical Data Chamyani, Nima January 2023 In recent biomedical research, graph representation learning and cell profiling techniques have emerged as transformative tools for analyzing high-dimensional biological data. The integration of these methods, as investigated in this study, has facilitated an enhanced understanding of complex biological systems, consequently improving drug discovery. The research aimed to decipher connections between chemical structures and cellular phenotypes while incorporating other biological information, such as proteins and pathways, into the workflow. To achieve this, the efficacy of machine learning models was examined for classification and regression tasks. The newly proposed graph-level and bio-graph integrative predictors were compared with traditional models. Results demonstrated their potential, particularly in classification tasks. Moreover, the topology of the COVID-19 BioGraph was analyzed, revealing the complex interconnections between chemicals, proteins, and biological pathways. By combining network analysis, graph representation learning, and statistical methods, the study was able to predict active chemical combinations within inactive compounds, thereby exhibiting significant potential for further investigations. Graph-based generative models were also used for molecule generation, opening up further research avenues in finding lead compounds. In conclusion, this study underlines the potential of combining graph representation learning and cell profiling techniques in advancing biomedical research in drug repurposing and drug combination. This integration provides a better understanding of complex biological systems, assists in identifying therapeutic targets, and contributes to optimizing molecule generation for drug discovery. 
Future investigations should optimize these models and validate the drug combination discovery approach. As these techniques continue to evolve, they hold the potential to significantly impact the future of drug screening, drug repurposing, and drug combinations. Graph representation learning Cell profiling Biological systems Network medicine Graphs Machine learning techniques Graph neural networks (GNNs) Protein-Compound-Pathway interactions Biomarkers Drug discovery Bioinformatics (Computational Biology) Bioinformatik (beräkningsbiologi)
160	Developing Automated Cell Segmentation Models Intended for MERFISH Analysis of the Cardiac Tissue by Deploying Supervised Machine Learning Algorithms / Utveckling av automatiserade cellsegmenteringsmodeller avsedda för MERFISH-analys av hjärtvävnad genom användning av övervakade maskininlärningsalgoritmer Rune, Julia January 2023 The following study concerns the development of automated cell segmentation models intended to identify boundaries between cells in cardiac tissue. The aim is to enable analysis of data generated by multiplexed error-robust fluorescence in situ hybridization (MERFISH). MERFISH is a spatial transcriptomics technique that, unlike for example single-cell RNA sequencing (scRNA-seq) and single molecule fluorescence in situ hybridization (smFISH), enables profiling of hundreds of RNA sequences in individual cells without losing their spatial context. In the Kosuri laboratory at the Salk Institute for Biological Studies in San Diego, MERFISH is applied to mouse hearts. The aim is to gain deeper insight into how cells are organized in healthy hearts and how this structure changes with aging and disease. Extracting meaningful information from MERFISH, however, poses a significant challenge: accurate cell segmentation. The study accordingly contributes to the development of segmentation models that overcome the challenges standing in the way of all subsequent analysis. Because classical segmentation algorithms are insufficient for segmenting the complex tissue that makes up the heart, some of today's most advanced and prominent machine learning algorithms in the field, Cellpose and Omnipose, were applied. Given the dense and heterogeneous cardiac tissue, which stems from a broad distribution of cell types and geometries, two separate models were developed: one to cover both smaller cells and cardiomyocytes cut in cross-section, and one to segment only cardiomyocytes cut in the longitudinal direction. 
The former model was developed and trained in Cellpose and achieved an accuracy of 91.2%. The model for longitudinal cardiomyocytes was instead developed in both Cellpose and Omnipose to evaluate which network is best suited to the purpose. Neither network achieved an accuracy high enough to be applicable, and both therefore need further training. The model generated in Omnipose is nevertheless judged the most promising, given its more complete segmentation. Further areas of development include segmentation of cells in fibrosis-dense regions and a 3D segmentation of the whole heart to achieve a more complete MERFISH analysis. In summary, the generated segmentation models have paved the way for a rigorous MERFISH analysis of the heart. By revealing some of the structural and functional causes of heart failure at the cellular level, we can thus, in the long run, contribute to the development of more effective therapeutic strategies. / The following study delves into the development of automated cell segmentation models, with the intention of identifying boundaries between cells in cardiac tissue for analysing spatial transcriptomics data. Addressing the limitations of alternative techniques like single-cell RNA sequencing (scRNA-seq) and single molecule fluorescence in situ hybridization (smFISH), the study underscores the innovative use of multiplexed error-robust fluorescence in situ hybridization (MERFISH) deployed by the Kosuri Lab at the Salk Institute for Biological Studies. This advanced imaging-based technique allows for single-cell transcriptome profiling of hundreds of different transcripts while retaining the spatial context of the tissue. The technique can accordingly reveal how the organization of cells within a healthy heart is altered during disease. However, the extraction of meaningful data from MERFISH poses a significant challenge: accurate cell segmentation. 
This thesis therefore presents the development of a robust model for cell boundary identification within cardiac tissue, leveraging two of the advanced supervised machine learning algorithms in the field, Cellpose and Omnipose. Because the tissue is dense and highly heterogeneous, with a wide distribution of cell types and shapes, two separate models had to be developed: one that covers the smaller cells and the cross-sectioned cardiomyocytes, and one that covers the longitudinal cardiomyocytes. The cross-section model was successfully developed and achieved an accuracy of 91.2%, whereas the longitudinal model still needs further improvement before being implemented. The thesis acknowledges potential areas for improvement, emphasizing the need to further improve the segmentation of longitudinal cardiomyocytes, to tackle the challenges of segmenting cells within fibrotic regions of the diseased heart, and to achieve a precise 3D cell segmentation. Nonetheless, the generated models have paved the way towards efficient downstream MERFISH analysis that can ultimately reveal the structural and functional dynamics of heart failure at a cellular level, aiding the development of more effective therapeutic strategies. Cell Segmentation Cellpose Supervised Machine Learning MERFISH Heart Failure Cellsegmentering Cellpose Övervakad Maskininlärning MERFISH Hjärtsvikt Bioinformatics (Computational Biology) Bioinformatik (beräkningsbiologi) Bioinformatics and Systems Biology Bioinformatik och systembiologi Cell and Molecular Biology Cell- och molekylärbiologi Cardiac and Cardiovascular Systems Kardiologi
