  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
111

Iterative full-genome phasing and imputation using neural networks

Rydin, Lotta January 2022 (has links)
In this project, a model based on a convolutional neural network was developed with the aim of imputing missing genotype data. The model was based on an existing autoencoder that was modified into a U-Net structure. The network was trained and used iteratively, with the intention that the result would improve in each iteration: the output of the model was used as the input to the next iteration. The data used in this project was diploid genotype data, which was phased into haploids that were then run separately through the network. In each iteration, new haploids were generated based on the output haploids and used as the input to the next iteration. The results showed that the accuracy of the imputation improved slightly in every iteration. However, it did not surpass that of the same model trained for a single iteration. Further work is needed to make the model more useful.
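The iterative scheme described in this abstract (model output fed back as the next input) can be sketched roughly as follows. This is only an illustration under stated assumptions: `neighbour_average` is a hypothetical stand-in for the trained U-Net, and the per-site probability encoding and the clamping of observed sites are choices made for this sketch, not details taken from the thesis.

```python
from typing import Callable, List, Optional

Haplotype = List[float]  # per-site probability of the alternate allele


def neighbour_average(hap: Haplotype) -> Haplotype:
    """Hypothetical stand-in for the trained network: re-estimate every
    site as the mean of its two neighbours (edges reuse their one neighbour).
    Requires at least two sites."""
    n = len(hap)
    out = []
    for i in range(n):
        left = hap[i - 1] if i > 0 else hap[i + 1]
        right = hap[i + 1] if i < n - 1 else hap[i - 1]
        out.append((left + right) / 2.0)
    return out


def iterative_impute(
    observed: List[Optional[int]],
    model: Callable[[Haplotype], Haplotype],
    iterations: int = 5,
) -> Haplotype:
    """Run the model repeatedly, feeding each output back in as the next input.

    `observed` holds 0/1 alleles, with None marking a missing site.
    """
    # Assumption: missing sites start at an uninformative 0.5.
    current = [0.5 if g is None else float(g) for g in observed]
    for _ in range(iterations):
        predicted = model(current)
        # Assumption: observed sites are kept fixed; only missing sites
        # take the model's prediction into the next iteration.
        current = [
            float(g) if g is not None else p
            for g, p in zip(observed, predicted)
        ]
    return current


if __name__ == "__main__":
    print(iterative_impute([1, None, None, 1], neighbour_average))
```

In this toy setting the missing sites drift towards the value of their observed neighbours with each pass, which mirrors the idea of gradual refinement; it says nothing about how the actual network behaves.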
112

Using ADME/PK models to improve generative molecular design with reinforcement learning

Pop, Cristian-Catalin January 2024 (has links)
An adequate ADME/PK (absorption, distribution, metabolism, excretion, pharmacokinetics) profile is an essential quality for a drug. As part of the drug discovery process, leads are iteratively designed and optimized in order to simultaneously satisfy various properties such as appropriate ADME/PK levels and high biological activity for a target. The drug discovery process can be accelerated by improving the likelihood that a designed compound fulfils the necessary pharmacologic properties, thus reducing the number of iterations needed. A promising technique is de novo drug design, where molecules are computationally generated based on a set of desired attributes. Our project aimed to benchmark the effectiveness of the ANDROMEDA ADME/PK conformal prediction models in guiding the generation of compounds toward an area of chemical space with good ADME/PK properties. For this, we used the REINVENT reinforcement learning framework built by the Molecular AI team at AstraZeneca. We integrated 4 out of the 14 available ANDROMEDA models (fabs, fdiss, CLint and Vss) as oracles in the scoring component of the generative model. Oral bioavailability (F) is a secondary parameter that was computed with the help of the aforementioned models and fu (unbound fraction in plasma), and serves as the fifth ADME/PK oracle in our analysis. We aimed to rediscover DRD2 bioactives with a good ADME/PK profile. Our results show that the ANDROMEDA models have a slight influence on the predicted ADME/PK properties of the generated compounds. The results do not show an increased likelihood of generating DRD2 ligands in the case of the primary ANDROMEDA models. However, when using the oral bioavailability oracle, the sampling likelihood increases for some of the approved DRD2 ligands. In conclusion, the oral bioavailability ANDROMEDA model can be a promising option for guiding the generation of novel compounds towards an area of chemical space with good ADME/PK properties.
113

Contributions to Small Area Estimation : Using Random Effects Growth Curve Model

Ngaruye, Innocent January 2017 (has links)
This dissertation considers Small Area Estimation with a main focus on estimation and prediction for repeated measures data. The demand for small area statistics covers both cross-sectional and repeated measures data. For instance, small area estimates for repeated measures data may be useful to public policy makers for purposes such as funds allocation or new educational or health programs, where decision makers might be interested in the trend of estimates for a specific characteristic of interest for a given category of the target population as a basis for their planning. It has been shown that the multivariate approach for model-based methods in small area estimation may achieve substantial improvement over the usual univariate approach. In this work, we consider repeated surveys taken on the same subjects at different time points. The population from which a sample has been drawn is partitioned into several non-overlapping subpopulations, and within all subpopulations there is the same number of group units. The aim is to propose a model that borrows strength across small areas and over time, with a particular interest in growth profiles over time. The model accounts for repeated surveys, group individuals and random effects variations. Firstly, a multivariate linear model for repeated measures data is formulated under small area estimation settings. The estimation of model parameters is discussed within a likelihood-based approach, and the prediction of random effects and the prediction of small area means across time points, per group units and for all time points are obtained. In particular, as an application of the proposed model, an empirical study is conducted to produce district-level estimates of beans in Rwanda during the agricultural seasons of 2014, which comprise two varieties, bush beans and climbing beans. Secondly, the thesis develops the properties of the proposed estimators and discusses the computation of their first and second moments.
Through a method based on the parametric bootstrap, these moments are used to estimate the mean-squared errors of the predicted small area means. Finally, a particular case of incomplete multivariate repeated measures data that follow a monotonic sample pattern for small area estimation is studied. By using a conditional likelihood-based approach, the estimators of model parameters are derived. The prediction of random effects and predicted small area means are also produced.
114

STAIRS : Data reduction strategy on genomics

Ferrer, Samuel January 2019 (has links)
Background. An enormous accumulation of genomic data has taken place over the last ten years. This makes visualization and manual inspection key steps in trying to understand large datasets containing DNA sequences with millions of letters. This situation has created a gap between data complexity and qualified personnel, because it requires trade-offs between visualization, reduction capacity and exploratory functions, a combination of features rarely achieved by existing tools such as the SRA toolkit (https://www.ncbi.nlm.nih.gov/sra/docs/toolkitsoft/). A novel approach to the problem of genomic analysis and visualization was pursued in this project by means of STrAtified Interspersed Reduction Structures (STAIRS). Result. Ten weeks of intense work resulted in novel algorithms to compress data, transform it into stairs vectors and align them. The Smith–Waterman and Needleman–Wunsch algorithms were specially modified for this purpose, and the application produced statistical performance and behavioural charts.
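The abstract does not say how Smith–Waterman and Needleman–Wunsch were modified for stairs vectors. For orientation only, the sketch below shows the unmodified textbook Needleman–Wunsch global alignment score; the scoring values (`match`, `mismatch`, `gap`) are illustrative defaults chosen here, not parameters from the thesis.

```python
from typing import Sequence


def needleman_wunsch_score(
    a: Sequence,
    b: Sequence,
    match: int = 1,
    mismatch: int = -1,
    gap: int = -1,
) -> int:
    """Return the optimal global alignment score of sequences a and b.

    Textbook dynamic programme with a linear gap penalty, keeping only
    two rows of the score matrix in memory.
    """
    # Row 0: aligning the empty prefix of `a` against prefixes of `b`.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of `a` against empty `b`
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap inserted in `b`
            left = curr[j - 1] + gap  # gap inserted in `a`
            curr.append(max(diagonal, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(needleman_wunsch_score("GATTACA", "GATTACA"))
```

Because the function only compares elements for equality, it accepts any sequence type (strings, lists of integers), so a vector encoding could in principle be passed in place of a DNA string.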
115

A canonical correlation analysis-based approach to identify causal genes in atherosclerosis

Sizyoogno, Crisencia January 2018 (has links)
Genome-wide association studies (GWASs) have identified hundreds of loci that are strongly associated with coronary artery disease and its risk factors. However, the causal variants and genes remain unknown for the vast majority of the identified loci. Zebrafish model systems coupled with clustered regularly interspaced short palindromic repeats (CRISPR)-associated protein 9 (CRISPR-Cas9) mutagenesis have made it possible to systematically characterize candidate genes in GWAS-identified loci. In this thesis, canonical correlation analysis (CCA) was used to efficiently identify putative causal genes in multiplexed genetic screens for atherogenic traits in zebrafish larvae. The two datasets used in this thesis contained genes and phenotypes obtained through sequencing and high-throughput imaging of fish larvae. Dataset 1 contained 7 genes and 11 phenotypes (n = 384), and dataset 2 contained 4 genes and 11 phenotypes (n = 384). CCA's multiple genes vs. multiple phenotypes analysis in dataset 1 identified the genes met, pepd, timd4 and vegfa as associated with total cholesterol, triglycerides, glucose and corrected lipid deposition, as well as with co-localization of macrophages and lipid deposition, neutrophils and lipid deposition, and macrophages and neutrophils. In dataset 2, CCA found the previously reported correlation of the genes apobb1 and apoea with total cholesterol, low-density lipoprotein and triglycerides, as well as with co-localization of neutrophils and lipids. In comparison with a hierarchical linear model, CCA represents a powerful and promising tool to identify causal genes for cardiovascular diseases in data from zebrafish model systems.
116

PePIP : a Pipeline for Peptide-Protein Interaction-site Prediction / PePIP : en Pipeline for Förutsägelse av Peptid-Protein Bindnings-site

Johansson-Åkhe, Isak January 2017 (has links)
Protein-peptide interactions play a major role in several biological processes, such as cell proliferation and cancer cell life-cycles. Accurate computational methods for predicting protein-protein interactions exist, but few of these methods can be extended to predicting interactions between a protein and a particularly small or intrinsically disordered peptide. In this thesis, PePIP is presented. PePIP is a pipeline for predicting where on a given protein a given peptide will most probably bind. The pipeline uses structural alignment to search the Protein Data Bank for possible templates for the interaction to be predicted, using the larger chain as the query. The possible templates are then evaluated as to whether they can represent the query protein and peptide using a Random Forest classifier, and the best templates are found by combining the Random Forest evaluation with hierarchical clustering. These final templates are then combined to give a prediction of the binding site. PePIP is shown to be highly accurate when tested on a set of 502 experimentally determined protein-peptide structures, suggesting a binding site on the correct part of the protein surface roughly 4 out of 5 times.
117

A fast protein-ligand docking method

Genheden, Samuel January 2006 (has links)
In this dissertation, a novel approach to protein-ligand docking is presented. First, an existing method to predict putative active sites is employed. These predictions are then used to cut down the search space of an algorithm that uses the fast Fourier transform to calculate the geometrical and electrostatic complementarity between a protein and a small organic ligand. A simplified hydrophobicity score is also calculated for each active site. The docking method could be applied either to dock ligands in a known active site or to rank several putative active sites according to their biological feasibility. The method was evaluated on a set of 310 protein-ligand complexes. The results show that, with respect to docking, the method with its initial parameter settings is too coarse-grained. The results also show that, with respect to ranking of putative active sites, the method works quite well.
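The reason the fast Fourier transform helps here is the correlation theorem: the overlap score for every translation of the ligand grid against the protein grid can be obtained from one forward transform of each grid, a pointwise product, and one inverse transform. The sketch below reduces this to one dimension with a small pure-Python radix-2 FFT. The grid values and function names are illustrative assumptions; the thesis's actual grids are three-dimensional and its scoring terms are not given in the abstract.

```python
import cmath
from typing import List


def fft(x: List[complex], inverse: bool = False) -> List[complex]:
    """Recursive radix-2 FFT (length must be a power of two).
    The inverse transform is left unnormalised."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def correlation_scores(receptor: List[float], ligand: List[float]) -> List[float]:
    """Score every circular shift s at once:
    score[s] = sum_i receptor[(i + s) % n] * ligand[i].
    """
    n = len(receptor)
    r_hat = fft([complex(v) for v in receptor])
    l_hat = fft([complex(v) for v in ligand])
    # Correlation theorem for real inputs: multiply by the complex conjugate.
    product = [r * l.conjugate() for r, l in zip(r_hat, l_hat)]
    return [v.real / n for v in fft(product, inverse=True)]


if __name__ == "__main__":
    receptor = [0, 0, 1, 1, 0, 0, 0, 0]  # toy "pocket" occupying cells 2-3
    ligand = [1, 1, 0, 0, 0, 0, 0, 0]    # toy ligand two cells wide
    scores = correlation_scores(receptor, ligand)
    print(max(range(len(scores)), key=scores.__getitem__))
```

A direct evaluation of all shifts costs on the order of n² multiplications per axis, whereas the transform-based route costs on the order of n log n, which is what makes exhaustive translational scans affordable on 3D grids.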
118

Graph neural networks for spatial gene expression analysis of the developing human heart

Yuan, Xiao January 2020 (has links)
Single-cell RNA sequencing and in situ sequencing were combined in a recent study of the developing human heart to explore the transcriptional landscape at three developmental stages. However, the method used in the study to create the spatial cellular maps has some limitations. It relies on image segmentation of the nuclei and cell types defined in advance by single-cell sequencing. In this study, we applied a new unsupervised approach based on graph neural networks on the in situ sequencing data of the human heart to find spatial gene expression patterns and detect novel cell and sub-cell types. In this thesis, we first introduce some relevant background knowledge about the sequencing techniques that generate our data, machine learning in single-cell analysis, and deep learning on graphs. We have explored several graph neural network models and algorithms to learn embeddings for spatial gene expression. Dimensionality reduction and cluster analysis were performed on the embeddings for visualization and identification of biologically functional domains. Based on the cluster gene expression profiles, locations of the clusters in the heart sections, and comparison with cell types defined in the previous study, the results of our experiments demonstrate that graph neural networks can learn meaningful representations of spatial gene expression in the human heart. We hope further validations of our clustering results could give new insights into cell development and differentiation processes of the human heart.
119

Computational Methods for the structural and dynamical understanding of GPCR-RAMP interactions

Bahena, Silvia January 2020 (has links)
Protein-protein interactions dominate all major biological processes in living cells. Recent studies suggest that the surface expression and activity of G protein-coupled receptors (GPCRs), which are the largest family of receptors in human cells, can be modulated by receptor activity–modifying proteins (RAMPs). Computational tools are essential to complement experimental approaches for understanding the molecular activity of living cells, and molecular dynamics simulations are well suited to provide molecular details of protein function and structure. The classical atom-level molecular modeling of biological systems is limited to small systems and short time scales. Its application is therefore complicated for systems such as protein-protein interactions in the cell-surface membrane. For this reason, coarse-grained (CG) models have become widely used, and they represent an important step in the study of large biomolecular systems. CG models are computationally more efficient because they simplify the complexity of the protein structure, allowing simulations to reach longer time scales. The aim of this degree project was to determine whether coarse-grained molecular simulations are suitable for understanding the dynamics and structural basis of GPCR-RAMP interactions in a membrane environment. The results indicate that the study of protein-protein interactions using CG models needs further improvement, with a more accurate parameterization that will allow the study of complex systems.
120

Modeling receptor induced signaling in MSNs : Interaction between molecules involved in striatal synaptic plasticity

Nair, Anu G. January 2014 (has links)
The basal ganglia are evolutionarily conserved brain nuclei involved in several physiologically important animal behaviors, such as motor control and reward learning. The striatum, which is the input nucleus of the basal ganglia, integrates inputs from several neurons, such as cortical and thalamic glutamatergic inputs and local GABAergic inputs. Several neuromodulators, such as dopamine, acetylcholine and serotonin, modulate the functional properties of striatal neurons. Aberrations in the intracellular signaling of these neurons lead to several debilitating neurodegenerative diseases, such as Parkinson’s disease. In order to understand these aberrations, we should first identify the role of the different molecular players in normal physiology. The long-term goal of this research is to understand the molecular mechanisms responsible for the integration of different neuromodulatory signals by striatal medium spiny neurons (MSNs). This signal integration is known to play an important role in learning, which is manifested via changes in the synaptic weights between different neurons. The group of synapses considered in the current work is the corticostriatal one, that is, the synapses between the cortical projection neurons and MSNs. One of the molecular processes of considerable interest is the interaction between dopaminergic and cholinergic inputs. In this thesis I have investigated the interactions between the biochemical cascades triggered by dopaminergic, cholinergic (ACh) and glutamatergic inputs to the striatal MSN. Dopamine-induced signaling increases the levels of cAMP in the striatonigral MSNs. The sources of dopamine and acetylcholine are dopaminergic neurons (DAN) from the midbrain and tonically active cholinergic interneurons (TAN) of the striatum, respectively. A sub-second burst of activity in DAN along with a simultaneous pause in TAN is a characteristic effect elicited by a salient stimulus.
This, in turn, leads to a dopamine peak and, possibly, an acetylcholine (ACh) dip in the striatum. I have looked into the possibility of sensing this ACh dip and the dopamine peak at striatonigral MSNs. These neurons express the D1 dopamine receptor (D1R) coupled to Golf and the M4 muscarinic receptor (M4R) coupled to Gi/o. These receptors are expressed significantly in the dendritic spines of these neurons, where adenylate cyclase 5 (AC5) is a point of convergence for the two signals. Golf stimulates the production of cAMP by AC5, whereas Gi/o inhibits the Golf-mediated cAMP production. I have performed a kinetic-modeling exercise to explore how dopamine and ACh interact with each other via these receptors and what the effects are on the downstream signaling events. The results of the model simulation suggest that the striatonigral MSNs are able to sense the ACh dip via M4R. They integrate the dip with the dopamine peak to activate AC5 synergistically. We also found that the ACh tone may act as a potential noise filter against noisy dopamine signals. The parameters for the G-protein GTPase activity point towards an important role for GTPase Activating Proteins (GAPs), such as RGS, in this process. In addition, we hypothesize that M4R may have therapeutic potential.
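The synergy described in this abstract (AC5 needs both a dopamine peak and an ACh dip) can be pictured with a deliberately crude steady-state toy. Everything in the sketch below is an assumption made for illustration: the simple receptor-occupancy formula, the multiplicative combination of Golf stimulation and Gi/o inhibition, the dissociation constants and the concentrations are all hypothetical and are not the thesis's kinetic model or its parameters.

```python
def occupancy(concentration: float, kd: float) -> float:
    """Fraction of receptors bound at equilibrium (simple one-site binding)."""
    return concentration / (concentration + kd)


def ac5_activity(dopamine: float, ach: float,
                 kd_d1: float = 1.0, kd_m4: float = 1.0) -> float:
    """Toy AC5 drive in arbitrary units.

    Assumption: Golf stimulation scales with D1R occupancy, and Gi/o
    inhibition scales with M4R occupancy and removes a matching
    fraction of the Golf-mediated drive.
    """
    golf_drive = occupancy(dopamine, kd_d1)
    gio_inhibition = occupancy(ach, kd_m4)
    return golf_drive * (1.0 - gio_inhibition)


if __name__ == "__main__":
    # Hypothetical concentrations in units of the respective Kd.
    da_basal, da_peak = 0.1, 10.0
    ach_tone, ach_dip = 10.0, 0.1

    basal = ac5_activity(da_basal, ach_tone)
    peak_only = ac5_activity(da_peak, ach_tone)
    dip_only = ac5_activity(da_basal, ach_dip)
    both = ac5_activity(da_peak, ach_dip)

    print(f"basal={basal:.3f} peak_only={peak_only:.3f} "
          f"dip_only={dip_only:.3f} both={both:.3f}")
```

Under these toy assumptions the combined response exceeds the sum of the two separate responses, which is the sense of "synergistically" used above; a real kinetic model would track G-protein activation and GTPase rates over time rather than equilibrium occupancies.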
