581 |
Algebraic dynamic programming over general data structures. Höner zu Siederdissen, Christian; Prohaska, Sonja J.; Stadler, Peter F. January 2016
Background: Dynamic programming algorithms provide exact solutions to many problems in computational biology, such as sequence alignment, RNA folding, hidden Markov models (HMMs), and scoring of phylogenetic trees. Structurally analogous algorithms compute optimal solutions, evaluate score distributions, and perform stochastic sampling. The theory of Algebraic Dynamic Programming (ADP) explains this analogy through a strict separation of state space traversal (usually represented by a context-free grammar), scoring (encoded as an algebra), and choice rule. A key ingredient of this theory is the use of yield parsers that operate on the ordered input data structure, usually strings or ordered trees. Computing ensemble properties, such as posterior probabilities in HMMs or partition functions in RNA folding, requires combining two distinct but intimately related algorithms, known as the inside and the outside recursion. Classical ADP theory covers only the inside recursions. Results: The ideas of ADP are generalized to a much wider range of data structures by relaxing the concept of parsing. This allows us to formalize the conceptual complementarity of inside and outside variables in a natural way. We demonstrate that outside recursions can be derived generically from inside decomposition schemes. In addition to rephrasing the well-known algorithms for HMMs, pairwise sequence alignment, and RNA folding, we show how the traveling salesman problem (TSP) and the shortest Hamiltonian path problem can be implemented efficiently in the extended ADP framework. As a showcase application, we investigate the ancient evolution of HOX gene clusters in terms of shortest Hamiltonian paths. Conclusions: The generalized ADP framework presented here greatly facilitates the development and implementation of dynamic programming algorithms for a wide spectrum of applications.
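To illustrate the ADP separation of traversal, scoring, and choice on the shortest Hamiltonian path problem mentioned above, here is a minimal Python sketch. It is not the authors' implementation: it assumes a complete graph given as a distance matrix and uses the classical Held-Karp subset recursion as the fixed traversal. The names `Algebra`, `hamiltonian_path`, `MIN_PLUS`, and `COUNTING` are hypothetical. Swapping the algebra changes what is computed (optimal cost vs. number of paths) without touching the recursion.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Generic, List, Sequence, Tuple, TypeVar

V = TypeVar("V")  # value domain of an algebra


@dataclass(frozen=True)
class Algebra(Generic[V]):
    """Hypothetical ADP-style evaluation algebra plus choice rule."""
    start: Callable[[int], V]          # value of a path consisting of a single vertex
    extend: Callable[[V, float], V]    # append one edge with the given weight
    choice: Callable[[List[V]], V]     # combine alternative candidates


def hamiltonian_path(dist: Sequence[Sequence[float]], alg: Algebra[V]) -> V:
    """Held-Karp subset recursion over a complete graph, evaluated with `alg`.

    table[(mask, v)] summarizes all simple paths that visit exactly the
    vertices in the bit set `mask` and end in vertex v.
    """
    n = len(dist)
    table: Dict[Tuple[int, int], V] = {(1 << v, v): alg.start(v) for v in range(n)}
    for mask in range(1, 1 << n):
        for v in range(n):
            if not mask & (1 << v) or mask == 1 << v:
                continue  # v not in mask, or single-vertex base case
            prev = mask ^ (1 << v)  # numerically smaller than mask, so already filled
            table[(mask, v)] = alg.choice(
                [alg.extend(table[(prev, u)], dist[u][v]) for u in range(n) if prev & (1 << u)]
            )
    full = (1 << n) - 1
    return alg.choice([table[(full, v)] for v in range(n)])


# Same traversal, two algebras: optimal cost vs. number of (directed) paths.
MIN_PLUS: Algebra[float] = Algebra(start=lambda v: 0.0, extend=lambda s, w: s + w, choice=min)
COUNTING: Algebra[int] = Algebra(start=lambda v: 1, extend=lambda s, w: s, choice=sum)

line = [[abs(i - j) for j in range(4)] for i in range(4)]  # four points on a line
print(hamiltonian_path(line, MIN_PLUS))  # 3.0
print(hamiltonian_path(line, COUNTING))  # 24 (= 4! directed paths)
```

The counting algebra is a quick sanity check on the traversal: on a complete graph with n vertices it must return n!, independently of the edge weights.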
|
582 |
Applications of Deep Neural Networks in Computer-Aided Drug Design. Ahmadreza Ghanbarpour Ghouchani (10137641). 01 March 2021
<div>Deep neural networks (DNNs) have gained tremendous attention in recent years due to their outstanding performance on many problems across science and technology. The field is of interest to many researchers and is growing rapidly. Because DNNs can learn new concepts with minimal instruction, current DNN-based methods can readily be applied to new problems. This dissertation discusses three DNN-based methods, each tackling a different problem in computer-aided drug design.</div><div><br></div><div>The first method addresses the prediction of hydration properties from the 3D structures of proteins without requiring molecular dynamics simulations. Water plays a major role in protein-ligand interactions, and identifying the (de)solvation contributions of water molecules can assist drug design. Two different model architectures are presented for predicting the hydration information of proteins. Their performance is compared with other conventional methods and with experimental data. In addition, their applications in ligand optimization and pose prediction are shown.</div><div><br></div><div>The design of de novo molecules has long been of interest in drug discovery. The second method is a generative model that learns to derive features from protein sequences in order to design de novo compounds. We show how the model can be used to generate molecules similar to known ligands for targets the model has not seen before, and we compare it with benchmark generative models.</div><div><br></div><div>Finally, it is demonstrated how DNNs can learn to predict secondary structure propensity values derived from NMR ensembles. Secondary structure propensities are important for identifying flexible regions in proteins. Protein flexibility plays a major role in drug-protein binding, and identifying such regions can assist in the development of methods for ligand binding prediction. 
The prediction performance of the method is shown for several proteins with two or more known secondary structure conformations.</div>
|
583 |
Adaptive Evolution of Long Non-Coding RNAs. Walter Costa, Maria Beatriz. 07 December 2018
The chimpanzee is the closest living species to modern humans. Although the phenotypic differences between these two species are striking, the difference in their genomic sequences is surprisingly small. Species-specific changes and positive selection have mostly been found in proteins, but ncRNAs are also involved, including the largely uncharacterized class of long ncRNAs (lncRNAs). A notable example is the Human Accelerated Region 1 (HAR1), the region in the human genome with the highest number of human-specific substitutions: 18 in 118 nucleotides. HAR1 is located in a pair of overlapping lncRNAs that are expressed during a crucial period of brain development. Importantly, structural rather than sequence constraints drive the evolution of many ncRNAs. Different methods have been developed for detecting negative selection in ncRNA structures, but none so far for positive selection.
This motivated us to develop a novel method, the SSS-test (Selection on the Secondary Structure test), which uses an excess of structure-changing mutations as a signature of positive selection. It uses reports from RNAsnp, a tool that quantifies the structural effect of SNPs on RNA structures, and applies multiple-testing correction to the observations to generate selection scores. Insertions and deletions (indels) are handled separately using rank statistics and a background model. The scores for SNPs and indels are combined into a final selection score for each input sequence, indicating the type of selection. We benchmarked the SSS-test on biological and synthetic datasets and obtained coherent signals. We then applied it to a lncRNA database and obtained a set of 110 human lncRNAs as candidates for adaptive evolution in humans.
Although lncRNAs have poor sequence conservation, they have conserved splice sites, which provide ideal guides for orthology annotation. To provide an alternative method for assigning orthology to lncRNAs, we developed the 'buildOrthologs' tool. It takes as input a map of orthologous splice sites created by the SpliceMap tool and applies a greedy algorithm to reconstruct valid orthologous transcripts. We applied this approach to create a well-curated catalog of lncRNA orthologs for primate species.
Finally, to understand the structural evolution of ncRNAs in full detail, we added a temporal aspect to the analysis: in what order did the mutations of a structure occur since its origin? This is a combinatorial problem in which the exact mutations between ancestral and extant sequences must be put in order. For this, we developed the 'mutationOrder' tool using dynamic programming. It calculates every possible order of mutations and assigns a probability to every path. We applied this tool to HAR1 as a case study and found that the co-optimal paths, which are equally likely to have occurred, share qualitatively comparable features. In general, they lead to a stabilization of the human structure relative to the ancestral one. We propose that this stabilization was caused by adaptive evolution.
With the new methods we developed and our analysis of primate databases, we gained new knowledge about adaptive evolution of human lncRNAs.
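The mutation-ordering problem described for 'mutationOrder' can be sketched as a dynamic program over subsets of applied mutations: although there are k! orders, all of them pass through only 2^k intermediate states. The Python sketch below is hypothetical and not the tool's actual model. It assumes each intermediate state (the set of mutations applied so far) has a positive weight, for example a Boltzmann factor of its folding energy, and that a path's weight is the product of the weights of the states it visits. The names `path_partition_function`, `order_probability`, and the toy `weight` are illustrative only.

```python
from itertools import combinations, permutations
from math import isclose
from typing import Callable, Dict, FrozenSet, Sequence

State = FrozenSet[int]  # set of mutations applied so far


def path_partition_function(k: int, weight: Callable[[State], float]) -> Dict[State, float]:
    """z[S] = summed weight of all orders that reach state S from the ancestor."""
    z: Dict[State, float] = {frozenset(): 1.0}
    for size in range(1, k + 1):
        for combo in combinations(range(k), size):
            s = frozenset(combo)
            # The last mutation applied can be any m in S; the state S itself is visited once.
            z[s] = weight(s) * sum(z[s - {m}] for m in s)
    return z


def order_probability(order: Sequence[int], weight: Callable[[State], float],
                      z: Dict[State, float]) -> float:
    """Probability of one complete mutation order: path weight / total weight."""
    applied: set = set()
    path_weight = 1.0
    for m in order:
        applied.add(m)
        path_weight *= weight(frozenset(applied))
    return path_weight / z[frozenset(order)]


# Toy weight (hypothetical): intermediates carrying mutation 0 are twice as favourable.
def weight(s: State) -> float:
    return 2.0 if 0 in s else 1.0


z = path_partition_function(3, weight)
print(z[frozenset({0, 1, 2})])                 # 28.0
print(order_probability((0, 1, 2), weight, z))  # 8/28: mutation 0 first is most likely
print(isclose(sum(order_probability(p, weight, z) for p in permutations(range(3))), 1.0))
```

Because the per-order probabilities are normalized by the DP total, they sum to one over all k! orders, which is a useful consistency check on the recursion.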
|
584 |
Novel concepts for lipid identification from shotgun mass spectra using a customized query language. Herzog, Ronny. 30 May 2012
Lipids are the main component of semipermeable cell membranes and are linked to several important physiological processes. Shotgun lipidomics relies on the direct infusion of total lipid extracts from cells, tissues, or organisms into the mass spectrometer and is a powerful tool for elucidating their molecular composition. Despite the technical advances in modern mass spectrometry, currently available software underperforms in several parts of the lipidomics pipeline. This thesis addresses these issues by presenting a new concept for lipid identification that uses a customized query language for mass spectra in combination with efficient spectra alignment algorithms, implemented in the open-source toolkit “LipidXplorer”.
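Spectra alignment in shotgun lipidomics typically pairs peaks whose masses agree within a tolerance given in parts per million (ppm). The Python sketch below shows one generic way to do this in linear time with a two-pointer sweep over two m/z-sorted peak lists. It is an assumption-laden illustration, not LipidXplorer's algorithm or its query language; `align_peaks` and the 5 ppm default are hypothetical.

```python
from typing import List, Sequence, Tuple


def align_peaks(
    mz_a: Sequence[float], mz_b: Sequence[float], tol_ppm: float = 5.0
) -> List[Tuple[int, int]]:
    """Pair peaks of two m/z-sorted peak lists that agree within tol_ppm.

    Linear-time two-pointer sweep; each peak is used at most once.
    """
    pairs: List[Tuple[int, int]] = []
    i = j = 0
    while i < len(mz_a) and j < len(mz_b):
        tolerance = mz_a[i] * tol_ppm * 1e-6  # ppm tolerance scales with mass
        if abs(mz_a[i] - mz_b[j]) <= tolerance:
            pairs.append((i, j))
            i += 1
            j += 1
        elif mz_a[i] < mz_b[j]:
            i += 1  # peak in A has no partner; advance the list with the smaller mass
        else:
            j += 1


    return pairs


print(align_peaks([500.0, 760.585, 800.0], [760.588, 900.0]))  # [(1, 0)]
```

A greedy sweep like this can mis-pair peaks when two candidates fall inside the same tolerance window; a production aligner would resolve such conflicts, for example by keeping the closest match.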
|
585 |
Evaluating Response Images From Protein Quantification. Engström, Mathias; Olby, Erik. January 2020
Gyros Protein Technologies develops instruments for automated immunoassays. Fluorescent antibodies are added to samples and excited with a laser. This results in a 16-bit image in which the intensity correlates with the concentration of bound antibody. Artefacts caused by dust, fibers, or other problems may appear in the images and affect the quantification. This project seeks to detect such artefacts automatically by classifying the images as good or bad using deep convolutional neural networks (DCNNs). To augment the dataset, a simulation approach is used: a simulation program is developed that generates images from purpose-built simulation models. Several classification models and training techniques are tested. The best-performing classifier is a VGG16 DCNN, pre-trained on simulated images, which reaches 94.8% accuracy. The bad class contains many sub-classes, many of which are severely underrepresented in both the training and test datasets, so little can be said about classification performance on these sub-classes. The conclusion is therefore that, until more of this rare data can be collected, the focus should be on classifying the more common examples. We believe that the approaches from this project could then result in a high-performing product.
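The simulation-based augmentation idea can be sketched as follows. This is not the project's simulation model: it assumes, purely for illustration, that a clean response image is a Gaussian fluorescence spot on a dim background, that a fiber artefact is a thin bright straight line, and that photon shot noise is Poisson. All intensities, sizes, and the names `simulate_well_image` and `make_dataset` are hypothetical. The output is a labelled 16-bit image set of the kind that could be used to pre-train a classifier.

```python
import numpy as np


def simulate_well_image(size: int = 64, with_fiber: bool = False, seed=None) -> np.ndarray:
    """Return one hypothetical 16-bit response image, optionally with a fiber artefact."""
    rng = np.random.default_rng(seed)
    yy, xx = np.mgrid[0:size, 0:size]
    c = (size - 1) / 2
    # Clean response: Gaussian fluorescence spot on a dim background (toy intensities).
    signal = 2000.0 + 20000.0 * np.exp(-((xx - c) ** 2 + (yy - c) ** 2) / (2 * (size / 8) ** 2))
    if with_fiber:
        # Artefact: bright straight line with random angle and offset from the centre.
        angle = rng.uniform(0.0, np.pi)
        offset = rng.uniform(-size / 4, size / 4)
        dist = np.abs((xx - c) * np.sin(angle) - (yy - c) * np.cos(angle) - offset)
        signal = signal + 30000.0 * (dist < 1.0)
    noisy = rng.poisson(signal)  # photon shot noise
    return np.clip(noisy, 0, np.iinfo(np.uint16).max).astype(np.uint16)


def make_dataset(n: int, fiber_fraction: float = 0.5, seed: int = 0):
    """Images of shape (n, size, size) and labels: 1 = bad (artefact), 0 = good."""
    rng = np.random.default_rng(seed)
    labels = rng.random(n) < fiber_fraction
    images = np.stack([
        simulate_well_image(with_fiber=bool(bad), seed=int(rng.integers(1 << 31)))
        for bad in labels
    ])
    return images, labels.astype(np.int64)


images, labels = make_dataset(8)
print(images.shape, images.dtype)  # (8, 64, 64) uint16
```

Generating artefacts procedurally makes it cheap to oversample rare bad sub-classes, which is one way to address the underrepresentation the abstract describes.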
|
586 |
Tumor spread with artificial evolution: The Warburg effect and the metabolism of cancer cells. Näsström, David; Medhage, Marcus. January 2022
This report aims to implement a method for simulating cancer cells and to improve understanding of how the Warburg effect, that is, cancer cells' use of anaerobic metabolism under aerobic conditions, affects cancer cells. This is investigated through computer simulations of how the oxygen level affects the proportion of anaerobic cancer cells in a tumor and the tumor's spread. Five different oxygen levels are examined. The simulation uses a cellular automaton model and starts with a small number of cancer cells in the middle of a 200 × 200 grid, surrounded by healthy cells. The cancer cells and their decision mechanisms are modeled with artificial neural networks, and the healthy cells with fixed rules. When dividing, cancer cells can mutate and give rise to new behaviors, which then become part of the selection process. The simulations show that the cancer cells spread in a similar way regardless of the oxygen level. As some of the cancer cells switch from aerobic to anaerobic metabolism, the tumor acidifies its surroundings, which kills healthy cells. The oxygen level turns out to affect the proportion of anaerobic cells in the tumor, but a marked increase in that proportion is observed mainly at the lowest oxygen level. Notably, for all oxygen levels, the proportion of anaerobic cells in this study is considerably lower than the 60% reported in some studies of the Warburg effect in living cells.
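The simulation loop described above can be sketched as a toy cellular automaton. The sketch below is hypothetical and much simpler than the report's model: each cancer cell's "neural network" is reduced to a single neuron (a weight on local oxygen plus a bias) that decides between aerobic and anaerobic metabolism, anaerobic cells may kill adjacent healthy cells through acidification, cancer cells divide into empty neighbouring sites with Gaussian mutation of the daughter's weights, and healthy cells follow one fixed rule (rare spontaneous turnover). All rates and names are assumptions made for illustration.

```python
import random
from dataclasses import dataclass
from typing import Iterator, List, Tuple, Union

HEALTHY = "healthy"  # healthy cells follow fixed rules; None marks an empty site


@dataclass(frozen=True)
class CancerCell:
    # Hypothetical one-neuron "network": weight on local oxygen plus a bias.
    w_oxygen: float
    bias: float

    def is_anaerobic(self, oxygen: float) -> bool:
        # sigmoid(w*o + b) > 0.5 exactly when the pre-activation is positive
        return self.w_oxygen * oxygen + self.bias > 0.0

    def mutated(self, rng: random.Random, sd: float) -> "CancerCell":
        # Daughter cell: parent weights plus Gaussian mutation noise
        return CancerCell(self.w_oxygen + rng.gauss(0.0, sd), self.bias + rng.gauss(0.0, sd))


Site = Union[None, str, CancerCell]
Grid = List[List[Site]]


def neighbours(i: int, j: int, n: int) -> Iterator[Tuple[int, int]]:
    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        a, b = i + di, j + dj
        if 0 <= a < n and 0 <= b < n:
            yield a, b


def initial_grid(n: int, seed_cell: CancerCell) -> Grid:
    grid: Grid = [[HEALTHY] * n for _ in range(n)]
    c = n // 2
    for i in range(c - 1, c + 2):  # 3x3 block of cancer cells in the centre
        for j in range(c - 1, c + 2):
            grid[i][j] = seed_cell
    return grid


def step(grid: Grid, oxygen: float, rng: random.Random, p_kill: float = 0.3,
         p_divide: float = 0.5, p_turnover: float = 0.01, mutation_sd: float = 0.1) -> None:
    n = len(grid)
    # Fixed rule for healthy cells: rare spontaneous death frees a site.
    for i in range(n):
        for j in range(n):
            if grid[i][j] == HEALTHY and rng.random() < p_turnover:
                grid[i][j] = None
    sites = [(i, j) for i in range(n) for j in range(n) if isinstance(grid[i][j], CancerCell)]
    rng.shuffle(sites)
    for i, j in sites:
        cell = grid[i][j]
        if cell.is_anaerobic(oxygen):
            # Toy acidosis: anaerobic metabolism may kill adjacent healthy cells.
            for a, b in neighbours(i, j, n):
                if grid[a][b] == HEALTHY and rng.random() < p_kill:
                    grid[a][b] = None
        empty = [(a, b) for a, b in neighbours(i, j, n) if grid[a][b] is None]
        if empty and rng.random() < p_divide:
            a, b = rng.choice(empty)
            grid[a][b] = cell.mutated(rng, mutation_sd)


def anaerobic_fraction(grid: Grid, oxygen: float) -> float:
    cells = [s for row in grid for s in row if isinstance(s, CancerCell)]
    return sum(c.is_anaerobic(oxygen) for c in cells) / len(cells) if cells else 0.0


rng = random.Random(1)
grid = initial_grid(30, CancerCell(w_oxygen=-1.0, bias=0.5))
for _ in range(20):
    step(grid, oxygen=0.2, rng=rng)
print(sum(isinstance(s, CancerCell) for row in grid for s in row), anaerobic_fraction(grid, 0.2))
```

Running `step` repeatedly at several `oxygen` values and recording `anaerobic_fraction` mirrors, in miniature, the report's comparison across oxygen levels; selection arises because only cells that free up space (here, by acidifying) can keep dividing.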
|
587 |
Generative Modelling and Probabilistic Inference of Growth Patterns of Individual Microbes. Nagarajan, Shashi. January 2022
The fundamental question of how cells maintain their characteristic size remains open. Cell size measurements made through microscopic time-lapse imaging of microfluidic single-cell cultivations have posed serious challenges to classical cell growth models and are supporting the development of newer, more nuanced models that better explain empirical findings. Yet current models are limited to specific cell types, to cell growth under specific microenvironmental conditions, or both. Together with the limited availability of tools for robust analysis of such time-lapse images, this presents an opportunity to advance research on cell growth and size homeostasis through generative, probabilistic modelling and through analysis of how well different statistical estimation and inference techniques recover the parameters of such models. In this thesis, I present a novel model framework for simulating microfluidic single-cell cultivations with 36 different simulation modalities, each integrating dominant cell growth theories and generative modelling techniques. I also present a comparative analysis of how different frequentist and Bayesian probabilistic inference techniques, such as nuisance variable elimination and variational inference, perform in a case study estimating a single model of a microfluidic cell cultivation.
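To make "generative modelling plus parameter recovery" concrete, here is a minimal Python sketch built on one well-known size-homeostasis theory, the adder model (each cell adds a roughly constant size Δ between birth and division). We assume, for illustration only, that this is representative of the growth theories the framework integrates; the thesis's 36 modalities are not reproduced. The sketch simulates one lineage with exponential growth and symmetric division, then recovers the growth rate by least squares through the origin and Δ by a sample mean. `simulate_adder_lineage` and `estimate_parameters` are hypothetical names.

```python
import math
import random
from statistics import mean
from typing import List, NamedTuple, Tuple


class CellCycle(NamedTuple):
    birth_size: float
    division_size: float
    duration: float


def simulate_adder_lineage(n_generations: int, growth_rate: float, delta: float,
                           birth_size: float, noise_sd: float = 0.0,
                           seed: int = 0) -> List[CellCycle]:
    """Adder model: divide after adding ~delta; exponential growth s(t) = s_b * exp(rate * t)."""
    rng = random.Random(seed)
    cycles: List[CellCycle] = []
    s_b = birth_size
    for _ in range(n_generations):
        added = max(delta + rng.gauss(0.0, noise_sd), 1e-6)  # keep the added size positive
        s_d = s_b + added
        duration = math.log(s_d / s_b) / growth_rate
        cycles.append(CellCycle(s_b, s_d, duration))
        s_b = s_d / 2.0  # symmetric division: follow one daughter
    return cycles


def estimate_parameters(cycles: List[CellCycle]) -> Tuple[float, float]:
    """Growth rate by least squares of log(s_d / s_b) on duration (through origin); delta by mean."""
    num = sum(c.duration * math.log(c.division_size / c.birth_size) for c in cycles)
    den = sum(c.duration ** 2 for c in cycles)
    return num / den, mean(c.division_size - c.birth_size for c in cycles)


cycles = simulate_adder_lineage(60, growth_rate=0.02, delta=2.0, birth_size=1.0)
rate_hat, delta_hat = estimate_parameters(cycles)
print(round(rate_hat, 6), round(delta_hat, 6), round(cycles[-1].birth_size, 6))  # 0.02 2.0 2.0
```

Under the adder rule with symmetric division, birth size converges to Δ regardless of the initial size, which is the homeostatic behaviour the model is meant to capture. With noise-free simulated data the estimators recover the parameters exactly; adding measurement noise to sizes and times is where the choice between frequentist and Bayesian inference starts to matter.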
|
588 |
Catch the dream Wave: Propagation of Cortical Slow Oscillation to the Striatum in anaesthetised mice. Ferreira, Tiago. January 2014
Under anaesthesia or in deep sleep, different parts of the brain display a distinctive slow oscillatory activity, characterised by Up-states of depolarised membrane potential and intense spiking activity, followed by Down-states of hyperpolarisation and quiescence. This activity has previously been described in vitro and in vivo in the cortex and the striatum of several species. Here, we study it in the anaesthetised mouse brain. Whole-cell patch-clamp recordings of cortical cells made it possible to compare different signal processing methods for extracting Up- and Down-states from extracellular recordings of the cortex. Our results show that the method based on multi-unit activity (MUA, > 200 Hz) is more accurate than those based on the high-gamma range (20-100 Hz) or on wavelet decomposition (< 2 Hz band). The most robust method was then used to compare the intracellular recordings of striatal cells with recordings from different parts of the cortex. The results support a functional connection between dorsolateral striatal neurons and the ipsilateral barrel field, as well as between dorsomedial striatal cells and the primary visual cortex. Analysis of the delays between recordings established temporal relationships between the contralateral barrel field, the ipsilateral barrel field, and the dorsolateral striatum, and between the ipsilateral barrel field, the ipsilateral primary visual cortex, and the dorsomedial striatum. / <p>External Advisor: Dr. Ramon Reig, from Karolinska Institutet</p>
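The MUA-based detection of Up- and Down-states can be sketched as: keep only the > 200 Hz band of the extracellular signal, rectify it, smooth it, and threshold the resulting envelope. The Python sketch below is not the thesis's pipeline. It assumes an FFT-based brick-wall high-pass filter, a 50 ms moving-average smoother, and a threshold halfway between the 10th and 90th percentiles of the envelope; all three choices and the names `mua_envelope` and `detect_up_states` are hypothetical. A synthetic signal with alternating high-variance (Up) and low-variance (Down) epochs plus a slow 1 Hz component serves as a check.

```python
import numpy as np


def mua_envelope(lfp: np.ndarray, fs: float, cutoff_hz: float = 200.0,
                 smooth_s: float = 0.05) -> np.ndarray:
    """Rectified, smoothed high-frequency (MUA-band) power proxy."""
    spectrum = np.fft.rfft(lfp)
    freqs = np.fft.rfftfreq(len(lfp), d=1.0 / fs)
    spectrum[freqs < cutoff_hz] = 0.0            # brick-wall high-pass
    high = np.fft.irfft(spectrum, n=len(lfp))
    win = max(1, int(round(smooth_s * fs)))
    kernel = np.ones(win) / win                  # moving-average smoother
    return np.convolve(np.abs(high), kernel, mode="same")


def detect_up_states(lfp: np.ndarray, fs: float, **kwargs) -> np.ndarray:
    """Boolean mask: True where the MUA envelope exceeds a percentile-midpoint threshold."""
    env = mua_envelope(lfp, fs, **kwargs)
    threshold = 0.5 * (np.percentile(env, 10) + np.percentile(env, 90))
    return env > threshold


# Synthetic check: 4 s at 2 kHz, alternating 0.5 s Up (noisy) and Down (quiet) epochs.
fs = 2000.0
t = np.arange(8000) / fs
truth = (np.floor(t / 0.5) % 2) == 0
rng = np.random.default_rng(0)
lfp = rng.normal(0.0, 1.0, t.size) * np.where(truth, 5.0, 1.0) + 10.0 * np.sin(2 * np.pi * t)
up = detect_up_states(lfp, fs)
print(f"agreement with ground truth: {(up == truth).mean():.3f}")
```

The large slow sinusoid is removed by the high-pass step, which is the intuition for why a > 200 Hz MUA proxy can track Up-states more robustly than lower-frequency bands that are dominated by the slow oscillation itself.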
|
589 |
Integrative Analysis of Multimodal Biomedical Data with Machine Learning. Zhi Huang (11170170). 23 July 2021
<div>With the rapid development of high-throughput technologies and next-generation sequencing (NGS) over the past decades, the bottleneck for advances in computational biology and bioinformatics research has shifted from data collection to data analysis. Understanding and interpreting high-dimensional biomedical data, one of the central goals of precision health, is of major interest in computational biology and bioinformatics. Since significant effort has been committed to harnessing biomedical data for multiple analyses, this thesis aims to develop new machine learning approaches that help discover and interpret the complex mechanisms and interactions behind high-dimensional features in biomedical data. The thesis also studies the prediction of post-treatment response from histopathologic images with machine learning.</div><div><br></div><div>The important features behind biomedical data can be captured in many ways, such as network and correlation analyses, dimensionality reduction, and image processing. In this thesis, we use co-expression analysis, survival analysis, and matrix decomposition in both supervised and unsupervised settings. We use co-expression analysis for upfront feature engineering and implement survival regression in deep learning to predict patient survival and discover associated factors. By integrating Cox proportional hazards regression into a non-negative matrix factorization algorithm, we uncover latent clusters of human genes. Using machine learning and an automatic feature extraction workflow, we extract thirty-six image features from histopathologic images and use them to predict post-treatment response. 
In addition, a web portal written in R is built to facilitate future biomedical studies and analyses.</div><div><br></div><div>In conclusion, driven by machine learning algorithms, this thesis focuses on the integrative analysis of multimodal biomedical data, especially supervised survival prognosis for cancer patients, the recognition of latent gene clusters, and the prediction of post-treatment response from histopathologic images. The proposed algorithms outperform other state-of-the-art models and provide new insights for future biomedical and cancer studies.</div>
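Survival regression in deep learning is commonly trained by minimizing the negative Cox log partial likelihood of the network's predicted log-hazards; we assume, without confirmation from the abstract, that this is the kind of objective meant. The Python sketch below computes that loss for a small batch, treating ties with the Breslow convention (all patients with time ≥ tᵢ are in the risk set of patient i). `cox_neg_log_partial_likelihood` is a hypothetical name, and the O(n²) loop is chosen for clarity rather than speed.

```python
import math
from typing import Sequence


def cox_neg_log_partial_likelihood(
    risk: Sequence[float], time: Sequence[float], event: Sequence[int]
) -> float:
    """Negative Cox log partial likelihood (Breslow handling of ties).

    risk[i]  : predicted log-hazard (e.g. the output of a network) for patient i
    time[i]  : follow-up time; event[i] = 1 if the event was observed, 0 if censored
    """
    loss = 0.0
    for i, observed in enumerate(event):
        if not observed:
            continue  # censored patients contribute only through other patients' risk sets
        at_risk = [risk[j] for j in range(len(risk)) if time[j] >= time[i]]
        top = max(at_risk)  # log-sum-exp with max shift for numerical stability
        log_denominator = top + math.log(sum(math.exp(r - top) for r in at_risk))
        loss -= risk[i] - log_denominator
    return loss


# Equal risks, three observed events: loss = log(3) + log(2) + log(1) = log(6).
print(cox_neg_log_partial_likelihood([0.0, 0.0, 0.0], [1.0, 2.0, 3.0], [1, 1, 1]))
```

In a deep survival model this quantity (often divided by the number of events) would be minimized with respect to the network weights that produce `risk`; the same partial likelihood is what a Cox-regularized matrix factorization would add to its reconstruction objective.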
|
590 |
Place and Route Algorithms for a Neuromorphic Communication Network Simulator. Pettersson, Fredrik. January 2021
In recent years, neural networks have attracted increased interest from both the cognitive computing and the computational neuroscience fields. Neuromorphic computing systems simulate neural networks efficiently but have not yet reached the number of neurons in a mammalian brain. Increasing this number is a goal, but more neurons will also increase the traffic load of the system. The placement of neurons onto the neuromorphic computing system has a significant effect on the network load. This thesis introduces algorithms for placing large numbers of neurons efficiently and flexibly. First, placement algorithms from very-large-scale integration (VLSI) design are analyzed, showing that their computational complexity is high. By exploiting the predefined underlying structure of the neural network, faster algorithms can be used. The results show that the population placement algorithm is fast to compute and also produces very good placements.
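The placement problem can be illustrated with a small greedy sketch that exploits population structure: instead of placing individual neurons, whole neuron populations are assigned to cores of a 2D mesh so that traffic-weighted hop distance stays small. This is a hypothetical heuristic, not the thesis's population placement algorithm. It assumes a mesh with Manhattan (XY-routing) hop distance, a known traffic volume between population pairs, and one population per core; `greedy_place`, `placement_cost`, and `hops` are illustrative names.

```python
from itertools import product
from typing import Dict, List, Tuple

Core = Tuple[int, int]
Traffic = Dict[Tuple[str, str], float]  # (population a, population b) -> traffic volume


def hops(a: Core, b: Core) -> int:
    """Manhattan distance on a 2D mesh, i.e. hop count under XY routing."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def placement_cost(traffic: Traffic, placement: Dict[str, Core]) -> float:
    """Total network load: traffic volume times hop distance, summed over population pairs."""
    return sum(w * hops(placement[a], placement[b]) for (a, b), w in traffic.items())


def greedy_place(populations: List[str], traffic: Traffic, width: int, height: int) -> Dict[str, Core]:
    """Place the busiest populations first, each on the free core with least added load."""
    free = set(product(range(width), range(height)))
    if len(populations) > len(free):
        raise ValueError("more populations than cores")
    volume = {p: 0.0 for p in populations}
    for (a, b), w in traffic.items():
        volume[a] += w
        volume[b] += w
    placement: Dict[str, Core] = {}
    for p in sorted(populations, key=lambda q: -volume[q]):
        def added_cost(core: Core) -> float:
            cost = 0.0
            for (a, b), w in traffic.items():
                other = b if a == p else (a if b == p else None)
                if other is not None and other in placement:
                    cost += w * hops(core, placement[other])
            return cost
        best = min(sorted(free), key=added_cost)  # sorted() makes ties deterministic
        placement[p] = best
        free.remove(best)
    return placement


chain = {("A", "B"): 1.0, ("B", "C"): 1.0, ("C", "D"): 1.0}
layout = greedy_place(["A", "B", "C", "D"], chain, width=2, height=2)
print(layout, placement_cost(chain, layout))  # every communicating pair ends up one hop apart
```

Working at the population level keeps the search space proportional to the number of populations rather than the number of neurons, which is the kind of speed-up the abstract attributes to exploiting the network's predefined structure.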
|