1 |
Polynomial Models for Systems Biology: Data Discretization and Term Order Effect on Dynamics
Dimitrova, Elena Stanimirova 12 September 2006
Systems biology aims at system-level understanding of biological systems, in particular cellular networks. The milestones of this understanding are knowledge of the structure of the system, understanding of its dynamics, effective control methods, and powerful prediction capability. The complexity of biological systems makes it inevitable to consider mathematical modeling in order to achieve these goals.
The enormous accumulation of experimental data representing the activities of the living cell has triggered an increasing interest in the reverse engineering of biological networks from data. In particular, construction of discrete models for reverse engineering of biological networks is receiving attention, with the goal of providing a coarse-grained description of such networks. In this dissertation we consider the modeling framework of polynomial dynamical systems over finite fields constructed from experimental data. We present and propose solutions to two problems inherent in this modeling method: the necessity of appropriate discretization of the data and the selection of a particular polynomial model from the set of all models that fit the data.
Data discretization, also known as binning, is a crucial issue for the construction of discrete models of biological networks. Experimental data, however, are usually continuous or at least represented as floating-point numbers. A major challenge in discretizing biological data, such as those collected through microarray experiments, is the typically small sample size. Many discretization methods are not applicable because the amount of data is insufficient. The method proposed in this work is a first attempt to develop a discretization tool that takes into consideration the issues and limitations inherent in short time courses of data. Our focus is on the two characteristics that any discretization method should possess in order to be used for dynamic modeling: preservation of dynamics and information content, and inhibition of noise.
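As a point of reference for what binning means here, the following is a minimal sketch of two generic schemes, equal-width intervals and explicit cut points. It is not the method proposed in the dissertation; the function names and the sample series are hypothetical.

```python
from bisect import bisect_right

def equal_width_bins(values, k):
    """Map each value to a bin index 0..k-1 using k equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0 for _ in values]
    width = (hi - lo) / k
    # min(..., k - 1) keeps the maximum value inside the last bin
    return [min(int((v - lo) / width), k - 1) for v in values]

def threshold_bins(values, cuts):
    """Map each value to the number of cut points that are <= the value."""
    cuts = sorted(cuts)
    return [bisect_right(cuts, v) for v in values]

# Hypothetical short time course (six measurements of one variable)
series = [0.1, 0.4, 0.35, 0.8, 0.9, 0.2]
print(equal_width_bins(series, 3))
print(threshold_bins(series, [0.3, 0.6]))
```

The two schemes disagree on the third measurement (0.35), which shows how the choice of discretization alone can change the discrete time course that a model is later fitted to.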
In the construction of polynomial models for the reverse engineering of biological networks, the collection of all polynomials that vanish on a given set of data points, the so-called ideal of points, is of particular importance. Polynomial ideals can be represented through a special finite generating set, known as a Gröbner basis, that possesses some desirable properties. For a given ideal, however, the Gröbner basis may not be unique, since its computation depends on the choice of leading terms for the multivariate polynomials in the ideal. The correspondence between data points and uniqueness of Gröbner bases is studied in this dissertation. More specifically, an algorithm is developed for finding all minimal sets of points that, when added to the given set, yield an ideal of points with a unique Gröbner basis. This question is of interest in itself, but the main motivation for studying it was its relevance to the construction of polynomial dynamical systems.
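To see why the ideal of points governs model selection, note that two polynomials differing by an element of the ideal agree on every data point. The sketch below checks this over the field with three elements; the data points and the polynomials f and g are hypothetical and chosen only for illustration.

```python
P = 3  # assumed field size: arithmetic modulo the prime 3

# Hypothetical discretized data points (x, y) with entries in F_3
points = [(0, 1), (1, 2), (2, 2)]

def f(x, y):
    """One candidate model polynomial: x + y."""
    return (x + y) % P

def g(x, y):
    """x * (y - 2): vanishes on every data point, so it lies in the ideal of points."""
    return (x * (y - 2)) % P

def h(x, y):
    """A second model, f + g, indistinguishable from f on the data."""
    return (f(x, y) + g(x, y)) % P

# g vanishes on the data, so f and h fit the data equally well
print([g(x, y) for x, y in points])
print(all(f(x, y) == h(x, y) for x, y in points))

# Away from the data the two models can disagree
print(f(1, 0), h(1, 0))
```

Because every such h fits the data, a rule is needed to pick one representative; reducing modulo a Gröbner basis is such a rule, and its result depends on the chosen term order whenever the basis is not unique.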
This research has been partially supported by NIH Grant Nr. RO1GM068947-01. / Ph. D.
|
2 |
Vers une approche hybride mêlant arbre de classification et treillis de Galois pour de l'indexation d'images / Towards an hybrid model between decision trees and Galois lattice for image indexing and classification
Girard, Nathalie 05 July 2013
Image classification generally consists of two steps: extracting image signatures, then analyzing the extracted data, which are generally quantitative. Many classification models have been proposed in the literature; the choice of the most suitable one is often guided by classification performance and model readability. Decision trees and Galois lattices are two symbolic models known for their readability. In her thesis [Guillas 2007], Guillas used Galois lattices effectively for image classification, and strong structural links with decision trees were highlighted. The work presented in this manuscript builds on those results and aims to define a hybrid of the two models that combines their advantages (the readability of both, the robustness of the lattice, and the low memory footprint of the tree). To this end, a study of the links between the two models brought out their differences. The first is the type of discretization: decision trees generally use a local discretization, whereas Galois lattices, originally defined for binary data, use a global one. From a study of the properties of dichotomic lattices (lattices defined after discretization), we propose a local discretization for lattices that improves their classification performance and reduces their structural complexity. The second is simplification: the post-pruning implemented in most decision trees aims to reduce their complexity but also to improve their generalization performance, whereas simplifications of the lattice structure (exponential in the size of the data in the worst case) are motivated solely by reducing structural complexity. By combining these two simplifications, we propose a simplification of the lattice structure obtained after our local discretization. It leads to a hybrid classification model that retains the readability of both models while being less complex than the lattice yet as efficient.
|
3 |
Algorithms for modeling and simulation of biological systems; applications to gene regulatory networks
Vera-Licona, Martha Paola 27 June 2007
Systems biology is an emergent field focused on developing a system-level understanding of biological systems. In the last decade, advances in genomics, transcriptomics and proteomics have gathered a remarkable amount of data, making it possible to ground system-level analysis at the molecular level. The reverse-engineering of biochemical networks from experimental data has become a central focus in systems biology. A variety of methods have been proposed for the study and identification of the system's structure and/or dynamics.
The objective of this dissertation is to introduce and propose solutions to some of the challenges inherent in reverse-engineering of biological systems.
First, previously developed reverse-engineering algorithms are studied and compared using data from a simulated network. This study draws attention to the necessity for a uniform benchmark that enables an objective comparison and performance evaluation of reverse-engineering methods.
Several reverse-engineering algorithms require discrete data as input (e.g. dynamic Bayesian network methods, Boolean networks), so discretization methods are used to prepare the data. By comparing the performance of two network inference algorithms on discrete data produced by several different discretization methods, this work shows that data discretization is an important step in applying network inference methods to experimental data.
Next, a reverse-engineering algorithm is proposed within the framework of polynomial dynamical systems over finite fields. This algorithm is built for the identification of the underlying network structure and dynamics; it uses as input gene expression data and, when available, a priori knowledge of the system. An evolutionary algorithm is used as the heuristic search method for exploring the solution space. Computational algebra tools delimit the search space and also enable a description of model complexity. The performance and robustness of the algorithm are explored via an artificial network of the segment polarity genes in D. melanogaster.
Once a mathematical model has been built, it can be used to run simulations of the biological system under study. Comparison of simulated dynamics with experimental measurements can help refine the model or provide insight into qualitative properties of the system's dynamical behavior. Within this work, we propose an efficient algorithm to describe the phase space, in particular to compute the number and length of all limit cycles of linear systems over a general finite field.
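To make the notion of limit cycles in a finite phase space concrete, the sketch below enumerates every state of a linear system x -> Ax over a prime field and records the length of each cycle. This is a brute-force illustration whose cost grows with the number of states, not the efficient algorithm the dissertation proposes; the function names and the example matrix are hypothetical, and the field is assumed to be the integers modulo a prime p.

```python
from itertools import product

def step(A, x, p):
    """Apply one update x -> A x with arithmetic modulo the prime p."""
    return tuple(sum(a * xi for a, xi in zip(row, x)) % p for row in A)

def limit_cycle_lengths(A, p):
    """Return the sorted lengths of all limit cycles of x -> A x over F_p."""
    n = len(A)
    on_known_cycle = set()
    lengths = []
    for start in product(range(p), repeat=n):
        # After p**n steps any trajectory has left its transient part
        x = start
        for _ in range(p ** n):
            x = step(A, x, p)
        if x in on_known_cycle:
            continue
        # Walk once around the newly found cycle and measure it
        length = 0
        y = x
        while True:
            on_known_cycle.add(y)
            y = step(A, y, p)
            length += 1
            if y == x:
                break
        lengths.append(length)
    return sorted(lengths)

# Hypothetical example: a 2x2 system over F_2
A = [[0, 1], [1, 1]]
print(limit_cycle_lengths(A, 2))
```

For this matrix the zero state is a fixed point and the three nonzero states form a single cycle, so the phase space has one limit cycle of length 1 and one of length 3.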
This research has been partially supported by NIH Grant Nr. RO1GM068947-01. / Ph. D.
|
4 |
Aproksimativna diskretizacija tabelarno organizovanih podataka / Approximative Discretization of Table-Organized Data
Ognjenović Višnja 27 September 2016
This dissertation analyses the influence of data distribution on the results of discretization algorithms within the machine learning process. Using selected databases and discretization algorithms from rough set theory and decision trees, it investigates how the data distribution relates to the cuts produced by a given discretization.
The change in consistency of the discretized table was monitored as a function of the position of the reduced cut on the histogram. Fixed cuts are defined according to the segmentation of the multimodal distribution; on their basis the remaining cuts can be reduced. To determine the fixed cuts, the algorithm FixedPoints was constructed, which determines them in accordance with a rough segmentation of the multimodal distribution.
An algorithm for approximate discretization, APPROX MD, was constructed for cut reduction. It uses cuts obtained by the maximum discernibility (MD-Heuristic) algorithm and parameters related to the percentage of imprecise rules, the total classification percentage, and the number of reduction cuts. The algorithm was compared with the MD algorithm and with the MD algorithm with approximate solutions for α = 0.95.
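The dependence of table consistency on which cut is removed can be illustrated with a small check: a discretized decision table is consistent when no two objects share the same discretized attribute values but carry different decisions. The sketch below is a generic illustration, not the FixedPoints or APPROX MD algorithm; the table, cut points and function names are hypothetical.

```python
from bisect import bisect_right

def discretize(rows, cuts):
    """Replace each attribute value by the index of its interval.

    rows: list of attribute tuples; cuts: one sorted list of cut points per attribute.
    """
    return [tuple(bisect_right(c, v) for v, c in zip(row, cuts)) for row in rows]

def is_consistent(rows, decisions, cuts):
    """True if equal discretized rows never have different decisions."""
    seen = {}
    for key, d in zip(discretize(rows, cuts), decisions):
        if seen.setdefault(key, d) != d:
            return False
    return True

# Hypothetical decision table with two numeric attributes
rows = [(1.0, 5.0), (2.0, 5.5), (3.0, 1.0), (4.0, 1.5)]
decisions = ["a", "b", "b", "a"]

full_cuts = [[1.5, 3.5], [3.0]]
without_3_5 = [[1.5], [3.0]]      # drop the cut 3.5 on the first attribute
without_3_0 = [[1.5, 3.5], []]    # drop the cut 3.0 on the second attribute

print(is_consistent(rows, decisions, full_cuts))
print(is_consistent(rows, decisions, without_3_5))
print(is_consistent(rows, decisions, without_3_0))
```

In this toy table, removing the cut on the second attribute leaves the table consistent, while removing the cut 3.5 merges two objects with different decisions, so not every cut is equally safe to reduce.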
|