Global ETD Search

11	Querying semistructured data based on schema matching Bergholz, André 24 January 2000 (has links) Daten werden noch immer groesstenteils in Dateien und nicht in Datenbanken gespeichert. Dieser Trend wird durch den Internetboom der 90er Jahre nur noch verstaerkt. Daraus ist das Forschungsgebiet der semistrukturierten Daten entstanden. Semistrukturierte Daten sind Daten, die meist in Dokumenten gespeichert sind und eine implizite und irregulaere Struktur aufweisen. HTML- oder BibTeX-Dateien oder in ASCII-Dateien gespeicherte Genomdaten sind Beispiele. Traditionelles Datenbankmanagement erfordert Design und sichert Deklarativitaet zu. Dies ist im Umfeld der semistrukturierten Daten nicht gegeben, ein flexiblerer Ansatz wird gebraucht. In dieser Arbeit wird ein neuer Ansatz des Abfragens semistrukturierter Daten praesentiert. Wir schlagen vor, semistrukturierte Daten durch eine Menge von partiellen Schemata zu beschreiben, anstatt zu versuchen, ein globales Schema zu definieren. Letzteres ist zwar geeignet, einen effizienten Zugriff auf Daten zu ermoeglichen; ein globales Schema fuer semistrukturierte Daten leidet aber zwangslaeufig an der Irregularitaet der Struktur der Daten. Wegen der vielen Ausnahmen vom intendierten Schema wird ein globales Schema schnell sehr gross und wenig repraesentativ. Damit wird dem Nutzer ein verzerrtes Bild ueber die Daten gegeben. Hingegen koennen partielle Schemata eher ein repraesentatives Bild eines Teils der Daten darstellen. Mit Hilfe statistischer Methoden kann die Guete eines partiellen Schemas bewertet werden, ebenso koennen irrelevante Teile der Datenbank identifiziert werden. Ein Datenbanksystem, das auf partiellen Schemata basiert, ist flexibler und reflektiert den Grad der Strukturierung auf vielen Ebenen. Seine Benutzbarkeit und seine Performanz steigen mit einem hoeheren Grad an Struktur und mit seiner Nutzungsdauer. Partielle Schemata koennen auf zwei Arten gewonnen werden. 
Erstens koennen sie durch einen Datenbankdesigner bereitgestellt werden. Es ist so gut wie unmoeglich, eine semistrukturierte Datenbank komplett zu modellieren, das Modellieren gewisser Teile ist jedoch denkbar. Zweitens koennen partielle Schemata aus Benutzeranfragen gewonnen werden, wenn nur die Anfragesprache entsprechend entworfen und definiert wird. Wir schlagen vor, eine Anfrage in einen ``Was''- und einen ``Wie''-Teil aufzuspalten. Der ``Was''-Teil wird durch partielle Schemata repraesentiert. Partielle Schemata beinhalten reiche semantische Konzepte, wie Variablendefinitionen und Pfadbeschreibungen, die an Konzepte aus Anfragesprachen angelehnt sind. Mit Variablendefinitionen koennen verschiedene Teile der Datenbank miteinander verbunden werden. Pfadbeschreibungen helfen, durch das Zulassen einer gewissen Unschaerfe, die Irregularitaet der Struktur der Daten zu verdecken. Das Finden von Stellen der Datenbank, die zu einem partiellen Schema passen, bildet die Grundlage fuer alle Arten von Anfragen. Im ``Wie''-Teil der Anfrage werden die gefundenen Stellen der Datenbank fuer die Antwort modifiziert. Dabei koennen Teile der gefundenen Entsprechungen des partiellen Schemas ausgeblendet werden oder auch die Struktur der Antwort voellig veraendert werden. Wir untersuchen die Ausdrucksstaerke unserer Anfragesprache, in dem wir einerseits die Operatoren der relationalen Algebra abbilden und andererseits das Abfragen von XML-Dokumenten demonstrieren. Wir stellen fest, dass das Finden der Entsprechungen eines Schemas (wir nennen ein partielles Schema in der Arbeit nur Schema) den aufwendigsten Teil der Anfragebearbeitung ausmacht. Wir verwenden eine weitere Abstraktionsebene, die der Constraint Satisfaction Probleme, um die Entsprechungen eines Schemas in einer Datenbank zu finden. Constraint Satisfaction Probleme bilden eine allgemeine Klasse von Suchproblemen. Fuer sie existieren bereits zahlreiche Optimierungsalgorithmen und -heuristiken. 
Die Grundidee besteht darin, Variablen mit zugehoerigen Domaenen einzufuehren und dann die Werte, die verschiedene Variablen gleichzeitig annehmen koennen, ueber Nebenbedingungen zu steuern. In unserem Ansatz wird das Schema in Variablen ueberfuehrt, die Domaenen werden aus der Datenbank gebildet. Nebenbedingungen ergeben sich aus den im Schema vorhandenen Praedikaten, Variablendefinitionen und Pfadbeschreibungen sowie aus der Graphstruktur des Schemas. Es werden zahlreiche Optimierungstechniken fuer Constraint Satisfaction Probleme in der Arbeit vorgestellt. Wir beweisen, dass die Entsprechungen eines Schemas in einer Datenbank ohne Suche und in polynomialer Zeit gefunden werden koennen, wenn das Schema ein Baum ist, keine Variablendefinitionen enthaelt und von der Anforderung der Injektivitaet einer Einbettung abgesehen wird. Zur Optimierung wird das Enthaltensein von Schemata herangezogen. Das Enthaltensein von Schemata kann auf zwei Weisen, je nach Richtung der Enthaltenseinsbeziehung, genutzt werden: Entweder kann der Suchraum fuer ein neues Schema reduziert werden oder es koennen die ersten passenden Stellen zu einem neuen Schema sofort praesentiert werden. Der gesamte Anfrageansatz wurde prototypisch zunaechst in einem Public-Domain Prolog System, spaeter im Constraintsystem ECLiPSe implementiert und mit Anfragen an XML-Dokumente getestet. Dabei wurden die Auswirkungen verschiedener Optimierungen getestet. Ausserdem wird eine grafische Benutzerschnittstelle zur Verfuegung gestellt. / Most of today's data is still stored in files rather than in databases. This fact has become even more evident with the growth of the World Wide Web in the 1990s. Because of that observation, the research area of semistructured data has evolved. Semistructured data is typically stored in documents and has an irregular, partial, and implicit structure. The thesis presents a new framework for querying semistructured data. 
Traditional database management requires design and ensures declarativity. The possibilities for design are limited in the field of semistructured data, so a more flexible approach is needed. We argue that semistructured data should be represented by a set of partial schemata rather than by one complete schema. Because of irregularities in the data, a complete schema would be very large and not representative. Instead, partial schemata can serve as good representations of parts of the data. While finding a complete schema turns out to be difficult, a database designer may be able to provide partial schemata for the database. Also, partial schemata can be extracted from user queries if the query language is designed appropriately. We suggest splitting the notion of a query into a ``What''-part and a ``How''-part. Partial schemata represent the ``What''-part. They cover semantically richer concepts than database schemata traditionally do. Among these concepts are predicates, variable definitions, and path descriptions. Schemata can be used for query optimization, but they also give users hints on the content of the database. Finding the occurrences (matches) of such a schema forms the most important part of query execution. All queries of our approach, such as the focus query or the transformation query, are based on this matching. Query execution can be optimized using knowledge about containment relationships between different schemata. Our approach and the optimization techniques are conceptually modeled and implemented as a prototype on the basis of Constraint Satisfaction Problems (CSPs). CSPs form a general class of search problems for which many techniques and heuristics exist. A CSP consists of variables, each with an associated domain. Constraints restrict the values that variables can simultaneously take. We transform the problem of finding the matches of a schema in a database into a CSP. 
We prove that under certain conditions the matches of a schema can be found without any search and in polynomial time. For optimization purposes the containment relationship between schemata is explored. We formulate a sufficient condition for schema containment and test it, again using CSP techniques. The containment relationship can be used in two ways depending on the direction of the containment: either the search space can be reduced when looking for matches of a schema, or the first few matches can be presented immediately without any search. Our approach has been implemented in the constraint system ECLiPSe and tested using XML documents. Semistrukturierte Daten Anfragesprachen Anfragebearbeitung Constraint Satisfaction Probleme Semistructured data Query languages Query processing Constraint Satisfaction Problems 510 Mathematik 27 Mathematik ST 270 ddc:510
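The CSP formulation described in this abstract (one variable per schema node, domains drawn from the database, constraints from the schema's graph structure) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the toy database, the schema, and the function name `find_matches` are hypothetical, plain backtracking stands in for the optimization techniques of the thesis, and this is not the ECLiPSe prototype. Predicates, variable definitions, path descriptions, and injectivity are left out.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy database: node id -> label, plus directed edges.
DB_LABELS: Dict[int, str] = {1: "bib", 2: "book", 3: "title", 4: "book", 5: "author"}
DB_EDGES: Set[Tuple[int, int]] = {(1, 2), (1, 4), (2, 3), (4, 5)}

# Hypothetical partial schema: a "book" node that has a "title" child.
SCHEMA_LABELS: Dict[str, str] = {"b": "book", "t": "title"}
SCHEMA_EDGES: List[Tuple[str, str]] = [("b", "t")]


def find_matches(
    schema_labels: Dict[str, str],
    schema_edges: List[Tuple[str, str]],
    db_labels: Dict[int, str],
    db_edges: Set[Tuple[int, int]],
) -> List[Dict[str, int]]:
    """Return every assignment of schema nodes to database nodes that
    respects labels and edges (injectivity is not required)."""
    # One CSP variable per schema node; its domain is every database
    # node carrying the required label.
    domains = {
        var: [node for node, label in db_labels.items() if label == wanted]
        for var, wanted in schema_labels.items()
    }
    order = list(domains)
    results: List[Dict[str, int]] = []

    def consistent(assignment: Dict[str, int]) -> bool:
        # Edge constraint: if both ends of a schema edge are assigned,
        # the database must contain the corresponding edge.
        return all(
            (assignment[u], assignment[v]) in db_edges
            for u, v in schema_edges
            if u in assignment and v in assignment
        )

    def backtrack(i: int, assignment: Dict[str, int]) -> None:
        if i == len(order):
            results.append(dict(assignment))
            return
        var = order[i]
        for value in domains[var]:
            assignment[var] = value
            if consistent(assignment):
                backtrack(i + 1, assignment)
            del assignment[var]

    backtrack(0, {})
    return results


if __name__ == "__main__":
    print(find_matches(SCHEMA_LABELS, SCHEMA_EDGES, DB_LABELS, DB_EDGES))
```

On this toy input the only match maps `b` to node 2 and `t` to node 3, because node 4 is a book without a title child.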
12	Acquisition de contraintes par apprentissage de structures / Learning and Using Structures for Constraint Acquisition Daoudi, Abderrazak 10 May 2016 (has links) La Programmation par contraintes est un cadre général utilisé pour modéliser et résoudre des problèmes combinatoires complexes. Cependant, la modélisation d'un problème sous forme d’un réseau de contraintes nécessite une bonne expertise dans le domaine. Ce niveau d'expertise est un obstacle majeur pour une large diffusion de la programmation par contraintes. Pour remédier à ce problème, plusieurs systèmes d'acquisition de contraintes ont été proposés pour aider l'utilisateur dans la tâche de modélisation. Dans ces systèmes, l'utilisateur ne répond qu'à des questions très simples. L'inconvénient est que lorsqu'aucune connaissance de base n’est fournie, l'utilisateur peut avoir besoin de répondre à un grand nombre de questions pour apprendre toutes les contraintes. Dans cette thèse, nous montrons que l'utilisation de la structure du problème peut améliorer considérablement le processus d'acquisition. Pour ce faire, nous proposons plusieurs techniques. Tout d'abord, nous introduisons le concept de requête de généralisation basée sur une agrégation de variables sous forme de types. Deuxièmement, pour faire face aux requêtes de généralisation, nous proposons un algorithme de généralisation de contraintes, nommé GENACQ, ainsi que plusieurs stratégies. Troisièmement, pour rendre la construction de requêtes de généralisation totalement indépendante de l'utilisateur, nous proposons l'algorithme MINE&ASK, qui est en mesure d'apprendre la structure au cours du processus d'acquisition de contraintes, et d'utiliser la structure apprise pour générer des requêtes de généralisation. Quatrièmement, pour aller vers un concept générique de requête, nous introduisons la requête de recommandation basée sur la prédiction de liens dans le graphe de contraintes apprises jusqu’à présent. 
Cinquièmement, nous proposons un algorithme de recommandation de contraintes, appelé PREDICT&ASK, qui demande à l’utilisateur de classifier des requêtes de recommandation chaque fois que la structure du graphe courant a été modifiée. Enfin, nous intégrons toutes ces nouvelles techniques dans l’algorithme QUACQ, menant à trois nouvelles versions, à savoir G-QUACQ, M-QUACQ, et P-QUACQ. Pour évaluer toutes ces techniques, nous avons fait des expérimentations sur plusieurs jeux de données. Les résultats montrent que les versions étendues améliorent considérablement le QUACQ de base. / Constraint Programming is a general framework used to model and solve complex combinatorial problems. However, modeling a problem as a constraint network requires significant expertise in the field. Such a level of expertise is a bottleneck to the broader uptake of constraint technology. To alleviate this issue, several constraint acquisition systems have been proposed to assist the non-expert user in the modeling task. In these systems, the user is only asked to answer very basic questions. The drawback is that when no background knowledge is provided, the user may need to answer a large number of such questions to learn all the constraints. In this thesis, we show that using the structure of the problem under consideration can considerably improve the acquisition process. To this end, we propose several techniques. Firstly, we introduce the concept of a generalization query based on an aggregation of variables into types. Secondly, to deal with generalization queries, we propose a constraint generalization algorithm, named GENACQ, together with several strategies. Thirdly, to make the construction of generalization queries totally independent of the user, we propose the algorithm MINE&ASK, which is able to learn the structure during the constraint acquisition process and to use the learned structure to generate generalization queries. 
Fourthly, toward a generic concept of query, we introduce the recommendation query based on link prediction on the current constraint graph. Fifthly, we propose a constraint recommender algorithm, called PREDICT&ASK, that asks recommendation queries each time the structure of the current graph has been modified. Finally, we incorporate all these new generic techniques into the QUACQ algorithm, leading to three boosted versions: G-QUACQ, M-QUACQ, and P-QUACQ. To evaluate all these techniques, we have run experiments on several benchmarks. The results show that the extended versions drastically improve on the basic QUACQ. Programmation par contraintes Acquisition de contraintes Structures Apprentissage automatique Intelligence artificielle Constraint Programming Constraint Satisfaction Problems Constraint Acquisition Structures Machine Learning Artificial Intelligence
13	Inferência de redes de regulação gênica utilizando o paradigma de crescimento de sementes / Inference of gene regulatory networks using the seed growing paradigm Higa, Carlos Henrique Aguena 17 February 2012 (has links) Um problema importante na área de Biologia Sistêmica é o de inferência de redes de regulação gênica. Os avanços científicos e tecnológicos nos permitem analisar a expressão gênica de milhares de genes simultaneamente. Por "expressão gênica", estamos nos referindo ao nível de mRNA dentro de uma célula. Devido a esta grande quantidade de dados, métodos matemáticos, estatísticos e computacionais têm sido desenvolvidos com o objetivo de elucidar os mecanismos de regulação gênica presentes nos organismos vivos. Para isso, modelos matemáticos de redes de regulação gênica têm sido propostos, assim como algoritmos para inferir estas redes. Neste trabalho, focamos nestes dois aspectos: modelagem e inferência. Com relação à modelagem, estudamos modelos existentes para o ciclo celular da levedura (Saccharomyces cerevisiae). Após este estudo, propomos um modelo baseado em redes Booleanas probabilísticas sensíveis ao contexto, e em seguida, um aprimoramento deste modelo, utilizando cadeias de Markov não homogêneas. Mostramos os resultados, comparando os nossos modelos com os modelos estudados. Com relação à inferência, propomos um novo algoritmo utilizando o paradigma de crescimento de semente de genes. Neste contexto, uma semente é um pequeno subconjunto de genes de interesse. Nosso algoritmo é baseado em dois passos: passo de crescimento de semente e passo de amostragem. No primeiro passo, o algoritmo adiciona outros genes à esta semente, seguindo algum critério. No segundo, o algoritmo realiza uma amostragem de redes, definindo como saída um conjunto de redes potencialmente interessantes. Aplicamos o algoritmo em dados artificiais e dados biológicos de células HeLa, mostrando resultados satisfatórios. 
/ A key problem in Systems Biology is the inference of gene regulatory networks. Scientific and technological advances allow us to analyze the expression of thousands of genes simultaneously. By "gene expression" we refer to the mRNA concentration level inside a cell. Due to this large amount of data, mathematical, statistical and computational methods have been developed in order to elucidate the gene regulatory mechanisms present in living organisms. To this end, mathematical models of gene regulatory networks have been proposed, along with algorithms to infer these networks. In this work, we focus on two aspects: modeling and inference. Regarding modeling, we studied existing models for the yeast (Saccharomyces cerevisiae) cell cycle. After that, we proposed a model based on context-sensitive probabilistic Boolean networks, and then an improvement of this model using nonhomogeneous Markov chains. We show the results, comparing our models against the studied models. Regarding inference, we proposed a new algorithm using the seed growing paradigm. In this context, a seed is a small subset of genes of interest. Our algorithm is based on two main steps: a seed growing step and a sampling step. In the first step, the algorithm adds genes to the seed according to some criterion. In the second step, the algorithm performs a sampling process on the space of networks, defining as its output a set of potentially interesting networks. We applied the algorithm to artificial data and to biological data from HeLa cells, showing satisfactory results. Boolean networks cadeia de Markov constraint satisfaction problems feature selection gene regulatory networks inference inferência de redes Markov chain redes Booleanas redes de regulação gênica seleção de características
14	Razonamiento espacial cualitativo con relaciones cardinales basado en problemas de satisfacción de restricciones y lógicas modales Morales Nicolás, Antonio 18 June 2010 (has links) El objetivo de esta tesis es proponer mejoras en modelos existentes de razonamiento espacial cualitativo con relaciones cardinales, y proponer nuevos modelos y técnicas de razonamiento utilizando algunos resultados previos del razonamiento temporal cualitativo. Los modelos propuestos se basan en dos formalismos muy utilizados para razonamiento cualitativo: los Problemas de Satisfacción de Restricciones y las Lógicas Modales. / The main goal of this PhD Thesis is to propose improvements to existing models for qualitative spatial reasoning with cardinal direction relations, and to propose new models and reasoning techniques using some previous results from qualitative temporal reasoning. The proposed models are based on two widely used formalisms for Qualitative Reasoning: Constraint Satisfaction Problems and Modal Logics. Modal Logics Constraint Satisfaction Problems Razonamiento Espacial Cualitativo Inteligencia Artificial 004 16 512
15	Statistical Physics of Sparse and Dense Models in Optimization and Inference / Physique statistique des modèles épars et denses en optimisation et inférence Schmidt, Hinnerk Christian 10 October 2018 (has links) Une donnée peut avoir diverses formes et peut provenir d'un large panel d'applications. Habituellement, une donnée possède beaucoup de bruit et peut être soumise aux effets du hasard. Les récents progrès en apprentissage automatique ont relancé les recherches théoriques sur les limites des différentes méthodes probabilistes de traitement du signal. Dans cette thèse, nous nous intéressons aux questions suivantes : quelle est la meilleure performance possible atteignable ? Et comment peut-elle être atteinte, i.e., quelle est la stratégie algorithmique optimale ? La réponse dépend de la forme des données. Les sujets traités dans cette thèse peuvent tous être représentés par des modèles graphiques. Les propriétés des données déterminent la structure intrinsèque du modèle graphique correspondant. Les structures considérées ici sont soit éparses, soit denses. Les questions précédentes peuvent être étudiées dans un cadre probabiliste, qui permet d'apporter des réponses typiques. Un tel cadre est naturel en physique statistique et crée une analogie formelle avec la physique des systèmes désordonnés. En retour, cela permet l'utilisation d'outils spécifiques à ce domaine et de résoudre des problèmes de satisfaction de contraintes et d'inférence statistique. La problématique de performance optimale est directement reliée à la structure des extrema de la fonction d'énergie libre macroscopique, tandis que les aspects algorithmiques proviennent eux de la minimisation de la fonction d'énergie libre microscopique (c'est-à-dire, dans la forme de Bethe). Cette thèse est divisée en quatre parties. 
Premièrement, nous aborderons par une approche de physique statistique le problème de la coloration de graphes aléatoires et mettrons en évidence un certain nombre de caractéristiques. Dans un second temps, nous calculerons une nouvelle limite supérieure de la taille de l'ensemble contagieux. Troisièmement, nous calculerons le diagramme de phase du modèle de Dawid et Skene dans la région dense en modélisant le problème par une factorisation matricielle de petit rang. Enfin, nous calculerons l'erreur optimale de Bayes pour une classe restreinte de l'estimation matricielle de rang élevé. / Datasets come in a variety of forms and from a broad range of different applications. Typically, the observed data is noisy or in some other way subject to randomness. The recent developments in machine learning have revived the need for exact theoretical limits of probabilistic methods that recover information from noisy data. In this thesis we are concerned with the following two questions: what is the asymptotically best achievable performance? And how can this performance be achieved, i.e., what is the optimal algorithmic strategy? The answer depends on the properties of the data. The problems in this thesis can all be represented as probabilistic graphical models. The generative process of the data determines the structure of the underlying graphical model. The structures considered here are either sparse random graphs or dense (fully connected) models. The above questions can be studied in a probabilistic framework, which leads to an average (or typical) case answer. Such a probabilistic formulation is natural to statistical physics and leads to a formal analogy with problems in disordered systems. In turn, this makes it possible to apply the methods developed in the study of disordered systems to constraint satisfaction and statistical inference problems. The formal analogy can be exploited as follows. 
The optimal performance analysis is directly related to the structure of the extrema of the macroscopic free energy. The algorithmic aspects follow from the minimization of the microscopic free energy (that is, the Bethe free energy in this work) which is closely related to message passing algorithms. This thesis is divided into four contributions. First, a statistical physics investigation of the circular coloring problem is carried out that reveals several distinct features. Second, new rigorous upper bounds on the size of minimal contagious sets in random graphs, with bounded maximum degree, are obtained. Third, the phase diagram of the dense Dawid-Skene model is derived by mapping the problem onto low-rank matrix factorization. The associated approximate message passing algorithm is evaluated on real-world data. Finally, the Bayes optimal denoising mean square error is derived for a restricted class of extensive rank matrix estimation problems. Systèmes désordonnés Verres de spin Inférence bayésienne Graphe aléatoire Problème de satisfaction de contraintes Disordered systems Spin glasses Bayesian inference Random graphs Constraint satisfaction problems
16	Complexity issues in counting, polynomial evaluation and zero finding / Complexité de problèmes de comptage, d’évaluation et de recherche de racines de polynômes Briquel, Irénée 29 November 2011 (has links) Dans cette thèse, nous cherchons à comparer la complexité booléenne classique et la complexité algébrique, en étudiant des problèmes sur les polynômes. Nous considérons les modèles de calcul algébriques de Valiant et de Blum, Shub et Smale (BSS). Pour étudier les classes de complexité algébriques, il est naturel de partir des résultats et des questions ouvertes dans le cas booléen, et de regarder ce qu'il en est dans le contexte algébrique. La comparaison des résultats obtenus dans ces deux domaines permet ainsi d'enrichir notre compréhension des deux théories. La première partie suit cette approche. En considérant un polynôme canoniquement associé à toute formule booléenne, nous obtenons un lien entre les questions de complexité booléenne sur la formule booléenne et les questions de complexité algébrique sur le polynôme. Nous avons étudié la complexité du calcul de ce polynôme dans le modèle de Valiant en fonction de la complexité de la formule booléenne, et avons obtenu des analogues algébriques à certains résultats booléens. Nous avons aussi pu utiliser des méthodes algébriques pour améliorer certains résultats booléens, en particulier de meilleures réductions de comptage. Une autre motivation aux modèles de calcul algébriques est d'offrir un cadre pour l’analyse d’algorithmes continus. La seconde partie suit cette approche. Nous sommes partis d’algorithmes nouveaux pour la recherche de zéros approchés d'un système de n polynômes complexes à n inconnues. Jusqu'à présent il s'agissait d'algorithmes pour le modèle BSS. Nous avons étudié l'implémentabilité de ces algorithmes sur un ordinateur booléen et proposons un algorithme booléen. 
/ In the present thesis, we compare classical boolean complexity with algebraic complexity by studying problems related to polynomials. We consider the algebraic models of Valiant and of Blum, Shub and Smale (BSS). To study the algebraic complexity classes, one can start from results and open questions in the boolean case and look at their translation into the algebraic context. Comparing the results obtained in the two settings then improves our understanding of both complexity theories. The first part follows this framework. By considering a polynomial canonically associated with a boolean formula, we get a link between boolean complexity issues on the formula and algebraic complexity problems on the polynomial. We studied the complexity of computing the polynomial in Valiant's model as a function of the complexity of the boolean formula. We found algebraic counterparts to some boolean results. Along the way, we could also use some algebraic methods to improve boolean results, in particular by getting better counting reductions. Another motivation for algebraic models of computation is to offer an elegant framework for the study of numerical algorithms. The second part of this thesis follows this approach. We started from new algorithms for the search of approximate zeros of complex systems of n polynomials in n variables. Up to now, those were BSS machine algorithms. We studied the implementation of these algorithms on digital computers, and propose an algorithm using floating-point arithmetic for this problem. Complexité algorithmique Complexité algébrique Modèle de Valiant Machine BSS Algorithmic complexity Algebraic complexity Valiant's model BSS machine Constraint Satisfaction Problems Floating point arithmetic
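The link between a boolean formula and a polynomial mentioned in this abstract can be made concrete with a small Python sketch. The abstract does not define the polynomial, so the definition used here is an assumption: the multilinear polynomial that sums, over every satisfying assignment e of the formula, the product of the variables x_i with e_i = 1. The names `formula_polynomial` and `phi` are hypothetical. With this definition, evaluating at the all-ones point counts the satisfying assignments, which is one way a counting problem and a polynomial evaluation problem meet.

```python
from itertools import product
from typing import Callable, Sequence, Tuple


def formula_polynomial(
    formula: Callable[[Tuple[int, ...]], bool],
    n: int,
    point: Sequence[int],
) -> int:
    """Evaluate, by brute force over all 2^n assignments, the polynomial
    sum over satisfying e of prod_{i : e_i = 1} x_i at the given point.
    (Assumed definition, chosen for illustration only.)"""
    total = 0
    for bits in product((0, 1), repeat=n):
        if formula(bits):
            term = 1
            for x, e in zip(point, bits):
                if e:
                    term *= x
            total += term
    return total


def phi(e: Tuple[int, ...]) -> bool:
    # Hypothetical example formula: (x0 OR x1) AND NOT x2.
    return bool((e[0] or e[1]) and not e[2])


if __name__ == "__main__":
    # At the all-ones point the value is the number of satisfying assignments.
    print(formula_polynomial(phi, 3, (1, 1, 1)))
```

For `phi` the polynomial is x0 + x1 + x0·x1, so the all-ones evaluation gives 3. The brute-force loop is exponential in n and only serves to show the definition at work.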
18	Learning during search / Apprendre durant la recherche combinatoire Arbelaez Rodriguez, Alejandro 31 May 2011 (has links) La recherche autonome est un nouveau domaine d'intérêt de la programmation par contraintes, motivé par l'importance reconnue de l'utilisation de l'apprentissage automatique pour le problème de sélection de l'algorithme le plus approprié pour une instance donnée, avec une variété d'applications, par exemple: Planification, Configuration d'horaires, etc. En général, la recherche autonome a pour but le développement d'outils automatiques pour améliorer la performance d'algorithmes de recherche, e.g., trouver la meilleure configuration des paramètres pour un algorithme de résolution d'un problème combinatoire. Cette thèse présente l'étude de trois points de vue pour l'automatisation de la résolution de problèmes combinatoires; en particulier, les problèmes de satisfaction de contraintes, les problèmes d'optimisation combinatoire, et les problèmes de satisfiabilité (SAT). Tout d'abord, nous présentons domFD, une nouvelle heuristique pour le choix de variable, dont l'objectif est de calculer une forme simplifiée de dépendance fonctionnelle, appelée dépendance-relaxée. Ces dépendances-relaxées sont utilisées pour guider l'algorithme de recherche à chaque point de décision. Ensuite, nous révisons la méthode traditionnelle pour construire un portefeuille d'algorithmes pour le problème de la prédiction de la structure des protéines. Nous proposons un nouveau paradigme de recherche-perpétuelle dont l'objectif est de permettre à l'utilisateur d'obtenir la meilleure performance de son moteur de résolution de contraintes. La recherche-perpétuelle utilise deux modes opératoires: le mode d'exploitation utilise le modèle en cours pour solutionner les instances de l'utilisateur; le mode d'exploration réutilise ces instances pour s'entraîner et améliorer la qualité d'un modèle d'heuristiques par le biais de l'apprentissage automatique. 
Cette deuxième phase est exécutée quand l'unité de calcul est disponible (idle-time). Finalement, la dernière partie de cette thèse considère l'ajout de la coopération au cours d'exécution d'algorithmes de recherche locale parallèle. De cette façon, on montre que si on partage la meilleure configuration de chaque algorithme dans un portefeuille parallèle, la performance globale peut être considérablement améliorée. / Autonomous Search is a new emerging area in Constraint Programming, motivated by the demonstrated importance of the application of Machine Learning techniques to the Algorithm Selection Problem, and with potential applications ranging from planning and configuring to scheduling. This area aims at developing automatic tools to improve the performance of search algorithms to solve combinatorial problems, e.g., selecting the best parameter settings for a constraint solver to solve a particular problem instance. In this thesis, we study three different points of view to automatically solve combinatorial problems; in particular Constraint Satisfaction, Constraint Optimization, and SAT problems. First, we present domFD, a new Variable Selection Heuristic whose objective is to heuristically compute a simplified form of functional dependencies called weak dependencies. These weak dependencies are then used to guide the search at each decision point. Second, we study the Algorithm Selection Problem from two different angles. On the one hand, we review a traditional portfolio algorithm to learn offline a heuristics model for the Protein Structure Prediction Problem. On the other hand, we present the Continuous Search paradigm, whose objective is to allow any user to eventually get their constraint solver to achieve top performance on their problems. 
Continuous Search comes in two modes: the functioning mode solves the user's problem instances using the current heuristics model; the exploration mode reuses these instances to train and improve the heuristics model through Machine Learning during the computer idle time. Finally, the last part of the thesis considers the question of adding a knowledge-sharing layer to current portfolio-based parallel local search solvers for SAT. We show that by sharing the best configuration of each algorithm in the parallel portfolio on a regular basis and aggregating this information in special ways, the overall performance can be greatly improved. Portfolio d'algorithmes Apprentissage Automatique Apprentissage Supervisée SAT Recherche locale Portfolio Algorithms Machine Learning Supervised Learning Constraint Satisfaction Problems Constraint Optimization Problems SAT Local Search
19	Vers un couplage des processus de conception de systèmes et de planification de projets : formalisation de connaissances méthodologiques et de connaissances métier / Towards a coupling of system design and project planning processes : formalization of methodological knowledge and business knowledge Abeille, Joël 06 July 2011 (has links) Les travaux présentés dans cette thèse s'inscrivent dans une problématique d'aide à la conception de systèmes, à la planification de leur projet de développement et à leur couplage. L'aide à la conception et à la planification repose sur la formalisation de deux grands types de connaissances : les connaissances méthodologiques utilisables quel que soit le projet de conception et, les connaissances métier spécifiques à un type de conception et/ou de planification donné. Le premier chapitre de la thèse propose un état de l'art concernant les travaux sur le couplage des processus de conception de systèmes et de planification des projets associés et expose la problématique de nos travaux. Deux parties traitent ensuite, d'une part, des connaissances méthodologiques et, d'autre part, des connaissances métier. La première partie expose trois types de couplages méthodologiques. Le couplage structurel propose de formaliser les entités de conception et de planification puis permet leur création et leur association. Le couplage informationnel définit les attributs de faisabilité et de vérification pour ces entités et synchronise les états de ces dernières vis-à-vis de ces attributs. Enfin, le couplage décisionnel consiste à proposer, dans un même espace et sous forme de tableau de bord, les informations nécessaires et suffisantes à la prise de décision par les acteurs du projet de conception. La seconde partie propose de formaliser, d'exploiter et de capitaliser la connaissance métier. 
Après avoir formalisé ces connaissances sous forme d'une ontologie de concepts, deux mécanismes sont exploités : un mécanisme de réutilisation de cas permettant de réutiliser, en les adaptant, les projets de conception passés et un mécanisme de propagation de contraintes permettant de propager des décisions de la conception vers la planification et réciproquement. / The work presented in this thesis deals with aiding system design, the planning of the associated development projects, and their coupling. Aiding design and planning is based on the formalization of two kinds of knowledge: methodological knowledge that can be used in all kinds of design projects, and business knowledge that is specific to a particular kind of design and/or planning. The first chapter presents the state of the art on coupling the system design process and the project planning process, and states the problem addressed by our work. Then, two parts deal with design and planning coupling through, on the one hand, methodological knowledge and, on the other hand, business knowledge. The first part presents three types of methodological coupling. The structural coupling formalizes design and planning entities and permits their creation and association. The informational coupling defines feasibility and verification attributes for these entities and synchronizes the states of the entities with respect to these attributes. Finally, the decisional coupling consists of presenting, in a single dashboard, the information necessary and sufficient for the design project actors to make a decision. The second part proposes to formalize, exploit and capitalize on business knowledge. This knowledge is formalized as an ontology of concepts. Then, two mechanisms are exploited: a case reuse mechanism that makes it possible to reuse and adapt former design projects, and a constraint propagation mechanism that propagates decisions from design to planning and vice versa. 
Ingénierie Système Conception Planification Couplage conception/planification Raisonnement à Partir de Cas Ingénierie des Connaissances System Engineering Design Planning Design/Planning Coupling Case-Based Reasoning Constraint Satisfaction Problems Knowledge Engineering
20	Statistical physics of constraint satisfaction problems Lamouchi, Elyes 10 1900 (has links) La technique des répliques est une technique formidable prenant ses origines de la physique statistique, comme un moyen de calculer l'espérance du logarithme de la constante de normalisation d'une distribution de probabilité à haute dimension. Dans le jargon de physique, cette quantité est connue sous le nom de l’énergie libre, et toutes sortes de quantités utiles, telle que l’entropie, peuvent être obtenues de là par des dérivées. Cependant, ceci est un problème NP-difficile, qu’une bonne partie de statistique computationnelle essaye de résoudre, et qui apparaît partout; de la théorie des codes, à la statistique en hautes dimensions, en passant par les problèmes de satisfaction de contraintes. Dans chaque cas, la méthode des répliques, et son extension par (Parisi et al., 1987), se sont prouvées fort utiles pour illuminer quelques aspects concernant la corrélation des variables de la distribution de Gibbs et la nature fortement nonconvexe de son logarithme négatif. Algorithmiquement, il existe deux principales méthodologies adressant la difficulté de calcul que pose la constante de normalisation: a). Le point de vue statique: dans cette approche, on reformule le problème en tant que graphe dont les nœuds correspondent aux variables individuelles de la distribution de Gibbs, et dont les arêtes reflètent les dépendances entre celles-ci. Quand le graphe en question est localement un arbre, les procédures de message-passing sont garanties d’approximer arbitrairement bien les probabilités marginales de la distribution de Gibbs et de manière équivalente d'approximer la constante de normalisation. Les prédictions de la physique concernant la disparition des corrélations à longues portées se traduisent donc, par le fait que le graphe soit localement un arbre, ainsi permettant l’utilisation des algorithmes locaux de passage de messages. Ceci va être le sujet du chapitre 4. b). 
Le point de vue dynamique: dans une direction orthogonale, on peut contourner le problème que pose le calcul de la constante de normalisation, en définissant une chaîne de Markov le long de laquelle, l’échantillonnage converge à celui selon la distribution de Gibbs, tel qu’après un certain nombre d’itérations (sous le nom de temps de relaxation), les échantillons sont garantis d’être approximativement générés selon elle. Afin de discuter des conditions dans lesquelles chacune de ces approches échoue, il est très utile d’être familier avec la méthode de replica symmetry breaking de Parisi. Cependant, les calculs nécessaires sont assez compliqués, et requièrent des notions qui sont typiquement étrangères à ceux sans un entraînement en physique statistique. Ce mémoire a principalement deux objectifs : i) de fournir une introduction à la théorie des répliques, ses prédictions, et ses conséquences algorithmiques pour les problèmes de satisfaction de contraintes, et ii) de donner un survol des méthodes les plus récentes adressant la transition de phase, prédite par la méthode des répliques, dans le cas du problème k−SAT, à partir du point de vue statique et dynamique, et finir en proposant un nouvel algorithme qui prend en considération la transition de phase en question. / The replica trick is a powerful analytic technique originating from statistical physics as an attempt to compute the expectation of the logarithm of the normalization constant of a high dimensional probability distribution known as the Gibbs measure. In physics jargon this quantity is known as the free energy, and all kinds of useful quantities, such as the entropy, can be obtained from it using simple derivatives. 
The computation of this normalization constant is, however, an NP-hard problem that a large part of computational statistics attempts to deal with, and which shows up everywhere from coding theory to high dimensional statistics, compressed sensing, protein folding analysis and constraint satisfaction problems. In each of these cases, the replica trick, and its extension by (Parisi et al., 1987), have proven incredibly successful at shedding light on key aspects relating to the correlation structure of the Gibbs measure and the highly non-convex nature of its negative logarithm. Algorithmically speaking, there exist two main methodologies addressing the intractability of the normalization constant: a) Statics: in this approach, one casts the system as a graphical model whose vertices represent individual variables, and whose edges reflect the dependencies between them. When the underlying graph is locally tree-like, local message-passing procedures are guaranteed to yield near-exact marginal probabilities or, equivalently, to compute the normalization constant Z. The physics predictions of vanishing long range correlation in the Gibbs measure then translate into the associated graph being locally tree-like, hence permitting the use of message-passing procedures. This will be the focus of chapter 4. b) Dynamics: in an orthogonal direction, we can altogether bypass the issue of computing the normalization constant by defining a Markov chain along which sampling converges to the Gibbs measure, such that after a number of iterations known as the relaxation time, samples are guaranteed to be approximately distributed according to the Gibbs measure. To get into the conditions in which each of the two approaches is likely to fail (strong long range correlation, high energy barriers, etc.), it is very helpful to be familiar with the so-called replica symmetry breaking picture of Parisi. The computations required are, however, quite involved, and come with a number of prescriptions and prerequisite notions (such as 
large deviation principles, saddle-point approximations) that are typically foreign to those without a statistical physics background. The purpose of this thesis is then twofold: i) to provide a self-contained introduction to replica theory, its predictions, and its algorithmic implications for constraint satisfaction problems, and ii) to give an account of state-of-the-art methods for addressing the predicted phase transitions in the case of k−SAT, from both the statics and dynamics points of view, and to propose a new algorithm that takes these into consideration. k-SAT transition de phase méthode des replicas replica-symmetry-breaking chaînes de Markov Monte Carlo marche aléatoire constraint satisfaction problems phase transitions replica trick Markov chain Monte Carlo self-avoiding-walk

Search results