Global ETD Search

11	Inverse multi-objective combinatorial optimization Roland, Julien 12 November 2013 The initial question addressed in this thesis is how to take into account the multi-objective aspect of decision problems in inverse optimization. The most straightforward extension consists of finding a minimal adjustment of the objective function coefficients such that a given feasible solution becomes efficient. However, inverse multi-objective optimization raises more than a single question, because there is usually more than one efficient solution. The way we define inverse multi-objective optimization takes this important aspect into account. This gives rise to many questions, which are identified by a precise notation that highlights a large collection of inverse problems that could be investigated. In this thesis, a selection of inverse problems is presented and solved. This selection is motivated by their possible applications and the interesting theoretical questions they can raise in practice. / Doctorate in Engineering Sciences / info:eu-repo/semantics/nonPublished Informatique générale Combinatorial optimization Optimisation combinatoire Multi-Objective Optimization Inverse Problems Combinatorial Optimization
12	Guarded structural indexes: theory and application to relational RDF databases Picalausa, Francois 20 September 2013 Recent years have seen renewed interest in semi-structured data, thanks to the standardization of Web data exchange formats such as XML and RDF. A notable example is the Linking Open Data Project, which counted more than 31 billion RDF triples at the end of 2011. XML, for its part, remains one of the preferred data formats of many large databases, including Uniprot, the Open Government Initiative and Penn Treebank. This growth in the volume of semi-structured data has prompted increasing interest in the development of suitable databases. Among the approaches that have been proposed, one can distinguish relational approaches and graph approaches, as detailed in Chapter 3. The former aim to exploit existing relational database engines by integrating specialized techniques into them. The latter view semi-structured data as graphs, that is, as sets of nodes linked together by labeled edges, and exploit their structure. One technique in this field, known as structural indexing, aims to summarize data graphs so that the data useful for processing a query can be identified quickly. Classical structural indexes are built on the notions of simulation and bisimulation on graphs. These notions, which are used in many fields such as verification, security and data storage, are relations on the nodes of graphs. Fundamentally, they characterize the fact that two nodes share certain characteristics, such as having the same neighborhood.
Although graph approaches are efficient in practice, they have limitations in the context of RDF and its query language SPARQL. In graph approaches, labels are distinct from the nodes of the graph. In the model described by RDF and supported by SPARQL, however, labels and nodes belong to the same set. This is why graph approaches support only a subset of SPARQL queries. Relational approaches, on the contrary, are faithful to the RDF model and can answer the various SPARQL queries. The question we wish to answer in this thesis is whether relational and graph approaches are incompatible, or whether they can be combined advantageously. In particular, it would be desirable to retain both the performance of graph approaches and the generality of relational approaches. To this end, we build a structural index suited to relational data. We rely on a methodology for designing structural indexes described by Fletcher and his coauthors. This methodology rests on three main components. The first component is a so-called structural characterization of the query language to be supported. The goal here is to identify, as precisely as possible, the data that are returned together by any query of the language. The second component is an algorithm that efficiently groups the data that are returned together, according to the structural characterization. The third component is the index itself: a data structure that makes it possible to identify the groups of data generated by the preceding algorithm in order to answer queries. First, it should be noted that the SPARQL language taken in its entirety does not lend itself to the construction of efficient structural indexes.
Indeed, SPARQL queries are founded on the expression of conjunctive queries. The structural characterization of conjunctive queries is known, but it does not lend itself to the construction of efficient grouping algorithms. Nevertheless, the empirical study of SPARQL queries posed in practice, which we carry out in Chapter 5, shows that these are mainly acyclic conjunctive queries. Acyclic conjunctive queries are known in the literature to admit efficient evaluation algorithms. The first component of our structural index, introduced in Chapter 6, is a characterization of acyclic conjunctive queries. This characterization is given in terms of guarded simulation. For graphs, the notion of simulation is a restricted version of the notion of bisimulation. Similarly, we introduce guarded simulation as a restriction of guarded bisimulation, a known extension of bisimulation to relational data. Chapter 7 provides a second component of our structural index. This component is a data structure called the guarded structural index, which supports the processing of arbitrary conjunctive queries. We show that, coupled with the preceding structural characterization, this index makes it possible to identify optimally the data useful for processing acyclic conjunctive queries. Chapter 8 constitutes the third component of our structural index and proposes efficient methods for computing guarded simulation. Our algorithm essentially consists of transforming a database into a particular graph on which the notions of simulation and guarded simulation coincide. It then becomes possible to reuse existing algorithms for computing simulation relations.
While the preceding chapters lay the necessary foundation for a structural index targeting relational data, they do not yet integrate this index into a relational database engine. This is what Chapter 9 proposes, by developing methods that take the index into account during the processing of a SPARQL query. Convincing experimental results complete this study. This work thus provides a first positive answer to the question of whether the relational and graph approaches to RDF data storage can be combined advantageously. / Doctorate in Engineering Sciences / info:eu-repo/semantics/nonPublished Informatique générale Relational databases Bases de données relationnelles Database RDF Guarded Simulation Structural Index
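The graph notion of simulation on which classical structural indexes are built (entry 12) can be computed by a simple fixpoint refinement. The Python sketch below is illustrative only: the graph encoding as a set of `(source, label, target)` triples, the function name `simulation`, and the naive quadratic refinement loop are assumptions made for this example. It is not the guarded simulation algorithm of the thesis, which works on relational data.

```python
from itertools import product

def simulation(nodes, edges):
    """Return the largest simulation relation on an edge-labeled graph.

    nodes: iterable of node identifiers (hypothetical encoding).
    edges: set of (source, label, target) triples.
    A pair (u, v) is kept when every labeled edge leaving u can be
    matched by an equally labeled edge leaving v whose target in turn
    simulates the target of u's edge.
    """
    nodes = list(nodes)
    out = {n: [] for n in nodes}
    for src, label, dst in edges:
        out[src].append((label, dst))

    # Start from all pairs and discard violating pairs until stable.
    rel = set(product(nodes, nodes))
    changed = True
    while changed:
        changed = False
        for u, v in list(rel):
            ok = all(
                any(lv == lu and (du, dv) in rel for lv, dv in out[v])
                for lu, du in out[u]
            )
            if not ok:
                rel.discard((u, v))
                changed = True
    return rel

# Toy graph: "c" has every kind of outgoing edge that "a" has, plus one more.
edges = {("a", "knows", "b"), ("c", "knows", "d"), ("c", "likes", "d")}
rel = simulation("abcd", edges)
```

In this toy graph, `c` simulates `a` but not the other way round, because `a` has no `likes` edge. Nodes that are related in both directions could be grouped into one index block, which is the intuition behind summarizing a data graph.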
13	Population-based heuristic algorithms for continuous and mixed discrete-continuous optimization problems Liao, Tianjun 28 June 2013 Continuous optimization problems are optimization problems where all variables have a domain that typically is a subset of the real numbers; mixed discrete-continuous optimization problems additionally have other types of variables, so that some variables are continuous and others are on an ordinal or categorical scale. Continuous and mixed discrete-continuous problems have a wide range of applications in disciplines such as computer science, mechanical or electrical engineering, economics and bioinformatics. These problems are also often hard to solve due to inherent difficulties such as a large number of variables, many local optima or other complicating factors. Therefore, in this thesis our focus is on the design, engineering and configuration of high-performing heuristic optimization algorithms. We tackle continuous and mixed discrete-continuous optimization problems with two classes of population-based heuristic algorithms, ant colony optimization (ACO) algorithms and evolution strategies. In a nutshell, the main contributions of this thesis are that (i) we advance the design and engineering of ACO algorithms to algorithms that are competitive with or superior to recent state-of-the-art algorithms for continuous and mixed discrete-continuous optimization problems, (ii) we improve upon a specific state-of-the-art evolution strategy, the covariance matrix adaptation evolution strategy (CMA-ES), and (iii) we extend CMA-ES to tackle mixed discrete-continuous optimization problems. In more detail, we propose a unified ant colony optimization (ACO) framework for continuous optimization (UACOR).
This framework synthesizes algorithmic components of two ACO algorithms that have been proposed in the literature and an incremental ACO algorithm with local search for continuous optimization, which we proposed during my doctoral research. The design of UACOR allows the usage of automatic algorithm configuration techniques to automatically derive new, high-performing ACO algorithms for continuous optimization. We also propose iCMAES-ILS, a hybrid algorithm that loosely couples IPOP-CMA-ES, a CMA-ES variant that uses a restart scheme coupled with an increasing population size, and a new iterated local search (ILS) algorithm for continuous optimization. The hybrid algorithm consists of an initial competition phase, in which IPOP-CMA-ES and the ILS algorithm compete for further deployment during a second phase. A cooperative aspect of the hybrid algorithm is implemented in the form of some limited information exchange from IPOP-CMA-ES to the ILS algorithm during the initial phase. Experimental studies on recent benchmark function suites show that UACOR and iCMAES-ILS are competitive with or superior to other state-of-the-art algorithms. To tackle mixed discrete-continuous optimization problems, we extend ACOMV and propose CESMV, an ant colony optimization algorithm and a covariance matrix adaptation evolution strategy, respectively. In ACOMV and CESMV, the decision variables of an optimization problem can be declared as continuous, ordinal, or categorical, which allows the algorithm to treat them adequately. ACOMV and CESMV include three solution generation mechanisms: a continuous optimization mechanism, a continuous relaxation mechanism for ordinal variables, and a categorical optimization mechanism for categorical variables.
Together, these mechanisms allow ACOMV and CESMV to tackle mixed-variable optimization problems. We also propose a set of artificial, mixed-variable benchmark functions, which can simulate discrete variables as ordered or categorical. We use them to automatically tune the parameters of ACOMV and CESMV and to benchmark their performance. Finally, we test ACOMV and CESMV on various real-world continuous and mixed-variable engineering optimization problems. Comparisons with results from the literature demonstrate the effectiveness and robustness of ACOMV and CESMV on mixed-variable optimization problems. Apart from these main contributions, during my doctoral research I have made a number of additional contributions, which concern (i) a note on bound constraint handling for the CEC'05 benchmark set, (ii) computational results for an automatically tuned IPOP-CMA-ES on the CEC'05 benchmark set and (iii) a study of artificial bee colonies for continuous optimization. These additional contributions are to be found in the appendix to this thesis. / Doctorate in Engineering Sciences / info:eu-repo/semantics/nonPublished Informatique générale Sciences de l'ingénieur Heuristic algorithms Algorithmes heuristiques Heuristic Algorithms Mixed Discrete-Continuous Optimization Continuous Optimization
14	Towards multivariant pathogenicity predictions: Using machine-learning to directly predict and explore disease-causing oligogenic variant combinations Papadimitriou, Sofia 15 September 2020 The emergence of statistical and predictive methods able to analyse genomic data has revolutionised the field of medical genetics, allowing the identification of disease-causing gene variants (i.e. mutations) for several human genetic diseases. Although these approaches have greatly improved our understanding of Mendelian «one gene – one phenotype» genetic models, studying diseases related to more intricate models that involve causative variants in several genes (i.e. oligogenic diseases) remains a challenge, due either to the lack of suitable methodologies and disease-specific cohorts or to the phenotypic complexity associated with such diseases. This situation makes it difficult not only to understand the genetic mechanisms of the disease, but also to offer proper counseling and support to the patient. Until recently, no specialized predictive methods existed to directly predict causative variant combinations for oligogenic diseases. However, with the advent of data on variant combinations in gene pairs (i.e. bilocus variant combinations) leading to disease, collected in the Digenic Diseases Database (DIDA), we hypothesized that the transition from single-variant to variant-combination pathogenicity predictors is now possible. To investigate this hypothesis, we organised our research along two main routes. First, we developed an interpretable variant combination pathogenicity predictor for gene pairs, called VarCoPP. To this end, we trained multiple Random Forest algorithms on pathogenic bilocus variant combinations from DIDA against neutral data from the 1000 Genomes Project and investigated the contribution of the incorporated variant, gene and gene pair features to the prediction outcome.
In the second part, we explored the usefulness of different gene pair burden scores based on this novel predictive method in discovering oligogenic signatures in neurodevelopmental diseases, which involve a spectrum of monogenic to polygenic cases. We performed a preliminary analysis on the Deciphering Developmental Disorders (DDD) project, containing exome data of 4195 families, and assessed the capability of our scores in supporting already diagnosed monogenic cases, discovering significant pairs compared to control cases and linking patients in communities based on the genetic burden they share, using the Leiden community detection algorithm. The performance of VarCoPP shows that it is possible to predict disease-causing bilocus variant combinations with good accuracy both during cross-validation and when testing on new cases. We also show its relevance for disease-related gene panels, and enhance its clinical applicability by defining confidence zones that guarantee with 95% or 99% probability that a prediction is indeed a true positive, guiding clinical researchers towards the most relevant results. This method and additional biological annotations are incorporated in an online platform called ORVAL that allows the prediction and exploration of candidate disease-causing oligogenic variant combinations with predicted gene networks, based on patient variant data. Our preliminary analysis on the DDD cohort shows that, although all bilocus burden scores show advantages, disadvantages and certain types of biases, taking the maximum pathogenicity score present inside a gene pair seems to provide, at the moment, the most unbiased results. We also show that our predictive methods enable us to detect patient communities inside DDD, based exclusively on the shared pathogenic bilocus burden between patients, with more than half of these communities containing enriched phenotypic and molecular pathway information.
Our predictive method is also able to bring to the surface genes that are not officially known to be involved in disease but are nevertheless biologically relevant, as well as a few examples of potential oligogenicity inside the network, paving the way for further exploration of oligogenic signatures for neurodevelopmental diseases. / Doctorate in Sciences / info:eu-repo/semantics/nonPublished Informatique générale Génétique clinique Informatique médicale bioinformatics machine-learning oligogenic diseases neurodevelopmental diseases community detection
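The "maximum pathogenicity score inside a gene pair" burden mentioned in entry 14 can be pictured with a few lines of Python. This is a hedged sketch, not the thesis's implementation: the function name `max_pair_burden`, the `(gene_a, gene_b, score)` tuple layout, the placeholder gene names and the scores are all invented for illustration, and scores are assumed to lie in [0, 1].

```python
from collections import defaultdict

def max_pair_burden(predictions):
    """Map each unordered gene pair to its highest combination score.

    predictions: iterable of (gene_a, gene_b, score) tuples, one per
    predicted variant combination (hypothetical layout); scores are
    assumed to be non-negative.
    """
    burden = defaultdict(float)  # missing pairs start at 0.0
    for gene_a, gene_b, score in predictions:
        pair = tuple(sorted((gene_a, gene_b)))  # ignore gene order
        burden[pair] = max(burden[pair], score)
    return dict(burden)

# Placeholder data, not real predictions.
predictions = [
    ("GENE_A", "GENE_B", 0.42),
    ("GENE_B", "GENE_A", 0.91),
    ("GENE_A", "GENE_C", 0.10),
]
burden = max_pair_burden(predictions)
```

Per-patient burdens of this kind could then feed a patient similarity graph for community detection, although how the thesis builds that graph is not stated in the abstract.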
15	The Design of Vague Spatial Data Warehouses Lopes Siqueira, Thiago Luis 07 December 2015 Spatial data warehouses (SDW) and spatial online analytical processing (SOLAP) enhance decision making by enabling spatial analysis combined with multidimensional analytical queries. An SDW is an integrated and voluminous multidimensional database containing both conventional and spatial data. SOLAP allows querying SDWs with multidimensional queries that select spatial data that satisfy a given topological relationship and that aggregate spatial data. Existing SDW and SOLAP applications mostly consider phenomena represented by spatial data having exact locations and sharp boundaries. They neglect the fact that spatial data may be affected by imperfections, such as spatial vagueness, which prevents distinguishing an object from its neighborhood. A vague spatial object does not have a precisely defined boundary and/or interior. Thus, it may have a broad boundary and a blurred interior, and is composed of parts that certainly belong to it and parts that possibly belong to it. Although several real-world phenomena are characterized by spatial vagueness, no approach in the literature addresses both spatial vagueness and the design of SDWs, nor provides multidimensional analysis over vague spatial data. These shortcomings motivated the elaboration of this doctoral thesis, which addresses both vague spatial data warehouses (vague SDWs) and vague spatial online analytical processing (vague SOLAP). A vague SDW is an SDW that comprises vague spatial data, while vague SOLAP allows querying vague SDWs.
The major contributions of this doctoral thesis are: (i) the Vague Spatial Cube (VSCube) conceptual model, which enables the creation of conceptual schemata for vague SDWs using data cubes; (ii) the Vague Spatial MultiDim (VSMultiDim) conceptual model, which enables the creation of conceptual schemata for vague SDWs using diagrams; (iii) guidelines for designing relational schemata and integrity constraints for vague SDWs, and for extending the SQL language to enable vague SOLAP; (iv) the Vague Spatial Bitmap Index (VSB-index), which improves query processing performance over vague SDWs. The applicability of these contributions is demonstrated in two applications in the agricultural domain, by creating conceptual schemata for vague SDWs, transforming these conceptual schemata into logical schemata for vague SDWs, and efficiently processing queries over vague SDWs. / Doctorate in Engineering Sciences and Technology / Location of the public defense: Universidade Federal de São Carlos, São Carlos, SP, Brazil. / info:eu-repo/semantics/nonPublished Informatique administrative Informatique de gestion Systèmes d'information géographique Informatique générale spatial data warehouses spatial vagueness conceptual modeling logical design indexing
16	Analyzing molecular network perturbations in human cancer: application to mutated genes and gene fusions involved in acute lymphoblastic leukemia Hajingabo, Leon 30 January 2015 The sequencing of the human genome and the emergence of new high-throughput genomic technologies have opened up new investigative models for the systematic analysis of human diseases. Today, we can attempt to understand diseases such as cancer from a more global perspective, by identifying the genes responsible for cancers and by studying how their protein products function within a network of molecular interactions. In this context, we collected the genes specifically linked to acute lymphoblastic leukemia (ALL) and identified new interaction partners that connect key ALL-associated genes such as NOTCH1, FBW7, KRAS and PTPN11 within an interaction network. We also attempted to predict the functional impact of genomic variations such as the gene fusions involved in ALL. Using as models three different chromosomal translocations frequently identified in B-cell ALL, namely ETV6-RUNX1 (TEL-AML1), BCR-ABL1 and E2A-PBX1 (TCF3-PBX1), we adapted an oncogene prediction approach in order to predict molecular perturbations in ALL. We showed that the transcriptomic circuits dependent on Myc and JunD are specifically deregulated following the TEL-AML1 and TCF3-PBX1 gene fusions, respectively. We also identified the NXF1-dependent mRNA transport mechanism as a direct target of the TCF3-PBX1 fusion protein. Through this approach, which combines interactomic data and gene expression analyses, we provided new insight into the molecular understanding of acute lymphoblastic leukemia.
/ Doctorate in Sciences / info:eu-repo/semantics/nonPublished Informatique générale Sciences exactes et naturelles Bioinformatics Bio-informatique Dysregulated network prediction
17	Bioinformatic discovery of novel exons expressed in human brain and their association with neurodevelopmental disorders Reggiani, Claudio 16 March 2018 An important quest in genomics since the publication of the first complete human genome in 2003 has been its functional annotation. DNA holds the instructions for the production of the components necessary for the life of cells and organisms. A complete functional catalog of genomic regions will help the understanding of the cell body and its dynamics, thus creating links between genotype and phenotypic traits. The need for annotations prompted the development of several bioinformatic methods. In the context of promoter and first exon predictors, the majority of models rely principally on structural and chemical properties of the DNA sequence. Some of them integrate information from epigenomic and transcriptomic data as secondary features. Current genomic research asserts that reference genomes are far from being fully annotated (the human one included). Physicians rely on reference genome annotations and functional databases to understand disorders with a genetic basis, and missing annotations may lead to unresolved cases. Because of their complexity, neurodevelopmental disorders are under study to identify all the genetic regions involved. Besides functional validation on model organisms, the search for genotype-phenotype associations is supported by statistical analysis, which is typically biased towards known functional regions. This thesis addresses the use of an in-silico integrative analysis to improve reference genome annotations and discover novel functional regions associated with neurodevelopmental disorders. The contributions outlined in this document have practical applications in clinical settings. The presented bioinformatic method is based on epigenomic and transcriptomic data, thus excluding features from the DNA sequence.
This integrative approach, applied to brain data, allowed the discovery of two novel promoters and coding first exons in the human DLG2 gene, which were also found to be statistically associated with neurodevelopmental disorders, and with intellectual disability in particular. The application of the same methodology to the whole genome resulted in the discovery of other novel exons expressed in the brain. Concerning the in-silico method itself, the research demanded a large number of functional and clinical datasets to properly support and validate our discoveries. This work describes a bioinformatic method for genome annotation, in the specific area of promoters and first exons. So far the method has been applied to brain data, and its extension to whole-body data would be a logical next step. We will leverage distributed frameworks to tackle the even larger amount of data to analyse, a task that has already begun. Another interesting research direction that emerged from this work is the temporal enrichment analysis of epigenomic data across different developmental stages, in which changes in epigenomic enrichment suggest time-specific and tissue-specific regulation of functional genes and gene isoforms. / Doctorate in Sciences / info:eu-repo/semantics/nonPublished Sciences bio-médicales et agricoles Informatique générale Bioinformatics Functional genomics Intellectual disability Neurodevelopmental disorders Promoters Machine Learning DLG2
18	Coupling ant colony system with local search Gambardella, Luca Maria 24 June 2015 In recent decades there has been a lot of interest in computational models and metaheuristic algorithms capable of solving combinatorial optimization problems. The recent trend is to define these algorithms by taking inspiration from the observation of natural systems. This thesis presents the Ant Colony System (ACS), which was inspired by the observation of real ant colonies. ACS is initially proposed to solve the symmetric and asymmetric travelling salesman problems, where it is shown to be competitive with other metaheuristics. Although this is an interesting and promising result, it was immediately clear that ACS, like other metaheuristics, in many cases cannot compete with specialized local search methods. An interesting trend is therefore to couple metaheuristics with a local optimizer, giving birth to so-called hybrid methods. Along this line, the thesis investigates MACS-VRPTW (Multiple ACS for the Vehicle Routing Problem with Time Windows) and HAS-SOP (Hybrid Ant System for the Sequential Ordering Problem, SOP). In the second part, the thesis introduces some modifications of the original ACS algorithm. These modifications speed up the method and make it more competitive on large problem instances. The resulting framework, called Enhanced Ant Colony System, is tested on the SOP. Finally, the thesis presents the application of ACS to real-life vehicle routing problems in which additional constraints and stochastic information are included. / Doctorate in Engineering Sciences / info:eu-repo/semantics/nonPublished Informatique générale Heuristic algorithms Ant algorithms Algorithmes heuristiques Algorithmes de colonies de fourmis metaheuristics ant colony optimization local search
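For readers unfamiliar with ACS (entry 18), the sketch below shows how one ant might build a travelling salesman tour using the pseudo-random proportional rule and the local pheromone update as they are commonly described for ACS. It is a minimal illustration under stated assumptions: the function names, the parameter values (`q0`, `beta`, `rho`, `tau0`) and the four-city distance matrix are chosen for the example and are not taken from the thesis. The global pheromone update on the best tour and the coupling with local search are omitted.

```python
import random

def acs_tour(dist, tau, q0=0.9, beta=2.0, rho=0.1, tau0=0.01, rng=random):
    """Build one tour with the ACS pseudo-random proportional rule.

    dist: symmetric distance matrix; tau: pheromone matrix (updated in
    place). All parameter defaults are illustrative assumptions.
    """
    n = len(dist)
    current = rng.randrange(n)
    tour = [current]
    unvisited = set(range(n)) - {current}
    while unvisited:
        # Attractiveness of each candidate: pheromone * (1 / distance) ** beta.
        weights = {j: tau[current][j] * (1.0 / dist[current][j]) ** beta
                   for j in unvisited}
        if rng.random() < q0:
            # Exploitation: take the most attractive city.
            nxt = max(weights, key=weights.get)
        else:
            # Biased exploration: sample proportionally to attractiveness.
            cities = list(weights)
            nxt = rng.choices(cities, [weights[c] for c in cities])[0]
        # Local pheromone update on the edge just crossed
        # (the closing edge back to the start is left out for brevity).
        tau[current][nxt] = (1 - rho) * tau[current][nxt] + rho * tau0
        tau[nxt][current] = tau[current][nxt]
        tour.append(nxt)
        unvisited.remove(nxt)
        current = nxt
    return tour

def tour_length(dist, tour):
    """Length of the closed tour, including the edge back to the start."""
    return sum(dist[a][b] for a, b in zip(tour, tour[1:] + tour[:1]))

# Made-up four-city instance.
dist = [[0, 2, 9, 10], [2, 0, 6, 4], [9, 6, 0, 3], [10, 4, 3, 0]]
tau = [[0.01] * 4 for _ in range(4)]
tour = acs_tour(dist, tau, rng=random.Random(0))
```

A hybrid method in the spirit of the abstract would pass each constructed tour to a local optimizer (for example a 2-opt style improvement, named here only as a common choice) before any global pheromone update.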
19	A machine learning approach for automatic and generic side-channel attacks Lerman, Liran 10 June 2015 (has links) The ubiquity of interconnected devices has led to massive interest in computer security, which is provided, among other things, by the field of cryptography. For decades, cryptographers estimated the security level of a cryptographic algorithm independently of its implementation in a device. However, since the publication of implementation attacks in 1996, physical attacks, which exploit the physical properties of cryptographic devices, have become an active research area. In this dissertation, we focus on profiled attacks. Traditionally, profiled attacks apply parametric methods that assume a priori information about the physical properties. The field of machine learning produces automatic and generic models that require no a priori information about the phenomenon under study. This dissertation sheds new light on the capabilities of machine learning methods. We first demonstrate that parametric profiled attacks outperform machine learning methods when there is neither estimation error nor assumption error. In contrast, machine-learning-based attacks are advantageous in realistic scenarios where the amount of data available during the learning step is small. We then propose a new formal evaluation metric that makes it possible (1) to compare parametric and non-parametric attacks and (2) to interpret the results of each method. The new measure identifies the causes of an attack's high or low success rate and therefore gives leads for improving the evaluation of an implementation.
Finally, we present experimental results on unprotected and protected devices. The first study shows that machine learning achieves a higher success rate than a parametric method when only a few data are available. The second experiment demonstrates that a protected device can be attacked with a machine learning approach. The machine-learning-based strategy requires the same amount of data during the learning phase as when it attacks an unprotected product. We also show that parametric methods overestimate or underestimate the security level provided by the device, whereas the machine-learning-based approach improves this estimate. In summary, our thesis is that machine-learning-based attacks are advantageous over classical techniques when both the amount of a priori information about the target device and the amount of data available during the learning phase are small. / Doctorat en Sciences / info:eu-repo/semantics/nonPublished Informatique générale Cryptography Machine learning Cryptographie Apprentissage automatique time series classification cryptanalysis cryptography side-channel attack machine learning power analysis
20	Qualitative analysis of probabilistic synchronizing systems / Analyse qualitative des systèmes probabilistes synchronisants Shirmohammadi, Mahsa 10 December 2014 (has links) Markov decision processes (MDPs) are finite-state probabilistic systems with both strategic and random choices, and are therefore well established as models of the interaction between a controller and its randomly responding environment. An MDP can be viewed mathematically as a one-and-a-half-player stochastic game played in rounds, in which the controller chooses an action and the environment chooses a successor according to a fixed probability distribution. There are two incomparable views on the behavior of an MDP once the strategic choices are fixed. In the traditional view, an MDP is a generator of sequences of states, called the state-outcome; the winning condition of the player is thus expressed as a set of desired sequences of states visited during the game, e.g. a Borel condition such as reachability. The computational complexity of the related decision problems and the memory requirements of winning strategies for state-outcome conditions are well studied. Recently, MDPs have been viewed as generators of sequences of probability distributions over states, called the distribution-outcome. We introduce synchronizing conditions defined on distribution-outcomes, which intuitively require that the probability mass accumulate in some state (or group of states), possibly in the limit.
A probability distribution is p-synchronizing if the probability mass is at least p in some state, and a sequence of probability distributions is always, eventually, weakly, or strongly p-synchronizing if, respectively, all, some, infinitely many, or all but finitely many distributions in the sequence are p-synchronizing. For each synchronizing mode, an MDP can be (i) sure winning if there is a strategy that produces a 1-synchronizing sequence; (ii) almost-sure winning if there is a strategy that produces a sequence that is (1-epsilon)-synchronizing for all epsilon > 0; (iii) limit-sure winning if, for all epsilon > 0, there is a strategy that produces a (1-epsilon)-synchronizing sequence. We consider the problem of deciding whether an MDP is winning, for each synchronizing and winning mode: we establish matching upper and lower complexity bounds for these problems, as well as the memory requirements for optimal winning strategies. As a further contribution, we study synchronization in probabilistic automata (PAs), which are a kind of MDP in which controllers are restricted to word-strategies, i.e. they cannot observe the history of the system execution, only the number of choices made so far. The synchronizing language of a PA is then the set of all synchronizing word-strategies: we establish the computational complexity of the emptiness and universality problems for all synchronizing languages in all winning modes. We carry over results for synchronizing problems from MDPs and PAs to two-player turn-based games and non-deterministic finite-state automata. Along with the main results, we establish new complexity results for alternating finite automata over a one-letter alphabet. In addition, we study different variants of synchronization for timed and weighted automata, as two instances of infinite-state systems.
/ Doctorat en Sciences / info:eu-repo/semantics/nonPublished Informatique générale Sciences exactes et naturelles Stochastic processes Probabilistic automata Processus stochastiques Automates probabilistes synchronising words probabilistic automata markov Decision Process
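The definition of a p-synchronizing distribution in entry 20 can be illustrated with a minimal Python sketch. It is not taken from the thesis. It assumes the strategy is already fixed, so the MDP reduces to a Markov chain given as a row-stochastic matrix, and it only inspects a finite prefix of the distribution-outcome. The function names and the two-state chain are hypothetical, chosen for illustration. The weakly and strongly synchronizing modes concern infinitely many distributions and cannot be decided from a finite prefix, so they are left out.

```python
from typing import List, Sequence


def is_p_synchronizing(dist: Sequence[float], p: float) -> bool:
    """True if some single state holds probability mass at least p."""
    return max(dist) >= p


def step(dist: Sequence[float], transition: Sequence[Sequence[float]]) -> List[float]:
    """One round: push the distribution through a row-stochastic matrix."""
    n = len(dist)
    return [sum(dist[i] * transition[i][j] for i in range(n)) for j in range(n)]


def distribution_outcome(
    initial: Sequence[float],
    transition: Sequence[Sequence[float]],
    rounds: int,
) -> List[List[float]]:
    """Finite prefix of the distribution-outcome: the initial distribution
    followed by the distributions after 1, 2, ..., `rounds` rounds."""
    outcome = [list(initial)]
    for _ in range(rounds):
        outcome.append(step(outcome[-1], transition))
    return outcome


def always_on_prefix(outcome: Sequence[Sequence[float]], p: float) -> bool:
    """All distributions in the prefix are p-synchronizing."""
    return all(is_p_synchronizing(d, p) for d in outcome)


def eventually_on_prefix(outcome: Sequence[Sequence[float]], p: float) -> bool:
    """Some distribution in the prefix is p-synchronizing."""
    return any(is_p_synchronizing(d, p) for d in outcome)


if __name__ == "__main__":
    # Hypothetical chain (an assumption, not from the thesis): state 0 stays
    # or moves to state 1 with probability 1/2 each; state 1 is absorbing.
    chain = [[0.5, 0.5], [0.0, 1.0]]
    prefix = distribution_outcome([1.0, 0.0], chain, rounds=3)
    for dist in prefix:
        print(dist, "largest mass:", max(dist))
    print("always 0.5-synchronizing on prefix:", always_on_prefix(prefix, 0.5))
    print("eventually 0.875-synchronizing on prefix:", eventually_on_prefix(prefix, 0.875))
```

A positive "eventually" answer on a prefix already settles the eventually mode for the whole sequence, and a negative "always" answer settles the always mode. The other two outcomes are inconclusive without further analysis of the chain.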
