31

A BPMN-based conceptual language for designing ETL processes

El Akkaoui, Zineb 27 June 2014
Business Intelligence (BI) is the set of techniques and technologies that support the decision-making process by providing aggregated insight into an organization's data. Because the events and applications running in an organization hold large amounts of potentially useful data, the BI market calls for new technologies able to exploit these data for analysis wherever they are available. In particular, Extract, Transform, and Load (ETL) processes, the fundamental BI technology responsible for integrating and cleansing organization data, must respond to these requirements.

However, the development of ETL processes is still considered very complex and time-consuming, to the point that roughly 80% of the effort in a BI project is dedicated to ETL development. Among the phases of the ETL development life cycle, ETL modeling is a critical and laborious task. This phase produces the first formal representation of the ETL process, the ETL model, which is reused and refined in the subsequent phases of development.

Typically, ETL processes are modeled using vendor-specific ETL tools from the very beginning of development. However, these tools are unsuitable for business users, since they induce overwhelmingly fine-grained models.

As an attempt to provide more appropriate tools to business users, vendor-independent ETL modeling languages have been proposed in the literature. Nevertheless, they remain immature. To get a precise view of these languages, we conduct a survey which: i) defines a set of criteria associated with major ETL requirements identified in the literature; ii) compares the surveyed conceptual languages, issued from research work, with the physical languages of prominent ETL tools; and iii) studies the full ETL development methodologies associated with these modeling languages.

The analysis of our survey reveals several drawbacks in responding to the ETL requirements. In particular, the conceptual languages have incomplete elements for ETL modeling, with little or no formalization. Several languages are only descriptive, with no ability to be automatically implemented into executable code or automatically maintained as sources change over time.

To address these shortcomings, we present in this thesis a novel approach that tackles the whole development life cycle of ETL processes.

First, we propose a new vendor-independent language for modeling ETL processes in the same way as typical business processes, i.e. the processes responsible for managing the operations of an organization. The rationale behind this proposal is to give ETL processes better access to the data in the organization's events and applications, including fresh data, and better design capabilities, such as analysis available to any user. By using the standard representation mechanism BPMN (Business Process Model and Notation) and a classification of ETL elements derived from a study of the most widely used commercial and open-source ETL tools, the language enables building agile and full-fledged ETL processes. We name our language BPMN4ETL, for BPMN for ETL processes.

Second, we build a model-driven framework that provides automatic code generation and improves the maintenance support of our ETL language. We use Model-Driven Development (MDD) technology, as it helps in developing software, particularly in automating the transformation from one phase of software development to another. We present a set of model-to-text transformations able to produce code for different business process engines and ETL engines. We also describe the model-to-model transformations that automatically update the ETL models, with the aim of keeping the generated code maintained as the data sources evolve. A demonstration on a case study is conducted as an initial validation, showing that the framework, covering modeling, implementation and maintenance, could be used in practice.

To illustrate the new concepts introduced in the thesis, mainly the BPMN4ETL language and the implementation and maintenance framework, we use a case study from the fictitious Northwind Traders company, a retailer that imports and exports foods from around the world. / Doctorat en Sciences de l'ingénieur / info:eu-repo/semantics/nonPublished
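As a rough illustration of the model-to-text idea described in this abstract, the sketch below walks a toy ETL process model and emits pseudo-SQL. The model structure, task kinds, and emitted statements are illustrative assumptions, not the actual BPMN4ETL metamodel or generator.

```python
# Minimal model-to-text sketch: a dict stands in for the process model,
# and one statement is emitted per task. All names here are illustrative.
etl_model = {
    "name": "LoadCustomers",
    "tasks": [
        {"kind": "extract",   "source": "crm.customers"},
        {"kind": "transform", "rule": "UPPER(country)"},
        {"kind": "load",      "target": "dw.dim_customer"},
    ],
}

def to_code(model):
    """Walk the process model and emit one statement per task."""
    lines = [f"-- generated from model '{model['name']}'"]
    for task in model["tasks"]:
        if task["kind"] == "extract":
            lines.append(f"SELECT * FROM {task['source']};")
        elif task["kind"] == "transform":
            lines.append(f"-- apply rule: {task['rule']}")
        elif task["kind"] == "load":
            lines.append(f"INSERT INTO {task['target']} SELECT * FROM staging;")
    return "\n".join(lines)

print(to_code(etl_model))
```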
32

Integration of multi-criteria tools in geographical information systems

Lidouh, Karim 13 January 2014
For a little over twenty years, researchers have worked on integrating multi-criteria aggregation procedures (MCAP) into GIS. Several notable contributions have brought this field to what it is today. After studying the course of MCDA-GIS integration through several works, we question the future of such an attempt: most works that aim for integration do not survive long after their direct purpose has been fulfilled. A critical review of the existing systems leads us to understand that technical integration means nothing if it is not visible to the user on an operational level.

We therefore propose several contributions to improve the usability of MCDA methods in a geographic context. One of our works consists in adapting the PROMETHEE-GAIA methodology to be used on maps for spatially referenced problems. To do so, we define symbols (glyphs) that display selected parts of the results obtained through the PROMETHEE and GAIA methods. This allows alternatives' profiles and characteristics to be compared based on their geographic location, which was not possible before. This adaptation helps us combine multi-criteria and geographic aspects in an entirely new way. We also propose some extensions of the GAIA method that improve the quality of the results and reduce the risk of wrong interpretations due to loss of information. / Doctorat en Sciences de l'ingénieur / info:eu-repo/semantics/nonPublished
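For readers unfamiliar with the quantity such map glyphs would display, the following sketch computes standard PROMETHEE II net flows with a linear preference function. The evaluation table, weights, and thresholds are toy assumptions, not data from the thesis.

```python
# PROMETHEE II net flows for maximized criteria with a linear preference
# function; phi(a) > phi(b) means alternative a outranks b overall.
import numpy as np

def net_flows(evals, weights, thresholds):
    """evals: (n_alternatives, n_criteria); returns phi = phi+ - phi-."""
    n, _ = evals.shape
    pi = np.zeros((n, n))                         # pairwise preference indices
    for a in range(n):
        for b in range(n):
            d = evals[a] - evals[b]               # criterion-wise differences
            pref = np.clip(d / thresholds, 0, 1)  # linear preference in [0, 1]
            pi[a, b] = float(weights @ pref)
    return (pi.sum(axis=1) - pi.sum(axis=0)) / (n - 1)

evals = np.array([[70.0, 3.0], [65.0, 4.0], [80.0, 2.0]])
print(net_flows(evals, np.array([0.6, 0.4]), np.array([10.0, 2.0])))
```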
33

Causal inference and prior integration in bioinformatics using information theory

Olsen, Catharina 17 October 2013
An important problem in bioinformatics is the reconstruction of gene regulatory networks from expression data. The analysis of genomic data stemming from high-throughput technologies such as microarray experiments or RNA sequencing faces several difficulties. The first major issue is the high variable-to-sample ratio, which is due to a number of factors: a single experiment captures all genes, while the number of experiments is restricted by cost, time and patient cohort size. The second problem is that these data sets typically exhibit high amounts of noise.

Another important problem in bioinformatics is how the quality of the inferred networks can be evaluated. The current best practice is a two-step procedure. In the first step, the highest-scoring interactions are compared to known interactions stored in biological databases; the inferred network passes this quality assessment if there is a large overlap with the known interactions. In that case, a second step is carried out in which unknown but high-scoring, and thus promising, new interactions are validated by hand via laboratory experiments. Unfortunately, when prior knowledge is integrated into the inference procedure, this validation would be biased by using the same information in both the inference and the validation, and would no longer constitute an independent validation of the resulting network.

The main contribution of this thesis is a complete computational framework that uses experimental knock-down data in a cross-validation scheme to both infer and validate directed networks. Its components are: i) a method that integrates genomic data and prior knowledge to infer directed networks; ii) its implementation in an R/Bioconductor package; and iii) a web application to retrieve prior knowledge from PubMed abstracts and biological databases. To infer directed networks from genomic data and prior knowledge, we propose a two-step procedure: first, we adapt the pairwise feature selection strategy mRMR to integrate prior knowledge in order to obtain the network's skeleton; then, for the subsequent orientation phase of the algorithm, we extend a criterion based on interaction information to include prior knowledge. The implementation of this method is available both as part of the prior retrieval tool Predictive Networks and as a stand-alone R/Bioconductor package named predictionet.

Furthermore, we propose a fully data-driven quantitative validation of such directed networks using experimental knock-down data. We start by identifying the set of genes that were truly affected by the perturbation experiment. The rationale of our validation procedure is that these truly affected genes should also be among the perturbed gene's children in the inferred network. Consequently, we can compute a performance score. / Doctorat en Sciences / info:eu-repo/semantics/nonPublished
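To make the skeleton-inference step more concrete, here is a toy sketch of how prior knowledge might be blended into an mRMR-style score, assuming discretized expression data. The convex blending rule and its weight lam are illustrative assumptions, not the exact criterion implemented in predictionet.

```python
# Prior-weighted mRMR sketch: relevance minus mean redundancy, both measured
# by plug-in mutual information on discretized data, shifted towards a prior
# belief in [0, 1]. The blending rule is an illustrative assumption.
import numpy as np

def mutual_info(x, y):
    """Plug-in mutual information estimate for two discrete vectors."""
    joint, _, _ = np.histogram2d(x, y, bins=(len(set(x)), len(set(y))))
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x, shape (nx, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, ny)
    nz = pxy > 0                          # avoid log(0)
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

def prior_mrmr_score(data, target, candidate, selected, prior, lam=0.3):
    """Score of column `candidate` given already-selected columns."""
    relevance = mutual_info(data[:, candidate], target)
    redundancy = (np.mean([mutual_info(data[:, candidate], data[:, s])
                           for s in selected]) if selected else 0.0)
    return (1 - lam) * (relevance - redundancy) + lam * prior[candidate]
```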
34

Towards autonomous task partitioning in swarm robotics: experiments with foraging robots

Pini, Giovanni 14 June 2013
In this thesis, we propose an approach to achieve autonomous task partitioning in swarms of robots. Task partitioning is the process by which tasks are decomposed into sub-tasks; it is often an advantageous way of organizing work in groups of individuals. It is therefore interesting to study its application to swarm robotics, in which groups of robots are deployed to collectively carry out a mission. The capability of partitioning tasks autonomously can enhance the flexibility of swarm robotics systems, because the robots can adapt the way they decompose and perform their work to specific environmental conditions and goals. So far, few studies have been presented on task partitioning in the context of swarm robotics. Additionally, in all the existing studies there is no separation between the task partitioning methods and the behavior of the robots, and task partitioning often relies on characteristics of the environments in which the robots operate. This limits the applicability of these methods to the specific contexts for which they were built. The work presented in this thesis represents the first steps towards a general framework for autonomous task partitioning in swarms of robots. We study task partitioning in foraging, since foraging abstracts practical real-world problems. The approach we propose is therefore studied in experiments whose goal is to achieve autonomous task partitioning in foraging. However, in the proposed approach the task partitioning process relies upon general, task-independent concepts, and we are therefore confident that it is applicable in other contexts. We identify two main capabilities that the robots should have: i) selecting whether to employ task partitioning, and ii) defining the sub-tasks of a given task. We propose and study algorithms that endow a swarm of robots with these capabilities. / Doctorat en Sciences de l'ingénieur / info:eu-repo/semantics/nonPublished
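A minimal sketch of the first capability, selecting whether to employ task partitioning, might keep a running cost estimate per strategy and pick the cheaper one. The epsilon-greedy rule and moving-average update below are illustrative assumptions, not the exact algorithms studied in the thesis.

```python
# Cost-estimate-based strategy selection: each robot tracks the observed
# cost (e.g. round-trip time) of working partitioned vs. unpartitioned
# and mostly exploits the cheaper option. Parameters are illustrative.
import random

class PartitionChooser:
    def __init__(self, alpha=0.2, epsilon=0.05):
        self.cost = {"partitioned": 0.0, "unpartitioned": 0.0}
        self.alpha, self.epsilon = alpha, epsilon  # learning rate, exploration

    def choose(self):
        if random.random() < self.epsilon:         # occasional exploration
            return random.choice(list(self.cost))
        return min(self.cost, key=self.cost.get)   # exploit cheapest option

    def update(self, strategy, observed_cost):
        c = self.cost[strategy]                    # exponential moving average
        self.cost[strategy] = (1 - self.alpha) * c + self.alpha * observed_cost
```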
35

Spectral factor model for time series learning

Alexander Miranda, Abhilash 24 November 2011
Today's computerized processes generate massive amounts of streaming data. In many applications, data are collected to model the processes, and the resulting process model is expected to drive objectives such as decision support, data visualization, business intelligence, automation and control, and pattern recognition and classification. However, data-driven modeling of processes faces significant challenges. Apart from errors, outliers and noise in the measurements, the main challenge is the large dimensionality, i.e. the number of variables each data sample measures. The samples often form a long temporal sequence called a multivariate time series, in which any one sample is influenced by the others. We wish to build a model that ensures robust generation, reviewing, and representation of new multivariate time series that are consistent with the underlying process.

In this thesis, we adopt a modeling framework that extracts from a multivariate time series the characteristics corresponding to the dynamic variation and covariation common to the measured variables across all samples. These characteristics are named the series' 'commonalities', and a suitable measure for them is defined. What makes the multivariate time series model versatile is the assumption that there exists a latent time series of known or presumed characteristics and of much lower dimensionality than the measured time series; the result is the well-known 'dynamic factor model'. Original variants of existing methods for estimating the dynamic factor model are developed: the estimation is performed using the frequency-domain equivalent of the dynamic factor model, named the 'spectral factor model'. To estimate the spectral factor model, ideas are sought from the asymptotic theory of spectral estimates. This theory is used to obtain a probabilistic formulation that provides maximum likelihood estimates of the spectral factor model parameters. Maximum likelihood parameters are then derived, with the analysis entirely in the spectral domain, such that the dynamically transformed latent time series inherits the commonalities maximally.

The main contribution of this thesis is a learning framework using the spectral factor model. We define learning as the ability of a computational model of a process to robustly characterize the data the process generates, for purposes of pattern matching, classification and prediction. Hence, the spectral factor model can be said to have learned a multivariate time series if the latent time series, when dynamically transformed, extracts the commonalities reliably and maximally. The spectral factor model is used for two main multivariate time series learning applications. First, real-world streaming datasets obtained from various processes are classified; in this exercise, human brain magnetoencephalography signals obtained during various cognitive and physical tasks are classified. Second, the commonalities are put to the test by asking for reliable prediction of a multivariate time series given its past evolution; share prices in a portfolio are forecast as part of this challenge.

For both spectral factor modeling and learning, an analytical solution as well as an iterative solution are developed. While the analytical solution is based on a low-rank approximation of the spectral density function, the iterative solution is based on the expectation-maximization algorithm. For the human brain signal classification exercise, a strategy is developed for comparing similarities between the commonalities of the various classes of multivariate time series. For the share price prediction problem, a vector autoregressive model whose parameters are enriched with the maximum likelihood commonalities is designed. In both learning problems, the spectral factor model gives commendable performance with respect to competing approaches. / Doctorat en Sciences / info:eu-repo/semantics/nonPublished
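To make the low-rank spectral idea concrete, the following sketch estimates the cross-spectral density matrix on a frequency grid by averaging periodograms over segments, then keeps the top-q eigencomponents per frequency. The segmentation scheme and the choice of q are illustrative assumptions, not the thesis's exact estimator.

```python
# Low-rank spectral sketch: averaged periodograms give an estimate S[f] of
# the n x n cross-spectral density at each frequency; its leading
# eigenvectors play the role of frequency-wise factor loadings.
import numpy as np

def spectral_low_rank(x, q=2, seg_len=64):
    """x: (T, n) multivariate series -> list of (eigvals, eigvecs) per freq."""
    T, n = x.shape
    segs = [x[i:i + seg_len] for i in range(0, T - seg_len + 1, seg_len)]
    F = np.fft.rfft(np.stack(segs), axis=1)      # (n_segs, n_freqs, n)
    # Averaged periodogram: S[f] is Hermitian of shape (n, n)
    S = np.einsum("sfi,sfj->fij", F, F.conj()) / (len(segs) * seg_len)
    factors = []
    for Sf in S:
        vals, vecs = np.linalg.eigh(Sf)          # Hermitian eigendecomposition
        factors.append((vals[-q:], vecs[:, -q:]))  # keep q largest components
    return factors

factors = spectral_low_rank(np.random.randn(512, 5))
```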
36

Real-time acquisition, identification and matching of 3D data

Engels, Laurent 29 September 2011
This thesis describes the development and implementation of a 3D acquisition system whose purpose is the real-time 3D localization and identification of the electrodes and antennas used during an MEG/EEG examination. The second part concerns the matching of these data with magnetic resonance imaging information. / Doctorat en Sciences de l'ingénieur / info:eu-repo/semantics/nonPublished
37

Visibly pushdown transducers

Servais, Frédéric 26 September 2011
The present work proposes visibly pushdown transducers (VPTs) for defining transformations of documents with a nesting structure. We show that this subclass of pushdown transducers enjoys good properties. Notably, we show that functionality is decidable in PTime and k-valuedness in co-NPTime. While the class is not closed under composition and its type-checking problem against visibly pushdown automata is undecidable, we identify a subclass, the well-nested VPTs, that is closed under composition and has a decidable type-checking problem. Furthermore, we show that the class of VPTs is closed under look-ahead, and that deterministic VPTs with look-ahead characterize the functional VPT transductions. Finally, we investigate the resources necessary to perform transformations defined by VPTs. We devise a memory-efficient algorithm, and we show that it is decidable whether a VPT transduction can be performed with memory that depends on the nesting depth of the input document but not on its length. / Doctorat en Sciences de l'ingénieur / info:eu-repo/semantics/nonPublished
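To illustrate the "visibly" stack discipline underlying VPTs, the sketch below runs a visibly pushdown acceptor (a transducer stripped of output) over a nested word: the input alphabet is split a priori into call, return, and internal symbols, and the stack is pushed exactly on calls and popped exactly on returns. The alphabet and transition tables are toy assumptions.

```python
# Visibly pushdown acceptor: the symbol class alone decides the stack action.
CALLS, RETURNS, INTERNALS = {"<a>"}, {"</a>"}, {"x"}

def run_vpa(word, delta_call, delta_ret, delta_int, q0, accepting):
    """Run over a well-nested word; accept iff final state is accepting
    and the stack is empty."""
    q, stack = q0, []
    for sym in word:
        if sym in CALLS:                        # push is forced by the symbol
            q, gamma = delta_call[(q, sym)]
            stack.append(gamma)
        elif sym in RETURNS:                    # pop is forced by the symbol
            gamma = stack.pop()                 # toy input is well-nested
            q = delta_ret[(q, sym, gamma)]
        else:
            q = delta_int[(q, sym)]
    return q in accepting and not stack

# Accepts well-nested words like <a> x </a> over this tiny alphabet.
delta_call = {("q0", "<a>"): ("q0", "A")}
delta_ret  = {("q0", "</a>", "A"): "q0"}
delta_int  = {("q0", "x"): "q0"}
print(run_vpa(["<a>", "x", "</a>"], delta_call, delta_ret, delta_int, "q0", {"q0"}))
```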
38

An integer programming approach to layer planning in communication networks

Ozsoy, Feyzullah Aykut 12 May 2011
In this thesis, we introduce the Partitioning-Hub-Location-Routing Problem (PHLRP), which can be classified as a variant of the hub location problem. PHLRP consists of partitioning a network into sub-networks, locating at least one hub in each sub-network, and routing the traffic within the network such that all inter-sub-network traffic is routed through the hubs and all intra-sub-network traffic stays within its sub-network all the way from source to destination. Besides the hub location component, PHLRP thus also involves a graph partitioning component and a routing component. PHLRP finds applications in the strategic planning and deployment of Intermediate System-Intermediate System (IS-IS) Internet Protocol networks and of less-than-truckload freight distribution systems.

First, we introduce three integer programming (IP) formulations for solving PHLRP. The hub location and graph partitioning components are modeled in the same way in all three formulations: the hub location component is represented by p-median variables and constraints, and the graph partitioning component by size-constrained graph partitioning variables and constraints. The formulations differ in how the peculiar routing requirements of PHLRP are modeled.

We then carry out analytical and empirical comparisons of the three IP formulations. Our analysis reveals that one of the formulations is provably the tightest of the three; we also show analytically that the LP relaxations of the other two formulations do not dominate each other. On the other hand, our empirical comparison in the standard branch-and-cut framework provided by CPLEX shows that not the tightest but the most compact of the three formulations yields the best performance in terms of solution time.

From this point on, based on the insight gained from the detailed analysis of the formulations, we focus on a common sub-problem of the three formulations: the size-constrained graph partitioning problem. We carry out a detailed polyhedral analysis of this problem. The main benefit of this analysis is that the facets we identify for the size-constrained graph partitioning problem constitute strong valid inequalities for PHLRP.

Finally, we wrap up our efforts for solving PHLRP: we present the results of computational experiments in which we employ some facets of the size-constrained graph partitioning polytope in a branch-and-cut algorithm for solving PHLRP. Our experiments show that our approach brings significant improvements to the solution time of PHLRP compared with the default branch-and-cut solver of Xpress. / Doctorat en Sciences / info:eu-repo/semantics/nonPublished
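A condensed sketch of the common sub-problem named above, p-median hub location combined with size-constrained graph partitioning, can be written with the PuLP modeling library as follows. The tiny instance and cost data are illustrative, and the routing component of PHLRP is omitted entirely.

```python
# Core of PHLRP without routing: assign each node to exactly one part,
# bound part sizes, open p hubs in total with at least one hub per part,
# and only at nodes assigned to that part. Instance data are toy values.
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpBinary

nodes, parts, p, max_size = range(4), range(2), 2, 3
cost = {(i, k): (i + 1) * (k + 1) for i in nodes for k in parts}

prob = LpProblem("phlrp_core", LpMinimize)
x = LpVariable.dicts("assign", (nodes, parts), cat=LpBinary)  # node -> part
h = LpVariable.dicts("hub", (nodes, parts), cat=LpBinary)     # hub in part

prob += lpSum(cost[i, k] * h[i][k] for i in nodes for k in parts)
for i in nodes:                                # each node in exactly one part
    prob += lpSum(x[i][k] for k in parts) == 1
for k in parts:
    prob += lpSum(x[i][k] for i in nodes) <= max_size  # size constraint
    prob += lpSum(h[i][k] for i in nodes) >= 1         # at least one hub
for i in nodes:
    for k in parts:                            # hub only where node assigned
        prob += h[i][k] <= x[i][k]
prob += lpSum(h[i][k] for i in nodes for k in parts) == p  # p hubs in total

prob.solve()
```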
39

Models and algorithms for network design problems

Poss, Michaël 22 February 2011
In this thesis, we study various models, both deterministic and stochastic, for network design problems. We also examine the stochastic knapsack problem and, more generally, probabilistic capacity constraints. / Doctorat en Sciences / info:eu-repo/semantics/nonPublished
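As a small illustration of the probabilistic capacity constraints mentioned here, the sketch below estimates the violation probability of a capacity by Monte Carlo sampling and compares it to a tolerance epsilon. The demand distribution is an illustrative assumption.

```python
# Sampling check of a chance constraint P(demand > capacity) <= epsilon.
import numpy as np

def chance_constraint_ok(capacity, demand_sampler, epsilon=0.05, n=10_000):
    """Estimate the violation probability and compare it to epsilon."""
    samples = np.array([demand_sampler() for _ in range(n)])
    return (samples > capacity).mean() <= epsilon

rng = np.random.default_rng(0)
ok = chance_constraint_ok(120.0, lambda: rng.normal(100, 10))
print(ok)  # True iff the observed violation frequency is within tolerance
```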
40

Estimation-based metaheuristics for stochastic combinatorial optimization: case studies in stochastic routing problems

Balaprakash, Prasanna 26 January 2010
Stochastic combinatorial optimization problems are combinatorial optimization problems in which part of the problem data is probabilistic. The focus of this thesis is on stochastic routing problems, a class of stochastic combinatorial optimization problems that arise in distribution management. Stochastic routing problems involve finding the best solution to distribute goods across a logistic network. In the problems we tackle, we consider a setting in which the cost of a solution is described by a random variable; the goal is to find the solution that minimizes the expected cost. Solving such stochastic routing problems is a challenging task because of two main factors: first, the number of possible solutions grows exponentially with the instance size; second, computing the expected cost of a solution is computationally very expensive.

To tackle stochastic routing problems, stochastic local search algorithms such as iterative improvement algorithms and metaheuristics are quite promising, because they offer effective strategies to tackle the combinatorial nature of these problems. However, a crucial factor that determines the success of these algorithms in stochastic settings is the trade-off between the computation time needed to search for high-quality solutions in a large search space and the computation time spent computing the expected cost of the solutions obtained during the search.

To compute the expected cost of solutions in stochastic routing problems, two classes of approaches have been proposed in the literature: analytical computation and empirical estimation. The former computes the expected cost exactly using closed-form expressions; the latter estimates it through Monte Carlo simulation.

Many previously proposed metaheuristics for stochastic routing problems use the analytical computation approach. However, in a large number of practical stochastic routing problems, the presence of complex constraints makes the analytical computation approach difficult, time-consuming or even impossible. Even for the prototypical stochastic routing problems considered in this thesis, the adoption of the analytical computation approach is computationally expensive. Although the empirical estimation approach can address the issues posed by analytical computation, its adoption in metaheuristics for stochastic routing problems had never been thoroughly investigated.

In this thesis, we study two classical stochastic routing problems: the probabilistic traveling salesman problem (PTSP) and the vehicle routing problem with stochastic demands and customers (VRPSDC). The goal of the thesis is to design, implement, and analyze effective metaheuristics that use the empirical estimation approach to tackle these two problems. The main results of this thesis are: 1) the empirical estimation approach is a viable alternative to the widely adopted analytical computation approach for the PTSP and the VRPSDC; 2) a principled adoption of the empirical estimation approach in metaheuristics results in high-performing algorithms for the PTSP and the VRPSDC. The estimation-based metaheuristics developed in this thesis for these two problems define the new state of the art. / Doctorat en Sciences de l'ingénieur / info:eu-repo/semantics/nonPublished
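To make the empirical estimation approach concrete, the following sketch estimates the expected cost of an a-priori PTSP tour by Monte Carlo simulation, sampling which customers are present and skipping the absent ones. The instance data are illustrative.

```python
# Monte Carlo estimate of the expected cost of a PTSP a-priori tour:
# in each sample, absent customers are skipped and the realized tour
# length over the present ones is accumulated.
import numpy as np

def estimate_ptsp_cost(tour, coords, presence_prob, n_samples=1000, seed=0):
    """Average realized tour length over sampled customer subsets."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_samples):
        present = [c for c in tour if rng.random() < presence_prob[c]]
        if len(present) < 2:          # degenerate sample: zero travel cost
            continue
        cycle = present + [present[0]]            # close the tour
        total += sum(np.linalg.norm(coords[a] - coords[b])
                     for a, b in zip(cycle, cycle[1:]))
    return total / n_samples

coords = {i: np.array(xy, float) for i, xy in
          enumerate([(0, 0), (1, 0), (1, 1), (0, 1)])}
prob = {i: 0.9 for i in coords}
print(estimate_ptsp_cost([0, 1, 2, 3], coords, prob))
```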
