  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Connaissances expertes et modélisation pour l'exploitation d'images d'observation de la Terre à hautes résolutions spatiale, spectrale et temporelle / Expert knowledge and modeling for the exploitation of earth observation images with a high spatial, spectral and temporal resolution

Osman, Julien 10 February 2015 (has links)
The future Earth observation space missions, Venµs and Sentinel (1 and 2), will provide us with a flow of data unprecedented in terms of spatial, spectral and temporal resolution. To use these data efficiently for the generation of land cover maps or for change detection, we need fast, robust approaches that require as little supervision as possible. A concrete use of these data could be, for instance, the identification, as early as May, of the areas growing corn across the whole South-West of France, or obtaining a monthly land cover map, within a short delay, over large regions. Images alone do not allow us to reach such goals. Nevertheless, other information is available that has so far been little exploited.
The main goal of this thesis is to identify the available prior information, evaluate its relevance, and introduce it into preexisting processing chains to assess its contribution. We focused on agriculture monitoring. The information we used includes knowledge about farming practices (crop rotations, irrigation, crop class alternation, etc.), parcel sizes and topography. We mainly worked with two sources of prior knowledge: knowledge contained in databases such as the Registre Parcellaire Graphique (RPG), which we extracted using data mining methods, and knowledge provided by experts, which we modeled with first-order logic rules. One contribution of this thesis is the selection and assessment of a tool allowing us to extract and process this information so that it can be introduced efficiently into preexisting classification algorithms: Markov Logic, a statistical framework able to work with both information from databases and information modeled as logic rules. We show that using these data increases the quality of the land cover maps. We also show that this information allows us to obtain near-real-time maps whose quality improves as new information arrives. As a conclusion of this thesis, we give directions for applying the same methodology to other areas, in particular the monitoring of tropical forests and generic land cover mapping.
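As a rough illustration of how such expert knowledge could be written down for a Markov Logic system, the short Python sketch below lists a few hypothetical weighted first-order rules about farming practices; the predicate names, rule forms and weights are invented for illustration and are not taken from the thesis.

    # Hypothetical weighted first-order rules in the spirit of the approach above.
    # Predicate names, syntax and weights are illustrative only.
    crop_rules = [
        # Crop rotation: a parcel growing maize one year rarely grows maize the next.
        (1.2, "Crop(p, y, Maize) => !Crop(p, Succ(y), Maize)"),
        # Irrigated parcels in the study area tend to carry summer crops.
        (0.8, "Irrigated(p) => SummerCrop(p, y)"),
        # Hard constraint: a parcel has a single crop class in a given year.
        (float("inf"), "Crop(p, y, c1) ^ Crop(p, y, c2) => (c1 = c2)"),
    ]

    for weight, rule in crop_rules:
        print(f"{weight:>6}  {rule}")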
2

Statistical semantic processing using Markov logic

Meza-Ruiz, Ivan Vladimir January 2009 (has links)
Markov Logic (ML) is a novel approach to Natural Language Processing tasks [Richardson and Domingos, 2006; Riedel, 2008]. It is a Statistical Relational Learning language based on First Order Logic (FOL) and Markov Networks (MN). It allows one to treat a task as structured classification. In this work, we investigate ML for the semantic processing tasks of Spoken Language Understanding (SLU) and Semantic Role Labelling (SRL). Both tasks consist of identifying a semantic representation for the meaning of a given utterance/sentence. However, they differ in nature: SLU is in the field of dialogue systems, where the domain is closed and language is spoken [He and Young, 2005], while SRL is for open domains and traditionally for written text [Márquez et al., 2008]. Robust SLU is a key component of spoken dialogue systems. This component consists of identifying the meaning of the user utterances addressed to the system. Recent statistical approaches to SLU depend on additional resources (e.g., gazetteers, grammars, syntactic treebanks) which are expensive and time-consuming to produce and maintain. On the other hand, simple datasets annotated only with slot-values are commonly used in dialogue system development, and are easy to collect, automatically annotate, and update. However, slot-values leave out some of the fine-grained long-distance dependencies present in other semantic representations. In this work we investigate the development of SLU modules with minimal resources, using slot-values as their semantic representation. We propose to use ML to capture long-distance dependencies which are not explicitly available in the slot-value semantic representation. We test the adequacy of the ML framework by comparing against a set of baselines using state-of-the-art approaches to semantic processing. The results of this research have been published in Meza-Ruiz et al. [2008a,b]. Furthermore, we address the question of scalability of the ML approach for other NLP tasks involving the identification of semantic representations. In particular, we focus on SRL: the task of identifying predicates and arguments within sentences, together with their semantic roles. The semantic representation built during SRL is more complex than the slot-values used in dialogue systems, in the sense that it includes the notion of predicate/argument scope. SRL is defined in the context of open domains under the premise that there are several levels of extra resources (lemmas, POS tags, constituent or dependency parses). In this work, we propose a ML model of SRL and experiment with the different architectures we can describe for the model, which gives us an insight into the types of correlations that the ML model can express [Riedel and Meza-Ruiz, 2008; Meza-Ruiz and Riedel, 2009]. Additionally, we tested our minimal-resources setup in a state-of-the-art dialogue system: the TownInfo system. In this case, we were given a small dataset of gold-standard semantic representations which were system dependent, and we rapidly developed an SLU module used in the functioning dialogue system. No extra resources were necessary in order to reach state-of-the-art results.
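For reference, the Markov Logic formalism cited above (Richardson and Domingos, 2006) attaches a weight w_i to each first-order formula F_i and defines a log-linear distribution over possible worlds; the equation below restates that standard definition and is background rather than a contribution of this thesis.

    P(X = x) = \frac{1}{Z} \exp\Big( \sum_i w_i \, n_i(x) \Big),
    \qquad
    Z = \sum_{x'} \exp\Big( \sum_i w_i \, n_i(x') \Big)

where n_i(x) is the number of true groundings of F_i in the world x. Treating a task as structured classification then amounts to MAP inference: finding the assignment of the query atoms that maximizes this probability given the evidence atoms.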
3

Integrating Linked Data search results using statistical relational learning approaches

Al Shekaili, Dhahi January 2017 (has links)
Linked Data (LD) follows the web in providing low barriers to publication, and in deploying web-scale keyword search as a central way of identifying relevant data. As in the web, searches initially identify results in broadly the form in which they were published, and the published form may be provided to the user as the result of a search. This will be satisfactory in some cases, but the diversity of publishers means that the results of the search may be obtained from many different sources, and described in many different ways. As such, there seems to be an opportunity to add value to search results by providing users with an integrated representation that brings together features from different sources. This involves an on-the-fly and automated data integration process being applied to search results, which raises the question as to what technologies might be most suitable for supporting the integration of LD search results. In this thesis we take the view that the problem of integrating LD search results is best approached by assimilating different forms of evidence that support the integration process. In particular, this dissertation shows how Statistical Relational Learning (SRL) formalisms (viz., Markov Logic Networks (MLN) and Probabilistic Soft Logic (PSL)) can be exploited to assimilate different sources of evidence in a principled way and to beneficial effect for users. Specifically, in this dissertation we consider syntactic evidence derived from LD search results and from matching algorithms, semantic evidence derived from LD vocabularies, and user evidence, in the form of feedback. This dissertation makes the following key contributions: (i) a characterisation of key features of LD search results that are relevant to their integration, and a description of some initial experiences in the use of MLN for interpreting search results; (ii) a PSL rule-base that models the uniform assimilation of diverse kinds of evidence; (iii) an empirical evaluation of how the contributed MLN and PSL approaches perform in terms of their ability to infer a structure for integrating LD search results; and (iv) concrete examples of how populating such inferred structures for presentation to the end user is beneficial, as well as guiding the collection of feedback whose assimilation further improves search results presentation.
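As background on the second formalism mentioned above, Probabilistic Soft Logic grounds its weighted rules into a hinge-loss Markov random field over continuous truth values in [0, 1]; the formulation below is the general one from the PSL literature, not a result of this thesis.

    For a ground rule r of the form body_r -> head_r and an interpretation I,
    its distance to satisfaction under the Lukasiewicz relaxation is

        d_r(I) = \max\{\, 0,\; I(\mathrm{body}_r) - I(\mathrm{head}_r) \,\}

    and the density over interpretations is

        P(I) \propto \exp\Big( -\sum_r \lambda_r \, d_r(I)^{p_r} \Big), \qquad p_r \in \{1, 2\}

so each kind of evidence (syntactic, semantic, or user feedback) simply contributes additional weighted rules whose groundings pull the inferred integration structure towards it.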
4

An Analysis and Reasoning Framework for Project Data Software Repositories

Attarian, Ioanna Maria January 2012 (has links)
As the requirements for software systems increase, their size, complexity and functionality consequently increase as well. This has a direct impact on the complexity of numerous artifacts related to the system, such as specification, design, implementation and testing models. Furthermore, as the software market becomes more and more competitive, the need for software products that are of high quality and require the least monetary, time and human resources for their development and maintenance becomes evident. Therefore, it is important that project managers and software engineers are given the necessary tools to obtain a more holistic and accurate perspective of the status of their projects, in order to identify early on the potential risks, flaws, and quality issues that may arise during each stage of the software project life cycle. In this respect, practitioners and academics alike have recognized the significance of investigating new methods for supporting software management operations with respect to large software projects. The main target of this M.A.Sc. thesis is the design of a framework in terms of, first, a reference architecture for mining and analyzing software project data repositories according to specific objectives and analytic knowledge, second, the techniques to model such analytic knowledge and, third, a reasoning methodology for verifying or denying hypotheses related to analysis objectives. Such a framework could assist project managers, team leaders and development teams towards more accurate prediction of project traits such as quality analysis, risk assessment, cost estimation and progress evaluation. More specifically, the framework utilizes goal models to specify analysis objectives as well as possible ways by which these objectives can be achieved. Examples of such analysis objectives for a project could be to yield high code quality, achieve low production cost, or cope with tight delivery deadlines. Such goal models are consequently transformed into collections of Markov Logic Network rules, which are then applied to the repository data in order to verify or deny, with a degree of probability, whether the particular project objectives can be met as the project evolves. The proposed framework has been applied, as a proof of concept, on a repository pertaining to three industrial projects with more than one hundred development tasks.
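A minimal sketch of the kind of goal-model-to-rules transformation the abstract describes is given below, assuming a hypothetical goal object that exposes a predicate name and its AND-decompositions; the actual transformation in the thesis may differ.

    def goal_to_clauses(goal, weight=1.0):
        """Hypothetical sketch: turn every AND-decomposition of a goal into one
        weighted clause whose body is the conjunction of the subgoals and whose
        head is the goal itself, then recurse into the subgoals."""
        clauses = []
        for subgoals in goal.decompositions:        # assumed: list of subgoal lists
            body = " ^ ".join(sub.predicate for sub in subgoals)
            clauses.append((weight, f"{body} => {goal.predicate}"))
            for sub in subgoals:
                clauses.extend(goal_to_clauses(sub, weight))
        return clauses

For example, a "HighCodeQuality" goal decomposed into "LowDefectDensity" and "HighTestCoverage" would yield the weighted clause LowDefectDensity ^ HighTestCoverage => HighCodeQuality, which could then be scored against repository data.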
5

Requirement-based Root Cause Analysis Using Log Data

Zawawy, Hamzeh January 2012 (has links)
Root Cause Analysis for software systems is a challenging diagnostic task due to complexity emanating from the interactions between system components. Furthermore, the sheer size of the logged data often makes it difficult for human operators and administrators to perform problem diagnosis and root cause analysis. The diagnostic task is further complicated by the lack of models that could be used to support the diagnostic process. Traditionally, this diagnostic task is conducted by human experts who create mental models of systems in order to generate hypotheses and conduct the analysis, even in the presence of incomplete logged data. A challenge in this area is to provide the necessary concepts, tools, and techniques for the operators to focus their attention on specific parts of the logged data and, ultimately, to automate the diagnostic process. The work described in this thesis proposes a framework that includes techniques, formalisms, and algorithms for automating the process of root cause analysis. In particular, this work uses annotated requirement goal models to represent the monitored systems' requirements and runtime behavior. The goal models are used in combination with log data to generate a ranked set of diagnostics that represent the combinations of tasks whose failure led to the observed failure. In addition, the framework uses a combination of word-based and topic-based information retrieval techniques to reduce the size of the log data by filtering out a subset of it, in order to facilitate the diagnostic process. The process of log data filtering and reduction is based on goal model annotations and generates a sequence of logical literals that represent the possible system observations. A second level of investigation consists of looking for evidence of any malicious (i.e., intentionally caused by a third party) activity leading to task failures. This analysis uses annotated anti-goal models that denote possible actions that can be taken by an external user to threaten a given system task. The framework uses a novel probabilistic approach based on Markov Logic Networks. Our experiments show that our approach improves over existing proposals by handling uncertainty in observations, using natively generated log data, and providing ranked diagnoses. The proposed framework has been evaluated using a test environment based on commercial off-the-shelf software components, a publicly available Java-based ATM machine, and the large, publicly available DARPA 2000 dataset.
6

Improving the accuracy and scalability of discriminative learning methods for Markov logic networks

Huynh, Tuyen Ngoc 01 June 2011 (has links)
Many real-world problems involve data that have both complex structure and uncertainty. Statistical relational learning (SRL) is an emerging area of research that addresses the problem of learning from these noisy structured/relational data. Markov logic networks (MLNs), sets of weighted first-order logic formulae, are a simple but powerful SRL formalism that generalizes both first-order logic and Markov networks. MLNs have been successfully applied to a variety of real-world problems ranging from extracting knowledge from text to visual event recognition. Most of the existing learning algorithms for MLNs are in the generative setting: they try to learn a model that is equally capable of predicting the values of all variables given an arbitrary set of evidence, and they do not scale to problems with thousands of examples. However, many real-world problems in structured/relational data are discriminative: the variables are divided into two disjoint sets, input and output, and the goal is to correctly predict the values of the output variables given evidence data about the input variables. In addition, these problems usually involve data that have thousands of examples. Thus, it is important to develop new discriminative learning methods for MLNs that are more accurate and more scalable, which are the topics addressed in this thesis. First, we present a new method that discriminatively learns both the structure and parameters for a special class of MLNs where all the clauses are non-recursive. Non-recursive clauses arise in many learning problems in Inductive Logic Programming. To further improve the predictive accuracy, we propose a max-margin approach to learning weights for MLNs. Then, to address the issue of scalability, we present CDA, an online max-margin weight learning algorithm for MLNs. After that, we present OSL, the first algorithm that performs both online structure learning and parameter learning. Finally, we address an issue arising in applying MLNs to many real-world problems: learning in the presence of many hard constraints. Including hard constraints during training greatly increases the computational complexity of the learning problem. Thus, we propose a simple heuristic for selecting which hard constraints to include during training. Experimental results on several real-world problems show that the proposed methods are more accurate, more scalable (can handle problems with thousands of examples), or both more accurate and more scalable than existing learning methods for MLNs.
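To make the generative/discriminative contrast concrete, discriminative weight learning for MLNs is usually stated in terms of the conditional distribution below; this is the standard formulation, quoted here as background rather than as a result of the thesis.

    P(y \mid x) = \frac{1}{Z_x} \exp\Big( \sum_i w_i \, n_i(x, y) \Big)

where x collects the evidence (input) atoms, y the query (output) atoms, n_i(x, y) counts the true groundings of formula i, and Z_x normalizes over the output atoms only. Max-margin approaches of the kind proposed in the thesis then choose the weights so that the correct output scores higher than every competing output by a margin, typically one that grows with the loss incurred by that competitor.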
7

Efficient prediction of relational structure and its application to natural language processing

Riedel, Sebastian January 2009 (has links)
Many tasks in Natural Language Processing (NLP) require us to predict a relational structure over entities. For example, in Semantic Role Labelling we try to predict the 'semantic role' relation between a predicate verb and its argument constituents. Often NLP tasks not only involve related entities but also relations that are stochastically correlated. For instance, in Semantic Role Labelling the roles of different constituents are correlated: we cannot assign the agent role to one constituent if we have already assigned this role to another. Statistical Relational Learning (also known as First Order Probabilistic Logic) allows us to capture the aforementioned nature of NLP tasks because it is based on the notions of entities, relations and stochastic correlations between relationships. It is therefore often straightforward to formulate an NLP task using a First Order probabilistic language such as Markov Logic. However, the generality of this approach comes at a price: the process of finding the relational structure with highest probability, also known as maximum a posteriori (MAP) inference, is often inefficient, if not intractable. In this work we seek to improve the efficiency of MAP inference for Statistical Relational Learning. We propose a meta-algorithm, namely Cutting Plane Inference (CPI), that iteratively solves small subproblems of the original problem using any existing MAP technique and inspects parts of the problem that are not yet included in the current subproblem but could potentially lead to an improved solution. Our hypothesis is that this algorithm can dramatically improve the efficiency of existing methods while remaining at least as accurate. We frame the algorithm in Markov Logic, a language that combines First Order Logic and Markov Networks. Our hypothesis is evaluated using two tasks: Semantic Role Labelling and Entity Resolution. It is shown that the proposed algorithm improves the efficiency of two existing methods by two orders of magnitude and leads an approximate method to more probable solutions. We also show that CPI, at convergence, is guaranteed to be at least as accurate as the method used within its inner loop. Another core contribution of this work is a theoretical and empirical analysis of the boundary conditions of Cutting Plane Inference. We describe cases when Cutting Plane Inference will definitely be difficult (because it instantiates large networks or needs many iterations) and when it will be easy (because it instantiates small networks and needs only a few iterations).
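The following Python-style sketch restates the Cutting Plane Inference loop as the abstract describes it; the helper names (initial_ground_formulas, find_violated_ground_formulas, the base-solver interface) are hypothetical stand-ins for whatever the actual implementation uses.

    def cutting_plane_inference(model, instance, base_solver, max_iters=100):
        """Sketch of CPI: repeatedly solve a small partial grounding with any
        existing MAP solver, then add the ground formulas that the current
        solution violates, until no violated formulas remain."""
        active = model.initial_ground_formulas(instance)   # assumed: returns a set
        solution = base_solver.solve(active)
        for _ in range(max_iters):
            violated = model.find_violated_ground_formulas(instance, solution)
            if not violated:
                break                    # nothing left to add: accept the solution
            active |= violated           # grow the subproblem with the new cuts
            solution = base_solver.solve(active)
        return solution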
8

Apprentissage statistique relationnel : apprentissage de structures de réseaux de Markov logiques / Statistical relational learning : Structure learning for Markov logic networks

Dinh, Quang-Thang 28 November 2011 (has links)
A Markov Logic Network is composed of a set of weighted first-order logic formulas. In this dissertation we propose several methods to learn an MLN structure from a relational dataset. These methods are of two kinds: methods based on propositionalization and methods based on a Graph of Predicates. The methods based on propositionalization rely on the idea of building a set of candidate clauses from sets of dependent variable literals. In order to find such sets of dependent variable literals, we use a propositionalization technique to transform the relational information in the dataset into boolean tables, which are then used as contingency tables for tests of dependence. Two propositionalization methods are proposed, from which three learners have been developed, handling both generative and discriminative learning. We then introduce the concept of a Graph of Predicates, which synthesizes the binary relations between the predicates of a domain. Candidate clauses can be quickly and easily generated by simply finding paths in the graph and then variabilizing them. Based on this graph, two learners have been developed, again handling both generative and discriminative learning.
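A very small sketch of the propositionalization idea is shown below: for each example, record which candidate literals hold among the ground facts, producing the boolean table used as a contingency table for dependence tests. The data structures, predicate names and helper signatures are illustrative, not the thesis' actual algorithm.

    def propositionalize(facts, literals, examples):
        """facts: a set of ground atoms (strings); literals: callables mapping an
        example to a ground atom; returns one boolean row per example."""
        return [[1 if lit(ex) in facts else 0 for lit in literals]
                for ex in examples]

    # Toy usage with invented predicates:
    facts = {"advisedBy(Ann,Bob)", "professor(Bob)", "publication(P1,Ann)"}
    literals = [lambda s: f"professor({s})", lambda s: f"publication(P1,{s})"]
    print(propositionalize(facts, literals, ["Ann", "Bob"]))   # [[0, 1], [1, 0]]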
