Global ETD Search

1	Statistical semantic processing using Markov logic Meza-Ruiz, Ivan Vladimir January 2009 (has links) Markov Logic (ML) is a novel approach to Natural Language Processing tasks [Richardson and Domingos, 2006; Riedel, 2008]. It is a Statistical Relational Learning language based on First Order Logic (FOL) and Markov Networks (MN). It allows one to treat a task as structured classification. In this work, we investigate ML for the semantic processing tasks of Spoken Language Understanding (SLU) and Semantic Role Labelling (SRL). Both tasks consist of identifying a semantic representation for the meaning of a given utterance/sentence. However, they differ in nature: SLU is in the field of dialogue systems where the domain is closed and language is spoken [He and Young, 2005], while SRL is for open domains and traditionally for written text [M´arquez et al., 2008]. Robust SLU is a key component of spoken dialogue systems. This component consists of identifying the meaning of the user utterances addressed to the system. Recent statistical approaches to SLU depend on additional resources (e.g., gazetteers, grammars, syntactic treebanks) which are expensive and time-consuming to produce and maintain. On the other hand, simple datasets annotated only with slot-values are commonly used in dialogue system development, and are easy to collect, automatically annotate, and update. However, slot-values leave out some of the fine-grained long distance dependencies present in other semantic representations. In this work we investigate the development of SLU modules with minimum resources with slot-values as their semantic representation. We propose to use the ML to capture long distance dependencies which are not explicitly available in the slot-value semantic representation. We test the adequacy of the ML framework by comparing against a set of baselines using state of the art approaches to semantic processing. The results of this research have been published in Meza-Ruiz et al. [2008a,b]. Furthermore, we address the question of scalability of the ML approach for other NLP tasks involving the identification of semantic representations. In particular, we focus on SRL: the task of identifying predicates and arguments within sentences, together with their semantic roles. The semantic representation built during SRL is more complex than the slot-values used in dialogue systems, in the sense that they include the notion of predicate/argument scope. SRL is defined in the context of open domains under the premises that there are several levels of extra resources (lemmas, POS tags, constituent or dependency parses). In this work, we propose a ML model of SRL and experiment with the different architectures we can describe for the model which gives us an insight into the types of correlations that the ML model can express [Riedel and Meza-Ruiz, 2008; Meza-Ruiz and Riedel, 2009]. Additionally, we tested our minimal resources setup in a state of the art dialogue system: the TownInfo system. In this case, we were given a small dataset of gold standard semantic representations which were system dependent, and we rapidly developed a SLU module used in the functioning dialogue system. No extra resources were necessary in order to reach state of the art results. 006.3
2	An information theoretic approach to structured high-dimensional problems Das, Abhik Kumar 06 February 2014 (has links) A majority of the data transmitted and processed today has an inherent structured high-dimensional nature, either because of the process of encoding using high-dimensional codebooks for providing a systematic structure, or dependency of the data on a large number of agents or variables. As a result, many problem setups associated with transmission and processing of data have a structured high-dimensional aspect to them. This dissertation takes a look at two such problems, namely, communication over networks using network coding, and learning the structure of graphical representations like Markov networks using observed data, from an information-theoretic perspective. Such an approach yields intuition about good coding architectures as well as the limitations imposed by the high-dimensional framework. Th e dissertation studies the problem of network coding for networks having multiple transmission sessions, i.e., multiple users communicating with each other at the same time. The connection between such networks and the information-theoretic interference channel is examined, and the concept of interference alignment, derived from interference channel literature, is coupled with linear network coding to develop novel coding schemes off ering good guarantees on achievable throughput. In particular, two setups are analyzed – the first where each user requires data from only one user (multiple unicasts), and the second where each user requires data from potentially multiple users (multiple multicasts). It is demonstrated that one can achieve a rate equalling a signi ficant fraction of the maximal rate for each transmission session, provided certain constraints on the network topology are satisfi ed. Th e dissertation also analyzes the problem of learning the structure of Markov networks from observed samples – the learning problem is interpreted as a channel coding problem and its achievability and converse aspects are examined. A rate-distortion theoretic approach is taken for the converse aspect, and information-theoretic lower bounds on the number of samples, required for any algorithm to learn the Markov graph up to a pre-speci fied edit distance, are derived for ensembles of discrete and Gaussian Markov networks based on degree-bounded graphs. The problem of accurately learning the structure of discrete Markov networks, based on power-law graphs generated from the con figuration model, is also studied. The eff ect of power-law exponent value on the hardness of the learning problem is deduced from the converse aspect – it is shown that discrete Markov networks on power-law graphs with smaller exponent values require more number of samples to ensure accurate recovery of their underlying graphs for any learning algorithm. For the achievability aspect, an effi cient learning algorithm is designed for accurately reconstructing the structure of Ising model based on power-law graphs from the con figuration model; it is demonstrated that optimal number of samples su ffices for recovering the exact graph under certain constraints on the Ising model potential values. / text Networks Network coding Unicast Multicast Interference alignment Markov network Ising model Edit distance Power-law graph
3	On conditional random fields: applications, feature selection, parameter estimation and hierarchical modelling Tran, The Truyen January 2008 (has links) There has been a growing interest in stochastic modelling and learning with complex data, whose elements are structured and interdependent. One of the most successful methods to model data dependencies is graphical models, which is a combination of graph theory and probability theory. This thesis focuses on a special type of graphical models known as Conditional Random Fields (CRFs) (Lafferty et al., 2001), in which the output state spaces, when conditioned on some observational input data, are represented by undirected graphical models. The contributions of thesis involve both (a) broadening the current applicability of CRFs in the real world and (b) deepening the understanding of theoretical aspects of CRFs. On the application side, we empirically investigate the applications of CRFs in two real world settings. The first application is on a novel domain of Vietnamese accent restoration, in which we need to restore accents of an accent-less Vietnamese sentence. Experiments on half a million sentences of news articles show that the CRF-based approach is highly accurate. In the second application, we develop a new CRF-based movie recommendation system called Preference Network (PN). The PN jointly integrates various sources of domain knowledge into a large and densely connected Markov network. We obtained competitive results against well-established methods in the recommendation field. / On the theory side, the thesis addresses three important theoretical issues of CRFs: feature selection, parameter estimation and modelling recursive sequential data. These issues are all addressed under a general setting of partial supervision in that training labels are not fully available. For feature selection, we introduce a novel learning algorithm called AdaBoost.CRF that incrementally selects features out of a large feature pool as learning proceeds. AdaBoost.CRF is an extension of the standard boosting methodology to structured and partially observed data. We demonstrate that the AdaBoost.CRF is able to eliminate irrelevant features and as a result, returns a very compact feature set without significant loss of accuracy. Parameter estimation of CRFs is generally intractable in arbitrary network structures. This thesis contributes to this area by proposing a learning method called AdaBoost.MRF (which stands for AdaBoosted Markov Random Forests). As learning proceeds AdaBoost.MRF incrementally builds a tree ensemble (a forest) that cover the original network by selecting the best spanning tree at a time. As a result, we can approximately learn many rich classes of CRFs in linear time. The third theoretical work is on modelling recursive, sequential data in that each level of resolution is a Markov sequence, where each state in the sequence is also a Markov sequence at the finer grain. One of the key contributions of this thesis is Hierarchical Conditional Random Fields (HCRF), which is an extension to the currently popular sequential CRF and the recent semi-Markov CRF (Sarawagi and Cohen, 2004). Unlike previous CRF work, the HCRF does not assume any fixed graphical structures. / Rather, it treats structure as an uncertain aspect and it can estimate the structure automatically from the data. The HCRF is motivated by Hierarchical Hidden Markov Model (HHMM) (Fine et al., 1998). Importantly, the thesis shows that the HHMM is a special case of HCRF with slight modification, and the semi-Markov CRF is essentially a flat version of the HCRF. Central to our contribution in HCRF is a polynomial-time algorithm based on the Asymmetric Inside Outside (AIO) family developed in (Bui et al., 2004) for learning and inference. Another important contribution is to extend the AIO family to address learning with missing data and inference under partially observed labels. We also derive methods to deal with practical concerns associated with the AIO family, including numerical overflow and cubic-time complexity. Finally, we demonstrate good performance of HCRF against rivals on two applications: indoor video surveillance and noun-phrase chunking.
4	Lois de Wishart sur les cônes convexes / Wishart laws on convex cones Mamane, Salha 20 March 2017 (has links) En analyse multivariée de données de grande dimension, les lois de Wishart définies dans le contexte des modèles graphiques revêtent une grande importance car elles procurent parcimonie et modularité. Dans le contexte des modèles graphiques Gaussiens régis par un graphe G, les lois de Wishart peuvent être définies sur deux restrictions alternatives du cône des matrices symétriques définies positives : le cône PG des matrices symétriques définies positives x satisfaisant xij=0, pour tous sommets i et j non adjacents, et son cône dual QG. Dans cette thèse, nous proposons une construction harmonieuse de familles exponentielles de lois de Wishart sur les cônes PG et QG. Elle se focalise sur les modèles graphiques d'interactions des plus proches voisins qui présentent l'avantage d'être relativement simples tout en incluant des exemples de tous les cas particuliers intéressants: le cas univarié, un cas d'un cône symétrique, un cas d'un cône homogène non symétrique, et une infinité de cas de cônes non-homogènes. Notre méthode, simple, se fonde sur l'analyse sur les cônes convexes. Les lois de Wishart sur QAn sont définies à travers la fonction gamma sur QAn et les lois de Wishart sur PAn sont définies comme la famille de Diaconis- Ylvisaker conjuguée. Ensuite, les méthodes développées sont utilisées pour résoudre la conjecture de Letac- Massam sur l'ensemble des paramètres de la loi de Wishart sur QAn. Cette thèse étudie aussi les sousmodèles, paramétrés par un segment dans M, d'une famille exponentielle paramétrée par le domaine des moyennes M. / In the framework of Gaussian graphical models governed by a graph G, Wishart distributions can be defined on two alternative restrictions of the cone of symmetric positive definite matrices: the cone PG of symmetric positive definite matrices x satisfying xij=0 for all non-adjacent vertices i and j and its dual cone QG. In this thesis, we provide a harmonious construction of Wishart exponential families in graphical models. Our simple method is based on analysis on convex cones. The focus is on nearest neighbours interactions graphical models, governed by a graph An, which have the advantage of being relatively simple while including all particular cases of interest such as the univariate case, a symmetric cone case, a nonsymmetric homogeneous cone case and an infinite number of non-homogeneous cone cases. The Wishart distributions on QAn are constructed as the exponential family generated from the gamma function on QAn. The Wishart distributions on PAn are then constructed as the Diaconis- Ylvisaker conjugate family for the exponential family of Wishart distributions on QAn. The developed methods are then used to solve the Letac-Massam Conjecture on the set of parameters of type I Wishart distributions on QAn. Finally, we introduce and study exponential families of distributions parametrized by a segment of means with an emphasis on their Fisher information. The focus in on distributions with matrix parameters. The particular cases of Gaussian and Wishart exponential families are further examined. Wishart Riesz Modele graphique Reseau Markovien Loi gamma matricielle Graphical models Markov network Matrix-Variate gamma Exponential family 510

1

Page generated in 0.0532 seconds