  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Exact sampling and optimisation in statistical machine translation

Aziz, Wilker Ferreira January 2014
In Statistical Machine Translation (SMT), inference needs to be performed over a high-complexity discrete distribution defined by the intersection between a translation hypergraph and a target language model. This distribution is too complex to be represented exactly, and one typically resorts to approximation techniques either to perform optimisation (the task of searching for the optimum translation) or sampling (the task of finding a subset of translations that is statistically representative of the goal distribution). Beam search is an example of an approximate optimisation technique, where maximisation is performed over a heuristically pruned representation of the goal distribution. For inference tasks other than optimisation, rather than finding a single optimum, one is really interested in obtaining a set of probabilistic samples from the distribution. This is the case in training, where one wishes to obtain unbiased estimates of expectations in order to fit the parameters of a model. Samples are also necessary in consensus decoding, where one chooses, from a sample of likely translations, the one that minimises a loss function. Due to the additional computational challenges posed by sampling, n-best lists, a by-product of optimisation, are typically used as a biased approximation to true probabilistic samples. A more direct procedure is to attempt to draw samples from the underlying distribution rather than rely on n-best list approximations. Markov Chain Monte Carlo (MCMC) methods, such as Gibbs sampling, offer a way to overcome the tractability issues in sampling; however, their convergence properties are hard to assess. That is, it is difficult to know when, if ever, an MCMC sampler is producing samples that are compatible with the goal distribution. Rejection sampling, a Monte Carlo (MC) method, is more fundamental and natural: it offers strong guarantees, such as unbiased samples, but is typically hard to design for distributions of the kind addressed in SMT, rendering it intractable. A recent technique that stresses a unified view of the two types of inference tasks discussed here, optimisation and sampling, is the OS* approach. OS* can be seen as a cross between Adaptive Rejection Sampling (an MC method) and A* optimisation. In this view the intractable goal distribution is upper-bounded by a simpler (thus tractable) proxy distribution, which is then incrementally refined to be closer to the goal until the maximum is found, or until the sampling performance exceeds a certain level.

This thesis introduces an approach to exact optimisation and exact sampling in SMT by addressing the tractability issues associated with the intersection between the translation hypergraph and the language model. The two forms of inference are handled in a unified framework based on the OS* approach. In short, an intractable goal distribution, over which one wishes to perform inference, is upper-bounded by tractable proposal distributions. A proposal represents a relaxed version of the complete space of weighted translation derivations, where relaxation happens with respect to the incorporation of the language model. These proposals give an optimistic view of the true model and allow for easier and faster search using standard dynamic programming techniques. In the OS* approach, such proposals are used to perform a form of adaptive rejection sampling. In rejection sampling, samples are drawn from a proposal distribution and accepted or rejected as a function of the mismatch between the proposal and the goal. The technique is adaptive in that rejected samples are used to motivate a refinement of the upper-bound proposal that brings it closer to the goal, improving the rate of acceptance. Optimisation can be connected to an extreme form of sampling, so the framework introduced here suits both exact optimisation and exact sampling. Exact optimisation means that the global maximum is found with a certificate of optimality. Exact sampling means that unbiased samples are independently drawn from the goal distribution.

We show that with this approach exact inference is feasible using only a fraction of the time and space that would be required by a full intersection, without recourse to pruning techniques that only provide approximate solutions. We also show that the vast majority of the entries (n-grams) in a language model can be summarised by shorter and optimistic entries. This means that the computational complexity of our approach is less sensitive to the order of the language model distribution than a full intersection would be. Particularly in the case of sampling, we show that it is possible to draw exact samples compatible with distributions which incorporate a high-order language model component from proxy distributions that are much simpler. In this thesis, exact inference is performed in the context of both hierarchical and phrase-based models of translation, the latter characterising a problem that is NP-complete in nature.
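To make the accept/refine loop concrete, here is a minimal sketch of adaptive rejection sampling in the OS* spirit over a toy enumerable space. The distributions and names are illustrative assumptions, not the thesis's implementation, where the proposal is a relaxed hypergraph/language-model score rather than an explicit table.

```python
import random

# Toy unnormalised goal distribution p(x) over a small discrete space.
# q(x) >= p(x) is a tractable, optimistic upper-bound proposal (here a
# plain table; in SMT it would be a relaxed weighted hypergraph).
p = {"a": 0.5, "b": 0.3, "c": 0.1, "d": 0.05}
q = {x: 1.0 for x in p}  # initial loose bound: q(x) >= p(x) everywhere

def draw_from_q(q):
    """Sample x with probability proportional to q(x)."""
    z = sum(q.values())
    r, acc = random.random() * z, 0.0
    for x, w in q.items():
        acc += w
        if r <= acc:
            return x
    return x  # numerical fallback

def adaptive_rejection_sample(p, q):
    """Draw one exact sample from p by rejection against q, refining q."""
    while True:
        x = draw_from_q(q)
        if random.random() <= p[x] / q[x]:  # accept with prob p(x)/q(x)
            return x
        q[x] = p[x]  # refinement: tighten the bound where we rejected

samples = [adaptive_rejection_sample(p, q) for _ in range(1000)]
```

Every rejection permanently tightens the bound at the offending point, so the acceptance rate only improves; repeated calls yield independent, unbiased samples from the normalised goal distribution.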
2

Provable Guarantees of Learning with Incomplete and Latent Data

Chuyang Ke (15337258) 21 April 2023
Real-world datasets are rarely clean. This causes a discrepancy between the claimed performance of machine learning algorithms on paper and their actual performance on real-world problems. When dealing with missing or hidden information in a dataset, researchers have used heuristic imputation methods since the early days of machine learning. However, many imputation methods lack theoretical guarantees in various machine learning tasks, including clustering, community detection, and sparsity recovery, to name a few. On the other hand, theoretical machine learning papers often rely on simplistic assumptions that are rarely fulfilled in real-world datasets. My research focuses on developing statistically and computationally efficient learning algorithms with provable guarantees under novel, and arguably more realistic, incomplete and latent assumptions. We provide analyses of community detection in various network models, inference with latent variables in an arbitrary planted model, federated myopic community detection, and high-order tensor models. We analyze the interaction between the missing or latent structures and the inference/recoverability conditions, and propose algorithms to solve these problems efficiently.

Our main contributions in this thesis are as follows.

1. We analyze the information-theoretic limits for the recovery of node labels in several network models, and in particular for community detection. We carefully construct restricted ensembles for a subclass of network models and provide a series of novel results.
2. We analyze the necessary and sufficient conditions for exact inference in a latent model. We show that exact inference can be achieved using a semidefinite programming (SDP) approach without knowing either the latent variables or their domain. Our analysis predicts the experimental correctness of the SDP with high accuracy, showing the suitability of our focus on the Karush-Kuhn-Tucker conditions and the spectrum of a properly defined matrix. (A generic sketch of this style of SDP relaxation appears after this list.)
3. We study the problem of recovering the community structure of a network under federated myopic learning. Under this paradigm there are several clients, each with a myopic view, i.e., observing only a small subgraph of the network. Each client sends a censored evidence graph to a central server. We provide an efficient algorithm that computes a consensus signed weighted graph from the clients' evidence and recovers the underlying network structure at the central server. We analyze the topological conditions on the network, as well as the signal and noise levels of the clients, that allow for recovery of the network structure. Our analysis shows that exact recovery is possible and can be achieved in polynomial time.
4. We study the problem of exact partitioning of high-order models. We consider two different high-order assumptions and show that exact partitioning of high-order planted models is achievable by solving a convex optimization problem, with a novel Carathéodory symmetric tensor cone in one case and with a tensor nuclear norm constraint in the other.
5. We study the problem of inference in high-order structured prediction tasks. We apply a generative-model approach to the problem of high-order inference, and provide a two-stage convex optimization algorithm for exact label recovery. We also connect the performance of our algorithm to the hyperedge expansion property using a novel hypergraph Cheeger-type inequality.
6. We study the problem of partial recovery through semidefinite programming. We are interested in scenarios in which the SDP returns a solution that is partially correct without any rounding. We analyze the optimality conditions for partial recovery and provide statistical and topological guarantees.
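As background for the SDP-based results above, here is a hedged sketch of the generic semidefinite relaxation used across the community-detection literature: relax the combinatorial problem over labels x in {-1, +1}^n to a PSD matrix with unit diagonal, then round via the leading eigenvector. It assumes the cvxpy and numpy packages; the function name and toy data are illustrative, and this is not necessarily the exact program or KKT analysis from the thesis.

```python
import cvxpy as cp
import numpy as np

def sdp_two_communities(A):
    """SDP relaxation of two-community recovery (illustrative sketch).

    Relaxes max_{x in {-1,+1}^n} x^T A x to max <A, X> over X PSD with
    unit diagonal, then rounds with the leading eigenvector of X*.
    """
    n = A.shape[0]
    X = cp.Variable((n, n), symmetric=True)
    problem = cp.Problem(
        cp.Maximize(cp.trace(A @ X)),
        [X >> 0, cp.diag(X) == 1],
    )
    problem.solve()
    eigvals, eigvecs = np.linalg.eigh(X.value)
    return np.sign(eigvecs[:, -1])  # +/-1 label estimates

# Tiny usage example: two planted clusters of 10 nodes with noise.
rng = np.random.default_rng(0)
labels = np.repeat([1.0, -1.0], 10)
A = (np.outer(labels, labels) > 0).astype(float)
A += 0.1 * rng.standard_normal((20, 20))
A = (A + A.T) / 2
np.fill_diagonal(A, 0)
print(sdp_two_communities(A))
```

Exact-recovery analyses of the kind described in the thesis characterise when the rounded solution (or the unrounded SDP solution itself, as in contribution 6) matches the planted labels with high probability.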
3

STRUCTURED PREDICTION: STATISTICAL AND COMPUTATIONAL GUARANTEES IN LEARNING AND INFERENCE

Kevin Segundo Bello Medina (11196552) 28 July 2021
Structured prediction consists of receiving a structured input and producing a combinatorial structure such as trees, clusters, networks, sequences, or permutations, among others. From the computational viewpoint, structured prediction is in general considered intractable because the size of the output space is exponential in the input size. For instance, in image segmentation tasks, the number of admissible segmentations is exponential in the number of pixels. A second factor is the combination of the input dimensionality with the amount of available data. In structured prediction it is common for the input to live in a high-dimensional space, which involves jointly reasoning about thousands or millions of variables while at the same time contending with a limited amount of data. Thus, learning and inference methods with strong computational and statistical guarantees are desired. The focus of our research is to propose principled methods for structured prediction that are both polynomial time, i.e., computationally efficient, and require only a polynomial number of data samples, i.e., statistically efficient.

The main contributions of this thesis are as follows:

(i) We develop an efficient and principled method for learning latent-variable models for structured prediction under Gaussian perturbations. We derive a Rademacher-based generalization bound and argue that the use of non-convex formulations in learning latent-variable models leads to tighter bounds on the Gibbs decoder distortion. (A sketch of the Rademacher quantity behind such bounds appears after this list.)

(ii) We study the fundamental limits of structured prediction, i.e., we characterize the sample complexity necessary for learning factor graph models in the context of structured prediction. In particular, we show that finiteness of our novel MaxPair-dimension is necessary for learning. Lastly, we show a connection between the MaxPair-dimension and the VC-dimension, which allows existing results on the VC-dimension to be used to calculate the MaxPair-dimension.

(iii) We analyze a generative model based on connected graphs and find the structural conditions on the graph that allow for exact recovery of the node labels. In particular, we show that exact recovery is realizable in polynomial time for a large class of graphs. Our analysis is based on convex relaxations: we thoroughly analyze a semidefinite program and a degree-4 sum-of-squares program. Finally, we extend this model to incorporate linear constraints (e.g., fairness), and formally explain the effect of the added constraints on the probability of exact recovery.
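Contribution (i) hinges on a Rademacher-based generalization bound. As background only, here is a small Monte Carlo sketch of the empirical Rademacher complexity of a finite hypothesis class, the quantity such bounds control; the hypothesis class, data, and function names are illustrative assumptions, not the thesis's construction (which concerns Gibbs decoders under Gaussian perturbations).

```python
import numpy as np

def empirical_rademacher(H, X, n_trials=1000, seed=None):
    """Monte Carlo estimate of R_hat(H) = E_sigma[ sup_h (1/n) sum_i sigma_i h(x_i) ].

    H: list of hypotheses (callables returning +/-1); X: list of data points.
    """
    rng = np.random.default_rng(seed)
    n = len(X)
    preds = np.array([[h(x) for x in X] for h in H])  # shape (|H|, n)
    total = 0.0
    for _ in range(n_trials):
        sigma = rng.choice([-1.0, 1.0], size=n)  # Rademacher signs
        total += np.max(preds @ sigma) / n       # sup over the class
    return total / n_trials

# Usage: threshold classifiers h_t(x) = sign(x - t) on ten 1-D points.
X = list(np.linspace(-1, 1, 10))
H = [lambda x, t=t: 1.0 if x > t else -1.0 for t in np.linspace(-1, 1, 5)]
print(empirical_rademacher(H, X, seed=0))
```

A small Rademacher complexity certifies that empirical risk concentrates around true risk uniformly over the class, which is the mechanism behind distortion bounds of the kind derived in (i).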
4

Exact inference algorithms for first-order models

Takiyama, Felipe Iwao 27 February 2014
In this work, we describe the implementation of inference algorithms for first-order models. Three algorithms were implemented: VE, C-FOVE, and AC-FOVE. The latter is the state of the art in probability calculations for Relational Bayesian Networks and previously had no implementation available. The development followed an agile methodology, which resulted in a software package that can be reused in other implementations. We show that the resulting software has the performance expected from the theory, although with some limitations. This work also contributes new theoretical topics that complement the algorithm.
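Of the three algorithms, VE (propositional variable elimination) is the common foundation; C-FOVE and AC-FOVE lift its multiply/sum-out primitives to the first-order (lifted) level. Below is a minimal, hedged sketch of propositional VE over binary variables, illustrating those primitives only; it is not the package described in the thesis.

```python
import itertools
from functools import reduce

class Factor:
    """A factor over named binary variables, stored as a dense table."""
    def __init__(self, variables, table):
        self.variables = variables  # list of variable names
        self.table = table          # dict: assignment tuple -> value

def multiply(f, g):
    """Pointwise product of two factors over the union of their variables."""
    variables = f.variables + [v for v in g.variables if v not in f.variables]
    table = {}
    for assignment in itertools.product((0, 1), repeat=len(variables)):
        env = dict(zip(variables, assignment))
        fv = f.table[tuple(env[v] for v in f.variables)]
        gv = g.table[tuple(env[v] for v in g.variables)]
        table[assignment] = fv * gv
    return Factor(variables, table)

def sum_out(f, var):
    """Eliminate `var` from factor `f` by summing over its values."""
    i = f.variables.index(var)
    variables = f.variables[:i] + f.variables[i + 1:]
    table = {}
    for assignment, value in f.table.items():
        key = assignment[:i] + assignment[i + 1:]
        table[key] = table.get(key, 0.0) + value
    return Factor(variables, table)

def variable_elimination(factors, order):
    """Eliminate variables in `order`; return the product of what remains."""
    for var in order:
        touching = [f for f in factors if var in f.variables]
        factors = [f for f in factors if var not in f.variables]
        factors.append(sum_out(reduce(multiply, touching), var))
    return reduce(multiply, factors)

# Usage: P(B | A=1) in a two-node network A -> B with illustrative CPTs.
fA = Factor(["A"], {(0,): 0.4, (1,): 0.6})
fBgA = Factor(["A", "B"], {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.8})
evidence = Factor(["A"], {(0,): 0.0, (1,): 1.0})  # clamp A = 1
result = variable_elimination([fA, fBgA, evidence], order=["A"])
z = sum(result.table.values())
print({b: v / z for (b,), v in result.table.items()})  # {0: 0.2, 1: 0.8}
```

Lifted algorithms such as C-FOVE and AC-FOVE avoid grounding by performing these operations on parameterised factors that stand for many exchangeable instances at once.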
5

Inference in Bayesian Networks

Šimeček, Josef January 2013
This master's thesis demonstrates various approaches to probabilistic inference in Bayesian networks. The theoretical part covers the basics of probability theory, an introduction to Bayesian networks, methods for Bayesian inference, and applications of Bayesian networks. Each inference technique is explained, accompanied by its algorithm, and illustrated with an example. The practical part contains a description of the implementation, experiments with the demonstration applications, and a discussion of the results.
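As a complement to the variable-elimination sketch above, here is the simplest exact technique a thesis like this typically demonstrates: inference by enumeration on the classic sprinkler network. The conditional probability values are illustrative assumptions, not taken from the thesis.

```python
# Exact inference by enumeration: P(Rain | WetGrass = 1) in the
# sprinkler network Rain -> WetGrass <- Sprinkler (illustrative CPTs).
P_rain = {1: 0.2, 0: 0.8}
P_sprinkler = {1: 0.1, 0: 0.9}
P_wet_given = {(1, 1): 0.99, (1, 0): 0.90, (0, 1): 0.90, (0, 0): 0.01}

def joint(r, s, w):
    """Full joint P(Rain=r, Sprinkler=s, WetGrass=w) via the chain rule."""
    pw = P_wet_given[(r, s)] if w == 1 else 1.0 - P_wet_given[(r, s)]
    return P_rain[r] * P_sprinkler[s] * pw

# Sum out the hidden variable (Sprinkler), then normalise over Rain.
unnorm = {r: sum(joint(r, s, 1) for s in (0, 1)) for r in (0, 1)}
z = sum(unnorm.values())
print({r: unnorm[r] / z for r in (0, 1)})  # roughly {1: 0.70, 0: 0.30}
```

Enumeration is exponential in the number of hidden variables, which is exactly the cost that variable elimination and the other techniques surveyed in such theses are designed to reduce.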
