Modeling natural language is among the fundamental challenges of artificial intelligence and the design of interactive machines, with applications spanning domains such as dialogue systems, text generation, and machine translation. We propose a discriminatively trained log-linear model to learn the distribution of words following a given context. Because of data sparsity, the model must be appropriately regularized with a penalty term. We design a penalty term that encodes the structure of the feature space to avoid overfitting and improve generalization while capturing long-range dependencies. Properties of specific structured penalties can further be exploited to reduce the number of parameters required to encode the model. The outcome is an efficient model that captures long-range dependencies in language without a significant increase in time or space requirements.

In a log-linear model, both training and testing become increasingly expensive as the number of classes grows. In a language model, the number of classes is the size of the vocabulary, which is typically very large. A common trick is to cluster the classes and apply the model in two steps: the first step picks the most probable cluster, and the second picks the most probable word from the chosen cluster. This idea generalizes to a deeper hierarchy with multiple levels of clustering. However, the performance of the resulting hierarchical classifier depends on how well the clustering suits the problem. We study different strategies for building the hierarchy of categories from observations.
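As a rough illustration of the two-step trick described above, the sketch below factors word prediction into a cluster choice followed by a word choice within that cluster. The vocabulary, cluster assignments, and randomly initialized weights are illustrative assumptions for a toy example, not the models or training procedure developed in the thesis.

import numpy as np

# Minimal sketch of two-level hierarchical prediction over a toy vocabulary.
# Cluster assignments and weight matrices are placeholders, not learned parameters.

rng = np.random.default_rng(0)

vocab = ["the", "cat", "sat", "dog", "ran", "on"]
clusters = {0: [0, 1, 2], 1: [3, 4, 5]}   # word indices grouped into two clusters
dim = 8                                    # dimension of the context feature vector

W_cluster = rng.normal(size=(len(clusters), dim))                     # cluster-level weights
W_word = {c: rng.normal(size=(len(ws), dim)) for c, ws in clusters.items()}  # per-cluster word weights

def softmax(scores):
    scores = scores - scores.max()
    e = np.exp(scores)
    return e / e.sum()

def predict(context_features):
    # Step 1: pick the most probable cluster given the context.
    p_cluster = softmax(W_cluster @ context_features)
    c = int(p_cluster.argmax())
    # Step 2: pick the most probable word within the chosen cluster.
    p_word = softmax(W_word[c] @ context_features)
    i = int(p_word.argmax())
    return vocab[clusters[c][i]], float(p_cluster[c] * p_word[i])

print(predict(rng.normal(size=dim)))

With a balanced hierarchy, each prediction scores only one cluster distribution and one within-cluster distribution, rather than the full vocabulary, which is where the computational saving comes from.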
Identifier | oai:union.ndltd.org:CCSD/oai:tel.archives-ouvertes.fr:tel-01001634 |
Date | 11 February 2014 |
Creators | Nelakanti, Anil Kumar |
Publisher | Université Pierre et Marie Curie - Paris VI |
Source Sets | CCSD theses-EN-ligne, France |
Language | English |
Detected Language | English |
Type | PhD thesis |