This thesis investigates the application of structured sequence classification models to multilingual natural language processing (NLP). Many tasks tackled by NLP can be framed as classification, where we seek to assign a label to a particular piece of text, be it a word, sentence or document. Yet often the labels which we’d like to assign exhibit complex internal structure, such as labelling a sentence with its parse tree, and there may be an exponential number of them to choose from. Structured classification seeks to exploit the structure of the labels in order to allow both generalisation across labels which differ by only a small amount, and tractable searches over all possible labels. In this thesis we focus on the application of conditional random field (CRF) models (Lafferty et al., 2001). These models assign an undirected graphical structure to the labels of the classification task and leverage dynamic programming algorithms to efficiently identify the optimal label for a given input. We develop a range of models for two multilingual NLP applications: word-alignment for statistical machine translation (SMT), and multilingual super tagging for highly lexicalised grammars.
Identifer | oai:union.ndltd.org:ADTP/245493 |
Creators | Blunsom, Philip |
Source Sets | Australiasian Digital Theses Program |
Language | English |
Detected Language | English |
Rights | Terms and Conditions: Copyright in works deposited in the University of Melbourne Eprints Repository (UMER) is retained by the copyright owner. The work may not be altered without permission from the copyright owner. Readers may only, download, print, and save electronic copies of whole works for their own personal non-commercial use. Any use that exceeds these limits requires permission from the copyright owner. Attribution is essential when quoting or paraphrasing from these works., Open Access |
Page generated in 0.0017 seconds