Global ETD Search

1091	Statistical modeling of multiword expressions Su, Kim Nam January 2008 In natural languages, words can occur in single units called simplex words or in a group of simplex words that function as a single unit, called multiword expressions (MWEs). Although MWEs are similar to simplex words in their syntax and semantics, they pose their own sets of challenges (Sag et al. 2002). MWEs are arguably one of the biggest roadblocks in computational linguistics due to the bewildering range of syntactic, semantic, pragmatic and statistical idiomaticity they are associated with, and their high productivity. In addition, the large numbers in which they occur demand specialized handling. Moreover, dealing with MWEs has a broad range of applications, from syntactic disambiguation to semantic analysis in natural language processing (NLP) (Wacholder and Song 2003; Piao et al. 2003; Baldwin et al. 2004; Venkatapathy and Joshi 2006). / Our goals in this research are: to use computational techniques to shed light on the underlying linguistic processes giving rise to MWEs across constructions and languages; to generalize existing techniques by abstracting away from individual MWE types; and finally to exemplify the utility of MWE interpretation within general NLP tasks. / In this thesis, we target English MWEs due to resource availability. In particular, we focus on noun compounds (NCs) and verb-particle constructions (VPCs) due to their high productivity and frequency. / Challenges in processing noun compounds are: (1) interpreting the semantic relation (SR) that represents the underlying connection between the head noun and modifier(s); (2) resolving syntactic ambiguity in NCs comprising three or more terms; and (3) analyzing the impact of word sense on noun compound interpretation.
Our basic approach to interpreting NCs relies on the semantic similarity of the NC components, first using a nearest-neighbor method (Chapter 5), then verb semantics, based on the observation that it is often an underlying verb that relates the nouns in NCs (Chapter 6), and finally semantic variation within NC sense collocations, in combination with bootstrapping (Chapter 7). / Challenges in dealing with verb-particle constructions are: (1) identifying VPCs in raw text data (Chapter 8); and (2) modeling the semantic compositionality of VPCs (Chapter 5). We place particular focus on identifying VPCs in context, and on measuring the compositionality of unseen VPCs in order to predict their meaning. Our primary approach to the identification task is to adapt localized context information derived from linguistic features of VPCs to distinguish between VPCs and simple verb-PP combinations. To measure the compositionality of VPCs, we use semantic similarity among VPCs by testing the semantic contribution of each component. / Finally, we conclude the thesis with a chapter-by-chapter summary and outline of the findings of our work, suggestions of potential NLP applications, and a presentation of further research directions (Chapter 9).
1092	Structured classification for multilingual natural language processing Blunsom, Philip Unknown Date This thesis investigates the application of structured sequence classification models to multilingual natural language processing (NLP). Many tasks tackled by NLP can be framed as classification, where we seek to assign a label to a particular piece of text, be it a word, sentence or document. Yet often the labels which we’d like to assign exhibit complex internal structure, such as labelling a sentence with its parse tree, and there may be an exponential number of them to choose from. Structured classification seeks to exploit the structure of the labels in order to allow both generalisation across labels which differ by only a small amount, and tractable searches over all possible labels. In this thesis we focus on the application of conditional random field (CRF) models (Lafferty et al., 2001). These models assign an undirected graphical structure to the labels of the classification task and leverage dynamic programming algorithms to efficiently identify the optimal label for a given input. We develop a range of models for two multilingual NLP applications: word alignment for statistical machine translation (SMT), and multilingual supertagging for highly lexicalised grammars.
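As a hedged illustration of the dynamic programming step mentioned in the abstract above, the following is a minimal sketch of Viterbi decoding for a linear-chain model with additive (log-space) scores. It is not taken from the thesis: the function name `viterbi`, the dictionary-based score tables, and the toy labels and scores in the usage note are all assumptions made for the example, and a real CRF would compute its scores from learned feature weights.

```python
from typing import Dict, List, Sequence


def viterbi(
    observations: Sequence[str],
    labels: Sequence[str],
    emission: Dict[str, Dict[str, float]],
    transition: Dict[str, Dict[str, float]],
) -> List[str]:
    """Return the highest-scoring label sequence under additive scores.

    emission[y][x] is the (hypothetical) score of label y for observation x;
    unseen observations score 0.0. transition[p][y] is the score of moving
    from label p to label y.
    """
    if not observations:
        return []

    # best[t][y]: best total score of any label sequence ending in y at position t.
    best = [{y: emission[y].get(observations[0], 0.0) for y in labels}]
    # back[t-1][y]: the previous label on that best sequence.
    back: List[Dict[str, str]] = []

    for t in range(1, len(observations)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for y in labels:
            prev = max(labels, key=lambda p: best[-1][p] + transition[p][y])
            scores[y] = (
                best[-1][prev]
                + transition[prev][y]
                + emission[y].get(observations[t], 0.0)
            )
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)

    # Follow the back-pointers from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path
```

With toy scores that favour a noun followed by a verb, `viterbi(["dogs", "bark"], ["N", "V"], emission, transition)` walks two positions and two labels, so it examines four transitions instead of enumerating every label sequence. This is the kind of tractable search over an exponential label space that the abstract refers to.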
1093	Intelligent content-based image retrieval framework based on semi-automated learning and historic profiles Chung, Kien-Ping January 2007 Over the last decade, storage of non-text-based data in databases has become an increasingly important trend in information management. Images, in particular, have been gaining popularity as an alternative, and sometimes more viable, option for information storage. While this presents a wealth of information, it also makes it much harder to retrieve appropriate and relevant information during searching. This has resulted in an enormous growth of interest, and much active research, into the extraction of relevant information from non-text-based databases. In particular, content-based image retrieval (CBIR) systems have been one of the most active areas of research. The retrieval principle of CBIR systems is based on visual features such as colour, texture, and shape, or on the semantic meaning of the images. To enhance the retrieval speed, most CBIR systems pre-process the images stored in the database, because feature extraction algorithms are often computationally expensive. If images are to be retrieved from the World Wide Web (WWW), the raw images have to be downloaded and processed in real time. In this case, the feature extraction speed becomes crucial. Ideally, systems should use only those feature extraction algorithms that are best suited to analysing the visual features that capture the common relationship between the images in hand. In this thesis, a feature selection framework based on statistical discriminant analysis is proposed. Such a framework is able to select the most appropriate visual feature extraction algorithms by using relevance feedback only on the user-labelled samples. The idea is that a smaller image sample group is used to analyse the appropriateness of each visual feature, and only the selected features will be used for image comparison and ranking.
Because fewer features are used, the speed of retrieval improves. Experimental results show that the retrieval accuracy for small sample data has also improved. Intelligent E-Business has been used as a case study in this thesis to demonstrate the potential of the framework in image retrieval applications. In addition, an inter-query framework has been proposed in this thesis. This framework is also based on the statistical discriminant analysis technique. A common approach to inter-query learning for a CBIR system is the term-document approach, which treats each image's name or address as a term and the query session as a document. However, scalability becomes an issue with this technique as the number of stored queries increases. Moreover, this approach is not appropriate for a dynamic image database environment. In this thesis, the proposed inter-query framework uses a cluster approach to capture the visual properties common to the previously stored queries. Thus, it is not necessary to memorise the name or address of the images. In order to manage the size of the user's profile, the proposed framework also introduces a merging approach to combine clusters that are close together and similar in their characteristics. Experiments have shown that the proposed framework outperforms the short-term learning approach. It also eliminates the burden of the complex database maintenance strategies required by the term-document approach commonly used in inter-query learning frameworks. Lastly, the proposed inter-query learning framework has been further extended by the incorporation of a new semantic structure. The semantic structure is used to connect the previous queries both visually and semantically. This structure provides the system with the ability to retrieve images that are semantically similar and yet visually different.
To do this, an active learning strategy has been incorporated for exploring the structure. Experiments have again shown that the proposed new framework outperforms the previous framework. Keywords: content-based image retrieval system; relevance feedback; machine learning
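The feature selection idea in the abstract above (score each visual feature on a small set of user-labelled samples, then keep only the most discriminative ones) can be sketched roughly as follows. The abstract does not state the exact discriminant criterion, so this example assumes a simple per-feature Fisher ratio and, for brevity, assumes each feature extractor is summarised by one scalar per image. The names `fisher_score` and `rank_features` and all sample values are hypothetical.

```python
from statistics import fmean, pvariance
from typing import Dict, List, Sequence


def fisher_score(
    relevant: Sequence[float], irrelevant: Sequence[float], eps: float = 1e-9
) -> float:
    """Squared gap between class means divided by the summed class variances."""
    gap = fmean(relevant) - fmean(irrelevant)
    spread = pvariance(relevant) + pvariance(irrelevant)
    # eps avoids division by zero when both classes have no spread.
    return gap * gap / (spread + eps)


def rank_features(
    relevant: Dict[str, Sequence[float]],
    irrelevant: Dict[str, Sequence[float]],
) -> List[str]:
    """Order feature names from most to least discriminative."""
    scores = {
        name: fisher_score(relevant[name], irrelevant[name]) for name in relevant
    }
    return sorted(scores, key=lambda name: scores[name], reverse=True)


# Hypothetical relevance feedback: values of each feature on images the user
# marked as relevant and as irrelevant.
relevant_samples = {"colour": [0.9, 0.8, 1.0], "texture": [0.5, 0.4, 0.6]}
irrelevant_samples = {"colour": [0.1, 0.2, 0.0], "texture": [0.5, 0.6, 0.4]}
ranking = rank_features(relevant_samples, irrelevant_samples)
```

In this toy data the colour feature separates the two groups and the texture feature does not, so a system following this sketch would run only the colour extractor when comparing and ranking the remaining images.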
1094	Applications of submodular minimization in machine learning / Narasimhan, Mukund, January 2007 Thesis (Ph. D.)--University of Washington, 2007. / Includes bibliographical references (p. 134-142).
1095	Intelligent knowledge acquisition system / Youn, Bong-Soo. January 1989 Thesis (M.S.)--Rochester Institute of Technology, 1989. / Includes bibliographical references (leaves 96-97).
1096	The problem of tuning metaheuristics as seen from the machine learning perspective Birattari, Mauro January 2005 Also issued as: Brussels, Univ., Diss., 2005
1097	A learning framework for zero-knowledge game playing agents Duminy, Willem H. January 2006 Thesis (M.Sc.)(Computer Science)--University of Pretoria, 2006. / Includes summary. Includes bibliographical references (leaves 151-152). Available on the Internet via the World Wide Web.
1098	A time series classifier Gore, Christopher Mark, January 2008 Thesis (M.S.)--Missouri University of Science and Technology, 2008. / Vita. The entire thesis text is included in file. Title from title screen of thesis/dissertation PDF file (viewed April 29, 2008). Includes bibliographical references (p. 53-55).
1099	An inductive logic programming approach to statistical relational learning Kersting, Kristian. January 1900 Thesis (Ph. D.)--Albert-Ludwigs-Universität Freiburg im Breisgau, 2006. / Description based on print version record. Includes bibliographical references (p. 201-221) and index.
1100	Improving machine learning through oracle learning / Menke, Joshua E. January 2007 Thesis (Ph. D.)--Brigham Young University. Dept. of Computer Science, 2007. / Includes bibliographical references (p. 203-209).
