
Comparison and Fine-Grained Analysis of Sequence Encoders for Natural Language Processing

Most machine learning algorithms require a fixed-length input to perform commonly desired tasks such as classification, clustering, and regression. For natural language processing, the inherently unbounded and recursive nature of the input poses a unique challenge when deriving such fixed-length representations. Although there is now a general consensus on how to generate fixed-length representations of individual words that preserve their meaning, the same cannot be said for sequences of words in sentences, paragraphs, or documents. In this work, we study the encoders commonly used to generate fixed-length representations of natural language sequences, and analyze their effectiveness across a variety of high- and low-level tasks including sentence classification and question answering. Additionally, we propose novel improvements to the existing Skip-Thought and End-to-End Memory Network architectures and study their performance on both the original and auxiliary tasks. Ultimately, we show that the setting in which the encoders are trained, and the corpus used for training, have a greater influence on the final learned representation than the underlying sequence encoders themselves.
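As a concrete illustration of the kind of fixed-length representation the abstract refers to (this is a minimal sketch, not a method from the thesis), the Python snippet below averages word vectors so that sequences of any length map to a vector of one fixed dimensionality. The vocabulary and random embedding table are hypothetical stand-ins for pretrained word vectors such as word2vec or GloVe.

import numpy as np

# Illustrative only: mean-pooling word vectors is one simple way to turn a
# variable-length token sequence into a fixed-length vector.
rng = np.random.default_rng(0)
vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4}  # toy vocabulary
embedding_dim = 8
embeddings = rng.normal(size=(len(vocab), embedding_dim))  # stand-in for pretrained vectors

def encode(tokens):
    """Map a variable-length list of tokens to a single fixed-length vector."""
    vectors = [embeddings[vocab[t]] for t in tokens if t in vocab]
    if not vectors:
        return np.zeros(embedding_dim)
    return np.mean(vectors, axis=0)  # same dimensionality regardless of sequence length

short = encode(["the", "cat"])
longer = encode(["the", "cat", "sat", "on", "the", "mat"])
assert short.shape == longer.shape == (embedding_dim,)

Encoders such as Skip-Thought or End-to-End Memory Networks replace this naive averaging with learned, order-aware composition, but they share the same goal of producing a single fixed-size vector per sequence.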

Identifier: oai:union.ndltd.org:PROQUEST/oai:pqdtoai.proquest.com:10599339
Date: 08 September 2017
Creators: Keller, Thomas Anderson
Publisher: University of California, San Diego
Source Sets: ProQuest.com
Language: English
Detected Language: English
Type: thesis
