Return to search

An Exploration of the Word2vec Algorithm: Creating a Vector Representation of a Language Vocabulary that Encodes Meaning and Usage Patterns in the Vector Space Structure

This thesis is an exloration and exposition of a highly efficient shallow neural network algorithm called word2vec, which was developed by T. Mikolov et al. in order to create vector representations of a language vocabulary such that information about the meaning and usage of the vocabulary words is encoded in the vector space structure. Chapter 1 introduces natural language processing, vector representations of language vocabularies, and the word2vec algorithm. Chapter 2 reviews the basic mathematical theory of deterministic convex optimization. Chapter 3 provides background on some concepts from computer science that are used in the word2vec algorithm: Huffman trees, neural networks, and binary cross-entropy. Chapter 4 provides a detailed discussion of the word2vec algorithm itself and includes a discussion of continuous bag of words, skip-gram, hierarchical softmax, and negative sampling. Finally, Chapter 5 explores some applications of vector representations: word categorization, analogy completion, and language translation assistance.

Identiferoai:union.ndltd.org:unt.edu/info:ark/67531/metadc849728
Date05 1900
CreatorsLe, Thu Anh
ContributorsCherry, William, 1966-, Ross, John Robert, 1938-, Fishman, Lior, 1964-
PublisherUniversity of North Texas
Source SetsUniversity of North Texas
LanguageEnglish
Detected LanguageEnglish
TypeThesis or Dissertation
Formatv, 49 pages : illustrations, Text
RightsPublic, Le, Thu Anh, Copyright, Copyright is held by the author, unless otherwise noted. All rights Reserved.

Page generated in 0.0024 seconds