This thesis is an exploration and exposition of a highly efficient shallow neural network algorithm called word2vec, developed by T. Mikolov et al. to create vector representations of a language vocabulary such that information about the meaning and usage of the vocabulary words is encoded in the structure of the vector space. Chapter 1 introduces natural language processing, vector representations of language vocabularies, and the word2vec algorithm. Chapter 2 reviews the basic mathematical theory of deterministic convex optimization. Chapter 3 provides background on concepts from computer science used in the word2vec algorithm: Huffman trees, neural networks, and binary cross-entropy. Chapter 4 discusses the word2vec algorithm itself in detail, including the continuous bag-of-words and skip-gram models and the hierarchical softmax and negative sampling training methods. Finally, Chapter 5 explores some applications of vector representations: word categorization, analogy completion, and language translation assistance.
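The analogy completion task mentioned above can be illustrated with a minimal sketch: given learned word vectors, the analogy "a is to b as c is to ?" is typically solved by finding the vocabulary word whose vector lies closest (by cosine similarity) to b − a + c. The hand-picked three-dimensional vectors below are illustrative assumptions only; real word2vec embeddings are learned from a corpus and are much higher-dimensional.

```python
import math

# Toy, hand-chosen "word vectors" for illustration only; actual word2vec
# embeddings are learned from text and typically have 100-300 dimensions.
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.5, 0.9, 0.0],
    "woman": [0.5, 0.1, 0.9],
    "apple": [0.1, 0.2, 0.1],
}

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)

def complete_analogy(a, b, c, vectors):
    """Solve 'a is to b as c is to ?' by returning the vocabulary word
    whose vector is most cosine-similar to b - a + c, excluding a, b, c."""
    target = [vectors[b][i] - vectors[a][i] + vectors[c][i]
              for i in range(len(vectors[a]))]
    candidates = {w: v for w, v in vectors.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine(candidates[w], target))

print(complete_analogy("man", "king", "woman", vectors))  # → queen
```

With these toy vectors, "man is to king as woman is to ?" resolves to "queen", mirroring the classic word2vec analogy result; the exclusion of the three query words from the candidate set follows the standard evaluation convention.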
Identifier | oai:union.ndltd.org:unt.edu/info:ark/67531/metadc849728 |
Date | 05 1900 |
Creators | Le, Thu Anh |
Contributors | Cherry, William, 1966-, Ross, John Robert, 1938-, Fishman, Lior, 1964- |
Publisher | University of North Texas |
Source Sets | University of North Texas |
Language | English |
Detected Language | English |
Type | Thesis or Dissertation |
Format | v, 49 pages : illustrations, Text |
Rights | Public, Le, Thu Anh, Copyright, Copyright is held by the author, unless otherwise noted. All Rights Reserved. |