Exponential Family Embeddings

Word embeddings are a powerful approach for capturing semantic similarity among terms in a vocabulary. Exponential family embeddings extend the idea of word embeddings to other types of high-dimensional data. Exponential family embeddings have three ingredients: embeddings as latent variables, a predefined conditioning set for each observation (called the context), and a conditional likelihood from the exponential family. The embeddings are inferred with a scalable algorithm. This thesis highlights three advantages of the exponential family embeddings model class: (A) The approximations used for existing methods such as word2vec can be understood as a biased stochastic gradient procedure on a specific type of exponential family embedding model, the Bernoulli embedding. (B) By choosing different likelihoods from the exponential family, we can generalize the task of learning distributed representations to different application domains. For example, we can learn embeddings of grocery items from shopping data, embeddings of movies from click data, or embeddings of neurons from recordings of zebrafish brains. In all three applications, we find exponential family embedding models to be more effective than other types of dimensionality reduction. They better reconstruct held-out data and find interesting qualitative structure. (C) Finally, the probabilistic modeling perspective allows us to incorporate structure and domain knowledge in the embedding space. We develop models for studying how language varies over time, how it differs between related groups of data, and how word usage differs between languages. Key to the success of these methods is that the embeddings share statistical information through hierarchical priors or neural networks. We demonstrate the benefits of this approach in empirical studies of Senate speeches, scientific abstracts, and shopping baskets.
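
As a rough illustration of the three ingredients named in the abstract (latent embeddings, a context, and an exponential family conditional likelihood), the Python sketch below evaluates a Bernoulli conditional log-likelihood for a single binary observation. It is not taken from this record. The form of the natural parameter (the inner product of an item's embedding with the sum of its context vectors), the names rho, alpha, and bernoulli_log_likelihood, and the toy grocery items are all assumptions made for illustration.

```python
import math
import random


def softplus(z):
    # Numerically stable log(1 + exp(z)); this is the Bernoulli log-normalizer.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def natural_parameter(rho, alpha, item, context):
    # Assumed form: eta = rho[item] . sum_{j in context} alpha[j]
    dim = len(rho[item])
    summed = [sum(alpha[j][k] for j in context) for k in range(dim)]
    return sum(r * s for r, s in zip(rho[item], summed))


def bernoulli_log_likelihood(rho, alpha, item, context, x):
    # Exponential family form for a binary observation x: x * eta - a(eta).
    eta = natural_parameter(rho, alpha, item, context)
    return x * eta - softplus(eta)


# Hypothetical toy vocabulary with small random embeddings (illustration only).
random.seed(0)
items = ["milk", "bread", "eggs"]
dim = 2
rho = {i: [random.gauss(0.0, 0.1) for _ in range(dim)] for i in items}    # embedding vectors
alpha = {i: [random.gauss(0.0, 0.1) for _ in range(dim)] for i in items}  # context vectors

# Log-probability that "milk" is present given that "bread" and "eggs" are in the basket.
ll_present = bernoulli_log_likelihood(rho, alpha, "milk", ["bread", "eggs"], 1)
ll_absent = bernoulli_log_likelihood(rho, alpha, "milk", ["bread", "eggs"], 0)
print(ll_present, ll_absent)
```

Swapping the log-normalizer and the treatment of x for those of another exponential family member (for example Poisson for counts or Gaussian for real values) would give a different embedding model under the same assumed structure.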

https://doi.org/10.7916/D8NZ9RHT

Computer science

Exponential families (Statistics)

Embeddings (Mathematics)

Identifier	oai:union.ndltd.org:columbia.edu/oai:academiccommons.columbia.edu:10.7916/D8NZ9RHT
Date	January 2018
Creators	Rudolph, Maja
Source Sets	Columbia University
Language	English
Detected Language	English
Type	Theses
