GloVe - Global Vectors for Word Representation - Pennington, Socher, Manning 2014

This write-up contains my first impressions of the paper : GloVe - Global Vectors for Word Representation - (Pennington, Socher, Manning 2014).

Paper Background

Given the surprising success of the Mnih approach, this paper explores the essence of the word embedding problem, and thoughtfully explains how it can be tackled in a rigerous ‘analytical’ manner.

By proving a result concerning O(V) vs O(C) bounds, the authors show that their method is more tractable than would otherwise be imagined, and go on to demonstrate excellent word embedding results, generated more quickly than alternatives.

Surprising stuff

This seems to have boiled away the word-embedding task to its bare minimum (except for the papers to be published next year, of course!)
Open Source (Apache 2) code made available. This is well written code (particularly since it’s bleeding-edge research).
Briefly on front page of Hacker News, but didn’t last long, unfortunately.

As I wrote on the HN page :

The most remarkable take-away from this whole genre of word-embedding is that just by doing ‘dumb averages’ of word contexts and then optimizing the ‘vector[word]’ on the input (and output sides), you end up with a SEMANTIC understanding of the English language in the word vectors.

This paper is the latest in the series (across multiple researchers), and seems to boil the task down to its bare minimum : Just a raw least-squares optimization works. And instead of the ‘linguistic knowledge’ being smuggled into the problem set-up increasing (initially, people used tree-embeddings, and WordNet bootstrapping, in the 2003 papers), this is getting rid of almost all structure. And ending up with better results.

So, instead of semantics being a naturally very deep problem, apparently common sense understanding can be derived from surface statistics. IMHO, more people should be excited about this (from an AI standpoint).

Ideas to Follow-Up

Seems like the core glove.c is ripe for rewriting as OpenCL, and benchmarking…

Also seems like a Theano implementation would be interesting.

Currently using this as the basic word-vector generator for the Kaggle 1-billion word challenge.

Paper Background#

Surprising stuff#

Ideas to Follow-Up#

Paper Background

Surprising stuff

Ideas to Follow-Up