1 Billion Words for NLP

This talk was a preview of the Theano code (and the ideas behind word embeddings) that I plan to release soon, so that people can play around with it before May-2015.

A working solution to the Billion Word Imputation challenge will hopefully appear on my GitHub account shortly (it was planned for 15-Jan-2015, but is still in progress).

Research Links

Key papers to have a look at:

GloVe - Global Vectors for Word Representation - (Pennington, Socher, Manning 2014), which I wrote up here.
Mikolov 2012 and the Word2Vec code, and also a nice Python version, with an interesting optimization talk.
Mnih approach

Presentation Link

I recently gave a presentation about this project to the Singapore PyData MeetUp Group.

If there are any questions about the presentation, please ask below, or via the Facebook group.
