1 Billion Words for NLP
- Author: Martin Andrews (@mdda123)
This talk previewed the Theano code, and the ideas behind word embeddings, that I plan to release soon, so that people can experiment with it before May 2015.
A working solution to the Billion Word Imputation challenge will hopefully appear on my GitHub account shortly (originally planned for 15-Jan-2015, still in progress).
Research Links
Key papers to look at:
GloVe - Global Vectors for Word Representation - (Pennington, Socher, Manning 2014), which I wrote up here.
Mikolov 2012 and the Word2Vec code, plus a nice Python version that comes with an interesting talk on optimization.
Presentation Link
I recently gave a presentation about this project to the Singapore PyData MeetUp Group.
If there are any questions about the presentation, please ask below, or via the Facebook group.