German Word Embeddings
Pretrained and dockerized GloVe, Word2Vec & fastText
We're sharing our code for training word embeddings, together with the German embeddings we trained on the Wikipedia corpus.
At deepset, we are passionate supporters and active members of the open source community. We believe that in machine learning, being open and transparent about our findings is the only way to go. This, in turn, should lead us towards innovative, transparent, and responsible AI.
To the best of our knowledge, these are the first German GloVe embeddings to be published.
Features
- Dockerized models, configured through docker-compose.yml, that make training on EC2 simple
- Preprocessing of the Wikipedia corpus
- Crawling, preprocessing and mixing of different corpora (coming soon)
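
As a rough illustration of how pretrained embeddings like these are typically consumed, the sketch below parses vectors in the plain-text layout that GloVe conventionally writes (one word per line, followed by its space-separated components) and compares words by cosine similarity. That the published files use this layout is an assumption, and the function names and toy vectors are hypothetical, made up purely for the example.

```python
import io
import math


def load_text_vectors(handle):
    """Parse 'word v1 v2 ... vn' lines into a dict of word -> list of floats.

    Assumes a GloVe-style text layout; lines with fewer than three fields
    (blank lines or a word2vec-style 'count dim' header) are skipped.
    """
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) < 3:
            continue
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy vectors invented for illustration; real embeddings have hundreds of dimensions.
SAMPLE = "hund 1.0 0.5 0.0\nkatze 0.9 0.6 0.1\nauto 0.0 0.1 1.0\n"

vectors = load_text_vectors(io.StringIO(SAMPLE))
sim_animals = cosine(vectors["hund"], vectors["katze"])
sim_mixed = cosine(vectors["hund"], vectors["auto"])
```

To read a real file, the in-memory sample would be replaced by `open(path, encoding="utf-8")`. For full-size vocabularies, a NumPy array is a better fit than Python lists.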