 
Open Source
German Word Embeddings

Pretrained and dockerized GloVe, Word2Vec & fastText
We at deepset are passionate supporters and active members of the open-source community. Especially in the field of machine learning, we value openness and believe that it is the path toward innovative, transparent, and responsible AI.

As a small contribution, we are sharing our code for easily training word embeddings. In addition, we are publishing German embeddings trained on the German Wikipedia corpus. As far as we know, these are the first published German GloVe embeddings.

Enjoy!

Code for Models
- GloVe
- Word2Vec
- fastText
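To give an idea of what training looks like, here is a minimal sketch using gensim's Word2Vec on a preprocessed corpus with one tokenized sentence per line. The file names and hyperparameters are illustrative assumptions, not the exact settings used in the repository.

```python
from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

# Assumed input: a preprocessed corpus, one tokenized sentence per line.
sentences = LineSentence("corpus.txt")  # hypothetical file name

# Illustrative hyperparameters (gensim >= 4.0); the repo's defaults may differ.
model = Word2Vec(
    sentences,
    vector_size=300,  # embedding dimensionality
    window=5,         # context window size
    min_count=5,      # ignore rare tokens
    workers=4,        # parallel training threads
)

# Save the vectors in the plain-text word2vec format.
model.wv.save_word2vec_format("vectors.txt", binary=False)
```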

Features
- Dockerized models with straightforward configuration via docker-compose.yml, allowing simple training on EC2
- Preprocessing of the Wikipedia corpus (a sketch follows this list)
- Crawling, preprocessing and mixing of different corpora (coming soon)
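As a rough illustration of the preprocessing step, the following sketch extracts and tokenizes article text from a German Wikipedia dump using gensim's WikiCorpus. The dump file name is a placeholder, and the repository's actual pipeline may differ.

```python
from gensim.corpora import WikiCorpus

# Placeholder path to a German Wikipedia dump (available from dumps.wikimedia.org).
dump_path = "dewiki-latest-pages-articles.xml.bz2"

# Passing an empty dictionary skips vocabulary building; we only want the text.
wiki = WikiCorpus(dump_path, dictionary={})

# Write one whitespace-joined, tokenized article per line.
with open("corpus.txt", "w", encoding="utf-8") as out:
    for tokens in wiki.get_texts():
        out.write(" ".join(tokens) + "\n")
```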

Trained embeddings (German Wikipedia; a loading sketch follows this list):
- GloVe: Vectors and Vocab
- Word2Vec: Vectors and Vocab
- fastText (requires fastText to be installed)
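The published vectors can be loaded with standard tooling; here is a minimal sketch using gensim's KeyedVectors. The file name is a placeholder for whichever vector file you downloaded, and the lowercase query token assumes the corpus was lowercased during preprocessing.

```python
from gensim.models import KeyedVectors

# Placeholder path; point this at the downloaded vector file.
vectors = KeyedVectors.load_word2vec_format("german_vectors.txt", binary=False)

# Nearest neighbors by cosine similarity (token casing depends on preprocessing).
print(vectors.most_similar("auto", topn=5))
```

For the GloVe vectors, which use a plain text format without the word2vec header line, recent gensim versions (>= 4.0) accept an additional no_header=True argument to load_word2vec_format.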