You can browse the resources for all languages here. For links to specific languages, see the table below. The links will take you to a Google Drive folder, where for any given language you will find…
a subfolder called fastText
This contains:
fasttext_transform_[]wiki_[threshold].rds
which is the 300 x 300 transformation matrix for the fastText embeddings with the specified minimum frequency threshold
fasttext_vectors_[]wiki.vec
which is the underlying fastText embedding matrix (of dimensions vocabulary size x 300)
fasttext_model_[]wiki.bin
which is "our" fastText model, trained on the relevant language Wikipedia (rather than Common Crawl). Specifically, this file contains the subword model information that can be used to obtain embeddings for out-of-sample terms.
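As a rough illustration of how the .vec file could be read outside of R, here is a minimal Python sketch. It assumes the file follows the usual fastText text layout: a header line giving the vocabulary size and the dimension, then one line per word with the word followed by its values. The function name read_vec and the toy sample are hypothetical and are not part of the resources.

```python
import io


def read_vec(handle):
    """Parse fastText-style .vec text into a dict mapping word -> list of floats.

    Assumed layout: a header line 'n_words dim', then one line per word
    of the form 'word v1 v2 ... v_dim'.
    """
    header = handle.readline().split()
    dim = int(header[1])
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) < dim + 1:
            continue  # skip blank or malformed lines
        # The last `dim` fields are the values; whatever precedes them is the word.
        word = " ".join(parts[:-dim])
        vectors[word] = [float(x) for x in parts[-dim:]]
    return vectors, dim


# Toy stand-in for a real file (3 dimensions instead of 300).
sample = "2 3\nhello 0.1 0.2 0.3 \nworld 1 2 3\n"
vecs, dim = read_vec(io.StringIO(sample))
```

For a downloaded file, the same function can be called on a handle opened with open(path, encoding="utf-8"), where path is wherever the .vec file was saved locally. The .bin model is a binary file and cannot be parsed this way.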
a subfolder called gloVe
This contains:
glove_transform_[]wiki.rds
which is the 300 x 300 transformation matrix for the GloVe embeddings
glove_vectors_[]wiki.txt
which is the underlying GloVe embedding matrix (of dimensions vocabulary size x 300)
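The listing above does not say how a transformation matrix is combined with the embeddings. The Python sketch below shows one plausible use, under the assumption that the matrix is applied in the à la carte manner: average the pre-trained vectors of the words surrounding a target term, then multiply that average by the transformation matrix. It also assumes the GloVe .txt file has the usual layout (no header, one word followed by its values per line). The .rds files are in R's serialization format (readable in R with readRDS), so the matrix is passed here as a plain nested list; all function names and the toy 2-dimensional values are hypothetical.

```python
import io


def read_glove_txt(handle):
    """Parse headerless GloVe-style text: one 'word v1 v2 ...' entry per line."""
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def transformed_context_embedding(context_words, vectors, transform):
    """Average the vectors of known context words, then apply the matrix.

    Assumption: the transformation matrix is used as a linear map on the
    averaged context vector (à la carte style).
    """
    known = [vectors[w] for w in context_words if w in vectors]
    if not known:
        raise ValueError("none of the context words are in the vocabulary")
    dim = len(known[0])
    mean = [sum(v[i] for v in known) / len(known) for i in range(dim)]
    # Matrix-vector product: one dot product per row of the matrix.
    return [sum(m * x for m, x in zip(row, mean)) for row in transform]


# Toy example with 2 dimensions instead of 300.
toy_vectors = read_glove_txt(io.StringIO("a 1 0\nb 3 2\n"))
toy_transform = [[2.0, 0.0], [0.0, 3.0]]
embedding = transformed_context_embedding(["a", "b", "unseen"], toy_vectors, toy_transform)
```

Words missing from the vocabulary are simply dropped from the average in this sketch; other ways of handling them are possible.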
References
If you use these resources, please cite this paper:
Wirsching EM, Rodriguez PL, Spirling A, Stewart BM. Multilanguage Word Embeddings for Social Scientists: Estimation, Inference, and Validation Resources for 157 Languages. Political Analysis. Published online 2024:1-8. doi:10.1017/pan.2024.17