You can browse the resources for all languages here. For links to specific languages, see the table below. The links will take you to a Google Drive folder, where for any given language you will find…

fastText
This contains:
- fasttext_transform_[]wiki_[threshold].rds which is the 300 x 300 transformation matrix for the fastText embeddings with the specified minimum frequency threshold
- fasttext_vectors_[]wiki.vec which is the underlying fastText embedding matrix (of dimensions vocabulary size x 300)
- fasttext_model_[]wiki.bin which is “our” fastText model, trained on the relevant language Wikipedia (rather than Common Crawl). Specifically, this file contains the subword model information that can be used to obtain embeddings for out-of-sample terms.

gloVe
This contains:
- glove_transform_[]wiki.rds which is the 300 x 300 transformation matrix for the GloVe embeddings
- glove_vectors_[]wiki.txt which is the underlying GloVe embedding matrix (of dimensions vocabulary size x 300)

If you use these resources, please cite this paper:
Wirsching EM, Rodriguez PL, Spirling A, Stewart BM. Multilanguage Word Embeddings for Social Scientists: Estimation, Inference, and Validation Resources for 157 Languages. Political Analysis. Published online 2024:1-8. doi:10.1017/pan.2024.17
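
For readers who want to inspect the embedding matrices outside of R, the following is a minimal sketch of how the two text files described above might be read in Python. It assumes the .vec file follows the usual fastText text layout (a first line giving vocabulary size and dimension, then one term per line followed by its values) and that the GloVe .txt file uses the same layout without the first line; neither assumption is confirmed by this page. The function name read_embeddings and the tiny in-memory samples are hypothetical and only for illustration. The .rds transformation matrices and the .bin model are not covered here; they would typically be loaded with R's readRDS and the fastText library, respectively.

```python
import io
from typing import Dict, List, TextIO


def read_embeddings(handle: TextIO, dim: int = 300,
                    has_header: bool = False) -> Dict[str, List[float]]:
    """Parse a whitespace-delimited embedding file into {term: vector}.

    dim is the embedding dimension (300 for the files described above).
    has_header=True discards a first "<vocab_size> <dim>" line, which is
    assumed here for fastText .vec files but not for GloVe .txt files.
    """
    vectors: Dict[str, List[float]] = {}
    if has_header:
        handle.readline()  # discard the "<vocab_size> <dim>" line
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) <= dim:
            continue  # skip blank or malformed lines
        # The last `dim` fields are the values; whatever precedes is the term.
        term = " ".join(parts[:-dim])
        vectors[term] = [float(x) for x in parts[-dim:]]
    return vectors


# Hypothetical 3-dimensional samples standing in for the real 300-dimensional files.
demo_vec = io.StringIO("2 3\nhello 0.1 0.2 0.3 \nworld 0.4 0.5 0.6 \n")
demo_glove = io.StringIO("hello 1.0 2.0 3.0\nworld 4.0 5.0 6.0\n")

fasttext_vectors = read_embeddings(demo_vec, dim=3, has_header=True)
glove_vectors = read_embeddings(demo_glove, dim=3)
print(len(fasttext_vectors), len(glove_vectors))
```

With a downloaded file, the same function could be called on a handle opened with open(path, encoding="utf-8"), where path points to the local copy, leaving dim at its default of 300.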