You can browse the resources for all languages here. For links to specific languages, see the table below. Each link takes you to a Google Drive folder, where for any given language you will find…
fastText
This contains:
fasttext_transform_[]wiki_[threshold].rds, which is the 300 x 300 transformation matrix for the fastText embeddings with the specified minimum frequency threshold
fasttext_vectors_[]wiki.vec, which is the underlying fastText embedding matrix (of dimensions vocabulary size x 300)
fasttext_model_[]wiki.bin, which is “our” fastText model, trained on the relevant language Wikipedia (rather than Common Crawl). Specifically, this file contains the subword model information that can be used to obtain embeddings for out-of-sample terms.
GloVe
This contains:
glove_transform_[]wiki.rds, which is the 300 x 300 transformation matrix for the GloVe embeddings
glove_vectors_[]wiki.txt, which is the underlying GloVe embedding matrix (of dimensions vocabulary size x 300)
If you use these resources, please cite this paper:
Wirsching EM, Rodriguez PL, Spirling A, Stewart BM. Multilanguage Word Embeddings for Social Scientists: Estimation, Inference, and Validation Resources for 157 Languages. Political Analysis. Published online 2024:1-8. doi:10.1017/pan.2024.17
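As a rough illustration of how the two vector files described above might be read, here is a minimal Python sketch. It assumes the files follow the usual plain-text conventions: a fastText .vec file begins with a header line giving the vocabulary size and dimension, while a GloVe .txt file has no header, and each remaining line is a word followed by its values. The function name read_text_vectors and the toy 3-dimensional data are hypothetical stand-ins (the real vectors have 300 dimensions), and the .rds transformation matrices are not covered here since they are serialized R objects.

```python
import io
from itertools import chain


def read_text_vectors(handle):
    """Parse word vectors from an open text handle into a dict.

    Assumed layouts (not confirmed against the actual files):
      - fastText .vec: first line is "<vocab_size> <dim>", then one word per line
      - GloVe .txt: no header, one word per line
    """
    vectors = {}
    first = handle.readline()
    parts = first.split()
    # Treat a first line made of exactly two integers as a fastText header.
    is_header = len(parts) == 2 and all(p.isdigit() for p in parts)
    # If it was not a header, put the first line back in front of the rest.
    lines = handle if is_header else chain([first], handle)
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        vectors[fields[0]] = [float(v) for v in fields[1:]]
    return vectors


# Toy stand-ins for fasttext_vectors_[]wiki.vec and glove_vectors_[]wiki.txt.
fasttext_demo = io.StringIO("2 3\nhello 0.1 0.2 0.3 \nworld 0.4 0.5 0.6 \n")
glove_demo = io.StringIO("hello 0.1 0.2 0.3\nworld 0.4 0.5 0.6\n")

ft = read_text_vectors(fasttext_demo)
gl = read_text_vectors(glove_demo)
```

With a real file, the handle would come from something like open(path, encoding="utf-8"). Loading the full vocabulary into a dictionary of lists is simple but memory-hungry, so for large languages an array-based reader may be preferable.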