Abstract
Large-scale word association datasets are both important tools in psycholinguistics and, when treated as semantic networks, models that capture meaning. Here, we present word association norms for Rioplatense Spanish, a variant spoken in Argentina and Uruguay. The norms were derived through a large-scale, crowd-sourced continued word association task in which participants give three associations to each cue word in a list. Covering over 13,000 words and more than 3.6 million responses, the dataset is currently the most extensive available for Spanish. We compare it with previous studies in Dutch and English to investigate the role of grammatical gender, and with studies of Iberian Spanish to test generalizability to other Spanish variants. Finally, we evaluate the validity of our data in word processing (lexical decision reaction times) and semantic (similarity judgment) tasks. Our results demonstrate that network measures such as in-degree provide a good prediction of lexical decision response times. Our analysis of semantic similarity judgments replicates and extends previous findings that semantic similarity derived using spreading activation or spectral methods outperforms word embeddings trained on text corpora.