PHOR-in-One: A multilingual lexical database with PHonological, ORthographic and PHonographic word similarity estimates in four languages

Abstract

A large body of research on how word form affects lexical processing in bilinguals suggests that orthographically similar translations (e.g., English-Portuguese “paper-papel”) elicit faster and more accurate responses than translations with little to no overlap (e.g., English-Portuguese “house-casa”). One of the most prominent algorithms for estimating orthographic similarity, the normalized Levenshtein distance (NLD), returns an index of the proportion of identical characters in two strings, and is an efficient and invaluable tool for the selection, manipulation, and control of verbal stimuli. Despite its many advantages for second-language research, no comparable measure has been available for phonology. As a result, studies have adopted different strategies to assess the degree of interlanguage phonological similarity, with profound implications for the interpretation of results on the relative role of orthographic and phonological similarity in bilingual lexical access. In the present work, we introduce PHOR-in-One, a multilingual lexical database with a set of phonological and orthographic NLD estimates for 6,160 translation equivalents in American and British English, European Portuguese, German, and Spanish, for a total of 30,800 words. We also propose a new measure, phonographic NLD, a pooled index of orthographic and phonological similarity that is particularly useful for researchers interested in controlling for and/or manipulating both estimates at once. PHOR-in-One includes a comprehensive characterization of its lexical entries, namely part-of-speech-dependent and part-of-speech-independent frequency counts, number of letters and phonemes, and phonetic transcription. PHOR-in-One is thus a valuable tool to support bilingual and multilingual research.
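To make the NLD index concrete, the following is a minimal Python sketch and not the database's own implementation. It assumes the common convention of dividing the Levenshtein distance by the length of the longer string and reporting similarity as one minus that ratio; the function names `levenshtein` and `nld_similarity` are illustrative, and the exact normalization used in PHOR-in-One may differ.

```python
from typing import Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    # prev[j] is the distance between the prefix of a processed so far and b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def nld_similarity(a: Sequence, b: Sequence) -> float:
    """Similarity in [0, 1], assuming normalization by the longer sequence."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty sequences are treated as identical
    return 1.0 - levenshtein(a, b) / longest


# The two translation pairs used as examples in the abstract.
print(levenshtein("paper", "papel"), round(nld_similarity("paper", "papel"), 2))
print(levenshtein("house", "casa"), round(nld_similarity("house", "casa"), 2))
```

Under these assumptions, “paper-papel” differs by a single substitution out of five characters (similarity of about 0.8), whereas “house-casa” requires four edits (about 0.2). Because the functions accept any sequence, the same computation could in principle be run over lists of phoneme symbols instead of letters to obtain a phonological estimate.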
