ANS 2025, February 22, 2025
Machine Learning vs. Linear Regression: A Case Study on Gender Sound Symbolism in Japanese Given Names by Alexander Kilpatrick (Nagoya University of Commerce and Business, Japan)
In onomastics, linear regression is commonly used to explore sound symbolism in names, often revealing consistent gender patterns across languages. However, when applied to Japanese given names, Poisson regression suggests atypical gender associations, with patterns that seemingly diverge from those observed in other languages. This study explores whether Japanese genuinely behaves differently from other languages or whether this observation is an artifact of the limitations of linear regression. Using a dataset of the 1,000 most common Japanese given names, we constructed two XGBoost models, one using phoneme counts and the other using mora counts, to analyze gender associations.
Our findings demonstrate that Poisson regression fails to account for the complex interactions and the influence of certain logographs, leading to misleading results. On the surface, Japanese names appear to defy common sound symbolism patterns, suggesting that high front vowels like /i/ are more common in male names, and back-of-mouth plosives like /k/ are more common in female names. However, our XGBoost models, particularly the mora-based model with an accuracy of 82.4%, successfully captured the nuanced interplay of phonetic and logographic elements, uncovering the underlying patterns. The analysis revealed that when accounting for specific logographs, Japanese names actually exhibit gender sound symbolism patterns similar to those in other languages. This case study highlights the superior ability of machine learning to uncover intricate relationships in linguistic data, providing deeper insights into onomastic research and correcting misconceptions suggested by traditional methods. It demonstrates that Japanese names, once logographic influences are considered, follow universal sound symbolism principles.
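The difference between the phoneme-count and mora-count features described in the abstract can be illustrated with a minimal Python sketch. It is not the author's pipeline: the function names, the tiny phoneme table, and the example names are all hypothetical, and the phonemic transcription is deliberately simplified. It only shows how the same hiragana name could yield two different count-based feature sets of the kind that a classifier such as XGBoost might be trained on.

```python
from collections import Counter
from typing import List

# Small kana that attach to the preceding kana to form one mora (e.g. きょ).
SMALL_KANA = set("ゃゅょ")

# Hypothetical, deliberately tiny mora-to-phoneme table covering only the
# demo names below; a real study would need a complete transcription scheme.
MORA_TO_PHONEMES = {
    "あ": ["a"], "う": ["u"],
    "き": ["k", "i"], "こ": ["k", "o"],
    "きょ": ["k", "j", "o"],
}

def split_morae(kana: str) -> List[str]:
    """Split a hiragana string into morae, merging small ゃ/ゅ/ょ backward."""
    morae: List[str] = []
    for ch in kana:
        if ch in SMALL_KANA and morae:
            morae[-1] += ch  # e.g. き + ょ becomes the single mora きょ
        else:
            morae.append(ch)
    return morae

def mora_counts(kana: str) -> Counter:
    """Count how often each mora occurs in the name."""
    return Counter(split_morae(kana))

def phoneme_counts(kana: str) -> Counter:
    """Count phonemes by expanding each mora through the demo table."""
    counts: Counter = Counter()
    for mora in split_morae(kana):
        counts.update(MORA_TO_PHONEMES[mora])
    return counts

if __name__ == "__main__":
    for name in ["あきこ", "きょうこ"]:  # illustrative names, not study data
        print(name, dict(mora_counts(name)), dict(phoneme_counts(name)))
```

In this sketch a phoneme-count model would see /k/ twice in あきこ, whereas a mora-count model would see the distinct units き and こ once each. A final mora such as こ often corresponds to a particular written character in a name, which is one plausible reason a mora-level representation could capture logograph-related patterns that phoneme counts blur together.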
Biography:
Alexander Kilpatrick, Associate Professor at Nagoya University of Commerce and Business, specializes in Psycholinguistics, Computational Linguistics, Iconicity, English as a Second Language, and the application of machine learning to uncover complex patterns in linguistic data, thereby enriching our comprehension of language acquisition, processing, and cognitive mechanisms.