© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Japanese adverbs are difficult to classify, with little progress made since the 1930s. Now, in the age of large language models, linguists need a framework for lexical grouping that incorporates quantitative, evidence-based relationships rather than purely theoretical categorization. We herein address this need for the case of Japanese adverbs by developing a semantic positioning approach that combines large language model embeddings with fuzzy set theory to achieve empirical Japanese adverb groupings. To perform semantic positioning, we (i) obtained multi-dimensional embeddings for a list of Japanese adverbs using a BERT or RoBERTa model pre-trained on Japanese text, (ii) reduced the dimensionality of each embedding by principal component analysis (PCA), (iii) mapped the relative position of each adverb in a 3D plot using K-means clustering with an initial cluster count of n=3, (iv) performed silhouette analysis to determine the optimal cluster count, (v) performed PCA and K-means clustering on the adverb embeddings again to generate 2D semantic position plots, and finally (vi) generated a centroid distance matrix. Fuzzy set theory informs our workflow at the embedding step, where the meanings of words are treated as quantifiable vague data. Following silhouette analysis, our results suggest that Japanese adverbs optimally cluster into n=4 rather than n=3 groups. We also observe a lack of consistency between adverb semantic positions and conventional classification. Ultimately, 3D/2D semantic position plots and centroid distance matrices were simple to generate and did not require special hardware. Our novel approach offers advantages over conventional adverb classification, including an intuitive visualization of semantic relationships in the form of semantic position plots, as well as a quantitative clustering “fingerprint” for Japanese adverbs that expresses vague language data as a centroid distance matrix.
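The six-step workflow above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the authors' code: it uses only NumPy (PCA via SVD, plain Lloyd's K-means, and a hand-written silhouette score, where scikit-learn's `PCA`, `KMeans` and `silhouette_score` would be the usual choices), and it replaces the BERT/RoBERTa adverb embeddings of step (i) with synthetic 768-dimensional vectors, since neither the model checkpoint nor the adverb list is given here. Computing the silhouette on the 3D positions, the candidate range k = 2 to 7, and every helper name (`pca`, `assign`, `kmeans`, `silhouette`, `centroid_distance_matrix`) are assumptions; the fuzzy-set treatment of meaning is not modeled.

```python
import numpy as np


def pca(x: np.ndarray, n_components: int) -> np.ndarray:
    """Project the rows of x onto their top principal components (via SVD)."""
    centered = x - x.mean(axis=0)
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def assign(x: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Index of the nearest centroid for every row of x."""
    return np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2).argmin(axis=1)


def kmeans(x: np.ndarray, k: int, n_iter: int = 100, seed: int = 0):
    """Plain Lloyd's K-means with random initial centroids; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(x, centroids)
        # Move each centroid to the mean of its points (an empty cluster keeps its centroid).
        updated = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return assign(x, centroids), centroids


def silhouette(x: np.ndarray, labels: np.ndarray) -> float:
    """Mean silhouette coefficient s(i) = (b - a) / max(a, b)."""
    clusters = np.unique(labels)
    if len(clusters) < 2:
        return 0.0  # undefined for a single cluster
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    per_point = []
    for i in range(len(x)):
        same = labels == labels[i]
        if same.sum() == 1:
            per_point.append(0.0)  # usual convention for singleton clusters
            continue
        a = d[i, same].sum() / (same.sum() - 1)  # mean distance to own cluster, excluding self
        b = min(d[i, labels == c].mean() for c in clusters if c != labels[i])
        denom = max(a, b)
        per_point.append((b - a) / denom if denom > 0 else 0.0)
    return float(np.mean(per_point))


def centroid_distance_matrix(centroids: np.ndarray) -> np.ndarray:
    """Pairwise Euclidean distances between cluster centroids."""
    return np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=2)


# Step (i), HYPOTHETICAL stand-in: 40 synthetic "adverb" vectors of size 768
# drawn around 4 latent centers, in place of real BERT/RoBERTa embeddings.
rng = np.random.default_rng(42)
centers = rng.normal(scale=5.0, size=(4, 768))
embeddings = np.repeat(centers, 10, axis=0) + rng.normal(size=(40, 768))

# Steps (ii)-(iii): reduce to 3D with PCA, then K-means with an initial n = 3.
pos3d = pca(embeddings, 3)
labels3, _ = kmeans(pos3d, 3)

# Step (iv): silhouette analysis over an assumed candidate range of cluster counts.
scores = {k: silhouette(pos3d, kmeans(pos3d, k)[0]) for k in range(2, 8)}
best_k = max(scores, key=scores.get)

# Steps (v)-(vi): 2D PCA + K-means with the chosen count, then the centroid distance matrix.
pos2d = pca(embeddings, 2)
labels2, centroids2 = kmeans(pos2d, best_k)
dist_matrix = centroid_distance_matrix(centroids2)
print("best k:", best_k)
print(np.round(dist_matrix, 2))
```

With real data, `embeddings` would hold one vector per adverb taken from a Japanese BERT/RoBERTa model (how token vectors are pooled into one vector per word is not specified here), and `dist_matrix` corresponds to the centroid distance "fingerprint" the abstract describes. Because K-means starts from random centroids, results can vary with `seed`.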

Details

Title
Semantic Positioning Model Incorporating BERT/RoBERTa and Fuzzy Theory Achieves More Nuanced Japanese Adverb Clustering
Author
Odle, Eric 1; Hsueh, Yun-Ju 2; Lin, Pei-Chun 3

1 Department of Natural History Sciences, Graduate School of Science, Hokkaido University, Sapporo 060-0810, Japan
2 Department of Foreign Languages and Applied Linguistics, Yuan Ze University, Taoyuan City 320315, Taiwan
3 Department of Information Engineering and Computer Science, Feng Chia University, Taichung City 407102, Taiwan
First page
4185
Publication year
2023
Publication date
2023
Publisher
MDPI AG
e-ISSN
2079-9292
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2876434378