Abstract

Clickbait headlines, designed to entice readers with sensationalized or misleading content, pose significant challenges in the digital landscape. They exploit curiosity to generate traffic and revenue, often at the cost of spreading misinformation and undermining the credibility of online content. Identifying clickbait is essential for improving the quality of information consumed, fostering trust in digital media, and enabling users to make informed decisions. This study advances Hebrew clickbait detection through deep learning approaches and comprehensive data augmentation strategies, targeting the unique challenges of processing a low-resource language. Building on prior research that achieved an accuracy of 87% using traditional machine learning methods, this work explores the potential of BERT-based models and diverse augmentation techniques to further enhance performance. Our experiments applied a variety of augmentation methods, including weak supervision, substitution-based methods, generative techniques, and language-based methods, to state-of-the-art Hebrew language models. The results show that targeted augmentation strategies, particularly those focusing on word-level replacements and contextual enhancements, consistently improved model performance. Our top-performing configuration achieved an accuracy of 92%, surpassing traditional machine learning benchmarks. These results can be applied in real-world systems to automatically detect and reduce clickbait in Hebrew digital media, supporting news websites and social platforms in improving content quality and user trust. Furthermore, the study provides a replicable framework for tackling similar challenges in other underrepresented languages, highlighting the potential of combining advanced deep learning methods with tailored data augmentation strategies.

Full text

© 2025 Natanya, Liebeskind. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.