Human Activity Recognition (HAR) plays a crucial role in identifying and digitizing human behaviors. Among the various approaches, sound-based HAR offers distinct advantages: it is unaffected by the visual limitations of camera-based methods and works in diverse environments. This study introduces a novel application of SegNet, a neural network originally designed for image segmentation, to sound-based HAR. Labeling sound data has traditionally been limited in scope, with annotations restricted to specific events or time frames. To address this limitation, a new labeling approach was developed that allows detailed annotation across the entire temporal and frequency domains. This approach enables the use of SegNet, which requires pixel-level labels for accurate segmentation, and leads to more granular and explainable activity recognition. A dataset comprising six distinct human activities (speech, groaning, screaming, coughing, toilet use, and snoring) was constructed for comprehensive evaluation. Trained on this annotated dataset, the network achieved F1 scores ranging from 0.68 to 0.95 across the activity classes. The model's practical applicability was further validated through recognition tests conducted in a professional office environment. This study thus presents a novel framework for quantifying daily human activities through sound segmentation, contributing to the advancement of intelligent system technology.
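The abstract gives no implementation details, so the following PyTorch sketch is only one possible realization of the core idea: treating a time-frequency representation as an image and segmenting it at the "pixel" level with a SegNet-style encoder-decoder. The input shape, channel widths, spectrogram front end, and class count (six activities plus an assumed background class) are all assumptions, not details from the paper.

```python
import torch
import torch.nn as nn

class MiniSegNet(nn.Module):
    """Minimal SegNet-style encoder-decoder for time-frequency segmentation.

    Input:  (batch, 1, freq_bins, time_frames) spectrogram "image"
    Output: (batch, n_classes, freq_bins, time_frames) per-pixel class logits
    """

    def __init__(self, n_classes: int = 7):  # 6 activities + background (assumed)
        super().__init__()
        self.enc1 = self._conv_block(1, 32)
        self.enc2 = self._conv_block(32, 64)
        # SegNet's hallmark: max-pooling indices are stored and reused for unpooling.
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
        self.unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)
        self.dec2 = self._conv_block(64, 32)
        self.dec1 = self._conv_block(32, n_classes)

    @staticmethod
    def _conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
        return nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.enc1(x)
        x, idx1 = self.pool(x)    # downsample, remember where the maxima were
        x = self.enc2(x)
        x, idx2 = self.pool(x)
        x = self.unpool(x, idx2)  # upsample by scattering back to max locations
        x = self.dec2(x)
        x = self.unpool(x, idx1)
        x = self.dec1(x)
        return x                  # per-pixel logits over the class set

# Illustrative usage: a batch of two 64-mel x 128-frame spectrogram patches.
model = MiniSegNet()
logits = model(torch.randn(2, 1, 64, 128))
print(logits.shape)  # torch.Size([2, 7, 64, 128])

# Training against pixel-level labels of the kind the paper describes would pair
# these logits with a (batch, freq_bins, time_frames) integer mask via
# nn.CrossEntropyLoss.
```

Reusing the pooling indices in the decoder preserves the exact time-frequency locations of salient energy, which is what makes SegNet-style decoding a natural fit for spectrogram segmentation.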
; Yoo, Byounghyun 2
1 Department of Artificial Intelligence, Jeju National University, 102 Jejudaehak-ro, Jeju-si, Jeju Special Self-Governing Province 63243, South Korea
2 Intelligence and Interaction Research Center, Korea Institute of Science and Technology, 5 Hwarang-ro 14-gil, Seongbuk-gu, Seoul 02792, South Korea
