
Abstract

One metric used to measure classification performance in machine learning is the F-beta score. The objective of this thesis is to improve the average F-beta score obtained when classifying shark data into four shark behaviors, namely: Resting, Swimming, Feeding, and Non-Directed Motion (NDM). Synthetic Minority Oversampling Technique (SMOTE) and Adaptive Synthetic Sampling (ADASYN) are used to balance the data. Fast Fourier Transform (FFT), Walsh-Hadamard Transform (WHT), and Autocorrelation (AC) features are then extracted from the pre-processed data and classified using a Convolutional Neural Network (CNN) and K-Nearest Neighbors (K-NN). All combinations of the two balancing techniques, the three feature types, and the two machine learning algorithms are applied and compared to examine the improvement in average F-beta score. Other signal processing techniques are also applied to reduce the noise level of the recorded raw shark data and enhance its Signal-to-Noise Ratio (SNR).
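For reference, the F-beta score of a single class combines its precision P and recall R as F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), so beta > 1 weights recall more heavily and beta < 1 weights precision more heavily. The abstract does not state which beta value was used or how the per-behavior scores were averaged; the minimal sketch below assumes an unweighted (macro) average over the behavior classes and keeps beta as a parameter. The function name f_beta_scores is hypothetical.

```python
from collections import Counter

def f_beta_scores(y_true, y_pred, beta=1.0):
    """Return per-class F-beta scores and their unweighted (macro) average.

    Assumption: "average F-beta" is taken here to mean the macro average.
    """
    labels = sorted(set(y_true) | set(y_pred))
    # True positives per class: positions where the prediction matches the label.
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    pred_count = Counter(y_pred)   # how often each class was predicted
    true_count = Counter(y_true)   # how often each class actually occurs
    b2 = beta * beta
    per_class = {}
    for c in labels:
        precision = tp[c] / pred_count[c] if pred_count[c] else 0.0
        recall = tp[c] / true_count[c] if true_count[c] else 0.0
        denom = b2 * precision + recall
        per_class[c] = (1 + b2) * precision * recall / denom if denom else 0.0
    macro = sum(per_class.values()) / len(per_class)
    return per_class, macro
```

For example, with y_true = ["Resting", "Swimming", "Swimming", "Feeding", "NDM", "Resting"] and y_pred = ["Resting", "Swimming", "Resting", "Feeding", "Swimming", "Resting"], the per-class F1 scores (beta = 1) are 1.0 for Feeding, 0.0 for NDM, 0.8 for Resting, and 0.5 for Swimming, giving a macro average of 0.575. A behavior that is never predicted correctly, such as NDM here, pulls the average down sharply, which is why class imbalance matters for this metric.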

The average F-beta scores showed that K-NN performed best with FFT-only features, while CNN performed best with combined WHT-FFT features. In the K-NN case, FFT performed better alone than when combined with any other feature type. On the other hand, WHT performed better when combined with any other feature type than when used alone. In the CNN case, WHT and FFT performed better together than separately. In other words, combining FFT and WHT features in CNN considerably improved the average F-beta score, while combining them in K-NN yielded a score close to the average of their individual scores. In addition, AC did not work well in CNN, whether alone or combined with other feature types, as it produced poor average F-beta scores. In K-NN, combining AC with other feature types did not improve the average F-beta score over using AC alone.
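The three feature types, and a combined set such as WHT-FFT, can be illustrated on a single frame of the recorded signal. This is a hedged, pure-Python sketch rather than the thesis' extraction code: it assumes frames whose length is a power of two, uses the one-sided FFT magnitude spectrum, an unnormalized WHT in natural (Hadamard) order, and the normalized autocorrelation up to a hypothetical max_lag, and it assumes that combined features are formed by concatenating the individual feature vectors. All function names are hypothetical.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; assumes len(x) is a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddled = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddled
        out[k + n // 2] = even[k] - twiddled
    return out

def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform in natural (Hadamard) order."""
    a = [float(v) for v in x]
    h = 1
    while h < len(a):
        for i in range(0, len(a), 2 * h):
            for j in range(i, i + h):
                # Butterfly: sum and difference of the paired entries.
                a[j], a[j + h] = a[j] + a[j + h], a[j] - a[j + h]
        h *= 2
    return a

def autocorr(x, max_lag):
    """Autocorrelation of the mean-removed frame for lags 0..max_lag, scaled so lag 0 is 1."""
    mu = sum(x) / len(x)
    d = [v - mu for v in x]
    r0 = sum(v * v for v in d)
    if r0 == 0:
        return [0.0] * (max_lag + 1)
    return [sum(d[i] * d[i + k] for i in range(len(d) - k)) / r0 for k in range(max_lag + 1)]

def extract_features(frame, kinds=("wht", "fft"), max_lag=8):
    """Concatenate the requested feature types, e.g. ("wht", "fft") for WHT-FFT."""
    parts = []
    for kind in kinds:
        if kind == "fft":
            # One-sided magnitude spectrum: bins 0..n/2 of a real-valued frame.
            parts += [abs(c) for c in fft(frame)[: len(frame) // 2 + 1]]
        elif kind == "wht":
            parts += fwht(frame)
        elif kind == "ac":
            parts += autocorr(frame, max_lag)
        else:
            raise ValueError(f"unknown feature type: {kind}")
    return parts
```

With an 8-sample frame, extract_features(frame) returns 8 WHT coefficients followed by 5 FFT magnitudes. Concatenation is the simplest way to combine feature types; a CNN can learn interactions between the concatenated parts, whereas a distance-based K-NN treats every coordinate equally, which is one plausible reason combined features could behave differently in the two classifiers.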

The average F-beta scores also showed that reducing data imbalance during the pre-processing phase is more effective than mitigating its misleading effect on classification during the machine learning phase. Balancing before learning was performed using SMOTE and ADASYN, while mitigation during learning was performed using weight-sensitive learning. SMOTE, and ADASYN even more so, reduced the gap between precision and recall scores and produced higher F-beta scores.
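The two imbalance strategies compared above can be sketched side by side. SMOTE creates a synthetic minority sample by choosing a minority sample, choosing one of its k nearest minority neighbors, and interpolating a random fraction of the way between them; ADASYN uses the same interpolation but assigns more synthetic samples to minority points whose neighborhoods contain more majority-class samples (not implemented here). Weight-sensitive learning instead leaves the data unchanged and scales each class's contribution to the loss; since the abstract does not specify the weighting, the sketch assumes the common inverse-frequency heuristic n_samples / (n_classes * n_c). Function names and defaults (k=5, seed) are illustrative assumptions.

```python
import math
import random
from collections import Counter

def smote(minority, n_new, k=5, seed=0):
    """SMOTE-style oversampling: interpolate between minority samples and their
    k nearest minority neighbours (Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    # k nearest minority neighbours of each minority sample, excluding itself.
    neighbours = []
    for i, x in enumerate(minority):
        ranked = sorted((math.dist(x, y), j) for j, y in enumerate(minority) if j != i)
        neighbours.append([j for _, j in ranked[:k]])
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(neighbours[i])
        gap = rng.random()  # uniform in [0, 1): position along the segment
        synthetic.append([a + gap * (b - a) for a, b in zip(minority[i], minority[j])])
    return synthetic

def balanced_class_weights(labels):
    """Inverse-frequency class weights n_samples / (n_classes * n_c), so rarer
    classes contribute more to a weighted loss."""
    counts = Counter(labels)
    n, c = len(labels), len(counts)
    return {cls: n / (c * cnt) for cls, cnt in counts.items()}
```

For labels with six Resting and two Feeding samples, balanced_class_weights gives Resting a weight of 8/12 and Feeding a weight of 2.0. The design difference is that oversampling changes what the classifier sees (more, and more varied, minority examples), while class weights only rescale errors on the examples already present.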

Besides the two balancing techniques, the three feature types, and the two machine learning algorithms, other pre-processing techniques applied to the raw data also contributed to the improvement of the average F-beta score. These included framing, detrending, normalization, Ensemble Average (EA) based low-pass filtering, filter delay compensation, overlap windowing, and k-fold cross validation. For example, the average F-beta scores showed that applying EA-based low-pass filters (LPF) to the data prior to machine learning and classification improves the Signal-to-Noise Ratio (SNR), that is, the ratio of signal power to noise power, and consequently improves average F-beta scores significantly.
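A minimal pre-processing chain in the spirit of this paragraph might look like the sketch below. The abstract does not define the EA-based filter or the order of the steps, so this is an assumption-laden illustration: a plain m-tap moving-average FIR filter stands in for the EA-based low-pass filter, its (m - 1)/2-sample group delay is compensated by shifting the output, the filtered signal is split into overlapping frames, and each frame is linearly detrended and z-score normalized. All function names and parameters are hypothetical.

```python
import math

def moving_average_lpf(x, m):
    """Causal m-tap moving-average FIR low-pass filter with zero initial
    conditions, followed by compensation of its (m - 1) / 2-sample delay."""
    if m < 1 or m % 2 == 0:
        raise ValueError("use an odd number of taps so the delay is a whole number of samples")
    delay = (m - 1) // 2
    padded = list(x) + [0.0] * delay  # extra input so the shifted output keeps len(x)
    causal = [sum(padded[max(0, n - m + 1): n + 1]) / m for n in range(len(padded))]
    return causal[delay:]  # drop the first `delay` outputs to realign with x

def frames(x, size, hop):
    """Split x into frames of `size` samples starting every `hop` samples (overlap = size - hop)."""
    return [x[s:s + size] for s in range(0, len(x) - size + 1, hop)]

def detrend(frame):
    """Remove the least-squares straight line from a frame."""
    n = len(frame)
    t_mean, y_mean = (n - 1) / 2, sum(frame) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(frame))
    slope = sxy / sxx if sxx else 0.0
    return [v - y_mean - slope * (t - t_mean) for t, v in enumerate(frame)]

def zscore(frame):
    """Scale a frame to zero mean and unit (population) standard deviation."""
    mu = sum(frame) / len(frame)
    sd = math.sqrt(sum((v - mu) ** 2 for v in frame) / len(frame))
    return [(v - mu) / sd if sd > 1e-12 else 0.0 for v in frame]

def preprocess(signal, frame_size, hop, taps):
    """Hypothetical chain: low-pass filter, overlapping framing, detrend, normalize."""
    smoothed = moving_average_lpf(signal, taps)
    return [zscore(detrend(f)) for f in frames(smoothed, frame_size, hop)]
```

With hop = frame_size // 2, consecutive frames overlap by 50%. Without the delay compensation step, a causal m-tap average would shift the filtered signal, and therefore every frame's features, by (m - 1)/2 samples relative to the behavior labels.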

Overall, for the shark data used in this thesis, CNN was found to be a better choice than K-NN, performing best with WHT-FFT features and ADASYN as the balancing technique.

Details

1010268
Title
Improving F-Beta Score in Classifying Shark Data Into Shark Behaviors
Number of pages
99
Publication year
2024
Degree date
2024
School code
0047
Source
DAI-B 85/11(E), Dissertation Abstracts International
ISBN
9798382748450
Advisor
Committee member
Chugunova, Marina; Yang, Yu; Peng, Qidi
University/institution
The Claremont Graduate University
Department
Institute of Mathematical Sciences
University location
United States -- California
Degree
Ph.D.
Source type
Dissertation or Thesis
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
31146096
ProQuest document ID
3064860928
Document URL
https://www.proquest.com/dissertations-theses/improving-f-beta-score-classifying-shark-data/docview/3064860928/se-2?accountid=208611
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Database
ProQuest One Academic