Abstract
Owing to the popularity of the Android operating system, malware targeting the platform is increasingly seen in the wild, and research to combat this threat has grown in tandem. Defensive research has taken a variety of approaches, but little work has examined visualization as a means of feature extraction for detecting malicious applications. Machine learning can complement visualization by automatically identifying common patterns in images of malicious applications. The goal of this thesis is to combine computer vision and machine learning to build a classification system that accurately detects malware in Android applications.
In this thesis, we use visualization of application binaries together with machine learning to detect and classify malicious applications. Our method extracts the Dex file from each Android application, converts it byte by byte into a bitmap image, and then extracts Haralick texture features from that image. From these features we generated a dataset of 4,000 samples, split evenly between malicious and non-malicious applications, and used Weka to measure the classification accuracy of a variety of machine learning algorithms.
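The feature-extraction pipeline described above can be sketched in Python as follows. This is a minimal, standard-library-only illustration under stated assumptions, not the thesis's implementation: the 256-pixel image width, the single horizontal offset of one pixel, the base-2 logarithm for entropy, and the particular subset of Haralick features computed (angular second moment, contrast, inverse difference moment, entropy, correlation) are all choices made here, as is reading only `classes.dex` (multidex applications also ship `classes2.dex` and so on). The helper names `extract_dex`, `bytes_to_image`, `glcm`, and `haralick_subset` are hypothetical.

```python
import io
import math
import zipfile


def extract_dex(apk_bytes):
    """Return the raw bytes of classes.dex from an APK, which is a ZIP archive.
    Assumption: only the primary dex file is read."""
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as apk:
        return apk.read("classes.dex")


def bytes_to_image(data, width=256):
    """Map each byte to one 8-bit grayscale pixel, filling rows left to right.
    Assumption: a fixed width of 256 pixels; the last row is zero-padded."""
    rows = []
    for start in range(0, len(data), width):
        row = list(data[start:start + width])
        row += [0] * (width - len(row))
        rows.append(row)
    return rows


def glcm(image, dx=1, dy=0):
    """Symmetric, normalized gray-level co-occurrence matrix for a non-negative
    pixel offset (dx, dy), stored sparsely as {(i, j): probability}."""
    if dx < 0 or dy < 0:
        raise ValueError("this sketch only supports non-negative offsets")
    counts, total = {}, 0
    for y in range(len(image) - dy):
        for x in range(len(image[0]) - dx):
            a, b = image[y][x], image[y + dy][x + dx]
            counts[(a, b)] = counts.get((a, b), 0) + 1
            counts[(b, a)] = counts.get((b, a), 0) + 1  # symmetric: count both directions
            total += 2
    if total == 0:
        raise ValueError("image is too small for the requested offset")
    return {pair: n / total for pair, n in counts.items()}


def haralick_subset(p):
    """Five of Haralick's texture features, computed from a normalized symmetric GLCM."""
    asm = sum(v * v for v in p.values())  # angular second moment (energy)
    contrast = sum((i - j) ** 2 * v for (i, j), v in p.items())
    idm = sum(v / (1 + (i - j) ** 2) for (i, j), v in p.items())  # inverse difference moment
    entropy = -sum(v * math.log2(v) for v in p.values() if v > 0)
    # For a symmetric GLCM, row and column marginals share the same mean and variance.
    mu = sum(i * v for (i, _), v in p.items())
    var = sum((i - mu) ** 2 * v for (i, _), v in p.items())
    cov = sum((i - mu) * (j - mu) * v for (i, j), v in p.items())
    corr = cov / var if var > 0 else 1.0  # undefined for a constant image; 1.0 is a choice made here
    return {"asm": asm, "contrast": contrast, "idm": idm,
            "entropy": entropy, "correlation": corr}
```

Chaining the helpers as `haralick_subset(glcm(bytes_to_image(extract_dex(apk_bytes))))` yields one feature vector per application; labeled benign or malicious, such vectors would form the rows of a dataset handed to the classifiers. The sparse dictionary keeps the sketch dependency-free; a production version would more likely use a dense 256 × 256 array.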
Analyzing an Android application's Dalvik bytecode in this way allows the application to be classified as either benign or malicious. At best, our method correctly classified 74% of samples. This suggests that future work building on the methods established in this thesis may raise the classification rate to a level suitable for real-world use.