Article Summary: Intelligent OS X Malware Threat Detection with Code Inspection

Mac vs. Windows: a comparative tale as old as time. Each system has its benefits, but what if one commonly assumed benefit is simply wrong? The general public often believes that Apple Mac systems are more secure and cannot be infected by malware. In “Intelligent OS X Malware Threat Detection with Code Inspection”, Pajouh et al. argue that Mac systems are also exposed to malware and that research on the matter is lacking. The authors propose a machine learning model based on the Support Vector Machine (SVM) technique to detect OS X malware.

The first question the researchers answer concerns how to categorize the samples. They develop a novel measure to differentiate between 450 benign samples and 150 malware samples, focusing on the frequency of library calls made across the sample set. With the datasets built and labeled through library call weighting and feature selection, five classification techniques are evaluated: Naive Bayes, Bayesian Net, Multi-Layer Perceptron (MLP), Decision Tree-J48, and Weighted Radial Basis Function Kernel-based Support Vector Machine (Weighted-RBFSVM). Of these, Weighted-RBFSVM yields the highest accuracy at 91% with a 3.9% false alarm rate. The next question the authors answer, as if reading the reader's mind, concerns the size of the dataset: is it sufficient? Can the machine learning models truly distinguish benign from malware samples after being trained and tested on such a small dataset? The researchers address this concern by applying the Synthetic Minority Over-sampling Technique (SMOTE) to produce datasets two, three, and five times the size of the original.[1] Evaluated on these larger datasets, Weighted-RBFSVM performs with greater accuracy, improving to 96.62% with a 4% false alarm rate. The authors conclude that, for OS X malware, the Weighted-RBFSVM classification method is the most accurate of those evaluated.
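To make the oversampling step concrete, here is a minimal sketch of the interpolation idea behind SMOTE, written in plain Python. It is not the authors' implementation: the function name `smote`, the parameters `n_new`, `k`, and `seed`, and the two-feature sample vectors are all assumptions chosen for illustration. Each synthetic point is placed at a random position on the line segment between a real sample and one of its nearest neighbours from the same class.

```python
import math
import random


def smote(samples, n_new, k=3, seed=0):
    """Return n_new synthetic points interpolated between same-class samples.

    Hypothetical helper for illustration; requires at least two samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(samples)
        # k nearest neighbours of the base sample, excluding the sample itself
        neighbours = sorted(
            (s for s in samples if s is not base),
            key=lambda s: math.dist(base, s),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Made-up feature vectors (e.g. weighted library-call frequencies), not paper data.
malware = [[12.0, 3.0], [10.0, 4.0], [11.0, 2.0], [13.0, 5.0]]
new_points = smote(malware, n_new=len(malware))
doubled = malware + new_points  # a set twice the size of the original
```

Because every synthetic point is a blend of two existing samples, the enlarged set stays inside the region the original samples already occupy. This means oversampling adds training volume but not genuinely new malware behaviour.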

The strengths of this work stem from the validation and verification steps conducted. The authors took additional measures by creating a scale to distinguish between samples and by examining how accuracy shifts when the dataset is scaled up. Although the initial dataset was small, the accuracy of the classification technique did not weaken but improved when trained on a larger dataset. The weaknesses of the work derive from a lack of insight into the OS versions used for testing; the paper does not indicate which Mac OS X versions the malware samples were drawn from. If the malware samples come from earlier versions of the Mac operating system, have they evolved for newer versions? How do the classification methods fare when new data samples are introduced?
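One way to approach that last question is to re-score a trained classifier on newly collected samples using the same two metrics the paper reports. The sketch below assumes the standard definitions: accuracy is the share of all samples classified correctly, and false alarm rate is the share of benign samples wrongly flagged as malware. The function names and the confusion-matrix counts are invented for illustration and are not results from the paper.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def false_alarm_rate(fp, tn):
    """Fraction of benign samples incorrectly flagged as malware."""
    return fp / (fp + tn)


# Hypothetical counts for a fresh test set of 50 malware and 100 benign samples.
tp, fn = 40, 10  # malware detected / malware missed
tn, fp = 90, 10  # benign passed / benign flagged

acc = accuracy(tp, tn, fp, fn)
far = false_alarm_rate(fp, tn)
print(f"accuracy={acc:.3f}, false alarm rate={far:.3f}")
```

A noticeable drop in accuracy or rise in false alarm rate on such a set, relative to the figures reported on the original data, would suggest the model does not generalize to newer samples.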

[1] Pajouh, H. H., et al. “Intelligent OS X Malware Threat Detection with Code Inspection.” 2018, https://doi.org/10.1007/s11416-017-0307-5.