Automated Profiling-Based Zero-Day Malware Detection

Author

Chiho Kim

Document Type

Thesis

Degree Name

Master of Science (MS)

Department

Computer Science and Info Sys

Date of Award

Fall 2022

Abstract

Since the development of automated malware variant toolkits, malware attacks have grown and evolved significantly, exposing more critical infrastructure and computing devices to attack. Worse, writing new malware variants and families with these toolkits keeps getting easier, while observed patterns are still analyzed and published manually by experts in the conventional manner. In addition, attackers can use various toolkit functions to evade existing detectors based on malware signatures or machine learning (ML) models. Prior research has shown promising malware detection performance. However, techniques trained with supervised learning are inherently bound to malware that has already been seen. To overcome this issue, a body of work has proposed semi-supervised learning, which classifies unprecedented malware attacks as anomalies (i.e., zero-day malware detection). Contrary to the expectations for popular semi-supervised approaches, the preliminary research of this study shows the following limits in zero-day detection. First, one-class (OC) classification approaches perform far worse than supervised approaches, too poorly to serve as good detectors. Second, autoencoder-based approaches show more promising performance than OC approaches but require an optimized threshold to be used as a detector. Moreover, threshold selection always requires malware samples during training, which means the approach relies on the quality of the malware dataset. This research tackles these issues and proposes a novel detection architecture that combines an autoencoder with an OC classifier.
The architecture aims to benefit from the advantages of both approaches: the powerful abstraction of an autoencoder, and the replacement of its threshold selection with OC classification. Extended experiments using a public malware dataset (Meraz’18) report that the proposed model is effective, with an accuracy of up to 96.0% in zero-day detection, which indicates that the proposed methods are reasonably comparable to the given supervised learners (which must include the reported malware classes when training) for detecting zero-day malware. This research also simulates zero-day malware variants to compare the resilience of the proposed methods with that of the supervised models. In conclusion, the proposed methods are more resilient than the supervised approaches in discriminating synthetic malware variants created by adversarial evasion attack tools.
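
The general idea of pairing an autoencoder with a one-class classifier can be illustrated with a minimal sketch. It is not the thesis's implementation: the closed-form linear autoencoder (equivalent to PCA), the centroid-based one-class rule, the `nu` parameter, all function and class names, and the synthetic data are assumptions chosen to keep the example small and dependent only on NumPy. What it shows is that both stages are fit on benign samples only, so no malware is needed to pick a decision boundary.

```python
import numpy as np


def fit_linear_autoencoder(X, k):
    """Fit a k-dimensional linear autoencoder in closed form (PCA-equivalent).

    Stand-in for a trained neural autoencoder; an assumption of this sketch.
    """
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    W = vt[:k].T  # (d, k) encoder weights; the decoder is W.T
    return mean, W


def features(X, mean, W):
    """Return the latent code plus the reconstruction error for each sample."""
    Z = (X - mean) @ W
    recon = Z @ W.T + mean
    err = np.linalg.norm(X - recon, axis=1, keepdims=True)
    return np.hstack([Z, err])


class CentroidOneClass:
    """Simplified one-class rule: a hypersphere around the benign centroid.

    Hypothetical stand-in for an OC classifier such as a one-class SVM.
    The radius comes from benign training data only (nu = tolerated
    fraction of benign outliers), so no malware is used during fitting.
    """

    def __init__(self, nu=0.05):
        self.nu = nu

    def _dist(self, F):
        return np.linalg.norm((F - self.mu) / self.sd, axis=1)

    def fit(self, F):
        self.mu = F.mean(axis=0)
        self.sd = F.std(axis=0) + 1e-9
        self.radius = np.quantile(self._dist(F), 1 - self.nu)
        return self

    def predict(self, F):
        # +1 = benign (inside the boundary), -1 = anomaly (outside)
        return np.where(self._dist(F) <= self.radius, 1, -1)


# Synthetic data (assumption): benign samples lie near a 2-D subspace of a
# 10-D feature space; "unseen" samples are scattered off that subspace.
rng = np.random.default_rng(0)
basis = rng.normal(size=(2, 10))


def make_benign(n):
    return rng.normal(size=(n, 2)) @ basis + 0.05 * rng.normal(size=(n, 10))


benign_train = make_benign(500)
benign_test = make_benign(200)
unseen = 2.0 * rng.normal(size=(200, 10))

# Train both stages on benign data only.
mean, W = fit_linear_autoencoder(benign_train, k=2)
detector = CentroidOneClass(nu=0.05).fit(features(benign_train, mean, W))

benign_rate = np.mean(detector.predict(features(benign_test, mean, W)) == 1)
zero_day_rate = np.mean(detector.predict(features(unseen, mean, W)) == -1)
print(f"benign accepted: {benign_rate:.2f}, unseen flagged: {zero_day_rate:.2f}")
```

In this sketch the one-class stage takes the place of a hand-tuned reconstruction-error threshold. A real system would likely swap in a nonlinear autoencoder and a stronger OC classifier, and would evaluate on actual malware data rather than synthetic points.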

Advisor

Jinoh Kim

Subject Categories

Computer Sciences | Physical Sciences and Mathematics
