Prioritizing Variables for Network Traffic Analysis

Document Type

Thesis

Degree Name

Master of Science (MS)

Department

Computer Science and Info Sys

Date of Award

Spring 2020

Abstract

Feature selection is important for reducing learning complexity by eliminating variables, particularly for massive, high-dimensional datasets such as network traffic data. In practice, however, performing variable selection effectively is not easy, despite the availability of existing selection techniques. In my initial experiments, I observed that existing selection techniques produce different sets of variables even under the same conditions (e.g., a fixed size for the resulting set). In addition, individual selection techniques perform inconsistently, sometimes better and sometimes worse than others, so relying on any single technique is risky when building models from the selected variables. More critically, there is a need to automate the selection process, at least to some extent, because it otherwise requires laborious effort and intensive study by a group of experts. In this research, I will explore the challenges of automated variable selection as applied to network anomaly detection. In particular, I intend to develop an ensemble approach that incorporates, and thereby benefits from, existing variable selection techniques. I will also investigate a method for determining the number of variables in the reduced feature set. Finally, the developed methods and algorithms will be evaluated using recent public network datasets.

Advisor

Jinoh Kim

Subject Categories

Computer Sciences | Physical Sciences and Mathematics
