Principal Component Analysis for Training Data Distillation and Augmentation

Document Type

Thesis

Degree Name

Master of Science (MS)

Department

Mathematics

Date of Award

Summer 2023

Abstract

Training is an essential step in machine learning methods, and in some cases training data are scarce. For this reason, training data augmentation is an important area of research. In this thesis, we applied the well-known method of Principal Component Analysis (PCA) as a data distillation tool and used the distilled data for training data augmentation. Further, we demonstrate that augmenting training data with PCA-distilled data increases classification statistics. To do so, we used three machine learning methods, Neural Network (NN), Logistic Regression Model (LR), and Support Vector Machine (SVM), to classify four numeric datasets: Skin Lesion (SL), Diabetes (D), Heart Disease (HD), and Breast Cancer (BC). The experimental results presented in the thesis validate the statement "Augmenting the training data with PCA-distilled data increases classification statistics."
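The abstract does not spell out how the distillation is carried out, so the following is only an illustrative sketch of one plausible reading, not the thesis's actual procedure. It assumes that "PCA-distilled data" means the training samples reconstructed from their leading principal components, and that augmentation means appending those reconstructions, with their original labels, to the training set. The function names `pca_distill` and `augment_with_distilled`, the choice of `n_components`, and the random toy data are all hypothetical.

```python
import numpy as np

def pca_distill(X, n_components):
    """Reconstruct X from its top principal components (assumed meaning of 'distilled')."""
    mean = X.mean(axis=0)
    Xc = X - mean  # PCA operates on mean-centered data
    # SVD of the centered data: the rows of Vt are the principal directions
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T  # shape (n_features, n_components)
    # Project onto the leading components, then map back to feature space
    return Xc @ V @ V.T + mean

def augment_with_distilled(X, y, n_components):
    """Append the PCA-reconstructed samples, with unchanged labels, to the training set."""
    X_distilled = pca_distill(X, n_components)
    return np.vstack([X, X_distilled]), np.concatenate([y, y])

# Toy numeric data standing in for a real dataset (hypothetical values)
rng = np.random.default_rng(0)
X_train = rng.normal(size=(20, 5))
y_train = rng.integers(0, 2, size=20)

X_aug, y_aug = augment_with_distilled(X_train, y_train, n_components=2)
print(X_aug.shape, y_aug.shape)
```

Under these assumptions the augmented set has twice as many rows as the original, and any classifier (NN, LR, or SVM) could then be trained on `X_aug` and `y_aug` in place of the original training data.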

Advisor

Nikolay Sirakov

Subject Categories

Computer Sciences | Physical Sciences and Mathematics
