Principal Component Analysis for Training Data Distillation and Augmentation

Document Type


Degree Name

Master of Science (MS)



Date of Award

Summer 2023


Training is a key step in machine learning methods, and in some cases training data is in short supply. For this reason, training data augmentation is an important area of research. In this thesis, we apply the well-known method of Principal Component Analysis (PCA) as a data distillation tool and use the distilled data to augment the training data. We then demonstrate that augmenting training data with PCA-distilled data increases classification statistics. To do so, we use three machine learning methods, Neural Network (NN), Logistic Regression Model (LR), and Support Vector Machine (SVM), to classify four numeric datasets: Skin Lesion (SL), Diabetes (D), Heart Disease (HD), and Breast Cancer (BC). The experimental results presented in the thesis validate the statement "Augmenting the training data with PCA-distilled data increases classification statistics".
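As an illustration only, one way to read "PCA-distilled data" is as a low-rank reconstruction of the training samples: project them onto the leading principal components, map them back to the original feature space, and append the result to the training set with the original labels. The abstract does not specify the exact procedure, so the sketch below is an assumption rather than the thesis's implementation. The function names `pca_distill` and `augment_with_pca`, the choice of `n_components=3`, and the random stand-in data are all hypothetical.

```python
import numpy as np

def pca_distill(X, n_components):
    """Reconstruct X from its top principal components (assumed meaning of 'distilled')."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:n_components]
    # Project onto the retained directions, then map back to feature space.
    return (Xc @ W.T) @ W + mean

def augment_with_pca(X_train, y_train, n_components):
    """Append PCA-reconstructed copies of the training samples, reusing their labels."""
    X_distilled = pca_distill(X_train, n_components)
    X_aug = np.vstack([X_train, X_distilled])
    y_aug = np.concatenate([y_train, y_train])
    return X_aug, y_aug

# Random stand-in data (not one of the thesis datasets): 50 samples, 8 numeric features.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 8))
y_train = rng.integers(0, 2, size=50)

X_aug, y_aug = augment_with_pca(X_train, y_train, n_components=3)
print(X_aug.shape, y_aug.shape)  # (100, 8) (100,)
```

The augmented arrays could then be passed to any classifier in place of the original training set. The PCA basis here is fitted on the training samples only, so held-out test data does not influence the distilled samples.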


Nikolay Sirakov

Subject Categories

Computer Sciences | Physical Sciences and Mathematics