Principal Component Analysis for Training Data Distillation and Augmentation
Master of Science (MS)
Date of Award
Training is a key step in machine learning methods, and in some cases training data is in short supply. For this reason, training data augmentation is an important area of research. In this thesis, we apply the well-known method of Principal Component Analysis (PCA) as a data distillation tool and use the distilled data to augment the training data. We then demonstrate that augmenting training data with PCA-distilled data improves classification statistics. To do so, we use three machine learning methods (Neural Network (NN), Logistic Regression (LR), and Support Vector Machine (SVM)) to classify four numeric datasets: Skin Lesion (SL), Diabetes (D), Heart Disease (HD), and Breast Cancer (BC). The experimental results presented in the thesis support the statement "Augmenting the training data with PCA-distilled data increases classification statistics."
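The abstract does not spell out how the distilled data are constructed, so the following is only a minimal sketch of one plausible reading: each training sample is projected onto the top principal components and mapped back to feature space, and these reconstructions are appended to the original training set with the same labels. The function names `pca_distill` and `augment_with_distilled`, the choice of `n_components`, and the random toy data are all assumptions made for illustration and are not taken from the thesis.

```python
import numpy as np


def pca_distill(X, n_components):
    """Reconstruct X from its top principal components (assumed distillation step)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:n_components]
    # Project onto the retained directions, then map back to feature space.
    return Xc @ W.T @ W + mean


def augment_with_distilled(X, y, n_components):
    """Append PCA-distilled copies of the samples, reusing the original labels."""
    X_distilled = pca_distill(X, n_components)
    return np.vstack([X, X_distilled]), np.concatenate([y, y])


# Toy data standing in for a numeric dataset (hypothetical, not from the thesis).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
y = rng.integers(0, 2, size=20)

X_aug, y_aug = augment_with_distilled(X, y, n_components=2)
```

Under this reading, the augmented set would be passed to the classifier in place of the original training set, and the PCA directions would be fitted on the training split only so that no test information leaks into the augmentation.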
Computer Sciences | Physical Sciences and Mathematics
Shahnewaz, Tahsin, "Principal Component Analysis for Training Data Distillation and Augmentation" (2023). Electronic Theses & Dissertations. 1113.