Principal Component Analysis for Training Data Distillation and Augmentation
Document Type
Thesis
Degree Name
Master of Science (MS)
Department
Mathematics
Date of Award
Summer 2023
Abstract
Training is a central step in machine learning methods, and in some cases training data are in short supply. For this reason, training data augmentation is an important area of research. In this thesis, we apply the well-known method of Principal Component Analysis (PCA) as a data distillation tool and use the distilled data to augment the training data. We further demonstrate that augmenting training data with PCA-distilled data improves classification statistics. To do so, we use three machine learning methods, Neural Network (NN), Logistic Regression Model (LR), and Support Vector Machine (SVM), to classify four numeric datasets: Skin Lesion (SL), Diabetes (D), Heart Disease (HD), and Breast Cancer (BC). The experimental results presented in the thesis support the statement "Augmenting the training data with PCA-distilled data increases classification statistics."
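The abstract does not say how the distilled data are computed, so the following is only a minimal sketch under one plausible reading: each training sample is projected onto the top k principal components and mapped back to the original feature space, and these rank-k reconstructions are appended to the training set with their original labels. The function names `pca_distill` and `augment_with_distilled`, the choice of k, and the synthetic data are all assumptions made for illustration. They are not taken from the thesis, and the thesis datasets are not used here.

```python
import numpy as np


def pca_distill(X, n_components):
    """Return the rank-k PCA reconstruction of X (hypothetical 'distillation' step)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # SVD of the centered data: the rows of Vt are the principal directions.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:n_components]              # shape (k, n_features)
    # Project onto the top-k directions, then map back to feature space.
    return (Xc @ W.T) @ W + mean


def augment_with_distilled(X, y, n_components):
    """Append PCA-reconstructed samples (with their original labels) to (X, y)."""
    X_dist = pca_distill(X, n_components)
    return np.vstack([X, X_dist]), np.concatenate([y, y])


# Synthetic stand-in for a numeric training set (assumption, not thesis data).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 8))
y_train = (X_train[:, 0] > 0).astype(int)

X_aug, y_aug = augment_with_distilled(X_train, y_train, n_components=3)
print(X_aug.shape, y_aug.shape)
```

The augmented arrays could then be passed to any of the classifiers named in the abstract in place of the original training set. Whether this helps would have to be checked on held-out data that were not used to fit the PCA.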
Advisor
Nikolay Sirakov
Subject Categories
Computer Sciences | Physical Sciences and Mathematics
Recommended Citation
Shahnewaz, Tahsin, "Principal Component Analysis for Training Data Distillation and Augmentation" (2023). Electronic Theses & Dissertations. 1113.
https://digitalcommons.tamuc.edu/etd/1113