Classification Trees with Synthetic Features

Document Type


Degree Name

Master of Science (MS)



Date of Award

Spring 2018


Trained synthetic features were used with classification and regression trees (CART) and boosting methods to predict the outcomes of categorical response variables. The trained synthetic features considered were synthetic features (Zieba, Tomczak, & Tomczak, 2016), principal component analysis (PCA), zero-one regression (ZO), logistic regression (LS), linear discriminant analysis (LDA), robust fitting of linear models (RLM), least trimmed squares (LTS), naïve Bayes (NBAY), and univariate spline (SPL), all fitted using the statistical software R. To illustrate them in this paper, the trained synthetic features were applied to Polish companies' financial data, Fisher's Iris data, and skin lesion data. The objective of the research was to apply trained synthetic features to CART, to a stock boosting method fitted with the synthetic features at the root node, and to a synthetic boosting method that reweighted and refitted the synthetic features at each iteration, in order to improve predictive accuracy for the classes in a given data set over random guessing based on the prior probabilities.


Thomas Boucher

Subject Categories

Mathematics | Physical Sciences and Mathematics