Document Type
Honors Thesis
Date of Award
Spring 2024
Abstract
Variance-based sensitivity analysis is a crucial tool for assessing how variability in the inputs of a complex mathematical model contributes to variability in its output. In this thesis, we study Sobol’s indices, a class of variance-based sensitivity measures, to quantify the importance of each input variable to the overall variability of the model output. We first discuss Sobol’s first-order and total-order indices and demonstrate them briefly on two examples, the Sobol G-function and a polynomial function, each with six input variables. These examples highlight the theoretical foundations and practical applications of Sobol’s indices in analyzing model sensitivities. Our main application uses Sobol’s method within the framework of a regression model to assess the importance of various features (also known as predictors) in predicting total medical expenses. Our findings reveal that ‘smoking status’ is the most important feature affecting health insurance charges, followed by ‘age’ and ‘bmi’ as the second and third most important features, respectively. This application not only demonstrates the effectiveness of Sobol’s indices in real-world actuarial scenarios but also provides a clear hierarchy of factors affecting health insurance premiums. In summary, this study implements a variance-based sensitivity method to select the most influential features, suggesting possible model simplifications and providing insights that could improve decision-making processes in health insurance modeling.
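As an illustration of the first-order and total-order indices mentioned in the abstract, the following is a minimal sketch of a Monte Carlo estimate on a six-input Sobol G-function. It is not the thesis's own code: the coefficients `a`, the sample size, the seed, and the choice of estimators (a Saltelli-style first-order estimator and the Jansen total-order estimator) are all assumptions made here for illustration.

```python
import numpy as np


def g_function(x, a):
    """Sobol G-function: product over inputs of (|4*x_i - 2| + a_i) / (1 + a_i)."""
    return np.prod((np.abs(4.0 * x - 2.0) + a) / (1.0 + a), axis=1)


def sobol_indices(model, dim, n_samples, rng):
    """Estimate first-order and total-order Sobol indices by Monte Carlo.

    Uses two independent uniform sample matrices A and B on [0, 1]^dim.
    For each input i, AB is A with column i taken from B.
    """
    A = rng.random((n_samples, dim))
    B = rng.random((n_samples, dim))
    f_A = model(A)
    f_B = model(B)
    total_var = np.var(np.concatenate([f_A, f_B]))

    first = np.empty(dim)
    total = np.empty(dim)
    for i in range(dim):
        AB = A.copy()
        AB[:, i] = B[:, i]
        f_AB = model(AB)
        # Saltelli-style estimator of the first-order index
        first[i] = np.mean(f_B * (f_AB - f_A)) / total_var
        # Jansen estimator of the total-order index
        total[i] = 0.5 * np.mean((f_A - f_AB) ** 2) / total_var
    return first, total


# Assumed coefficients (a common textbook choice, not taken from the thesis):
# a smaller a_i makes input i more influential.
a = np.array([0.0, 1.0, 4.5, 9.0, 99.0, 99.0])
rng = np.random.default_rng(0)
first, total = sobol_indices(lambda x: g_function(x, a), a.size, 100_000, rng)

# Closed-form first-order indices of the G-function, for comparison
V_i = 1.0 / (3.0 * (1.0 + a) ** 2)
exact_first = V_i / (np.prod(1.0 + V_i) - 1.0)

for i in range(a.size):
    print(f"x{i + 1}: S_i ~ {first[i]:.3f} (exact {exact_first[i]:.3f}), S_Ti ~ {total[i]:.3f}")
```

Under these assumptions, the inputs with the smallest coefficients receive the largest indices, which is the kind of importance ranking the abstract describes for the regression features.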
Advisor
Nahid Hasan
Recommended Citation
Spiller, Jakob, "Features Selection in Regression Models Using Variance-based Sensitivity Analysis" (2024). Honors Theses. 235.
https://digitalcommons.tamuc.edu/honorstheses/235
Keywords
Sobol’s indices; health insurance data; actuarial science; variable importance; regression model