1st Edition
by Tanya Kolosova (Author), Samuel Berestizhevsky (Author)
An AI framework intended to address the bias-variance tradeoff
for supervised learning methods in real-life applications. The
framework comprises bootstrapping to create multiple training and
testing data sets with various characteristics, design and analysis of
statistical experiments to identify optimal feature subsets and optimal
hyper-parameters for ML methods, and data contamination to test the
robustness of the classifiers.
Key Features:
- Using ML methods by themselves doesn’t ensure building classifiers that generalize well to new data
- Identifying optimal feature subsets and hyper-parameters of ML methods
can be accomplished using design and analysis of statistical experiments
- Using a bootstrapping approach for massive sampling of training and test
datasets with various data characteristics (e.g., contaminated training
sets) allows dealing with bias
- Developing a SAS-based table-driven environment allows managing all
meta-data related to the proposed AI framework and creating
interoperability with R libraries to accomplish a variety of statistical
and machine-learning tasks
- Computer programs in R and SAS that implement the AI framework are available on GitHub
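
The bootstrapping and data contamination ideas described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the authors' R or SAS code. The function names (`bootstrap_split`, `contaminate_labels`), the use of out-of-bag rows as the test set, label flipping as the form of contamination, and the contamination levels are all assumptions made for this example.

```python
import random


def bootstrap_split(n_rows, rng):
    """Draw one bootstrap sample of row indices (with replacement).

    Rows that are never drawn form the test set (an assumed, common choice).
    """
    train_idx = [rng.randrange(n_rows) for _ in range(n_rows)]
    drawn = set(train_idx)
    test_idx = [i for i in range(n_rows) if i not in drawn]
    return train_idx, test_idx


def contaminate_labels(labels, fraction, rng):
    """Flip a fraction of binary (0/1) labels to mimic a contaminated training set."""
    flipped = list(labels)
    n_flip = int(round(fraction * len(flipped)))
    for i in rng.sample(range(len(flipped)), n_flip):
        flipped[i] = 1 - flipped[i]
    return flipped


rng = random.Random(0)                  # fixed seed for reproducibility
labels = [i % 2 for i in range(100)]    # toy binary labels, hypothetical data

datasets = []
for fraction in (0.0, 0.05, 0.10):      # assumed contamination levels
    for _ in range(3):                  # three bootstrap replicates per level
        train_idx, test_idx = bootstrap_split(len(labels), rng)
        clean = [labels[i] for i in train_idx]
        datasets.append({
            "contamination": fraction,
            "train_idx": train_idx,
            "train_labels": contaminate_labels(clean, fraction, rng),
            "test_idx": test_idx,
        })
```

Each entry in `datasets` pairs a possibly contaminated training set with a clean test set, so a classifier trained on each one could be compared across contamination levels to gauge its robustness.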