SAFE Ensemble models to classify credit ratings
Giudici, Paolo
2025-01-01
Abstract
This study evaluates how different binary classification schemes for credit ratings affect overall model performance, and identifies the optimal classification threshold. Four machine learning models are employed: Random Forest (RF), Gradient Boosting Tree (GBT), Stacked Ensemble Model (SEM), and Voting Ensemble Model (VEM). Model performance is assessed with SAFE metrics, which are based on the Rank Graduation Box (RGB) approach. Credit ratings are mapped to five binary classification schemes: in each scheme, a specific set of ratings in the range from D to BBB is designated high-risk (value = 1), and the remaining ratings are classified as low-risk (value = 0). The models are compared systematically across the schemes using the RGA, RGR, RGE, and RGF metrics, alongside traditional measures such as AUC, feature importance, and SHAP values. The goal is to identify the classification scheme that best explains credit risk, together with its optimal threshold. The experimental results show how effective each scheme is and provide a theoretical basis for selecting the best threshold, offering a more reliable and interpretable framework for credit risk assessment.
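As an illustration of the binarization step described in the abstract, the following is a minimal sketch in Python. It rests on assumptions that the abstract does not confirm: the ordinal rating scale, the five cutoff ratings, and the reading that each scheme marks every rating at or below its cutoff as high-risk are all hypothetical, as are the names `RATING_SCALE`, `CUTOFFS`, `binarize`, and `all_schemes`.

```python
# Ordinal rating scale from worst to best.
# ASSUMPTION: the abstract does not list the scale; this one is illustrative.
RATING_SCALE = ["D", "C", "CC", "CCC", "B", "BB", "BBB", "A", "AA", "AAA"]
RANK = {rating: position for position, rating in enumerate(RATING_SCALE)}

# ASSUMPTION: five hypothetical cutoffs, the highest being BBB.
# The abstract only says the high-risk ratings lie between D and BBB.
CUTOFFS = ["CC", "CCC", "B", "BB", "BBB"]


def binarize(ratings, cutoff):
    """Label each rating 1 (high-risk) if it is at or below the cutoff, else 0."""
    limit = RANK[cutoff]
    return [1 if RANK[rating] <= limit else 0 for rating in ratings]


def all_schemes(ratings):
    """Build one binary target vector per cutoff."""
    return {cutoff: binarize(ratings, cutoff) for cutoff in CUTOFFS}


if __name__ == "__main__":
    sample = ["AAA", "BBB", "BB", "CCC", "D", "A"]
    for cutoff, labels in all_schemes(sample).items():
        share = sum(labels) / len(labels)
        print(f"cutoff={cutoff}: labels={labels}, high-risk share={share:.2f}")
```

Each resulting target vector could then be used to train and score the same set of models, so that the schemes are compared on equal terms. Under this reading, a higher cutoff enlarges the high-risk class, which changes the class balance that the models see.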


