Journal of Systems & Management ›› 2024, Vol. 33 ›› Issue (3): 735-754. DOI: 10.3969/j.issn.1005-2542.2024.03.013

• Digital Economy and Financial Engineering •

Default Risk Prediction Model for Chinese Listed Companies Based on XGBoost

CHI Guotai, WANG Shanshan

  1. School of Economics and Management, Dalian University of Technology, Dalian 116024, Liaoning, China
  • Received: 2022-11-10 Revised: 2023-09-11 Online: 2024-05-28 Published: 2024-06-04
  • Supported by:

    Key Program of the National Natural Science Foundation of China (71731003); General Program of the National Natural Science Foundation of China (72071026, 72173096, 71971051, 71971034, 71873103); Young Scientists Fund of the National Natural Science Foundation of China (71901055, 71903019); Regional Science Fund of the National Natural Science Foundation of China (72161033); Major Program of the National Social Science Fund of China (18ZDA095)

Abstract:

Accurate prediction of the default risk of listed companies is essential to corporate credit risk evaluation and an important basis for the credit decisions of financial institutions. This paper selects the feature subset with the strongest default-discriminating ability using the Akaike information criterion (AIC) of a linear regression model, and then uses the particle swarm optimization (PSO) algorithm to build an extreme gradient boosting (XGBoost) default prediction model on the selected features. Using data on 3,425 Chinese A-share listed companies over different time windows, it compares the proposed PSO-XGBoost model with thirteen benchmark models, including logistic regression and support vector machine, to verify its effectiveness. It further applies the Friedman test to three public credit datasets from the UCI Machine Learning Repository to verify its robustness. On the listed-company data, the PSO-XGBoost model improves prediction accuracy, measured by the geometric mean (G-mean), relative to the thirteen benchmark models. On the three public credit datasets, its average prediction performance is significantly better than that of the benchmark models on multiple evaluation measures. Feature importance scores, derived from each feature's contribution to the prediction results, increase the interpretability of the model. The findings show that financial indicators such as the asset-liability ratio, current ratio, and long-term debt to asset ratio have the greatest effect on default prediction, while macro factors such as the industry prosperity index, the growth rate of total retail sales of consumer goods, and the year-on-year growth rate of the supply of cash in circulation (M0) are also important features. This paper provides an effective method and empirical evidence for improving the accuracy of default risk prediction, which helps strengthen early warning and prevention of default risk for listed companies, reduces the regulatory cost of default risk, and provides decision support for enterprise managers, creditors, and investors.
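For readers unfamiliar with the G-mean measure mentioned in the abstract, the following is a minimal sketch, assuming the standard binary-classification definition (the square root of sensitivity times specificity); the abstract itself does not spell out the formula. The function name `g_mean`, the label coding (1 = default, 0 = non-default), and the toy labels are illustrative assumptions, not data or code from the paper.

```python
import math


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity.

    Assumed label coding (not from the paper): 1 = default, 0 = non-default.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # defaults caught
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # defaults missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-defaults cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)  # recall on the default class
    specificity = tn / (tn + fp)  # recall on the non-default class
    return math.sqrt(sensitivity * specificity)


# Toy, made-up labels for an imbalanced sample (2 defaults out of 10 firms).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
score = g_mean(y_true, y_pred)

# A classifier that never predicts default misses every defaulter.
score_all_negative = g_mean(y_true, [0] * len(y_true))
```

Unlike plain accuracy, this measure falls to zero when a model labels every firm as non-default, which is presumably why it suits imbalanced default samples.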

Key words:

default prediction, feature subset selection, parameters for decision tree

CLC Number: