Journal of Systems & Management (系统管理学报)

Data Evaluation Method Based on Causal Inference
(基于因果推断思想的数据价值评估方法)

ZHANG Yi, WANG Zhiyuan (张怡,王志远)

  1. School of Management, Shanghai University, Shanghai 200444, China
  • Received: 2024-12-25 Revised: 2025-08-16
  • Funding:
    National Natural Science Foundation of China, Young Scientists Fund (72201160, 72271155)

Abstract: To promote the development of data trading and data markets, and to address the high cost and computational complexity of traditional data valuation methods, this paper proposes a new causal inference-based method for assessing the incremental value of data: the Data Synthetic Control Method (DSCM). Taking the data buyer's perspective, DSCM treats the addition of new data as an intervention and constructs a counterfactual inference framework to quantify the contribution of the new data to the performance of supervised machine learning models. In simulation experiments, DSCM's estimates of data value closely match the actual data value across different models, with an average error of only 0.0032. In a real-world application to advertising click-through rate prediction, DSCM's estimates of the value of user-centric data differ from the actual value by an average error rate of only 9%, significantly outperforming traditional valuation methods. These results indicate that DSCM can provide accurate and stable data value assessment and effectively support enterprises' data-driven decision-making.

Key words: data value, data valuation, causal inference, synthetic control
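The abstract does not spell out how DSCM builds its counterfactual, so the following is only a minimal sketch of the classic synthetic control idea it names, not the authors' implementation. It assumes a hypothetical setup in which a "treated" model receives new data after a few evaluation periods while several "donor" models never do. All numbers, function names, and the choice of projected gradient descent are illustrative assumptions.

```python
# Generic synthetic control sketch (an assumption, NOT the paper's DSCM):
# 1. fit non-negative donor weights that sum to 1 on pre-intervention scores,
# 2. apply them to post-intervention donor scores to get a counterfactual,
# 3. read the data value as the gap between observed and counterfactual.


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for i, ui in enumerate(u, start=1):
        cumulative += ui
        t = (cumulative - 1.0) / i
        if ui - t > 0:
            theta = t
    return [max(x - theta, 0.0) for x in v]


def fit_weights(donors_pre, treated_pre, steps=5000):
    """Minimise the squared pre-period gap by projected gradient descent."""
    n_donors = len(donors_pre)
    n_periods = len(treated_pre)
    # Step size 1 / (2 * ||X||_F^2) is below 1/L for this quadratic loss.
    frob_sq = sum(x * x for series in donors_pre for x in series)
    lr = 1.0 / (2.0 * frob_sq)
    w = [1.0 / n_donors] * n_donors  # start from uniform weights
    for _ in range(steps):
        residual = [
            sum(w[j] * donors_pre[j][t] for j in range(n_donors)) - treated_pre[t]
            for t in range(n_periods)
        ]
        grad = [
            2.0 * sum(residual[t] * donors_pre[j][t] for t in range(n_periods))
            for j in range(n_donors)
        ]
        w = project_to_simplex([w[j] - lr * grad[j] for j in range(n_donors)])
    return w


def estimate_effect(weights, donors_post, treated_post):
    """Return (mean post-period gap, counterfactual series)."""
    n_periods = len(treated_post)
    counterfactual = [
        sum(weights[j] * donors_post[j][t] for j in range(len(weights)))
        for t in range(n_periods)
    ]
    gaps = [treated_post[t] - counterfactual[t] for t in range(n_periods)]
    return sum(gaps) / n_periods, counterfactual


# Made-up validation scores: rows are donor models, columns are periods.
donors_pre = [
    [0.60, 0.70, 0.80, 0.90],
    [0.90, 0.80, 0.70, 0.60],
    [0.70, 0.90, 0.60, 0.80],
]
donors_post = [
    [0.85, 0.95],
    [0.65, 0.55],
    [0.75, 0.85],
]
# Treated model: a 0.5/0.3/0.2 mix of the donors before the new data arrives,
# then 0.05 above that mix once the new data is added.
treated_pre = [0.71, 0.77, 0.73, 0.79]
treated_post = [0.82, 0.86]

weights = fit_weights(donors_pre, treated_pre)
effect, counterfactual = estimate_effect(weights, donors_post, treated_post)
print("donor weights:", [round(w, 3) for w in weights])
print("estimated performance lift from new data:", round(effect, 4))
```

The simplex constraint keeps the counterfactual inside the range spanned by the donors, which is the usual motivation for synthetic control over unconstrained regression. With more donors or noisier scores, a dedicated constrained solver would likely be a better choice than this hand-rolled loop.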
