Stock Trading Strategy via Deep Reinforcement Learning with Behavior Cloning

Journal of Systems & Management, 2024, Vol. 33, Issue 1: 150-161. DOI: 10.3969/j.issn.1005-2542.2024.01.011

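YANG Xingyu, CHEN Liangwei, ZHENG Xiaoteng, ZHANG Yong

School of Management, Guangdong University of Technology, Guangzhou 510520, China

Received: 2022-11-28 Revised: 2023-06-23 Online: 2024-01-28 Published: 2024-01-26
Supported by:
National Natural Science Foundation of China (72371080); Guangdong Basic and Applied Basic Research Foundation (2023A1515012840); Guangdong Philosophy and Social Science Planning Project (GD23XGL022)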

Abstract:

To improve the return and reduce the risk of stock investment, this paper introduces behavior cloning, an idea from imitation learning, into the deep reinforcement learning framework to design a stock trading strategy. The strategy combines the dueling deep Q-network (dueling DQN) algorithm with behavior cloning, so that the agent imitates the decisions of a pre-constructed investment expert while exploring autonomously. Numerical experiments on stocks selected from different industries show that the designed trading strategy outperforms the comparison strategies on return and risk metrics such as the annualized percentage yield (APY), Sharpe ratio (SR), and Calmar ratio (CR). The results indicate that combining imitation learning with deep reinforcement learning gives the agent both exploration and imitation abilities, which improves the generalization ability of the model and the applicability of the strategy.
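The combination described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's loss function, so everything below is an assumption made for illustration: the mean-subtracted dueling aggregation, the cross-entropy form of the behavior cloning term, the weight `bc_weight`, and all function names are hypothetical and are not taken from the paper.

```python
import math


def dueling_q_values(state_value, advantages):
    """Combine V(s) and A(s, a) into Q(s, a) = V(s) + A(s, a) - mean(A)."""
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + a - mean_advantage for a in advantages]


def td_loss(q_values, action, reward, next_q_values, gamma=0.99, done=False):
    """Squared one-step temporal-difference error for the action taken."""
    target = reward if done else reward + gamma * max(next_q_values)
    return (q_values[action] - target) ** 2


def behavior_cloning_loss(q_values, expert_action):
    """Cross-entropy between softmax(Q) and the expert's action.

    Assumed form of the imitation term; the paper may define it differently.
    """
    top = max(q_values)  # subtract the max for numerical stability
    log_sum_exp = top + math.log(sum(math.exp(q - top) for q in q_values))
    return log_sum_exp - q_values[expert_action]


def combined_loss(q_values, action, reward, next_q_values, expert_action,
                  bc_weight=0.5, gamma=0.99, done=False):
    """Exploration term (TD loss) plus a weighted imitation term."""
    rl_term = td_loss(q_values, action, reward, next_q_values, gamma, done)
    bc_term = behavior_cloning_loss(q_values, expert_action)
    return rl_term + bc_weight * bc_term


# Hypothetical three-action setting (e.g. sell, hold, buy).
q = dueling_q_values(state_value=1.0, advantages=[0.0, 1.0, 2.0])
loss = combined_loss(q, action=2, reward=1.0, next_q_values=[0.0, 0.0, 0.0],
                     expert_action=2, bc_weight=0.5, gamma=0.9)
```

In this sketch the imitation term shrinks as the expert's action receives a higher Q-value than the alternatives, while the TD term keeps the agent learning from its own experience; `bc_weight` sets the balance between the two.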

Key words: stock trading strategy, deep reinforcement learning, imitation learning, behavior cloning, dueling deep Q-network (dueling DQN)
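The three evaluation metrics named in the abstract (APY, SR, CR) can be computed from a cumulative-wealth series roughly as follows. This is a sketch using common textbook conventions, which may differ from the exact definitions used in the paper; the function names, the default of 252 trading periods per year, and the zero risk-free rate are assumptions.

```python
import math


def annualized_return(wealth, periods_per_year=252):
    """Annualized yield from a cumulative-wealth series (first entry = start)."""
    n_periods = len(wealth) - 1
    total_growth = wealth[-1] / wealth[0]
    return total_growth ** (periods_per_year / n_periods) - 1.0


def sharpe_ratio(wealth, risk_free_rate=0.0, periods_per_year=252):
    """Annualized mean excess return over annualized volatility.

    Raises ZeroDivisionError if the per-period returns have zero variance.
    """
    returns = [wealth[i + 1] / wealth[i] - 1.0 for i in range(len(wealth) - 1)]
    per_period_rf = risk_free_rate / periods_per_year
    excess = [r - per_period_rf for r in returns]
    mean = sum(excess) / len(excess)
    variance = sum((x - mean) ** 2 for x in excess) / (len(excess) - 1)
    return mean / math.sqrt(variance) * math.sqrt(periods_per_year)


def max_drawdown(wealth):
    """Largest peak-to-trough decline, as a fraction of the peak."""
    peak = wealth[0]
    worst = 0.0
    for w in wealth:
        peak = max(peak, w)
        worst = max(worst, (peak - w) / peak)
    return worst


def calmar_ratio(wealth, periods_per_year=252):
    """Annualized return divided by maximum drawdown.

    Raises ZeroDivisionError if the series never declines.
    """
    return annualized_return(wealth, periods_per_year) / max_drawdown(wealth)


# Hypothetical wealth path, for illustration only.
wealth = [100.0, 110.0, 99.0, 121.0]
apy = annualized_return(wealth, periods_per_year=3)
sr = sharpe_ratio(wealth, periods_per_year=3)
cr = calmar_ratio(wealth, periods_per_year=3)
```

A higher Sharpe ratio means more return per unit of volatility, and a higher Calmar ratio means more return per unit of worst-case drawdown, which is why the two are reported alongside the raw annualized return.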

CLC Number: F830

YANG Xingyu, CHEN Liangwei, ZHENG Xiaoteng, ZHANG Yong. Stock Trading Strategy via Deep Reinforcement Learning with Behavior Cloning [J]. Journal of Systems & Management, 2024, 33(1): 150-161.
