Stock Trading Strategy via Deep Reinforcement Learning with Behavior Cloning

Journal of Systems & Management, 2024, Vol. 33, Issue (1): 150-161. DOI: 10.3969/j.issn.1005-2542.2024.01.011

YANG Xingyu, CHEN Liangwei, ZHENG Xiaoteng, ZHANG Yong

School of Management, Guangdong University of Technology, Guangzhou 510520, China

Received: 2022-11-28 Revised: 2023-06-23 Online: 2024-01-28 Published: 2024-01-26
Supported by: National Natural Science Foundation of China (72371080); Guangdong Basic and Applied Basic Research Foundation (2023A1515012840); Guangdong Philosophy and Social Science Planning Project (GD23XGL022)

Abstract:

To improve stock investment returns and reduce risk, this paper introduces behavior cloning, an imitation learning technique, into the deep reinforcement learning framework to design a stock trading strategy. The strategy combines the dueling deep Q-network (dueling DQN) algorithm with behavior cloning, so that the agent imitates the decisions of a pre-constructed investment expert while exploring autonomously. Numerical experiments on selected stocks from different industries show that the designed trading strategy outperforms the comparison strategies on return and risk metrics such as the annualized percentage yield (APY), Sharpe ratio (SR), and Calmar ratio (CR). The results indicate that combining imitation learning with deep reinforcement learning gives the agent both exploration and imitation abilities, which improves the generalization ability of the model and the applicability of the strategy.
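The combination described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the three-action space (sell, hold, buy), the cross-entropy form of the behavior cloning term, the weight `bc_weight`, and all function names are assumptions made for illustration. The sketch shows the standard dueling aggregation of a state value and action advantages into Q-values, and a loss that adds an imitation term to the usual temporal-difference error.

```python
import math

def dueling_q_values(state_value, advantages):
    """Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean over a of A(s, a)."""
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + a - mean_adv for a in advantages]

def td_loss(q_values, action, reward, next_q_values, gamma=0.99, done=False):
    """Squared one-step temporal-difference error for the action taken."""
    target = reward if done else reward + gamma * max(next_q_values)
    return (q_values[action] - target) ** 2

def behavior_cloning_loss(q_values, expert_action):
    """Cross-entropy between softmax(Q) and the expert's action (assumed form)."""
    m = max(q_values)  # subtract the max for numerical stability
    log_norm = m + math.log(sum(math.exp(q - m) for q in q_values))
    return log_norm - q_values[expert_action]

def combined_loss(q_values, action, reward, next_q_values,
                  expert_action, bc_weight=0.5, gamma=0.99, done=False):
    """TD loss plus a weighted imitation term; bc_weight is a hypothetical knob."""
    td = td_loss(q_values, action, reward, next_q_values, gamma, done)
    bc = behavior_cloning_loss(q_values, expert_action)
    return td + bc_weight * bc

# Hypothetical action encoding: 0 = sell, 1 = hold, 2 = buy.
q = dueling_q_values(state_value=1.0, advantages=[0.3, -0.1, -0.2])
next_q = dueling_q_values(state_value=0.8, advantages=[0.0, 0.1, -0.1])
loss = combined_loss(q, action=0, reward=0.02, next_q_values=next_q,
                     expert_action=0)
print(q, loss)
```

Setting `bc_weight` to zero recovers a plain dueling DQN update, while a larger weight pushes the agent toward the expert's decisions; how the paper balances the two terms is not stated in the abstract.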

Key words: stock trading strategy, deep reinforcement learning, imitation learning, behavior cloning, dueling deep Q-network (dueling DQN)

CLC Number: F830

YANG Xingyu, CHEN Liangwei, ZHENG Xiaoteng, ZHANG Yong. Stock Trading Strategy via Deep Reinforcement Learning with Behavior Cloning [J]. Journal of Systems & Management, 2024, 33(1): 150-161.
