DOI: 10.1145/3638530.3664145
research-article
Open access

Enabling An Informed Contextual Multi-Armed Bandit Framework For Stock Trading With Neuroevolution

Published: 01 August 2024

Abstract

Multi-armed bandits (MABs) and contextual multi-armed bandits (CMABs) have proven effective in a variety of application areas. However, these models are highly susceptible to volatility and often exhibit knowledge gaps because they have limited insight into future states. In this paper, we propose a new bandit framework, which we call informed contextual multi-armed bandits (iCMABs), to mitigate these gaps by making "informed" decisions based on predicted future contexts. Consequently, the performance of an iCMAB depends heavily on the accuracy of the forecasts it uses. We compare recurrent neural networks (RNNs) evolved with the EXAMM neuroevolution algorithm against other time series forecasting (TSF) methods, and we evaluate how well our iCMAB framework, using these forecasts, makes stock trading decisions for the Dow Jones Index (DJI) compared with other decision-making strategies. Our results demonstrate that an iCMAB driven by evolved RNN architectures outperforms statistical TSF methods, fixed-architecture RNNs for TSF, and other CMAB methods. Using evolved RNNs, the iCMAB achieves the highest return, over 21%, which is a ~7% improvement over not incorporating forecasted values and a ~5% improvement over the DJI's return for the same period.
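The abstract describes the iCMAB idea only at a high level: the bandit chooses an action using a forecast of the upcoming context rather than only the context observed so far. The sketch below illustrates that idea under stated assumptions and is not the authors' implementation. It uses disjoint LinUCB (a standard linear contextual bandit, not necessarily the one used in the paper), hypothetical arm names `buy`/`hold`/`sell`, and a hypothetical `naive_forecast` (simple linear extrapolation) as a stand-in for the EXAMM-evolved RNN forecaster. The class and function names are illustrative only.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def mat_vec(m: Matrix, v: Sequence[float]) -> Vector:
    """Multiply a square matrix by a vector."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(ui * vi for ui, vi in zip(u, v))


class LinUCBArm:
    """One arm of disjoint LinUCB; A^-1 is updated in place via Sherman-Morrison."""

    def __init__(self, dim: int) -> None:
        # A starts as the identity (ridge prior), so A^-1 does too.
        self.a_inv: Matrix = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b: Vector = [0.0] * dim

    def ucb(self, x: Sequence[float], alpha: float) -> float:
        theta = mat_vec(self.a_inv, self.b)                # ridge estimate A^-1 b
        width = math.sqrt(dot(x, mat_vec(self.a_inv, x)))  # sqrt(x^T A^-1 x)
        return dot(theta, x) + alpha * width

    def update(self, x: Sequence[float], reward: float) -> None:
        # (A + x x^T)^-1 = A^-1 - (A^-1 x)(A^-1 x)^T / (1 + x^T A^-1 x)
        u = mat_vec(self.a_inv, x)
        denom = 1.0 + dot(x, u)
        for i in range(len(u)):
            for j in range(len(u)):
                self.a_inv[i][j] -= u[i] * u[j] / denom
        for i, xi in enumerate(x):
            self.b[i] += reward * xi


class InformedLinUCB:
    """Hypothetical 'informed' CMAB: arms are scored on a forecast of the next context."""

    def __init__(self, arms: Sequence[str], dim: int, alpha: float = 1.0) -> None:
        self.arms: Dict[str, LinUCBArm] = {name: LinUCBArm(dim) for name in arms}
        self.alpha = alpha

    def select(self, forecast_context: Sequence[float]) -> str:
        # Ties go to the first arm listed, since max() keeps the first maximum.
        return max(self.arms, key=lambda a: self.arms[a].ucb(forecast_context, self.alpha))

    def update(self, arm: str, forecast_context: Sequence[float], reward: float) -> None:
        # Assumption: learn reward as a function of the same forecast used to decide.
        self.arms[arm].update(forecast_context, reward)


def naive_forecast(history: Sequence[Sequence[float]]) -> Vector:
    """Hypothetical stand-in forecaster (not EXAMM): extrapolate 2*x_t - x_{t-1}."""
    prev, cur = history[-2], history[-1]
    return [2.0 * c - p for p, c in zip(prev, cur)]


if __name__ == "__main__":
    # Toy data (hypothetical): feature 0 is a bias term, feature 1 a trend signal.
    trend = [0.1 * ((t % 10) - 5) for t in range(40)]
    contexts = [[1.0, s] for s in trend]
    bandit = InformedLinUCB(["buy", "hold", "sell"], dim=2, alpha=0.5)
    total = 0.0
    for t in range(2, len(contexts)):
        x_hat = naive_forecast(contexts[t - 2:t])  # forecast context t from t-2 and t-1
        arm = bandit.select(x_hat)
        realized = contexts[t][1]                  # trend that actually occurs at t
        reward = {"buy": realized, "hold": 0.0, "sell": -realized}[arm]
        bandit.update(arm, x_hat, reward)
        total += reward
    print(f"cumulative toy reward: {total:.3f}")
```

Replacing `naive_forecast` with a better forecaster is where, per the abstract, forecast accuracy feeds directly into decision quality. Setting `alpha=0` makes the bandit act greedily on its current estimates, while larger values favor exploring arms with wider confidence bounds.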

Published In

GECCO '24 Companion: Proceedings of the Genetic and Evolutionary Computation Conference Companion
July 2024
2187 pages
ISBN: 9798400704956
DOI: 10.1145/3638530
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. multi-armed bandits
2. recurrent neural networks
3. decision making

Conference

GECCO '24 Companion

Overall acceptance rate: 1,669 of 4,410 submissions (38%)

Article Metrics

Total citations: 0
Total downloads: 89
Downloads (last 12 months): 89
Downloads (last 6 weeks): 37
Reflects downloads up to 22 Oct 2024
