Comparative Analysis of XGBoost Algorithm and Linear Regression in Predicting the Trend of Investor Overreaction
Keywords: Overreaction, investment decisions, XGBoost algorithm
Abstract
Overreaction is an observable anomaly in financial markets that can lead to market inefficiency, and it is particularly prevalent in emerging and less developed markets. Evidence suggests that investors tend to overreact to financial events, which biases their decision-making and moves the market away from its optimal efficiency. Detecting and predicting such reactions can help investors make more rational decisions about buying and selling stocks and other securities. Linear regression and the XGBoost algorithm are two methods used for this purpose; because XGBoost can model complex relationships, it can play a significant role in analyzing investor behavior. The objective of this study is to compare the performance of these two methods in predicting the trend of investor overreaction and thereby to contribute to improved investment strategies and risk management. The study is descriptive-causal in nature and is based on an experimental design with a post-event (ex post facto) approach. To test the hypotheses, multivariate linear regression was applied to panel data, which combine cross-sectional and time-series observations. The required information was collected through library research, and financial data from the companies in the statistical population were examined. The statistical population comprises all companies listed on the Tehran Stock Exchange from 2011 to 2021, from which 110 companies were selected as the sample by systematic elimination sampling. In the data analysis, the relationships between variables were examined using regression methods, and the findings were compared with the results obtained from the XGBoost algorithm. The findings indicate that the XGBoost algorithm outperforms linear regression in terms of both the coefficient of determination and the mean squared error (MSE).
Specifically, the highest coefficient of determination on the test data was 0.5713 for the XGBoost algorithm versus 0.4938 for the linear regression model, and the test MSE was 0.002288 for XGBoost versus 0.0042 for linear regression. These results indicate that XGBoost achieves lower error and higher predictive accuracy than linear regression, which is consistent with its ability to detect complex, nonlinear patterns. With a lower error and a higher coefficient of determination, the model allows more precise and reliable forecasting of the persistence of investor overreaction trends. The XGBoost algorithm can therefore be considered an efficient method for financial data analysis and for improving investment decisions.
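As an illustration of how the two evaluation measures behave, the following minimal Python sketch implements the standard definitions of MSE and the coefficient of determination and then compares the test-set figures reported above. It is not the study's code: the function names, the `REPORTED` dictionary, and the four-point toy series are assumptions introduced here for illustration only, and only the four reported numbers come from the abstract.

```python
from typing import Sequence


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average squared difference between observed and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(y_true) / len(y_true)
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when the observed values are constant")
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot


def relative_reduction(baseline: float, candidate: float) -> float:
    """Fraction by which `candidate` is lower than `baseline`."""
    return (baseline - candidate) / baseline


# Test-set figures as reported in the abstract (the dictionary itself is illustrative).
REPORTED = {
    "linear_regression": {"r2": 0.4938, "mse": 0.0042},
    "xgboost": {"r2": 0.5713, "mse": 0.002288},
}

mse_reduction = relative_reduction(
    REPORTED["linear_regression"]["mse"], REPORTED["xgboost"]["mse"]
)
r2_gain = REPORTED["xgboost"]["r2"] - REPORTED["linear_regression"]["r2"]

# Hypothetical toy series, used only to sanity-check the metric functions.
toy_true = [1.0, 2.0, 3.0, 4.0]
toy_pred = [1.0, 2.0, 3.0, 5.0]
toy_mse = mean_squared_error(toy_true, toy_pred)
toy_r2 = r_squared(toy_true, toy_pred)

if __name__ == "__main__":
    print(f"Relative MSE reduction (XGBoost vs. linear regression): {mse_reduction:.1%}")
    print(f"Gain in coefficient of determination: {r2_gain:.4f}")
    print(f"Toy check: MSE = {toy_mse:.2f}, R^2 = {toy_r2:.2f}")
```

On the reported figures, this corresponds to a test MSE roughly 45.5% lower for XGBoost than for linear regression and a coefficient of determination higher by 0.0775.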