Forecasting S&P 500 Volatility with the HAR-RV Model


This article explores the application of the Heterogeneous Autoregressive Realized Volatility (HAR-RV) model to forecast future volatility in the S&P 500 index using data from 1990 to 2023. By incorporating log-transformed daily, weekly, and monthly realized volatilities, the HAR-RV model captures volatility patterns across different time horizons. The results show that long-term volatility (monthly) plays the most significant role in predicting future volatility, while short-term volatility (daily) exhibits mean-reversion, contributing less to long-term forecasts.

The analysis is complemented by a rolling window regression, revealing how the importance of each volatility component changes over time, particularly during financial crises. Despite the model's success in capturing broad trends, residual diagnostics indicate that the model struggles to account for extreme market events, as evidenced by non-normal residuals and underpredictions during periods of high volatility, such as the Global Financial Crisis and the COVID-19 pandemic.


Introduction

The S&P 500 is one of the most significant benchmarks in the global economy, representing the performance of 500 of the largest publicly traded companies in the United States. As a market-capitalization-weighted index, it accounts for approximately 80% of the U.S. stock market value, making it a reliable indicator of the overall health of the U.S. economy, which, in turn, impacts the global financial markets. The S&P 500 influences investor sentiment and decision-making not only in the U.S. but also internationally, as U.S. companies often have substantial global operations and supply chains. Consequently, fluctuations in the S&P 500 can signal broader economic trends, affecting everything from corporate earnings expectations to global trade flows. Accurate forecasting of S&P 500 volatility is critical for risk management, asset pricing, and investment strategies, as it helps market participants anticipate periods of uncertainty or stability. 

Volatility forecasting is a crucial tool for risk management, asset pricing, and portfolio optimization. Traditional models often struggle to capture the different time horizons over which market participants respond to volatility. The Heterogeneous Autoregressive Realized Volatility (HAR-RV) model, proposed by Corsi (2009), offers a solution by accounting for daily, weekly, and monthly volatilities. This article applies the HAR-RV model to the S&P 500 index over the period 1990–2023. By using log transformations of realized volatility components, we aim to reduce skewness and stabilize variance, thus improving the accuracy of volatility forecasts. Additionally, this article uses the econkit library, which automates the data retrieval process and the calculation of daily returns, making it easier to focus on advanced modeling techniques.

In the following sections, we will go step by step through the data collection, computation of realized volatilities, log transformations, and the implementation of the HAR-RV model. Finally, we will analyze the results and explore how volatility behaves over time through a rolling window analysis.

Data Collection and Preprocessing

The data for this analysis consists of the daily prices of the S&P 500 index, retrieved using the econkit library. The data spans from January 1, 1990, to December 31, 2023, and includes daily adjusted closing prices.

Return Calculation

The daily logarithmic returns were automatically calculated using econkit’s data retrieval function. The logarithmic returns are computed as:

`r_{t}=\ln((P_{t})/(P_{t-1}))`

This automation ensures accuracy and saves time, making it easier to move forward with modeling.
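As a minimal sketch of the return formula above, the snippet below computes log returns from a price array with NumPy. It does not use econkit (its API is not shown in this article), and the `prices` values are made-up illustration data rather than S&P 500 quotes.

```python
import numpy as np

def log_returns(prices):
    """Return r_t = ln(P_t / P_{t-1}) for every day after the first."""
    p = np.asarray(prices, dtype=float)
    return np.log(p[1:] / p[:-1])

# Hypothetical closing prices, for illustration only.
prices = [100.0, 101.0, 99.5, 100.2]
returns = log_returns(prices)
print(returns)
```

A convenient property of log returns is that they add up over time: the sum of the daily values equals the log return over the whole period.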

Realized Volatility 

The daily realized volatility (`RV_{t}`) was calculated as the square of the daily returns:

`RV_{t}=r_t^2`

This daily realized volatility serves as the primary measure of market volatility and is used as a building block for the HAR-RV model. Figure 1 shows the daily realized volatility for the S&P 500 index from 1990 to 2023. The graph highlights the presence of volatility clustering, a common phenomenon in financial markets where high-volatility periods are followed by more high-volatility periods, and similarly, low-volatility periods are followed by more calm periods. Volatility does not occur uniformly but instead spikes during specific events, as shown by the peaks in the graph.

Figure 1

Major Peaks

  • The first major spike occurs during the Dot-com Bubble in the early 2000s, where the technology sector experienced severe instability, leading to a noticeable rise in volatility.
  • The Global Financial Crisis (2007–2008) marks the second major spike. During this period, the S&P 500 experienced dramatic price movements due to the collapse of financial institutions and global market turbulence, causing volatility to reach its highest level.
  • The final significant spike is observed in 2020 during the onset of the COVID-19 pandemic, which led to extreme market uncertainty and unprecedented price swings. This was one of the most volatile periods in recent history, as reflected by the sharp peak in the graph.

Stable Periods

Between these crisis periods, there are relatively stable periods where daily volatility remains low and consistent. For example, the period from 2012 to 2015 exhibits low realized volatility, indicating a time of relative market stability.

Post-Crisis Recovery

After each significant spike in volatility, the market gradually returns to periods of lower volatility. This recovery behavior is evident after the peaks in both 2008 and 2020, where the extreme volatility slowly subsides as markets stabilize.

Weekly and Monthly Realized Volatility

In line with the HAR-RV model, weekly realized volatility was calculated as the 5-day rolling average of daily realized volatility, and monthly realized volatility was computed as the 22-day rolling average:

`RV_{5} = \frac{1}{5}\sum_{i=1}^{5} RV_{t-i}`, `RV_{22} = \frac{1}{22}\sum_{i=1}^{22} RV_{t-i}`
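The three horizons can be sketched in a few lines of NumPy. The helper `lagged_mean` is a hypothetical name, and it follows the formula as written above, averaging lags 1 through 5 (or 22) and so excluding the current day. Whether the window should end at day t or at day t-1 is a convention the article does not spell out, so treat that choice as an assumption. The returns are simulated.

```python
import numpy as np

def lagged_mean(x, window):
    """out[t] = mean(x[t-window], ..., x[t-1]); NaN until enough history exists."""
    x = np.asarray(x, dtype=float)
    out = np.full(x.shape, np.nan)
    csum = np.concatenate(([0.0], np.cumsum(x)))
    # Sum of x[t-window .. t-1] equals csum[t] - csum[t-window].
    out[window:] = (csum[window:-1] - csum[:-window - 1]) / window
    return out

rng = np.random.default_rng(0)
returns = rng.normal(0.0, 0.01, size=60)   # simulated daily log returns

rv_d = returns ** 2                # daily realized volatility (squared return)
rv_w = lagged_mean(rv_d, 5)        # weekly component
rv_m = lagged_mean(rv_d, 22)       # monthly component
print(rv_d[30], rv_w[30], rv_m[30])
```

With pandas, `Series.rolling(5).mean().shift(1)` should produce the same lagged window; dropping the `shift(1)` gives the variant that includes the current day.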

Log Transformations

To stabilize the variance of the volatility measures and reduce skewness, log transformations were applied to the daily, weekly, and monthly volatilities:

`\log(RV_1)`, `\log(RV_5)`, `\log(RV_{22})`
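One practical wrinkle with the log transform: a day with a zero return has a realized volatility of zero, and its logarithm is undefined. The article does not say how such days were treated, so the sketch below simply marks them as missing. The function name `safe_log` and the sample values are hypothetical.

```python
import numpy as np

def safe_log(x):
    """Natural log of positive entries; NaN where x <= 0 or x is already NaN."""
    x = np.asarray(x, dtype=float)
    out = np.full(x.shape, np.nan)
    with np.errstate(invalid="ignore"):
        positive = x > 0               # NaN compares as False, so NaNs stay NaN
    out[positive] = np.log(x[positive])
    return out

rv = np.array([4e-4, 0.0, 1e-4, np.nan])   # hypothetical daily RV values
log_rv = safe_log(rv)
print(log_rv)
```

Other treatments are possible, such as adding a small constant before taking logs; each changes the left tail of the transformed series, so the choice is worth stating explicitly.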

Figure 2 presents the log-transformed daily (`\log(RV_1)`), weekly (`\log(RV_5)`) and monthly (`\log(RV_{22})`) volatility components for the S&P 500 from 1990 to 2023. This transformation stabilizes the variance of the volatility components, allowing for better linear relationships between them and future volatility in the HAR-RV model.

Figure 2

The black line, representing daily volatility (`\log(RV_1)`), is characterized by high variability and frequent spikes. This reflects the nature of short-term market fluctuations, where daily volatility reacts immediately to market events or shocks. The sharp peaks, particularly during major financial crises such as the Dot-com Bubble (2000), the Global Financial Crisis (2008), and the COVID-19 pandemic (2020), highlight the responsiveness of daily volatility to sudden, short-term disruptions.

In contrast, the grey line, which represents weekly volatility (`\log(RV_5)`), smooths out some of the extreme fluctuations seen in daily volatility. Although it still captures significant volatility trends and spikes, weekly volatility is less sensitive to immediate shocks. Instead, it exhibits more gradual changes over time, reflecting the market’s response to intermediate-term factors such as economic data releases, corporate earnings reports, or geopolitical developments.

The red line represents monthly volatility (`\log(RV_{22})`), which is the smoothest of the three volatility components. Monthly volatility reflects longer-term market trends and is less influenced by short-term events. Its behavior captures broader, sustained movements in market volatility over time, making it more indicative of longer-term market conditions. During periods of financial instability, such as the Global Financial Crisis, the monthly volatility component rises, but its movements are much more tempered compared to daily and weekly volatilities.

Overall, the figure highlights how the different volatility horizons—daily, weekly, and monthly—behave across time and respond to various market conditions. Daily volatility captures the most immediate market reactions, with sharp, erratic movements. Weekly volatility balances the short-term spikes with more gradual changes, while monthly volatility reflects long-term trends and broader market shifts. The different volatility components play complementary roles in the HAR-RV model, where each horizon contributes uniquely to the forecasting of future volatility. The phenomenon of volatility clustering is also visible, where high-volatility periods (e.g., 2008, 2020) persist for extended periods, further illustrating how market uncertainty often builds up and does not dissipate quickly.

The HAR-RV Model

This article outlines the steps of the model implementation, presents the results, and discusses the implications for volatility forecasting. The HAR-RV model is designed to capture the heterogeneous time horizons over which market participants react to volatility [1]. The model is defined as:

`RV_{t+1} = \beta_0 + \beta_1 \times RV_t + \beta_2 \times RV_{t}^{(5)} + \beta_3 \times RV_{t}^{(22)} + \epsilon_t`

Where:

  • `RV_{t}` is the daily realized volatility,
  • `RV_{t}^{(5)}` is the weekly realized volatility, and
  • `RV_{t}^{(22)}` is the monthly realized volatility.

This model is particularly useful for capturing short-term, medium-term, and long-term volatility dynamics. For a more detailed explanation of the HAR-RV model and its components, you can refer to my earlier post where I explore its theoretical foundations (A Comprehensive Overview of HAR-RV Model). 

In this application, we estimate the following log-transformed HAR-RV model:

`\log(RV_{t+1}) = \beta_0 + \beta_1\times\log(RV_t) + \beta_2\times\log(RV_{t}^{(5)})+ \beta_3\times\log(RV_{t}^{(22)}) + \epsilon_t`

This approach ensures that the relationships between the variables are more linear and that the model fits better to the data.
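The regression itself is ordinary least squares on lagged regressors. The sketch below is one way to set it up with NumPy and is not necessarily how the article's estimates were produced: the target is next day's log RV, the regressors are the three log components at day t, and rows with missing values from the warm-up period are dropped. The function names and the simulated data (a persistent log-variance process) are assumptions for illustration.

```python
import numpy as np

def fit_log_har(log_d, log_w, log_m):
    """OLS fit of log(RV_{t+1}) on a constant and the three log components at t."""
    log_d, log_w, log_m = (np.asarray(a, dtype=float) for a in (log_d, log_w, log_m))
    y = log_d[1:]                                   # target: next day's log RV
    X = np.column_stack([np.ones(len(y)), log_d[:-1], log_w[:-1], log_m[:-1]])
    keep = np.isfinite(y) & np.isfinite(X).all(axis=1)   # drop warm-up rows
    y, X = y[keep], X[keep]
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta, resid, r2

def lagged_mean(x, w):
    """Mean of the previous w observations; NaN until enough history exists."""
    return np.array([x[t - w:t].mean() if t >= w else np.nan for t in range(len(x))])

# Simulated data: returns whose log variance follows a persistent AR(1) process.
rng = np.random.default_rng(1)
n = 1500
h = np.zeros(n)
for t in range(1, n):
    h[t] = 0.97 * h[t - 1] + 0.2 * rng.normal()
rv = (0.01 * np.exp(h / 2) * rng.normal(size=n)) ** 2

beta, resid, r2 = fit_log_har(np.log(rv),
                              np.log(lagged_mean(rv, 5)),
                              np.log(lagged_mean(rv, 22)))
print("coefficients:", beta)
print("R-squared:", r2)
```

The coefficients from this simulation will not match the table in the next section, since the data are synthetic; the point is only the alignment of target and regressors.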

Results and Discussion

Interpretation of Results

| Variable | Coefficient | Std. Error | t-statistic | p-value | 95% CI |
|---|---|---|---|---|---|
| const | -2.5458 | 0.256 | -9.932 | 0.000 | [-3.048, -2.043] |
| `\log(RV_{t})` | -0.0438 | 0.012 | -3.763 | 0.000 | [-0.067, -0.021] |
| `\log(RV_{t}^{(5)})` | 0.3150 | 0.037 | 8.500 | 0.000 | [0.242, 0.388] |
| `\log(RV_{t}^{(22)})` | 0.6081 | 0.044 | 13.840 | 0.000 | [0.522, 0.694] |

The constant term in the model, `const` (`-2.5458`), is highly significant with a t-statistic of `-9.932` and a p-value of `0.000`. This negative intercept indicates that, in the absence of any influence from daily, weekly, or monthly volatility (i.e., when the log-transformed components are zero), the expected future volatility is relatively low on the log scale. While this value may not have a direct financial interpretation, it serves as the baseline from which the other coefficients operate. The significance of the constant term suggests that there are baseline market dynamics not fully captured by the realized volatility components.

The coefficient for daily volatility (`\log(RV_t)`) is `-0.0438`, which is both statistically significant (`p=0.000`) and negative. This negative relationship implies that as daily volatility increases, future volatility tends to decrease slightly, showcasing a mean-reverting behavior in daily volatility. Mean reversion is a common phenomenon in financial markets, where short-term spikes in volatility are followed by a period of relative calm, as market participants adjust their expectations. Although the magnitude of this effect is small, its significance indicates that recent market turbulence tends to dissipate quickly, rather than influencing volatility in the long term.

Weekly volatility (`\log(RV_5)`) shows a positive and significant coefficient of `0.3150` with a t-statistic of 8.500 and a p-value of 0.000. This suggests that weekly volatility has a much stronger and more persistent effect on future volatility compared to daily volatility. The positive coefficient indicates that an increase in weekly volatility is associated with an increase in future volatility, reflecting that market trends over a medium-term horizon (5 trading days) tend to have a lasting impact on future volatility levels. This finding aligns with financial theory, which posits that intermediate-term trends, such as market corrections or reactions to macroeconomic news, can lead to sustained periods of heightened or diminished volatility.

The most influential predictor in the model is monthly volatility (`\log(RV_{22})`), with a coefficient of 0.6081, which is highly significant (p=0.000) and has a t-statistic of 13.840. This result highlights the importance of long-term volatility in forecasting future market conditions. The large positive coefficient suggests that when volatility is elevated over a longer time horizon (22 trading days, roughly one trading month), it exerts a strong influence on future volatility. This finding is consistent with the view that market participants react more to sustained volatility trends than to short-term fluctuations. Long-term volatility may reflect deeper market or macroeconomic shifts, such as changes in monetary policy, geopolitical events, or prolonged financial uncertainty, which can persistently affect market expectations and future volatility.

In summary, while all three volatility components (daily, weekly, and monthly) significantly contribute to the forecast of future volatility, their effects differ in magnitude and persistence. Daily volatility shows a small mean-reverting behavior, suggesting that short-term market movements are less predictive of future volatility. In contrast, weekly volatility has a more sustained impact, indicating that market conditions over the course of a week provide valuable information about future volatility. Most importantly, monthly volatility exerts the strongest influence, reinforcing the idea that long-term trends are the key drivers of future market uncertainty.
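To make the coefficients concrete, here is a small worked example that plugs the point estimates from the regression table above into the log-transformed model. The three input values are hypothetical and chosen only for illustration; they are not observations from the dataset.

```python
import math

# Point estimates copied from the regression table above.
BETA = {"const": -2.5458, "daily": -0.0438, "weekly": 0.3150, "monthly": 0.6081}

def forecast_log_rv(log_rv_d, log_rv_w, log_rv_m):
    """One-step-ahead forecast of log(RV_{t+1}) from the three log components."""
    return (BETA["const"]
            + BETA["daily"] * log_rv_d
            + BETA["weekly"] * log_rv_w
            + BETA["monthly"] * log_rv_m)

# Hypothetical inputs: log daily, weekly and monthly realized volatility.
log_forecast = forecast_log_rv(-10.0, -9.2, -9.0)
naive_rv = math.exp(log_forecast)   # simple back-transform to the RV scale

print(round(log_forecast, 4))
print(naive_rv)
```

Exponentiating a log forecast tends to understate the expected level of RV (a consequence of Jensen's inequality), so a bias correction may be worth considering if forecasts are needed on the original scale.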

Figure 3

The graph shows that the HAR-RV model does a good job of capturing the overall trends in volatility, particularly during periods of relative market stability. During calmer market periods, the predicted volatility closely follows the actual volatility, indicating that the model accurately predicts future movements based on historical data.

However, during high-volatility events, such as the financial crises, the model underpredicts the extreme peaks in volatility. This suggests that while the HAR-RV model is well-suited for forecasting typical market conditions, it may not be able to fully capture the impact of sudden, extreme market shocks, which are often driven by unpredictable events or external shocks not accounted for in the model.

In summary, Figure 3 highlights that while the HAR-RV model provides a solid framework for forecasting volatility trends, there are limitations in its ability to predict extreme market events. These deviations suggest that additional models or extensions (such as GARCH models or volatility models that account for fat tails) [2][3][4] may be required to capture these sharp market movements more effectively.

Model Diagnostics

| Diagnostic | Value |
|---|---|
| R-squared (R²) | 0.119 |
| Adjusted R-squared (Adj. R²) | 0.118 |
| Durbin-Watson (DW) | 2.004 |
| Omnibus | 1924.645 |
| Prob(Omnibus) | 0.000 |
| Jarque-Bera (JB) | 4542.993 |
| Prob(JB) | 0.000 |
| Skew | -1.259 |
| Kurtosis | 5.534 |

The R-squared value of 0.119 indicates that approximately 11.9% of the variance in future volatility is explained by the daily, weekly, and monthly log-transformed realized volatilities. While this may seem like a modest explanatory power, it is important to note that volatility, especially in financial markets, is inherently stochastic and difficult to predict. In practice, volatility models like the HAR-RV rarely achieve high R-squared values due to the noisy and unpredictable nature of market data. Therefore, an R-squared of 0.119 is reasonable for this type of financial time-series model, especially given that the model is designed to capture medium- to long-term trends rather than short-term market fluctuations.

The Durbin-Watson statistic of 2.004 is very close to 2, suggesting that there is no significant autocorrelation in the residuals of the model. This is a crucial diagnostic for time-series models, as the presence of autocorrelation in the residuals would indicate that the model has not fully captured the underlying dynamics of volatility. The lack of autocorrelation implies that the model residuals are well-behaved and that the HAR-RV model effectively captures the dependence structure of the volatility process across the different time horizons (daily, weekly, and monthly).

The results of the Omnibus test and the Jarque-Bera (JB) test for normality show that the residuals are not normally distributed. Specifically, the Omnibus statistic is 1924.645 with a probability of 0.000, and the Jarque-Bera test has a value of 4542.993, also with a probability of 0.000. This indicates that there is significant skewness and kurtosis in the residuals. The negative skewness of -1.259 indicates a pronounced left tail. With residuals defined as actual minus fitted values, this means the largest errors occur on days when realized log volatility falls well below the model's prediction; the underprediction of crisis peaks seen in Figure 3 corresponds to positive residuals in the right tail. The kurtosis of 5.534 is greater than 3, indicating that the residuals exhibit heavy tails, which is characteristic of financial time series data where extreme market movements (either upward or downward) are more frequent than would be expected under a normal distribution.
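For readers who want to see where these numbers come from, the sketch below computes the Durbin-Watson statistic, skewness, kurtosis, and the Jarque-Bera statistic directly from a residual vector using their textbook formulas. The function name is hypothetical, and the tiny input vector is chosen only so the arithmetic can be checked by hand.

```python
import numpy as np

def residual_diagnostics(resid):
    """Durbin-Watson, skewness, kurtosis and Jarque-Bera from raw residuals."""
    e = np.asarray(resid, dtype=float)
    n = e.size
    dw = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)    # near 2 means no lag-1 autocorrelation
    z = e - e.mean()
    m2 = np.mean(z ** 2)
    skew = np.mean(z ** 3) / m2 ** 1.5
    kurt = np.mean(z ** 4) / m2 ** 2                 # raw kurtosis; a normal has 3
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    return {"durbin_watson": dw, "skew": skew, "kurtosis": kurt, "jarque_bera": jb}

# A toy residual vector that alternates in sign (strong negative autocorrelation).
diag = residual_diagnostics([1.0, -1.0, 1.0, -1.0])
print(diag)
```

As a rough consistency check, plugging the reported skewness (-1.259) and kurtosis (5.534) into the Jarque-Bera formula and solving for the sample size gives roughly 8,500 observations, which is in line with 34 years of daily data.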

While the departure from normality in the residuals suggests that the model does not fully account for all the extreme movements in the market, this is not entirely unexpected for a volatility model applied to financial data. Financial markets are known to exhibit volatility clustering, where periods of extreme volatility tend to be followed by more extreme volatility, a feature that may not be fully captured by the linear HAR-RV model alone. For this reason, more advanced models, such as GARCH or Stochastic Volatility (SV) models, might be considered for further improvements, as they are designed to handle clustering and excess kurtosis in volatility [5][6][7].

Figure 4

Figure 4 shows a QQ (Quantile-Quantile) plot of the residuals from the HAR-RV model, which is used to evaluate whether the residuals (errors) follow a normal distribution. The plot compares the sample quantiles (the quantiles of the model residuals) with the theoretical quantiles (the expected quantiles if the residuals were normally distributed). The red line represents the line where the residuals would lie if they followed a perfect normal distribution.
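The construction of a QQ plot can be sketched without any plotting library: sort the standardized residuals and pair each with the normal quantile at the matching probability. The plotting positions `(i - 0.5) / n` are one common choice among several, and the heavy-tailed sample (Student's t with 3 degrees of freedom) is simulated for illustration, not taken from the model's residuals.

```python
from statistics import NormalDist

import numpy as np

def qq_points(resid):
    """Return (theoretical, sample) quantile pairs for a normal QQ plot."""
    e = np.sort(np.asarray(resid, dtype=float))
    n = e.size
    sample = (e - e.mean()) / e.std(ddof=1)          # standardized sample quantiles
    probs = (np.arange(1, n + 1) - 0.5) / n          # plotting positions in (0, 1)
    theoretical = np.array([NormalDist().inv_cdf(p) for p in probs])
    return theoretical, sample

rng = np.random.default_rng(7)
theo, samp = qq_points(rng.standard_t(df=3, size=2000))   # simulated heavy tails

# Under normality the pairs lie near the 45-degree line; compare the extremes.
print("lowest pair: ", theo[0], samp[0])
print("highest pair:", theo[-1], samp[-1])
```

Plotting `samp` against `theo` and adding the 45-degree line gives the usual picture, with heavy tails showing up as points that bend away from the line at both ends.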

In an ideal model, the residuals should closely follow the red line, indicating that the errors are normally distributed and that the model's assumptions hold. However, as the plot shows, the residuals deviate significantly from the red line, particularly in the tails. This deviation suggests that the residuals are not normally distributed, especially in the extremes, where the residuals are either much larger or much smaller than expected under a normal distribution.

This departure from normality is common in financial time series data, where extreme events (such as financial crises) lead to heavy tails or skewness in the residuals. This observation aligns with the earlier findings from the Jarque-Bera test and kurtosis, which showed significant skewness and kurtosis in the residuals. Such heavy tails imply that the model does not fully capture the extreme movements in volatility.

While the log-transformed HAR-RV model captures a significant portion of the medium- and long-term trends in realized volatility, the model diagnostics suggest that there are additional complexities in the volatility process that may require more sophisticated modeling techniques. The presence of skewness and kurtosis in the residuals is typical in financial time series, and this finding reinforces the importance of considering advanced models for capturing tail events and volatility clustering. 

In conclusion, the current model provides a solid foundation for understanding how daily, weekly, and monthly volatilities impact future volatility, particularly in normal market conditions. Also, the QQ plot confirms that the residuals of the HAR-RV model are not normally distributed, especially in the tails. This suggests that while the model captures the general trends in volatility, it struggles to account for extreme events, and more sophisticated models (such as GARCH or models that incorporate fat tails) may be necessary to better capture these deviations [8][9][10].

Rolling Window Analysis

This section presents a rolling window regression analysis, which shows how the model's coefficients evolve over time. Rolling window regressions re-estimate the model on successive subsamples of fixed length, which helps to reveal the time-varying relationships between the volatility components and future volatility. This is especially useful for understanding how volatility behaves under different market conditions (e.g., crises vs. stable periods).
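A rolling window regression can be sketched as a loop that refits OLS on each block of consecutive observations. The window length used in this article is not stated, so the value of 100 below is an arbitrary assumption, as are the function name and the simulated data, in which the true slope jumps from 0.2 to 0.8 halfway through the sample.

```python
import numpy as np

def rolling_ols(X, y, window):
    """Refit OLS on every block of `window` consecutive rows.

    Row i of the result holds the coefficients estimated on rows i .. i+window-1.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_fits = len(y) - window + 1
    betas = np.empty((n_fits, X.shape[1]))
    for i in range(n_fits):
        betas[i] = np.linalg.lstsq(X[i:i + window], y[i:i + window], rcond=None)[0]
    return betas

# Simulated data with a structural break in the slope at the midpoint.
rng = np.random.default_rng(42)
n, window = 600, 100
x = rng.normal(size=n)
slope = np.where(np.arange(n) < n // 2, 0.2, 0.8)
y = 1.0 + slope * x + 0.1 * rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

betas = rolling_ols(X, y, window)
print("first window:", betas[0])
print("last window: ", betas[-1])
```

Shorter windows react faster to regime changes but give noisier coefficient paths, so the window length is a trade-off worth reporting alongside the results.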

The rolling window results are presented in Figure 5. The graph shows the evolution of the coefficients for the constant term, daily volatility (`\log(RV_1)`), weekly volatility (`\log(RV_5)`) and monthly volatility (`\log(RV_{22})`) over the analysis period from 1990 to 2023.

Figure 5

The coefficient for daily volatility (`\log(RV_1)`) remains relatively stable around zero throughout most of the analysis period, suggesting that daily volatility typically has a minimal influence on future volatility compared to weekly and monthly volatilities. However, during periods of extreme market stress, such as the Global Financial Crisis (2007–2008) and the COVID-19 pandemic (2020), the daily volatility coefficient becomes more volatile, indicating that market participants are more reactive to short-term fluctuations during periods of heightened uncertainty.

The coefficient for weekly volatility (`\log(RV_5)`) is consistently positive over time, reflecting its sustained influence on future volatility. Weekly volatility acts as a strong predictor of future market uncertainty, especially during periods of relatively stable markets. This is because weekly trends are more likely to capture intermediate-term market movements, such as reactions to economic data releases or corporate earnings reports. The relatively consistent positive coefficient demonstrates that weekly volatility contributes significantly to forecasting future volatility across both normal and turbulent market conditions.

The monthly volatility coefficient (`\log(RV_{22})`) exhibits the most significant and consistent positive values across the analysis period, emphasizing its dominant role in predicting future volatility. The strong influence of monthly volatility is particularly evident during periods of prolonged market uncertainty, such as the Dot-com Bubble (1999–2000) and the Global Financial Crisis. The relatively high and stable positive coefficients indicate that long-term volatility trends, reflecting broader economic and market conditions, are key drivers of future volatility expectations.

The constant term shows considerable variation over time, particularly during crises, such as the Global Financial Crisis and the COVID-19 pandemic. During these periods, the constant term becomes increasingly negative, suggesting that external shocks not captured by the volatility components may be affecting the market. This variation implies that the baseline market volatility tends to drop significantly during crises, possibly because extreme volatility from daily or weekly fluctuations dominates the predictive framework.

Interpretation of Time-Varying Relationships

The rolling window analysis highlights the dynamic nature of volatility prediction and suggests that the relationship between volatility components and future market conditions is not static. Instead, it evolves with changing market dynamics. During crisis periods, such as the Global Financial Crisis and the COVID-19 pandemic, the model shows that daily volatility becomes a more significant predictor of future volatility. This implies that during periods of high uncertainty, market participants respond more strongly to short-term price fluctuations, as immediate market reactions tend to dominate.

In contrast, during normal market periods, the weekly and monthly volatility components are more stable and contribute more consistently to predicting future volatility. This indicates that, under more stable conditions, market participants take a broader view, focusing more on intermediate and long-term trends to inform their expectations. The dominance of monthly volatility throughout most of the period reinforces the idea that long-term market conditions, such as changes in macroeconomic policies or geopolitical events, play a crucial role in shaping future volatility. As such, traders and analysts should pay close attention to these longer-term trends when forecasting market risk.

Conclusion

This article applied the HAR-RV model to the S&P 500 index to forecast volatility over different time horizons—daily, weekly, and monthly—using data from 1990 to 2023. Through this process, we demonstrated how the HAR-RV model effectively captures the medium- and long-term trends in realized volatility, while also highlighting its limitations in handling extreme market events.

The results showed that monthly volatility (`\log(RV_{22})`) is the most significant predictor of future volatility, as indicated by its large positive coefficient in the model. This reinforces the idea that long-term market trends have a stronger impact on future volatility than short-term fluctuations. Weekly volatility (`\log(RV_5)`) also contributed positively to forecasting future volatility, albeit to a lesser extent than monthly volatility. On the other hand, daily volatility (`\log(RV_1)`) exhibited a mean-reverting behavior, suggesting that short-term volatility has a temporary influence that diminishes over time.

The rolling window analysis further illustrated that the importance of these volatility horizons is not static; instead, it evolves with changing market conditions. During periods of market turmoil, such as the Global Financial Crisis and the COVID-19 pandemic, the significance of daily volatility spikes, suggesting that market participants react more strongly to short-term movements in uncertain times. Conversely, in more stable market periods, long-term volatility trends dominate the predictive landscape.

Despite the model's ability to capture broad volatility patterns, its residual analysis revealed key limitations. Both the QQ plot of residuals and the Jarque-Bera test indicated that the model's residuals are not normally distributed, particularly in the tails. This lack of normality, as evidenced by heavy tails and skewness, reflects the model's difficulty in accounting for extreme market events, which often result in sharp volatility spikes that the HAR-RV model underpredicts. This is also evident in the actual vs. predicted volatility comparison, where the model performs well during normal periods but struggles during high-volatility periods, such as financial crises.

For market participants, understanding volatility is crucial for risk management, asset pricing, and portfolio optimization. This article underscores the importance of considering multiple volatility horizons when forecasting future volatility. Long-term trends provide the most reliable guidance, but short-term fluctuations become increasingly relevant during times of market uncertainty. Practitioners should be aware of the model’s strengths in stable periods but also recognize its limitations in extreme environments.

References

  1. Corsi, F. (2009). "A Simple Approximate Long-Memory Model of Realized Volatility."
  2. Engle, R. F. (1982). "Autoregressive Conditional Heteroscedasticity with Estimates of the Variance of UK Inflation."
  3. Bollerslev, T. (1986). "Generalized Autoregressive Conditional Heteroskedasticity."
  4. Bollerslev, T., Engle, R. F., & Nelson, D. B. (1994). "ARCH Models."
  5. Diebold, F. X., & Nerlove, M. (1989). "The Dynamics of Exchange Rate Volatility: A Multivariate Latent Factor ARCH Model."
  6. Andersen, T. G., Bollerslev, T., Diebold, F. X., & Labys, P. (2003). "Modeling and Forecasting Realized Volatility."
  7. Taylor, S. J. (1986). "Modeling Financial Time Series."
  8. Hansen, P. R., & Lunde, A. (2005). "A Forecast Comparison of Volatility Models: Does Anything Beat a GARCH(1,1)?"
  9. Nelson, D. B. (1991). "Conditional Heteroskedasticity in Asset Returns: A New Approach."
  10. Hamilton, J. D. (1989). "A New Approach to the Economic Analysis of Nonstationary Time Series and the Business Cycle."

Thank you for visiting my blog! I am Stefanos Stavrianos, a PhD Candidate in Computational Finance at the University of Patras. I hold an Integrated Master’s degree in Agricultural Economics from the Agricultural University of Athens and have specializations in Quantitative Finance from the National Research University of Moscow, Python 3 Programming from the University of Michigan, and Econometrics from Queen Mary University of London. My academic interests encompass economic theory, quantitative finance, risk management, data analysis and econometrics.
