Market Models with Gaussian Innovations

Today I start a new series of posts on models whose innovations are not only independent and identically distributed (IID) but also Gaussian. Such models are easy to analyze and simulate, since an IID Gaussian series is fully described by its mean and variance.

Model building goals. We judge a model by its innovations, asking three questions in order.

  • Is each series of innovations a (weak) white noise, as measured by the empirical autocorrelation function (ACF) and the standard Ljung-Box white noise test? If yes, the autocorrelations are zero. But this does not yet mean the series is truly IID. Stochastic volatility models show why: their innovations are uncorrelated, yet their magnitudes are dependent.
  • Do the absolute values of each series of innovations have zero autocorrelations? We again apply the empirical ACF and the Ljung-Box white noise test. If yes, and the first answer above is also yes, then it is reasonable to model this series as IID.
  • Is each series of innovations Gaussian? We can ask this question only if we answered the first two affirmatively. We answer it by making a quantile-quantile plot versus the normal distribution and applying the Jarque-Bera normality test.
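The three checks above can be sketched in plain Python. This is a minimal illustration, not the code behind this post: the helper names, the choice of 4 lags, and the simulated test series are assumptions made for the example. In practice one would more likely call `acorr_ljungbox` from statsmodels and `jarque_bera` from scipy.stats.

```python
import math
import random


def autocorr(x, lag):
    """Empirical autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return num / denom


def chi2_sf_even(stat, df):
    """Chi-square survival function, closed form valid for even df only."""
    half = stat / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def ljung_box(x, lags=4):
    """Ljung-Box Q statistic and p-value (lags must be even in this sketch)."""
    if lags % 2 != 0:
        raise ValueError("this sketch supports an even number of lags only")
    n = len(x)
    q = n * (n + 2) * sum(autocorr(x, k) ** 2 / (n - k) for k in range(1, lags + 1))
    return q, chi2_sf_even(q, lags)


def jarque_bera(x):
    """Jarque-Bera statistic and p-value (chi-square with 2 degrees of freedom)."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    return jb, math.exp(-jb / 2.0)


# Simulated stand-in for one innovation series (not the real data).
rng = random.Random(0)
z = [rng.gauss(0.0, 1.0) for _ in range(98)]

q_raw, p_raw = ljung_box(z)                    # question 1: white noise?
q_abs, p_abs = ljung_box([abs(v) for v in z])  # question 2: absolute values
jb_stat, p_jb = jarque_bera(z)                 # question 3: Gaussian?
print(p_raw, p_abs, p_jb)
```

A large p-value in each of the three tests means we fail to reject the corresponding hypothesis, which is the outcome we hope for here.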

As an attentive reader can see, the techniques are essentially the same as in previous posts. But there are a couple of important differences.

  • First, we apply the Ljung-Box white noise tests based on the (weighted) sum of squares, not the customized tests based on the sum of absolute values which we considered previously. The Ljung-Box tests are simpler to apply and better known, and the test based on the L1 norm did not show anything substantially different from the L2 tests.
  • Second, we do not apply the Shapiro-Wilk normality test. We consider it to be a bit of an overkill. The Jarque-Bera test captures skewness and fat tails, the features commonly present in financial data which prevent it from being normal. And anyway, the Jarque-Bera test is present in the standard Python OLS regression output.

Let us stress what features we are not interested in.

  • Maximizing R^2 of linear regressions.
  • Information criteria (Akaike/Bayesian).
  • Minimizing standard error of regressions.

Data description. We have annual data 1927-2025. It is available as a spreadsheet here. Take five data series:

  • Total returns of the S&P 500 and its predecessor, the S&P 90: Q_1(t) during year t, available 1928-2025. End-of-year values were taken by Ian Anderson from Yahoo Finance, and annual dividend data is taken from Robert Shiller’s data library.
  • Total returns for international stocks (see the note below): Q_2(t), available 1970-2025.
  • Total return index value B(t) for USA corporate bonds (measured by the Bank of America Intercontinental Exchange total return index value, taken from the Federal Reserve Economic Data web site), available 1972-2025.
  • Annual realized volatility V(t) for year t. Recall that this was computed by Angel Piotrowski for 1928-2025.
  • December daily average BAA Moody’s rate: R(t), available 1927-2025.

A note on international stocks. The total returns of international stocks are now measured by a customized portfolio of 88% MSCI EAFE and 12% MSCI Canada. I made this adjustment because the data for MSCI EAFE (developed markets in Europe, Australasia and the Far East) was available from 1970 on the web site Novel Investor, as opposed to MSCI emerging markets (available only from 1988), but this EAFE index did not include Canada! I thought this was very unfair, since Canada is a major component (~12%) of existing developed market stock ETFs. So I decided to include it manually. The data for Canada was also available as another MSCI index from 1970.
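As a small illustration of this blend, here is one way to combine the two annual returns. It assumes simple (not logarithmic) returns and rebalancing back to 88/12 every year, neither of which is stated above; the function name and the sample numbers are made up.

```python
def blended_return(eafe, canada, w_canada=0.12):
    """Annual return of a portfolio rebalanced to (1 - w_canada) EAFE, w_canada Canada."""
    return (1.0 - w_canada) * eafe + w_canada * canada


# Hypothetical year: EAFE returns 10%, Canada returns 20%.
example = blended_return(0.10, 0.20)
print(round(example, 4))
```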

Model equations. Try modeling Q_k(t)/V(t) = c_k + Z_k(t) for k = 1, 2, where Z_k are innovation series. This is in line with our long-standing idea of dividing stock returns by volatility to make them closer to IID Gaussian. It works well here.
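Since this equation has only an intercept, the least-squares estimate of c_k is simply the sample mean of the normalized returns, and the innovations are the deviations from it. A minimal sketch, with a hypothetical function name and made-up numbers in place of the real data:

```python
from statistics import mean, stdev


def fit_normalized_returns(q, v):
    """Estimate c in Q(t)/V(t) = c + Z(t); return c, the innovation stdev, and Z."""
    ratio = [qi / vi for qi, vi in zip(q, v)]
    c = mean(ratio)
    z = [r - c for r in ratio]
    return c, stdev(z), z


# Made-up annual returns and volatilities, for illustration only.
q = [0.12, -0.05, 0.20, 0.08]
v = [0.10, 0.25, 0.12, 0.15]
c_hat, sigma_hat, z_hat = fit_normalized_returns(q, v)
print(c_hat, sigma_hat)
```

The series z_hat is what goes into the white noise and normality tests.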

Also, (\ln R(t) - \ln R(t-1))/V(t) = Z_R(t). Note that this makes the rates non-stationary: they behave like a geometric random walk, except that we have stochastic volatility here. This is one more remarkable example of how to use stock volatility for bonds, which we discussed earlier.
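Solving the rate equation for R(t) gives R(t) = R(t-1) exp(V(t) Z_R(t)), which is easy to simulate once a volatility path is given. The sketch below assumes a hypothetical innovation standard deviation sigma_r and a made-up constant volatility path; neither comes from the fitted model.

```python
import math
import random


def simulate_rate(r0, vol_path, sigma_r, rng):
    """Simulate R(t) = R(t-1) * exp(V(t) * Z_R(t)) with Gaussian Z_R."""
    rates = [r0]
    for v in vol_path:
        z = rng.gauss(0.0, sigma_r)
        rates.append(rates[-1] * math.exp(v * z))
    return rates


# Illustrative inputs: start at 5%, ten years of volatility 0.15.
path = simulate_rate(5.0, [0.15] * 10, 1.0, random.Random(1))
print(path)
```

Because the recursion is multiplicative, simulated rates always stay positive.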

Next, the equation for volatility stays the same as in Angel Piotrowski’s analysis: \ln V(t) = \alpha + \beta \ln V(t-1) + Z_V(t).
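This autoregression for log volatility can be fitted by ordinary least squares of \ln V(t) on \ln V(t-1). Here is a minimal sketch with a hypothetical function name and a made-up input series. For |\beta| < 1 the log volatility is stationary with long-run mean \alpha/(1 - \beta).

```python
import math


def fit_log_ar1(v):
    """OLS fit of ln V(t) = alpha + beta * ln V(t-1) + Z(t); returns alpha, beta, residuals."""
    y = [math.log(x) for x in v[1:]]
    x = [math.log(x) for x in v[:-1]]
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    beta = sxy / sxx
    alpha = my - beta * mx
    resid = [b - alpha - beta * a for a, b in zip(x, y)]
    return alpha, beta, resid


# Made-up volatility series, for illustration only.
vol = [0.12, 0.18, 0.15, 0.30, 0.22, 0.14, 0.16]
alpha_hat, beta_hat, z_v = fit_log_ar1(vol)
print(alpha_hat, beta_hat)
```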

Finally, the equation for bond returns is as follows: \ln(B(t)/B(t-1) - 0.01R(t-1)) = -a - d(R(t) - R(t-1)) + V(t)Z_0(t).

Each Z is an IID Gaussian series with mean zero. This is confirmed by the tests and graphs described above.
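To see how the pieces fit together, here is a sketch that simulates volatility from its autoregression and then a stock return from Q(t) = V(t)(c + Z(t)). All parameter values are made up, and the two innovation series are drawn independently of each other, which is an assumption of this example rather than a claim of the model.

```python
import math
import random


def simulate_stock_returns(v0, alpha, beta, sigma_v, c, sigma_z, years, rng):
    """Simulate log-AR(1) volatility and returns Q(t) = V(t) * (c + Z(t))."""
    ln_v = math.log(v0)
    vols, returns = [], []
    for _ in range(years):
        ln_v = alpha + beta * ln_v + rng.gauss(0.0, sigma_v)
        v = math.exp(ln_v)
        vols.append(v)
        returns.append(v * (c + rng.gauss(0.0, sigma_z)))
    return vols, returns


# Hypothetical parameters, not fitted values.
vols, rets = simulate_stock_returns(
    v0=0.15, alpha=-0.4, beta=0.8, sigma_v=0.3,
    c=0.5, sigma_z=1.0, years=30, rng=random.Random(2),
)
print(rets[:5])
```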

Properties of this model. As mentioned above, rates and therefore bond returns are non-stationary, while stock returns and volatility are stationary. Also, the stock return equations contain no duration term, that is, no term with the rate change R(t) - R(t-1) as in the bond equation. In fact, if we include this term, it is highly significant, with an extremely high t-value. But we decided to keep the model as simple as possible.
