Category: Uncategorized

Bond Returns 1973-2025
1. Introduction and data description
2. Relation between price returns and wealth process
3. Fitting linear regression
4. Conclusion
1. Introduction and data description. Continue the research after updating the data for 2025. In the previous post, we discussed total returns for S&P 500 in detail. Now we discuss bond returns. We take the same data as above, and for bond wealth process $B$ , we take FRED series BAMLCC0A0CMTRIV, last trading day of year 1972-2025. We start from $B(0) = 100$ for the last trading day of 1972, and go on from there. This way we can compute log price returns for year $t.$

2. Relation between price returns and wealth process. We derive log price returns from this wealth process using bond math. Then we run linear regression of this log price returns versus rate change. This regression coefficient is called the duration. In continuous setting, this is defined as minus derivative of the log price with respect to the interest rate. More precisely, we use yield to maturity as this interest rate.

Wealth at end of year $t-1$ is $B(t-1)$ and at end of year $t$ is $B(t).$ Assuming the price of this bond at end of year $t-1$ is $P(t-1) = B(t-1).$ Then the coupon paid during year $t$ is $B(t-1)R(t-1).$ The wealth at end of year $t$ is the sum of this coupon and the price $P(t)$ at end of year $t.$ Combining this, we get: $P(t) + B(t-1)R(t-1) = B(t).$

We have $P(t)/P(t-1) = B(t)/B(t-1) - R(t-1).$ This gives us the formula for log price returns:

$Q(t) = \ln(P(t)/P(t-1)) = \ln(B(t)/B(t-1) - R(t-1)).$

Since $B(t-1), B(t), R(t-1)$ are observed, we can compute $Q(t).$ If we fit this model, we can rewrite

$\ln(B(t)/B(t-1)) = \ln(\exp(Q(t)) + R(t-1)).$

3. Fitting linear regression. As mentioned above, the log price returns $Q(t)$ can be regressed upon $-(R(t) - R(t-1)).$ We do not include an intercept into this regression, since this has the meaning of a discrete approximation of a derivative. The regression coefficient is 6.1053, and the residual analysis is done by the plots below. These are Gaussian but not quite independent identically distributed. This is confirmed by the p-values of Shapiro-Wilk and Jarque-Bera normality tests: $92\%$ and $75\%.$ And by the Ljung-Box test for 5 lags for original and absolute values of residuals, which give us $2.8\%$ and $4.3\%.$

To make residuals independent identically distributed, we divide each term by annual volatility $V(t).$ This is equivalent to assuming that the residuals are heteroscedastic: These are white noise multiplied by $V(t).$ This gives us 5.96 regression coefficient, and $p = 86\%$ for Shapiro-Wilk normality test, $p =91\%$ for Jarque-Bera normality test. Also, the Ljung-Box test for the first 5 lags of original residuals has $p = 12\%$ and for absolute residuals has $p = 57\%.$ Finally, see the graphs for residuals below.

4. Conclusion. We have $\ln(B(t)/B(t-1) - R(t-1)) = -d(R(t) - R(t-1)) + Z(t)V(t)$ where $Z$ are independent identically distributed Gaussian. We succeeded in modeling the bond market! Duration in both cases is around $d = 6.$ What is more, this is better than previous models:
- We clearly explain the meaning of duration as dependence measure of the price upon yield to maturity, and not rely on approximate formulas, such as in this blog post
- We used volatility in our model, but stock volatility used for bond returns is highly unusual
- Residuals are independent identically distributed and Gaussian
Together with the two previous blog posts, we can now create a complete model of:
- BAA rates
- Dividends
- The volatility
- The valuation measure
- Total stock returns
- Total bond returns
These are 6 time series with 5 innovation sequences. Thus we can use it to create a simpler version of the financial simulator.
February 2, 2026
New Valuation Measure based on Dividends
1. Motivation of the new valuation measure
2. Fit autoregression with linear trend as before
3. Use this valuation measure for modeling returns
4. Include bond rates and duration
5. Conclusion
1. Motivation of the new valuation measure. We continue the previous blog post. We replicate the valuation measure here. We use updated data for 2025. Previously we did this with 10-year earnings but now we wish to do this with 1-year dividends.

We prefer dividends to earnings for the following reasons:
- Dividends are the actual cash paid, and they are not disputable, but earnings depend on accounting standards
- Dividends are more predictable, since companies do not like to cut them, but earnings are highly volatile
- Earnings of companies can be negative, and thus suffer from the aggregation bias, but dividends are nonnegative
2. Fit autoregression with linear trend as before: Take the index level $S(t)$ at end of year $t$ and dividends $D(t)$ paid at year $t.$ Total returns and dividend growth are given by $Q(t) = \ln(S(t)+D(t)) - \ln (S(t-1))$ and $G(t) = \ln D(t) - \ln D(t-1).$

We model the cumulative difference $C(t) = Q(1) + \ldots + Q(t) - G(1) - \ldots - G(t)$ as a simple autoregression of order 1 with trend: $C(t) - ct = a + b(C(t-1) - c(t-1)) + Z(t)$ where $Z$ are innovations. The valuation measure then is defined as $H(t) = C(t) - ct.$

This can be written as $Q(t) - G(t) = \alpha + \beta t - \gamma C(t-1) + Z(t).$ We fit $\alpha = 0.0436, \beta = 0.0121, \gamma = 0.2443.$ The autoregression becomes the random walk (there is no mean-reversion) if $\gamma = 0$ but this hypothesis has $p = 0.045\%$ which is very low. Next, the trend coefficient is zero if $\beta = 0$ which has $p = 0.06\%.$

From here, we can deduce $a, b, c,$ and compute the valuation measure $H(t) = C(t) - ct.$ The measure, as before, shows us that the market is not overvalued, since it is average compared to the historical standard.

Analysis of residuals: See the autocorrelation function plots for $Z$ and for $|Z|$ as well as the quantile-quantile plot for $Z.$ The Shapiro-Wilk and the Jarque-Bera test give us $p = 29\%$ and $25\%.$

We can approximately assume that residuals are independent identically distributed Gaussian, although the autocorrelation function for lag 1 for the absolute values of innovations raises questions.

3. Use this valuation measure for modeling returns. We can model total stock returns $Q(t)$ with dividends.

Model 1. Since we know how to model dividend growth from the previous blog post, together with annual volatility, we can simply model stock returns using three time series:
- the new valuation measure $H$ as autoregression
- volatility $V$ as another autoregression on the log scale
- normalized dividend growth $F(t) = \ln(D(t)/D(t-1))/V(t)$ as yet another autoregression
Model 2. However, we can also regress $Q(t)$ upon $H(t-1)$ as follows:

$Q(t) = h - kH(t-1) + W(t).$

We get $h = 0.0933, k = 0.131.$ Also the p-value for hypothesis $k = 0$ is $p = 3.6\%.$ The plots for residuals $W$ are below. This is independent identically distributed but not normal. Same is confirmed by the two normality tests, which give us extremely low p-values.

This model uses four time series, but with only three series of innovations:
- returns $Q$ regressed upon last year’s new valuation measure $H(t-1)$
- the new valuation measure $H$ as the detrended difference of total returns and dividend growth
- volatility $V$ as another autoregression on the log scale
- normalized dividend growth $\ln(D(t)/D(t-1))/V(t)$ as yet another autoregression
The second time series is without new innovations: Indeed, we simply write $H(t) = Q(t) - G(t) - c + H(t-1)$ from the definition of the new valuation measure; and this does not have any new innovations. We modeled $Q$ and $G$ separately.

Model 3. Let us modify Model 2 to include division by volatility: We divide by $V$ both returns $Q$ and the right-hand side.

$Q(t)/V(t) = l/V(t) + m - kH(t-1)/V(t) + W(t).$

We get $l = 0.2468, m = -0.0147, k = 0.1576.$ The p-values are all $0.1\%$ or less. The normality tests for innovations $W$ show p-values above 90% and this is confirmed by the plot below. The values of $W$ can be modeled as independent identically distributed Gaussian, therefore; see the three plots below.

This model also uses four time series but with three series of innovations, as in Model 2.

4. Include bond rates and duration. Following the previous blog post, we include rate change $R(t) - R(t-1)$ in our time series models. Here $R(t)$ is the BAA rate, December daily average for year $t.$

Model 1. Try to include this rate change as a factor in dividend growth model $F.$ The two other time series: the valuation measure $H$ and the volatility $V$ do not need rate change as the factor. We get:

$F(t) - F(t-1) = a - bF(t-1) - c(R(t) - R(t-1)) + U(t).$

But we run into problems: The coefficient $c$ is not significantly different from zero, with $p = 70\%$ and the autocorrelation function and quantile-quantile plots for residuals $U$ shows this is not independent identically distributed and not Gaussian, see below.

Similar results are if $R(t) - R(t-1)$ is divided by $V(t).$ Thus we abandon this idea of including duration (dependence upon rate change) in normalized log dividend growth.

Finally, try to include $R(t-1)$ instead of $R(t) - R(t-1).$ This means using rate itself instead of rate change as a factor. Or normalize this rate by volatility: $R(t-1)/V(t).$ In each case, still we have these plots as above for regression residuals.

Conclusion: We failed to model normalized dividend growth using rate or rate change for BAA bonds.

Model 2. Include duration in the regression for total returns, together with the valuation measure:

$Q(t) = k - hH(t-1) - d(R(t) - R(t-1)) + W(t).$

We get $k = 0.0945, h = 0.0960, d = 0.0834$ with p-values 8.6% for valuation coefficient zero and less than 0.1% for intercept and duration. Also, the residuals are Gaussian, with Shapiro-Wilk and Jarque-Bera normality tests giving us $p = 16\%$ and $p =18\%.$ But not independent identically distributed. See the three graphs below.

Conclusion: We failed to include duration in total returns modeling without normalizing by volatility.

Model 3. Include duration in the regression for total returns, together with the valuation measure:

$Q(t)/V(t) = l/V(t) + m - hH(t-1)/V(t) - d(R(t) - R(t-1))/V(t) + W(t).$

We get a much better fit than without the duration or in Model 2: $l = 0.2296, m = -0.0129, h = 0.1303, d = 0.0553$ with p-values 0.4% for valuation coefficient zero and 0.1% or less for others. Also, the residuals are Gaussian, with Shapiro-Wilk and Jarque-Bera normality tests giving us $p = 50\%$ and $p =77\%.$ Finally, looking at autocorrelation function plots for $W$ and for $|W|$ we see that residuals are independent identically distributed Gaussian.

Conclusion: Here we succeeded in including the duration as a factor for regression modeling of total returns after normalizing.

5. Conclusion: We can reasonably model the new valuation measure using one-year dividends, not trailing ten- or five-year earnings, as in previous articles or blog posts. This might be better, since in previous models we used both dividends and earnings, but here we use only dividends. It is useful to include rate change as a factor in a regression for total returns, but only after normalizing, and not for normalized dividend growth. This updates our blog post. In the next post, we consider total corporate bond returns modeling using bond rates.
January 30, 2026
Updates for 2025
Dear readers, after a long break, I am back. I updated the annual volatility and other data for S&P 500 for the year 2025. The data are available here.
1. Data Updates
2. New Graphs
3. Total Returns
4. Volatility Autoregression
5. Price Returns
6. BAA Bond Rates
7. Dividend Growth
8. Conclusion
1. Data Updates. Annual volatility is computed as the empirical standard deviation of daily log changes multiplied by 1000 (for normalizing). The end-of-year price for S&P 500 in 2025 is also updated. We also add S&P 500 dividends for 2025. Now we have data on volatility for 1928-2025, on dividends for 1927-2025, and end-of-year level of S&P 500 for 1927-2025 too.

We added the dividend data for 1927 as well, to increase the number of data points. This is fine, since S&P 90 (a predecessor for S&P 500) was created in 1926, and the data is taken from Robert Shiller’s data library.

The volatility for 2025 is 11.77. This is higher than the long-term average 10.51, or the 2024 volatility, which is 7.98. See the original post with computations of Angel Piotrowski for 1928-2023 and its previous update for 2024.

Dividends for 2025 are 78.92, which is significantly higher than dividends for 2024, which are 74.83.

The S&P 500 increased a lot in 2025: End-of-year 2024 level is 5881.63, but end-of-year 2025 level is 6845.5.

We could not yet provide earnings for 2025, since we have earnings for 2025 Quarter 4 reported only on 2026 Quarter 1, which is still ongoing. We will provide them as soon as we can.

Finally, we added the BAA rate: December 2025 daily average. The BAA are lowest-rated investment-grade corporate bonds. The rate in December 2025 is 5.9, slightly higher than 5.8 for December 2024.

2. New Graphs. We graph the index, dividend, rates, and volatility.

Above, logarithmic plots of index levels and dividends for 1927-2025. Below, the annual volatility and December BAA rate.

The data are published on my web page: We created a new tab named Financial Data Library on my web page. Let us now apply

Let us replicate this post: Make stock returns IID Gaussian.

We have the following notation:
- $S(t)$ the S&P level at end-of-year $t.$
- $D(t)$ the dividend of S&P in year $t.$
- $R(t)$ December daily average BAA rate during year $t.$
- $V(t)$ annual realized volatility for the S&P for year $t.$
3. Total Returns. We continue this blog post. Compute total nominal geometric returns for the S&P 500: $Q(t) = \ln(S(t) + D(t)) - \ln S(t-1)$ for year $t.$ Below is the graph of returns 1928-2025.

Now plot the autocorrelation function for these total returns $Q.$ And another autocorrelation function for their absolute values $|Q|.$ Both plots are below, and both are consistent with the white noise hypothesis. It is surprising that we, in fact, do not have to divide total returns by annual volatility to make it white noise.

The quantile-quantile plot of these returns is shown as well. We see that the returns are not Gaussian. This is consistent with the normality testing. Shapiro-Wilk and Jarque-Bera tests give us $p = 0.02\%$ and $p = 3\cdot 10^{-5}.$

What if we do divide these total returns by annual volatility? We get $N(t) = Q(t)/V(t).$ Let us plot the autocorrelation function for $N$ and the autocorrelation function for $|N|.$

These are still consistent with white noise, although, in my view, the autocorrelation function values are greater. But the quantile-quantile plot versus the normal distribution is below. We get $p = 12\%$ and $p = 71\%$ for Shapiro-Wilk and Jarque-Bera normality tests.

4. Volatility Autoregression. We continue this blog post. Let us now fit the auto-regression model for logarithm of volatility:

$\ln V(t) - ln V(t-1) = \alpha + \beta \ln V(t-1) + W(t).$

We fit $\beta = -0.3824$ and $\alpha = 0.8569.$ Also, plotting the autocorrelation function of $W$ and of $|W|$ we see:

This is consistent with the assumption that $W(t)$ are independent identically distributed. But it is more ambiguous to assume they are Gaussian, see the quantile-quantile plot below. The Shapiro-Wilk and Jarque-Bera tests give us $p = 1.1\%$ and $p = 7.5\%$ respectively.

5. Price Returns. These are computed as $Q(t) = \ln S(t) - \ln S(t-1).$ We continue this blog post. These contain only price changes, not dividends. The autocorrelation function for these values and their absolute values is plotted below.

Quite close to independent identically distributed! Next, the quantile-quantile plot versus the Gaussian distribution: This shows price returns are not Gaussian, similarly to total returns. This is confirmed by familiar Shapiro-Wilk and Jarque-Bera tests $p = 7\cdot 10^{-5}$ and $p = 1.4\cdot 10^{-6}.$

Let us divide price returns by volatility. Below we plot the autocorrelation function of $Q/V$ and of $|Q/V|$ and see that this is still consistent with being independent identically distributed. Only these values are slightly higher.

The Shapiro-Wilk and Jarque-Bera tests give us $p = 9\%$ and $p = 75\%$ respectively. See also the quantile-quantile plot. This is much closer to normal distribution.

Finally, let us plot price and total returns together. We see that, of course, total returns are greater than price returns.

6. BAA Bond Rates. Continue this blog. We also fit a simple autoregression:

$R(t) - R(t-1) = a + bR(t-1) + Z(t).$

We get: $a = 0.43$ and $b = -0.062.$ But the p-value for the null hypothesis when we have a random walk is $p = 8\%$ so we fail to reject the random walk hypothesis. This is not acceptable from the financial point of view, since the random walk implies $R$ will go negative eventually. Also, consider the graphs of autocorrelation function for $Z$ and for $|Z|.$ These are not independent identically distributed.

Both p-values for normality tests of innovations $Z$ are less than 0.01%. The quantile-quantile plot is shown below for $Z.$ It is clear these are not Gaussian.

Instead, like for the volatility, let us take the logarithm:

$\ln R(t) - \ln R(t-1) = a + b\ln R(t-1) + Z(t).$

We get $a = 0.11$ and $b = -0.058$ with $p = 9.2\%$ for the null hypothesis of random walk, which corresponds to $b = 0.$ Plot the autocorrelation function for $Z$ and for $|Z|$ below:

Let us modify this to try a random walk model: $L(t) = \ln R(t) - \ln R(t-1)$ are they really independent identically distributed Gaussian? Below are the autocorrelation function plots for $L$ and for $|L|$ which show that these are independent identically distributed.

Next, the quantile-quantile plot versus the normal distribution is much closer to the straight line than before for other models of the BAA rate. This is confirmed that the Shapiro-Wilk and Jarque-Bera tests give us $p = 0.4\%$ and $p = 10^{-5}$ which rejects the null hypothesis but are not as small as the previous ones.

Next, try to make these independent identically distributed but non-Gaussian terms Gaussian. We do the same as in sections 1 and 3: Divide the log rate change by volatility. We get $N(t) = L(t)/V(t) = \ln(R(t)/R(t-1))/V(t).$ Below are autocorrelation function plots for $N$ and for $|N|.$

The quantile-quantile plot below shows these are Gaussian terms, and the same is shown by the Shapiro-Wilk and Jarque-Bera tests with $p = 46\%$ and $p = 91\%.$ This was done in the spirit of this blog post.

7. Dividend Growth is computed as $G(t) = \ln D(t)/D(t-1).$ We continue this blog post. See below the autocorrelation function plots for $G$ and for $|G|$ which show lag 1, also the quantile-quantile plot.

Define $N(t) = G(t)/V(t)$ and analyze it as well. The data are closer to the Gaussian distribution, with $p =0.03\%$ and $p = 0.01\%.$

See the plot of the dividend growth below. It is quite volatile but not as much as the stock returns. But we clearly see the persistence: It makes sense to model dividend growth or its normalized version as the simple autoregression. This is different from annual earnings growth, where dividing by volatility makes it independent identically distributed, see this blog post.

Let us try the simple autoregression for normalized annual dividend growth $N(t) = \ln(D(t)/D(t-1))/V(t).$

$N(t) - N(t-1) = k + mN(t-1) + Y(t)$

We have $k = 0.0044$ and $m = -0.66$ with $p < 10^{-9}.$ Shapiro-Wilk and Jarque-Bera normality tests give us $p = 6\cdot 10^{-5}$ and $p = 1.7\cdot 10^{-7}.$ See the graphs below, autocorrelation for $Y$ autocorrelation for $|Y|$ and the quantile-quantile plot for $Y.$

Here we have independent identically distributed but not Gaussian residuals $Y.$

8. Conclusion. Here, we found all time series Markov models for dividends, price and total returns, volatility, and the BAA rates. In the next post, we will discuss updates for the valuation measure based on one-year dividends instead of trailing 10-year earnings, and regression modeling using rate change and duration, continuing this post and this post.
January 28, 2026
Advanced Version
In the repository https://github.com/asarantsev/advanced we added another page with advanced version of the simulator, where we can pick initial conditions for model factors:
- S&P 500 Annual Volatility (VIX)
- S&P 500 Bubble Valuation Measure
- Moody’s BAA Bond Rate
- Treasury 10Y – 3M Long-Short Bond Spread
We repainted the button Compute in orange. For the advanced option, we made the legend font and the web page font smaller. And we updated the default initial conditions for the main version of the simulator to make it for June 2025.

See the historical graphs of these four measures below. We can see their historical range. We included them in the HTML page for the advanced version of the simulator.

Below we see the autocorrelation function plots for original and absolute values, and the quantile-quantile plot versus the normal distribution, of residuals for each of the seven regressions. First, let us consider four factors: volatility, BAA rate, and spread (autoregressions for logarithms) and earnings growth divided by volatility (needed to compute the evolution of the bubble measure; here no regression).

Then consider regression residuals for total returns of three asset classes: S&P 500, International Stocks, USA Corporate Bonds.
July 10, 2025
Autoregression Simplification

I am back! Now let me reduce the complexity of this model to model factors: log volatility, log BAA rate, and long-short Treasury spread, only using one-dimensional simple autoregressions, possibly with non-Gaussian independent identically distributed innovations. Different series of innovations might correlate.

We have the following differences from the previous setting:

logarithm of the BAA rate is modeled as $X(t) = 0.643694 + 0.539518 \cdot X(t-1)$

and the spread is modeled as $X(t) = 0.107208 + 0.942062 \cdot X(t-1)$

Analysis of residuals for these is given below. You can see that residuals (innovations) are IID and close to normal.

We shall not present here all residuals analysis, instead referring the reader to the new GitHub repository, but results are almost the same as before. If I have time, I will write a detailed description of analysis of residuals. Next, we need to move to an advanced version of the simulator, which allows you to choose factor values.

July 9, 2025
Simulator Final Update

I updated the online simulator to include the bubble measure involving earnings growth, and the long-short spread. The previous version included only the BAA rate and stock volatility as factors.

I use a mix of developed and emerging markets in the portfolio of international stocks, in proportion 60/40, as in the previous version of the simulator. I use only total (not price) returns, and nominal (not real) returns. There are three asset classes: S&P 500; International stocks: 60% of MSCI EAFE (Europe-Australasia-Far East) and 40% of MSCI EM (emerging markets); and USA corporate bonds.

I modeled innovations using kernel density estimation. There are 8 regressions, but one of them (the bubble measure) is used to create said measure, not model the actual returns, so we use 7 series of residuals. I use the same algorithm discussed there to impute missing data.

For current market conditions, we use May 2025. I could write the API to take these from Yahoo Finance but I just manually put them. They need to be updated from time to time.

I put the backend Python simulation code, the original Excel file for financial data, the Python code for filling innovations series, the Excel file for innovations before and after imputation, and the HTML frontend files, to a separate GitHub repository.

This is my 50th post. And so far, the main work on this simulator has been finished, or so I hope. Now I would like to tell this as many people as possible. I am taking some break from this coding and web development. Enjoy the summer!

June 25, 2025
Emerging Stocks Included
From the same web site Novel Investor, we included total nominal annual returns of emerging markets stocks (MSCI EM index). They are available only from 1988, as opposed to developed markets from 1970. So I added emerging markets to the portfolio in the following way: Starting from 1988, we have 60% Developed and 40% Emerging portfolio of international stocks. I fitted the econometrics model using the new data, and rewrote the simulator.

The model fits well, judging by the innovations for the regression for international stock returns. However, simulator runs show that returns have considerably increased. This is due to historical data: Returns of emerging markets were much higher than of developed markets during 1988-2024, although some years were an exception.

Regression coefficients in the model for international stock returns $Q_2(t)$ have changed:

$Q_2(t) = a_2 - b_2V(t) - c_2(R(t) - R(t-1)) + \delta_2(t)$ with $a_2 = 30.328921$ and $b_2 = 1.818674$ and $c_2 = 6.297304.$

The updated code is available in the same GitHub/asarantsev repository simulator-current. It has version 10.

I also made the following changes, per discussion with a colleague:
- Removed the simplified version of the financial simulator, with pre-specified portfolios and withdrawals
- Changed from 5 percentiles 10%, 30%, 50%, 70%, 90% to 3 percentiles 20%, 50%, 80%
- Included average ruin time over all paths in ruin, and the ruin time for each chosen percentile path in ruin
June 25, 2025
Modeling Innovations
Contents: Problems; Kernel Density Estimation; Missing Data

Problems. We have non-normality of innovations. Even when they are independent identically distributed, we cannot guarantee their normality. What is more, different series of innovations have different lengths. For example, in our current version of the simulator, we have five series of innovations. They are available in the Excel file innovations.xlsx in GitHub repository asarantsev/simulator-current
- Autoregression for log volatility: 96 data points
- S&P stock returns: 97 data points
- International stock returns: 55 data points
- Autoregression for log rate: 97 data points
- Corporate bond returns: 52 data points
In the previous version of the simulator, I simply used the multivariate normal distribution. We can compute the empirical covariance matrix by ignoring the missing data. And the means are, of course, zero. But, as mentioned above, the distribution is not normal in fact. To be more precise, the first and fourth components are not normal. In particular, their kurtosis is greater than that of the normal distribution, and the skewness is nonzero.

I tried to apply other distributions to fit each of these two components: skew-normal and variance-gamma. But I failed to fit them well. What is more, fitting univariate distribution is not enough. I need to combine them with normal marginals for the other three components. I did not find relevant exact literature. This would require developing entire new theory of multivariate distributions.

Kernel Density Estimation. This is a universal nonparametric method: (KDE). We apply Gaussian kernel: For $X_1, \ldots, X_N \in \mathbb R^d$ we have the density

$f(x) = \frac1N\sum\limits_{k=1}^N\varphi(x; X_N, \Sigma)$

where $\varphi(x; \mu, \sigma)$ is the probability density function on $\mathbb R^N$ for $\mathcal N_d(\mu, \Sigma)$ which is the $d$ dimensional Gaussian distribution with mean vector $\mu$ and covariance matrix $\Sigma.$ In other words, to simulate a random variable $Y$ with such density, we pick $U$ at random (uniformly) from $X_1, \ldots, X_N$ and simulate the additional noise $Z \sim \mathcal N_d(0, \Sigma)$ independent of $U.$ Thus $Y = U + Z.$

For $\Sigma$ we apply Silverman’s rule of thumb: This is a diagonal matrix with

$\Sigma_{ii} = \left(\frac{4}{d+2}\right)^{1/(d+4)}\cdot N^{-1/(d+4)}\cdot \min(\sigma_i, \rho_i/1.34)$

Here, $\sigma_i, \rho_i$ are statistics of the $i$ th component of the data $X_1, \ldots, X_n.$ Namely, $\sigma_i$ is the empirical standard deviation, and $\rho_i$ is the empirical inter quartile range: 75% quantile minus 25% quantile. This is realized in the file simKDE.py from our main repository. We stress that this code simulates innovations, not computes the joint density function.

Missing Data: The other problem mentioned above is lack of data for some series of innovations. We considered many imputation methods, for example iterative imputer using Principal Component Analysis or k-nearest neighbor method. They are implemented in Python package sklearn. But such methods reduce variance, because imputed data reverts to the mean. I chose a custom designed approach. It comes in iterated steps. We describe one step below.

Step. Assume we have $d + 1$ series of $N$ independent identically distributed data points. Out of these, $d$ have full $N$ values, and the last one has missing $M$ values. We then regress this last series (of course, only existing $N - M$ values) versus the full $d$ series (of course, only the matching $N - M$ data points). We use ordinary least squares linear regression. Then we take residuals of this regression. We randomly choose with replacement $M$ of these residuals. And for each of $M$ missing points, we pick the predicted value by this regression using the first $M$ data series, and add this randomly chosen residuals. This is the way to fill the missing data. This completes the description of this step.

We first apply this step to one missing data point for series 1 (using series 2 and 4 as the backbone), then to $97-55 = 42$ missing data points for series 3 (using series 1, 2, 4 as the backbone), and then to $97-52 = 45$ missing data points for series 5. We write this new data frame into a separate Excel file called filled.xlsx. It is available in the same GitHub repository. The Python code is given in innovations.py in the same repository.
June 20, 2025

Bubble Measure and Long-Short Spread in the Simulator

Introduction. Here I talk about improvement of my financial simulator. So far, I have the following factors:

BAA corporate bond rate, average December
Annual volatility, S&P 500

I decided to include two more important factors:

A bubble measure, which is an improvement over Robert Shiller’s cyclically adjusted price-earnings ratio (CAPE), for which he got a Nobel Prize in Economics. We discussed it here and here and here.
The long-short spread between 10-year and 3-month (average December) Treasury rates, which is often quoted as an important indicator. For example, if it is negative (inverted yield curve), then a recession is looming. We discussed it here and here and here.

We will now use annual earnings of S&P for our research. This is important, since it connects stock returns with fundamentals. This is similar to computing bond returns using bond rates. Robert Shiller used this comparison for his work. And we use this comparison to make our valuation measure. But, of course, stocks are much more volatile than bonds. So it’s harder to model.

Since we covered so much in previous posts, here we will be brief. For the person who wants to know details, we refer to GitHub repository.

Results. 1. First, recall the autoregression of order 1 for annual volatility:

$\ln V(t) = \alpha + \beta \ln V(t-1) + W(t).$

Recall that $\alpha = 0.847850, \beta = 0.620146.$

2-3. Then, modify the autoregression for log BAA rate $R(t)$ to include spreads: $S(t).$ This is vector autoregression of order 1:

$\begin{bmatrix} \ln R(t) \\ S(t) \end{bmatrix}$

It does NOT include volatility. It has intercept and the slope matrix

$\begin{bmatrix} -0.263745 \\ 0.151941 \end{bmatrix}$

$\begin{bmatrix} 1-0.471137 & 0.502590 \\ -0.043837 & 1-0.049088 \end{bmatrix}$

4. Model annual earnings $E(t)$ by considering its growth: $G(t) = \ln E(t) - \ln E(t-1).$ We model this as a regression

$G(t) = a + bS(t-1) + k(R(t) - R(t-1)) + cV(t) + V(t)Z(t).$

As usual, we fit it after dividing by volatility. We have

$a = 7.756497, c = -0.784163, b = 4.786272, k = 3.720972.$

5. Next, we consider the bubble measure $B(t)$ computed as in the above blog posts, with 10-year averaging window, and without using volatility. We get for $\gamma = -0.02138$ and $\theta = 1 - 0.1901:$

$B(t) = \gamma + \theta B(t-1) + U(t).$

6. We fit the corporate bond returns (geometric) denoted by $Q_0(t)$ in the same way as above. Namely,

$100\cdot Q_0(t) - R(t-1) = h_0 -m_0(R(t) - R(t-1)) + \delta_0(t).$

This makes sense in the context of bond markets, even though we use geometric instead of arithmetic returns here. Note that this does not use volatility. Similar to the above post, we see: $h_0 = -1.661125,\, m_0 = 5.588353.$

7. Next, we fit the geometric Standard & Poor 500 returns $Q_1(t)$ by dividing it by volatility $V(t).$ We also do regression versus

$B(t-1), S(t-1), R(t) - R(t-1), V(t).$

$Q_1(t) = h_1 - m_1(R(t) - R(t-1)) - a_1B(t-1) - b_1S(t-1) + cV(t) + V(t)\delta_1(t).$

The quantity $m_1$ plays the role of the duration: Dependence upon the change in interest rates. This coefficient is negative because returns of stocks and bonds decrease when interest rates increase. Next, $a_1$ is the coefficient for the bubble measure. Of course, this is also negative, since being in a bubble implies low future returns. Same is true for the long-short spread, as discussed at the top of this post. Numerical values of coefficients are:

$h_1 = 26.851013, m_1 = 7.823798, c = -1.356810, a_1 = 16.439790, b_1 = 3.411985.$

8. Finally, for international stocks we do the same. Thus we write this regression as

$Q_2(t) = h_2 - m_2(R(t) - R(t-1)) - a_2B(t-1) + c_2V(t) + V(t)\delta_2(t).$

Note that $m_2$ is still the duration. Although $R(t)$ is BAA rate, which is the USA, but it influences the international stocks as well. Same for the bubble measure. Numerical values of coefficients are:

$h_2 = 25.607625, m_2 = 4.081440, c_2 = -1.919254, a_2 = 12.617889.$

Remark. In the regression for earnings growth, we tried $S(t) - S(t-1)$ instead of $S(t-1)$ , but the p-value for the Student test is too large. Also, we tried $R(t-1)$ instead of $R(t) - R(t-1)$ , but the p-value for the Student test is too large. We used covariates in linear regressions if $p < 20\%.$ Similarly, for the international stocks, we found that spread $S(t-1)$ is not statistically significant.

Innovations. Thus we have 8 (eight) innovation series. Only $\delta_i(t)$ for $i = 0, 1, 2,$ can be modeled as Gaussian. But all of them, judging by the autocorrelation function for $\delta_i$ and for $|\delta_i|$ and other tests, can be modeled as independent identically distributed random variables.

I do not have energy to present all these plots for each innovation series. But below is the table. Here, ACF is the L1 norm for the first 5 lags of the autocorrelation function values. Kurtosis is normalized so for normal distribution it is zero. Of course, the same is true for skewness.

Series	Length	Skewness	Kurtosis	Shapiro-Wilk p	Jarque-Bera p	ACF original values	ACF absolute values
Volatility	96	0.401	0.401	0.401	0.401	0.401	0.237
BAA rate	97	0.008	1.754	0.008	0.002	0.375	0.655
Long-short spread	97	1.058	3.382	0.000	0.000	0.455	0.468
Earnings growth	97	0.614	2.903	0.000	0.000	0.474	0.253
Bubble measure	97	-0.816	1.102	0.003	0.000	0.291	0.608
US corporate bond returns	52	0.193	0.238	0.857	0.800	0.878	0.706
US stock returns	97	0.039	0.157	0.344	0.940	0.413	0.590
International stock returns	55	-0.015	-0.202	0.941	0.953	0.527	0.331

Covariance matrix

$\begin{array}{lrrrrrrrr} \textsc{Series} & \text{vol} & \text{spread} & \text{baa-rate} & \text{earn-growth} & \text{bubble} & \text{usa-ret} & \text{intl-ret} & \text{bond-ret} \\ \text{vol} & 0.1342 & 0.0179 & 0.0134 & -0.1393 & -0.0273 & -0.0253 & -0.0316 & 0.0724 \\ \text{spread} & 0.0179 & 1.0797 & -0.0270 & -0.1194 & -0.0106 & -0.1961 & -0.2815 & -0.3577 \\ \text{baa-rate} & 0.0134 & -0.0270 & 0.0156 & -0.0398 & -0.0120 & -0.0151 & -0.0008 & -0.1121 \\ \text{earn-growth} & -0.1393 & -0.1194 & -0.0398 & 3.1196 & -0.0136 & 0.3457 & 0.4155 & -1.1972 \\ \text{bubble} & -0.0273 & -0.0106 & -0.0120 & -0.0136 & 0.0344 & 0.1699 & 0.0914 & 0.1133 \\ \text{usa-ret} & -0.0253 & -0.1961 & -0.0151 & 0.3457 & 0.1699 & 1.8470 & 0.7857 & 0.9502 \\ \text{intl-ret} & -0.0316 & -0.2815 & -0.0008 & 0.4155 & 0.0914 & 0.7857 & 2.9181 & -0.3106 \\ \text{bond-ret} & 0.0724 & -0.3577 & -0.1121 & -1.1972 & 0.1133 & 0.9502 & -0.3106 & 7.0280 \\ \end{array}$

Correlation matrix

$\begin{array}{lrrrrrrrr} \textsc{Series} & \text{vol} & \text{spread} & \text{baa-rate} & \text{earn-growth} & \text{bubble} & \text{usa-ret} & \text{intl-ret} & \text{bond-ret} \\ \text{vol} & 1.0000 & 0.0472 & 0.2919 & -0.1800 & -0.4234 & -0.0516 & -0.0493 & 0.0725 \\ \text{spread} & 0.0472 & 1.0000 & -0.2082 & -0.0544 & -0.0575 & -0.1388 & -0.1304 & -0.1043 \\ \text{baa-rate} & 0.2919 & -0.2082 & 1.0000 & -0.1510 & -0.5435 & -0.0890 & -0.0036 & -0.3101 \\ \text{earn-growth} & -0.1800 & -0.0544 & -0.1510 & 1.0000 & -0.1304 & 0.1204 & 0.1036 & -0.1878 \\ \text{bubble} & -0.4234 & -0.0575 & -0.5435 & -0.1304 & 1.0000 & 0.7038 & 0.3467 & 0.2714 \\ \text{usa-ret} & -0.0516 & -0.1388 & -0.0890 & 0.1204 & 0.7038 & 1.0000 & 0.3705 & 0.2817 \\ \text{intl-ret} & -0.0493 & -0.1304 & -0.0036 & 0.1036 & 0.3467 & 0.3705 & 1.0000 & -0.0701 \\ \text{bond-ret} & 0.0725 & -0.1043 & -0.3101 & -0.1878 & 0.2714 & 0.2817 & -0.0701 & 1.0000 \\ \end{array}$

See below the p-values for the Student T-test for null hypothesis which is zero correlation between series of innovations.

$\begin{bmatrix} 1 & 65 & 0 & 8 & 0 & 62 & 72 & 61 \\ 65 & 1 & 4 & 60 & 58 & 17 & 34 & 46 \\ 0 & 4 & 1 & 14 & 0 & 39 & 98 & 3 \\ 8 & 60 & 14 & 1 & 14 & 24 & 45 & 18 \\ 0 & 58 & 0 & 14 & 1 & 0 & 1 & 5 \\ 62 & 17 & 39 & 24 & 0 & 1 & 1 & 4 \\ 72 & 34 & 98 & 45 & 1 & 1 & 1 & 62 \\ 61 & 46 & 3 & 18 & 5 & 4 & 62 & 1 \\ \end{bmatrix}$

June 19, 2025

Simplified Simulator Version
The main simulator gives users a choice of portfolio: US stocks, international stocks, and bonds. Moreover, the main choice in the portfolio is the proportion of stocks and bonds. In the newest version, we allowed this proportion to vary and not to be fixed throughout these simulated years. For example, at the start of 30 years, the stock/bond split can be 80/20, and at the end, 50/50. This is done to make room for retirement planning. Usually, people choose to invest more in risky assets at the start of their savings journey, and to make it less risky when they become closer to retirement. Within retirement, it is wise to do the converse: Invest in bonds less and less as you progress through the retirement.

Some users might be confused by this variability. In addition to options for withdrawals/contributions, this can be challenging to navigate. In practice, most users care about only a few modes: saving before retirement and living in retirement. To this end, I created a simplified version with the following options:

Risk Tolerance: High (Assertive), Mid (Moderate), Low (Conservative)

Your Goal:
- Lump-Sum Investing: invest initial wealth, amount provided by user, and do not contribute annually
- Regular Savings: start with zero wealth, contribute annually a fixed nominal amount, provided by user, which grows 3% annually
- Retirement Spending: invest initial wealth, amount provided by user, and withdraw annually fixed nominal amount, initially it is 4% of the initial wealth, according to the celebrated 4% retirement rule, and grows 4% annually
Why choose 3% annual increase for annual contributions and 4% annual increase for annual withdrawals? Historically, inflation was running around 3% annually in 1928-2024. We consider only nominal (not inflation-adjusted) returns, because we could not model inflation-adjusted version of corporate bond returns. Thus we need some compensation for inflation.

Below, we provide the split between stocks and bonds: stocks/bonds. Stocks include both USA and international.

Lump-Sum Investing and Regular Savings:
- Conservative: 60/40 constant during simulation
- Moderate: 90/10 at the start, 60/40 at the end, linear during simulation
- Assertive: 90/10 constant during simulation
Retirement Spending:
- Conservative: 60/40 constant during simulation
- Moderate: 60/40 at the start, 90/10 at the end, linear during simulation
- Assertive: 90/10 constant during simulation
Other than that, in this simulator the user can choose the number of years, wealth (initial investment in case of lump-sum investing or retirement spending, or annual investment in case of regular savings), but not growth rate.
June 18, 2025