In the repository https://github.com/asarantsev/advanced we added another page with an advanced version of the simulator, where the user can pick initial conditions for the model factors:
S&P 500 Annual Volatility (VIX)
S&P 500 Bubble Valuation Measure
Moody’s BAA Bond Rate
Treasury 10Y – 3M Long-Short Bond Spread
We recolored the Compute button orange. For the advanced version, we made the legend font and the web page font smaller. We also updated the default initial conditions for the main version of the simulator to June 2025.
See the historical graphs of these four measures below, which show their historical range. We included them in the HTML page for the advanced version of the simulator.
Below we see the autocorrelation function plots for original and absolute values, and the quantile-quantile plot versus the normal distribution, of residuals for each of the seven regressions. First, let us consider four factors: volatility, BAA rate, and spread (autoregressions for logarithms) and earnings growth divided by volatility (needed to compute the evolution of the bubble measure; here no regression).
Then consider regression residuals for total returns of three asset classes: S&P 500, International Stocks, USA Corporate Bonds.
I am back! Now let me reduce the complexity of the model for the factors: log volatility, log BAA rate, and long-short Treasury spread. Each is now modeled by a simple one-dimensional autoregression, possibly with non-Gaussian independent identically distributed innovations. Different series of innovations may be correlated with each other.
Analysis of residuals for these is given below. You can see that residuals (innovations) are IID and close to normal.
We shall not present the full analysis of residuals here, instead referring the reader to the new GitHub repository; the results are almost the same as before. If I have time, I will write a detailed description of the analysis of residuals. Next, we move to an advanced version of the simulator, which allows you to choose factor values.
I use a mix of developed and emerging markets in the portfolio of international stocks, in proportion 60/40, as in the previous version of the simulator. I use only total (not price) returns, and nominal (not real) returns. There are three asset classes: S&P 500; International stocks: 60% of MSCI EAFE (Europe-Australasia-Far East) and 40% of MSCI EM (emerging markets); and USA corporate bonds.
I modeled innovations using kernel density estimation. There are 8 regressions, but one of them (the bubble measure) is used to create said measure, not model the actual returns, so we use 7 series of residuals. I use the same algorithm discussed there to impute missing data.
For current market conditions, we use May 2025 values. I could write code to pull these from Yahoo Finance, but I simply entered them manually. They need to be updated from time to time.
I put the backend Python simulation code, the original Excel file for financial data, the Python code for filling innovations series, the Excel file for innovations before and after imputation, and the HTML frontend files in a separate GitHub repository.
This is my 50th post. The main work on this simulator is finished, or so I hope. Now I would like to tell as many people as possible about it. I am taking a break from coding and web development. Enjoy the summer!
From the same web site, Novel Investor, we included total nominal annual returns of emerging markets stocks (MSCI EM index). They are available only from 1988, as opposed to developed markets, which are available from 1970. So I added emerging markets to the portfolio in the following way: starting from 1988, the portfolio of international stocks is 60% developed and 40% emerging markets. I fitted the econometric model using the new data, and rewrote the simulator.
The model fits well, judging by the innovations of the regression for international stock returns. However, simulator runs show that returns have considerably increased. This is due to historical data: returns of emerging markets were much higher than those of developed markets during 1988-2024, although some years were exceptions.
Regression coefficients in the model for international stock returns have changed:
Contents: Problems; Kernel Density Estimation; Missing Data
Problems. We have non-normality of innovations. Even when they are independent identically distributed, we cannot guarantee their normality. What is more, different series of innovations have different lengths. For example, in our current version of the simulator, we have five series of innovations. They are available in the Excel file innovations.xlsx in the GitHub repository asarantsev/simulator-current:
Autoregression for log volatility: 96 data points
S&P stock returns: 97 data points
International stock returns: 55 data points
Autoregression for log rate: 97 data points
Corporate bond returns: 52 data points
In the previous version of the simulator, I simply used the multivariate normal distribution. We can compute the empirical covariance matrix by ignoring the missing data. And the means are, of course, zero. But, as mentioned above, the distribution is not normal in fact. To be more precise, the first and fourth components are not normal. In particular, their kurtosis is greater than that of the normal distribution, and the skewness is nonzero.
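A minimal sketch of that earlier Gaussian approach, assuming the five series sit in a pandas data frame with NaN for missing values. The data below are random stand-ins and the column names are hypothetical; the real series are in innovations.xlsx.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Random stand-in data with hypothetical column names; not the real innovations.
columns = ["ln_vol", "usa_stocks", "intl_stocks", "ln_rate", "bonds"]
data = pd.DataFrame(rng.standard_normal((97, 5)), columns=columns)
data.loc[:0, "ln_vol"] = np.nan         # leaves 96 data points
data.loc[:41, "intl_stocks"] = np.nan   # leaves 55 data points
data.loc[:44, "bonds"] = np.nan         # leaves 52 data points

# DataFrame.cov() skips missing values pair by pair.
cov = data.cov().to_numpy()

# Multivariate normal with zero mean vector and the empirical covariance matrix.
simulated = rng.multivariate_normal(np.zeros(5), cov, size=1000)
```

A covariance matrix assembled pair by pair is not guaranteed to be positive semidefinite, so it is worth checking its eigenvalues before sampling.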
I tried to fit other distributions to each of these two components: skew-normal and variance-gamma. But I failed to fit them well. What is more, fitting univariate distributions is not enough. I need to combine them with normal marginals for the other three components. I did not find directly relevant literature. This would require developing an entirely new theory of multivariate distributions.
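Kernel Density Estimation. Kernel density estimation (KDE) is a universal nonparametric method. We apply the Gaussian kernel: For data points $x_1, \ldots, x_N \in \mathbb{R}^d$ we have the density
$$f(x) = \frac{1}{N}\sum_{i=1}^{N}\varphi_H(x - x_i),$$
where $\varphi_H$ is the probability density function on $\mathbb{R}^d$ for $\mathcal{N}_d(0, H)$, which is the $d$-dimensional Gaussian distribution with mean vector $0$ and covariance matrix $H$. In other words, to simulate a random variable $X$ with such density, we pick $i$ at random (uniformly) from $\{1, \ldots, N\}$ and simulate the additional noise $Z \sim \mathcal{N}_d(0, H)$ independent of $i$. Thus $X = x_i + Z$.
For $H$ we apply Silverman's rule of thumb: This is a diagonal matrix $H = \mathrm{diag}(h_1^2, \ldots, h_d^2)$ with
$$h_j = 0.9\,\min\left(s_j, \frac{\mathrm{IQR}_j}{1.34}\right)N^{-1/5}.$$
Here, $s_j$ and $\mathrm{IQR}_j$ are statistics of the $j$th component of the data $x_1, \ldots, x_N$. Namely, $s_j$ is the empirical standard deviation, and $\mathrm{IQR}_j$ is the empirical interquartile range: 75% quantile minus 25% quantile. This is realized in the file simKDE.py from our main repository. We stress that this code simulates innovations; it does not compute the joint density function.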
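The simulation recipe above (pick a data point uniformly, then add independent Gaussian noise) can be sketched as follows. This is not the contents of simKDE.py; the function names are hypothetical, and the bandwidth uses the common form of Silverman's rule, 0.9 min(s, IQR/1.34) N^(-1/5), applied to each component separately, which is an assumption here.

```python
import numpy as np

def silverman_bandwidths(data):
    """Per-component bandwidths (assumed form of Silverman's rule of thumb)."""
    n = data.shape[0]
    s = data.std(axis=0, ddof=1)                      # empirical standard deviations
    q75, q25 = np.percentile(data, [75, 25], axis=0)  # quartiles of each component
    iqr = q75 - q25
    return 0.9 * np.minimum(s, iqr / 1.34) * n ** (-0.2)

def simulate_kde(data, size, rng):
    """Draw `size` vectors from the Gaussian KDE of the rows of `data`."""
    h = silverman_bandwidths(data)
    idx = rng.integers(0, data.shape[0], size=size)         # uniform choice of data rows
    noise = rng.standard_normal((size, data.shape[1])) * h  # independent Gaussian noise
    return data[idx] + noise
```

Because a whole row is drawn at once, the dependence between the components of the innovations is preserved, up to the added noise.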
Missing Data: The other problem mentioned above is the lack of data for some series of innovations. We considered many imputation methods, for example an iterative imputer using Principal Component Analysis, or the k-nearest neighbors method. They are implemented in the Python package sklearn. But such methods reduce variance, because imputed data reverts to the mean. I chose a custom-designed approach. It proceeds in iterated steps. We describe one step below.
Step. Assume we have $k + 1$ series of independent identically distributed data points. Out of these, $k$ series have full values, and the last one has $m$ missing values. We then regress this last series (of course, only the existing values) on the $k$ full series (of course, only the matching data points). We use ordinary least squares linear regression. Then we take the residuals of this regression. We randomly choose, with replacement, $m$ of these residuals. For each of the $m$ missing points, we take the value predicted by this regression from the $k$ full series, and add one of these randomly chosen residuals. This is the way to fill the missing data. This completes the description of this step.
We first apply this step to the one missing data point for series 1 (using series 2 and 4 as the backbone), then to the 42 missing data points for series 3 (using series 1, 2, 4 as the backbone), and then to the 45 missing data points for series 5. We write this new data frame into a separate Excel file called filled.xlsx. It is available in the same GitHub repository. The Python code is given in innovations.py in the same repository.
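One imputation step, as described above, can be sketched like this. It is an illustration written for this post, not the code from innovations.py; the function name and argument names are hypothetical.

```python
import numpy as np

def impute_step(full, partial, rng):
    """Fill NaN entries of `partial` using OLS on `full` plus resampled residuals.

    full:    (N, k) array, the backbone series without gaps
    partial: (N,) array with NaN at the missing points
    """
    observed = ~np.isnan(partial)
    # Design matrix with an intercept column.
    X = np.column_stack([np.ones(len(full)), full])
    # Ordinary least squares on the matching (observed) data points only.
    coef, *_ = np.linalg.lstsq(X[observed], partial[observed], rcond=None)
    residuals = partial[observed] - X[observed] @ coef
    n_missing = int((~observed).sum())
    filled = partial.copy()
    # Predicted value plus a residual chosen at random with replacement.
    filled[~observed] = X[~observed] @ coef + rng.choice(residuals, size=n_missing, replace=True)
    return filled
```

Adding resampled residuals keeps the spread of the imputed values close to that of the observed ones, instead of pulling them toward the regression line.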
Introduction. Here I talk about improvements to my financial simulator. So far, I have the following factors:
BAA corporate bond rate, average December
Annual volatility, S&P 500
I decided to include two more important factors:
A bubble measure, which is an improvement over Robert Shiller’s cyclically adjusted price-earnings ratio (CAPE), for which he got a Nobel Prize in Economics. We discussed it here and here and here.
The long-short spread between 10-year and 3-month (average December) Treasury rates, which is often quoted as an important indicator. For example, if it is negative (inverted yield curve), then a recession is looming. We discussed it here and here and here.
We will now use annual earnings of S&P for our research. This is important, since it connects stock returns with fundamentals. This is similar to computing bond returns using bond rates. Robert Shiller used this comparison for his work. And we use this comparison to make our valuation measure. But, of course, stocks are much more volatile than bonds. So it’s harder to model.
Since we covered so much in previous posts, here we will be brief. For readers who want the details, we refer to the GitHub repository.
Results. 1. First, recall the autoregression of order 1 for annual volatility:
Recall that
2-3. Then, modify the autoregression for log BAA rate to include spreads: This is vector autoregression of order 1:
It does NOT include volatility. It has intercept and the slope matrix
4. Model annual earnings by considering its growth: We model this as a regression
As usual, we fit it after dividing by volatility. We have
5. Next, we consider the bubble measure computed as in the above blog posts, with 10-year averaging window, and without using volatility. We get for and
This makes sense in the context of bond markets, even though we use geometric instead of arithmetic returns here. Note that this does not use volatility. Similar to the above post, we see:
7. Next, we fit the geometric Standard & Poor 500 returns by dividing it by volatility We also do regression versus
The quantity plays the role of the duration: Dependence upon the change in interest rates. This coefficient is negative because returns of stocks and bonds decrease when interest rates increase. Next, is the coefficient for the bubble measure. Of course, this is also negative, since being in a bubble implies low future returns. Same is true for the long-short spread, as discussed at the top of this post. Numerical values of coefficients are:
8. Finally, for international stocks we do the same. Thus we write this regression as
Note that is still the duration. Although is the BAA rate, which is a USA rate, it influences international stocks as well. The same holds for the bubble measure. Numerical values of coefficients are:
Remark. In the regression for earnings growth, we tried instead of , but the p-value for the Student test is too large. Also, we tried instead of , but the p-value for the Student test is too large. We used covariates in linear regressions if Similarly, for the international stocks, we found that spread is not statistically significant.
Innovations. Thus we have 8 (eight) innovation series. Only for can be modeled as Gaussian. But all of them, judging by the autocorrelation function for and for and other tests, can be modeled as independent identically distributed random variables.
I do not have the energy to present all these plots for each innovation series. But below is the table. Here, ACF is the L1 norm of the first 5 lags of the autocorrelation function values. Kurtosis is normalized so that for the normal distribution it is zero. Of course, the same is true for skewness.
| Series | Length | Skewness | Kurtosis | Shapiro-Wilk p | Jarque-Bera p | ACF original values | ACF absolute values |
|---|---|---|---|---|---|---|---|
| Volatility | 96 | 0.401 | 0.401 | 0.401 | 0.401 | 0.401 | 0.237 |
| BAA rate | 97 | 0.008 | 1.754 | 0.008 | 0.002 | 0.375 | 0.655 |
| Long-short spread | 97 | 1.058 | 3.382 | 0.000 | 0.000 | 0.455 | 0.468 |
| Earnings growth | 97 | 0.614 | 2.903 | 0.000 | 0.000 | 0.474 | 0.253 |
| Bubble measure | 97 | -0.816 | 1.102 | 0.003 | 0.000 | 0.291 | 0.608 |
| US corporate bond returns | 52 | 0.193 | 0.238 | 0.857 | 0.800 | 0.878 | 0.706 |
| US stock returns | 97 | 0.039 | 0.157 | 0.344 | 0.940 | 0.413 | 0.590 |
| International stock returns | 55 | -0.015 | -0.202 | 0.941 | 0.953 | 0.527 | 0.331 |
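The skewness, kurtosis, and normality-test columns of such a table can be computed with scipy, for instance as in the sketch below. The function name is hypothetical, and this is not necessarily how the table above was produced.

```python
import numpy as np
from scipy import stats

def normality_summary(x):
    """Length, skewness, excess kurtosis, and normality-test p-values for one series."""
    x = np.asarray(x, dtype=float)
    x = x[~np.isnan(x)]                # drop missing values
    _, sw_p = stats.shapiro(x)         # Shapiro-Wilk test
    _, jb_p = stats.jarque_bera(x)     # Jarque-Bera test
    return {
        "length": len(x),
        "skewness": float(stats.skew(x)),
        # Fisher definition: zero for the normal distribution.
        "kurtosis": float(stats.kurtosis(x)),
        "shapiro_wilk_p": float(sw_p),
        "jarque_bera_p": float(jb_p),
    }
```

A small p-value in either test is evidence against normality of that series.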
Covariance matrix
Correlation matrix
See below the p-values for the Student t-test of the null hypothesis of zero correlation between series of innovations.
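For one pair of series, this test can be sketched as follows, assuming missing values are stored as NaN and only the years present in both series are used. The function name is hypothetical.

```python
import numpy as np
from scipy import stats

def correlation_t_test(x, y):
    """Empirical correlation and two-sided Student t-test p-value for zero correlation."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = ~(np.isnan(x) | np.isnan(y))   # data points available in both series
    x, y = x[keep], y[keep]
    n = len(x)
    r = np.corrcoef(x, y)[0, 1]
    # Under the null hypothesis, t follows the Student distribution with n - 2 degrees of freedom.
    t = r * np.sqrt((n - 2) / (1.0 - r ** 2))
    p = 2.0 * stats.t.sf(abs(t), df=n - 2)
    return float(r), float(p)
```

Applying this function to every pair of innovation series gives a symmetric matrix of p-values.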
The main simulator gives users a choice of portfolio: US stocks, international stocks, and bonds. The main choice in the portfolio is the proportion of stocks and bonds. In the newest version, we allowed this proportion to vary instead of being fixed throughout the simulated years. For example, at the start of 30 years, the stock/bond split can be 80/20, and at the end, 50/50. This is done to make room for retirement planning. Usually, people choose to invest more in risky assets at the start of their savings journey, and to take less risk as they get closer to retirement. Within retirement, it is wise to do the converse: invest in bonds less and less as you progress through retirement.
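As a small illustration of the example above, here is a sketch of a linear stock/bond path. It assumes the first simulated year uses the starting split and the last year uses the ending split, with equal steps in between; the function name is hypothetical.

```python
import numpy as np

def stock_share_path(start, end, years):
    """Stock share for each simulated year, moving linearly from start to end."""
    return np.linspace(start, end, years)

# 80/20 at the start of 30 years, 50/50 at the end.
stock_shares = stock_share_path(0.80, 0.50, 30)
bond_shares = 1.0 - stock_shares
```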
Some users might be confused by this variability. In addition to options for withdrawals/contributions, this can be challenging to navigate. In practice, most users care about only a few modes: saving before retirement and living in retirement. To this end, I created a simplified version with the following options:
Risk Tolerance: High (Assertive), Mid (Moderate), Low (Conservative)
Your Goal:
Lump-Sum Investing: invest initial wealth, amount provided by user, and do not contribute annually
Regular Savings: start with zero wealth and contribute annually a nominal amount, provided by user, which grows 3% annually
Retirement Spending: invest initial wealth, amount provided by user, and withdraw annually a nominal amount, which is initially 4% of the initial wealth, according to the celebrated 4% retirement rule, and grows 4% annually
Why choose 3% annual increase for annual contributions and 4% annual increase for annual withdrawals? Historically, inflation was running around 3% annually in 1928-2024. We consider only nominal (not inflation-adjusted) returns, because we could not model inflation-adjusted version of corporate bond returns. Thus we need some compensation for inflation.
Below, we provide the split between stocks and bonds: stocks/bonds. Stocks include both USA and international.
Lump-Sum Investing and Regular Savings:
Conservative: 60/40 constant during simulation
Moderate: 90/10 at the start, 60/40 at the end, linear during simulation
Assertive: 90/10 constant during simulation
Retirement Spending:
Conservative: 60/40 constant during simulation
Moderate: 60/40 at the start, 90/10 at the end, linear during simulation
Assertive: 90/10 constant during simulation
Other than that, in this simulator the user can choose the number of years and the wealth (initial investment in case of lump-sum investing or retirement spending, or annual investment in case of regular savings), but not the growth rate.
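The three goals differ only in the initial wealth and in the annual cash flow. Here is a sketch of the bookkeeping for one simulated path, assuming annual arithmetic portfolio returns given as fractions and cash flows at the end of each year; the timing, the floor at zero wealth, and all names are assumptions for illustration.

```python
def wealth_path(returns, goal, amount):
    """Wealth after each year for one path of annual arithmetic returns (as fractions)."""
    if goal == "lump_sum":
        wealth, flow, growth = amount, 0.0, 0.0
    elif goal == "regular_savings":
        wealth, flow, growth = 0.0, amount, 0.03           # contributions grow 3% annually
    elif goal == "retirement_spending":
        wealth, flow, growth = amount, -0.04 * amount, 0.04  # 4% rule, growing 4% annually
    else:
        raise ValueError(f"unknown goal: {goal}")
    path = [wealth]
    for r in returns:
        # Assumption: the cash flow happens at the end of the year; wealth cannot go below zero.
        wealth = max(wealth * (1.0 + r) + flow, 0.0)
        flow *= 1.0 + growth
        path.append(wealth)
    return path
```

For example, `wealth_path([0.0, 0.0], "retirement_spending", 100000)` withdraws 4000 in the first year and 4160 in the second.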
This is the usual disclaimer for risky financial products, found at every investment company web site.
The performance data shown represent past performance, which is not a guarantee of future results. Investment returns and principal value will fluctuate so that investors’ shares, when sold, may be worth more or less than their original cost. Current performance may be lower or higher than the performance data cited. The performance of an index is not an exact representation of any particular investment, as you cannot invest directly in an index.
Assets represented by these portfolios and indices in the simulator are not guaranteed or protected by the US government, including the Federal Deposit Insurance Corporation, and may lose value, including the loss of principal.
I, Andrey Sarantsev, PhD, the creator of this simulator, disclaim any responsibility, legal or otherwise, resulting from the use of this simulator. Any harms, losses, or damages resulting from this use are the responsibility of the user, not me.
For professional advice on investments or retirement, speak to your financial adviser or retirement adviser.
See my GitHub repository simulator-current for the HTML frontend pages, Python backend code in Flask, Excel data file, and Python code for validation of the model described below.
Continuing the previous post, we updated the financial simulator to use geometric returns instead of arithmetic returns. We had mistakenly fitted linear regressions for arithmetic returns, but this does not work well: such returns can only be greater than $-100\%$, while a linear regression with unbounded innovations can produce any real value. Thus we switched the returns of all three asset classes (US stocks, developed-markets stocks, US bonds) from arithmetic to geometric. To compute portfolio returns, we later convert these geometric returns back to arithmetic returns. The updated version is specified below.
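The conversion between the two kinds of returns can be sketched as follows, assuming that the geometric return is the logarithmic return, so that a geometric return g and an arithmetic return a (both in %) satisfy g = 100 ln(1 + a/100). The function names are hypothetical.

```python
import numpy as np

def arithmetic_to_geometric(a_percent):
    """Geometric (logarithmic) return in % from arithmetic return in %."""
    return 100.0 * np.log1p(np.asarray(a_percent, dtype=float) / 100.0)

def geometric_to_arithmetic(g_percent):
    """Arithmetic return in % from geometric return in %; always above -100."""
    return 100.0 * np.expm1(np.asarray(g_percent, dtype=float) / 100.0)

def portfolio_arithmetic_return(geometric_percent, weights):
    """Portfolio return in %: weighted average of arithmetic returns of the assets."""
    arithmetic = geometric_to_arithmetic(geometric_percent)
    return float(np.dot(weights, arithmetic))
```

The weighted average is taken over arithmetic returns, since it is arithmetic, not geometric, returns that combine linearly across assets within one year.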
Description and system of equations.
Testing innovations for white noise.
Testing innovations for normality.
Description and system of equations
We use the following autoregression equation for annual volatility for S&P 500 and its predecessor, S&P 90, computed by Angel Piotrowski:
with and
Next, we use the following autoregression for BAA rate, following previous research:
with and
We use the logarithm because otherwise the rate might become negative, albeit with very small probability.
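To illustrate why the logarithm keeps the simulated rate positive, here is a sketch of an autoregression of order 1 for the log rate. The intercept, slope, innovation scale, and starting rate below are made-up placeholders, not the fitted values from this post.

```python
import numpy as np

def simulate_log_ar1(x0, intercept, slope, innovations):
    """Simulate x_t where ln x_t = intercept + slope * ln x_{t-1} + innovation_t."""
    log_x = np.log(x0)
    path = []
    for eps in innovations:
        log_x = intercept + slope * log_x + eps
        path.append(np.exp(log_x))   # exponentiating always gives a positive rate
    return np.array(path)

rng = np.random.default_rng(5)
# Hypothetical coefficients and innovation scale, for illustration only.
rates = simulate_log_ar1(x0=6.0, intercept=0.1, slope=0.95,
                         innovations=0.3 * rng.standard_normal(30))
```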
We consider three classes of assets and denote their annual geometric total returns (multiplied by 100 for normalization):
USA stocks, measured by Standard & Poor 500 index and its predecessor, the Standard & Poor 90 index
International stocks, measured by MSCI EAFE (Europe/Australasia/Far East) index
USA corporate investment-grade bonds, measured by Bank of America ICE index (ratings AAA, AA, A, BBB)
We normalize the two stock returns by dividing them by annual volatility. But we do not normalize the bond returns. We have the following equations for these three classes of assets:
For USA corporate bonds, following this blog post, we get: with and
For USA stocks, following this blog post, we get: with and and
For international stocks, similarly to USA stocks, we get: with and and
Thus all three classes of assets have returns highly dependent upon change in interest rates, with duration (regression coefficient) for returns Note that and
All five series of residuals are well-modeled by independent identically distributed random variables, judging by Monte Carlo simulation. I present the autocorrelation plots for them and their absolute values below.
Unfortunately, they are not normal. Namely, and (the innovations for factor autoregressions) are closer to skew-normal. I did not yet pursue this direction of research. But the other three residual series are Gaussian. I discuss their normality below.
However, I still modeled these five series as multivariate Gaussian with mean vector zero and the following empirical covariance matrix.
We consider only nominal, not real returns. To compensate for that, withdrawals/contributions can change annually.
We allow for a constant split between US and international stocks. Bond and overall stock percentages might change linearly from year to year. This is to allow for a more conservative portfolio as time goes on.
Also, and very importantly, we added a separate web page with a simplified version of this simulator. We describe it in a separate post.
The initial value for volatility is taken as the average daily closing VIX from June 1, 2024 to May 31, 2025. The initial value for the BAA rate is taken as the daily average for May 2025.
Testing innovations for white noise
Below are autocorrelation function plots for the five series of innovations. Original and absolute in the captions refer to whether innovations are taken as is or after taking absolute values.
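Innovations of the autoregression for log volatility have tag ln-vol
Innovations of the autoregression for log BAA rate have tag ln-baa
Innovations for USA corporate bond returns have tag bonds
Innovations for USA stock returns have tag usa-stocks
Innovations for international stock returns have tag intl-stocks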
Also, we compute the L1 norms of the first 5 values of the ACF, for the original innovations and for their absolute values. Comparing with these threshold values, we see that it is reasonable to model these as independent identically distributed.
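| Innovations | ln-vol | ln-baa | bonds | usa-stocks | intl-stocks |
|---|---|---|---|---|---|
| N Data Points | 96 | 97 | 52 | 97 | 55 |
| Original | 0.40 | 0.18 | 0.88 | 0.48 | 0.49 |
| Absolute | 0.24 | 0.36 | 0.71 | 0.43 | 0.58 |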
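The L1 norm of the ACF and a Monte Carlo threshold for it can be sketched as follows. The benchmark used here (independent standard normal series of the same length, 95% quantile) is an assumption for illustration, not necessarily the threshold used above; the function names are hypothetical.

```python
import numpy as np

def acf_l1(x, lags=5):
    """Sum of absolute values of the empirical autocorrelations at lags 1..lags."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return float(sum(abs(np.dot(x[:-k], x[k:]) / denom) for k in range(1, lags + 1)))

def iid_threshold(n, rng, lags=5, n_sims=2000, level=0.95):
    """Quantile of acf_l1 over simulated IID standard normal series of length n."""
    norms = [acf_l1(rng.standard_normal(n), lags) for _ in range(n_sims)]
    return float(np.quantile(norms, level))
```

A series whose value of `acf_l1`, for both the original and the absolute values, stays below `iid_threshold` for its length is consistent with independent identically distributed innovations.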
Testing innovations for normality
Similarly, we plot the five quantile-quantile plots versus the normal distribution below. Tags are the same.
Also, see p-values for statistical testing for normality: Shapiro-Wilk (SW) and Jarque-Bera (JB) testing.
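| Test | ln-vol | ln-baa | bonds | usa-stocks | intl-stocks |
|---|---|---|---|---|---|
| SW | 0.9% | 0.6% | 86% | 42% | 99% |
| JB | 6.1% | 0.009% | 80% | 66% | 90% |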
Finally, let us provide skewness and kurtosis for these innovations, normalized so that for the normal distribution they are 0.
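| Function | ln-vol | ln-baa | bonds | usa-stocks | intl-stocks |
|---|---|---|---|---|---|
| Skewness | 0.59 | 0.81 | 0.19 | 0.23 | -0.13 |
| Kurtosis | 0.057 | 1.4 | 0.24 | 0.068 | 0.13 |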
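Summary: the innovations with tags bonds, usa-stocks, and intl-stocks are well modeled by the normal distribution, but those with tags ln-vol and ln-baa are not.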
We have updated the annual simulator. The current version uses two factors: S&P 500 volatility and BAA bond rate. Initial factors are from May 2025. We simulate a portfolio of three classes of assets:
Standard & Poor 500 USA Stocks
MSCI EAFE Developed Markets Index Stocks
USA Investment-Grade Corporate Bonds, ICE Bank of America Index
We use the following autoregression equations for factors:
with and
with and
For corporate bonds, US stocks, and international stocks, denote their annual arithmetic total returns in % as
with and
with and and
with and and
All five series of residuals are well-modeled by independent identically distributed random variables, judging by Monte Carlo simulation. Unfortunately, they are not normal. Namely, and (the innovations for factor autoregressions) are closer to skew-normal. I did not yet pursue this direction of research. But the other three residual series are Gaussian.
However, I still modeled these five series as multivariate Gaussian with mean vector zero and empirical covariance matrix
We consider only nominal, not real returns. To compensate for that, withdrawals/contributions can change annually.
We allow for a constant split between US and international stocks. Bond and overall stock percentages might change linearly from year to year. This is to allow for a more conservative portfolio as time goes on.
We use May 2025 averages for initial factors: volatility (using rescaled VIX) and rate.