My Finance


  • The new valuation measure and dividend yield

    Consider the new valuation measure, which is supposed to improve upon the Shiller CAPE and the dividend yield. Previously, we built this measure from trailing 10-year earnings. Right now, we build it from 1-year dividends and use it to create a new simulator. But recently, it occurred to me that one can express this measure through the history of annual dividend yields. How so? Let us recall that the valuation measure is defined as

     H(t) = \ln(W(t)/W(0)) - \ln(D(t)/D(0)) - ct

    where  W(t) is the wealth at the end of year  t invested in stocks, and  D(t) is the dividend paid in year  t for these stocks. And  c \approx 4-5\% is the slope of the linear trend. Let  S(t) be the end-of-year index level for year  t. Then  \triangle(t) = D(t)/S(t) is the dividend yield for this year. Total returns are then  Q(t) = \ln(W(t)/W(t-1)) = \ln(S(t)+D(t)) - \ln(S(t-1)) and we can express  \ln(W(t)/W(0)) = Q(1) + \ldots + Q(t).

    The crucial insight:

     Q(t) =  \ln(S(t)/S(t-1)) + \ln(1 + \triangle(t)).

    Plugging this into the main formula for  H(t) and denoting  A(t) := \ln(1 + \triangle(1)) + \ldots + \ln(1 + \triangle(t)) we get:

     H(t) = \ln(S(1)/S(0)) + \ldots + \ln(S(t)/S(t-1)) + A(t) - \ln(D(t)/D(0)) - ct.

    Canceling these logarithms, we get

     H(t) = \ln(S(t)/S(0)) + A(t) - \ln(D(t)/D(0)) - ct

    which we can write as

     H(t) = -\ln(\triangle(t)/\triangle(0)) + A(t) - ct

    which in turn, finally, we can write as

     H(t) = -\ln \triangle(t) + \ln\triangle(0) + \sum\limits_{s=1}^t\ln(1 + \triangle(s)) - ct.

    If  \triangle is a strictly stationary process such that  d(t) = \ln(1 + \triangle(t)) satisfies the Strong Law of Large Numbers (that is, it is ergodic), then this measure converges to the stationary distribution if and only if  c is the mean of this  d(t).
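
    As a quick sanity check of the final formula, here is a minimal Python sketch that computes  H(t) in two ways: from dividend yields alone, and from the original definition through index levels and dividends. The function names and the sample numbers are made up for illustration; they are not from the GitHub repository and not S&P data.

```python
import math

def valuation_from_yields(yields, c):
    """H(t) for t = 0..T using only dividend yields.

    yields[t] is D(t)/S(t) as a fraction (e.g. 0.02); c is the trend slope.
    """
    h, cum = [], 0.0
    for t, y in enumerate(yields):
        if t > 0:
            cum += math.log1p(y)  # A(t) = sum of ln(1 + yield(s)) for s = 1..t
        h.append(-math.log(y) + math.log(yields[0]) + cum - c * t)
    return h

def valuation_direct(levels, dividends, c):
    """H(t) from the definition ln(W(t)/W(0)) - ln(D(t)/D(0)) - ct."""
    h, log_wealth = [], 0.0
    for t in range(len(levels)):
        if t > 0:
            # total log return Q(t) = ln(S(t) + D(t)) - ln(S(t-1))
            log_wealth += math.log(levels[t] + dividends[t]) - math.log(levels[t - 1])
        h.append(log_wealth - math.log(dividends[t] / dividends[0]) - c * t)
    return h

# Made-up numbers, purely for illustration.
levels = [100.0, 108.0, 103.0, 120.0]
dividends = [3.0, 3.1, 3.3, 3.4]
yields = [d / s for d, s in zip(dividends, levels)]
h_yield = valuation_from_yields(yields, c=0.045)
h_direct = valuation_direct(levels, dividends, c=0.045)
```

    Both lists agree up to rounding, which is exactly the cancellation of logarithms above.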

    May 25, 2026

  • Fama-French factors time series modeling

    Well-diversified stock portfolios are well-explained by three factors:

    • market exposure  \beta to the equity premium (total returns minus risk-free returns)
    • size (market capitalization) computed as returns of small minus returns of large stocks
    • value (measured by book-to-market ratio) computed as returns of stocks with high minus low ratio

    The  R^2 is usually very high. Of course, such factors might not be constant over time. For example, the  \beta is famously unstable. To model such portfolios properly, we must also model the evolution of these factors, by showing how the size and value are changing. I am working on this based on

    The market exposure is already modeled since equity premium divided by annual volatility is independent identically distributed Gaussian. Thus I decided to model returns of size and value factors. I took annual returns 1927-2025 from Dartmouth College data library. See the GitHub repository.

    We succeeded for value but failed for size. We cannot use this in our future research!

    May 25, 2026

  • Including bond factors into the new Gaussian model

    Recall this blog post where we modeled stock returns using bond spreads. It continues this blog post, which, in turn, improves upon this foundational post.

    I wish to include the risk spreads BAA-AAA or BAA-Long as factors for stock returns, continuing this research. We use the valuation measure based on one-year dividends, not ten-year earnings. Also, we may include bond returns, continuing this blog post.

    Maybe these risk spreads will improve our prediction. But we need to ensure that innovations and residuals are independent identically distributed and Gaussian.

    Also, the term spread Long-Short might be useful. But it does not fit our Gaussian assumptions for innovations.

    Finally, include Long in our model, since this will allow us to model long-term Treasury returns. Indeed, Long corresponds to 10-year yields but with coupons. This makes it difficult to compute returns explicitly. However, we have 10-year and 9-year zero-coupon bonds which have yields very close to these 10-year coupons. See the Federal Reserve Economic Data. Indeed, let  R(t) be the end-of-year (December daily average, more precisely) rate (assuming these three rates are the same). Then the price of a 10-year zero-coupon bond at end of year  t-1 is  P_0 = (1 + R(t-1)/100)^{-10} and it becomes a 9-year zero-coupon bond with rate  R(t) by end of year  t with price  P_1 = (1 + R(t)/100)^{-9} and the geometric returns are  Q = \ln(P_1/P_0).
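
    The return computation just described fits in a few lines. This is a minimal sketch, assuming rates are quoted in percent as above; the function name and the optional maturity argument are hypothetical.

```python
import math

def zero_coupon_log_return(rate_prev, rate_now, maturity=10):
    """Log return from holding a zero-coupon bond for one year.

    rate_prev: yield in percent at end of year t-1 (maturity years left)
    rate_now:  yield in percent at end of year t (maturity - 1 years left)
    """
    log_p0 = -maturity * math.log1p(rate_prev / 100.0)       # ln P_0
    log_p1 = -(maturity - 1) * math.log1p(rate_now / 100.0)  # ln P_1
    return log_p1 - log_p0

unchanged = zero_coupon_log_return(4.0, 4.0)  # rate stays at 4%
rate_up = zero_coupon_log_return(4.0, 5.0)    # rate rises from 4% to 5%
```

    If the rate does not change, the return is just  \ln(1 + R/100): the bond earns its yield. A rise in the rate lowers the return.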

    We can compute returns for zero-coupon bonds, because we can compute their prices explicitly. A big difficulty from this post is that Long rates do not have residuals which are independent identically distributed and Gaussian.

    Note that, unfortunately, in our previous research we used long rates which are time-inconsistent: DGS10 values are taken at end of year, but LTGOV values, which precede DGS10, are December monthly averages. Only in a recent post did we correct this, taking both to be monthly averages, when we unsuccessfully fit an autoregression for the long-short bond spread (with and without volatility).

    The Python file and data.xlsx in the same GitHub repository show that we tried to fit Long separately, with and without volatility, and failed. We also tried (BAA, Long), with and without volatility, as a vector autoregression, and failed. Finally, we tried the BAA-Long spread, and also did not succeed.

    Further tries to switch to log rates instead of rates failed for the Long and combined model. Except one: If we take log of spread, then this fits autoregression without volatility normalization. Together with the model for the BAA rate, this gives us the right model. In fact, we succeeded in our modeling: For  R_1(t) =  \ln(BAA)(t) and  R_2(t) = \ln(\ln(BAA) - \ln(Long))(t) we model  R_1(t) = a + bR_1(t-1) + Z_{1}(t)V(t) and  R_2(t) = c + dR_2(t-1) + Z_{2}(t) for Gaussian independent identically distributed  Z_1, Z_2.

    Finally, we consider regression of US stock normalized returns upon the spreads. We try 12 versions, with various combinations of volatility, 6 with original spreads and 6 with log spreads. But we could accept only two versions out of these 12:  Q(t)/V(t) = m + S(t)/V(t) + W(t) with  S(t) standing for either the spread or the log spread. We prefer the model with the log spread because the p-values for the Ljung-Box test are further from 0.05.

    This implies we should use this regression of log spreads divided by volatility as a factor in our further models. See the same GitHub repository.

    May 25, 2026

  • Improved Six-Equation Model Selection

    Valuation measure. Continuing the previous post, let us include the valuation measure  H(t) based on one-year dividend, described in previous posts:  H(t) = \ln W_1(t) - \ln D(t) - ct. The regression which defines such measure is an autoregression of order 1 with a linear trend:  H(t) = a + bH(t-1) + Z(t). Its innovations  Z(t) do not satisfy the assumptions 1-5 from the previous post. Therefore, we reject the regression as the model. But we accept this model:

     H(t) = \alpha + \beta H(t-1) + \gamma V(t) + V(t)Z_H(t)

    The values are  \alpha = 0.1699, \beta = 0.8262, \gamma = -0.0129. The  \beta is significantly different from one. There is no unit root.

    Historical factors. The latest (as of 2025) and long-term average of these three factors are:

    1. The volatility  V is 11.77 and 10.51
    2. The BAA rate  R is 5.9 and 6.8
    3. The new valuation measure  H is 0.192 and 0.24

    Domestic returns. Now add the new valuation measure as a factor for domestic returns. We accept this model:  Q_1(t) = a_1 - d_1(R(t) - R(t-1)) + b_1V(t) - c_1H(t-1) + V(t)Z_1(t). The values are  a_1 = 0.2637, d_1 = 0.0553, c_1 = 0.1303, b_1 = -0.0129. All coefficients are significantly different from zero. Also,  R^2 = 48\% which is very high! Usually, annual stock returns are not very predictable.

    International returns. Add the new valuation measure for international returns, although it was built from domestic data. The regression model is accepted, with  R^2 = 39.1\%, and all coefficients are significant except the one for the valuation measure. Without the new valuation measure, the regression has  R^2 = 37.8\%. So we do not need the valuation measure for international returns!

    Covariance and correlation. Take the innovations in the order  Z_1, Z_2, Z_0, Z_V, Z_R, Z_H. We can treat them as multivariate Gaussian independent identically distributed with covariance matrix (times 10000):

    2.026369 0.847703 -0.153965 -3.626994 0.208415 2.063568
    0.847703 2.975298 0.036818 -5.399599 -0.000237 0.863375
    -0.153965 0.036818 1.935904 14.068448 -0.050806 -0.844710
    -3.626994 -5.399599 14.068448 1338.685234 2.754026 -5.842714
    0.208415 -0.000237 -0.050806 2.754026 0.113497 0.261286
    2.063568 0.863375 -0.844710 -5.842714 0.261286 3.152435

    and the correlation matrix

    1.000000 0.380257 -0.077736 -0.070715 0.466564 0.816463
    0.380257 1.000000 0.014113 -0.083611 -0.000410 0.306855
    -0.077736 0.014113 1.000000 0.275118 -0.097698 -0.341934
    -0.070715 -0.083611 0.275118 1.000000 0.217726 -0.090182
    0.466564 -0.000410 -0.097698 0.217726 1.000000 0.469910
    0.816463 0.306855 -0.341934 -0.090182 0.469910 1.000000

    See the data and code in GitHub repository which verify the research here.

    May 9, 2026

  • Improved Five-Equation Model Selection

    1. Methodology. In my previous post, I relied on ACF and QQ plots to choose whether innovations are IID Gaussian. But I thought this is too informal. Let me instead select the model based on the following. Each series of residuals must have  p > 5\% for each of the following 5 statistical tests:

    1. The Jarque-Bera normality test for original values of residuals
    2. The Ljung-Box white noise test for original values of residuals with 5 lags
    3. The Ljung-Box white noise test for original values of residuals with 10 lags
    4. The Ljung-Box white noise test for absolute values of residuals with 5 lags
    5. The Ljung-Box white noise test for absolute values of residuals with 10 lags
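
    The five-test rule above can be sketched in plain Python as follows. This is only an illustration written with the standard library, using the textbook formulas for the Jarque-Bera and Ljung-Box statistics. The code in the GitHub repository may well use library routines instead, and all function names here are hypothetical.

```python
import math

def chi2_sf(stat, df):
    """P(X > stat) for a chi-square variable X with integer df."""
    x = stat / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-x)
    else:
        a, q = 0.5, math.erfc(math.sqrt(x))
    while a < df / 2.0:  # recurrence for the upper incomplete gamma function
        q += x ** a * math.exp(-x) / math.gamma(a + 1.0)
        a += 1.0
    return q

def jarque_bera_pvalue(x):
    """Normality test based on sample skewness and kurtosis."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skew, kurt = m3 / m2 ** 1.5, m4 / m2 ** 2
    return chi2_sf(n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0), 2)

def ljung_box_pvalue(x, lags):
    """White noise test based on the first `lags` autocorrelations."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, lags + 1):
        rho = sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        total += rho ** 2 / (n - k)
    return chi2_sf(n * (n + 2) * total, lags)

def five_test_pvalues(residuals):
    """p-values of the five tests listed above, in the same order."""
    absolute = [abs(v) for v in residuals]
    return {
        "jarque_bera": jarque_bera_pvalue(residuals),
        "ljung_box_5": ljung_box_pvalue(residuals, 5),
        "ljung_box_10": ljung_box_pvalue(residuals, 10),
        "ljung_box_abs_5": ljung_box_pvalue(absolute, 5),
        "ljung_box_abs_10": ljung_box_pvalue(absolute, 10),
    }

def accept(residuals, level=0.05):
    """Accept a series only if every one of the five p-values exceeds level."""
    return all(p > level for p in five_test_pvalues(residuals).values())

# A pure trend is as far from white noise as it gets, so it must be rejected.
trend = [float(t) for t in range(60)]
trend_accepted = accept(trend)
```

    A model is then accepted only if accept returns True for each of its residual series.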

    2. Results. We accept the following models:

    Stock Returns:  Q_k(t) = \ln(W_k(t)) - \ln(W_k(t-1)) is modeled as  Q_k(t) = a_k - d_k(R(t) - R(t-1)) + b_kV(t) + V(t)Z_k(t) for  k = 1, 2 which corresponds to domestic and international geometric stock returns. We also accept the sub-models with  d_k = 0 (no duration) or  b_k = 0 (no volatility as an additive factor) or both.

    Bond Returns: Continue this blog post. Adjust them as  Q_0(t) = W_0(t)/W_0(t-1) - 0.01R(t-1) and regress  \ln(Q_0(t)) = -d_0(R(t) - R(t-1)) + V(t)Z_0(t) or  Q_0(t) - 1 = -d_0(R(t) - R(t-1)) + V(t)Z_0(t). We accept both but reject the augmented models (for both arithmetic and geometric adjusted returns) with the additional terms  a_0 (intercept) and  b_0V(t) (volatility as an additive factor).

    Volatility: The classic AR(1) model for log volatility  \ln V(t) = \alpha + \beta \ln V(t-1) + W_V(t) works.

    Bond Rates: The autoregression with volatility  \ln R(t) = \mu + \gamma \ln R(t-1) + V(t)W_R(t) is accepted. But we reject the one with  \delta V(t) instead of  \mu or with  \mu + \delta V(t). Models with  R(t) instead of  \ln R(t) are rejected. These are stationary models.

    The random walk for logarithms  \ln R(t) - \ln R(t-1) = \delta V(t) + V(t)Z_R(t) is also accepted, as well as an augmented model with volatility as an additive factor:  \ln R(t) - \ln R(t-1) = \mu + \delta V(t) + V(t)Z_R(t). The models without logarithms or volatility or both are rejected. These are non-stationary models.

    3. Choice. We pick the following model:  Q_k(t)  = a_k - d_k(R(t) - R(t-1)) + b_kV(t) + V(t)Z_k(t) for  k = 1, 2 and  \ln(W_0(t)/W_0(t-1) - 0.01R(t-1)) = -d_0(R(t) - R(t-1)) + V(t)Z_0(t) for stock and bond returns. Next, AR(1) for log volatility and  \ln R(t) = \mu + \gamma \ln R(t-1) + V(t)W_R(t) for log rates.

    Stock Returns:  Q_1(t) = 0.2111 - 0.0107 V(t) - 0.0621(R(t) - R(t-1)) + V(t)Z_1(t) for domestic and  Q_2(t) = 0.2684 - 0.0180 V(t) - 0.0390(R(t) - R(t-1)) + V(t)Z_2(t) for international stocks.

    Bond Returns:  \ln(W_0(t)/W_0(t-1) - 0.01R(t-1)) = -0.0596(R(t) - R(t-1)) + V(t)Z_0(t).

    Stock Volatility:  \ln V(t) = 0.8569 + 0.6176 \ln V(t-1) + Z_V(t).

    Bond Rates:  \ln R(t) - \ln R(t-1) = 0.0708 - 0.0411\ln R(t-1) + V(t)Z_R(t).

    The covariance matrix for innovations  Z_1, Z_2, Z_0, Z_V, Z_R (times 10000) is:

    2.218258 0.948039 -0.210064 -6.113641 0.215726
    0.948039 2.975298 0.036818 -5.399599 -0.000237
    -0.210064 0.036818 1.935904 14.068448 -0.050806
    -6.113641 -5.399599 14.068448 1338.685234 2.754026
    0.215726 -0.000237 -0.050806 2.754026 0.113497

    The correlation matrix for innovations is

    1.000000 0.403307 -0.101369 -0.113717 0.459186
    0.403307 1.000000 0.014113 -0.083611 -0.000410
    -0.101369 0.014113 1.000000 0.275118 -0.097698
    -0.113717 -0.083611 0.275118 1.000000 0.217726
    0.459186 -0.000410 -0.097698 0.217726 1.000000

    We can consider  Z_1, Z_2, Z_0, Z_V, Z_R to be multivariate Gaussian independent identically distributed.
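
    To show how the chosen equations fit together, here is a minimal simulation sketch in plain Python. It assumes that the covariance table above is the true covariance multiplied by 10000, and it draws the correlated innovations with a hand-written Cholesky factorization. As starting values it takes the 2025 volatility 11.77 and BAA rate 5.9 quoted elsewhere on this page. The function names and the seed are hypothetical; this is not the simulator from the repository.

```python
import math
import random

# Covariance of (Z_1, Z_2, Z_0, Z_V, Z_R); the printed table is "times 10000".
COV = [[v / 10000.0 for v in row] for row in [
    [2.218258, 0.948039, -0.210064, -6.113641, 0.215726],
    [0.948039, 2.975298, 0.036818, -5.399599, -0.000237],
    [-0.210064, 0.036818, 1.935904, 14.068448, -0.050806],
    [-6.113641, -5.399599, 14.068448, 1338.685234, 2.754026],
    [0.215726, -0.000237, -0.050806, 2.754026, 0.113497],
]]

def cholesky(a):
    """Lower-triangular L with L L^T = a (a must be positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def simulate(years, v0, r0, seed=0):
    """Simulate volatility, BAA rate, stock returns and bond returns year by year."""
    rng = random.Random(seed)
    low = cholesky(COV)
    v_prev, r_prev, path = v0, r0, []
    for _ in range(years):
        e = [rng.gauss(0.0, 1.0) for _ in range(5)]
        # correlated innovations Z = L e
        z1, z2, z0, zv, zr = (sum(low[i][k] * e[k] for k in range(i + 1)) for i in range(5))
        v = math.exp(0.8569 + 0.6176 * math.log(v_prev) + zv)
        r = math.exp(math.log(r_prev) + 0.0708 - 0.0411 * math.log(r_prev) + v * zr)
        dr = r - r_prev
        q1 = 0.2111 - 0.0107 * v - 0.0621 * dr + v * z1   # domestic stocks
        q2 = 0.2684 - 0.0180 * v - 0.0390 * dr + v * z2   # international stocks
        # invert ln(W_0(t)/W_0(t-1) - 0.01 R(t-1)) = -0.0596 dr + V(t) Z_0(t)
        bond_gross = math.exp(-0.0596 * dr + v * z0) + 0.01 * r_prev
        path.append({"V": v, "R": r, "Q1": q1, "Q2": q2,
                     "bond_log_return": math.log(bond_gross)})
        v_prev, r_prev = v, r
    return path

path = simulate(years=30, v0=11.77, r0=5.9)
```

    Each element of path holds one simulated year. Note that volatility is simulated first, because the other four equations use  V(t) of the same year.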

    See the data and code in GitHub repository which verify the research here.

    May 9, 2026

  • Market Models with Gaussian Innovations

    This post focuses on models having innovations which are not only independent identically distributed (IID) but Gaussian. It is clear why this makes them so easy to analyze and simulate.

    Model building goals. We evaluate the model based on innovations.

    • Is each series of innovations a (weak) white noise, measured by the empirical autocorrelation function (ACF) and standard Ljung-Box white noise tests? If yes, this would mean autocorrelations are zero. But this does not yet mean the series is truly IID. Stochastic volatility models show why.
    • Does each series of innovations, after taking absolute values, have zero autocorrelations? We again apply the empirical ACF and the Ljung-Box white noise tests. If yes, and the first answer above is also yes, then it is reasonable to model this series as IID.
    • Is each series of innovations Gaussian? We can ask this question only if we answered affirmatively to the first two. This is answered by making a quantile-quantile plot versus the normal distribution and applying the Jarque-Bera normality test.

    As an attentive reader can see, the techniques are essentially the same as previously. But there are a couple of important differences.

    • First, we apply the Ljung-Box white noise tests based on the (weighted) sum of squares, not the customized sum of absolute values tests we considered previously. I think it is simply easier and better known to apply Ljung-Box tests. The test based on L1 norm did not really show anything special different from L2 tests.
    • Second, we do not apply the Shapiro-Wilk normality test. We consider it to be a bit of an overkill. Jarque-Bera test captures skewness and fat tails commonly present in financial analysis which prevent the data from being normal. And anyway, the Jarque-Bera test is present in the standard Python OLS regression output.

    Let us stress what features we are not interested in.

    • Maximizing  R^2 of linear regressions
    • Information criteria (Akaike/Bayesian)
    • Minimizing the standard error  s of regression residuals.

    Data description. We have annual data 1927-2025. It is available as a spreadsheet here. Take five data series:

    • Total returns invested in S&P 500 and its predecessor, S&P 90:  Q_1(t) during year  t available 1928-2025. End-of-year values were taken by Ian Anderson from Yahoo Finance, and dividend annual data is taken from Robert Shiller’s data library. See the source here.
    • Total returns for international stocks (see remark below):  Q_2(t) available 1970-2025.
    • Total return index value  B(t) for the USA corporate bonds (measured by Bank of America Intercontinental Exchange total return index value, taken from Federal Reserve Economic Data (FRED) web site), available 1972-2025.
    • Annual realized volatility  V(t) for year  t. Recall that this was computed by Angel Piotrowski for 1928-2025.
    • December daily average BAA Moody’s rate:  R(t) available for 1927-2025 also from FRED web site.

    A note on international stocks. The total returns of international stocks are now measured by a customized portfolio of 88% MSCI EAFE and 12% MSCI Canada. I did this adjustment because the data for MSCI EAFE (=developed markets including Europe, Australia and Far East) was available from 1970 on the web site Novel Investor, as opposed to MSCI emerging markets (available only from 1988), but this EAFE index did not include Canada! I thought this is very unfair, since Canada is a major component (~12%) of existing developed market stock ETFs. So I decided to include it manually. The data for Canada was also available as another MSCI index from 1970.

    The simplest model equations. Try modeling  Q_k(t)/V(t) = c_k + Z_k(t) where  Z_k are innovation series. This is in line with our long-standing idea of dividing stock returns by volatility to make them closer to IID Gaussian. It works perfectly well here.

    Also,  (\ln R(t) - \ln R(t-1))/V(t) = Z_R(t) Note that this makes the rates non-stationary: More like a geometric random walk, except we have stochastic volatility here. This is one more remarkable example of how to use stock volatility for bonds, which we discussed earlier.

    Next, the equation for volatility stays the same as in Angel Piotrowski’s analysis:  \ln V(t) = a_V + b_V \ln V(t-1) + Z_V(t).

    Finally, the equation for bond returns is as follows:  B(t)/B(t-1) - 0.01R(t-1) - 1 = -a - d(R(t) - R(t-1)) + V(t)Z_0(t).

    Each  Z is IID Gaussian series with mean zero. This is confirmed by the tests and graphs above.

    Properties of this model. As mentioned above, rates and therefore bond returns are non-stationary. But stock returns and volatility are stationary. Also, stock returns do not have duration in their modeling. In fact, if we include the same difference term, it would be highly significant, with extremely high T-value. But we decided to create the simplest model.

    Extensions. We can increase complexity of this model as follows:

    1. Include  Q_k(t) = c_k - d_k(R(t) - R(t-1)) + V(t)Z_k(t) then the innovations are also IID Gaussian. This involves duration for stocks just like for bonds. The values of coefficients are significantly different from zero. Accept!
    2. Include a constant for increments of log rates:  \ln R(t) - \ln R(t-1) = a_RV(t) + V(t)Z_R(t). The hypothesis  a_R = 0 has  p = 0.3 for the Student T-test. Accept!
    3. On top of item 1, for stock returns (both domestic and international) we can add volatility as an additive factor, not just multiplicative:  Q_k(t) = c_k - d_k(R(t) - R(t-1)) + a_kV(t) + V(t)Z_k(t). Here  a_k is different from zero, judging by the Student T-test. Accept!
    4. We could run the autoregression with stochastic volatility terms  R(t) = a + bR(t-1) + cV(t) + V(t)Z_R(t) but this violates normality of  Z_R. Same would be true for the simple autoregression  R(t) = a + bR(t-1) + Z_R(t). Reject! Unfortunately, this means we must consider a non-stationary model.
    5. We could add volatility to bond returns as an additive factor, not just a multiplicative one. Thus we make regression  B(t)/B(t-1) - 0.01R(t-1) - 1 = -a - d(R(t) - R(t-1)) + bV(t) + V(t)Z_0(t). But this would fail the IID assumption. Reject!
    6. Writing  \ln(B(t)/B(t-1) - 0.01R(t-1)) = -a - d(R(t) - R(t-1)) + V(t)Z_0(t) we replace de facto arithmetic returns with geometric returns, but in a modified way. The IID Gaussian assumption holds. Accept!
    7. Adding volatility to bond returns as an additive factor, not just a multiplicative one: Similarly to 5, but with model as in 6, then the IID fails. Reject!

    The new valuation measure. Following previous blog posts, we consider comparing total annual returns with annual dividend growth and detrending it. Take cumulative quantities, which can be expressed using current dividends:  M(t) = Q_1(1) + \ldots + Q_1(t) - \ln D(t) and regress  M(t+1) - M(t) versus the previous value  M(t) and the time trend  t. We get:  M(t+1) - M(t) = \alpha + \beta M(t) + \gamma t + W(t). Similarly to the article we rewrite this as a simple autoregression for detrended  H(t) = M(t) - mt. This autoregression will also have residuals  U(t). Such residuals are tested and they do not pass our tests: They are not IID. Reject!

    The long-short term spread. We considered this spread  S(t) between 10-year and 3-month average December annual rates, and its predecessors from 1927.

    • Classic autoregression:  S(t) = a + bS(t-1) + Z(t).
    • AR with stochastic volatility:  S(t) = a + bS(t-1) + cV(t) + Z(t)V(t).
    • AR with stochastic volatility but without volatility as an additive factor:  S(t) = a + bS(t-1) + Z(t)V(t).

    Unfortunately, none has Gaussian residuals. Reject!

    May 7, 2026

  • Bond Returns 1973-2025

    1. Introduction and data description
    2. Relation between price returns and wealth process
    3. Fitting linear regression
    4. Conclusion

    1. Introduction and data description. Continue the research after updating the data for 2025. In the previous post, we discussed total returns for S&P 500 in detail. Now we discuss bond returns. We take the same data as above, and for bond wealth process  B , we take FRED series BAMLCC0A0CMTRIV, last trading day of year 1972-2025. We start from  B(0) = 100 for the last trading day of 1972, and go on from there. This way we can compute log price returns for year  t.

    2. Relation between price returns and wealth process. We derive log price returns from this wealth process using bond math. Then we run linear regression of this log price returns versus rate change. This regression coefficient is called the duration. In continuous setting, this is defined as minus derivative of the log price with respect to the interest rate. More precisely, we use yield to maturity as this interest rate.

    Wealth at end of year  t-1 is  B(t-1) and at end of year  t is  B(t). Assume the price of this bond at end of year  t-1 is  P(t-1) = B(t-1). Then the coupon paid during year  t is  B(t-1)R(t-1). The wealth at end of year  t is the sum of this coupon and the price  P(t) at end of year  t. Combining this, we get:  P(t) + B(t-1)R(t-1) = B(t).

    Dividing by  P(t-1) = B(t-1), we have  P(t)/P(t-1) = B(t)/B(t-1) - R(t-1). This gives us the formula for log price returns:

     Q(t) = \ln(P(t)/P(t-1)) = \ln(B(t)/B(t-1) - R(t-1)).

    Since  B(t-1), B(t), R(t-1) are observed, we can compute  Q(t). If we fit this model, we can rewrite

     \ln(B(t)/B(t-1)) = \ln(\exp(Q(t)) + R(t-1)).
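
    The two formulas are inverse to each other, which a short sketch makes concrete. Here the rate is a fraction (0.06 for 6%), as in the formulas of this post; with rates in percent one would write  0.01R(t-1) instead. The function names and the numbers are made up for illustration.

```python
import math

def log_price_return(b_prev, b_now, rate_prev):
    """Q(t) = ln(B(t)/B(t-1) - R(t-1)), with rate_prev as a fraction."""
    return math.log(b_now / b_prev - rate_prev)

def log_total_return(q, rate_prev):
    """Inverse map: ln(B(t)/B(t-1)) = ln(exp(Q(t)) + R(t-1))."""
    return math.log(math.exp(q) + rate_prev)

# Wealth grows from 100 to 104 while the coupon rate was 6%,
# so the price itself must have fallen by about 2%.
q = log_price_return(b_prev=100.0, b_now=104.0, rate_prev=0.06)
total = log_total_return(q, rate_prev=0.06)
```

    In this toy example the price return is  \ln 0.98 and the round trip recovers the total return  \ln 1.04.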

    3. Fitting linear regression. As mentioned above, the log price returns  Q(t) can be regressed upon  -(R(t) - R(t-1)). We do not include an intercept into this regression, since this has the meaning of a discrete approximation of a derivative. The regression coefficient is 6.1053, and the residual analysis is done by the plots below. These are Gaussian but not quite independent identically distributed. This is confirmed by the p-values of Shapiro-Wilk and Jarque-Bera normality tests:  92\% and  75\%. And by the Ljung-Box test for 5 lags for original and absolute values of residuals, which give us  2.8\% and  4.3\%.

    To make residuals independent identically distributed, we divide each term by annual volatility  V(t). This is equivalent to assuming that the residuals are heteroscedastic: These are white noise multiplied by  V(t). This gives us 5.96 regression coefficient, and  p = 86\% for Shapiro-Wilk normality test,  p =91\% for Jarque-Bera normality test. Also, the Ljung-Box test for the first 5 lags of original residuals has  p = 12\% and for absolute residuals has  p = 57\%. Finally, see the graphs for residuals below.
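
    For a regression without intercept, the least-squares slope has a closed form, and dividing both sides by  V(t) only changes the weights. Below is a minimal sketch of both versions in plain Python. The function name and the toy data are hypothetical; the actual fit in this post uses the FRED series.

```python
def duration_through_origin(q, rate_change, vol=None):
    """Least-squares slope d in q = -d * rate_change + noise, no intercept.

    If vol is given, both sides are divided by vol first, which corresponds
    to residuals of the form V(t) * Z(t).
    """
    if vol is None:
        vol = [1.0] * len(q)
    y = [qi / vi for qi, vi in zip(q, vol)]
    x = [-dr / vi for dr, vi in zip(rate_change, vol)]
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

# Noise-free toy data with duration exactly 6 (made-up numbers).
rate_change = [0.01, -0.005, 0.002, -0.012]
q = [-6.0 * dr for dr in rate_change]
vol = [8.0, 12.0, 10.0, 15.0]
d_plain = duration_through_origin(q, rate_change)
d_normalized = duration_through_origin(q, rate_change, vol)
```

    Without noise both versions return the same duration. With real data they differ slightly, as the two coefficients reported above do.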

    4. Conclusion. We have  \ln(B(t)/B(t-1) - R(t-1)) = -d(R(t) - R(t-1)) + Z(t)V(t) where  Z are independent identically distributed Gaussian. We succeeded in modeling the bond market! Duration in both cases is around  d = 6. What is more, this is better than previous models:

    • We clearly explain the meaning of duration as a measure of dependence of the price upon yield to maturity, and do not rely on approximate formulas, such as in this blog post
    • We used volatility in our model, but stock volatility used for bond returns is highly unusual
    • Residuals are independent identically distributed and Gaussian

    Together with the two previous blog posts, we can now create a complete model of:

    • BAA rates
    • Dividends
    • The volatility
    • The valuation measure
    • Total stock returns
    • Total bond returns

    These are 6 time series with 5 innovation sequences. Thus we can use it to create a simpler version of the financial simulator.

    February 2, 2026

  • New Valuation Measure based on Dividends

    1. Motivation of the new valuation measure
    2. Fit autoregression with linear trend as before
    3. Use this valuation measure for modeling returns
    4. Include bond rates and duration
    5. Conclusion

    1. Motivation of the new valuation measure. We continue the previous blog post. We replicate the valuation measure here. We use updated data for 2025. Previously we did this with 10-year earnings but now we wish to do this with 1-year dividends.

    We prefer dividends to earnings for the following reasons:

    • Dividends are the actual cash paid, and they are not disputable, but earnings depend on accounting standards
    • Dividends are more predictable, since companies do not like to cut them, but earnings are highly volatile
    • Earnings of companies can be negative, and thus suffer from the aggregation bias, but dividends are nonnegative

    2. Fit autoregression with linear trend as before: Take the index level  S(t) at end of year  t and dividends  D(t) paid at year  t. Total returns and dividend growth are given by  Q(t) = \ln(S(t)+D(t)) - \ln (S(t-1)) and  G(t) = \ln D(t) - \ln D(t-1).

    We model the cumulative difference  C(t) = Q(1) + \ldots + Q(t) - G(1) - \ldots - G(t) as a simple autoregression of order 1 with trend:  C(t) - ct = a + b(C(t-1) - c(t-1)) + Z(t) where  Z are innovations. The valuation measure then is defined as  H(t) = C(t) - ct.

    This can be written as  Q(t) - G(t) = \alpha + \beta t - \gamma C(t-1) + Z(t). We fit  \alpha = 0.0436, \beta = 0.0121, \gamma = 0.2443. The autoregression becomes the random walk (there is no mean-reversion) if  \gamma = 0 but this hypothesis has  p = 0.045\% which is very low. Next, the trend coefficient is zero if  \beta = 0 which has  p = 0.06\%.

    From here, we can deduce  a, b, c, and compute the valuation measure  H(t) = C(t) - ct. The measure, as before, shows us that the market is not overvalued, since it is average compared to the historical standard.
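
    One way to deduce  a, b, c: expanding the autoregression gives  C(t) - C(t-1) = (a + bc) + c(1-b)t - (1-b)C(t-1) + Z(t), and the left-hand side is  Q(t) - G(t). Matching coefficients gives  \gamma = 1 - b, \beta = c(1 - b), \alpha = a + bc. A small sketch of this conversion, with a hypothetical function name:

```python
def autoregression_parameters(alpha, beta, gamma):
    """Map the fitted (alpha, beta, gamma) to (a, b, c) of the autoregression."""
    b = 1.0 - gamma    # autoregression coefficient
    c = beta / gamma   # trend slope
    a = alpha - b * c  # intercept
    return a, b, c

a, b, c = autoregression_parameters(alpha=0.0436, beta=0.0121, gamma=0.2443)
```

    With the fitted values above, the trend slope  c comes out just under 5% per year.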

    Analysis of residuals: See the autocorrelation function plots for  Z and for  |Z| as well as the quantile-quantile plot for  Z. The Shapiro-Wilk and the Jarque-Bera test give us  p = 29\% and  25\%.

    We can approximately assume that residuals are independent identically distributed Gaussian, although the autocorrelation function for lag 1 for the absolute values of innovations raises questions.

    3. Use this valuation measure for modeling returns. We can model total stock returns  Q(t) with dividends.

    Model 1. Since we know how to model dividend growth from the previous blog post, together with annual volatility, we can simply model stock returns using three time series:

    • the new valuation measure  H as autoregression
    • volatility  V as another autoregression on the log scale
    • normalized dividend growth  F(t) = \ln(D(t)/D(t-1))/V(t) as yet another autoregression

    Model 2. However, we can also regress  Q(t) upon  H(t-1) as follows:

     Q(t) = h - kH(t-1) + W(t).

    We get  h = 0.0933, k = 0.131. Also the p-value for hypothesis  k = 0 is  p = 3.6\%. The plots for residuals  W are below. These residuals are independent identically distributed but not normal. The same is confirmed by the two normality tests, which give us extremely low p-values.

    This model uses four time series, but with only three series of innovations:

    • returns  Q regressed upon last year’s new valuation measure  H(t-1)
    • the new valuation measure  H as the detrended difference of total returns and dividend growth
    • volatility  V as another autoregression on the log scale
    • normalized dividend growth  \ln(D(t)/D(t-1))/V(t) as yet another autoregression

    The second time series is without new innovations: Indeed, we simply write  H(t) = Q(t) - G(t) - c + H(t-1) from the definition of the new valuation measure; and this does not have any new innovations. We modeled  Q and  G separately.

    Model 3. Let us modify Model 2 to include division by volatility: We divide by  V both returns  Q and the right-hand side.

     Q(t)/V(t) = l/V(t) + m - kH(t-1)/V(t) + W(t).

    We get  l = 0.2468, m = -0.0147, k = 0.1576. The p-values are all  0.1\% or less. The normality tests for innovations  W show p-values above 90% and this is confirmed by the plot below. The values of W can be modeled as independent identically distributed Gaussian, therefore; see the three plots below.

    This model also uses four time series but with three series of innovations, as in Model 2.

    4. Include bond rates and duration. Following the previous blog post, we include rate change  R(t) - R(t-1) in our time series models. Here  R(t) is the BAA rate, December daily average for year  t.

    Model 1. Try to include this rate change as a factor in dividend growth model  F. The two other time series: the valuation measure  H and the volatility  V do not need rate change as the factor. We get:

     F(t) - F(t-1) = a - bF(t-1) - c(R(t) - R(t-1)) + U(t).

    But we run into problems: The coefficient  c is not significantly different from zero, with  p = 70\% and the autocorrelation function and quantile-quantile plots for residuals  U show that they are not independent identically distributed and not Gaussian, see below.

    We get similar results if  R(t) - R(t-1) is divided by  V(t). Thus we abandon this idea of including duration (dependence upon rate change) in normalized log dividend growth.

    Finally, try to include  R(t-1) instead of  R(t) - R(t-1). This means using the rate itself instead of the rate change as a factor. Or normalize this rate by volatility:  R(t-1)/V(t). In each case, we still get plots like the ones above for regression residuals.

    Conclusion: We failed to model normalized dividend growth using rate or rate change for BAA bonds.

    Model 2. Include duration in the regression for total returns, together with the valuation measure:

     Q(t) = k - hH(t-1) - d(R(t) - R(t-1)) + W(t).

    We get  k = 0.0945, h = 0.0960, d = 0.0834 with p-values 8.6% for valuation coefficient zero and less than 0.1% for intercept and duration. Also, the residuals are Gaussian, with Shapiro-Wilk and Jarque-Bera normality tests giving us  p = 16\% and  p =18\%. But they are not independent identically distributed. See the three graphs below.

    Conclusion: We failed to include duration in total returns modeling without normalizing by volatility.

    Model 3. Include duration in the regression for total returns, together with the valuation measure, and divide by volatility:

     Q(t)/V(t) = l/V(t) + m - hH(t-1)/V(t) - d(R(t) - R(t-1))/V(t) + W(t).

    We get a much better fit than without the duration or in Model 2:  l = 0.2296, m = -0.0129, h = 0.1303, d = 0.0553 with p-values 0.4% for valuation coefficient zero and 0.1% or less for others. Also, the residuals are Gaussian, with Shapiro-Wilk and Jarque-Bera normality tests giving us  p = 50\% and  p =77\%. Finally, looking at autocorrelation function plots for  W and for  |W| we see that residuals are independent identically distributed Gaussian.
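
    Multiplying the regression by  V(t) gives  Q(t) = l + mV(t) - hH(t-1) - d(R(t) - R(t-1)) + V(t)W(t). Since  W has mean zero, the first four terms are the mean return given the factors. A minimal sketch with the fitted coefficients as defaults; the function name is hypothetical, and rates are in percent as elsewhere in this post.

```python
def model3_mean_return(vol, h_prev, rate_change,
                       l=0.2296, m=-0.0129, h=0.1303, d=0.0553):
    """Mean of Q(t) given V(t), H(t-1) and the rate change R(t) - R(t-1)."""
    return l + m * vol - h * h_prev - d * rate_change

# Long-term average volatility 10.51, neutral valuation, no rate change.
baseline = model3_mean_return(vol=10.51, h_prev=0.0, rate_change=0.0)
# The same year, but with the BAA rate rising by one percentage point.
rate_shock = model3_mean_return(vol=10.51, h_prev=0.0, rate_change=1.0)
```

    The difference between the two values is exactly the duration coefficient  d. Note that  V(t) is the volatility of the same year, so this is a conditional mean rather than a forecast made in advance.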

    Conclusion: Here we succeeded in including the duration as a factor for regression modeling of total returns after normalizing.
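
    The normalized regression of Model 3 is an ordinary least squares fit with four regressors:  1/V(t), a constant,  -H(t-1)/V(t), and  -(R(t) - R(t-1))/V(t). Below is a minimal sketch of such a fit using only the Python standard library. It is not the code used for this post. The data are simulated: the ranges for volatility, valuation measure, and rate change, as well as the residual standard deviation, are hypothetical. Only the four coefficients are taken from the estimates above.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def ols(rows, y):
    """Least squares coefficients from the normal equations X'X b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

# Coefficients l, m, h, d quoted in the post for Model 3.
L_COEF, M_COEF, H_COEF, D_COEF = 0.2296, -0.0129, 0.1303, 0.0553

rng = random.Random(3)
rows, y = [], []
for _ in range(2000):
    v = rng.uniform(5.0, 25.0)      # hypothetical volatility V(t)
    h_prev = rng.gauss(0.0, 0.5)    # hypothetical valuation measure H(t-1)
    dr = rng.gauss(0.0, 0.7)        # hypothetical rate change R(t) - R(t-1)
    w = rng.gauss(0.0, 0.01)        # hypothetical Gaussian residual W(t)
    rows.append([1.0 / v, 1.0, -h_prev / v, -dr / v])
    y.append(L_COEF / v + M_COEF - H_COEF * h_prev / v - D_COEF * dr / v + w)

l_hat, m_hat, h_hat, d_hat = ols(rows, y)
print(f"l = {l_hat:.4f}, m = {m_hat:.4f}, h = {h_hat:.4f}, d = {d_hat:.4f}")
```

    The signs of the last two regressors are flipped so that the fitted coefficients are  h and  d themselves, matching the way the model is written above.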

    5. Conclusion: We can reasonably model the new valuation measure using one-year dividends, not trailing ten- or five-year earnings, as in previous articles or blog posts. This might be better, since in previous models we used both dividends and earnings, but here we use only dividends. It is useful to include rate change as a factor in a regression for total returns, but only after normalizing, and not for normalized dividend growth. This updates our blog post. In the next post, we consider total corporate bond returns modeling using bond rates.

    January 30, 2026

  • Updates for 2025

    Dear readers, after a long break, I am back. I updated the annual volatility and other data for S&P 500 for the year 2025. The data are available here.

    1. Data Updates
    2. New Graphs
    3. Total Returns
    4. Volatility Autoregression
    5. Price Returns
    6. BAA Bond Rates
    7. Dividend Growth
    8. Conclusion

    1. Data Updates. Annual volatility is computed as the empirical standard deviation of daily log changes multiplied by 1000 (for normalizing). The end-of-year price for S&P 500 in 2025 is also updated. We also add S&P 500 dividends for 2025. Now we have data on volatility for 1928-2025, on dividends for 1927-2025, and end-of-year level of S&P 500 for 1927-2025 too.
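
    As an illustration of this definition, here is a minimal sketch of the volatility computation. The daily closing prices are made up, and the use of the sample standard deviation (dividing by n - 1) is an assumption, since the post does not specify which version is used.

```python
import math
import statistics

def annual_volatility(daily_closes, scale=1000.0):
    """Standard deviation of daily log changes, multiplied by `scale`."""
    log_changes = [
        math.log(today / yesterday)
        for yesterday, today in zip(daily_closes, daily_closes[1:])
    ]
    # Assumption: sample standard deviation (n - 1 in the denominator).
    return scale * statistics.stdev(log_changes)

# Hypothetical daily closes, for illustration only (not real S&P 500 data).
closes = [100.0, 101.0, 100.5, 102.0, 101.2, 103.0]
print(round(annual_volatility(closes), 2))
```

    With this scaling, a value near 10 corresponds to a daily standard deviation of about 1%.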

    We added the dividend data for 1927 as well, to increase the number of data points. This is fine, since the S&P 90 (a predecessor of the S&P 500) was created in 1926, and the data are taken from Robert Shiller’s data library.

    The volatility for 2025 is 11.77. This is higher than both the long-term average 10.51 and the 2024 volatility 7.98. See the original post with computations of Angel Piotrowski for 1928-2023 and its previous update for 2024.

    Dividends for 2025 are 78.92, which is about 5.5% higher than dividends for 2024, which are 74.83.

    The S&P 500 increased a lot in 2025, by about 16.4%: End-of-year 2024 level is 5881.63, but end-of-year 2025 level is 6845.5.

    We cannot yet provide earnings for 2025, since earnings for 2025 Quarter 4 are reported only in 2026 Quarter 1, which is still ongoing. We will provide them as soon as we can.

    Finally, we added the BAA rate: December 2025 daily average. BAA bonds are the lowest-rated investment-grade corporate bonds. The rate in December 2025 is 5.9, slightly higher than 5.8 for December 2024.

    2. New Graphs. We graph the index, dividend, rates, and volatility.

    Above, logarithmic plots of index levels and dividends for 1927-2025. Below, the annual volatility and December BAA rate.

    The data are published on my web page, in a new tab named Financial Data Library.

    Let us replicate this post: Make stock returns IID Gaussian.

    We have the following notation:

    •  S(t) the S&P level at end-of-year  t.
    •  D(t) the dividend of S&P in year  t.
    •  R(t) December daily average BAA rate during year  t.
    •  V(t) annual realized volatility for the S&P for year  t.

    3. Total Returns. We continue this blog post. Compute total nominal geometric returns for the S&P 500:  Q(t) = \ln(S(t) + D(t)) - \ln S(t-1) for year  t. Below is the graph of returns 1928-2025.
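
    As a worked example of this formula, the sketch below computes the total return for 2025 from the index levels and dividends quoted in the Data Updates section. The function name is introduced here for illustration.

```python
import math

# Figures quoted in the Data Updates section of this post.
S_2024 = 5881.63   # end-of-year 2024 S&P 500 level
S_2025 = 6845.5    # end-of-year 2025 S&P 500 level
D_2025 = 78.92     # S&P 500 dividends for 2025

def total_return(s_prev, s_now, dividend):
    """Total nominal geometric return Q(t) = ln(S(t) + D(t)) - ln S(t-1)."""
    return math.log(s_now + dividend) - math.log(s_prev)

q_total = total_return(S_2024, S_2025, D_2025)
print(f"Q(2025) = {q_total:.4f}")
```

    For 2025 this gives a total return of roughly 0.16 in logarithmic terms.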

    Now plot the autocorrelation function for these total returns  Q. And another autocorrelation function for their absolute values  |Q|. Both plots are below, and both are consistent with the white noise hypothesis. It is surprising that we, in fact, do not have to divide total returns by annual volatility to make it white noise.
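
    For readers who want to reproduce this kind of check, here is a minimal sketch of the sample autocorrelation function together with the usual approximate 95% band for white noise. The series is simulated Gaussian noise of length 98 (the number of years in 1928-2025). Its mean and standard deviation are hypothetical and are not estimates from the data.

```python
import math
import random

def autocorrelation(x, lag):
    """Sample autocorrelation of the sequence x at a positive lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    numer = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag))
    return numer / denom

def acf(x, max_lag):
    """Autocorrelations at lags 1, ..., max_lag."""
    return [autocorrelation(x, k) for k in range(1, max_lag + 1)]

# Simulated white noise stands in for the annual returns (not real data).
random.seed(0)
q = [random.gauss(0.1, 0.19) for _ in range(98)]
band = 1.96 / math.sqrt(len(q))   # approximate 95% band under white noise

for name, series in [("Q", q), ("|Q|", [abs(v) for v in q])]:
    values = acf(series, 5)
    outside = sum(abs(r) > band for r in values)
    print(name, [round(r, 3) for r in values], "outside band:", outside)
```

    Checking  |Q| in addition to  Q matters because a series can be uncorrelated and still have dependent magnitudes, which would show up in the second plot.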

    The quantile-quantile plot of these returns is shown as well. We see that the returns are not Gaussian. This is consistent with the normality testing. Shapiro-Wilk and Jarque-Bera tests give us  p = 0.02\% and  p = 3\cdot 10^{-5}.
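
    The Jarque-Bera test is simple enough to write out by hand. The sketch below uses the standard definition of the statistic and its asymptotic chi-square distribution with two degrees of freedom. The two samples are simulated, with one hypothetical extreme value added to the second one. This is an illustration and not the code behind the p-values above.

```python
import math
import random

def jarque_bera(x):
    """Return the Jarque-Bera statistic and its asymptotic p-value."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    stat = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # A chi-square variable with 2 degrees of freedom has
    # survival function exp(-x / 2).
    return stat, math.exp(-stat / 2.0)

random.seed(4)
gaussian = [random.gauss(0.0, 1.0) for _ in range(98)]
heavy = gaussian[:-1] + [-6.0]   # one hypothetical extreme year added

for name, sample in [("Gaussian", gaussian), ("with outlier", heavy)]:
    stat, p = jarque_bera(sample)
    print(f"{name}: statistic = {stat:.2f}, p = {p:.4g}")
```

    In practice one would more likely call scipy.stats.jarque_bera and scipy.stats.shapiro. The hand-written version only shows that the test measures skewness and excess kurtosis, which is why a single extreme year can be enough to reject normality.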

    What if we do divide these total returns by annual volatility? We get  N(t) = Q(t)/V(t). Let us plot the autocorrelation function for  N and the autocorrelation function for  |N|.

    These are still consistent with white noise, although, in my view, the autocorrelation function values are greater. The quantile-quantile plot versus the normal distribution is below. We get  p = 12\% and  p = 71\% for Shapiro-Wilk and Jarque-Bera normality tests, so we do not reject normality.

    4. Volatility Autoregression. We continue this blog post. Let us now fit the auto-regression model for logarithm of volatility:

     \ln V(t) - \ln V(t-1) = \alpha + \beta \ln V(t-1) + W(t).

    We fit  \beta = -0.3824 and  \alpha = 0.8569. Also, plotting the autocorrelation function of  W and of  |W| we see:

    This is consistent with the assumption that  W(t) are independent identically distributed. But whether they are Gaussian is more ambiguous, see the quantile-quantile plot below. The Shapiro-Wilk and Jarque-Bera tests give us  p = 1.1\% and  p = 7.5\% respectively.
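
    To see what this autoregression implies, the sketch below simulates a long path from the fitted model and then refits it by ordinary least squares. The coefficients are the estimates quoted above. The innovation standard deviation is hypothetical, since the post does not report it, and the innovations are taken to be Gaussian only for simplicity.

```python
import math
import random

def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def simulate_log_vol(alpha, beta, sigma, start, years, rng):
    """Simulate ln V(t) - ln V(t-1) = alpha + beta ln V(t-1) + W(t)."""
    path = [start]
    for _ in range(years):
        prev = path[-1]
        path.append(prev + alpha + beta * prev + rng.gauss(0.0, sigma))
    return path

ALPHA, BETA = 0.8569, -0.3824   # estimates quoted in the post
SIGMA = 0.35                    # hypothetical innovation std, not from the post

rng = random.Random(1)
log_v = simulate_log_vol(ALPHA, BETA, SIGMA, math.log(11.77), 5000, rng)
x = log_v[:-1]
y = [b - a for a, b in zip(log_v, log_v[1:])]
alpha_hat, beta_hat = fit_line(x, y)

print(f"refit: alpha = {alpha_hat:.3f}, beta = {beta_hat:.3f}")
print(f"mean-reversion level of ln V: {-ALPHA / BETA:.3f}")
print(f"implied volatility level: {math.exp(-ALPHA / BETA):.2f}")
```

    The fitted model pulls  \ln V(t) back toward  -\alpha/\beta \approx 2.24, which corresponds to a volatility level of about 9.4, the same order of magnitude as the long-term average 10.51 quoted earlier.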

    5. Price Returns. These are computed as  Q(t) = \ln S(t) - \ln S(t-1). We continue this blog post. These contain only price changes, not dividends. The autocorrelation function for these values and their absolute values is plotted below.

    Quite close to independent identically distributed! Next, the quantile-quantile plot versus the Gaussian distribution: This shows price returns are not Gaussian, similarly to total returns. This is confirmed by the familiar Shapiro-Wilk and Jarque-Bera tests, which give  p = 7\cdot 10^{-5} and  p = 1.4\cdot 10^{-6}.

    Let us divide price returns by volatility. Below we plot the autocorrelation function of  Q/V and of  |Q/V| and see that this is still consistent with being independent identically distributed. Only the autocorrelation values are slightly higher.

    The Shapiro-Wilk and Jarque-Bera tests give us  p = 9\% and  p = 75\% respectively. See also the quantile-quantile plot. This is much closer to normal distribution.

    Finally, let us plot price and total returns together. We see that, of course, total returns are greater than price returns.
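
    The gap between the two can be written out explicitly. In the short derivation below, the subscripts total and price are labels introduced here only to tell apart the two returns, both of which are denoted by  Q in the text.

```latex
Q_{\mathrm{total}}(t) - Q_{\mathrm{price}}(t)
  = \ln(S(t) + D(t)) - \ln S(t)
  = \ln\left(1 + \frac{D(t)}{S(t)}\right) > 0.
% With the 2025 figures quoted in the Data Updates section:
% D/S = 78.92/6845.5 \approx 0.0115, so the gap is \ln(1.0115) \approx 0.0115.
```

    So the difference is the logarithm of one plus the dividend yield, which is positive whenever dividends are positive. For 2025 it is a little over one percentage point.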

    6. BAA Bond Rates. We continue this blog post. We also fit a simple autoregression:

     R(t) - R(t-1) = a + bR(t-1) + Z(t).

    We get:  a = 0.43 and  b = -0.062. But the p-value for the null hypothesis of a random walk, which corresponds to  b = 0, is  p = 8\% so we fail to reject the random walk hypothesis. This is not acceptable from the financial point of view, since the random walk implies  R will go negative eventually. Also, consider the graphs of the autocorrelation function for  Z and for  |Z|. These are not independent identically distributed.

    Both p-values for normality tests of innovations  Z are less than 0.01%. The quantile-quantile plot is shown below for  Z. It is clear these are not Gaussian.

    Instead, like for the volatility, let us take the logarithm:

     \ln R(t) - \ln R(t-1) = a + b\ln R(t-1) + Z(t).

    We get  a = 0.11 and  b = -0.058 with  p = 9.2\% for the null hypothesis of random walk, which corresponds to  b = 0. Plot the autocorrelation function for  Z and for  |Z| below:

    Let us modify this to try a random walk model: Are the increments  L(t) = \ln R(t) - \ln R(t-1) really independent identically distributed Gaussian? Below are the autocorrelation function plots for  L and for  |L| which show that these are independent identically distributed.

    Next, the quantile-quantile plot versus the normal distribution is much closer to the straight line than before for other models of the BAA rate. This is confirmed by the Shapiro-Wilk and Jarque-Bera tests, which give us  p = 0.4\% and  p = 10^{-5}. These still reject the null hypothesis of normality, but are not as small as the previous ones.
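
    The financial argument for working with the logarithm of the rate can be illustrated by simulation. The sketch below compares a random walk for the rate itself with a random walk for its logarithm, and counts how many simulated paths ever become negative. Both innovation standard deviations and the time horizon are hypothetical, and the innovations are Gaussian only for simplicity. The starting value 5.9 is the December 2025 rate quoted above.

```python
import math
import random

def simulate_rates(start, sigma_level, sigma_log, years, rng):
    """Simulate a random walk in the rate and another one in its logarithm."""
    level, log_rate = [start], [math.log(start)]
    for _ in range(years):
        level.append(level[-1] + rng.gauss(0.0, sigma_level))
        log_rate.append(log_rate[-1] + rng.gauss(0.0, sigma_log))
    return level, [math.exp(v) for v in log_rate]

rng = random.Random(2)
paths = 1000
negative_level = 0
negative_log = 0
for _ in range(paths):
    # Hypothetical parameters: std 0.8 for the level, 0.12 for the logarithm.
    level, from_log = simulate_rates(5.9, 0.8, 0.12, 200, rng)
    negative_level += min(level) < 0
    negative_log += min(from_log) < 0

print(f"paths going negative, random walk in R: {negative_level} of {paths}")
print(f"paths going negative, random walk in ln R: {negative_log} of {paths}")
```

    A rate obtained by exponentiating its logarithm can never be negative, whatever the distribution of the innovations, while a random walk in the rate itself crosses zero on a sizable share of long paths.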

    Next, try to make these independent identically distributed but non-Gaussian terms Gaussian. We do the same as in sections 3 and 5: Divide the log rate change by volatility. We get  N(t) = L(t)/V(t) = \ln(R(t)/R(t-1))/V(t). Below are autocorrelation function plots for  N and for  |N|.

    The quantile-quantile plot below shows these are Gaussian terms, and the same is shown by the Shapiro-Wilk and Jarque-Bera tests with  p = 46\% and  p = 91\%. This was done in the spirit of this blog post.

    7. Dividend Growth is computed as  G(t) = \ln(D(t)/D(t-1)). We continue this blog post. See below the autocorrelation function plots for  G and for  |G| which show autocorrelation at lag 1, and also the quantile-quantile plot.

    Define  N(t) = G(t)/V(t) and analyze it as well. The data are closer to the Gaussian distribution, but normality is still rejected, with  p = 0.03\% and  p = 0.01\%.

    See the plot of the dividend growth below. It is quite volatile but not as much as the stock returns. But we clearly see the persistence: It makes sense to model dividend growth or its normalized version as the simple autoregression. This is different from annual earnings growth, where dividing by volatility makes it independent identically distributed, see this blog post.

    Let us try the simple autoregression for normalized annual dividend growth  N(t) = \ln(D(t)/D(t-1))/V(t).

     N(t) - N(t-1) = k + mN(t-1) + Y(t)

    We have  k = 0.0044 and  m = -0.66 with  p < 10^{-9}. Shapiro-Wilk and Jarque-Bera normality tests give us  p = 6\cdot 10^{-5} and  p = 1.7\cdot 10^{-7}. See the graphs below: autocorrelation for  Y, autocorrelation for  |Y|, and the quantile-quantile plot for  Y.

    Here we have independent identically distributed but not Gaussian residuals  Y.
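
    To read these estimates, it can help to rewrite the autoregression in the standard AR(1) form. The short derivation below uses only the estimates quoted above. The symbol  \mu for the long-run mean is introduced here.

```latex
N(t) = k + (1 + m)N(t-1) + Y(t), \qquad 1 + m = 1 - 0.66 = 0.34,
\mu = -\frac{k}{m} = \frac{0.0044}{0.66} \approx 0.0067.
```

    Since  |1 + m| < 1, normalized dividend growth reverts to  \mu, with about one third of each year's deviation from  \mu carried over to the next year. This is the persistence visible in the plot of dividend growth.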

    8. Conclusion. Here, we fitted time series Markov models for dividends, price and total returns, volatility, and the BAA rates. In the next post, we will discuss updates for the valuation measure based on one-year dividends instead of trailing 10-year earnings, and regression modeling using rate change and duration, continuing this post and this post.

    January 28, 2026

  • Advanced Version

    In the repository https://github.com/asarantsev/advanced we added another page with advanced version of the simulator, where we can pick initial conditions for model factors:

    • S&P 500 Annual Volatility (VIX)
    • S&P 500 Bubble Valuation Measure
    • Moody’s BAA Bond Rate
    • Treasury 10Y – 3M Long-Short Bond Spread

    We repainted the button Compute in orange. For the advanced option, we made the legend font and the web page font smaller. And we updated the default initial conditions for the main version of the simulator to match June 2025.

    See the historical graphs of these four measures below. We can see their historical range. We included them in the HTML page for the advanced version of the simulator.

    Below we see the autocorrelation function plots for original and absolute values, and the quantile-quantile plot versus the normal distribution, of residuals for each of the seven regressions. First, let us consider four factors: volatility, BAA rate, and spread (autoregressions for logarithms) and earnings growth divided by volatility (needed to compute the evolution of the bubble measure; here no regression).

    Then consider regression residuals for total returns of three asset classes: S&P 500, International Stocks, USA Corporate Bonds.

    July 10, 2025
