Research: My First Paper

I will be too busy with finals to write for the next week or so, so I decided to leave something here for people to think about while I’m away.

I was fortunate enough to do research with one of my professors, who is quite literally a genius. We wanted to see if historical market prices were adequate predictors of current stock prices (after all, history does repeat itself).

If you’re looking for some light reading on a Friday night, this should do the trick! 🙂 Also, it’s kind of fun because it’s poking at the Efficient Market Hypothesis!

Using Resampled Stock Prices to Predict Stock Prices


Under the strong form of the efficient market hypothesis, the market is a perfect predictor of stock prices and, thus, of stock returns. But time and time again, this hypothesis is called into question as stocks perform above or below what the market predicts. Taking this into account, we created a model that predicts stock returns using historical, resampled data.

We use data from 500 different stocks to create a model that predicts these returns and compare its predictions to those of the market itself. We find that (1) the model adds to the predictive power of the market, (2) in-the-money (ITM) options are predicted more accurately by the model than out-of-the-money (OTM) options, and (3) variation in option payoffs is far greater than predicted by the model.


The option model developed in this paper works by taking historical return data for the underlying stocks and calculating the expected payoff of each option. It begins by setting the parameters for the specified option, requiring that it have about a month (22 trading days) to expiration. The model then simulates returns for each stock 10,000 times so that, by the law of large numbers, the average simulated payoff approximates the expected payoff. We then compare this expected payoff with the price the market assigns to the option, and then compare both with the actual payoff at expiration.
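The resampling step can be sketched roughly as follows. This is a minimal illustration, not the code used for the paper: it assumes each simulation draws 22 daily returns with replacement from a stock’s historical daily returns, compounds them into a simulated price at expiration, and averages the call payoff max(S_T - K, 0) over 10,000 draws. The function name `resampled_call_payoff` and the return series are hypothetical, and the sketch ignores discounting.

```python
import random


def resampled_call_payoff(daily_returns, spot, strike, horizon=22, n_sims=10_000, seed=0):
    """Estimate a call's expected payoff at expiration by bootstrap resampling.

    Assumption (not from the paper): each simulated path draws `horizon` daily
    simple returns, with replacement, from the historical sample and compounds
    them. No discounting is applied.
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible
    total_payoff = 0.0
    for _ in range(n_sims):
        price = spot
        for r in rng.choices(daily_returns, k=horizon):
            price *= 1.0 + r
        total_payoff += max(price - strike, 0.0)  # call payoff at expiration
    # By the law of large numbers, this average approaches the expected payoff
    # under the resampled return distribution as n_sims grows.
    return total_payoff / n_sims


# Hypothetical daily returns for one stock (illustration only).
history = [0.004, -0.012, 0.007, 0.001, -0.003, 0.009, -0.006, 0.002]
spot = 100.0
# 25% ITM call: strike 25% below the current stock price.
print(resampled_call_payoff(history, spot, strike=0.75 * spot))
```

Fixing the random seed makes repeated runs identical; in practice one would also vary the seed to check that 10,000 draws are enough for the average to settle.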

We test the effectiveness of the model using three regressions and their R² values. The first regression compares the actual market price of each option with the price predicted by the model. If the model works, the two prices should be highly correlated.

There is also some data management to do concerning outliers. We can inspect the regressions for outliers that skew the results one way or the other. Once these outliers are removed, the data points should lie relatively close to the regression line, as noted above.
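The text does not say how outliers were identified. One common approach, shown here purely as an illustrative assumption, is to fit the regression, compute each observation’s residual, and drop points whose residual lies more than a chosen number of standard deviations from zero. The function `drop_outliers` and the cutoff of 3 are hypothetical.

```python
import numpy as np


def drop_outliers(x, y, z_cutoff=3.0):
    """Drop points whose OLS residual exceeds z_cutoff residual standard deviations.

    Hypothetical rule for illustration; returns the filtered (x, y) arrays.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)  # simple linear fit: y ≈ slope * x + intercept
    residuals = y - (slope * x + intercept)
    z = residuals / residuals.std(ddof=1)   # standardize residuals
    keep = np.abs(z) <= z_cutoff
    return x[keep], y[keep]
```

A single extreme point also pulls the fitted line toward itself, so with small samples a residual-based rule can miss it; the cutoff is a judgment call rather than a fixed rule.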

The second regression measures how well the market price predicts the actual payoff. We will focus primarily on the R² of this regression and compare it with the R² of the third and final regression, which includes both the market price and the model’s expected payoff. The third regression is essentially a test of whether our model adds any predictive power to the market.
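The three regressions can be sketched with ordinary least squares. This is a minimal illustration under stated assumptions, not the code used for the paper: each regression is assumed to include an intercept (the tables do not report one), the fit uses NumPy’s `np.linalg.lstsq`, and the helper names `ols` and `three_regressions` are hypothetical.

```python
import numpy as np


def ols(y, *regressors):
    """OLS with an intercept. Returns (coefficients, R²); coefficients[0] is the intercept."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(y)] + [np.asarray(r, dtype=float) for r in regressors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    r2 = 1.0 - (residuals @ residuals) / np.sum((y - y.mean()) ** 2)
    return beta, r2


def three_regressions(expected_payoff, market_price, actual_payoff):
    """Run the three regressions described above and return the R² of each."""
    _, r2_1 = ols(market_price, expected_payoff)                 # (1) market price on model price
    _, r2_2 = ols(actual_payoff, market_price)                   # (2) actual payoff on market price
    _, r2_3 = ols(actual_payoff, market_price, expected_payoff)  # (3) actual payoff on both
    return r2_1, r2_2, r2_3
```

Because adding a regressor can never lower the in-sample R² of an OLS fit, the comparison between regressions (2) and (3) is about how much R² rises, not merely whether it rises.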

If the option price implied by our model has predictive power beyond the observed market price, the added variable representing our model should be statistically significant. Another simple test is to compare the R² of the second regression with the R² of the third: if the option prices generated by our model have additional predictive power, the R² of the third regression should be noticeably higher.
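Both tests can be made concrete with standard OLS formulas. A coefficient’s t-statistic is its estimate divided by its standard error. A partial F-test compares the R² of the restricted regression (2) with that of the full regression (3):

F = ((R²_full - R²_restricted) / q) / ((1 - R²_full) / (n - k - 1)),

where q is the number of added regressors and k the number of regressors in the full model. These are textbook results rather than something stated in the paper, the function names are hypothetical, and reading the parenthesized values in the tables as standard errors is an assumption.

```python
def t_statistic(coefficient, standard_error):
    """t-statistic for the null hypothesis that the coefficient is zero."""
    return coefficient / standard_error


def partial_f(r2_restricted, r2_full, n, k_full, q=1):
    """F-statistic for adding q regressors to a regression.

    r2_restricted: R² without the added regressors (regression 2)
    r2_full:       R² with them (regression 3)
    n:             number of observations
    k_full:        regressors in the full model, excluding the intercept
    """
    numerator = (r2_full - r2_restricted) / q
    denominator = (1.0 - r2_full) / (n - k_full - 1)
    return numerator / denominator


# Table 1 (25% ITM), column 3: expected payoff 0.793 with (0.051) beneath it.
print(round(t_statistic(0.793, 0.051), 1))  # 15.5
# Table 1, columns 2 and 3: R² rises from 0.395 to 0.450, n = 1743, two regressors.
print(round(partial_f(0.395, 0.450, n=1743, k_full=2), 1))  # 174.0
```

Both values are far above conventional 5% cutoffs (about 1.96 for t, and about 3.85 for F with 1 and roughly 1,740 degrees of freedom). The F figure is only approximate, since the reported R² values are rounded and columns 2 and 3 differ by one observation.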


We tested the resampling option price model on several subsets of options, grouped by how far in the money or out of the money they were. Table 1 shows the results for options that are 25% ITM (i.e., the strike is 25% below the current market price of the underlying stock). The first column shows that our model works well, with an R² of 77%: most of the variation in the market price of the calls (77%) can be explained by our model. Comparing columns 2 and 3 shows that our model also adds substantially to the predictive power of the market, raising R² by 5.5 percentage points (from 0.395 to 0.450). We can also see that the coefficient (beta) on the expected payoff variable (0.780 in column 1 and 0.793 in column 3) is reasonably close to one, indicating that the predictive power of the model is similar to that of the market.

Table 1
25% ITM
                     (1)            (2)            (3)
Dependent variable   Market price   Actual payoff  Actual payoff
Expected Payoff      0.780                         0.793
                     (0.001)                       (0.051)
Last Market Price                   1.309          0.420
                                    (0.0413)       (0.0686)
n                    1862           1744           1743
R²                   0.770          0.395          0.450


However, for options that were 25% OTM, the R² in column 1 was somewhat lower (0.715 versus 0.770), and the model added less to the predictive power of the market (1.6 percentage points). The coefficient on expected payoff in column 1 was also about 0.02 lower (0.756 versus 0.780).

Table 2
25% OTM
                     (1)            (2)            (3)
Dependent variable   Market price   Actual payoff  Actual payoff
Expected Payoff      0.756                         -0.577
                     (0.010)                       (0.096)
Last Market Price                   1.275          1.820
                                    (0.05215)      (0.0966)
n                    2244           2244           2243
R²                   0.715          0.210          0.226


10% ITM and 10% OTM options followed a similar pattern with regard to how well the model worked and how much it added to the predictive power of the market price. As Tables 3 and 4 show, the model worked “better” for the ITM options, with an R² in column 1 that was 9 percentage points higher (0.805 versus 0.715). However, the model added little to the predictive power of the market in either case: 0.5 percentage points for the ITM options (0.368 to 0.373) and 1.5 percentage points for the OTM options (0.160 to 0.175).


Table 3
10% ITM
                     (1)            (2)            (3)
Dependent variable   Market price   Actual payoff  Actual payoff
Expected Payoff      0.748                         0.255
                     (0.011)                       (0.097)
Last Market Price                   1.306          1.071
                                    (0.05391)      (0.11652)
n                    1037           1007           1010
R²                   0.805          0.368          0.373


Table 4
10% OTM
                     (1)            (2)            (3)
Dependent variable   Market price   Actual payoff  Actual payoff
Expected Payoff      0.825                         -0.530
                     (0.016)                       (0.123)
Last Market Price                   0.991          1.457
                                    (0.07125)      (0.1294)
n                    1049           1020           1019
R²                   0.715          0.160          0.175


Finally, when we look at at-the-money (ATM) options, we can see that the model works quite well (R² of 83%) and that it adds 1.7 percentage points to the predictive power of the market.

Table 5
ATM
                     (1)            (2)            (3)
Dependent variable   Market price   Actual payoff  Actual payoff
Expected Payoff      0.856                         -0.694
                     (0.012)                       (0.1314)
Last Market Price                   1.293          1.975
                                    (0.05918)      (0.1417)
n                    1085           1085           1084
R²                   0.830          0.306          0.323
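As a quick arithmetic check on the percentage-point gains discussed in the text, this snippet recomputes them from the R² values reported in Tables 1 through 5. The dictionary layout and the helper name `added_r2_points` are just for illustration; the numbers are copied from the tables.

```python
# R² values reported in Tables 1-5: (column 1, column 2, column 3).
REPORTED_R2 = {
    "25% ITM": (0.770, 0.395, 0.450),
    "25% OTM": (0.715, 0.210, 0.226),
    "10% ITM": (0.805, 0.368, 0.373),
    "10% OTM": (0.715, 0.160, 0.175),
    "ATM": (0.830, 0.306, 0.323),
}


def added_r2_points(r2_market, r2_combined):
    """Increase in R² from adding the model's expected payoff, in percentage points."""
    return round(100 * (r2_combined - r2_market), 1)


for subset, (r2_fit, r2_market, r2_both) in REPORTED_R2.items():
    gain = added_r2_points(r2_market, r2_both)
    print(f"{subset:>8}: model fit R² = {r2_fit:.3f}, added R² = {gain:.1f} points")
```

Laid out this way, the gain is largest for the 25% ITM subset and smallest for the 10% ITM subset.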



Our data might be biased, as the options expired on March 18, 2016, at the beginning of a mini-boom in the market. It is therefore quite possible that the results above would be very different if the sample included a market downturn or simply a flat market.

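Also, due to the model design, options on stocks with closely spaced strike prices were overrepresented. Because we selected options at a fixed set of percentages, stocks with strike increments of 0.5 or 1 are overrepresented compared with stocks whose strikes come in increments of 10 or 20: a finer grid of strikes is more likely to contain a strike close to each target percentage.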
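To see why strike spacing matters, here is a small sketch. The matching rule is not described in the text, so it is assumed here, hypothetically, that an option enters a subset only if some listed strike lies within 1% of the stock price of the target strike, with strikes listed on a uniform grid. The functions `has_matching_strike` and `match_rate`, the 1% tolerance, and the range of stock prices are all illustrative assumptions.

```python
def has_matching_strike(spot, strike_offset, increment, tolerance=0.01):
    """Return True if a listed strike lies close enough to the target strike.

    Hypothetical rule: strikes are listed every `increment` dollars, and a strike
    qualifies if it is within `tolerance` * spot of the target.
    strike_offset = -0.25 targets a strike 25% below spot (a 25% ITM call).
    """
    target = spot * (1.0 + strike_offset)
    nearest = round(target / increment) * increment  # closest listed strike
    return abs(nearest - target) <= tolerance * spot


def match_rate(increment, strike_offset=-0.25, spots=range(50, 150)):
    """Share of hypothetical stock prices for which a matching strike exists."""
    hits = sum(has_matching_strike(s, strike_offset, increment) for s in spots)
    return hits / len(spots)


for inc in (0.5, 1, 10, 20):
    print(inc, match_rate(inc))
```

Under this assumed rule, every stock with a 0.5 or 1 strike grid finds a match, while only some stocks with a 10 or 20 grid do, which is the overrepresentation described above.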

In the model, we also tested the same stocks repeatedly, which essentially duplicates some of the data. For example, the ATM regression has 1085 observations, but the data cover only 500 stocks. There are more data points than stocks because several options on the same underlying stock could be included in the sample. As a result, certain stocks can be overrepresented, which could lead to false support for our model.
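One quick way to gauge this overrepresentation is to count observations per underlying stock. In the ATM regression, 1085 observations spread over at most 500 stocks means an average of at least about 2.2 options per stock that appears. The function `overrepresentation_summary` and the tickers below are hypothetical, for illustration only.

```python
from collections import Counter


def overrepresentation_summary(tickers):
    """Summarize how option observations are spread across underlying stocks.

    Returns (average observations per stock, share of observations that come
    from stocks contributing more than one option).
    """
    counts = Counter(tickers)
    avg_per_stock = len(tickers) / len(counts)
    repeated = sum(c for c in counts.values() if c > 1)
    return avg_per_stock, repeated / len(tickers)


# Hypothetical sample: one entry per option, labeled by its underlying stock.
sample = ["AAA", "AAA", "AAA", "BBB", "CCC", "CCC"]
print(overrepresentation_summary(sample))
```

A high share of repeated observations suggests the regression sample is less independent than its raw size implies.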

To improve our model, we could rerun the regressions with a different expiration date that captures more “normal” market behavior. That way, the results would be less influenced by unusual market conditions.

Overall, the model predicts option payoffs “better” than the market alone, in the sense that adding it to the market price raises R² by 0.5 to 5.5 percentage points, depending on moneyness. This can be attributed to multiple factors, such as the month being an anomaly with regard to market performance. Regardless, these results bring the efficient market hypothesis under scrutiny. If the market truly could factor in everything, and there were no opportunities for arbitrage, then there should have been no outliers for the market.

Disclaimer: These views are not investment advice, and should not be interpreted as such. These views are my own, and do not represent my employer. Trading has risk. Big risk. Make sure that you can balance your risk/reward, and trade small, and trade often.