Statisticians use the Monte Carlo (MC) simulation method to test the performance of an estimator or its test statistic. The steps of MC are as follows:
- Use a data generating process to replicate the population parameter and its properties.
- Set the sample size for estimation, to generate a sample estimate.
- Set the number of simulations, to generate many sample estimates.
- Compare the properties of the sample estimates with the population parameter.
Today’s experiment is to check whether the OLS (ordinary least squares) method provides an unbiased estimate of the slope coefficient, using simulations on randomly generated data. The experiments are done using the STATA statistical software package. We now proceed to the experiment.
x is an explanatory random variable, uniformly distributed with mean 0 and standard deviation 1.
u is an error term drawn from a normal distribution with mean 0 and standard deviation 2. Hence the model contains no violation of the OLS assumptions that could disturb the results.
The population relationship of y with x and u is as follows:
The Monte Carlo simulation runs a regression of y on x with a sample size of 200 to generate one intercept and one slope, and repeats this process 1000 times to obtain 1000 values each of the intercept and the slope. Then we can see whether the mean of the 1000 slope coefficients is almost the same as the population slope coefficient of 2.
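The procedure just described can be sketched in a few lines of code. The block below is a minimal Python illustration using only the standard library, not the STATA do-file used for the experiment. Several things in it are assumptions: the population intercept `beta0 = 1.0` (the population equation is not reproduced in the text), the seed, the helper names `ols_fit` and `simulate_slopes`, and the choice of a uniform distribution on [-sqrt(3), sqrt(3)] as one way to obtain mean 0 and standard deviation 1. Because of these assumptions it will not reproduce the figures reported below.

```python
import math
import random


def ols_fit(x, y):
    """Return (intercept, slope) from a simple OLS regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def simulate_slopes(reps=1000, n=200, beta0=1.0, beta1=2.0, seed=12345):
    """Draw `reps` samples of size `n`; return the OLS slope from each one.

    beta0 is an assumed value; beta1 = 2 is the population slope from the text.
    """
    rng = random.Random(seed)
    half_width = math.sqrt(3.0)  # uniform(-sqrt(3), sqrt(3)): mean 0, sd 1
    slopes = []
    for _ in range(reps):
        x = [rng.uniform(-half_width, half_width) for _ in range(n)]
        u = [rng.gauss(0.0, 2.0) for _ in range(n)]  # error term: mean 0, sd 2
        y = [beta0 + beta1 * xi + ui for xi, ui in zip(x, u)]
        slopes.append(ols_fit(x, y)[1])
    return slopes


slopes = simulate_slopes()
mean_slope = sum(slopes) / len(slopes)
sd_slope = math.sqrt(sum((b - mean_slope) ** 2 for b in slopes) / (len(slopes) - 1))
print("mean of simulated slopes:", round(mean_slope, 4))
print("sd of simulated slopes:  ", round(sd_slope, 4))
```

If OLS is unbiased under this setup, the printed mean should sit close to the population slope of 2, with individual slopes scattered around it.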
T = (2.0185 – 2) / 0.7296
T = 0.025
The simulation results show that the mean of the 1000 slope coefficients is 2.0185, which is statistically the same as 2, as confirmed by the T test. The following histogram shows how the sample slope coefficients are distributed around the mean of 2, almost tracing the shape of the normal distribution. Hence it can be seen that the OLS estimates are unbiased.
Let's assume that we have specified the model wrongly: instead of including the intercept, we have excluded it. We will see whether the estimate of the slope coefficient becomes biased when a population model with an intercept is estimated by a sample model without one. The properties of the slope coefficients are as follows:
T = (3.9801 – 2) / 10.0686
T = 0.197
Statistically the slope is 2, but see how imprecise it has become: the minimum and maximum values now lie far from the actual mean of 2. A close inspection of the graph shows that it has long tails, which means that the range of values taken by the slope coefficients increases considerably when the intercept is not used.
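The source of the extra spread can be seen by writing out the no-intercept estimator. This is a standard algebraic sketch; the symbols \beta_0 and \beta_1 for the population intercept and slope, and \tilde{\beta}_1 for the no-intercept slope estimate, are notation assumed here, since the population equation is not reproduced in the text.

$$
\tilde{\beta}_1
= \frac{\sum_i x_i y_i}{\sum_i x_i^2}
= \frac{\sum_i x_i(\beta_0 + \beta_1 x_i + u_i)}{\sum_i x_i^2}
= \beta_1 + \beta_0\,\frac{\sum_i x_i}{\sum_i x_i^2} + \frac{\sum_i x_i u_i}{\sum_i x_i^2}
$$

The middle term does not appear when the intercept is estimated. It vanishes only if \beta_0 = 0 or if the sample sum of x is exactly zero. Because x here is uniform with mean 0, and therefore symmetric around zero, this term averages out to zero over many samples, but it still changes from sample to sample, and that adds to the spread of the slope estimates.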
By contrast, if you had used the intercept in the sample model even though the population model does not have one, the slope coefficient would still have been as precise as in case one:
T = (2.0162 – 2) / 0.6913
T = 0.023
Hence we can see that if we use the intercept, whether or not the population model has one, the sample estimate of the slope coefficient stays stable and precise. So it can be concluded that including the intercept in the model does no harm, but removing the intercept when the population model has one does harm the model.
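The comparison can be sketched as a small two-by-two experiment: a population with or without an intercept, estimated with or without one. The Python block below (standard library only, not the STATA do-file) is an illustration under stated assumptions: the population intercept of 5 in the first case is an arbitrary illustrative value, and the seed and function names are hypothetical. It is not expected to reproduce the figures reported above.

```python
import math
import random


def slope_with_intercept(x, y):
    """OLS slope when the sample model includes an intercept."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy / sxx


def slope_through_origin(x, y):
    """OLS slope when the sample model omits the intercept."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


def mean_sd(values):
    """Return (mean, sample standard deviation) of a list of numbers."""
    m = sum(values) / len(values)
    var = sum((v - m) ** 2 for v in values) / (len(values) - 1)
    return m, math.sqrt(var)


def run_case(beta0, reps=1000, n=200, beta1=2.0, seed=2024):
    """Simulate both sample models on the same draws for a given intercept."""
    rng = random.Random(seed)
    w = math.sqrt(3.0)  # uniform(-sqrt(3), sqrt(3)): mean 0, sd 1
    with_c, without_c = [], []
    for _ in range(reps):
        x = [rng.uniform(-w, w) for _ in range(n)]
        y = [beta0 + beta1 * xi + rng.gauss(0.0, 2.0) for xi in x]
        with_c.append(slope_with_intercept(x, y))
        without_c.append(slope_through_origin(x, y))
    return {"with": mean_sd(with_c), "without": mean_sd(without_c)}


# Population has an intercept (assumed value 5) versus no intercept.
has_intercept = run_case(beta0=5.0)
no_intercept = run_case(beta0=0.0)
for label, result in (("population intercept = 5", has_intercept),
                      ("population intercept = 0", no_intercept)):
    for model in ("with", "without"):
        m, s = result[model]
        print(f"{label}, sample model {model} intercept: "
              f"mean={m:.4f}, sd={s:.4f}")
```

Under these assumptions the standard deviation of the slope should be clearly larger when the intercept is wrongly dropped, and nearly identical across the two sample models when the population has no intercept.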
STATA Practice DO Files: Monte Carlo simulation 1 biasness