Possible Issues of Regression Estimates – Applied Examples

First, we pick an economic relationship that we can use to explore which aspects of a regression can create problems.

Study Performance = α + β1 (no of hours studied) + β2 (age of the student) + μ

The same model can be estimated on three kinds of data:

Cross sectional model: studying how the hours and age of different students affect their study performance (GPA).
Time series model: studying how the hours and age of the same student affect his performance over time.
Panel data model: studying how the hours and age of students affect performance, both across students and over time.

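
As a rough sketch of how the three data layouts differ, the snippet below builds a tiny panel and slices it into a cross section and a time series. The student names, the tuple layout and all numbers are made up for illustration.

```python
# Hypothetical records: (student, semester, hours_studied, age, gpa).
# All values are invented for illustration only.
panel = [
    ("Ali", 1, 10, 19, 2.8),
    ("Ali", 2, 12, 20, 3.0),
    ("Sara", 1, 15, 21, 3.4),
    ("Sara", 2, 14, 22, 3.3),
    ("Omar", 1, 8, 25, 2.5),
    ("Omar", 2, 9, 26, 2.7),
]

# Cross section: many students observed in a single semester.
cross_section = [row for row in panel if row[1] == 1]

# Time series: one student followed over several semesters.
time_series = [row for row in panel if row[0] == "Ali"]

print(len(cross_section), "students in the cross section")
print(len(time_series), "semesters in Ali's time series")
print(len(panel), "student-semester rows in the panel")
```

The panel keeps both dimensions, which is why it can show problems of both the cross sectional and the time series kind.
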
  • Micronumerosity

Using non-representative data, or too small a sample, to make policy for the whole population creates the problem that is statistically called micronumerosity.
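
One way to see why a small sample is a problem is to look at the standard error of the sample mean, s / sqrt(n). The sketch below uses made-up GPAs; the large sample simply repeats the same values, so only the sample size changes.

```python
import math
import statistics


def standard_error(sample):
    """Standard error of the sample mean: s / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))


# Made-up GPAs; the large sample repeats the same spread of values.
small_sample = [2.0, 2.5, 3.0, 3.5, 4.0]
large_sample = small_sample * 20

se_small = standard_error(small_sample)
se_large = standard_error(large_sample)

print(f"n={len(small_sample)}: standard error = {se_small:.3f}")
print(f"n={len(large_sample)}: standard error = {se_large:.3f}")
```

With only five students the average GPA is estimated far less precisely than with one hundred, so conclusions drawn from the small sample are much less reliable.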

  • Multicollinearity

It may be that older students are busy with family responsibilities and jobs, and therefore allocate less time to study, such that

Study Hours = f(age of the student)

This makes the independent variables related to each other, which is called multicollinearity in statistics.

It could in principle be split into time series multicollinearity and cross sectional multicollinearity, but since there is no way to measure them separately, we do not distinguish between them.
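
A common way to gauge multicollinearity is the variance inflation factor (VIF). With only two regressors it reduces to 1 / (1 - r²), where r is the correlation between them. The sketch below uses invented age and study-hours data in which older students study less; a VIF above 10 is a rule of thumb often used to flag a problem.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up data: weekly study hours fall as age rises.
age = [18, 19, 20, 21, 22, 23, 24, 25]
hours = [30, 29, 27, 26, 25, 22, 21, 20]

r = pearson_r(age, hours)
vif = 1 / (1 - r ** 2)  # two-regressor case only

print(f"correlation between age and hours: {r:.3f}")
print(f"variance inflation factor: {vif:.1f}")
```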

  • Non normality

Normality describes a data set in which extreme values are limited in number and the observations are reasonably homogeneous (measured by kurtosis), and in which the data do not cluster anywhere other than around the center (mean), i.e. they are symmetric (measured by skewness).

So if the cross sectional data covers students who are too heterogeneous, so that there are more extreme values than normality allows, there will be a normality issue.

In time series, it can arise from abrupt changes, such as a student taking a full-time job before he is an adult, which make the data non-normal.
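
Skewness and kurtosis can be computed directly and combined into the Jarque-Bera statistic, n/6 * (S² + (K - 3)²/4), which grows as the data move away from normality. The sketch below compares a symmetric set of made-up residuals with a set containing one extreme value; the sample is far too small for a real test and only shows the mechanics.

```python
def skewness_and_kurtosis(values):
    """Moment-based skewness S and kurtosis K (K is 3 for a normal)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


def jarque_bera(values):
    """Jarque-Bera statistic: n/6 * (S^2 + (K - 3)^2 / 4)."""
    s, k = skewness_and_kurtosis(values)
    return len(values) / 6 * (s ** 2 + (k - 3) ** 2 / 4)


# Made-up residuals: one symmetric set, one with a single extreme value.
symmetric = [-2.0, -1.0, 0.0, 1.0, 2.0]
with_outlier = [-1.8, -1.8, -1.8, -1.8, 7.2]

skew_sym, _ = skewness_and_kurtosis(symmetric)
skew_out, _ = skewness_and_kurtosis(with_outlier)

print(f"skewness (symmetric): {skew_sym:.2f}")
print(f"skewness (one extreme value): {skew_out:.2f}")
print(f"Jarque-Bera (symmetric): {jarque_bera(symmetric):.2f}")
print(f"Jarque-Bera (one extreme value): {jarque_bera(with_outlier):.2f}")
```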

  • Mis-specified

There can be a nonlinear effect of age: the older the student, the more experienced he becomes, and so the more likely he is to get higher marks. The squared term of age then has a positive effect, but it is missing from the equation because we followed the OLS assumption of a linear model and entered age only linearly.
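
A simple way to spot a missing squared term is to fit the straight line anyway and look at the residuals. In the sketch below GPA is generated, by assumption, from a squared age term; the linear fit then leaves residuals that are positive at both ends and negative in the middle. The data and coefficients are invented.

```python
def fit_line(x, y):
    """Simple OLS of y on x; returns (intercept, slope)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Assumed true relationship (illustration only): GPA rises with age squared.
age = list(range(18, 27))
gpa = [2.0 + 0.01 * (a - 18) ** 2 for a in age]

intercept, slope = fit_line(age, gpa)
residuals = [g - (intercept + slope * a) for a, g in zip(age, gpa)]

# A U-shaped pattern in the residuals hints at an omitted squared term.
for a, e in zip(age, residuals):
    print(f"age {a}: residual {e:+.3f}")
```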

  • Heteroskedasticity

It means that the variance of the error term is not constant; it becomes a function of some factor, most probably the independent variables. It is also a violation of an OLS assumption.

  1. Cross sectional heteroskedasticity

It exists in cross sectional or panel data models only. It occurs because of differences between the cross sections, i.e. the students in this model. When the sample is too heterogeneous and we have not incorporated those differences, the model ends up having this issue.

  2. Time series heteroskedasticity

This exists in time series or panel data models only. It arises only if the individual behaves differently over time. We call it the error learning model: a tailor will make more errors at the start of his career, but after a few years he will make very few. In country-wise data this problem can indicate a change in the level of technology in the country.
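
The error learning idea can be sketched by comparing the variance of the residuals early and late in the sample, in the spirit of a Goldfeld-Quandt variance ratio. The residuals below are made up, and a real test would compare the ratio against an F distribution.

```python
import statistics

# Made-up residuals: large errors early on, small errors later.
early_residuals = [2.0, -2.0, 1.5, -1.5]
late_residuals = [0.2, -0.2, 0.1, -0.1]

variance_early = statistics.variance(early_residuals)
variance_late = statistics.variance(late_residuals)
variance_ratio = variance_early / variance_late

print(f"early variance: {variance_early:.3f}")
print(f"late variance: {variance_late:.3f}")
print(f"variance ratio: {variance_ratio:.1f}")
```

A ratio far from 1 suggests the error variance is not constant. The same comparison works across groups of students for the cross sectional case.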

  • Autocorrelation

It means that the residuals are a function of their own past values. They are not random, as the OLS assumptions require. It has two types.

  1. Cross sectional Autocorrelation

It only occurs in cross sectional and panel data models. In a cross sectional model each observation is a different person, and the problem arises when one person's error is related to another's. It can occur if two students study together, one intelligent and the other not, so that their marks depend on each other. This points to a missing variable: coordination between students. Even Gujarati says that cross sectional autocorrelation indicates that important variables are missing.

  2. Time series Autocorrelation

This problem only comes in time series and panel data models. It means that a person's residuals are correlated with his own past residuals. This shows interdependency in the grading, maybe because his grade depends on his grade in the previous semester's prerequisite course: if he performs well in the prerequisite course, he can perform well in the advanced course. In statistics this issue is linked to non-stationarity of the dependent variable, and possibly of the independent variables too.
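
A standard check for first-order time series autocorrelation is the Durbin-Watson statistic, the sum of squared changes in the residuals divided by the sum of squared residuals. Values near 2 suggest no autocorrelation, values well below 2 suggest positive autocorrelation, and values well above 2 suggest negative autocorrelation. The residual series below are made up.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum (e_t - e_{t-1})^2 / sum e_t^2."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2
        for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Made-up residuals for one student across six semesters.
persistent = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]    # errors carry over
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]   # errors flip sign

dw_persistent = durbin_watson(persistent)
dw_alternating = durbin_watson(alternating)

print(f"persistent residuals: DW = {dw_persistent:.2f}")
print(f"alternating residuals: DW = {dw_alternating:.2f}")
```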

  • Instability

Instability is a sudden change in the environment. Consider a student who gets a scholarship, so that he no longer has to spend much time in a job to pay for his studies and can spend more time studying. The results before and after this event are therefore different, and if we do not incorporate this structural break the model will become unstable.
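
A structural break of this kind is often checked with a Chow test, which compares one pooled regression with separate regressions before and after the event. The sketch below is a simplified version with hours as the only regressor (age is dropped to keep it short) and invented data in which the effect of hours is assumed to rise after the scholarship.

```python
def rss_of_line(x, y):
    """Residual sum of squares from a simple OLS fit of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))


# Made-up data: same hours, but a different slope after the scholarship.
hours = [1, 2, 3, 4, 5]
noise = [0.05, -0.05, 0.0, 0.05, -0.05]
gpa_before = [2.0 + 0.1 * h + e for h, e in zip(hours, noise)]
gpa_after = [2.0 + 0.3 * h + e for h, e in zip(hours, noise)]

k = 2                   # parameters per regression (intercept, slope)
n = 2 * len(hours)      # total number of observations

rss_pooled = rss_of_line(hours + hours, gpa_before + gpa_after)
rss_split = rss_of_line(hours, gpa_before) + rss_of_line(hours, gpa_after)

# Chow F statistic: a large value points to a structural break.
chow_f = ((rss_pooled - rss_split) / k) / (rss_split / (n - 2 * k))
print(f"Chow F statistic: {chow_f:.1f}")
```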

  • Endogeneity

Endogeneity comes when the dependent variable itself causes one of the independent variables. For example, a higher grade motivates the student to spend more time on his studies.

Time Spent in Studies = f(Study Performance)
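
The effect of this feedback on OLS can be sketched with a small simulation. It assumes a two-equation system, performance = alpha + beta1 * hours + mu and hours = gamma + theta * performance + v, with age left out for brevity; all parameter values are invented. Because hours respond to performance, hours become correlated with mu and the OLS slope drifts away from the true beta1.

```python
import random

random.seed(0)

# Assumed parameters (illustration only).
alpha, beta1 = 2.0, 0.1    # performance equation
gamma, theta = 5.0, 2.0    # feedback from performance to hours

hours, performance = [], []
for _ in range(5000):
    mu = random.gauss(0, 0.5)   # error in the performance equation
    v = random.gauss(0, 1.0)    # error in the hours equation
    # Solve the two equations jointly for hours, then performance.
    h = (gamma + theta * alpha + theta * mu + v) / (1 - theta * beta1)
    p = alpha + beta1 * h + mu
    hours.append(h)
    performance.append(p)

# Simple OLS slope of performance on hours.
mean_h = sum(hours) / len(hours)
mean_p = sum(performance) / len(performance)
sxy = sum((h - mean_h) * (p - mean_p) for h, p in zip(hours, performance))
sxx = sum((h - mean_h) ** 2 for h in hours)
slope_hat = sxy / sxx

print(f"true beta1: {beta1}")
print(f"OLS estimate: {slope_hat:.3f}")
```

Under these assumptions the estimate settles well above the true value, which is the bias that endogeneity introduces.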

  • Contemporaneous Correlation

This is a rare issue; it occurs if the error terms of two different models are related to each other, for example if we can write two equations

Study Performance = α + β1 (no of hours studied) + β2 (age of the student) + μ1

Wages = λ + δ1 (no of hours spent) + δ2 (experience) + μ2

Here total time is limited, and better wages can help him spend less time on the job and more time on his studies. Hence the two residuals are expected to be correlated with each other for the same person in a cross sectional model, and the same can hold in a time series or panel data model.
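
Once both equations are estimated, contemporaneous correlation can be checked by correlating the two sets of residuals. For two equations, the Breusch-Pagan LM statistic for this is n * r², compared with a chi-square critical value with one degree of freedom (3.84 at the 5% level). The residuals below are made up, and five observations are far too few for a real test.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up residuals for the same five students from the two equations.
study_residuals = [0.3, -0.2, 0.1, -0.4, 0.2]
wage_residuals = [-0.25, 0.2, -0.1, 0.35, -0.2]

r = pearson_r(study_residuals, wage_residuals)
lm_statistic = len(study_residuals) * r ** 2

print(f"correlation between the two residual series: {r:.3f}")
print(f"LM statistic: {lm_statistic:.2f} (5% critical value: 3.84)")
```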
