DNA of Regression Analysis

Consider a regression model with three variables (one dependent and two independent) and n observations of each. When they are combined into a regression equation, the output goes well beyond the coefficients and the R squared. Many other features emerge, and although they are usually treated as problems we would rather not have, each one carries a story about why it is there.

Decomposition of information within Regression Equation

Regression analysis splits the dependent variable Yt into two components: one that is explained by the independent variables and one that remains in the residuals. The share of the variation in Yt that falls in the explained component is termed R squared.
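
The split can be checked by hand. Below is a minimal sketch, assuming made-up data and, for brevity, a single independent variable instead of two; the names simple_ols and decompose are illustrative only. It shows that the total variation of Y equals the explained part plus the residual part, and that R squared is the explained share.

```python
# Hypothetical data: five observations of one regressor x and the dependent y.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]


def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def decompose(x, y):
    """Split the variation of y into total, explained and residual sums of squares."""
    intercept, slope = simple_ols(x, y)
    fitted = [intercept + slope * xi for xi in x]
    mean_y = sum(y) / len(y)
    tss = sum((yi - mean_y) ** 2 for yi in y)               # total
    ess = sum((fi - mean_y) ** 2 for fi in fitted)          # explained
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual
    return tss, ess, rss


tss, ess, rss = decompose(x, y)
r_squared = ess / tss
print("total:", tss, "explained:", ess, "residual:", rss)
print("R squared:", r_squared)
```

The identity total = explained + residual holds here because the fitted line includes an intercept.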


We live in a fast-paced world where even our water consumption patterns affect the rainfall; in that sense, everything affects everything. So it is inevitable that the two independent variables will be related to each other (the presence of multicollinearity). We can only hope that the relationship is as weak as possible.
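
One common way to put a number on "as weak as possible" is the variance inflation factor (VIF). With exactly two independent variables it reduces to 1 / (1 - r^2), where r is the correlation between them. The sketch below assumes invented data for the two regressors; pearson and vif_two_regressors are illustrative names, not part of any library.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    sab = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    saa = sum((ai - mean_a) ** 2 for ai in a)
    sbb = sum((bi - mean_b) ** 2 for bi in b)
    return sab / sqrt(saa * sbb)


def vif_two_regressors(x1, x2):
    """VIF when the model has exactly two regressors: 1 / (1 - r^2)."""
    r = pearson(x1, x2)
    return 1.0 / (1.0 - r * r)


# Hypothetical observations of the two independent variables.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]

r12 = pearson(x1, x2)
vif = vif_two_regressors(x1, x2)
print("correlation between regressors:", r12)
print("variance inflation factor:", vif)
```

A VIF of 1 means the regressors are uncorrelated; the closer r gets to 1 or -1, the larger the VIF and the less precisely each coefficient is estimated.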


With time, humans learn and make fewer mistakes, and the same holds for the processes we design and the policies we make. Hence the spread of the error term is rarely constant: it may shrink or grow over time, or with the level of an independent variable, and this unequal variance is heteroscedasticity. Taking logs of the variables is often considered a safe option for reducing this problem.
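
To see why logs can help, here is a small sketch under strong assumptions: the data are invented, the errors are multiplicative (plus or minus 10 percent of the level), and deviations are measured from a known trend rather than from an estimated regression. The helper spread_ratio is illustrative; it compares the sum of squared deviations in the second half of the sample with that in the first half, in the spirit of a Goldfeld-Quandt comparison.

```python
from math import log


def spread_ratio(deviations):
    """Sum of squares of the second half divided by that of the first half."""
    half = len(deviations) // 2
    first = sum(d * d for d in deviations[:half])
    second = sum(d * d for d in deviations[half:])
    return second / first


# Hypothetical series: a doubling trend with errors of +10% and -10% in turn.
trend = [1, 2, 4, 8, 16, 32, 64, 128]
factors = [1.1, 0.9] * 4
y = [t * f for t, f in zip(trend, factors)]

# Deviations from the trend in levels and in logs.
level_dev = [yi - ti for yi, ti in zip(y, trend)]
log_dev = [log(yi) - log(ti) for yi, ti in zip(y, trend)]

ratio_levels = spread_ratio(level_dev)
ratio_logs = spread_ratio(log_dev)
print("spread ratio in levels:", ratio_levels)
print("spread ratio in logs:", ratio_logs)
```

In levels the spread grows with the size of the variable, so the ratio is far above 1; in logs the same proportional errors have the same size everywhere, so the ratio is about 1. When the errors are not proportional to the level, logs will not necessarily remove the problem.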

Auto-correlation and Stationarity:

What we are today has a lot to do with what we were yesterday, and what we have today will take time to disappear. This concept is called inertia in economics, and it causes a variable to be correlated with its own past. If that correlation is too strong, the variable becomes non-stationary (meaning its mean and variance are no longer constant over time). When this kind of correlation is present in the residuals, it is called auto-correlation. Econometricians use some techniques that are robust to the stationarity issue.
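
A standard first check for auto-correlation in residuals is the Durbin-Watson statistic: the sum of squared changes in the residuals divided by the sum of squared residuals. The sketch below uses two invented residual series to show the two extremes; in practice the residuals would come from the fitted regression.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared first differences over sum of squares."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


# Hypothetical residuals that drift slowly (inertia): positive auto-correlation.
sticky = [1.0, 0.8, 0.6, 0.3, -0.2, -0.5, -0.7, -0.4]

# Hypothetical residuals that flip sign every period: negative auto-correlation.
flipping = [1.0, -1.0, 1.0, -1.0]

dw_sticky = durbin_watson(sticky)
dw_flipping = durbin_watson(flipping)
print("Durbin-Watson, sticky residuals:", dw_sticky)
print("Durbin-Watson, flipping residuals:", dw_flipping)
```

The statistic lies between 0 and 4. Values near 2 suggest no first-order auto-correlation, values well below 2 suggest positive auto-correlation, and values well above 2 suggest negative auto-correlation.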

components of regression


Does better government produce better institutions, or do better institutions make better government? If we test one relationship when in reality the opposite one is true (or both are), the independent variable becomes correlated with the error term, which creates the problem of endogeneity. This problem is not easy to solve if the relationship runs both ways; the possible solutions are to build a system of simultaneous equations or to use instruments to break the reverse relationship.
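
The effect of a two-way relationship, and how an instrument can break it, can be illustrated with simulated data. Everything in the sketch below is an assumption made for illustration: the system y = b*x + u and x = c*y + z + v, the true values b = c = 0.5, the normal errors, and the instrument z, which moves x but is unrelated to u. It compares the ordinary least-squares slope with the simple instrumental-variable estimate cov(z, y) / cov(z, x).

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists (divisor n)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / n


random.seed(0)
n = 20000
b_true = 0.5  # effect of x on y (the one we want to estimate)
c_true = 0.5  # reverse effect of y on x

x, y, z = [], [], []
for _ in range(n):
    zi = random.gauss(0, 1)  # instrument: shifts x, unrelated to u
    u = random.gauss(0, 1)   # error in the y equation
    v = random.gauss(0, 1)   # error in the x equation
    # Solve the two equations jointly (reduced form) for x, then get y.
    xi = (c_true * u + zi + v) / (1 - b_true * c_true)
    yi = b_true * xi + u
    x.append(xi)
    y.append(yi)
    z.append(zi)

ols_slope = cov(x, y) / cov(x, x)
iv_slope = cov(z, y) / cov(z, x)
print("true effect:", b_true)
print("OLS estimate:", ols_slope)
print("IV estimate:", iv_slope)
```

Because y feeds back into x, x is correlated with u and the OLS slope is pushed away from 0.5. The instrument only uses the part of x that comes from z, so its estimate stays close to the true value in this setup.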


Normality of the residuals tells us about outliers in the data series: heavy tails point to many extreme observations, and the distance between the median and the mean (skewness) shows the side on which individual observations tend to fall.
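
Skewness and kurtosis are usually combined into the Jarque-Bera statistic, JB = n/6 * (S^2 + (K - 3)^2 / 4), which is zero when the sample skewness S is 0 and the sample kurtosis K is 3, as for a normal distribution. The sketch below assumes two tiny invented series, one symmetric and one with a single large outlier; real tests need far more observations.

```python
def skew_kurt(data):
    """Return (skewness, kurtosis) based on central moments with divisor n."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((d - mean) ** 2 for d in data) / n
    m3 = sum((d - mean) ** 3 for d in data) / n
    m4 = sum((d - mean) ** 4 for d in data) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


def jarque_bera(data):
    """Jarque-Bera statistic: n/6 * (S^2 + (K - 3)^2 / 4)."""
    n = len(data)
    skew, kurt = skew_kurt(data)
    return n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)


# Hypothetical series: one symmetric, one with a large positive outlier.
symmetric = [-2, -1, 0, 1, 2]
with_outlier = [-2, -1, 0, 1, 12]

skew_sym, kurt_sym = skew_kurt(symmetric)
skew_out, kurt_out = skew_kurt(with_outlier)
jb_sym = jarque_bera(symmetric)
jb_out = jarque_bera(with_outlier)
print("symmetric: skewness", skew_sym, "kurtosis", kurt_sym, "JB", jb_sym)
print("with outlier: skewness", skew_out, "kurtosis", kurt_out, "JB", jb_out)
```

The outlier pulls the mean above the median, which shows up as positive skewness and a larger Jarque-Bera value.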

In conclusion, these issues should not be considered headaches. They are sources of information that can help us interpret reality better.

