# DNA of Regression Analysis

Consider a regression model with three variables (one dependent and two independent) and n observations of each. When they are combined into a regression equation, the coefficients and the R squared are not the only things that come out. Several other features appear as well. They are usually treated as problems, and ideally we would not have them, but each one carries a story about why it is there.

Regression analysis splits the variation in the dependent variable Yt into two components: one that is explained by the independent variables and one that remains in the residuals. The share of the total variation that is explained is called R squared.
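The split described above can be sketched numerically. This is a minimal illustration on simulated data, assuming NumPy is available; the coefficients, the sample size and the variable names (`x1`, `x2`, `y`) are made up for the example and do not come from any real data set.

```python
import numpy as np

# Simulated data: one dependent and two independent variables (illustrative values).
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

# Ordinary least squares with an intercept.
X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
residuals = y - fitted

# Total variation = explained variation + residual variation.
tss = np.sum((y - y.mean()) ** 2)
ess = np.sum((fitted - y.mean()) ** 2)
rss = np.sum(residuals ** 2)

# R squared is the explained share of the total variation.
r_squared = ess / tss
print(f"TSS={tss:.2f}  ESS={ess:.2f}  RSS={rss:.2f}  R^2={r_squared:.3f}")
```

Because the model includes an intercept, the explained and residual parts add up to the total, so `ess / tss` and `1 - rss / tss` give the same number.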

Multicollinearity:

We live in a fast-paced world where even our water consumption patterns affect the rainfall; in that sense, everything affects everything. So some relationship between the two independent variables (the presence of multicollinearity) is inevitable, and we can only hope that it is as small as possible.
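One common way to put a number on "as small as possible" is the variance inflation factor (VIF). The sketch below assumes the two-regressor model from the introduction, where the VIF reduces to 1 / (1 - r^2) with r the correlation between the two independent variables. The function name and the simulated series are illustrative only.

```python
import numpy as np

def vif_two_regressors(x1, x2):
    """VIF for a model with exactly two independent variables.

    With two regressors, the R squared of regressing one on the other
    equals their squared correlation, so VIF = 1 / (1 - r^2).
    """
    r = np.corrcoef(x1, x2)[0, 1]
    return 1.0 / (1.0 - r ** 2)

rng = np.random.default_rng(42)
n = 500
x1 = rng.normal(size=n)

# Case 1: the second variable is unrelated to the first.
x2_unrelated = rng.normal(size=n)
# Case 2: the second variable is mostly a copy of the first (made-up weights).
x2_related = 0.9 * x1 + 0.3 * rng.normal(size=n)

vif_unrelated = vif_two_regressors(x1, x2_unrelated)
vif_related = vif_two_regressors(x1, x2_related)
print(f"VIF, unrelated regressors: {vif_unrelated:.2f}")
print(f"VIF, related regressors:   {vif_related:.2f}")
```

A VIF close to 1 means the two variables carry almost separate information. The larger it gets, the harder it is for the regression to tell their effects apart.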

Heteroscedasticity:

With time humans learn, and as they learn they make fewer mistakes; the same is true of the processes we design and the policies we make. Hence there may always be some qualities that are not included in the regression (and so sit in the error term) whose spread changes with the independent variables. The variance of the error term is then not constant, which is heteroscedasticity. Taking logs of the variables is often considered a safe option to reduce this problem.
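The effect of taking logs can be illustrated with a rough check. The sketch below assumes a made-up data-generating process with a multiplicative error, and compares the residual variance for high and low values of the regressor. The helper names `ols_residuals` and `spread_ratio` are hypothetical, and the ratio is only an informal check in the spirit of a Goldfeld-Quandt comparison, not the formal test.

```python
import numpy as np

def ols_residuals(x, y):
    """Residuals from a simple regression of y on x with an intercept."""
    X = np.column_stack([np.ones(len(x)), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def spread_ratio(x, resid):
    """Residual variance in the upper half of x divided by the lower half."""
    order = np.argsort(x)
    half = len(x) // 2
    low = resid[order[:half]]
    high = resid[order[half:]]
    return high.var() / low.var()

# Illustrative data: the error multiplies the level of y, so its spread grows with x.
rng = np.random.default_rng(1)
x = rng.uniform(1.0, 10.0, size=400)
y = 2.0 * x ** 1.5 * np.exp(rng.normal(scale=0.3, size=400))

ratio_levels = spread_ratio(x, ols_residuals(x, y))
ratio_logs = spread_ratio(np.log(x), ols_residuals(np.log(x), np.log(y)))
print(f"variance ratio in levels: {ratio_levels:.2f}")
print(f"variance ratio in logs:   {ratio_logs:.2f}")
```

A ratio far from 1 suggests that the error variance depends on the regressor. In this simulated case the log transformation turns the multiplicative error into an additive one with roughly constant spread, which is why the second ratio should sit near 1.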

Auto-correlation and Stationarity:

What we are today has a lot to do with what we were yesterday, and what we have today will take time to disappear. This concept is called inertia in economics, and it causes a variable to be correlated with its own past. If this correlation is too strong, the variable becomes non-stationary (meaning that its mean and variance are no longer constant over time). When this correlation is present in the residuals, it is called auto-correlation. Econometricians use some techniques that are robust to the stationarity issue.
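Inertia of this kind can be sketched with a simulated series in which today's value is a fraction of yesterday's value plus a fresh shock. The code below is an illustration only: the persistence of 0.8, the sample size and the helper names are assumptions. It uses the lag-1 autocorrelation and the Durbin-Watson statistic, which is close to 2 when there is no first-order correlation and falls towards 0 as positive correlation gets stronger.

```python
import numpy as np

def simulate_ar1(rho, n, rng):
    """Series where each value is rho times the previous one plus a random shock."""
    shocks = rng.normal(size=n)
    y = np.zeros(n)
    for t in range(1, n):
        y[t] = rho * y[t - 1] + shocks[t]
    return y

def lag1_autocorr(series):
    """Correlation of a series with its own value one period earlier."""
    d = series - series.mean()
    return np.sum(d[1:] * d[:-1]) / np.sum(d ** 2)

def durbin_watson(resid):
    """Durbin-Watson statistic, roughly 2 * (1 - lag-1 autocorrelation)."""
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

rng = np.random.default_rng(7)
persistent = simulate_ar1(0.8, 2000, rng)  # strong inertia (illustrative value)
white_noise = rng.normal(size=2000)        # no memory of the past

rho_hat = lag1_autocorr(persistent)
dw_persistent = durbin_watson(persistent)
dw_noise = durbin_watson(white_noise)
print(f"lag-1 autocorrelation (persistent): {rho_hat:.2f}")
print(f"Durbin-Watson (persistent):         {dw_persistent:.2f}")
print(f"Durbin-Watson (white noise):        {dw_noise:.2f}")
```

In practice the statistic is applied to regression residuals rather than to raw series. The two simulated series here simply stand in for residuals with and without inertia.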

Endogeneity:

Does better government produce better institutions, or do better institutions make better government? Testing one relationship when in reality the opposite one is true (or both are) causes the independent variable to be correlated with the error term, and this creates the problem of endogeneity. The problem is not easy to solve if the relationship runs both ways. The possible solutions are to build a system of simultaneous equations or to use instruments to break the reverse relationship.
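The instrument idea can be sketched on simulated data. Everything in the example is an assumption made for illustration: the true slope of 2, the instrument `z` (related to `x` but not to the error `u`), and the strength of the link between `x` and `u`. With one regressor and one instrument, the instrumental variable estimate is the ratio cov(z, y) / cov(z, x).

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5000

z = rng.normal(size=n)  # instrument: moves x, unrelated to the error
u = rng.normal(size=n)  # error term of the equation for y
x = z + 0.8 * u + rng.normal(size=n)  # x is correlated with u (endogenous)
y = 1.0 + 2.0 * x + u                 # true slope is 2.0 (illustrative)

# Ordinary least squares slope: picks up the link between x and u.
ols_slope = np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# Instrumental variable slope: uses only the variation in x that comes from z.
iv_slope = np.cov(z, y)[0, 1] / np.cov(z, x)[0, 1]

print(f"OLS slope: {ols_slope:.3f}")
print(f"IV slope:  {iv_slope:.3f}")
```

The ordinary least squares slope drifts away from the true value because part of the movement in `x` is really movement in the error term. The instrument keeps only the part of `x` that is unrelated to the error.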

Normality:

Normality, usually checked on the residuals, tells us how many outliers are present in the data series (kurtosis) and how far the median is from the mean (skewness), showing where individual observations tend to fall.
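These two ideas, outliers and the gap between mean and median, correspond to kurtosis and skewness, and the Jarque-Bera statistic combines them into one number. The sketch below computes all three with NumPy on two simulated series. The function name `normality_summary` and the choice of a lognormal series as the skewed example are assumptions for illustration.

```python
import numpy as np

def normality_summary(x):
    """Return skewness, excess kurtosis and the Jarque-Bera statistic."""
    n = len(x)
    d = x - x.mean()
    m2 = np.mean(d ** 2)
    skew = np.mean(d ** 3) / m2 ** 1.5            # 0 for a symmetric series
    excess_kurt = np.mean(d ** 4) / m2 ** 2 - 3.0  # 0 for a normal series
    jb = n / 6.0 * (skew ** 2 + excess_kurt ** 2 / 4.0)
    return skew, excess_kurt, jb

rng = np.random.default_rng(11)
normal_series = rng.normal(size=1000)
skewed_series = rng.lognormal(mean=0.0, sigma=1.0, size=1000)  # long right tail

skew_n, kurt_n, jb_normal = normality_summary(normal_series)
skew_s, kurt_s, jb_skewed = normality_summary(skewed_series)
mean_s, median_s = skewed_series.mean(), np.median(skewed_series)

print(f"normal series: skew={skew_n:.2f}, excess kurtosis={kurt_n:.2f}, JB={jb_normal:.1f}")
print(f"skewed series: skew={skew_s:.2f}, excess kurtosis={kurt_s:.2f}, JB={jb_skewed:.1f}")
print(f"skewed series: mean={mean_s:.2f}, median={median_s:.2f}")
```

For a normal series the statistic stays small. A long tail of large observations pulls the mean above the median and pushes the statistic up sharply.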

In conclusion, these issues should not be treated as headaches. They are sources of information that can help us interpret reality better.