What assumptions are needed for OLS estimates to have good properties (A1-A4 minimum)? What do these assumptions mean (intuitively)? If we add A5 (homoskedasticity), what additional property does OLS have in cross-section data?
Solution
The following assumptions are required:
A1: Linearity
Yi = β0 + Xiβ1 + εi
Y = Xβ + ε
Note that this assumption requires linearity in the parameters (β), and not necessarily in the variables (X). If your model has a nonlinear variable (e.g., Yi = β0 + ln(Xi)β1 + εi), simply define a new variable (e.g., Wi = ln(Xi)), such that the model will be linear in the new variable (e.g., Yi = β0 + Wiβ1 + εi). Similarly, εi can be defined as the difference between Yi and β0 + Xiβ1.
We call εi the disturbance term. It captures all of the variables that we do not observe (and therefore cannot account for) that affect Yi, conditional on Xi.
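The transformation trick above can be sketched in code. This is a simulated illustration (the variable names, sample size, and parameter values are all assumed, not from the solution): the model Yi = β0 + ln(Xi)β1 + εi becomes linear in the new variable Wi = ln(Xi), and OLS recovers the parameters.

```python
# Illustrative sketch: a regressor that enters nonlinearly, handled by
# defining Wi = ln(Xi). All data are simulated with assumed parameters.
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(1.0, 10.0, size=n)           # strictly positive, so ln(x) is defined
w = np.log(x)                                # the new, transformed variable
beta0, beta1 = 2.0, 3.0
y = beta0 + beta1 * w + rng.normal(0.0, 1.0, size=n)

# OLS via the normal equations: beta_hat = (W'W)^{-1} W'y
W = np.column_stack([np.ones(n), w])         # include the constant
beta_hat = np.linalg.solve(W.T @ W, W.T @ y)
print(beta_hat)                              # close to the true values [2.0, 3.0]
```

The model is nonlinear in X but linear in the parameters, which is all A1 requires.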
------------------------------------------
A2: There is “enough” variation in the explanatory variables (The Identification Condition)
In the univariate case, this assumption requires that the data contain at least two different values for X.
In the multivariate case, this assumption requires that X be an N by K matrix with rank K. This not only means that there must be variation in each individual variable (other than the constant), but also that the variables must be linearly independent. For example, if income is always exactly 10% of wealth, it would be impossible for a regression to estimate separate coefficients for the effects of income and wealth on consumption.
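The income/wealth example can be checked numerically. This is a simulated sketch (the numbers are assumed): when income is exactly 10% of wealth, the N by K matrix X has rank K − 1, so X'X is singular and separate coefficients cannot be estimated.

```python
# Illustrative sketch of an identification failure: one regressor is an
# exact linear function of another, so the rank condition fails.
import numpy as np

rng = np.random.default_rng(1)
n = 100
wealth = rng.uniform(50.0, 500.0, size=n)
income = 0.10 * wealth                       # income is always exactly 10% of wealth

X = np.column_stack([np.ones(n), income, wealth])   # N x K with K = 3
print(np.linalg.matrix_rank(X))              # 2, not 3: the columns are linearly dependent

# X'X is (numerically) singular, so (X'X)^{-1} X'y is not computable
print(np.linalg.cond(X.T @ X))               # enormous condition number
```

With any nonzero variation around the 10% rule, the rank would return to K and the coefficients would again be identified (though imprecisely if the collinearity is near-exact).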
-------------------------------------------------------
A3: Exogeneity
E[εi | X1, X2, …, XN] = 0
E[ε | X] = 0
If assumption 3 is satisfied, we say that X is exogenous. If it is not satisfied, we say that X is endogenous. Note that E[ε | Y] does not equal zero (it equals Y − β0 − β1E[X | Y]), so we say that Y is endogenous.
How can this assumption fail? If there are omitted variables (reflected in the disturbance term) that are correlated with X, then the expected value of the disturbance term, conditional on X, will be a function of X, and hence will not equal zero. Similarly, if we omit the constant β0, and in fact the true β0 does not equal zero, then E[εi] ≠ 0.
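The omitted-variable failure can be made concrete with a small simulation (all parameter values here are assumed for illustration): the true model includes a variable Z correlated with X, but we regress Y on X alone, so Z is absorbed into the disturbance and E[ε | X] ≠ 0.

```python
# Illustrative sketch of omitted-variable bias violating exogeneity.
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
z = rng.normal(0.0, 1.0, size=n)             # omitted variable
x = 0.8 * z + rng.normal(0.0, 1.0, size=n)   # X is correlated with Z
y = 1.0 + 2.0 * x + 3.0 * z + rng.normal(0.0, 1.0, size=n)

# "Short" regression of y on x only: the disturbance is e = 3z + noise,
# and E[e | x] != 0 because z is correlated with x.
X = np.column_stack([np.ones(n), x])
b_short = np.linalg.solve(X.T @ X, X.T @ y)
print(b_short[1])   # biased: near 2 + 3*Cov(x,z)/Var(x) = 2 + 2.4/1.64 ≈ 3.46, not 2
```

The estimated slope picks up both the direct effect of X and the indirect effect of the correlated omitted variable Z.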
--------------------------------------------------------------
A4: Normality
εi ~ N(0, σ²)
Under this stronger assumption of normality, OLS not only has lower variance than all other linear unbiased estimators; it also has lower variance than all nonlinear unbiased estimators.
--------------------------------------------------------------
A5: Homoscedasticity
Var[εi | X1, X2, …, XN] = σ²
Var[ε | X] = σ²
This assumption requires that the variance of the disturbance be a constant that does not vary with the explanatory variables.
With A5 added, the Gauss-Markov theorem applies: in cross-section data, OLS is the best linear unbiased estimator (BLUE), i.e., it has the lowest variance among all linear unbiased estimators.
Because the disturbances are random variables, any estimator based in part on the observed values of Y will also be a random variable. Like all random variables, such estimators have a mean (i.e., an expected value) and a variance.
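The point that the OLS estimator is itself a random variable with a variance can be illustrated by simulation (the sample size, σ, and true coefficients below are assumed): holding X fixed and redrawing homoskedastic disturbances many times, the Monte Carlo variance of the slope estimate should match the classical formula σ²(X'X)⁻¹.

```python
# Illustrative sketch: sampling distribution of the OLS slope under
# homoskedasticity, compared against Var(b_hat) = sigma^2 * (X'X)^{-1}.
import numpy as np

rng = np.random.default_rng(3)
n, sigma = 200, 1.5
x = rng.uniform(0.0, 10.0, size=n)
X = np.column_stack([np.ones(n), x])         # X is held fixed across replications
XtX_inv = np.linalg.inv(X.T @ X)
theory_var = sigma**2 * XtX_inv[1, 1]        # theoretical variance of the slope

slopes = []
for _ in range(5000):
    eps = rng.normal(0.0, sigma, size=n)     # homoskedastic: constant variance
    y = 1.0 + 2.0 * x + eps
    b = XtX_inv @ (X.T @ y)                  # OLS estimate for this draw
    slopes.append(b[1])

print(theory_var, np.var(slopes))            # the two should be close
```

If the disturbance variance instead depended on x (heteroskedasticity), the Monte Carlo variance would diverge from this formula, which is why A5 matters for the usual variance expression.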

