What is the concern with omitted variables? What 2 conditions have to hold?
Solution
Omitted-variable bias (OVB) occurs when a model incorrectly leaves out one or more relevant variables. The "bias" arises because the model compensates for the missing variable by over- or underestimating the effect of one or more of the included variables.
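The main issue here is the nature of omitted-variable bias. Wikipedia states that two conditions must hold true for omitted-variable bias to exist in linear regression:
1. The omitted variable must be a determinant of the dependent variable (that is, its true regression coefficient must not be zero).
2. The omitted variable must be correlated with an independent variable included in the regression.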
It's important to note the second criterion carefully. Your betas will only be biased under certain circumstances. Specifically, if two variables that contribute to the response are correlated with each other, but you include only one of them, then (in essence) the effects of both will be attributed to the included variable, biasing the estimate of that parameter. So perhaps only some of your betas are biased, not necessarily all of them.
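The attribution effect described above can be illustrated with a small simulation. This is only a sketch under assumed values: the data-generating process (y = 1 + 2x + 3z + noise, with z = 0.5x + noise), the sample size, and the helper name `ols` are all hypothetical choices made for illustration, not part of the original question.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Assumed (hypothetical) data-generating process:
#   y = 1 + 2*x + 3*z + noise, with z correlated with x.
x = rng.normal(size=n)
z = 0.5 * x + rng.normal(size=n)
y = 1.0 + 2.0 * x + 3.0 * z + rng.normal(size=n)


def ols(columns, target):
    """Least-squares fit with an intercept; returns [intercept, slopes...]."""
    X = np.column_stack([np.ones(len(target))] + list(columns))
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef


full = ols([x, z], y)   # both variables included
short = ols([x], y)     # z omitted

# Slope from regressing the omitted variable on the included one.
delta = ols([x], z)[1]

# The short-regression slope equals the full-regression slope on x
# plus (slope on z) * delta, so the effect of z is credited to x.
expected_short_slope = full[1] + full[2] * delta

print("slope on x, z included:", full[1])
print("slope on x, z omitted: ", short[1])
print("full slope + bias term:", expected_short_slope)
```

With these assumed values, the slope on x should come out close to 2 when z is included and close to 2 + 3 × 0.5 = 3.5 when z is omitted. If z were generated independently of x, delta would be near zero and the slope on x would be approximately unbiased, which is the second condition at work.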
Another disturbing possibility arises when your sample is not representative of the population (which it rarely is). If you then omit a relevant variable, even one that is uncorrelated with the other variables, this can cause a vertical shift that biases your estimate of the intercept. For example, imagine that a variable, Z, increases the level of the response, that your sample is drawn from the upper half of the Z distribution, and that Z is not included in your model. Then your estimate of the population mean response (and the intercept) will be biased high, despite the fact that Z is uncorrelated with the other variables. Additionally, there may be an interaction between Z and variables in your model. This can also cause bias without Z being correlated with your variables.
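The intercept shift in the Z example can also be sketched in a simulation. As before, everything here is an assumption made for illustration: the coefficients, the standard normal distributions, and the rule of keeping only observations with Z above its median are hypothetical choices, and the interaction case is not covered.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Assumed (hypothetical) population: y = 1 + 2*x + 3*z + noise,
# with z drawn independently of x, so z is uncorrelated with x.
x = rng.normal(size=n)
z = rng.normal(size=n)
y = 1.0 + 2.0 * x + 3.0 * z + rng.normal(size=n)

# Unrepresentative sample: keep only the upper half of the z distribution.
keep = z > np.median(z)
x_s, y_s = x[keep], y[keep]

# Fit y on x alone (z omitted) in the truncated sample.
X_s = np.column_stack([np.ones(x_s.size), x_s])
(intercept_s, slope_s), *_ = np.linalg.lstsq(X_s, y_s, rcond=None)

# Same regression on the full, representative data for comparison.
X = np.column_stack([np.ones(n), x])
(intercept_p, slope_p), *_ = np.linalg.lstsq(X, y, rcond=None)

print("representative sample: intercept", intercept_p, "slope", slope_p)
print("upper half of z only:  intercept", intercept_s, "slope", slope_s)
```

Under these assumptions the slope on x stays near 2 in both fits, because Z remains uncorrelated with x even after truncation. The intercept in the truncated sample is shifted upward by roughly 3 times the mean of Z within the sample.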
