What is the difference between linear regression and ANOVA H
What is the difference between linear regression and ANOVA? How do I know when to use which one?
Solution
Mathematically there is no difference. As Adrian nicely pointed out: the ANOVA model is a special case of a regression model in which all the predictors are categorical.
But there is a difference in the application of \"ANOVA\" and \"regression (analysis)\" (deliberatily without \"model\") that has not been addressed in the answers above:
ANOVA is a tool to check how much the residual variance is reduced by predictors in (nested regression) models, whereas the regression analysis aims to quantify effect sizes in terms of \"how much is the response expected to change when the predictor(s) change by a given amount?\". For categorical predictors this reduces to the question to \"what is the expected difference in the response between different groups/categories?\". For continuous predictors this is the questions for a slope.
To clarify: ANOVA can be applied to any regression model (no matter if the model contains only continuous, only categorical, or both kinds of predictors). ANOVA allows to assess the impact of a predictor or a whole set of predictors on the residuals: who much of the variance in the data can be explained by these predictors? The regression analysis, on the other hand, is a complementary tool to asses the quantitative relation between a predictor and the response.
Generalization:
Regression models are (general) linear models (LM). (Multiple) linear regression is the analysis of a special case of a linear model (a model with one or several continuous predictor/s), The term \"linear\" in \"linear model\" refers to the estimated coefficients and *not* to the relationship between predictor and response. A linear model can model a curved relationship; only the estimated parameters of the model must be untransformed (i.e.: linear).*
Since \"regression analysis\" and \"regression models\" are often confused, I\'d prefer to use the terrm \"linear models\". One may estimate parameters in such linear models and/or analyze the variance reduction due to (sets of) predictor(s) (-> ANOVA).
Further generalization:
For statistical models with non-normal errors, the term \"variance\" makes not much sense and is substituted by the more general \"deviance\" (http://en.wikipedia.org/wiki/Deviance_%28statistics%29). Models with error-distributions from the exponential family are then termed generalized linear models (GLM), and the pendant to the ANOVA is the analysis of deviance. It gets even more universal when we consider generalized additive models (GAM) and verctorizes generalized or additive models (VGLM/VGAM).
* Hence, Y = A*sin(X) + B is a linear model for a non-linear relationship between X and Y, whereas Y = sin(A*X) + B is not a linear model (the parameter A is \"inside\" a function, so it is not linear anymore and the model can not be formulated to have both parametersin their linear form.
For example, psychology has historically utilized ANOVA models, whereas economics has historically utilized regression. I personally find that regression is more flexible and intuitive, and rarely use ANOVA, except when comparing balance in baseline characteristics between multiple groups.
