What are some of the problems that could arise if you simply run a regression, find a fairly strong correlation and use the results to make predictions about the future? Give at least two possible problems you should keep in mind.
Solution
The first problem is that the regression might be spurious: X and Y may be highly correlated even though there is no logical relationship through which X helps explain variation in Y. Often both are driven by a third variable. For example, higher humidity may cause more sweating, and higher humidity also slightly increases the speed of sound in air (humid air is less dense than dry air). So on humid days there is both more sweating and a higher speed of sound, and the two are strongly positively correlated, yet neither causes the other. A regression of one on the other can fit well while having no causal meaning, so it will mislead as soon as something other than humidity moves either variable.
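A small simulation can make the common-cause idea concrete. This is a sketch under made-up assumptions: humidity is drawn uniformly, and the coefficients and noise levels for `sweating` and `speed_of_sound` are invented for illustration. The signs of the effects are also assumed, since only the shared dependence on humidity matters here. Neither outcome appears in the other's equation, yet their raw correlation is high. After removing the part of each variable explained by humidity (a partial correlation), the correlation falls to about zero.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 5_000

# Hypothetical common cause: relative humidity (%), drawn uniformly.
humidity = rng.uniform(20, 90, size=n)

# Each outcome depends only on humidity plus its own independent noise.
# Coefficients and noise levels are invented for illustration.
sweating = 0.05 * humidity + rng.normal(0, 0.5, size=n)
speed_of_sound = 343.0 + 0.01 * humidity + rng.normal(0, 0.1, size=n)

# Raw correlation between the two outcomes.
raw_corr = np.corrcoef(sweating, speed_of_sound)[0, 1]

def residuals(y, x):
    """Residuals of y after an OLS fit on x (with intercept)."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return y - (slope * x + intercept)

# Partial correlation: correlate what is left after regressing out humidity.
partial_corr = np.corrcoef(residuals(sweating, humidity),
                           residuals(speed_of_sound, humidity))[0, 1]

print(f"raw correlation:     {raw_corr:.2f}")
print(f"partial correlation: {partial_corr:.2f}")
```

In practice the confounder is usually unobserved, so it cannot always be regressed out this way. That is why a strong correlation alone is weak evidence of a usable relationship.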
Another problem is that the data used for the regression may not be a random sample. If it is biased, for example drawn only from a narrow range of X or from an unrepresentative group, the fitted line may be unsuitable for future predictions. In addition, the relationship between X and Y may change over time, so the pattern observed in the historical data may not hold in the future. Either problem can lead to large prediction errors even when the original fit looked strong.
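The effect of a changing relationship can be sketched the same way. Under an assumed scenario, a line is fitted to "past" data generated from y = 2x + 1, and its prediction error is then measured on two new samples. One sample comes from the same relationship. The other comes from a shifted one, y = 0.5x + 10. Both relationships and the noise level are hypothetical. The model that fits the past well gives several times the error once the relationship moves.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

def simulate(x, slope, intercept, noise_sd=1.0):
    """Generate y = slope * x + intercept + Gaussian noise (hypothetical DGP)."""
    return slope * x + intercept + rng.normal(0, noise_sd, size=x.size)

# Past data used to fit the regression (assumed relationship y = 2x + 1).
x_past = rng.uniform(0, 10, size=500)
y_past = simulate(x_past, slope=2.0, intercept=1.0)
slope_hat, intercept_hat = np.polyfit(x_past, y_past, deg=1)

def rmse(x, y):
    """Root-mean-square error of the fitted line's predictions on (x, y)."""
    pred = slope_hat * x + intercept_hat
    return float(np.sqrt(np.mean((y - pred) ** 2)))

# New data drawn from the SAME relationship as the training data.
x_same = rng.uniform(0, 10, size=500)
y_same = simulate(x_same, slope=2.0, intercept=1.0)

# Future data after the relationship has shifted (assumed y = 0.5x + 10).
x_future = rng.uniform(0, 10, size=500)
y_future = simulate(x_future, slope=0.5, intercept=10.0)

print(f"RMSE, same relationship:    {rmse(x_same, y_same):.2f}")
print(f"RMSE, shifted relationship: {rmse(x_future, y_future):.2f}")
```

A biased sample produces a similar failure. If the training data covers only part of the population, the fitted line describes that part and may not carry over to the cases you later predict.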
