I have an excel sheet with data to make regression analysis

I have an Excel sheet with data that I want to run a regression analysis on.

RUN1: I run a regression with Y (products sold) as the dependent variable and X1, X2, X3, X4 as the independent variables (is that linear?). The X's are incomes, the price of the product, and the price of another product.

RUN2: I generate lnY as the dependent variable and keep the same independent variables as in RUN1 (is that log-linear?).

RUN3: I take the natural logarithm of the independent variables too: lnY with lnX1, lnX2, lnX3. The last variable is a dummy (1 or 0 depending on whether the consumer lives in the city), so I keep it as X4.

How can I compare these models and choose which is the best model to use? Please describe thoroughly, as I have little knowledge of statistics. Depending on whether the model is linear or log-linear, the interpretation of the coefficients changes, doesn't it?

Solution

How can I compare these models and choose which is the best model to use?

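Generally, you choose the models that have higher adjusted and predicted R-squared values. These statistics are designed to avoid a key problem with regular R-squared: it never decreases when you add a predictor, so it can trick you into specifying an overly complex model. Note that R-squared values are only directly comparable between models with the same dependent variable, so RUN1 (Y) cannot be compared this way with RUN2 and RUN3 (lnY) unless the predictions are first converted back to the same scale.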
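As a minimal sketch of why the adjusted value is preferred, the standard formula 1 - (1 - R^2)(n - 1)/(n - p - 1) can be coded directly. The function name and the numbers below are hypothetical and chosen only for illustration; they are not taken from the data in the question.

```python
def adjusted_r_squared(r_squared: float, n_obs: int, n_predictors: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n_obs - n_predictors - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Hypothetical example: a fifth predictor raises plain R^2 from 0.800 to 0.801,
# but the adjusted value goes down because the gain is too small.
small_model = adjusted_r_squared(0.800, n_obs=30, n_predictors=4)
large_model = adjusted_r_squared(0.801, n_obs=30, n_predictors=5)
print(round(small_model, 4), round(large_model, 4))
```

Here the model with four predictors would be preferred, even though its plain R-squared is slightly lower.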

Alternatively, prefer a parsimonious model: one that accomplishes a desired level of explanation or prediction with as few predictor variables as possible.

For model evaluation there are different methods depending on what you want to know.

There are generally two ways of evaluating a model: based on predictions and based on goodness of fit on the current data.

In the first case you want to know whether your model adequately predicts new data; in the second you want to know whether your model adequately describes the relations in your current data. Those are two different things.

Evaluating based on predictions

The best way to evaluate models used for prediction is cross-validation.

Very briefly, you cut your dataset into, e.g., 10 different pieces, use 9 of them to build the model, and predict the outcomes for the tenth piece.

The mean squared difference between the observed and predicted values gives you a measure of the prediction accuracy.

As you repeat this ten times, leaving out a different piece each time, you average the mean squared difference over all ten iterations to arrive at an overall value with a standard deviation.

This in turn allows you to compare two models on their prediction accuracy using standard statistical techniques (t-test or ANOVA).
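The cross-validation steps can be sketched as follows. This is an illustration under assumptions, not a definitive implementation: it uses NumPy, ordinary least squares, and synthetic data, and the name kfold_mse is made up for this example. If one model predicts Y and another predicts lnY, the predictions would need to be converted to the same scale before their errors are compared.

```python
import numpy as np


def kfold_mse(X, y, k=10, seed=0):
    """Return the mean squared prediction error of an OLS fit for each of k folds."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))            # shuffle the row indices once
    folds = np.array_split(idx, k)           # k roughly equal pieces
    X1 = np.column_stack([np.ones(len(y)), X])  # add an intercept column
    errors = []
    for test_idx in folds:
        train_idx = np.setdiff1d(idx, test_idx)
        # Fit on the other k-1 pieces, predict the held-out piece.
        beta, *_ = np.linalg.lstsq(X1[train_idx], y[train_idx], rcond=None)
        pred = X1[test_idx] @ beta
        errors.append(np.mean((y[test_idx] - pred) ** 2))
    return np.array(errors)


# Synthetic data (an assumption for the demo, not the data from the question).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = 2.0 + X @ np.array([1.5, -0.7, 0.3]) + rng.normal(scale=0.5, size=200)

mse_full = kfold_mse(X, y)            # model with all three predictors
mse_reduced = kfold_mse(X[:, :1], y)  # model with only the first predictor
print(mse_full.mean(), mse_full.std(ddof=1))
print(mse_reduced.mean(), mse_reduced.std(ddof=1))
```

Because both calls use the same seed, the folds are identical, so the ten per-fold errors of the two models can be compared pairwise.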

A variant on this theme is the PRESS criterion (Prediction Sum of Squares), which leaves out one observation at a time, refits the model, and sums the squared prediction errors for the left-out observations.

This criterion is especially useful if you don't have much data. In that case, splitting your data as in the cross-validation approach might result in subsets that are too small for a stable fit.
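For an ordinary least squares fit, PRESS does not require refitting the model n times: each leave-one-out residual equals the ordinary residual divided by (1 - h_ii), where h_ii is the leverage of observation i. The sketch below assumes NumPy and synthetic data; the function name press is chosen for this example. The predicted R-squared mentioned earlier is then 1 - PRESS/SST.

```python
import numpy as np


def press(X, y):
    """Prediction Sum of Squares for an OLS fit with intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    hat = X1 @ np.linalg.pinv(X1)     # hat (projection) matrix
    leverage = np.diag(hat)
    # Leave-one-out residual = ordinary residual / (1 - leverage).
    return float(np.sum((resid / (1.0 - leverage)) ** 2))


# Small synthetic data set (an assumption for the demo).
rng = np.random.default_rng(3)
X_demo = rng.normal(size=(40, 2))
y_demo = 1.0 + X_demo @ np.array([2.0, -1.0]) + rng.normal(scale=0.5, size=40)

press_value = press(X_demo, y_demo)
sst = float(np.sum((y_demo - y_demo.mean()) ** 2))
predicted_r2 = 1.0 - press_value / sst
print(press_value, predicted_r2)
```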

Evaluating based on goodness of fit

Let me first state that this really differs depending on the model framework you use.

For example, a likelihood-ratio test can work for Generalized Additive Mixed Models when using the classic Gaussian distribution for the errors, but is meaningless in the case of the binomial variant.

First you have the more intuitive methods of comparing models.

You can use the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC) to compare the goodness of fit of two models; lower values indicate the better model. But nothing tells you whether the two models really differ.

Another one is Mallows's Cp criterion. This essentially checks for possible bias in your model by comparing the full model with all possible submodels.
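For a least squares regression with Gaussian errors, both criteria can be computed from the residual sum of squares, up to an additive constant that cancels when models fitted to the same data are compared. The sketch below uses that simplified form; the function name and the numbers are hypothetical, and conventions differ on whether the error variance is counted as a parameter. Like R-squared, the values are only comparable between models with the same dependent variable.

```python
import math


def aic_bic(rss: float, n_obs: int, n_params: int) -> tuple[float, float]:
    """AIC and BIC for a Gaussian least squares fit, up to an additive constant.

    rss      -- residual sum of squares of the fitted model
    n_obs    -- number of observations
    n_params -- number of estimated coefficients (including the intercept)
    """
    fit_term = n_obs * math.log(rss / n_obs)
    aic = fit_term + 2 * n_params
    bic = fit_term + n_params * math.log(n_obs)
    return aic, bic


# Hypothetical example: model B adds one coefficient and lowers RSS only a little.
aic_a, bic_a = aic_bic(rss=120.0, n_obs=50, n_params=5)
aic_b, bic_b = aic_bic(rss=118.0, n_obs=50, n_params=6)
print(aic_a, aic_b)
print(bic_a, bic_b)
```

In this made-up case both criteria are lower for model A, so the extra coefficient in model B is not worth its penalty.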

For the interpretation of the coefficients, consider the multiple regression situation with three slope coefficients (B1, B2 and B3):

B1 is the estimated rate of change of the conditional mean of Y with respect to x1, when x2 and x3 are held fixed.

B2 is the estimated rate of change of the conditional mean of Y with respect to x2, when x1 and x3 are held fixed.

B3 is the estimated rate of change of the conditional mean of Y with respect to x3, when x1 and x2 are held fixed.
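This rate-of-change reading applies when Y itself is the dependent variable, as in RUN1. When lnY is the dependent variable (RUN2), a one-unit increase in a predictor changes Y by about 100 * (exp(B) - 1) percent; this also holds for the dummy X4 in RUN3. When both sides are in logs (RUN3), the coefficient is an elasticity: a 1% increase in X goes with roughly a B% change in Y. The helper names and coefficient values below are hypothetical and only illustrate the arithmetic.

```python
import math


def pct_change_log_linear(coef: float) -> float:
    """Percent change in Y for a one-unit increase in X when ln(Y) is regressed on X."""
    return 100.0 * (math.exp(coef) - 1.0)


def pct_change_log_log(coef: float, pct_increase_x: float = 1.0) -> float:
    """Percent change in Y for a given percent increase in X when ln(Y) is regressed on ln(X)."""
    return 100.0 * ((1.0 + pct_increase_x / 100.0) ** coef - 1.0)


# Hypothetical coefficients, not estimates from the data in the question.
income_effect = pct_change_log_linear(0.05)   # RUN2-style coefficient on an income variable
price_elasticity = pct_change_log_log(-1.2)   # RUN3-style coefficient on ln(price)
print(round(income_effect, 3), round(price_elasticity, 3))
```

For small coefficients the log-linear effect is close to 100 * B percent, which is why the coefficient is often read directly as a percentage.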

A t-test on each coefficient is used to determine which variables are statistically significant.
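The t-statistic for each coefficient is the estimate divided by its standard error. A minimal sketch, assuming NumPy, ordinary least squares, and synthetic data (the function name ols_t_stats is made up for this example):

```python
import numpy as np


def ols_t_stats(X, y):
    """Return OLS coefficients (intercept first) and their t-statistics."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    X1 = np.column_stack([np.ones(len(y)), X])
    n, p = X1.shape
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    sigma2 = resid @ resid / (n - p)              # unbiased error variance estimate
    cov = sigma2 * np.linalg.inv(X1.T @ X1)       # covariance matrix of the estimates
    std_err = np.sqrt(np.diag(cov))
    return beta, beta / std_err


# Synthetic data: x1 and x3 affect y, x2 does not (an assumption for the demo).
rng = np.random.default_rng(2)
X_sim = rng.normal(size=(100, 3))
y_sim = 1.0 + 2.0 * X_sim[:, 0] - 1.0 * X_sim[:, 2] + rng.normal(size=100)

beta_hat, t_values = ols_t_stats(X_sim, y_sim)
print(beta_hat)
print(t_values)
```

As a rough rule, an absolute t-value above about 2 corresponds to significance at the 5% level when the sample is reasonably large; regression software (including Excel's regression output) reports the exact p-value.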

Tutor: Dr Jack