
Why is the WMW test suggested instead of a t-test?


Certain hypotheses can be tested using Student's t-test (maybe using Welch's correction for unequal variances in the two-sample case), or by a non-parametric test like the Wilcoxon paired signed-rank test, the Wilcoxon-Mann-Whitney U test, or the paired sign test. How can we make a principled decision about which test is most appropriate, particularly if the sample size is "small"?
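
For concreteness, here is a minimal Python/SciPy sketch of how each of the candidate tests would be invoked; the data are simulated purely for illustration, and nothing about this snippet resolves the choice itself.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Small, made-up samples (n = 10 per group) purely to show the calls.
x = rng.normal(loc=0.0, scale=1.0, size=10)
y = rng.normal(loc=0.5, scale=1.5, size=10)

# Student's t-test (pooled variance) and Welch's t-test (unequal variances).
t_pooled = stats.ttest_ind(x, y, equal_var=True)
t_welch = stats.ttest_ind(x, y, equal_var=False)

# Wilcoxon-Mann-Whitney U test for independent samples.
wmw = stats.mannwhitneyu(x, y, alternative="two-sided")

# For paired designs (treating x and y as paired here only to show the calls):
d = x - y
signed_rank = stats.wilcoxon(d)                   # Wilcoxon signed-rank test
sign_test = stats.binomtest(int(np.sum(d > 0)),   # sign test = binomial test on signs
                            n=int(np.sum(d != 0)), p=0.5)

for name, res in [("pooled t", t_pooled), ("Welch t", t_welch),
                  ("WMW U", wmw), ("signed rank", signed_rank),
                  ("sign test", sign_test)]:
    print(f"{name:>12}: p = {res.pvalue:.3f}")
```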

Many introductory textbooks and lecture notes give a "flowchart" approach in which normality is checked (either – inadvisedly – by a normality test, or more informally by a QQ plot or similar) to decide between a t-test and a non-parametric test. For the unpaired two-sample t-test there may be a further check for homogeneity of variance to decide whether to apply Welch's correction. One issue with this approach is that the decision about which test to apply depends on the observed data, which affects the performance (power, Type I error rate) of the selected test.
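
To make that concern concrete, the flowchart amounts to a data-dependent procedure along the lines of the sketch below (my own illustration, not a recommendation); its Type I error rate and power have to be judged for the procedure as a whole, by simulation, not for whichever test it ends up selecting.

```python
from scipy import stats

def flowchart_test(x, y, alpha_pre=0.05):
    """Data-dependent 'flowchart' procedure (illustration only):
    pre-test normality in each group, then pick pooled t, Welch t, or WMW."""
    _, p_x = stats.shapiro(x)
    _, p_y = stats.shapiro(y)
    if min(p_x, p_y) < alpha_pre:            # "non-normal" -> non-parametric test
        return "WMW", stats.mannwhitneyu(x, y, alternative="two-sided").pvalue
    _, p_var = stats.levene(x, y)            # second gate: homogeneity of variance
    equal_var = p_var > alpha_pre
    label = "pooled t" if equal_var else "Welch t"
    return label, stats.ttest_ind(x, y, equal_var=equal_var).pvalue
```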

Another problem is how hard it is to check normality in small data sets: formal testing has low power, so violations may well go undetected, and similar issues apply when eyeballing the data on a QQ plot. Even egregious violations could be missed, e.g. if the distribution is a mixture but no observations happened to be drawn from one component of the mixture. Unlike with large n, we cannot lean on the safety net of the Central Limit Theorem, under which the test statistic is asymptotically normal and the t reference distribution is approximately valid.
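
A quick simulation makes the low-power point; this is only a rough sketch using SciPy's Shapiro-Wilk test, and the exact percentages will vary:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, reps, alpha = 10, 5000, 0.05

# How often is normality rejected at n = 10, for normal and clearly non-normal data?
samplers = {
    "normal":      lambda: rng.normal(size=n),
    "exponential": lambda: rng.exponential(size=n),
    "lognormal":   lambda: rng.lognormal(sigma=1.0, size=n),
}
for name, sampler in samplers.items():
    rejections = sum(stats.shapiro(sampler())[1] < alpha for _ in range(reps))
    print(f"{name:>11}: normality rejected in {rejections / reps:.0%} of samples")
```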

One principled response to this is "safety first": with no way to reliably verify the normality assumption in a small sample, stick to non-parametric methods. Another is to consider any grounds for assuming normality, either theoretical (e.g. the variable is the sum of several random components, so the CLT applies) or empirical (e.g. previous studies with larger n suggest the variable is normal), and to use a t-test only if such grounds exist. But these usually only justify approximate normality, and with low degrees of freedom it is hard to judge how near-normal the data need to be to avoid invalidating a t-test.

Most guides to choosing between a t-test and a non-parametric test focus on the normality issue. But small samples also throw up some side issues:

If performing an \"unrelated samples\" or \"unpaired\" t-test, whether to use a Welch correction? Some people use a hypothesis test for equality of variances, but here it would have low power; others check whether SDs are \"reasonably\" close or not (by various criteria). Is it safer simply to always use the Welch correction for small samples, unless there is some good reason to believe population variances are equal?

If you see the choice of methods as a trade-off between power and robustness, claims about the asymptotic efficiency of the non-parametric methods are unhelpful. The rule of thumb that "Wilcoxon tests have about 95% of the power of a t-test if the data really are normal, and are often far more powerful if they are not, so just use a Wilcoxon" is sometimes heard, but if the 95% figure only applies to large n, this is flawed reasoning for smaller samples.
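
The small-n version of that claim can at least be checked by simulation; a rough sketch (normal data with a unit shift, n = 10 per group, nothing authoritative about the exact numbers):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, reps, alpha, shift = 10, 5000, 0.05, 1.0

t_hits = wmw_hits = 0
for _ in range(reps):
    x = rng.normal(size=n)
    y = rng.normal(loc=shift, size=n)
    t_hits += stats.ttest_ind(x, y).pvalue < alpha
    wmw_hits += stats.mannwhitneyu(x, y, alternative="two-sided").pvalue < alpha

print(f"t-test power:  {t_hits / reps:.2f}")
print(f"WMW power:     {wmw_hits / reps:.2f}")
print(f"power ratio:   {wmw_hits / t_hits:.2f}")
```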

Small samples may make it very difficult, or impossible, to assess whether a transformation is appropriate, since it is hard to tell whether the transformed data belong to a (sufficiently) normal distribution. So if a QQ plot reveals very positively skewed data that look more reasonable after taking logs, is it safe to use a t-test on the logged data? With larger samples this would be very tempting, but with small n I would probably hold off unless there were grounds to expect a log-normal distribution in the first place.
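
For the record, the transformation route would look something like this sketch (simulated log-normal data; whether the transform is justified at small n is exactly the question):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Made-up positively skewed (log-normal) samples, n = 8 per group.
x = rng.lognormal(mean=0.0, sigma=1.0, size=8)
y = rng.lognormal(mean=0.7, sigma=1.0, size=8)

raw = stats.ttest_ind(x, y, equal_var=False)                     # t-test on raw data
logged = stats.ttest_ind(np.log(x), np.log(y), equal_var=False)  # t-test after logs

print(f"raw scale: p = {raw.pvalue:.3f}")
print(f"log scale: p = {logged.pvalue:.3f}")
```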

What about checking the assumptions of the non-parametric tests themselves? Some sources recommend verifying a symmetric distribution before applying a Wilcoxon signed-rank test (treating it as a test for location rather than for stochastic dominance), which raises much the same problems as checking normality. And if the reason we are applying a non-parametric test in the first place is blind obedience to the mantra of "safety first", then the difficulty of assessing skewness from a small sample would apparently push us down to the even lower power of a paired sign test.
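
For completeness, the sign-test fallback is just a binomial test on the signs of the paired differences; a sketch (assuming SciPy >= 1.7 for binomtest, with made-up paired data):

```python
import numpy as np
from scipy import stats

# Made-up paired measurements (before/after), n = 9 pairs.
before = np.array([12.1, 10.4, 13.3, 9.8, 11.5, 12.7, 10.9, 14.2, 11.1])
after = np.array([12.9, 10.2, 14.1, 10.5, 12.3, 12.6, 11.8, 15.0, 11.9])
d = after - before

# Wilcoxon signed-rank test: assumes a symmetric distribution of differences.
signed_rank = stats.wilcoxon(d)

# Sign test: uses only the signs, so no symmetry assumption, but lower power.
sign = stats.binomtest(int(np.sum(d > 0)), n=int(np.sum(d != 0)), p=0.5)

print(f"signed rank: p = {signed_rank.pvalue:.3f}")
print(f"sign test:   p = {sign.pvalue:.3f}")
```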

With these small-sample issues in mind, is there a good (and preferably citable) procedure to work through when deciding between t-tests and non-parametric tests?

There have been several excellent answers, but a response considering other alternatives to rank tests, such as permutation tests, would also be welcome.
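
To give that last point something concrete: a permutation test on the difference in means is straightforward with SciPy >= 1.7's stats.permutation_test, and with samples this small an exact enumeration of relabellings is also feasible. A sketch with made-up data:

```python
import numpy as np
from scipy import stats

# Made-up small samples, n = 7 per group.
x = np.array([3.1, 2.7, 4.0, 3.6, 2.9, 3.3, 3.8])
y = np.array([4.2, 3.9, 4.8, 4.1, 5.0, 3.7, 4.5])

def mean_diff(a, b):
    return np.mean(a) - np.mean(b)

# Two-sided permutation test of the difference in means under random relabelling.
res = stats.permutation_test((x, y), mean_diff,
                             permutation_type="independent",
                             alternative="two-sided",
                             n_resamples=9999)

print(f"observed difference: {res.statistic:.3f}")
print(f"permutation p-value: {res.pvalue:.3f}")
```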
