Compare and contrast the concepts of effect size and statistical significance.
Solution
Introduction
This article discusses the overuse of significance testing and the underuse of effect sizes for reporting on the effects of intervention programs.
Effect sizes are now a standard and expected part of much statistical reporting and should be more commonly reported and understood (Thompson, 2000).
Significance testing
There are two main issues with relying on significance testing for evaluating the effects of interventions:
Significance tests should only be used in attempts to generalise the sample's results to a population. For example, based on a sample of 10% of our clients, we want to use the sample data to make generalised conclusions for all of our clients. In this case, the results of the sample are of little intrinsic interest. Of primary interest is what the sample data might represent about the target population.
Program evaluations often have access to data from the entire population of interest, e.g., all participants in a specific program. In such situations, the full population data set is available, so there is no need for, or interest in, inferential testing because there is no need to generalise beyond the sample. In these situations, descriptive statistics and effect sizes are all that is needed.
Program evaluation studies with fewer than approximately 50 participants tend to lack sufficient statistical power (a function of sample size, effect size, and p level) for detecting small, medium, or possibly even large effects. In such situations, the results of significance tests can be misleading because they are subject to Type II errors (incorrectly failing to reject the null hypothesis). In these situations, it can be more informative to use effect sizes, possibly with confidence intervals.
For studies involving large sample sizes (e.g., > ~400), a different problem occurs with significance testing: even small effects are likely to become statistically significant, although these effects may be trivial. In these situations, more attention should be paid to effect sizes than to statistical significance testing.
To sum up:
When there is no interest in generalising (e.g., we are only interested in the results for the sample), there is no need for significance testing. In these situations, effect sizes are sufficient and suitable.
When examining effects using small sample sizes, significance testing can be misleading because it is subject to Type II errors. Contrary to popular opinion, statistical significance is not a direct indicator of the size of an effect; rather, it is a function of sample size, effect size, and p level. In these situations, effect sizes and confidence intervals are more informative than significance testing (a brief power calculation is sketched after this list).
When examining effects using large samples, significance testing can be misleading because even small or trivial effects are likely to produce statistically significant results. This can be dealt with by reporting and emphasising effect sizes rather than significance test results.
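As an illustration of how little power small samples provide, below is a minimal sketch of an approximate power calculation for a two-sided, two-group comparison, using a normal approximation. The effect size, group size, and alpha level shown are hypothetical values chosen for illustration, not figures from this article.

```python
import math

from scipy.stats import norm


def approx_power_two_sample(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample t-test (normal approximation)."""
    z_crit = norm.ppf(1 - alpha / 2)        # critical value for the two-sided test
    ncp = d * math.sqrt(n_per_group / 2)    # approximate noncentrality under the alternative
    return norm.cdf(ncp - z_crit) + norm.cdf(-ncp - z_crit)


# A medium effect (d = 0.5) with 25 participants per group (about 50 in total)
print(round(approx_power_two_sample(0.5, 25), 2))  # roughly 0.4 -- well below the conventional 0.8 target
```

Under these assumed numbers, the chance of detecting a genuine medium effect is well under one in two, which is the Type II error problem described above.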
Effect sizes
Types of questions
One of the most common questions driving evaluation of intervention programs is "did we get an effect?". However, very often, the question(s) could be more usefully phrased as "how much effect did this program have?" and perhaps also "how does this effect compare with the effects of other interventions?".
To answer the \"did we get an effect?\" question it is necessary to compare the observed result against 0. If the results are to be generalised, the comparison against 0 or a control group can be done inferentially, using statistics such as paired samples t-tests, repeated measures ANOVAs, etc. are used. Alternatively, confidence intervals can be used. If the results are not to be generalised (i.e., where the population data is available), then the question can be answered simply by examining the effect sizes.
To answer the \"how much effect did the program get?\" question, effect sizes can be used.
To answer the \"how does the effect compare with the effects of other interventions?\" question, effect sizes can be used because they are standardised and allow ready comparison with benchmarks such as effect sizes for other types of interventions obtained from meta-analyses.
Use of effect sizes can also be combined with other data, such as cost, to provide a measure of cost-effectiveness. In other words, "how much bang (effect size) for the buck (cost)?". Government and philanthropic funders commonly want to ask this question, and there is increasing demand for such "proven evidence" of outcomes and cost-effectiveness in psycho-social intervention programs.
Advantages and disadvantages
Some advantages of effect size reporting are that:
It tends to be easier for practitioners to intuitively relate to effect sizes (once they are explained) than to significance testing
Effect sizes facilitate ready comparison with internal or external benchmarks
Confidence intervals can be placed around effect sizes (providing an equivalent to significance testing if desired)
The main disadvantages of using effect sizes include:
Research culture and software packages are still in transition from habitual significance testing to habitual effect size reporting. Thus, commonly used statistical packages still tend to offer surprisingly limited functionality for computing effect sizes.
Most undergraduate and postgraduate research methods and statistics courses tend to teach and overemphasise classical test theory and inferential statistical methods, and to underemphasise effect sizes and confidence intervals. In response, there has been a campaign since the 1980s to educate social scientists about the misuse of significance testing and the need for more common reporting of effect sizes. Significantly, these endeavours were recognised in changes to the 5th edition of the American Psychological Association publication manual, which states that research that doesn't report effect sizes is inferior.
If confused about what types of statistics to report, you can report effect sizes, confidence intervals, and significance test results.
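All three can be produced from the same data. The sketch below, using simulated change scores, reports a standardised mean effect size, a simple percentile bootstrap confidence interval around it, and the corresponding significance test; the data and settings are invented for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical change scores (post minus pre) for 30 participants
change = rng.normal(loc=1.0, scale=2.0, size=30)

d = change.mean() / change.std(ddof=1)             # effect size for the change
t_stat, p_value = stats.ttest_1samp(change, 0.0)   # significance test against zero change

# Simple percentile bootstrap confidence interval for the effect size
boot_d = []
for _ in range(5000):
    resample = rng.choice(change, size=change.size, replace=True)
    boot_d.append(resample.mean() / resample.std(ddof=1))
ci_low, ci_high = np.percentile(boot_d, [2.5, 97.5])

print(f"d = {d:.2f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}], p = {p_value:.3f}")
```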
Types of Effect Size
There are several types of effect size, based on either difference scores or correlations. For more information, see Valentine and Cooper (2003), Wikipedia, and Wikiversity. One type of effect size can be converted into another, so the choice of effect size is somewhat arbitrary. For program evaluation, however, standardised mean effect sizes are more commonly used.
Standardised mean effect size
Standardised mean effect sizes (such as Cohen's d and Hedges' g) are basically z scores. These effect sizes indicate the mean difference between two variables expressed in standard deviation units. A score of 0 represents no change, and effect size scores can be negative or positive. The meaning of an effect size depends on the measurement context, so rules of thumb should be treated cautiously. A well-known guide is offered by Cohen (1988): approximately .2 is a small effect, .5 a medium effect, and .8 a large effect.
Percentile scores, based on the properties of the normal distribution, can be used to aid interpretation of standardised mean effect sizes. Percentiles can be used to indicate, for example, where someone who started on the 50th percentile could, on average, expect to end up after the intervention (compared to people who didn't experience the intervention). For example, imagine a child of average reading ability who participates in an intensive reading program. If the program had a small effect (ES = .2), this would raise the child's reading performance to the 58th percentile, a moderate effect (ES = .5) would raise the child to the 69th percentile, and a large effect (ES = .8) would raise the child to the 79th percentile. These calculations can be easily done using a free web calculator, such as at HyperStat Online.
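These percentile figures can also be reproduced directly from the normal distribution, as in the brief sketch below (assuming normally distributed scores):

```python
from scipy.stats import norm

# Expected percentile, after the intervention, of someone who started at the
# 50th percentile, for the effect sizes quoted above
for es in (0.2, 0.5, 0.8):
    percentile = norm.cdf(es) * 100
    print(f"ES = {es}: about the {percentile:.0f}th percentile")
```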
Correlational effect sizes
Correlations (r) are effect sizes. Correlational and standardised mean effect sizes cannot be directly compared, but they can be converted from one to the other. In the context of an intervention program, a correlational effect size would indicate the standardised covariance between Time (independent variable) and Outcome (dependent variable). As with standardised mean effect sizes, interpretation of the size of correlational effects needs to occur within the context of the study, but a general guide (Cohen, 1988) is that approximately .1 is a small effect, .3 a medium effect, and .5 a large effect.
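As a sketch of the conversion mentioned above, the widely used formulas relating r and d (assuming two groups of roughly equal size) can be applied directly:

```python
import math


def r_to_d(r):
    """Convert a correlational effect size to a standardised mean difference."""
    return 2 * r / math.sqrt(1 - r ** 2)


def d_to_r(d):
    """Convert a standardised mean difference to a correlational effect size
    (assumes two groups of roughly equal size)."""
    return d / math.sqrt(d ** 2 + 4)


print(round(r_to_d(0.3), 2))   # a 'medium' r of .3 corresponds to d of about 0.63
print(round(d_to_r(0.5), 2))   # a 'medium' d of .5 corresponds to r of about 0.24
```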


