Define and describe commonly used statistics for categorical
Define and describe commonly used statistics for categorical and continuous variables to test for a statistically significant difference between two samples or measures (e.g. chi-square, t-tests, binomial proportions)
Solution
T TESTS
The two-sample t-test is used to determine if two population means are equal. A common application is to test if a new process or treatment is superior to a current process or treatment.
There are several variations on this test.
where N1 and N2 are the sample sizes, Y1¯ and Y2¯ are the sample means, and s21 and s22 are the sample variances.
If equal variances are assumed, then the formula reduces to:
or 2-Sample t, the hypotheses are:
Null hypothesis
Alternative hypothesis
Choose one:
The difference between the population means (1- 2) is less than the hypothesized difference (0).
CHI SQUARE TEST
where the summation is for bin 1 to k, Ri is the observed frequency for bin i for sample 1, and Si is the observed frequency for bin i for sample 2. K1 and K2 are scaling constants that are used to adjust for unequal sample sizes. Specifically,
K2=ki=1Riki=1Si
This test is sensitive to the choice of bins. Most reasonable choices should produce similar, but not identical, results.
Therefore, the hypothesis that the distribution is from the specified distribution is rejected if
where CHSPPF is the chi-square percent point function with k - c degrees of freedom and a significance level of .
Dataplot supports the chi-square two sample test for either binned or unbinned data.
For unbinned data, Dataplot automatically generates binned data using the same rule as for histograms. That is, the class width is 0.3*s where s is the sample standard deviation. The upper and lower limits are the mean plus or minus 6 times the sample standard deviation (any zero frequency bins in the tails are omitted). Note that the binning computations are performed with the combined data set. As with the HISTOGRAM command, you can override these defaults using the CLASS WIDTH, CLASS UPPER, and CLASS LOWER commands.
The quantile-quantile plot, bihistogram, and Tukey mean-difference plot are graphical alternatives.
Syntax 1:
This syntax is used for unbinned data.
Syntax 2:
This syntax is used for binned data. The and variables contain bin frequencies and contains the bin midpoints.
Examples:
BINOMIAL PROPORTIONS
Given a set of N1 observations in a variable X1 and a set of N2 observations in a variable X2, we can compute a normal approximation test that the two proportions are equal (or alternatively, that the difference of the two proportions is equal to 0). In the following, let p1 and p2 be the population proportion of successes for samples one and two, respectively.
The hypothesis test that the two binomial proportions are equal is
where p^ is the proportion of successes for the combined sample and
p^==n1p1^+n2p2^n1+n2X1+X2n1+n2
Z>1(1/2)
Z<1(/2)
For a lower tailed test
Z<1()
For an upper tailed test
Z>1(1)
| T TESTS The two-sample t-test is used to determine if two population means are equal. A common application is to test if a new process or treatment is superior to a current process or treatment. There are several variations on this test.
| |||||||||||
| Definition | The two-sample t-test for unpaired data is defined as:
|


