Define and describe commonly used statistics for categorical and continuous variables to test for a statistically significant difference between two samples or measures (e.g. chi-square, t-tests, binomial proportions)

T TESTS

The two-sample t-test is used to determine if two population means are equal. A common application is to test if a new process or treatment is superior to a current process or treatment.

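For the 2-sample t-test, the hypotheses are:

Null hypothesis

H₀: μ₁ − μ₂ = δ₀. The difference between the population means (μ₁ − μ₂) equals the hypothesized difference (δ₀).

Alternative hypothesis

Choose one:

H₁: μ₁ − μ₂ ≠ δ₀. The difference between the population means (μ₁ − μ₂) does not equal the hypothesized difference (δ₀).

H₁: μ₁ − μ₂ > δ₀. The difference between the population means (μ₁ − μ₂) is greater than the hypothesized difference (δ₀).

H₁: μ₁ − μ₂ < δ₀. The difference between the population means (μ₁ − μ₂) is less than the hypothesized difference (δ₀).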

CHI SQUARE TEST

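The chi-square two sample test statistic is

$$\chi^2 = \sum_{i=1}^{k} \frac{(K_1 R_i - K_2 S_i)^2}{R_i + S_i}$$

where the summation is over bins 1 to k, R_i is the observed frequency for bin i for sample 1, and S_i is the observed frequency for bin i for sample 2. K₁ and K₂ are scaling constants that adjust for unequal sample sizes. Specifically,

$$K_1 = \sqrt{\frac{\sum_{i=1}^{k} S_i}{\sum_{i=1}^{k} R_i}}, \qquad K_2 = \sqrt{\frac{\sum_{i=1}^{k} R_i}{\sum_{i=1}^{k} S_i}}$$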
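As a minimal sketch, the statistic can be computed from two lists of bin frequencies. This assumes the statistic has the form Σ (K₁R_i − K₂S_i)² / (R_i + S_i) with K₁ = sqrt(ΣS_i / ΣR_i) and K₂ = sqrt(ΣR_i / ΣS_i). The function name `chi_square_two_sample` is hypothetical, and skipping bins that are empty in both samples is an assumption made here to avoid division by zero.

```python
import math

def chi_square_two_sample(r, s):
    """Chi-square two sample statistic from bin frequencies r and s."""
    if len(r) != len(s):
        raise ValueError("both samples must use the same bins")
    total_r, total_s = sum(r), sum(s)
    # Scaling constants that adjust for unequal sample sizes.
    k1 = math.sqrt(total_s / total_r)
    k2 = math.sqrt(total_r / total_s)
    stat = 0.0
    for ri, si in zip(r, s):
        if ri + si == 0:
            continue  # assumption: bins empty in both samples are skipped
        stat += (k1 * ri - k2 * si) ** 2 / (ri + si)
    return stat
```

When the two frequency lists are proportional to each other, every term vanishes and the statistic is zero; it grows as the bin-by-bin shapes diverge.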

This test is sensitive to the choice of bins. Most reasonable choices should produce similar, but not identical, results.

The hypothesis that the two samples come from a common distribution is rejected if

$$\chi^2 > \mathrm{CHSPPF}(1 - \alpha,\ k - c)$$

where CHSPPF is the chi-square percent point function with k - c degrees of freedom and a significance level of α.

Dataplot supports the chi-square two sample test for either binned or unbinned data.

For unbinned data, Dataplot automatically generates binned data using the same rule as for histograms. That is, the class width is 0.3*s where s is the sample standard deviation. The upper and lower limits are the mean plus or minus 6 times the sample standard deviation (any zero frequency bins in the tails are omitted). Note that the binning computations are performed with the combined data set. As with the HISTOGRAM command, you can override these defaults using the CLASS WIDTH, CLASS UPPER, and CLASS LOWER commands.
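The default binning rule described above can be sketched as follows. This is an illustration of the stated rule (class width 0.3·s, limits at the mean ± 6·s, both computed from the combined data, empty tail bins dropped), not Dataplot's actual implementation. The function name `bin_two_samples` and the handling of values that fall exactly on a bin edge are assumptions.

```python
import math
from statistics import mean, stdev

def bin_two_samples(y1, y2):
    """Bin two samples on a common grid built from the combined data."""
    combined = list(y1) + list(y2)
    m, s = mean(combined), stdev(combined)
    if s == 0:
        raise ValueError("combined data has zero standard deviation")
    width = 0.3 * s                      # class width
    lower, upper = m - 6 * s, m + 6 * s  # class lower and upper limits
    nbins = 40                           # (12 * s) / (0.3 * s)

    def counts(y):
        c = [0] * nbins
        for v in y:
            if lower <= v <= upper:
                i = math.floor((v - lower) / width)
                c[min(max(i, 0), nbins - 1)] += 1
        return c

    r, s_counts = counts(y1), counts(y2)
    # Omit zero-frequency bins in the tails (empty in both samples).
    used = [i for i in range(nbins) if r[i] + s_counts[i] > 0]
    lo, hi = used[0], used[-1]
    mids = [lower + (i + 0.5) * width for i in range(lo, hi + 1)]
    return mids, r[lo:hi + 1], s_counts[lo:hi + 1]
```

The function returns the bin midpoints and one frequency list per sample, which is the shape of input the binned form of the test expects.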

The quantile-quantile plot, bihistogram, and Tukey mean-difference plot are graphical alternatives.

Syntax 1 is used for unbinned data.

Syntax 2 is used for binned data. Two variables contain the bin frequencies (one per sample) and a third contains the bin midpoints.

BINOMIAL PROPORTIONS

Given a set of N₁ observations in a variable X₁ and a set of N₂ observations in a variable X₂, we can compute a normal approximation test that the two proportions are equal (or alternatively, that the difference of the two proportions is equal to 0). In the following, let p₁ and p₂ be the population proportion of successes for samples one and two, respectively.

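The hypothesis test that the two binomial proportions are equal is

H₀: p₁ = p₂
H_a: p₁ ≠ p₂
Test Statistic:

$$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

where $\hat{p}_1$ and $\hat{p}_2$ are the sample proportions of successes and $\hat{p}$ is the proportion of successes for the combined sample:

$$\hat{p} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2} = \frac{X_1 + X_2}{n_1 + n_2}$$

Here n₁ and n₂ are the sample sizes (N₁ and N₂ above), and X₁ and X₂ in this formula denote the numbers of successes in the two samples.

Critical Region: For a two-tailed test, reject the null hypothesis if

$$Z > \Phi^{-1}(1 - \alpha/2) \quad \text{or} \quad Z < \Phi^{-1}(\alpha/2)$$

For a lower tailed test

$$Z < \Phi^{-1}(\alpha)$$

For an upper tailed test

$$Z > \Phi^{-1}(1 - \alpha)$$

where Φ⁻¹ is the percent point function (inverse cumulative distribution function) of the standard normal distribution and α is the significance level.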
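The normal approximation test can be sketched in a few lines of Python. This assumes the pooled-proportion statistic Z = (p̂₁ − p̂₂) / sqrt(p̂(1 − p̂)(1/n₁ + 1/n₂)) and takes the number of successes and the sample size for each sample as input. The function name `two_proportion_z_test` and its `tail` argument are hypothetical; the normal percent point function comes from the standard library's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2, alpha=0.05, tail="two-sided"):
    """Return (Z, reject) for the test that two binomial proportions are equal."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)  # proportion of successes, combined sample
    z = (p1 - p2) / math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    ppf = NormalDist().inv_cdf  # standard normal percent point function
    if tail == "two-sided":
        reject = z > ppf(1 - alpha / 2) or z < ppf(alpha / 2)
    elif tail == "lower":
        reject = z < ppf(alpha)
    elif tail == "upper":
        reject = z > ppf(1 - alpha)
    else:
        raise ValueError("tail must be 'two-sided', 'lower', or 'upper'")
    return z, reject
```

For example, `two_proportion_z_test(45, 100, 30, 100)` compares 45 successes out of 100 with 30 successes out of 100 at the 5% level.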

T TESTS

There are several variations on this test.

The data may be either paired or unpaired. By paired, we mean that there is a one-to-one correspondence between the values in the two samples. That is, if X₁, X₂, ..., X_n and Y₁, Y₂, ..., Y_n are the two samples, then X_i corresponds to Y_i. For paired samples, the difference X_i - Y_i is usually calculated. For unpaired samples, the sample sizes for the two samples may or may not be equal. The formulas for paired data are somewhat simpler than the formulas for unpaired data.

The variances of the two samples may be assumed to be equal or unequal. Assuming equal variances yields somewhat simpler formulas, although with computers this is no longer a significant issue.

In some applications, you may want to adopt a new process or treatment only if it exceeds the current treatment by some threshold. In this case, we can state the null hypothesis in the form that the difference between the two population means is equal to some constant, μ₁ − μ₂ = d₀, where the constant is the desired threshold.
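For the paired case, a minimal sketch is shown below. The text does not give the paired formula, so this assumes the standard one: a one-sample t statistic on the differences X_i − Y_i, t = (mean(d) − d₀) / (s_d / sqrt(n)), with n − 1 degrees of freedom. The function name `paired_t` and the `d0` parameter (the hypothesized difference, zero by default) are hypothetical.

```python
import math
from statistics import mean, stdev

def paired_t(x, y, d0=0.0):
    """Return (t, degrees of freedom) for paired samples x and y."""
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    d = [xi - yi for xi, yi in zip(x, y)]  # differences X_i - Y_i
    n = len(d)
    # stdev uses the n - 1 denominator (sample standard deviation).
    t = (mean(d) - d0) / (stdev(d) / math.sqrt(n))
    return t, n - 1
```

Setting `d0` to a nonzero threshold tests whether the mean difference equals that threshold rather than zero.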

Definition

The two-sample t-test for unpaired data is defined as:

H₀: μ₁ = μ₂
H_a: μ₁ ≠ μ₂
Test Statistic:

$$T = \frac{\bar{Y}_1 - \bar{Y}_2}{\sqrt{s_1^2/N_1 + s_2^2/N_2}}$$

where N₁ and N₂ are the sample sizes, $\bar{Y}_1$ and $\bar{Y}_2$ are the sample means, and $s_1^2$ and $s_2^2$ are the sample variances.

If equal variances are assumed, then the formula reduces to:

$$T = \frac{\bar{Y}_1 - \bar{Y}_2}{s_p\sqrt{1/N_1 + 1/N_2}}$$

where

$$s_p^2 = \frac{(N_1 - 1)s_1^2 + (N_2 - 1)s_2^2}{N_1 + N_2 - 2}$$

Significance Level: α

Critical Region: Reject the null hypothesis that the two means are equal if $|T| > t_{1-\alpha/2,\,\nu}$, where $t_{1-\alpha/2,\,\nu}$ is the critical value of the t distribution with ν degrees of freedom and

$$\nu = \frac{\left(s_1^2/N_1 + s_2^2/N_2\right)^2}{\left(s_1^2/N_1\right)^2/(N_1 - 1) + \left(s_2^2/N_2\right)^2/(N_2 - 1)}$$

If equal variances are assumed, then ν = N₁ + N₂ - 2.
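The definition above translates directly into code. The sketch below computes the test statistic T and the degrees of freedom ν for unpaired data, with or without the equal-variance assumption. The function name `two_sample_t` and the `equal_var` flag are hypothetical. Looking up the critical value of the t distribution is left out because the Python standard library has no t distribution; a statistics package or a printed table would be needed for that step.

```python
import math
from statistics import mean, variance

def two_sample_t(y1, y2, equal_var=False):
    """Return (T, degrees of freedom) for two unpaired samples."""
    n1, n2 = len(y1), len(y2)
    m1, m2 = mean(y1), mean(y2)
    v1, v2 = variance(y1), variance(y2)  # sample variances (n - 1 denominator)
    if equal_var:
        # Pooled variance s_p^2 and nu = N1 + N2 - 2.
        sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        t = (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
        df = n1 + n2 - 2
    else:
        # Unequal variances: approximate degrees of freedom nu.
        se2 = v1 / n1 + v2 / n2
        t = (m1 - m2) / math.sqrt(se2)
        df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df
```

With equal sample sizes the two versions give the same T and differ only in the degrees of freedom, which is why the choice matters most when the sample sizes and variances are both unequal.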
