Problems in Estimating Sample Size for Hypothesis Testing


In the module on hypothesis testing for means and proportions, we introduced techniques for means, proportions, differences in means, and differences in proportions. While each test involved details that were specific to the outcome of interest (e.g., continuous or dichotomous) and to the number of comparison groups (one, two, more than two), there were common elements to each test. For example, in each test of hypothesis, there are two errors that can be committed. The first is called a Type I error and refers to the situation where we incorrectly reject H0 when in fact it is true. In the first step of any test of hypothesis, we select a level of significance, α, and α = P(Type I error) = P(Reject H0 | H0 is true). Because we purposely select a small value for α, we control the probability of committing a Type I error. The second type of error is called a Type II error, and it is defined as the probability that we do not reject H0 when it is false. The probability of a Type II error is denoted β, and β = P(Type II error) = P(Do not Reject H0 | H0 is false). In hypothesis testing, we usually focus on power, which is defined as the probability that we reject H0 when it is false, i.e., power = 1 - β = P(Reject H0 | H0 is false). Power is the probability that a test correctly rejects a false null hypothesis. A good test is one with a low probability of committing a Type I error (i.e., small α) and high power (i.e., small β).

Here we present formulas to determine the sample size required to ensure that a test has high power. The sample size computations depend on the level of significance, α, the desired power of the test (equivalent to 1 - β), the variability of the outcome, and the effect size. The effect size is the difference in the parameter of interest that represents a clinically meaningful difference. Similar to the margin of error in confidence interval applications, the effect size is determined based on clinical or practical criteria and not on statistical criteria.
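As a preview of the formulas presented later, the required sample size for a two-sided one-sample z-test of a mean can be sketched as follows. This is a minimal Python illustration using only the standard library; the function name is our own.

```python
import math
from statistics import NormalDist

def sample_size_one_mean(alpha, power, sigma, effect_size):
    """Minimum n for a two-sided one-sample z-test of a mean."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # e.g., 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)            # e.g., 0.84 for power = 0.80
    n = ((z_alpha + z_beta) * sigma / effect_size) ** 2
    return math.ceil(n)                  # round up to whole participants

# 80% power to detect a 4-unit shift when sigma = 20 and alpha = 0.05
print(sample_size_one_mean(0.05, 0.80, 20, 4))  # 197
```

Note how the required n grows as the effect size shrinks or the desired power increases, which is the central trade-off discussed in this module.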

The concept of statistical power can be difficult to grasp. Before presenting the formulas to determine the sample sizes required to ensure high power in a test, we will first discuss power from a conceptual point of view.

Suppose we want to test the following hypotheses at α=0.05: H0: μ = 90 versus H1: μ ≠ 90. To test the hypotheses, suppose we select a sample of size n=100. For this example, assume that the standard deviation of the outcome is σ=20. We compute the sample mean and then must decide whether the sample mean provides evidence to support the alternative hypothesis or not. This is done by computing a test statistic and comparing the test statistic to an appropriate critical value. If the null hypothesis is true (μ=90), then we are likely to select a sample whose mean is close in value to 90. However, it is also possible to select a sample whose mean is much larger or much smaller than 90. Recall from the Central Limit Theorem (see page 11 in the module on Probability) that for large n (here n=100 is sufficiently large), the distribution of the sample means is approximately normal with a mean of

μ_X̄ = μ = 90

and a standard deviation (standard error) of

σ_X̄ = σ/√n = 20/√100 = 2.
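The mean and standard error of this sampling distribution can be checked numerically, using the values from this example:

```python
import math

mu, sigma, n = 90, 20, 100
se = sigma / math.sqrt(n)   # standard error of the sample mean
print(mu, se)               # 90 2.0
```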
If the null hypothesis is true, it is possible to observe any sample mean shown in the figure below; all are possible under H0: μ = 90.

Normal distribution of the sample mean X̄ when the mean is 90: a bell-shaped curve centered at X̄ = 90.

When we set up the decision rule for our test of hypothesis, we determine critical values based on α=0.05 and a two-sided test. When we run tests of hypotheses, we usually standardize the data (e.g., convert to Z or t), and the critical values are appropriate values from the probability distribution used in the test. To facilitate interpretation, we will continue this discussion with X̄ as opposed to Z. The critical values for a two-sided test with α=0.05 are 86.08 and 93.92 (these values correspond to -1.96 and 1.96, respectively, on the Z scale), so the decision rule is as follows: Reject H0 if X̄ < 86.08 or if X̄ > 93.92. The rejection region is shown in the tails of the figure below.
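These critical values can be reproduced directly from the sampling distribution. A small Python sketch using the standard library's NormalDist (variable names are our own):

```python
from statistics import NormalDist

mu0, sigma, n, alpha = 90, 20, 100, 0.05
se = sigma / n ** 0.5                       # standard error = 2.0
z = NormalDist().inv_cdf(1 - alpha / 2)     # two-sided critical z, about 1.96
lower, upper = mu0 - z * se, mu0 + z * se   # critical values on the X-bar scale
print(round(lower, 2), round(upper, 2))     # 86.08 93.92
```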

Rejection Region for Test H0: μ = 90 versus H1: μ ≠ 90 at α = 0.05

Normal distribution showing a mean of 90. The rejection regions are in the two tails at the extremes above and below the mean. If the alpha level is 0.05, then each tail accounts for an area of 0.025.

The areas in the two tails of the curve represent the probability of a Type I error, α = 0.05. This concept was discussed in the module on Hypothesis Testing.

Now, suppose that the alternative hypothesis, H1, is true (i.e., μ ≠ 90) and that the true mean is actually 94. The figure below shows the distributions of the sample mean under the null and alternative hypotheses. The values of the sample mean are shown along the horizontal axis.

Distribution of X̄ Under H0: μ = 90 and Under H1: μ = 94

Two overlapping normal distributions, one depicting the null hypothesis with a mean of 90 and the other showing the alternative hypothesis with a mean of 94. A more complete explanation of the figure is provided in the text below the figure.

If the true mean is 94, then the alternative hypothesis is true. In our test, we selected α = 0.05 and reject H0 if the observed sample mean exceeds 93.92 (focusing on the upper tail of the rejection region for now). The critical value (93.92) is indicated by the vertical line. The probability of a Type II error is denoted β, and β = P(Do not Reject H0 | H0 is false), i.e., the probability of not rejecting the null hypothesis when the null hypothesis is false. β is shown in the figure above as the area under the rightmost curve (H1) to the left of the vertical line (where we do not reject H0). Power is defined as 1 - β = P(Reject H0 | H0 is false) and is shown in the figure as the area under the rightmost curve (H1) to the right of the vertical line (where we reject H0).
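Under these assumptions, β and power can be computed directly from the sampling distribution under H1 and the critical values above (a Python sketch; variable names are our own):

```python
from statistics import NormalDist

se = 2.0                                    # standard error from this example
alt = NormalDist(mu=94, sigma=se)           # sampling distribution under H1
lower, upper = 86.08, 93.92                 # critical values at alpha = 0.05
beta = alt.cdf(upper) - alt.cdf(lower)      # P(do not reject H0 | mu = 94)
power = 1 - beta
print(round(beta, 3), round(power, 3))      # 0.484 0.516
```

With a true mean of 94, this test has only about a 52% chance of correctly rejecting H0, illustrating why a study of this size may be underpowered for a 4-unit effect.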

Note that β and power are related to α, the variability of the outcome, and the effect size. From the figure above we can see what happens to β and power if we increase α. Suppose, for example, we increase α to α=0.10. The upper critical value would be 93.29 instead of 93.92. The vertical line would shift to the left, increasing α, decreasing β, and increasing power. While a better test is one with higher power, it is not advisable to increase α as a means to increase power. Nonetheless, there is a direct relationship between α and power (as α increases, so does power).
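This relationship between α and power can be verified numerically for the example above (a brief sketch; the helper function is our own):

```python
from statistics import NormalDist

mu0, mu1, se = 90, 94, 2.0
std = NormalDist()
alt = NormalDist(mu1, se)                   # sampling distribution under H1

def power(alpha):
    """Power of the two-sided test of H0: mu = 90 when the true mean is 94."""
    z = std.inv_cdf(1 - alpha / 2)
    lower, upper = mu0 - z * se, mu0 + z * se
    return 1 - (alt.cdf(upper) - alt.cdf(lower))

print(round(power(0.05), 3), round(power(0.10), 3))  # 0.516 0.639
```

Relaxing α from 0.05 to 0.10 raises power here, but only at the cost of doubling the Type I error rate.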

β and power are also related to the variability of the outcome and to the effect size. The effect size is the difference in the parameter of interest (e.g., μ) that represents a clinically meaningful difference. The figure above graphically displays α, β, and power when the difference in the mean under the null as compared to the alternative hypothesis is 4 units (i.e., 90 versus 94). The figure below shows the same components for the situation where the mean under the alternative hypothesis is 98.

Figure - Distribution of X̄ Under H0: μ = 90 and Under H1: μ = 98

Overlapping bell-shaped distributions - one with a mean of 90 and the other with a mean of 98

Notice that there is much higher power when there is a larger difference between the mean under H0 as compared to H1 (i.e., 90 versus 98). A statistical test is much more likely to reject the null hypothesis in favor of the alternative if the true mean is 98 than if the true mean is 94. Notice also in this case that there is little overlap in the distributions under the null and alternative hypotheses. If a sample mean of 97 or higher is observed, it is very unlikely that it came from a distribution whose mean is 90. In the previous figure for H0: μ = 90 and H1: μ = 94, if we observed a sample mean of 93, for example, it would not be as clear whether it came from a distribution whose mean is 90 or one whose mean is 94.
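The effect of the larger effect size on power can be confirmed numerically. In this sketch the lower rejection tail is ignored because its contribution is negligible when the true mean sits well above 90:

```python
from statistics import NormalDist

se = 2.0                                    # standard error from this example
upper = 93.92                               # upper critical value at alpha = 0.05
for mu1 in (94, 98):
    alt = NormalDist(mu1, se)               # sampling distribution under H1
    power = 1 - alt.cdf(upper)              # P(X-bar > 93.92 | mu = mu1)
    print(mu1, round(power, 3))
```

This prints a power of about 0.516 for a true mean of 94 versus about 0.979 for a true mean of 98, matching the visual impression of the two figures.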

Ensuring That a Test Has High Power


In designing studies, most people consider power of 80% or 90% (just as we generally use 95% as the confidence level for confidence interval estimates). The inputs for the sample size formulas include the desired power, the level of significance, and the effect size. The effect size is selected to represent a clinically meaningful or practically important difference in the parameter of interest, as we will illustrate.

The formulas we present below produce the minimum sample size to ensure that the test of hypothesis will have a specified probability of rejecting the null hypothesis when it is false (i.e., a specified power). In planning studies, investigators must also account for attrition or loss to follow-up. The formulas shown below produce the number of participants needed with complete data, and we will illustrate how attrition is addressed in planning studies.
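One common way to handle attrition, which we sketch here as an illustration (the function name and the 10% rate are our own assumptions), is to inflate the number of completers by the expected dropout rate:

```python
import math

def enroll_for_attrition(n_complete, attrition_rate):
    """Participants to enroll so that n_complete remain after expected dropout."""
    return math.ceil(n_complete / (1 - attrition_rate))

# if 197 participants with complete data are needed and 10% loss is expected
print(enroll_for_attrition(197, 0.10))  # 219
```

Enrolling 219 participants and losing 10% of them leaves roughly the 197 completers the sample size formula requires.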