P-Value: Definition, Calculation, Interpretation, and Examples
Key takeaways
* A p-value measures how likely the observed data (or more extreme) would be if the null hypothesis were true.
* Smaller p-values indicate stronger evidence against the null hypothesis and in favor of the alternative.
* A common threshold for “statistical significance” is 0.05, but thresholds are arbitrary and context-dependent.
* P-values do not measure effect size, the probability the null hypothesis is true, or the practical importance of a result.
What is a p-value?
A p-value (probability value) quantifies the probability of obtaining results at least as extreme as those observed, assuming the null hypothesis is true. It is used in hypothesis testing to summarize the compatibility between observed data and the null hypothesis.
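In symbols, for an upper-tailed test with observed test statistic t_obs, the definition above can be written as (generic notation, not tied to any particular test):

```latex
p = P\left(T \ge t_{\mathrm{obs}} \mid H_0 \text{ is true}\right)
```

with the inequality reversed for a lower-tailed test, and with both tails combined for a two-tailed test.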
How p-values are used
* Assess evidence against a null hypothesis: the smaller the p-value, the less compatible the data are with the null.
* Report results without committing to a single significance threshold: readers can judge significance for themselves.
* Compare relative strength of evidence across studies or tests.
Intuition for how a p-value is calculated
1. Specify a null hypothesis (H0) and an alternative hypothesis (H1).
2. Choose a test statistic that summarizes the relevant departure from H0 (e.g., difference in means, t-statistic).
3. Under H0, determine the probability distribution of that statistic (depends on assumptions and degrees of freedom).
4. Compute the probability (area under the distribution curve) of observing a value as extreme or more extreme than the observed statistic:
* For a lower-tailed test: probability of observing a value ≤ observed.
* For an upper-tailed test: probability of observing a value ≥ observed.
* For a two-tailed test: probability of observing a value at least as far from the value expected under H0, in either direction.
Statistical software typically performs these calculations.
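As a minimal sketch of these steps in Python (assuming a one-sample t-test; the sample values and null mean below are hypothetical, chosen only for illustration):

```python
import numpy as np
from scipy import stats

# Hypothetical sample and null-hypothesis mean (illustrative values only)
sample = np.array([2.1, 1.8, 2.5, 2.9, 1.7, 2.3, 2.6, 2.0])
mu_0 = 2.0  # value of the mean under H0

# Steps 2-3: test statistic and its distribution under H0 (t with n-1 degrees of freedom)
n = len(sample)
t_stat = (sample.mean() - mu_0) / (sample.std(ddof=1) / np.sqrt(n))
df = n - 1

# Step 4: tail areas under the t distribution
p_lower = stats.t.cdf(t_stat, df)          # lower-tailed: P(T <= t_obs)
p_upper = stats.t.sf(t_stat, df)           # upper-tailed: P(T >= t_obs)
p_two = 2 * stats.t.sf(abs(t_stat), df)    # two-tailed: both directions combined

print(t_stat, p_lower, p_upper, p_two)
```

In practice, `scipy.stats.ttest_1samp(sample, popmean=mu_0)` returns the two-tailed result directly.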
Interpreting p-values and significance levels
* If p ≤ α (preselected significance level, commonly 0.05), researchers often reject H0; if p > α, they fail to reject H0.
* The p-value itself is a continuous measure of evidence: 0.001 provides stronger evidence against H0 than 0.04.
* A low p-value does not prove H1 true; it only indicates that the observed data are unlikely under H0.
* P-values depend on sample size: very large samples can produce small p-values for trivially small effects, while small samples may miss substantial effects (illustrated in the sketch after this list).
* P-values are not the probability that the null hypothesis is true, nor do they measure the size or importance of an effect.
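A short simulation illustrates the sample-size point from the list above (the true effect size, sample sizes, and random seed are arbitrary, illustrative choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_effect = 0.02  # a trivially small departure from the null mean of 0

for n in (50, 5_000, 500_000):
    sample = rng.normal(loc=true_effect, scale=1.0, size=n)
    t_stat, p_value = stats.ttest_1samp(sample, popmean=0.0)
    print(f"n={n:>7}  p={p_value:.4f}")

# The same tiny true effect typically yields a large p-value at small n and a very
# small p-value at very large n, even though its practical importance is unchanged.
```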
Common misconceptions and limitations
* Not proof: a small p-value suggests evidence against H0 but is not definitive proof.
* Not effect size: p-values do not indicate how large or meaningful an effect is.
* Sensitive to sample size: larger samples often produce smaller p-values for small differences.
* Multiple comparisons: conducting many tests inflates the chance that some small p-values arise by chance alone; adjustments (e.g., Bonferroni) are needed (see the sketch after this list).
* Dependence on model assumptions: incorrect assumptions about distribution, independence, or variance can invalidate p-value calculations.
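As a minimal sketch of the multiple-comparisons adjustment mentioned above (a Bonferroni correction; the p-values are hypothetical):

```python
# Hypothetical p-values from several tests (illustrative values only)
p_values = [0.01, 0.04, 0.03, 0.20, 0.049]
alpha = 0.05

# Bonferroni: multiply each p-value by the number of tests (equivalently, compare to alpha / m)
m = len(p_values)
for p in p_values:
    adjusted = min(p * m, 1.0)
    print(f"raw p={p:.3f}  adjusted p={adjusted:.3f}  reject H0: {adjusted <= alpha}")
```

Libraries such as `statsmodels.stats.multitest.multipletests` offer Bonferroni and other correction methods.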
Example: portfolio performance vs. S&P 500
Suppose an investor tests whether their portfolio returns equal the S&P 500 (two-tailed test).
* H0: portfolio returns = S&P 500 returns
* H1: portfolio returns ≠ S&P 500 returns
If the calculated p-value is 0.001, there is only a 0.1% probability of observing a difference in returns at least as extreme as the one observed if H0 were true, which is strong evidence against H0. If p = 0.08, the evidence is weaker; under a 0.05 threshold, H0 would not be rejected.
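A minimal sketch of this comparison, assuming the investor has a series of per-period return differences (portfolio return minus S&P 500 return); the numbers below are fabricated for illustration:

```python
import numpy as np
from scipy import stats

# Hypothetical monthly return differences: portfolio minus S&P 500 (illustrative only)
excess_returns = np.array([0.004, -0.002, 0.006, 0.001, -0.003,
                           0.005, 0.002, -0.001, 0.003, 0.004])

# Two-tailed one-sample t-test of H0: mean difference = 0
t_stat, p_value = stats.ttest_1samp(excess_returns, popmean=0.0)
print(f"t = {t_stat:.3f}, two-tailed p = {p_value:.4f}")

# Decision at a preselected significance level
alpha = 0.05
print("reject H0" if p_value <= alpha else "fail to reject H0")
```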
Quick FAQs
Q: Is a p-value of 0.05 significant?
A: By convention, yes: p ≤ 0.05 is often treated as statistically significant. However, this threshold is arbitrary—context and consequences should guide interpretation.
Q: What does a p-value of 0.001 mean?
A: It means that, assuming the null hypothesis is true, the chance of observing data as extreme as (or more extreme than) what was observed is 0.1%.
Q: How do you compare two p-values?
A: The smaller p-value indicates stronger evidence against the corresponding null hypothesis. For example, p = 0.001 provides stronger evidence than p = 0.04. Both might be “significant” at α = 0.05, but the smaller p-value is more compelling.
Conclusion
P-values are a useful tool for quantifying how surprising observed data are under a specific null hypothesis. They help guide decisions about rejecting or not rejecting H0, but they must be interpreted carefully alongside effect sizes, confidence intervals, sample size, study design, and domain-specific considerations.