Statistical Significance Tests in Python Using SciPy

In the world of data science and research, statistical significance tests are essential for validating hypotheses, comparing datasets, and drawing conclusions from data.
Python’s SciPy library provides a robust set of tools for performing various significance tests via the scipy.stats module.
In this article, you will learn:
- What significance tests are
- The most common types of significance tests
- How to use scipy.stats for t-tests, chi-squared tests, ANOVA, and more
- Code examples, tips, and common pitfalls
What Are Significance Tests?
Significance tests help determine whether the observed results in your data could have occurred by random chance. These tests typically return:
- A test statistic: a numerical value summarizing the difference between groups
- A p-value: the probability of obtaining results at least as extreme as the observed ones under the null hypothesis
If the p-value is less than a chosen significance level (commonly 0.05), the result is considered statistically significant.
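As a quick illustration of this decision rule (the alpha of 0.05 and the p-value below are assumed placeholders, not output from a real test):
alpha = 0.05    # chosen significance level; decide on it before running the test
p_value = 0.03  # hypothetical p-value returned by a test
if p_value < alpha:
    print("Statistically significant: reject the null hypothesis")
else:
    print("Not significant: fail to reject the null hypothesis")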
Prerequisites
Install SciPy and NumPy if not already installed:
pip install scipy numpy
Import the necessary modules:
from scipy import stats
import numpy as np
1. One-Sample t-Test
Tests if the mean of a single sample is significantly different from a known value.
data = np.array([2.9, 3.0, 2.5, 2.6, 3.2])
result = stats.ttest_1samp(data, popmean=3)
print("t-statistic:", result.statistic)
print("p-value:", result.pvalue)
✅ Use When: Comparing a sample mean to a population mean.
2. Two-Sample (Independent) t-Test
Tests if two independent groups have significantly different means.
group1 = np.array([20, 22, 19, 24])
group2 = np.array([30, 29, 31, 33])
result = stats.ttest_ind(group1, group2)
print("t-statistic:", result.statistic)
print("p-value:", result.pvalue)
✅ Use When: Comparing means from two independent samples.
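Note that ttest_ind assumes equal variances by default. If that assumption is doubtful, pass equal_var=False to run Welch's t-test instead:
result = stats.ttest_ind(group1, group2, equal_var=False)  # Welch's t-test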
3. Paired t-Test
Used for related samples (e.g., before and after measurements).
before = np.array([100, 102, 98, 105])
after = np.array([110, 108, 103, 107])
result = stats.ttest_rel(before, after)
print("t-statistic:", result.statistic)
print("p-value:", result.pvalue)
✅ Use When: Comparing means from the same group at different times.
4. Chi-Square Test (Goodness-of-Fit)
Checks if observed frequencies match expected frequencies.
observed = [20, 30, 50]
expected = [25, 25, 50]
chi2, p = stats.chisquare(f_obs=observed, f_exp=expected)
print("Chi-square:", chi2)
print("p-value:", p)
✅ Use When: Testing whether the observed distribution of a single categorical variable matches expected proportions.
5. Chi-Square Test of Independence
Used to determine if two categorical variables are related.
# Contingency table
data = np.array([[10, 20], [20, 40]])
chi2, p, dof, expected = stats.chi2_contingency(data)
print("Chi-square:", chi2)
print("p-value:", p)
✅ Use When: You have a contingency table and want to test independence.
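The call also returns the degrees of freedom and the table of counts expected under independence; printing the expected counts is a quick sanity check on your data:
print("Degrees of freedom:", dof)
print("Expected counts:\n", expected)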
6. One-Way ANOVA
Tests if the means of three or more independent groups are different.
group1 = [23, 21, 19]
group2 = [30, 32, 29]
group3 = [25, 28, 24]
f_stat, p_val = stats.f_oneway(group1, group2, group3)
print("F-statistic:", f_stat)
print("p-value:", p_val)
✅ Use When: Testing differences between more than two group means.
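A significant ANOVA result only tells you that at least one group mean differs, not which ones. On SciPy 1.8 or newer, stats.tukey_hsd offers pairwise post-hoc comparisons; a minimal sketch reusing the groups above:
# Pairwise comparisons after a significant ANOVA (requires SciPy >= 1.8)
res = stats.tukey_hsd(group1, group2, group3)
print(res.pvalue)  # matrix of pairwise p-values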
7. Mann-Whitney U Test
A non-parametric test for comparing two independent samples.
x = [14, 15, 16]
y = [20, 21, 22]
stat, p = stats.mannwhitneyu(x, y)
print("U statistic:", stat)
print("p-value:", p)
✅ Use When: Comparing two independent samples when normality can't be assumed.
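For paired samples that aren't normally distributed, the Wilcoxon signed-rank test (stats.wilcoxon) is the non-parametric counterpart of the paired t-test. A quick sketch, reusing the before/after arrays from section 3:
stat, p = stats.wilcoxon(before, after)
print("W statistic:", stat)
print("p-value:", p)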
Interpreting the p-value
p-value | Interpretation |
---|---|
p < 0.01 | Strong evidence against the null hypothesis |
0.01 ≤ p < 0.05 | Moderate evidence against the null hypothesis |
p ≥ 0.05 | Weak evidence; fail to reject the null hypothesis |
Always define your null hypothesis and alternative hypothesis before testing.
✅ Tips for Performing Significance Tests
Tip | Explanation |
---|---|
Check assumptions | Some tests require normality or equal variances (see the sketch after this table) |
Visualize data | Use histograms or box plots to check distributions |
Use non-parametric tests when necessary | e.g., Mann-Whitney, Wilcoxon |
Report effect size | p-values don't tell you the size of the effect |
Beware multiple testing | Use corrections like Bonferroni when running many tests (see the sketch after this table) |
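A minimal sketch of two of these tips in practice, reusing the ANOVA groups from section 6 (the 0.05 alpha and the count of three tests are assumed for illustration):
# Check assumptions: Shapiro-Wilk for normality, Levene for equal variances
print("Shapiro p (group1):", stats.shapiro(group1).pvalue)
print("Levene p:", stats.levene(group1, group2, group3).pvalue)

# Bonferroni correction: divide alpha by the number of tests performed
alpha = 0.05
num_tests = 3  # assumed number of tests run in this analysis
print("Bonferroni-adjusted alpha:", alpha / num_tests)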
⚠️ Common Pitfalls
Pitfall | Solution |
---|---|
Blindly trusting p-values | Always combine with domain knowledge and effect size (see the Cohen's d sketch after this table) |
Not checking test assumptions | Validate normality and variance conditions |
Using the wrong test | Choose based on data type, distribution, and sample relation |
Misinterpreting non-significance | Lack of significance ≠ no effect |
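P-values say nothing about how large a difference is. SciPy doesn't ship a Cohen's d function, so here is a hand-rolled helper (the cohens_d name and the pooled-standard-deviation formula are our own illustrative choices), applied to group1 and group2 from section 6:
def cohens_d(a, b):
    # Cohen's d for two independent samples, using the pooled standard deviation
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    pooled_var = ((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1)) / (len(a) + len(b) - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

print("Cohen's d:", cohens_d(group1, group2))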
Summary Table
Test | Use Case | SciPy Function |
---|---|---|
One-sample t-test | Compare sample to population mean | ttest_1samp() |
Two-sample t-test | Compare means of two independent samples | ttest_ind() |
Paired t-test | Compare related samples | ttest_rel() |
Chi-square goodness-of-fit | Compare observed vs expected | chisquare() |
Chi-square independence | Test relationship in contingency table | chi2_contingency() |
ANOVA | Compare more than two means | f_oneway() |
Mann-Whitney U | Non-parametric two-sample test | mannwhitneyu() |
Final Thoughts
Understanding and correctly applying significance tests is critical for statistical analysis and scientific reporting. SciPy's scipy.stats module makes it straightforward to run these tests in Python.
Whether you're testing hypotheses in academic research, validating business metrics, or building data-driven applications, SciPy helps you stay statistically sound.