The Chi-Square (χ²) distribution is widely used in statistical inference, especially for hypothesis testing and confidence intervals involving variances and categorical data.
With NumPy, generating and working with Chi-Square distributed values is simple and efficient for simulations and analysis.
What is the Chi-Square Distribution?
The Chi-Square distribution is a continuous probability distribution that describes the sum of the squares of independent standard normal variables.
Mathematically:
If $Z_1, Z_2, \dots, Z_k \sim \mathcal{N}(0, 1)$ (standard normal), then:
$$\chi^2 = Z_1^2 + Z_2^2 + \dots + Z_k^2$$
This follows a Chi-Square distribution with k degrees of freedom (df).
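The definition can be checked numerically. The following is a minimal sketch, assuming an illustrative choice of k = 5 and 100,000 draws (neither value comes from the text above): it sums squared standard normal samples and compares the result with values drawn directly from `Generator.chisquare`.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
k = 5          # degrees of freedom (illustrative choice)
n = 100_000    # number of simulated values (illustrative choice)

# Draw an (n, k) matrix of standard normals, square it, and sum across each row
z = rng.standard_normal(size=(n, k))
manual = np.sum(z**2, axis=1)

# Draw directly from the chi-square sampler for comparison
direct = rng.chisquare(df=k, size=n)

print("Manual mean:", manual.mean(), "| Direct mean:", direct.mean())
print("Manual var: ", manual.var(), "| Direct var: ", direct.var())
```

Both sets of summary statistics should land close to the theoretical mean k and variance 2k, up to sampling noise.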
Key Properties
Property | Description |
---|---|
Range | $[0, \infty)$ |
Shape | Right-skewed (less skew with higher df) |
Mean | $\mu = k$ |
Variance | $\sigma^2 = 2k$ |
Degrees of Freedom | The number k of independent squared standard normals in the sum |
NumPy's chisquare() Function
numpy.random.Generator.chisquare(df, size=None)
Parameters
Parameter | Description |
---|---|
df | Degrees of freedom (must be > 0) |
size | Output shape (int or tuple of ints); if None, a single value is returned for a scalar df |
✅ Returns
An array of random values from a Chi-Square distribution, or a single float when size is None and df is a scalar.
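How `size` controls the shape of the result is easiest to see side by side. This is a small sketch; the variable names and the choice of df=3 are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

single = rng.chisquare(df=3)               # size=None: one scalar value
vector = rng.chisquare(df=3, size=4)       # int: 1-D array of length 4
matrix = rng.chisquare(df=3, size=(2, 3))  # tuple: array with that shape

print(type(single), vector.shape, matrix.shape)
```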
✅ Example: Generate Chi-Square Data
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
rng = np.random.default_rng(seed=42)
# Generate 1000 chi-square values with 5 degrees of freedom
data = rng.chisquare(df=5, size=1000)
print(data[:5]) # First few samples
Visualizing the Distribution
sns.histplot(data, bins=40, kde=True, color='skyblue')
plt.title("Chi-Square Distribution (df=5)")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.grid(True)
plt.show()
You’ll notice a right-skewed shape typical of the Chi-Square distribution.
Varying the Degrees of Freedom
Let’s compare Chi-Square distributions with different df values:
dfs = [1, 3, 5, 10, 20]
for df in dfs:
    data = rng.chisquare(df=df, size=1000)
    sns.kdeplot(data, label=f'df={df}')
plt.title("Chi-Square Distributions with Varying Degrees of Freedom")
plt.xlabel("Value")
plt.ylabel("Density")
plt.grid(True)
plt.legend()
plt.show()
Observation:
- Lower df: Highly skewed
- Higher df: Approaches a normal distribution
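The trend toward symmetry can also be put in numbers. The skewness of a Chi-Square distribution with k degrees of freedom is $\sqrt{8/k}$, which shrinks toward 0 (the skewness of a normal distribution) as k grows. The sketch below compares that value with a sample estimate; the helper `sample_skewness` and the sample size are illustrative assumptions, not part of NumPy.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def sample_skewness(x):
    """Hypothetical helper: mean cubed deviation divided by the cubed std."""
    centered = x - x.mean()
    return np.mean(centered**3) / x.std() ** 3

results = {}
for df in [1, 3, 5, 10, 20]:
    draws = rng.chisquare(df=df, size=200_000)
    results[df] = sample_skewness(draws)
    theory = np.sqrt(8 / df)
    print(f"df={df:>2}  sample skewness={results[df]:.3f}  theory={theory:.3f}")
```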
Practical Use Case: Goodness-of-Fit Test (Conceptual)
While NumPy simulates the distribution, statistical libraries like SciPy use it to perform hypothesis tests.
The Chi-Square distribution is commonly used in:
- Goodness-of-fit tests (is data distributed as expected?)
- Tests of independence in contingency tables
- Variance tests
Example (not using NumPy directly):
from scipy.stats import chisquare
observed = [18, 22, 20, 25, 15]
expected = [20, 20, 20, 20, 20]
stat, p = chisquare(f_obs=observed, f_exp=expected)
print(f"Chi-Square statistic: {stat:.2f}, p-value: {p:.3f}")
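To see what the SciPy call is doing, the same numbers can be reproduced by hand. This is a sketch under the usual assumption that the test uses Pearson's statistic with (number of categories − 1) degrees of freedom; the names `stat_manual`, `dof`, and `p_manual` are illustrative.

```python
import numpy as np
from scipy.stats import chi2, chisquare

observed = np.array([18, 22, 20, 25, 15])
expected = np.array([20, 20, 20, 20, 20])

# Pearson statistic: sum of (observed - expected)^2 / expected
stat_manual = np.sum((observed - expected) ** 2 / expected)

# Upper-tail probability of the chi-square distribution with k - 1 df
dof = len(observed) - 1
p_manual = chi2.sf(stat_manual, dof)

# Cross-check against SciPy's built-in test
stat_scipy, p_scipy = chisquare(f_obs=observed, f_exp=expected)
print(f"manual: stat={stat_manual:.2f}, p={p_manual:.3f}")
print(f"scipy:  stat={stat_scipy:.2f}, p={p_scipy:.3f}")
```

For these counts the statistic is 58/20 = 2.9, and the manual and SciPy results should agree.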
Full Simulation Example: Generate and Analyze Chi-Square Samples
# Simulate
samples = rng.chisquare(df=10, size=10000)
# Summary
print("Mean:", np.mean(samples))
print("Expected Mean:", 10)
print("Variance:", np.var(samples))
print("Expected Variance:", 2 * 10)
# Plot (stat="density" so the y-axis matches the "Density" label)
sns.histplot(samples, bins=60, kde=True, stat="density", color='orange')
plt.title("Chi-Square Distribution (df=10, 10k samples)")
plt.xlabel("Value")
plt.ylabel("Density")
plt.grid(True)
plt.show()
You’ll notice the empirical mean and variance align closely with the theoretical values (10 and 20).
Tips for Using Chi-Square in NumPy
Tip | Benefit |
---|---|
✅ Use larger sample sizes | More stable visualization and analysis |
✅ Understand “df” | Higher df = smoother and more normal-like |
✅ Use with standard normal samples | Can manually compute χ² by squaring and summing draws from rng.standard_normal |
✅ Seed the RNG for reproducibility | Ensures consistent results across runs |
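The reproducibility tip can be illustrated with a short sketch: two generators created with the same seed (42 here, matching the earlier example) produce identical Chi-Square draws. The variable names are illustrative.

```python
import numpy as np

# Two independent generators built from the same seed
first = np.random.default_rng(seed=42).chisquare(df=5, size=3)
second = np.random.default_rng(seed=42).chisquare(df=5, size=3)

print(first)
print(second)
print("Identical:", np.array_equal(first, second))
```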
⚠️ Common Pitfalls
Pitfall | Explanation |
---|---|
❌ Using negative df | Degrees of freedom must be > 0 |
❌ Confusing with normal | Chi-Square is derived from normal, but not symmetric |
❌ Forgetting to square normal values | Manual chi-square = sum of squared standard normals |
❌ Using for means instead of variances | Chi-Square tests variances and categorical frequencies, not means |
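The first pitfall surfaces as an exception at call time. A minimal sketch, assuming the sampler raises `ValueError` for non-positive df as the NumPy documentation describes; the `errors` dictionary is only there to collect the messages for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

errors = {}
for bad_df in [-1, 0]:
    try:
        rng.chisquare(df=bad_df, size=3)
    except ValueError as exc:
        # NumPy rejects df <= 0 before drawing any samples
        errors[bad_df] = str(exc)
        print(f"df={bad_df} rejected: {exc}")
```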
Relation to Other Distributions
Distribution | Relation to Chi-Square |
---|---|
Normal | χ² is the sum of squares of independent standard normals |
Gamma | Chi-Square with k df is a Gamma distribution with shape k/2 and scale 2 |
F-distribution | Ratio of two independent chi-square variables, each divided by its df |
Student’s t | Ratio of a standard normal to the square root of an independent chi-square divided by its df |
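The Gamma relation is easy to verify numerically: a Chi-Square distribution with k degrees of freedom has the same CDF as a Gamma distribution with shape k/2 and scale 2. The sketch below uses SciPy's `chi2` and `gamma` distributions; k = 6 and the evaluation grid are illustrative assumptions.

```python
import numpy as np
from scipy.stats import chi2, gamma

k = 6                          # degrees of freedom (illustrative choice)
x = np.linspace(0.1, 30, 50)   # points at which to compare the two CDFs

chi2_cdf = chi2.cdf(x, df=k)
gamma_cdf = gamma.cdf(x, a=k / 2, scale=2)

# Largest absolute difference between the two curves
max_gap = np.max(np.abs(chi2_cdf - gamma_cdf))
print("Largest CDF difference:", max_gap)
```

The difference should be zero up to floating-point error.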
Conclusion
The Chi-Square distribution is vital for statistical modeling, especially when analyzing categorical data and variances. NumPy provides an easy and efficient way to generate data and perform simulations.
Summary
Feature | Value |
---|---|
Function | rng.chisquare(df, size) |
Parameters | df: degrees of freedom (> 0); size: output shape |
Shape | Skewed right (less with higher df) |
Use Cases | Goodness-of-fit, test of independence |
Related to | Normal, Gamma, F-distribution |