In probability and statistics, the Binomial Distribution is essential for modeling scenarios with two possible outcomes: success or failure. If you’ve ever flipped a coin, taken a multiple-choice test, or measured yes/no responses, you’ve encountered a binomial event.
Python’s NumPy library makes it easy to generate, simulate, and analyze binomial distributions.
What is a Binomial Distribution?
A binomial distribution models the number of successes in a fixed number of independent trials, each with the same probability of success.
Formula
$$P(X = k) = \binom{n}{k} \cdot p^k \cdot (1 - p)^{n - k}$$
Where:

- `n`: number of trials
- `k`: number of successes
- `p`: probability of success
- `P(X = k)`: probability of getting exactly `k` successes in `n` trials
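As a quick sanity check, you can evaluate the formula directly with Python's standard library; the numbers below (6 heads in 10 fair coin flips) are just an illustration:

```python
from math import comb

# Evaluate P(X = k) from the formula: 6 heads in 10 fair-coin flips
n, k, p = 10, 6, 0.5
prob = comb(n, k) * p**k * (1 - p)**(n - k)
print(f"P(X = {k}) = {prob:.4f}")  # ~0.2051
```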
Real-Life Examples
| Scenario | Success | n | p |
|---|---|---|---|
| Coin flip | Heads | 10 | 0.5 |
| Email open | Opened | 100 | 0.2 |
| Exam question | Correct answer | 20 | 0.25 |
NumPy Binomial Distribution
NumPy provides a simple way to simulate binomial events with the `Generator.binomial()` method.
Syntax
```python
numpy.random.Generator.binomial(n, p, size=None)
```
Parameters
| Parameter | Description |
|---|---|
| `n` | Number of trials |
| `p` | Probability of success (0 ≤ p ≤ 1) |
| `size` | Number of experiments (samples to generate) |
Return
An integer or array of counts representing how many successes occurred per experiment.
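For example, omitting `size` yields a single count, while passing `size` yields an array of counts (a minimal sketch):

```python
import numpy as np

rng = np.random.default_rng()
print(rng.binomial(n=10, p=0.5))          # one integer: successes in a single experiment
print(rng.binomial(n=10, p=0.5, size=4))  # array of 4 counts, one per experiment
```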
✅ Getting Started
```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Create a random generator
rng = np.random.default_rng(seed=42)

# Simulate 1000 experiments: 10 coin flips each with p=0.5
data = rng.binomial(n=10, p=0.5, size=1000)
```
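Each entry in `data` is the number of heads in one 10-flip experiment, so a quick (optional) check confirms the counts land in the valid range:

```python
print(data[:10])               # first 10 experiment outcomes
print(data.min(), data.max())  # all counts fall between 0 and 10
```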
Visualizing the Binomial Distribution
```python
sns.histplot(data, bins=range(0, 12), discrete=True, kde=False, color='lightblue')
plt.title("Binomial Distribution (n=10, p=0.5)")
plt.xlabel("Number of Successes")
plt.ylabel("Frequency")
plt.show()
```
Each bar shows how often a certain number of "successes" occurred in 1000 simulations.
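As a cross-check, assuming SciPy is available, you can compare the histogram's empirical counts with the expected counts from the theoretical PMF:

```python
from scipy.stats import binom

k = np.arange(0, 11)
expected = 1000 * binom.pmf(k, n=10, p=0.5)  # expected count for each outcome
observed = np.bincount(data, minlength=11)   # empirical count for each outcome
for ki, e, o in zip(k, expected, observed):
    print(f"k={ki:2d}  expected~{e:6.1f}  observed={o}")
```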
Try Different Parameters
1. Biased Coin
```python
data = rng.binomial(n=10, p=0.8, size=1000)
sns.histplot(data, bins=range(0, 12), discrete=True, color='green')
plt.title("Binomial Distribution (n=10, p=0.8)")
plt.show()
```
Notice the skew toward more successes.
2. Multiple-Choice Test Simulation
```python
# 20 questions, 25% chance to guess correctly
guesses = rng.binomial(n=20, p=0.25, size=1000)
sns.histplot(guesses, bins=range(0, 22), discrete=True, color='orange')
plt.title("Guessing on a 20-Question Test (p=0.25)")
plt.xlabel("Correct Answers")
plt.show()
```
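Because each element of `guesses` is one simulated test score, tail probabilities can be estimated empirically. The passing threshold of 12 correct below is just an illustrative assumption:

```python
# Empirical probability of "passing" (assumed threshold: 12 of 20 correct) by guessing
pass_rate = np.mean(guesses >= 12)
print(f"Estimated P(score >= 12) ~ {pass_rate:.4f}")
```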
Expected Mean and Variance
$$\text{Mean} = n \cdot p \qquad \text{Variance} = n \cdot p \cdot (1 - p)$$
```python
n, p = 10, 0.5
sample = rng.binomial(n, p, 10000)

print("Expected Mean:", n * p)
print("Sample Mean:", round(np.mean(sample), 2))
print("Expected Variance:", n * p * (1 - p))
print("Sample Variance:", round(np.var(sample), 2))
```
Use Case: Email Campaign Simulation
```python
# You send 1000 emails, each with a 30% chance of being opened
opens = rng.binomial(n=1, p=0.3, size=1000)  # 1 = opened, 0 = not opened
opened = np.sum(opens)
print(f"Emails Opened: {opened}/1000")
```
Here, `n=1` simulates one trial per email, which is ideal for yes/no (Bernoulli) outcomes.
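Because a sum of independent Bernoulli trials is itself binomial, a single call with `n=1000` produces the same kind of total in one draw:

```python
# Equivalent single draw: total opens across all 1000 emails at once
total_opened = rng.binomial(n=1000, p=0.3)
print(f"Emails Opened (single draw): {total_opened}/1000")
```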
Full Code Example
```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Parameters
n = 15       # number of trials
p = 0.6      # probability of success
size = 1000  # number of experiments

# Generator
rng = np.random.default_rng(seed=123)

# Simulate binomial outcomes
data = rng.binomial(n=n, p=p, size=size)

# Plot
sns.histplot(data, bins=range(0, n + 2), discrete=True, color='skyblue')
plt.title(f"Binomial Distribution (n={n}, p={p})")
plt.xlabel("Number of Successes")
plt.ylabel("Frequency")
plt.axvline(np.mean(data), color='red', linestyle='--', label='Sample Mean')
plt.legend()
plt.show()

# Stats
print("Sample Mean:", round(np.mean(data), 2))
print("Sample Variance:", round(np.var(data), 2))
print("Expected Mean:", n * p)
print("Expected Variance:", n * p * (1 - p))
```
Tips
- ✅ Use `default_rng()` for better random number generation (see the reproducibility sketch below).
- ✅ Set `n=1` to simulate Bernoulli trials.
- ✅ Visualize large simulations to see the true distribution shape.
- ✅ Compare sample vs. expected mean/variance to validate.
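A minimal sketch of the first tip: seeding `default_rng()` makes simulations reproducible.

```python
import numpy as np

# Two generators with the same seed produce identical samples
rng1 = np.random.default_rng(7)
rng2 = np.random.default_rng(7)
assert np.array_equal(rng1.binomial(5, 0.5, size=3),
                      rng2.binomial(5, 0.5, size=3))
```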
⚠️ Common Pitfalls
| Pitfall | Why it matters |
|---|---|
| ❌ Using `p > 1` or `p < 0` | `p` must be between 0 and 1 |
| ❌ Forgetting `size` | Without `size`, you only get one result |
| ❌ Confusing `binomial()` with `binom` from `scipy.stats` | NumPy returns samples; SciPy gives the probability mass function |
| ❌ Expecting perfect bell curves with small samples | You need a large `size` for smooth histograms |
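To make the third pitfall concrete, here is a small sketch (assuming SciPy is installed) contrasting NumPy's random sampling with SciPy's exact PMF:

```python
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(42)

# NumPy draws random samples (varies run to run unless seeded)
print("NumPy samples:", rng.binomial(n=10, p=0.5, size=5))

# SciPy evaluates the exact probability mass function (deterministic)
print("SciPy P(X=5):", binom.pmf(5, n=10, p=0.5))  # ~0.2461
```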
Conclusion
The Binomial Distribution is foundational in understanding probability, and with NumPy, simulating and visualizing it is effortless.
| Key Concept | Value |
|---|---|
| Trials (`n`) | Fixed number of repetitions |
| Success Probability (`p`) | Chance of success on a single trial |
| Outcome | Number of successes in `n` trials |
| Use Cases | Coin tosses, surveys, A/B tests, email opens |
Understanding the binomial distribution is crucial for hypothesis testing, simulations, and many real-world decision-making scenarios in data science, machine learning, and business analytics.