Python NumPy: Binomial Distribution Explained


Tags: Python, NumPy

In probability and statistics, the Binomial Distribution is essential for modeling scenarios with two possible outcomes: success or failure. If you’ve ever flipped a coin, taken a multiple-choice test, or measured yes/no responses, you’ve encountered a binomial event.

Python’s NumPy library makes it easy to generate, simulate, and analyze binomial distributions.


What is a Binomial Distribution?

A binomial distribution models the number of successes in a fixed number of independent trials, each with the same probability of success.

Formula

P(X = k) = \binom{n}{k} \cdot p^k \cdot (1 - p)^{n - k}

Where:

  • n: number of trials

  • k: number of successes

  • p: probability of success

  • P(X=k): probability of getting exactly k successes in n trials
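The formula can be evaluated directly with Python's built-in `math.comb`, which computes the binomial coefficient. For example, the probability of exactly 5 heads in 10 fair coin flips:

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# 10 fair coin flips: probability of exactly 5 heads
print(round(binomial_pmf(5, 10, 0.5), 4))  # → 0.2461
```

The probabilities over all k from 0 to n sum to 1, which is a quick way to sanity-check the function.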


Real-Life Examples

| Scenario | Success | n | p |
|---|---|---|---|
| Coin flip | Heads | 10 | 0.5 |
| Email open | Opened | 100 | 0.2 |
| Exam question | Correct answer | 20 | 0.25 |

NumPy Binomial Distribution

NumPy provides a simple way to simulate binomial events using:

Syntax

numpy.random.Generator.binomial(n, p, size=None)

Parameters

| Parameter | Description |
|---|---|
| `n` | Number of trials |
| `p` | Probability of success (0 ≤ p ≤ 1) |
| `size` | Number of experiments (samples to generate) |

Return

An integer or array of counts representing how many successes occurred per experiment.
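A short sketch of the two return shapes: omitting `size` yields a single count, while passing `size` yields an array with one count per experiment (the seed here is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

single = rng.binomial(n=10, p=0.5)           # size omitted → one integer count
batch = rng.binomial(n=10, p=0.5, size=5)    # size=5 → array of 5 counts

print(single)        # a single number of successes, between 0 and 10
print(batch.shape)   # (5,)
```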


✅ Getting Started

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Create a random generator
rng = np.random.default_rng(seed=42)

# Simulate 1000 experiments: 10 coin flips each with p=0.5
data = rng.binomial(n=10, p=0.5, size=1000)

Visualizing the Binomial Distribution

sns.histplot(data, bins=range(0, 12), discrete=True, kde=False, color='lightblue')
plt.title("Binomial Distribution (n=10, p=0.5)")
plt.xlabel("Number of Successes")
plt.ylabel("Frequency")
plt.show()

Each bar shows how often a certain number of "successes" occurred in 1000 simulations.


Try Different Parameters

1. Biased Coin

data = rng.binomial(n=10, p=0.8, size=1000)
sns.histplot(data, bins=range(0, 12), discrete=True, color='green')
plt.title("Binomial Distribution (n=10, p=0.8)")
plt.show()

Notice the skew toward more successes.


2. Multiple-Choice Test Simulation

# 20 questions, 25% chance to guess correctly
guesses = rng.binomial(n=20, p=0.25, size=1000)

sns.histplot(guesses, bins=range(0, 22), discrete=True, color='orange')
plt.title("Guessing on a 20-Question Test (p=0.25)")
plt.xlabel("Correct Answers")
plt.show()

Expected Mean and Variance

\text{Mean} = n \cdot p \qquad \text{Variance} = n \cdot p \cdot (1 - p)

n, p = 10, 0.5
sample = rng.binomial(n, p, 10000)

print("Expected Mean:", n * p)
print("Sample Mean:", round(np.mean(sample), 2))

print("Expected Variance:", n * p * (1 - p))
print("Sample Variance:", round(np.var(sample), 2))

Use Case: Email Campaign Simulation

# You send 1000 emails, each with a 30% chance of being opened
opens = rng.binomial(n=1, p=0.3, size=1000)  # one 0/1 outcome per email

opened = np.sum(opens)
print(f"Emails Opened: {opened}/1000")

Here, n=1 simulates a single Bernoulli trial per email, which is ideal for yes/no outcomes.
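As a sanity check, summing 1000 Bernoulli draws (n=1) follows the same distribution as one binomial draw with n=1000, so the total open count can also be drawn in a single call:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# 1000 Bernoulli trials (one per email), summed
opened_per_email = rng.binomial(n=1, p=0.3, size=1000)
total_a = opened_per_email.sum()

# Equivalent: one binomial draw counting all 1000 emails at once
total_b = rng.binomial(n=1000, p=0.3)

print(total_a, total_b)  # both should land near the expected n*p = 300
```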


Full Code Example

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Parameters
n = 15      # number of trials
p = 0.6     # probability of success
size = 1000 # number of experiments

# Generator
rng = np.random.default_rng(seed=123)

# Simulate binomial outcomes
data = rng.binomial(n=n, p=p, size=size)

# Plot
sns.histplot(data, bins=range(0, n+2), discrete=True, color='skyblue')
plt.title(f"Binomial Distribution (n={n}, p={p})")
plt.xlabel("Number of Successes")
plt.ylabel("Frequency")
plt.axvline(np.mean(data), color='red', linestyle='--', label='Sample Mean')
plt.legend()
plt.show()

# Stats
print("Sample Mean:", round(np.mean(data), 2))
print("Sample Variance:", round(np.var(data), 2))
print("Expected Mean:", n * p)
print("Expected Variance:", n * p * (1 - p))

Tips

  1. ✅ Use default_rng() instead of the legacy np.random.binomial for better, reproducible random number generation.

  2. ✅ Set n=1 for simulating Bernoulli trials.

  3. ✅ Visualize large simulations to see true distribution shape.

  4. ✅ Compare sample vs expected mean/variance to validate.


⚠️ Common Pitfalls

| Pitfall | Why it matters |
|---|---|
| ❌ Using p > 1 or p < 0 | p must be between 0 and 1 |
| ❌ Forgetting size | Without size, you only get one result |
| ❌ Confusing binomial() with binom from scipy.stats | NumPy returns random samples; SciPy gives the probability mass function |
| ❌ Expecting perfect bell curves with small samples | You need a large size for smooth histograms |
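To make the samples-vs-PMF distinction concrete, here is a sketch comparing empirical frequencies from NumPy draws against the exact probabilities (computed with `math.comb`; `scipy.stats.binom.pmf` would return the same values):

```python
import numpy as np
from math import comb

rng = np.random.default_rng(seed=7)
n, p, size = 10, 0.5, 100_000

# NumPy: random samples → empirical frequencies per outcome k
samples = rng.binomial(n=n, p=p, size=size)
empirical = np.bincount(samples, minlength=n + 1) / size

# Exact PMF for each k (what scipy.stats.binom.pmf computes)
exact = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

for k in range(n + 1):
    print(f"k={k:2d}  empirical={empirical[k]:.4f}  exact={exact[k]:.4f}")
```

With 100,000 samples the empirical frequencies track the exact PMF closely; with small samples they can differ noticeably, which is the last pitfall in the table above.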

Conclusion

The Binomial Distribution is foundational in understanding probability, and with NumPy, simulating and visualizing it is effortless.

| Key Concept | Value |
|---|---|
| Trials (n) | Fixed number of repetitions |
| Success Probability (p) | Chance of success on a single trial |
| Outcome | Number of successes in n trials |
| Use Cases | Coin tosses, surveys, A/B tests, email opens |

Understanding the binomial distribution is crucial for hypothesis testing, simulations, and many real-world decision-making scenarios in data science, machine learning, and business analytics.