The Poisson Distribution is a fundamental statistical tool used to model the number of times an event occurs within a fixed interval of time or space, given a known average rate and independence between events.
In Python, we can easily simulate and analyze Poisson-distributed data using NumPy.
What is the Poisson Distribution?
The Poisson distribution expresses the probability of a given number of events occurring in a fixed interval of time or space if these events occur with a known constant mean rate and independently of the time since the last event.
Probability Mass Function (PMF)
$$P(X = k) = \frac{e^{-\lambda} \cdot \lambda^k}{k!}$$
Where:
- $k$: Number of events (0, 1, 2, ...)
- $\lambda$: Average number of events in a fixed interval
- $e$: Euler's number (~2.71828)
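To make the formula concrete, here is a minimal sketch that evaluates the PMF directly with Python's standard library. The helper name poisson_pmf and the choice of λ = 5 are illustrative assumptions, not part of NumPy.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Return P(X = k) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


lam = 5  # assumed example rate: 5 events per interval on average

# Probabilities for k = 0..49; the tail beyond that is negligible for λ = 5
probabilities = [poisson_pmf(k, lam) for k in range(50)]

print(f"P(X = 0) = {poisson_pmf(0, lam):.4f}")
print(f"P(X = 5) = {poisson_pmf(5, lam):.4f}")
print(f"Sum over k = 0..49: {sum(probabilities):.6f}")
```

Because the PMF is a probability distribution, the printed sum should be very close to 1.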
Real-Life Examples
Scenario | λ (Expected Events) |
---|---|
Number of emails received per hour | 5 |
Number of cars passing a checkpoint per minute | 3 |
Number of server crashes per month | 1 |
Number of printing errors per page | 0.2 |
NumPy’s Poisson Function
Syntax
numpy.random.Generator.poisson(lam=1.0, size=None)
Parameters
Parameter | Description |
---|---|
lam | The expected value (λ), average rate of occurrence |
size | Output shape (number of samples or a tuple) |
✅ Returns
Random samples following a Poisson distribution.
✅ Example: Basic Usage
import numpy as np
# Create a random generator
rng = np.random.default_rng(seed=42)
# Draw 1000 samples (event counts per interval) with λ = 5
data = rng.poisson(lam=5, size=1000)
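The lam argument also accepts an array of rates, which NumPy broadcasts against size. As a rough sketch, the four rates from the Real-Life Examples table can be simulated in one call; the variable names and the choice of 1000 intervals per scenario are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Rates from the example table: emails/hour, cars/minute,
# server crashes/month, printing errors/page
rates = [5, 3, 1, 0.2]

# One column per scenario, 1000 simulated intervals in each column
samples = rng.poisson(lam=rates, size=(1000, 4))

print(samples.shape)         # (1000, 4)
print(samples.mean(axis=0))  # column means should sit near the rates
```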
Visualizing Poisson Distribution
import seaborn as sns
import matplotlib.pyplot as plt
sns.histplot(data, bins=range(0, max(data)+2), discrete=True, color='skyblue')
plt.title("Poisson Distribution (λ = 5)")
plt.xlabel("Number of Events")
plt.ylabel("Frequency")
plt.show()
This histogram shows how often each count of events occurred across the 1000 simulations.
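The same tallies can be read off without a plot. The sketch below uses np.bincount to count how many of the 1000 simulated intervals produced each event count; it assumes the same seed and λ as the basic usage example.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
data = rng.poisson(lam=5, size=1000)

# counts[k] is the number of simulated intervals with exactly k events
counts = np.bincount(data)

for k, count in enumerate(counts):
    share = count / data.size
    print(f"{k:2d} events: {int(count):4d} times ({share:.1%})")
```

Each row corresponds to one bar of the histogram.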
Varying λ Values
λ = 2 (Low Rate)
data = rng.poisson(lam=2, size=1000)
sns.histplot(data, bins=range(0, 10), discrete=True, color='lightgreen')
plt.title("Poisson Distribution (λ = 2)")
plt.show()
λ = 10 (Higher Rate)
data = rng.poisson(lam=10, size=1000)
sns.histplot(data, bins=range(0, 20), discrete=True, color='orange')
plt.title("Poisson Distribution (λ = 10)")
plt.show()
As λ increases, the distribution becomes more symmetric and starts to resemble a normal distribution.
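One way to check this numerically is through skewness, which is 1/√λ for a Poisson distribution and 0 for a normal distribution. The following sketch estimates it from samples; the helper sample_skewness and the λ values chosen are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=0)


def sample_skewness(x):
    """Third standardized moment of a sample."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return np.mean(centered**3) / np.mean(centered**2) ** 1.5


skews = {}
for lam in (2, 10, 100):
    draws = rng.poisson(lam=lam, size=100_000)
    skews[lam] = sample_skewness(draws)
    print(f"λ = {lam:3d}: sample skewness = {skews[lam]:.3f}, "
          f"theoretical 1/sqrt(λ) = {lam ** -0.5:.3f}")
```

The sample skewness should shrink toward 0 as λ grows, which matches the histograms becoming more symmetric.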
Statistical Properties
For a Poisson distribution:
$$\text{Mean} = \lambda \quad\quad \text{Variance} = \lambda$$
data = rng.poisson(lam=7, size=10000)
print("Sample Mean:", np.mean(data))
print("Sample Variance:", np.var(data))
Real-World Use Case: Web Server Traffic
# Average 20 requests per second
requests = rng.poisson(lam=20, size=60) # Simulate 1 minute of traffic
plt.plot(requests, marker='o')
plt.title("Web Server Requests per Second")
plt.xlabel("Second")
plt.ylabel("Requests")
plt.grid(True)
plt.show()
This simulates a traffic pattern that might help with load testing or capacity planning.
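For capacity planning, a simulation like this can be summarized with a high percentile or with the share of seconds above some limit. The sketch below is only an illustration: the limit of 30 requests per second is a hypothetical number, and real traffic is often burstier than a Poisson model suggests.

```python
import numpy as np

rng = np.random.default_rng(seed=7)

# Assumption: the same average of 20 requests per second as above,
# simulated over many more seconds to stabilize the estimates
per_second = rng.poisson(lam=20, size=100_000)

capacity = 30  # hypothetical per-second limit, chosen only for illustration

p99 = np.percentile(per_second, 99)
overload_share = np.mean(per_second > capacity)

print("99th percentile of requests per second:", p99)
print(f"Share of seconds above {capacity} requests: {overload_share:.4f}")
```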
Full Code Example
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Random generator
rng = np.random.default_rng(seed=123)
# Parameters
lambda_val = 4
sample_size = 1000
# Generate Poisson-distributed data
data = rng.poisson(lam=lambda_val, size=sample_size)
# Plot
sns.histplot(data, bins=range(0, max(data)+2), discrete=True, color='lightblue')
plt.title(f"Poisson Distribution (λ = {lambda_val})")
plt.xlabel("Number of Events")
plt.ylabel("Frequency")
plt.axvline(np.mean(data), color='red', linestyle='--', label='Mean')
plt.legend()
plt.show()
# Stats
print("Sample Mean:", round(np.mean(data), 2))
print("Sample Variance:", round(np.var(data), 2))
print("Expected (λ):", lambda_val)
✅ Tips
- ✅ Use default_rng() for modern random number generation.
- ✅ Use size=(n, m) to generate multidimensional data.
- ✅ Large λ → distribution approaches normal.
- ✅ Best for rare event modeling over time or space.
- ✅ Set seed during development for reproducibility.
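Two of these tips, multidimensional size and seeding, can be sketched together. The 7 × 24 layout (days by hours) and the name week are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# 7 rows (days) x 24 columns (hours), each cell one draw with λ = 5
week = rng.poisson(lam=5, size=(7, 24))

print(week.shape)        # (7, 24)
print(week.sum(axis=1))  # total events per simulated day

# A fresh generator with the same seed reproduces the same draws
again = np.random.default_rng(seed=42).poisson(lam=5, size=(7, 24))
print(np.array_equal(week, again))
```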
⚠️ Common Pitfalls
Pitfall | Description |
---|---|
❌ Negative λ values | lam must be ≥ 0 |
❌ Confusing Poisson with Binomial | Poisson has no "fixed number of trials" |
❌ Expecting decimal output | Poisson returns integers (event counts) |
❌ Reading too much into small samples | Histograms from a few dozen draws are noisy; use ≥ 1000 samples for reliable patterns |
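The first and third pitfalls are easy to confirm directly. In this small sketch the flag negative_rejected is a hypothetical name used only to record the outcome.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# A negative rate is rejected with a ValueError
negative_rejected = False
try:
    rng.poisson(lam=-1, size=5)
except ValueError as exc:
    negative_rejected = True
    print("ValueError:", exc)

# λ may be fractional, but the draws are still whole-number counts
draws = rng.poisson(lam=0.2, size=10)
print(draws.dtype, draws)
```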
Poisson vs. Binomial vs. Normal
Distribution | Used For | Key Parameter |
---|---|---|
Poisson | Count of independent events per interval | λ |
Binomial | Fixed number of trials (success/failure) | n, p |
Normal | Continuous, symmetric bell-curve data | μ, σ |
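The Poisson and Binomial rows are closely related: with many trials and a small success probability, a Binomial(n, p) behaves much like a Poisson with λ = n·p. The sketch below compares the two by simulation; the values n = 1000 and p = 0.005 are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=3)

n, p = 1000, 0.005  # many trials, small success probability
lam = n * p         # matching Poisson rate (5.0)

binom = rng.binomial(n=n, p=p, size=100_000)
pois = rng.poisson(lam=lam, size=100_000)

print("Binomial mean / variance:", binom.mean(), binom.var())
print("Poisson  mean / variance:", pois.mean(), pois.var())
```

Both pairs should land near 5, with the binomial variance slightly lower because it equals n·p·(1 − p).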
Conclusion
The Poisson distribution is a powerful tool for modeling event counts in a fixed time or space interval. Whether you're monitoring website traffic, predicting system failures, or analyzing arrival rates, NumPy’s poisson() makes simulation easy.
Feature | Description |
---|---|
Function | rng.poisson(lam, size) |
Applications | Traffic, queues, system failures |
Key Parameter | λ = average rate of events |
Output | Count of occurrences (integers) |