Python NumPy: Poisson Distribution Explained

Last updated 3 weeks, 4 days ago

Tags: Python NumPy

The Poisson Distribution is a fundamental statistical tool used to model the number of times an event occurs within a fixed interval of time or space, given a known average rate and independence between events.

In Python, we can easily simulate and analyze Poisson-distributed data using NumPy.


What is the Poisson Distribution?

The Poisson distribution expresses the probability of a given number of events occurring in a fixed interval of time or space if these events occur with a known constant mean rate and independently of the time since the last event.

Probability Mass Function (PMF)

P(X = k) = \frac{e^{-\lambda} \cdot \lambda^k}{k!}

Where:

  • k: Number of events (0, 1, 2, ...)

  • λ: Average number of events in a fixed interval

  • e: Euler's number (~2.71828)
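As a quick check on the formula, the PMF can be evaluated directly with Python's standard library. This is a minimal sketch; the helper name `poisson_pmf` is chosen here for illustration and is not part of NumPy.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Return P(X = k) for a Poisson distribution with rate lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 5

# Probabilities for k = 0..4
probs = [poisson_pmf(k, lam) for k in range(5)]

# Summed over a wide enough range of k, the PMF should be very close to 1
total = sum(poisson_pmf(k, lam) for k in range(100))

print(probs)
print(total)
```

For λ = 5, the terms beyond k = 100 are negligible, so the truncated sum is effectively the full total probability.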


Real-Life Examples

| Scenario | λ (Expected Events) |
| --- | --- |
| Number of emails received per hour | 5 |
| Number of cars passing a checkpoint per minute | 3 |
| Number of server crashes per month | 1 |
| Number of printing errors per page | 0.2 |
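To make these rates concrete, one simple quantity is the probability that an interval contains no events at all, which follows from the PMF with k = 0: P(X = 0) = e^(−λ). The sketch below assumes each scenario really is Poisson with the listed rate; the dictionary keys are shorthand labels chosen for this example.

```python
import math

# Rates taken from the scenarios listed above (expected events per interval)
rates = {
    "emails per hour": 5,
    "cars per minute": 3,
    "server crashes per month": 1,
    "printing errors per page": 0.2,
}

# P(X = 0) = e^(-λ): the chance that the interval contains no events
p_zero = {name: math.exp(-lam) for name, lam in rates.items()}

for name, p in p_zero.items():
    print(f"{name}: P(no events) = {p:.4f}")
```

The smaller the rate, the more likely an empty interval: a page with an average of 0.2 errors is error-free most of the time, while an hour with no emails at λ = 5 is rare.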

NumPy’s Poisson Function

Syntax

numpy.random.Generator.poisson(lam=1.0, size=None)

Parameters

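| Parameter | Description |
| --- | --- |
| lam | The expected number of events per interval (λ); a float or array-like of floats, must be ≥ 0 |
| size | Output shape: an int for the number of samples, or a tuple for a multidimensional array. With the default None and a scalar lam, a single value is returned |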

✅ Returns

Random samples following a Poisson distribution.
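The effect of `size` and of passing several rates at once is easiest to see from the shapes that come back. The following is a small sketch; the seed and the specific rates are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

single = rng.poisson(lam=3)                   # size=None: one draw
vector = rng.poisson(lam=3, size=5)           # 1-D array of 5 draws
matrix = rng.poisson(lam=3, size=(2, 4))      # 2-D array with shape (2, 4)
per_rate = rng.poisson(lam=[0.2, 1, 5, 20])   # one draw per rate in the list

print(vector.shape, matrix.shape, per_rate.shape)
print(matrix.dtype)
```

Passing an array-like `lam` draws one value per rate, which can be handy when each row or column of a dataset has its own expected count.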


✅ Example: Basic Usage

import numpy as np

# Create a random generator
rng = np.random.default_rng(seed=42)

# Draw 1000 samples (each one an event count for one interval) with λ = 5
data = rng.poisson(lam=5, size=1000)

Visualizing Poisson Distribution

import seaborn as sns
import matplotlib.pyplot as plt

# discrete=True centers one bar on each integer count
sns.histplot(data, discrete=True, color='skyblue')
plt.title("Poisson Distribution (λ = 5)")
plt.xlabel("Number of Events")
plt.ylabel("Frequency")
plt.show()

This histogram shows how often each event count occurred across the 1000 samples.
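A histogram is more informative when compared with the theoretical PMF. The sketch below does that numerically rather than graphically; it uses a larger sample than the plot (100,000 draws, an arbitrary choice) so that the comparison is less noisy.

```python
import math
import numpy as np

rng = np.random.default_rng(seed=42)
lam = 5
samples = rng.poisson(lam=lam, size=100_000)

# Relative frequency of each observed count 0, 1, 2, ...
empirical = np.bincount(samples) / samples.size

# Theoretical PMF for the same counts
theoretical = np.array(
    [math.exp(-lam) * lam**k / math.factorial(k) for k in range(empirical.size)]
)

max_gap = np.max(np.abs(empirical - theoretical))
print("Largest gap between empirical and theoretical:", max_gap)
```

With a sample this large, the relative frequencies should track the PMF closely; with only a few hundred draws the gaps would be noticeably larger.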


Varying λ Values

λ = 2 (Low Rate)

data = rng.poisson(lam=2, size=1000)
sns.histplot(data, discrete=True, color='lightgreen')  # one bar per integer count
plt.title("Poisson Distribution (λ = 2)")
plt.show()

λ = 10 (Higher Rate)

data = rng.poisson(lam=10, size=1000)
sns.histplot(data, discrete=True, color='orange')  # one bar per integer count
plt.title("Poisson Distribution (λ = 10)")
plt.show()

As λ increases, the distribution becomes more symmetric (its skewness is 1/√λ) and starts to resemble a normal distribution with mean λ and variance λ.
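One way to quantify "more symmetric" is the skewness, which for a Poisson distribution equals 1/√λ and therefore shrinks as λ grows. The sketch below estimates it from samples; `sample_skewness` is a helper written for this example, not a NumPy function.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

def sample_skewness(x: np.ndarray) -> float:
    """Third standardized moment of the sample."""
    centered = x - x.mean()
    return float(np.mean(centered**3) / np.std(x) ** 3)

skews = {}
for lam in (2, 10, 100):
    draws = rng.poisson(lam=lam, size=100_000)
    skews[lam] = sample_skewness(draws)
    print(f"λ = {lam}: sample skewness = {skews[lam]:.3f}, 1/√λ = {lam ** -0.5:.3f}")
```

A skewness near zero is what a symmetric, normal-like shape would show, so the downward trend is consistent with the histograms above.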


Statistical Properties

For a Poisson distribution:

\text{Mean} = \lambda \quad\quad \text{Variance} = \lambda

data = rng.poisson(lam=7, size=10000)
print("Sample Mean:", np.mean(data))             # should be close to λ = 7
print("Sample Variance:", np.var(data, ddof=1))  # ddof=1 gives the unbiased sample variance, also close to 7

Real-World Use Case: Web Server Traffic

# Average 20 requests per second
requests = rng.poisson(lam=20, size=60)  # Simulate 1 minute of traffic

plt.plot(requests, marker='o')
plt.title("Web Server Requests per Second")
plt.xlabel("Second")
plt.ylabel("Requests")
plt.grid(True)
plt.show()

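Each point is the number of requests in one second. The spread around the mean of 20 gives a sense of the bursts a server has to absorb, which can help with load testing or capacity planning.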
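For capacity planning, the tail of the distribution usually matters more than the mean. The following sketch estimates a high percentile of the per-second load by simulation. It assumes traffic really is Poisson with a steady mean of 20 requests per second, which real traffic often is not, and the choice of the 99th percentile is only an example threshold.

```python
import numpy as np

rng = np.random.default_rng(seed=7)

# Assumption: Poisson traffic with a steady mean of 20 requests per second
per_second = rng.poisson(lam=20, size=100_000)

p99 = np.percentile(per_second, 99)   # level exceeded in roughly 1% of seconds
peak = per_second.max()               # busiest simulated second

print("Mean load:", per_second.mean())
print("99th percentile:", p99)
print("Peak:", peak)
```

Sizing for the percentile rather than the mean leaves headroom for ordinary random bursts, though it says nothing about surges caused by events the model does not include.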


Full Code Example

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Random generator
rng = np.random.default_rng(seed=123)

# Parameters
lambda_val = 4
sample_size = 1000

# Generate Poisson-distributed data
data = rng.poisson(lam=lambda_val, size=sample_size)

# Plot
sns.histplot(data, discrete=True, color='lightblue')  # one bar per integer count
plt.title(f"Poisson Distribution (λ = {lambda_val})")
plt.xlabel("Number of Events")
plt.ylabel("Frequency")
plt.axvline(np.mean(data), color='red', linestyle='--', label='Mean')
plt.legend()
plt.show()

# Stats
print("Sample Mean:", round(np.mean(data), 2))
print("Sample Variance:", round(np.var(data, ddof=1), 2))  # ddof=1: unbiased sample variance
print("Expected (λ):", lambda_val)

✅ Tips

  1. ✅ Use default_rng() for modern random number generation.

  2. ✅ Use size=(n, m) to generate multidimensional data.

  3. ✅ Large λ → distribution approaches normal.

  4. ✅ Best for modeling counts of independent events over a fixed interval of time or space.

  5. ✅ Set seed during development for reproducibility.
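The points about seeding and multidimensional output can be checked together. This is a small sketch; the seed value 2024 and the shape (3, 5) are arbitrary.

```python
import numpy as np

# Two generators created with the same seed produce the same stream
first = np.random.default_rng(seed=2024).poisson(lam=4, size=(3, 5))
second = np.random.default_rng(seed=2024).poisson(lam=4, size=(3, 5))

# A generator created without a seed is initialized from OS entropy,
# so its output generally differs from run to run
unseeded = np.random.default_rng().poisson(lam=4, size=(3, 5))

print(np.array_equal(first, second))
print(first.shape)
```

Fixing the seed is convenient while developing or writing tests; for a real simulation study it is usually better to leave it unset or to record the seed alongside the results.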


⚠️ Common Pitfalls

| Pitfall | Description |
| --- | --- |
| ❌ Negative λ values | lam must be ≥ 0; a negative value raises an error |
| ❌ Confusing Poisson with Binomial | Poisson has no "fixed number of trials" |
| ❌ Expecting decimal output | Poisson returns integers (event counts) |
| ❌ Reading too much into small samples | Small samples give noisy histograms; use ≥ 1000 samples for reliable patterns |
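The first and third pitfalls are easy to reproduce. The sketch below shows what happens with a negative rate and confirms that the output is made of integer counts even when `lam` is fractional; the flag name `negative_rejected` is just for this example.

```python
import numpy as np

rng = np.random.default_rng(seed=3)

# A negative rate is rejected with a ValueError
try:
    rng.poisson(lam=-1)
    negative_rejected = False
except ValueError as exc:
    negative_rejected = True
    print("ValueError:", exc)

# Samples are integer counts, even when lam itself is fractional
counts = rng.poisson(lam=0.2, size=10)
print(counts.dtype, counts)
```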

Poisson vs. Binomial vs. Normal

| Distribution | Used For | Key Parameter |
| --- | --- | --- |
| Poisson | Count of independent events per interval | λ |
| Binomial | Fixed number of trials (success/failure) | n, p |
| Normal | Continuous, symmetric bell-curve data | μ, σ |
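The Poisson and Binomial rows are more closely related than the comparison suggests: when the number of trials n is large and the success probability p is small, a Binomial count behaves much like a Poisson count with λ = n·p. The sketch below illustrates this by simulation; the values of n and p are arbitrary choices that give λ = 5.

```python
import numpy as np

rng = np.random.default_rng(seed=11)
size = 100_000

# Binomial: a fixed number of trials n, each succeeding with probability p
n, p = 10_000, 0.0005            # n * p = 5
binom = rng.binomial(n=n, p=p, size=size)

# Poisson: no trial count, only the average rate
pois = rng.poisson(lam=n * p, size=size)

print("Binomial mean/var:", binom.mean(), binom.var())
print("Poisson  mean/var:", pois.mean(), pois.var())
```

Both samples should show a mean and variance near 5. When p is not small, the Binomial variance n·p·(1 − p) falls clearly below the mean and the two distributions no longer match.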

Conclusion

The Poisson distribution is a powerful tool for modeling event counts in a fixed time or space interval. Whether you're monitoring website traffic, predicting system failures, or analyzing arrival rates, NumPy’s poisson() makes simulation easy.

| Feature | Description |
| --- | --- |
| Function | rng.poisson(lam, size) |
| Applications | Traffic, queues, system failures |
| Key Parameter | λ = average rate of events |
| Output | Count of occurrences (integers) |