The multinomial distribution is a generalization of the binomial distribution. While a binomial distribution models success/failure counts over trials that each have two possible outcomes, a multinomial distribution handles trials with more than two possible outcomes, such as rolling a die or choosing a color.
With NumPy, it's simple to simulate and work with multinomial outcomes for experiments, games, and probability modeling.
What is a Multinomial Distribution?
The multinomial distribution describes the probability of each combination of outcome counts across a fixed number of independent trials, where every trial ends in exactly one of several possible outcomes and the outcome probabilities stay the same from trial to trial.
Probability Mass Function (PMF)
$$P(X_1 = x_1, \dots, X_k = x_k) = \frac{n!}{x_1! \cdot x_2! \cdots x_k!} \cdot p_1^{x_1} \cdot p_2^{x_2} \cdots p_k^{x_k}$$
Where:
- $n$ is the total number of trials
- $x_i$ is the number of times outcome $i$ occurred
- $p_i$ is the probability of outcome $i$
- $\sum_i x_i = n$ and $\sum_i p_i = 1$
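To make the formula concrete, here is a minimal sketch that evaluates the PMF term by term using only Python's standard library. The function name `multinomial_pmf` is illustrative and is not part of NumPy.

```python
from math import factorial, prod


def multinomial_pmf(counts, probs):
    """Probability of observing `counts` given outcome probabilities `probs`."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    n = sum(counts)  # total number of trials
    # Multinomial coefficient n! / (x_1! * x_2! * ... * x_k!)
    coefficient = factorial(n)
    for x in counts:
        coefficient //= factorial(x)  # exact at every step
    # Product of p_i ** x_i over all outcomes
    likelihood = prod(p ** x for p, x in zip(probs, counts))
    return coefficient * likelihood


# Six rolls of a fair die in which every face shows up exactly once
p_each_face_once = multinomial_pmf([1, 1, 1, 1, 1, 1], [1 / 6] * 6)
print(p_each_face_once)  # roughly 0.0154
```

With two outcomes the same function reduces to the binomial PMF, for example `multinomial_pmf([1, 1], [0.5, 0.5])` gives 0.5.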
Real-Life Examples
Scenario | Outcomes |
---|---|
Rolling a die 10 times | Each side of the die |
Voting in an election | Each candidate as an outcome |
Product selection | Red, Blue, Green choices |
Survey responses | Multiple answer categories |
NumPy's multinomial() Function
NumPy allows you to sample from a multinomial distribution using:
numpy.random.Generator.multinomial(n, pvals, size=None)
Parameters
Parameter | Description |
---|---|
n | Number of trials per experiment |
pvals | Probability of each outcome (should sum to 1) |
size | Number of experiments to draw; None (the default) returns a single count vector |
✅ Returns
An array of integer counts, one per entry in pvals. The counts from each experiment sum to n.
✅ Example: Simulate Rolling a Die
import numpy as np
rng = np.random.default_rng(seed=42)
# Roll a fair 6-sided die 10 times
outcomes = rng.multinomial(n=10, pvals=[1/6]*6)
print("Die roll outcome counts:", outcomes)
Each element in outcomes represents how many times the corresponding face appeared.
Visualizing the Results
import matplotlib.pyplot as plt
labels = ['1', '2', '3', '4', '5', '6']
plt.bar(labels, outcomes, color='skyblue')
plt.title("Die Roll Simulation (10 Trials)")
plt.xlabel("Die Face")
plt.ylabel("Frequency")
plt.grid(True, axis='y')
plt.show()
Multiple Simulations
You can simulate this experiment multiple times using the size parameter:
results = rng.multinomial(n=10, pvals=[1/6]*6, size=1000)
print("Shape:", results.shape) # (1000, 6)
This gives you a 1000x6 array in which each row holds the face counts from a single 10-roll simulation.
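The following sketch shows how the size argument changes the output shape, and checks that every simulated experiment still distributes exactly n rolls. The variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
fair_die = [1 / 6] * 6

single = rng.multinomial(n=10, pvals=fair_die)               # one experiment
batch = rng.multinomial(n=10, pvals=fair_die, size=1000)     # 1000 experiments
grid = rng.multinomial(n=10, pvals=fair_die, size=(4, 5))    # tuple-valued size

print(single.shape)  # (6,)
print(batch.shape)   # (1000, 6)
print(grid.shape)    # (4, 5, 6)

# Each row sums to n = 10 because every roll lands on exactly one face
print(np.all(batch.sum(axis=1) == 10))  # True
```

The outcome axis is always last, so aggregating over experiments means reducing over the leading axes, as in `batch.mean(axis=0)`.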
Plot Average Frequencies
avg_outcomes = results.mean(axis=0)
plt.bar(labels, avg_outcomes, color='orange')
plt.title("Average Die Frequencies (1000 Simulations)")
plt.xlabel("Die Face")
plt.ylabel("Average Count per 10 Rolls")
plt.grid(True)
plt.show()
Each bar should approach 10 × (1/6) ≈ 1.67 as the number of simulations grows, because the die is fair.
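As a rough numerical check of that claim, the sketch below compares the simulated average count per face with the expected value n × p for increasing numbers of simulations. The sizes chosen are arbitrary, and the exact gaps depend on the seed.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
n = 10
pvals = np.full(6, 1 / 6)
expected = n * pvals  # n * p_i for each face, about 1.67

for size in (10, 1_000, 100_000):
    sample_mean = rng.multinomial(n, pvals, size=size).mean(axis=0)
    # Largest deviation of any face's average count from its expected value
    max_gap = np.abs(sample_mean - expected).max()
    print(f"size={size:>7}: largest gap from expected = {max_gap:.4f}")
```

The gap typically shrinks as size grows, which is why the averaged bar chart looks much flatter than the chart for a single run of 10 rolls.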
Another Example: Voting Poll
Let’s say an election has 3 candidates with these probabilities:
- Alice: 50%
- Bob: 30%
- Charlie: 20%
We survey 100 voters:
votes = rng.multinomial(n=100, pvals=[0.5, 0.3, 0.2])
candidates = ['Alice', 'Bob', 'Charlie']
plt.bar(candidates, votes, color='green')
plt.title("Simulated Votes for Candidates (n=100)")
plt.ylabel("Votes")
plt.show()
print(dict(zip(candidates, votes.tolist())))  # tolist() gives plain Python ints
Full Simulation: Candy Bag Problem
Imagine a bag of candies with colors:
- Red: 40%
- Green: 35%
- Blue: 25%
Let’s simulate opening 500 candy bags, each containing 20 candies.
colors = ['Red', 'Green', 'Blue']
probs = [0.4, 0.35, 0.25]
# Simulate
bags = rng.multinomial(n=20, pvals=probs, size=500)
# Average count per color
avg_counts = bags.mean(axis=0)
plt.bar(colors, avg_counts, color=['red', 'green', 'blue'])
plt.title("Average Candies per Color (500 Bags)")
plt.ylabel("Average Count")
plt.grid(True)
plt.show()
print("Expected average per bag:", [p*20 for p in probs])
print("Simulated average per bag:", avg_counts.round(2))
Tips
Tip | Why It’s Important |
---|---|
✅ Ensure pvals sum to 1 | NumPy raises a ValueError if the leading entries sum to more than 1; if the total falls short, the last outcome silently absorbs the remaining probability |
✅ Use large size for stable averages | Sample averages settle closer to n × p as the number of simulations grows |
✅ Combine with pandas | Great for tabular representation and stats |
✅ Use seed during development | Ensures reproducibility |
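When probabilities come from raw tallies or weights, one option is to normalize them before sampling. The helper below is a hypothetical sketch (`normalize_pvals` is not a NumPy function) and the tallies are made-up illustrative values.

```python
import numpy as np


def normalize_pvals(weights):
    """Hypothetical helper: scale non-negative weights so they sum to 1."""
    weights = np.asarray(weights, dtype=float)
    if weights.ndim != 1 or weights.size == 0:
        raise ValueError("weights must be a non-empty 1-D sequence")
    if not np.all(np.isfinite(weights)) or np.any(weights < 0):
        raise ValueError("weights must be finite and non-negative")
    total = weights.sum()
    if total == 0:
        raise ValueError("weights must have a positive sum")
    return weights / total


rng = np.random.default_rng(seed=42)  # fixed seed for reproducible runs

raw_tallies = [40, 35, 25]             # illustrative counts: red, green, blue
pvals = normalize_pvals(raw_tallies)   # 0.4, 0.35, 0.25
bags = rng.multinomial(n=20, pvals=pvals, size=500)
print(pvals, bags.shape)
```

Validating up front keeps a bad input from turning into a silently skewed simulation later on.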
⚠️ Common Pitfalls
Pitfall | Explanation |
---|---|
❌ pvals don’t sum to 1 | You’ll get a ValueError or skewed results |
❌ Wrong n value | n is the number of trials per experiment, not the number of outcomes |
❌ Forgetting axis shape with size | Output is (size, len(pvals)), not just len(pvals) |
❌ Confusing with categorical distribution | multinomial returns counts, not individual choices |
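The first pitfall can be reproduced in a few lines. This sketch relies on the behavior described in the NumPy documentation for Generator.multinomial, where the last entry of pvals is assumed to account for whatever probability remains. The probability values are deliberately wrong for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# The leading entries already exceed 1, so NumPy rejects the input
try:
    rng.multinomial(n=10, pvals=[0.5, 0.6, 0.2])
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# These sum to only 0.6: no error is raised, and the last outcome
# is treated as if it carried the remaining probability
skewed = rng.multinomial(n=10_000, pvals=[0.2, 0.2, 0.2])
print(skewed / skewed.sum())  # last share lands near 0.6, not 0.2
```

The second case is the more dangerous one because nothing fails, so checking the sum of pvals before sampling is worth the extra line.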
Multinomial vs Binomial vs Categorical
Distribution | Outcomes | Returns | Use Case |
---|---|---|---|
Binomial | 2 | Single value | Yes/No, Success/Fail |
Multinomial | >2 | Count vector | Dice rolls, Voting |
Categorical | >2 | One choice per trial | Sampling categories (use rng.choice) |
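The difference between the categorical and multinomial rows can be seen by drawing individual choices and then tallying them. This is a minimal sketch that reuses the candidate probabilities from the voting example; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
candidates = np.array(['Alice', 'Bob', 'Charlie'])
pvals = [0.5, 0.3, 0.2]

# Categorical view: one candidate index per voter
ballots = rng.choice(len(candidates), size=100, p=pvals)
print(candidates[ballots[:5]])  # first five individual choices

# Tallying the ballots produces a multinomial-style count vector
tally = np.bincount(ballots, minlength=len(candidates))
print(dict(zip(candidates.tolist(), tally.tolist())))

# Multinomial view: the counts directly, with no individual ballots kept
counts = rng.multinomial(n=100, pvals=pvals)
print(dict(zip(candidates.tolist(), counts.tolist())))
```

Use the categorical form when the order or identity of individual draws matters, and the multinomial form when only the totals are needed.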
Conclusion
The multinomial distribution is essential for simulating and modeling multi-category outcomes over repeated trials. NumPy’s multinomial() function makes it easy to:
- Simulate dice, polls, or surveys
- Run multiple trials at scale
- Analyze outcome distributions
Summary
Feature | Value |
---|---|
Function | numpy.random.Generator.multinomial(n, pvals, size=None) |
Input | Total trials (n), outcome probabilities (pvals) |
Output | Counts for each outcome |
Use cases | Dice rolls, surveys, classification |