Python NumPy: Multinomial Distribution Explained

Last updated 1 month, 3 weeks ago

Tags: Python NumPy

The multinomial distribution is a generalization of the binomial distribution. A binomial distribution models how many trials out of a fixed number end in success when each trial has only two outcomes (success or failure). A multinomial distribution models how many trials land in each of several possible outcomes, such as the faces of a die or a choice of color.

With NumPy, it's simple to simulate and work with multinomial outcomes for experiments, games, and probability modeling.


What is a Multinomial Distribution?

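The multinomial distribution describes the probability of observing a particular set of counts, one per outcome, from a fixed number of independent trials. Every trial ends in exactly one of several possible outcomes, and the outcome probabilities stay the same from trial to trial.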

Probability Mass Function (PMF)

$$P(X_1 = x_1, \dots, X_k = x_k) = \frac{n!}{x_1! \cdot x_2! \cdots x_k!} \cdot p_1^{x_1} \cdot p_2^{x_2} \cdots p_k^{x_k}$$

Where:

  • $n$ is the total number of trials

  • $x_i$ is the number of times outcome $i$ occurred

  • $p_i$ is the probability of outcome $i$

  • $\sum x_i = n$ and $\sum p_i = 1$
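As a quick check of the formula, the PMF can be evaluated by hand with the standard library. This is only a sketch: the helper name multinomial_pmf and the example numbers are made up for illustration and are not part of NumPy.

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """Evaluate the multinomial PMF for one vector of counts (hypothetical helper)."""
    n = sum(counts)                      # total number of trials
    coefficient = factorial(n)
    for x in counts:
        coefficient //= factorial(x)     # builds n! / (x_1! * x_2! * ... * x_k!)
    return coefficient * prod(p ** x for p, x in zip(probs, counts))

# Three trials, three outcomes, each outcome seen exactly once:
# 3! / (1! * 1! * 1!) * 0.5 * 0.3 * 0.2 = 6 * 0.03
p = multinomial_pmf([1, 1, 1], [0.5, 0.3, 0.2])
print(round(p, 4))  # 0.18
```

Summing this function over every count vector that adds up to n should give 1, which is a handy sanity check when experimenting with your own probabilities.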


Real-Life Examples

| Scenario | Outcomes |
|---|---|
| Rolling a die 10 times | Each side of the die |
| Voting in an election | Each candidate as an outcome |
| Product selection | Red, Blue, Green choices |
| Survey responses | Multiple answer categories |

NumPy's multinomial() Function

NumPy allows you to sample from a multinomial distribution using:

numpy.random.Generator.multinomial(n, pvals, size=None)

Parameters

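| Parameter | Description |
|---|---|
| n | Total number of trials in each experiment |
| pvals | Sequence of outcome probabilities, one per outcome (should sum to 1) |
| size | Number of experiments to draw, given as an int or a tuple of output dimensions; the default None draws a single experiment |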

✅ Returns

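An array of integer counts, one per outcome. With size=None its shape is (len(pvals),), and the counts in each experiment sum to n.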
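To see how size affects the returned array, here is a small sketch. The probabilities and seed are arbitrary values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
pvals = [0.2, 0.3, 0.5]

single = rng.multinomial(5, pvals)               # one experiment of 5 trials
batch = rng.multinomial(5, pvals, size=4)        # four experiments
grid = rng.multinomial(5, pvals, size=(2, 3))    # a 2x3 grid of experiments

print(single.shape)  # (3,)
print(batch.shape)   # (4, 3)
print(grid.shape)    # (2, 3, 3)
```

In every case the last axis has one entry per outcome, and summing along it gives back n.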


✅ Example: Simulate Rolling a Die

import numpy as np

rng = np.random.default_rng(seed=42)

# Roll a fair 6-sided die 10 times
outcomes = rng.multinomial(n=10, pvals=[1/6]*6)
print("Die roll outcome counts:", outcomes)

Element i of outcomes is the number of times face i + 1 appeared, so the six counts always add up to 10.


Visualizing the Results

import matplotlib.pyplot as plt

labels = ['1', '2', '3', '4', '5', '6']
plt.bar(labels, outcomes, color='skyblue')
plt.title("Die Roll Simulation (10 Trials)")
plt.xlabel("Die Face")
plt.ylabel("Frequency")
plt.grid(True, axis='y')
plt.show()

Multiple Simulations

You can simulate this experiment multiple times using the size parameter:

results = rng.multinomial(n=10, pvals=[1/6]*6, size=1000)
print("Shape:", results.shape)  # (1000, 6)

This gives you a 1000x6 matrix in which each row is a single 10-trial simulation.
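A quick way to confirm that layout is to sum across each row. The sketch below regenerates the same kind of array so it can run on its own.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
results = rng.multinomial(n=10, pvals=[1/6]*6, size=1000)

row_totals = results.sum(axis=1)    # total rolls recorded in each simulation
print(np.all(row_totals == 10))     # True: every row accounts for all 10 rolls
print(results[0])                   # counts for faces 1 to 6 in the first simulation
```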


Plot Average Frequencies

avg_outcomes = results.mean(axis=0)

plt.bar(labels, avg_outcomes, color='orange')
plt.title("Average Die Frequencies (1000 Simulations)")
plt.xlabel("Die Face")
plt.ylabel("Average Count per 10 Rolls")
plt.grid(True)
plt.show()

Each bar should be close to 10 × (1/6) ≈ 1.67, the expected count per face for a fair die.
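The same comparison can be made numerically. For a multinomial count, the expected value is n × p and the variance is n × p × (1 − p). The sketch below checks both against a larger simulation; the choice of 100,000 simulations is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
n, p = 10, 1 / 6
results = rng.multinomial(n=n, pvals=[p] * 6, size=100_000)

print("theoretical mean:", round(n * p, 3))                 # 1.667
print("simulated means:", results.mean(axis=0).round(3))
print("theoretical variance:", round(n * p * (1 - p), 3))   # 1.389
print("simulated variances:", results.var(axis=0).round(3))
```

The simulated values will not match the theory exactly, but they should agree to about two decimal places at this sample size.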


Another Example: Voting Poll

Let’s say an election has 3 candidates, and a randomly chosen voter supports each one with these probabilities:

  • Alice: 50%

  • Bob: 30%

  • Charlie: 20%

We survey 100 voters:

votes = rng.multinomial(n=100, pvals=[0.5, 0.3, 0.2])
candidates = ['Alice', 'Bob', 'Charlie']

plt.bar(candidates, votes, color='green')
plt.title("Simulated Votes for Candidates (n=100)")
plt.ylabel("Votes")
plt.show()

# .tolist() converts NumPy integers to plain Python ints for a cleaner printout
print(dict(zip(candidates, votes.tolist())))

Full Simulation: Candy Bag Problem

Imagine a bag of candies with colors:

  • Red: 40%

  • Green: 35%

  • Blue: 25%

Let’s simulate opening 500 candy bags, each containing 20 candies.

colors = ['Red', 'Green', 'Blue']
probs = [0.4, 0.35, 0.25]

# Simulate
bags = rng.multinomial(n=20, pvals=probs, size=500)

# Average count per color
avg_counts = bags.mean(axis=0)

plt.bar(colors, avg_counts, color=['red', 'green', 'blue'])
plt.title("Average Candies per Color (500 Bags)")
plt.ylabel("Average Count")
plt.grid(True)
plt.show()

print("Expected average per bag:", [p*20 for p in probs])
print("Simulated average per bag:", avg_counts.round(2))

Tips

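| Tip | Why It’s Important |
|---|---|
| ✅ Ensure pvals sum to 1 | NumPy does not normalize pvals: it raises a ValueError if sum(pvals[:-1]) exceeds 1, and otherwise treats the last entry as whatever probability remains |
| ✅ Use large size for stable averages | Averaging over more simulations brings the results closer to the expected counts |
| ✅ Combine with pandas | Great for tabular representation and stats |
| ✅ Use seed during development | Ensures reproducibility |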
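To illustrate the reproducibility tip, two generators created with the same seed produce identical draws. The seed value and probabilities here are arbitrary.

```python
import numpy as np

pvals = [0.5, 0.3, 0.2]
first = np.random.default_rng(seed=123).multinomial(100, pvals)
second = np.random.default_rng(seed=123).multinomial(100, pvals)

print(np.array_equal(first, second))  # True: same seed, same draws
```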

⚠️ Common Pitfalls

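| Pitfall | Explanation |
|---|---|
| ❌ pvals don’t sum to 1 | You’ll get a ValueError when sum(pvals[:-1]) exceeds 1; otherwise the last entry is silently replaced by the leftover probability, which skews results |
| ❌ Wrong n value | n is the number of trials, not the number of outcomes |
| ❌ Forgetting the output shape with size | Output is (size, len(pvals)), not just len(pvals) |
| ❌ Confusing with categorical distribution | multinomial returns counts, not individual choices |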
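The first pitfall is easy to reproduce. The sketch below relies on the documented behavior of Generator.multinomial, where the last entry of pvals is assumed to cover the remaining probability; the exact error message may differ between NumPy versions, and the probability values are deliberately wrong for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# The first two entries already exceed 1, so NumPy rejects the input
try:
    rng.multinomial(10, [0.7, 0.6, 0.2])
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)

# These sum to only 0.6: no error, but the last outcome absorbs the missing 0.4
counts = rng.multinomial(100_000, [0.2, 0.2, 0.2])
print(counts / counts.sum())  # the last proportion is expected near 0.6, not 0.2
```

Checking np.isclose(sum(pvals), 1.0) before sampling is a cheap guard against the silent case.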

Multinomial vs Binomial vs Categorical

| Distribution | Outcomes | Returns | Use Case |
|---|---|---|---|
| Binomial | 2 | Single count | Yes/No, Success/Fail |
| Multinomial | >2 | Count vector | Dice rolls, Voting |
| Categorical | >2 | One choice per trial | Sampling categories (use rng.choice) |
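The link between the categorical and multinomial rows can be sketched in a few lines: draw individual outcomes with rng.choice, then tally them with np.bincount to get a count vector. The colors and probabilities are illustrative values.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
colors = np.array(["Red", "Green", "Blue"])
probs = [0.4, 0.35, 0.25]

# Categorical view: one outcome index per trial, with the order of trials preserved
draws = rng.choice(len(colors), size=20, p=probs)
print(colors[draws][:5])        # the first five individual picks

# Tallying the draws gives a multinomial-style count vector
counts = np.bincount(draws, minlength=len(colors))
print(counts, counts.sum())     # three counts that add up to 20
```

Use rng.choice when the sequence of individual outcomes matters, and multinomial when only the totals per outcome are needed.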

Conclusion

The multinomial distribution is essential for simulating and modeling multi-category outcomes over repeated trials. NumPy’s multinomial() function makes it easy to:

  • Simulate dice, polls, or surveys

  • Run multiple trials at scale

  • Analyze outcome distributions


Summary

| Feature | Value |
|---|---|
| Function | rng.multinomial(n, pvals, size=None), with rng = np.random.default_rng() |
| Input | Total trials (n), outcome probabilities (pvals) |
| Output | Counts for each outcome |
| Use cases | Dice rolls, surveys, classification |