Python NumPy: Normal Distribution Explained

Last updated 1 month, 3 weeks ago

Tags: Python NumPy

The Normal Distribution, also known as the Gaussian Distribution, is one of the most important concepts in statistics and data science. It models many real-world phenomena like heights, weights, test scores, and measurement errors.

In Python, the NumPy library provides easy-to-use tools for generating and working with data that follows a normal distribution.


What is a Normal Distribution?

A normal distribution is a bell-shaped and symmetric probability distribution. It is defined by two parameters:

  • Mean (μ): The center of the distribution.

  • Standard Deviation (σ): Measures the spread; higher σ means more spread out.

The probability density function (PDF) is:

f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{ -\frac{(x - \mu)^2}{2\sigma^2} }
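To make the formula concrete, here is a minimal sketch that evaluates it directly with NumPy. The helper name normal_pdf is made up for this illustration and is not a NumPy function; the integration range of ±8 is likewise an arbitrary choice.

```python
import numpy as np

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Evaluate the normal PDF at x for mean mu and standard deviation sigma.

    Illustrative helper only; not part of NumPy.
    """
    x = np.asarray(x, dtype=float)
    coeff = 1.0 / (sigma * np.sqrt(2.0 * np.pi))
    return coeff * np.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# Height of the standard normal curve at its center (x = mu)
peak = normal_pdf(0.0)

# Rough numerical integration over [-8, 8]; the area should be very close to 1
xs = np.linspace(-8.0, 8.0, 10001)
dx = xs[1] - xs[0]
area = np.sum(normal_pdf(xs)) * dx

print(peak)
print(area)
```

The peak of the standard normal curve is 1/√(2π), roughly 0.399, and the total area under any PDF is 1, which is what the second value approximates.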


NumPy and Normal Distribution

NumPy provides a function to generate samples from a normal distribution:

Syntax

numpy.random.Generator.normal(loc=0.0, scale=1.0, size=None)

Parameters

Parameter   Description
loc         Mean (μ) of the distribution; defaults to 0.0
scale       Standard deviation (σ); must be non-negative; defaults to 1.0
size        Output shape (an int or a tuple of ints); if None, a single value is returned
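As a quick illustration of how size controls the output, the sketch below calls normal() three ways. The seed and parameter values are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

single = rng.normal()                                   # size=None: one scalar value
vector = rng.normal(loc=5.0, scale=2.0, size=4)         # int: 1-D array of 4 samples
matrix = rng.normal(loc=5.0, scale=2.0, size=(2, 3))    # tuple: 2x3 array

print(np.ndim(single))
print(vector.shape)
print(matrix.shape)
```

An int gives a flat array of that length, while a tuple gives an array of that shape.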

Getting Started

import numpy as np
import matplotlib.pyplot as plt

# Create a random generator
rng = np.random.default_rng(seed=42)

# Generate normal distribution data
data = rng.normal(loc=0, scale=1, size=1000)

This generates 1000 samples from a standard normal distribution (mean=0, std=1).
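One way to sanity-check such samples is the well-known 68-95-99.7 rule: about 68%, 95%, and 99.7% of normally distributed values fall within 1, 2, and 3 standard deviations of the mean. The sketch below uses 100,000 samples instead of 1,000 (an assumption made here so the shares are stable).

```python
import numpy as np

rng = np.random.default_rng(seed=42)
mu, sigma = 0.0, 1.0
samples = rng.normal(loc=mu, scale=sigma, size=100_000)

# Share of samples within k standard deviations of the mean
within = {
    k: float(np.mean(np.abs(samples - mu) <= k * sigma))
    for k in (1, 2, 3)
}

for k, share in within.items():
    print(f"within {k} sigma: {share:.3f}")
```

With only 1,000 samples the shares will wander a little further from the theoretical values.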


Visualizing the Distribution

Use matplotlib or seaborn to understand the shape:

import seaborn as sns

sns.histplot(data, bins=30, kde=True, color='skyblue')
plt.title("Normal Distribution (μ=0, σ=1)")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()
  • kde=True overlays a kernel density estimate, a smoothed curve showing the shape of the data.
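The plot shows counts per bin. To compare a histogram with the theoretical curve numerically, the bars need to be on a density scale. This is a rough sketch using np.histogram with density=True; the bin count and the comparison at bin centers are choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
data = rng.normal(loc=0, scale=1, size=1000)

# density=True rescales bar heights so the total bar area equals 1
heights, edges = np.histogram(data, bins=30, density=True)
widths = np.diff(edges)
centers = (edges[:-1] + edges[1:]) / 2

# Standard normal PDF evaluated at each bin center
pdf = np.exp(-centers ** 2 / 2) / np.sqrt(2 * np.pi)

total_area = float(np.sum(heights * widths))
max_gap = float(np.max(np.abs(heights - pdf)))

print(total_area)   # area under the histogram bars
print(max_gap)      # largest difference between a bar and the curve
```

The gap shrinks as the sample size grows, which is why larger samples produce smoother-looking bell curves.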


Example: Custom Mean and Standard Deviation

# μ = 50, σ = 10
data = rng.normal(loc=50, scale=10, size=1000)

sns.histplot(data, kde=True, color='lightgreen')
plt.title("Normal Distribution (μ=50, σ=10)")
plt.show()

Real-World Simulation

Simulate Students' Test Scores

# Average score: 70, Std dev: 12
scores = rng.normal(loc=70, scale=12, size=500)

# Clip scores to a 0-100 range
scores = np.clip(scores, 0, 100)

sns.histplot(scores, bins=20, kde=True)
plt.title("Simulated Student Scores")
plt.xlabel("Score")
plt.ylabel("Number of Students")
plt.show()

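Clipping keeps every score inside the valid 0-100 range, which makes the simulated dataset more realistic. Values outside the range are moved to the nearest boundary rather than redrawn.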
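It can be useful to check how many values clipping actually changes, since clipped values pile up at the boundaries. A small sketch, assuming the same parameters as the score simulation (the variable names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(seed=42)
raw = rng.normal(loc=70, scale=12, size=500)

below = int(np.sum(raw < 0))      # values that clipping raises to 0
above = int(np.sum(raw > 100))    # values that clipping lowers to 100
clipped = np.clip(raw, 0, 100)

print("clipped low:", below, "clipped high:", above)
print("std before:", raw.std(), "std after:", clipped.std())
```

With a mean of 70 and a standard deviation of 12, the upper bound sits 2.5 standard deviations above the mean, so only a small share of scores should be affected. Clipping can never increase the spread, so the second standard deviation is at most the first.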


Check Statistical Properties

print("Mean:", np.mean(scores))
print("Standard Deviation:", np.std(scores))

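The sample mean and standard deviation should land close to the specified 70 and 12, though not exactly: random sampling introduces some variation, and clipping pulls extreme values toward the boundaries.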
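Note that np.std divides by n by default (ddof=0). When estimating the spread of a population from a sample, ddof=1 is often preferred. The sketch below compares the two and also computes the standard error of the mean, as a rough gauge of how far the sample mean is expected to sit from loc. It uses unclipped data, an assumption made to keep the comparison clean.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
sample = rng.normal(loc=70, scale=12, size=500)

pop_std = np.std(sample)              # ddof=0 (NumPy default): divides by n
sample_std = np.std(sample, ddof=1)   # ddof=1: divides by n - 1

# Standard error of the mean: typical distance between sample mean and loc
sem = sample_std / np.sqrt(sample.size)

print("std (ddof=0):", pop_std)
print("std (ddof=1):", sample_std)
print("mean:", sample.mean(), "+/-", sem)
```

For 500 samples the two standard deviations differ only slightly; the difference matters more for small samples.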


Generating Multidimensional Data

data_2d = rng.normal(loc=0, scale=1, size=(3, 5))
print(data_2d)

Generates a 3×5 matrix of normally distributed values.
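loc and scale can also be arrays, in which case they are broadcast against size. The sketch below, with made-up means and standard deviations, draws three columns that each follow a different normal distribution.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

means = np.array([0.0, 50.0, 100.0])   # one mean per column (example values)
stds = np.array([1.0, 10.0, 20.0])     # one standard deviation per column

# loc and scale broadcast against size, so each column gets its own parameters
columns = rng.normal(loc=means, scale=stds, size=(10_000, 3))

print(columns.shape)
print(columns.mean(axis=0))
print(columns.std(axis=0))
```

The per-column means and standard deviations should come out close to the values passed in.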


✅ Use Case: Normal vs Non-Normal Comparison

# Normal distribution
normal_data = rng.normal(0, 1, 1000)

# Uniform distribution for comparison
uniform_data = rng.uniform(-3, 3, 1000)

# Plot
sns.kdeplot(normal_data, label="Normal", fill=True)
sns.kdeplot(uniform_data, label="Uniform", fill=True, color='orange')
plt.title("Normal vs Uniform Distribution")
plt.legend()
plt.show()
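The difference in shape can also be expressed in numbers. This sketch measures how much of each dataset lies within one unit of zero; the helper share_within is hypothetical, and the larger sample size is an assumption chosen to reduce noise.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
normal_data = rng.normal(0, 1, 100_000)
uniform_data = rng.uniform(-3, 3, 100_000)

def share_within(values, limit):
    """Fraction of values whose absolute value is at most `limit`."""
    return float(np.mean(np.abs(values) <= limit))

normal_core = share_within(normal_data, 1.0)     # expected near 0.68
uniform_core = share_within(uniform_data, 1.0)   # expected near 1/3

print(normal_core, uniform_core)
print(np.abs(normal_data).max(), np.abs(uniform_data).max())
```

The normal data concentrates near the center while still producing occasional values beyond ±3; the uniform data spreads evenly and never leaves its interval.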

✅ Tips

  1. ✅ Use default_rng(): it is the modern, recommended way to create random generators.

  2. ✅ Use kde=True to visualize the density shape.

  3. ✅ Always set seed for reproducibility during testing.

  4. ✅ Clip values if simulating bounded data (e.g., scores 0–100).

  5. ✅ Use large enough sample sizes (as a rough guide, 500 or more) for smooth-looking histograms.
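The reproducibility tip can be checked directly. In this small sketch (the seed values are arbitrary), two generators created with the same seed produce identical samples, while a different seed produces a different stream.

```python
import numpy as np

a = np.random.default_rng(seed=42).normal(size=5)
b = np.random.default_rng(seed=42).normal(size=5)
c = np.random.default_rng(seed=7).normal(size=5)

same_seed_matches = bool(np.array_equal(a, b))
different_seed_matches = bool(np.array_equal(a, c))

print(same_seed_matches)
print(different_seed_matches)
```

Fixing the seed is helpful in tests and tutorials; for real simulations it is usually left unset or varied between runs.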


⚠️ Common Pitfalls

Pitfall                                                 Explanation
❌ Confusing scale with variance                        scale is the standard deviation, not the variance
❌ Using legacy np.random.normal() in new projects      Use default_rng().normal() instead
❌ Forgetting to set size                               If omitted, only a single float is returned
❌ Assuming small samples show a perfect bell curve     You need large enough samples to approximate a bell shape
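The first pitfall is easy to demonstrate. In the sketch below, the target variance of 25 is an example value: passing it directly as scale gives a far wider distribution than intended, while passing its square root gives the expected result.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
variance = 25.0   # desired variance (example value), so sigma should be 5

# Wrong: treats the variance as if it were the standard deviation
wrong = rng.normal(loc=0.0, scale=variance, size=100_000)

# Right: scale expects the standard deviation, the square root of the variance
right = rng.normal(loc=0.0, scale=np.sqrt(variance), size=100_000)

print("variance when passing 25 as scale:", wrong.var())
print("variance when passing sqrt(25) as scale:", right.var())
```

The first variance comes out near 625 (25 squared) and the second near 25.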

Summary

Feature            Description
Function           rng.normal(loc=μ, scale=σ, size=n)
Mean (loc)         Controls the center of the distribution
Std dev (scale)    Controls the spread (width)
Use cases          Simulating real-world numeric data
Tools              NumPy, Matplotlib, Seaborn

The normal distribution is a cornerstone of statistical modeling, and NumPy makes it straightforward to generate and analyze normally distributed data.


Full Code Example

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Generator
rng = np.random.default_rng(seed=123)

# Generate normal data
data = rng.normal(loc=60, scale=15, size=1000)

# Plot histogram and KDE
sns.histplot(data, bins=30, kde=True, color='skyblue')
plt.title("Normal Distribution (μ=60, σ=15)")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.axvline(np.mean(data), color='red', linestyle='--', label='Mean')
plt.legend()
plt.show()

# Print statistics
print("Sample Mean:", round(np.mean(data), 2))
print("Sample Standard Deviation:", round(np.std(data), 2))