A histogram is a powerful plot for visualizing the distribution of numerical data. It displays data by grouping values into bins and showing the frequency of values within each bin.
With Matplotlib, creating and customizing histograms is straightforward and flexible. This article covers everything from basic usage to advanced customization.
What Is a Histogram?
A histogram plots the frequency of values falling within certain ranges (called bins). It's ideal for:
- Understanding data distribution
- Detecting skewness, modality, and outliers
- Comparing datasets
Unlike bar charts (which show categorical data), histograms show the distribution of continuous numerical data.
Creating a Basic Histogram in Matplotlib
import matplotlib.pyplot as plt
data = [22, 87, 5, 43, 56, 73, 55, 54, 11, 20, 51, 5, 79, 31, 27]
plt.hist(data)
plt.title("Basic Histogram")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()
By default, Matplotlib splits the data range (minimum to maximum) into 10 equal-width bins.
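The same plot can also be drawn with Matplotlib's object-oriented interface. Like `plt.hist`, `Axes.hist` returns the counts and the bin edges, which makes it easy to confirm the default of 10 bins. This is a minimal sketch using the same `data` list. The `matplotlib.use("Agg")` line is an addition so the script runs without a display; drop it in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption for scripts/CI); omit in notebooks
import matplotlib.pyplot as plt

data = [22, 87, 5, 43, 56, 73, 55, 54, 11, 20, 51, 5, 79, 31, 27]

fig, ax = plt.subplots()
# Axes.hist returns (counts, bin_edges, patches), just like plt.hist
counts, edges, patches = ax.hist(data)
ax.set_title("Basic Histogram")
ax.set_xlabel("Value")
ax.set_ylabel("Frequency")

print(len(edges) - 1)       # number of bins: 10 by default
print(edges[0], edges[-1])  # edges run from min(data) to max(data): 5.0 87.0
plt.close(fig)
```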
Customizing Bins
Set Number of Bins
plt.hist(data, bins=5)
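You can also pass the name of a binning rule to `bins` instead of a fixed integer. `plt.hist` accepts the same strings as NumPy's `np.histogram_bin_edges`, for example `'auto'`, `'sturges'` or `'fd'`. The sketch below shows what a few of these rules would pick for the sample data. It only uses NumPy to preview the edges; the rule names are passed unchanged to Matplotlib.

```python
import numpy as np

data = [22, 87, 5, 43, 56, 73, 55, 54, 11, 20, 51, 5, 79, 31, 27]

# Preview how many bins a few of NumPy's rules choose for this data
for rule in ["sturges", "fd", "auto"]:
    edges = np.histogram_bin_edges(data, bins=rule)
    print(f"{rule:>8}: {len(edges) - 1} bins, width {edges[1] - edges[0]:.1f}")

# Any of these strings can be given to Matplotlib directly, e.g.
# plt.hist(data, bins="auto")
```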
Custom Bin Edges
bins = [0, 20, 40, 60, 80, 100]
plt.hist(data, bins=bins)
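With explicit edges, each bin includes its left edge and excludes its right edge. The last bin is the exception and includes both. Matplotlib hands the counting to NumPy's `np.histogram`, so the sketch below uses it directly to show which values land in the five custom bins. It also shows where boundary values such as 20 and 100 go.

```python
import numpy as np

data = [22, 87, 5, 43, 56, 73, 55, 54, 11, 20, 51, 5, 79, 31, 27]
bins = [0, 20, 40, 60, 80, 100]

counts, edges = np.histogram(data, bins=bins)
print(counts)  # [3 4 5 2 1]

# Boundary values: 20 falls in [20, 40); 100 falls in the closed last bin [80, 100]
boundary_counts, _ = np.histogram([20, 100], bins=bins)
print(boundary_counts)  # [0 1 0 0 1]

# Values below 0 or above 100 would be left out of every bin
```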
Customizing Appearance
Bar Color
plt.hist(data, color='skyblue')
Edge Color
plt.hist(data, edgecolor='black')
Transparency
plt.hist(data, color='orange', alpha=0.7)
Full Example with Labels and Grid
data = [22, 87, 5, 43, 56, 73, 55, 54, 11, 20, 51, 5, 79, 31, 27]
plt.hist(data, bins=8, color='teal', edgecolor='black')
plt.title("Data Distribution")
plt.xlabel("Data Range")
plt.ylabel("Frequency")
plt.grid(axis='y', linestyle='--', alpha=0.6)
plt.tight_layout()
plt.show()
Histogram with Density (Probability) Plot
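Set density=True to normalize the histogram so that the total area of the bars equals 1. Each bar height is then the count divided by the total count and the bin width. This estimates a probability density, not the probability of falling in that bin:
plt.hist(data, bins=8, density=True, color='coral', edgecolor='black')
plt.title("Probability Distribution")
plt.ylabel("Density")
plt.show()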
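To see what density=True computes, the sketch below recalculates the heights with `np.histogram(..., density=True)` on the same data. These should match what `plt.hist` draws, since Matplotlib delegates the counting to NumPy. The sketch checks the count / (total × width) formula and confirms that the total bar area is 1.

```python
import numpy as np

data = [22, 87, 5, 43, 56, 73, 55, 54, 11, 20, 51, 5, 79, 31, 27]

counts, edges = np.histogram(data, bins=8)
heights, _ = np.histogram(data, bins=8, density=True)
widths = np.diff(edges)

# Density heights are counts divided by (total count * bin width) ...
manual = counts / (counts.sum() * widths)
print(np.allclose(heights, manual))  # True

# ... so it is the area (height * width) that sums to 1, not the heights
print(round(float(np.sum(heights * widths)), 6))  # 1.0
```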
Comparing Multiple Histograms
Use label
and alpha
for overlayed histograms:
import numpy as np
data1 = np.random.normal(60, 10, 1000)
data2 = np.random.normal(70, 15, 1000)
plt.hist(data1, bins=20, alpha=0.5, label='Dataset 1')
plt.hist(data2, bins=20, alpha=0.5, label='Dataset 2')
plt.title("Histogram Comparison")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.legend()
plt.show()
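With bins=20, each call chooses its own 20 edges over that dataset's range, so the bars of the two histograms do not line up exactly. One option is to compute a single set of edges from both datasets with `np.histogram_bin_edges` and pass it to both calls. The sketch below does this. The seeded `np.random.default_rng(0)` generator is an addition for reproducibility, and so is the headless backend.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit when running interactively
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # seeded generator so results are reproducible
data1 = rng.normal(60, 10, 1000)
data2 = rng.normal(70, 15, 1000)

# One set of 20 edges covering the combined range of both datasets
shared_edges = np.histogram_bin_edges(np.concatenate([data1, data2]), bins=20)

fig, ax = plt.subplots()
n1, _, _ = ax.hist(data1, bins=shared_edges, alpha=0.5, label="Dataset 1")
n2, _, _ = ax.hist(data2, bins=shared_edges, alpha=0.5, label="Dataset 2")
ax.set_title("Histogram Comparison (shared bins)")
ax.set_xlabel("Value")
ax.set_ylabel("Frequency")
ax.legend()
plt.close(fig)
```

Because the edges span both datasets, every sample is counted and corresponding bars cover the same value ranges.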
Histogram Orientation: Horizontal
plt.hist(data, orientation='horizontal')
Cumulative Histogram
plt.hist(data, bins=8, cumulative=True, color='slateblue')
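With cumulative=True, each bar shows how many values fall in that bin or any bin to its left. This is the running sum of the ordinary counts, so the last bar equals the number of data points. Adding density=True rescales the cumulative histogram so the last bar reaches 1. The sketch below checks both on the sample list (headless backend assumed).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in notebooks
import matplotlib.pyplot as plt
import numpy as np

data = [22, 87, 5, 43, 56, 73, 55, 54, 11, 20, 51, 5, 79, 31, 27]
counts, edges = np.histogram(data, bins=8)  # ordinary per-bin counts

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
# Left: running count of values up to each bin
cum, _, _ = ax1.hist(data, bins=8, cumulative=True, color="slateblue")
ax1.set_title("Cumulative counts")
# Right: cumulative density, normalized so the last bar is 1
ecdf, _, _ = ax2.hist(data, bins=8, cumulative=True, density=True, color="slateblue")
ax2.set_title("Cumulative density")

print(np.array_equal(cum, np.cumsum(counts)))  # True
print(cum[-1])  # 15.0, the number of data points
plt.close(fig)
```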
Accessing Histogram Data
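plt.hist returns a tuple (n, bins, patches). These are the count in each bin, the bin edges (always one more edge than there are bins), and the drawn bar objects. You can unpack the returned values directly: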
counts, bin_edges, _ = plt.hist(data, bins=5)
print("Counts:", counts)
print("Bin Edges:", bin_edges)
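The third return value, discarded as `_` above, holds the drawn bars, one rectangle per bin, and you can restyle them after plotting. The sketch below colors the bars whose count is above the average count. That coloring rule is purely illustrative. If you only need the numbers and no plot, `np.histogram` gives the same counts and edges.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in notebooks
import matplotlib.pyplot as plt
import numpy as np

data = [22, 87, 5, 43, 56, 73, 55, 54, 11, 20, 51, 5, 79, 31, 27]

fig, ax = plt.subplots()
counts, bin_edges, patches = ax.hist(data, bins=5, edgecolor="black")

# Illustrative rule: highlight bins whose count is above the mean count
for count, patch in zip(counts, patches):
    patch.set_facecolor("tomato" if count > counts.mean() else "lightgray")

print("Counts:", counts)  # Counts: [4. 3. 3. 2. 3.]
print("Bin Edges:", bin_edges)

# Same numbers without drawing anything
np_counts, np_edges = np.histogram(data, bins=5)
print(np.array_equal(counts, np_counts), np.allclose(bin_edges, np_edges))  # True True
plt.close(fig)
```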
Histogram with KDE (Using Seaborn)
Seaborn's histplot can overlay a kernel density estimate (KDE), a smooth curve that estimates the underlying distribution:
import seaborn as sns
sns.histplot(data, kde=True)
plt.title("Histogram with KDE")
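If Seaborn is not available, you can draw a similar curve with Matplotlib and NumPy alone. The sketch below sums one Gaussian bump per data point. Its bandwidth follows Scott's rule of thumb (h = σ·n^(-1/5)), which is an assumption here; Seaborn's default smoothing may not match it exactly. The histogram uses density=True so the bars and the curve share the same vertical scale.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in notebooks
import matplotlib.pyplot as plt
import numpy as np

data = np.array([22, 87, 5, 43, 56, 73, 55, 54, 11, 20, 51, 5, 79, 31, 27], dtype=float)

def gaussian_kde_curve(samples, grid):
    """Evaluate a Gaussian kernel density estimate of `samples` at each grid point."""
    n = samples.size
    h = samples.std(ddof=1) * n ** (-1 / 5)  # Scott's rule bandwidth for 1-D data
    # Standardized distance from every grid point to every sample: shape (len(grid), n)
    z = (grid[:, None] - samples[None, :]) / h
    # Average of n Gaussian kernels, each integrating to 1
    return np.exp(-0.5 * z**2).sum(axis=1) / (n * h * np.sqrt(2 * np.pi))

grid = np.linspace(data.min() - 40, data.max() + 40, 400)
kde_values = gaussian_kde_curve(data, grid)

fig, ax = plt.subplots()
ax.hist(data, bins=8, density=True, color="lightgray", edgecolor="black")
ax.plot(grid, kde_values, color="crimson", label="Gaussian KDE")
ax.set_title("Histogram with KDE")
ax.legend()
plt.close(fig)
```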
Tips for Using Histograms
Tip | Benefit |
---|---|
Use custom bins | Reveals more or less granularity |
Use density=True | Shows a probability density instead of raw counts |
Add edgecolor | Improves bar clarity |
Use alpha when overlaying | Helps distinguish histograms |
Use KDE | Smooths out the distribution |
⚠️ Common Pitfalls
Pitfall | Solution |
---|---|
Too few/many bins | Try different bins values |
Misinterpreted histogram | Add grid, labels, and legends |
Overlaid histograms hard to read | Use alpha and different colors |
Histogram instead of bar chart | Use plt.bar() for categorical data |
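For categorical data, the last pitfall above points to plt.bar(). Here is a minimal sketch using a hypothetical list of survey answers: count each category with `collections.Counter`, then draw one bar per category. Matplotlib places string labels on a categorical axis automatically.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in notebooks
import matplotlib.pyplot as plt
from collections import Counter

# Hypothetical categorical data (e.g. survey answers)
answers = ["red", "blue", "red", "green", "blue", "red"]

counts = Counter(answers)        # keeps first-seen order of categories
labels = list(counts.keys())
heights = list(counts.values())

fig, ax = plt.subplots()
ax.bar(labels, heights, color="skyblue", edgecolor="black")
ax.set_xlabel("Category")
ax.set_ylabel("Count")
plt.close(fig)
```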
Summary Table
Parameter | Description | Example |
---|---|---|
x (first positional argument) | Input data | plt.hist(data) |
bins | Bin count or edges | bins=10 or bins=[0,20...] |
color | Fill color | color='skyblue' |
edgecolor | Outline color | edgecolor='black' |
alpha | Transparency (0 to 1) | alpha=0.6 |
density | Normalize so the total bar area is 1 | density=True |
orientation | 'vertical' or 'horizontal' | orientation='horizontal' |
cumulative | Accumulate counts across bins | cumulative=True |
✅ Complete Example
import matplotlib.pyplot as plt
import numpy as np
# Simulated exam scores
scores = np.random.normal(75, 10, 1000)
plt.figure(figsize=(10, 6))
plt.hist(scores, bins=20, color='steelblue', edgecolor='black', alpha=0.7)
plt.title("Distribution of Exam Scores")
plt.xlabel("Score")
plt.ylabel("Number of Students")
plt.grid(axis='y', linestyle='--', alpha=0.6)
plt.tight_layout()
plt.show()
Conclusion
Histograms in Matplotlib provide a clear view of how your data is distributed. With simple customizations and flexible parameters, you can craft professional and insightful visualizations for both basic and advanced use cases.
What's Next?
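- Try stacked histograms with stacked=True (histtype='stepfilled' only changes each histogram into a single filled step shape; it does not stack datasets by itself)
- Combine histograms with box plots for richer analysis
- Use interactive widgets with ipywidgets for bin control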
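A stacked histogram comes from passing a list of datasets together with stacked=True, so the second dataset's bars sit on top of the first. The sketch below reuses the two normal samples from the comparison section and draws them with histtype='stepfilled'. The seeded generator, the shared edges and the headless backend are additions for reproducibility and alignment.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in notebooks
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # seeded for reproducibility
data1 = rng.normal(60, 10, 1000)
data2 = rng.normal(70, 15, 1000)
edges = np.histogram_bin_edges(np.concatenate([data1, data2]), bins=20)

fig, ax = plt.subplots()
# A list of arrays plus stacked=True draws one layer per dataset, stacked upward
n, bins, patches = ax.hist([data1, data2], bins=edges, stacked=True,
                           histtype="stepfilled", label=["Dataset 1", "Dataset 2"])
ax.set_title("Stacked Histogram")
ax.set_xlabel("Value")
ax.set_ylabel("Frequency")
ax.legend()
plt.close(fig)
```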