When working with numerical data in Python, especially with NumPy, visualization is essential for exploring patterns, relationships, and trends. While matplotlib is powerful, Seaborn offers a simpler, higher-level interface for statistical plotting.
In this article, you’ll learn how to:
- Generate numerical data using NumPy
- Create attractive statistical plots using Seaborn
- Integrate NumPy and Seaborn for real-world data exploration
What is Seaborn?
Seaborn is a Python data visualization library built on top of matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.
✅ Key Features:
- Beautiful default themes
- Integration with Pandas and NumPy
- Easy plotting of statistical relationships
- Built-in support for distributions, regression, categories, and heatmaps
Installing Seaborn
If you don’t have Seaborn installed:
pip install seaborn
Importing Required Libraries
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
Generating Data with NumPy
We’ll use NumPy to generate synthetic data for plotting.
Example: Generating Normally Distributed Data
rng = np.random.default_rng(seed=42)
data = rng.normal(loc=50, scale=15, size=1000)
This generates 1000 values centered around 50 with a standard deviation of 15.
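As a quick sanity check, the sample statistics should land close to the requested parameters. The sketch below regenerates the same array and prints its mean and standard deviation. Using ddof=1 (the sample standard deviation) is a choice made here, not something the article specifies.

```python
import numpy as np

# Same generator settings as in the example above
rng = np.random.default_rng(seed=42)
data = rng.normal(loc=50, scale=15, size=1000)

sample_mean = data.mean()
sample_std = data.std(ddof=1)  # ddof=1: sample standard deviation (an assumption here)

print(f"shape: {data.shape}")
print(f"mean:  {sample_mean:.2f}")
print(f"std:   {sample_std:.2f}")
```

With 1000 samples, both values should sit near 50 and 15 without matching them exactly.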
Visualizing NumPy Data with Seaborn
1. Histogram & KDE
sns.histplot(data, kde=True, color='skyblue')
plt.title("Histogram with KDE")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.show()
- histplot: Histogram
- kde=True: Adds a smooth Kernel Density Estimate (KDE)
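To make explicit what the histogram bars represent, the following sketch reproduces the binning step with np.histogram. The bin count of 30 is an arbitrary assumption for illustration. Seaborn picks its own bin count by default, so these numbers will not match the plot bar for bar.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
data = rng.normal(loc=50, scale=15, size=1000)

# Bin the data; 30 bins is an arbitrary choice for this illustration
counts, edges = np.histogram(data, bins=30)
widths = np.diff(edges)

# Rescale counts so the bar areas sum to 1, which puts the
# histogram on the same scale as a density curve
density = counts / (counts.sum() * widths)

print("total count:", counts.sum())
print("area under density bars:", (density * widths).sum())
```

The rescaling step is the idea behind comparing a count histogram with a smooth density estimate: both describe the same distribution on different vertical scales.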
2. Box Plot
# Pass the array as x so the values lie on the x-axis, matching the xlabel below
sns.boxplot(x=data, color='lightgreen')
plt.title("Box Plot")
plt.xlabel("Value")
plt.show()
- Useful for detecting outliers and visualizing data spread.
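The quantities a box plot draws can also be computed directly with NumPy. This is a minimal sketch assuming the conventional 1.5 × IQR rule for the whiskers; the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
data = rng.normal(loc=50, scale=15, size=1000)

# Box edges and center line
q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1

# Conventional 1.5 * IQR fences; points beyond them are drawn as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = data[(data < lower_fence) | (data > upper_fence)]

print(f"Q1={q1:.1f}, median={median:.1f}, Q3={q3:.1f}, IQR={iqr:.1f}")
print(f"fences: [{lower_fence:.1f}, {upper_fence:.1f}]")
print(f"points flagged as outliers: {outliers.size}")
```

For normally distributed data, only a small fraction of points (well under 1 in 50) is expected to fall outside the fences.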
3. Strip Plot (Jittered points)
sns.stripplot(data=data, jitter=True, color='orange')
plt.title("Strip Plot")
plt.show()
- Displays all data points along a single axis with slight jitter.
4. Violin Plot
sns.violinplot(data=data, color='purple')
plt.title("Violin Plot")
plt.show()
- Combines a box plot and a KDE to show the full distribution.
Working with 2D NumPy Data
Generate Multivariate Data
x = rng.normal(loc=0, scale=1, size=1000)
y = 2 * x + rng.normal(scale=0.5, size=1000)
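Before plotting, it can help to confirm numerically that y really follows 2 * x plus noise. The sketch below creates a fresh generator, so the exact values differ from the article's x and y (which come from an rng that has already produced data), but the relationship is the same.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
x = rng.normal(loc=0, scale=1, size=1000)
y = 2 * x + rng.normal(scale=0.5, size=1000)

# Pearson correlation between x and y
r = np.corrcoef(x, y)[0, 1]

# Least-squares straight line; polyfit returns the highest-degree coefficient first
slope, intercept = np.polyfit(x, y, deg=1)

print(f"correlation: {r:.3f}")
print(f"fitted line: y = {slope:.2f} * x + {intercept:.2f}")
```

The fitted slope should come out close to 2 and the intercept close to 0, which is what the scatter plot and regression line show visually.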
Plot Relationships
sns.scatterplot(x=x, y=y, color='teal')
plt.title("Scatter Plot with NumPy Data")
plt.xlabel("X")
plt.ylabel("Y")
plt.show()
Regression Line with Seaborn
import pandas as pd
df = pd.DataFrame({'X': x, 'Y': y})
sns.lmplot(data=df, x='X', y='Y', line_kws={'color': 'red'})
plt.title("Linear Regression Line")
plt.show()
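Note: Seaborn prefers Pandas DataFrames for labeled data. lmplot in particular requires a DataFrame passed as data, so convert NumPy arrays to a DataFrame first. Axes-level functions such as scatterplot accept arrays directly.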
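There is more than one way to do the conversion. The sketch below shows two common ones: a dict of 1D arrays, and a single 2D array with explicit column names. The names df_from_dict and df_from_array are illustrative only.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)
x = rng.normal(loc=0, scale=1, size=1000)
y = 2 * x + rng.normal(scale=0.5, size=1000)

# Option 1: build from a dict of 1D arrays
df_from_dict = pd.DataFrame({'X': x, 'Y': y})

# Option 2: stack into a (1000, 2) array and name the columns
arr = np.column_stack([x, y])
df_from_array = pd.DataFrame(arr, columns=['X', 'Y'])

print(df_from_array.shape)
print(np.array_equal(df_from_dict.to_numpy(), df_from_array.to_numpy()))
```

Either DataFrame can then be passed to Seaborn with x='X' and y='Y'.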
✅ Full Example: NumPy + Seaborn
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
# Generate synthetic data
rng = np.random.default_rng(seed=123)
height = rng.normal(170, 10, 500)
weight = rng.normal(65, 15, 500)
# Combine into DataFrame
df = pd.DataFrame({'Height': height, 'Weight': weight})
# Pairplot to show multiple plots
sns.pairplot(df)
plt.suptitle("Pairwise Relationships", y=1.02)
plt.show()
# Joint plot with KDE
sns.jointplot(data=df, x='Height', y='Weight', kind='kde', fill=True, cmap='coolwarm')
plt.show()
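Because height and weight are drawn independently in the example above, the pair plot and joint plot show no relationship between them. If you want the synthetic data to show one, a possible variation is to derive weight from height plus noise. The slope of 0.9 and the noise scale of 12 below are made-up values for illustration, not real anthropometric figures.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=123)
height = rng.normal(170, 10, 500)

# Hypothetical linear link: 0.9 and 12 are illustrative, not measured values
weight = 65 + 0.9 * (height - 170) + rng.normal(0, 12, 500)

df = pd.DataFrame({'Height': height, 'Weight': weight})

# Correlation matrix; the off-diagonal entry summarizes the linear relationship
print(df.corr().round(2))
```

Passing this df to sns.pairplot(df) or sns.jointplot(data=df, x='Height', y='Weight', kind='kde') should then show an upward-sloping pattern.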
Tips
- ✅ Convert NumPy arrays to Pandas DataFrames for labeled Seaborn plots.
- ✅ Use seed when generating NumPy data to ensure reproducibility.
- ✅ Use sns.set_theme() for customizing plot aesthetics globally.
sns.set_theme(style="whitegrid")
- ✅ Use Seaborn’s pairplot() or jointplot() for exploring multiple variables.
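The reproducibility tip can be checked directly: two generators created with the same seed produce the same numbers, while a different seed does not. The seed values here are arbitrary.

```python
import numpy as np

# Two generators created with the same seed produce identical streams
a = np.random.default_rng(seed=42).normal(loc=50, scale=15, size=5)
b = np.random.default_rng(seed=42).normal(loc=50, scale=15, size=5)

# A different seed gives a different stream
c = np.random.default_rng(seed=7).normal(loc=50, scale=15, size=5)

print(np.array_equal(a, b))  # True
print(np.array_equal(a, c))  # False
```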
⚠️ Common Pitfalls
Pitfall | Explanation |
---|---|
❌ Passing raw NumPy arrays to lmplot or pairplot | These expect DataFrames with column names. |
❌ Forgetting to call plt.show() | In scripts and basic editors, no plot appears without it. |
❌ Ignoring axis labels | Labeled axes make plots easier to read and interpret. |
❌ Using NumPy arrays without understanding shape | For 2D plots, make sure x and y are 1D and have the same length. |
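The shape pitfall is easy to guard against before plotting. The helper below, as_1d_pair, is a hypothetical function written for this sketch (it is not part of NumPy or Seaborn): it flattens both inputs and raises an error if their lengths differ.

```python
import numpy as np

def as_1d_pair(x, y):
    """Hypothetical helper: flatten x and y to 1D and check equal length."""
    x = np.ravel(x)
    y = np.ravel(y)
    if x.shape != y.shape:
        raise ValueError(f"length mismatch: {x.shape[0]} vs {y.shape[0]}")
    return x, y

rng = np.random.default_rng(seed=42)
x_col = rng.normal(size=(1000, 1))   # column vector, shape (1000, 1)
y_flat = rng.normal(size=1000)       # already 1D, shape (1000,)

x, y = as_1d_pair(x_col, y_flat)
print(x.shape, y.shape)
```

The returned arrays can then be passed as x and y to a function such as sns.scatterplot.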
Conclusion
Seaborn + NumPy is a powerful combination for quick, effective, and aesthetically pleasing data visualizations. While NumPy gives you fast and flexible numerical operations, Seaborn makes it easy to plot the results for analysis or storytelling.
As a best practice:
- Use NumPy for data generation and manipulation
- Use Seaborn for visualization
- Use Pandas as a bridge to label and structure your data when needed