Python Matplotlib Scatter – A Complete Guide
A scatter plot is one of the most useful types of plots in data analysis and visualization. It displays values for two variables as a collection of points, revealing patterns, correlations, and distributions in datasets.
This article walks you through creating and customizing scatter plots in Matplotlib with code examples, advanced options, tips, and common pitfalls.
What Is a Scatter Plot?
A scatter plot (also known as an XY plot) shows the relationship between two continuous variables. Each point in the plot represents a single data observation with an X (horizontal) and Y (vertical) coordinate.
Basic Scatter Plot in Matplotlib
To create a scatter plot, use:
matplotlib.pyplot.scatter(x, y)
Example
import matplotlib.pyplot as plt
x = [5, 7, 8, 7, 2, 17, 2, 9]
y = [99, 86, 87, 88, 100, 86, 103, 87]
plt.scatter(x, y)
plt.title("Basic Scatter Plot")
plt.xlabel("X Axis")
plt.ylabel("Y Axis")
plt.grid(True)
plt.show()
Customizing Scatter Plot Appearance
Change Marker Color
plt.scatter(x, y, color='red')
Change Marker Size
Use the s parameter:
plt.scatter(x, y, s=100) # size in points^2
You can even vary the size by data:
sizes = [20, 50, 100, 80, 60, 150, 40, 70]
plt.scatter(x, y, s=sizes)
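In practice the sizes usually come from a data column rather than a hand-written list. The sketch below shows one possible way to rescale an arbitrary variable into a range of marker areas. The `weights` values and the 20 to 300 area range are illustrative assumptions, not part of the example above.

```python
import matplotlib.pyplot as plt
import numpy as np

x = [5, 7, 8, 7, 2, 17, 2, 9]
y = [99, 86, 87, 88, 100, 86, 103, 87]

# Hypothetical third variable (for example, a count per observation).
weights = np.array([3, 12, 7, 20, 5, 15, 9, 1])

# Linearly rescale the values into marker areas between 20 and 300 points^2.
min_area, max_area = 20, 300
span = weights.max() - weights.min()
sizes = min_area + (weights - weights.min()) / span * (max_area - min_area)

plt.scatter(x, y, s=sizes)
plt.title("Marker Area Scaled from Data")
plt.show()
```

Because s is an area, a value that is twice as large draws a marker with twice the area, not twice the diameter.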
Change Marker Style
Use the marker parameter:
Marker | Description |
---|---|
'o' | Circle (default) |
'^' | Triangle Up |
's' | Square |
'*' | Star |
'D' | Diamond |
plt.scatter(x, y, marker='^')
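To compare the markers from the table side by side, one option is to draw each style on its own row. This is a minimal sketch; the x positions and the row layout are arbitrary choices made for illustration.

```python
import matplotlib.pyplot as plt

# Marker codes and names taken from the table above.
markers = {'o': 'Circle', '^': 'Triangle Up', 's': 'Square', '*': 'Star', 'D': 'Diamond'}

x = [1, 2, 3, 4]  # arbitrary positions, for illustration only

fig, ax = plt.subplots()
for row, (marker, name) in enumerate(markers.items()):
    # Draw one horizontal row of points per marker style.
    ax.scatter(x, [row] * len(x), marker=marker, s=80)

# Label each row with the name of its marker.
ax.set_yticks(range(len(markers)))
ax.set_yticklabels(list(markers.values()))
ax.set_title("Marker Styles")
plt.show()
```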
Set Marker Transparency
Use the alpha parameter (0 to 1):
plt.scatter(x, y, alpha=0.6)
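Transparency matters most when many points overlap. The following sketch plots the same synthetic cloud twice, once opaque and once with alpha=0.2, so the dense center becomes visible. The data is randomly generated and the seed, point count, and alpha value are assumptions chosen for the demonstration.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic, heavily overlapping data (seeded so the figure is reproducible).
rng = np.random.default_rng(0)
x = rng.normal(0, 1, 2000)
y = rng.normal(0, 1, 2000)

fig, (ax_opaque, ax_faded) = plt.subplots(1, 2, figsize=(10, 4), sharex=True, sharey=True)

# Left: default opacity, where overlapping points merge into a solid blob.
ax_opaque.scatter(x, y)
ax_opaque.set_title("alpha=1 (default)")

# Right: low alpha, where darker regions indicate more points.
faded = ax_faded.scatter(x, y, alpha=0.2)
ax_faded.set_title("alpha=0.2")

plt.show()
```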
Color Mapping Based on Data
Use a colormap to color points based on a third variable.
import numpy as np
x = np.random.rand(50)
y = np.random.rand(50)
colors = np.random.rand(50) # Values between 0 and 1
plt.scatter(x, y, c=colors, cmap='viridis')
plt.colorbar() # Show color scale
plt.title("Color-Mapped Scatter Plot")
plt.show()
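By default the colormap stretches across the smallest and largest values in c. If the third variable has a known range, it may be useful to pin the color scale with vmin and vmax so that separate plots stay comparable. The sketch below assumes a hypothetical `scores` variable on a 0 to 100 scale.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
x = rng.random(50)
y = rng.random(50)
scores = rng.uniform(20, 80, 50)  # hypothetical third variable, nominally 0 to 100

# Fix the color scale to the full 0-100 range instead of the observed 20-80.
sc = plt.scatter(x, y, c=scores, cmap='viridis', vmin=0, vmax=100)
plt.colorbar(sc, label="Score (0 to 100)")
plt.title("Color Scale Fixed with vmin/vmax")
plt.show()
```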
Adding Labels and Titles
plt.title("Scatter Example")
plt.xlabel("Age")
plt.ylabel("Income")
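The same labels can also be set through Matplotlib's object-oriented interface, which tends to be easier to manage once a figure has more than one subplot. The sketch below is one way to write it; the `age` and `income` values are invented purely to give the labels something to describe.

```python
import matplotlib.pyplot as plt

# Hypothetical sample data, for illustration only.
age = [23, 31, 38, 45, 52, 60]
income = [28000, 41000, 52000, 61000, 66000, 58000]

fig, ax = plt.subplots()
ax.scatter(age, income)

# Axes-level equivalents of plt.title, plt.xlabel, and plt.ylabel.
ax.set_title("Scatter Example")
ax.set_xlabel("Age")
ax.set_ylabel("Income")
plt.show()
```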
Full Example: Multi-Dimensional Scatter Plot
import matplotlib.pyplot as plt
import numpy as np
# Random data
x = np.random.randint(10, 100, 50)
y = np.random.randint(20, 200, 50)
colors = np.random.rand(50)
sizes = np.random.randint(50, 300, 50)
plt.figure(figsize=(10, 6))
scatter = plt.scatter(x, y, c=colors, s=sizes, alpha=0.7, cmap='plasma', edgecolors='black')
plt.colorbar(scatter, label="Color Intensity")
plt.title("Multi-Dimensional Scatter Plot")
plt.xlabel("X Value")
plt.ylabel("Y Value")
plt.grid(True)
plt.show()
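The colorbar explains the color encoding, but the size encoding has no key in the example above. One possible approach is the legend_elements method of the object returned by scatter, sketched below with seeded random data. The legend title and the choice of about four size steps are assumptions.

```python
import matplotlib.pyplot as plt
import numpy as np

# Seeded random data so the figure is reproducible.
rng = np.random.default_rng(42)
x = rng.integers(10, 100, 50)
y = rng.integers(20, 200, 50)
sizes = rng.integers(50, 300, 50)

fig, ax = plt.subplots(figsize=(10, 6))
scatter = ax.scatter(x, y, s=sizes, alpha=0.7, edgecolors='black')

# Build legend handles that represent a few sizes from the plotted range.
handles, labels = scatter.legend_elements(prop="sizes", num=4, alpha=0.7)
ax.legend(handles, labels, title="Marker size", loc="upper right")

ax.set_title("Scatter Plot with a Size Legend")
plt.show()
```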
Scatter Plot with Annotations
You can add labels to individual points:
for i in range(len(x)):
    plt.text(x[i] + 1, y[i] + 1, f"({x[i]}, {y[i]})")
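Offsetting labels by adding 1 to the data coordinates depends on the axis scale, so the gap changes when the data range changes. A possible alternative, sketched below with the data from the basic example, is annotate with an offset given in points, which keeps the gap constant on screen. The 5-point offset is an arbitrary choice.

```python
import matplotlib.pyplot as plt

x = [5, 7, 8, 7, 2, 17, 2, 9]
y = [99, 86, 87, 88, 100, 86, 103, 87]

fig, ax = plt.subplots()
ax.scatter(x, y)

for xi, yi in zip(x, y):
    # Place each label 5 points right of and above its marker.
    ax.annotate(f"({xi}, {yi})", xy=(xi, yi), xytext=(5, 5), textcoords="offset points")

ax.set_title("Scatter Plot with Annotations")
plt.show()
```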
Comparison: plt.plot() vs plt.scatter()
Feature | plt.plot() | plt.scatter() |
---|---|---|
Joins points? | Yes (lines between) | No (points only) |
Vary size/color? | Limited (one marker size and color per call) | Fully supported (per point, via s and c) |
Suited for | Line graphs | Correlation/distribution |
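To see the difference from the table in practice, the sketch below draws the data from the basic example with both functions side by side. The subplot layout and titles are illustrative choices.

```python
import matplotlib.pyplot as plt

x = [5, 7, 8, 7, 2, 17, 2, 9]
y = [99, 86, 87, 88, 100, 86, 103, 87]

fig, (ax_line, ax_points) = plt.subplots(1, 2, figsize=(10, 4))

# plot() connects the points in the order they appear in the lists.
ax_line.plot(x, y, marker='o')
ax_line.set_title("plot(): points joined in input order")

# scatter() draws each observation as an independent marker.
ax_points.scatter(x, y)
ax_points.set_title("scatter(): points only")

plt.tight_layout()
plt.show()
```

Because the x values here are unsorted, the line in the left panel zigzags, which is usually a sign that a scatter plot is the better fit for the data.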
Tips for Using Scatter Plots
Tip | Why It Helps |
---|---|
Use transparency (alpha) | Reduces clutter and shows density where points overlap |
Use color maps (cmap) | Encodes a third variable as color |
Add edgecolors | Separates adjacent or overlapping markers |
Use plt.grid(True) | Makes values easier to read off the axes |
Use plt.tight_layout() | Keeps titles and labels from being clipped or overlapping |
⚠️ Common Pitfalls
Pitfall | Solution |
---|---|
Overlapping points | Use a smaller s or set alpha below 1 for transparency |
Marker size too big/small | Adjust s (marker area in points^2) |
Points not colored | Pass numeric values to c; cmap applies only when c holds numbers to map |
Marker edges not showing | Use edgecolors='black' |
Missing colorbar | Add with plt.colorbar() |
Summary Table
Parameter | Description | Example |
---|---|---|
x, y | Coordinates | plt.scatter(x, y) |
s | Size | s=50 |
c | Color | c=values |
alpha | Transparency | alpha=0.6 |
marker | Shape | marker='^' |
cmap | Color map | cmap='viridis' |
edgecolors | Border | edgecolors='black' |
Conclusion
The scatter plot is an essential visualization tool in Python for revealing correlations, distributions, and clusters. With Matplotlib, you can easily create scatter plots, customize their appearance, and even add multidimensional data using color and size encodings.
What’s Next?
- Try scatter plots with Pandas DataFrames
- Overlay regression lines using libraries like Seaborn or NumPy
- Create interactive scatter plots using Plotly
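As a starting point for the regression idea in the list above, the sketch below fits a straight line with NumPy's polyfit and draws it over the scatter plot. The data is synthetic (a known slope of 2 and intercept of 1 plus random noise), so the numbers are assumptions made only to have something to fit.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data: y = 2x + 1 with Gaussian noise (seeded for reproducibility).
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 50)
y = 2 * x + 1 + rng.normal(0, 1, 50)

# Least-squares fit of a degree-1 polynomial; returns slope, then intercept.
slope, intercept = np.polyfit(x, y, 1)

# Evaluate the fitted line across the observed x range.
x_line = np.linspace(x.min(), x.max(), 100)

plt.scatter(x, y, alpha=0.7)
plt.plot(x_line, slope * x_line + intercept, color='red',
         label=f"y = {slope:.2f}x + {intercept:.2f}")
plt.legend()
plt.title("Scatter Plot with Fitted Line")
plt.show()
```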