In data manipulation, machine learning, and statistics, you often need to shuffle data: to split datasets, to randomize the order of records, or to avoid ordering bias. NumPy’s random.permutation() function is an essential tool for this.
This article explains what random.permutation() does, how to use it effectively, and which common mistakes to avoid.
What Is random.permutation()?
numpy.random.Generator.permutation() returns a randomly permuted copy of either:
- a range of integers (pass an integer n and it permutes np.arange(n)), or
- an array or sequence you supply.
Unlike shuffle(), which modifies the input array in place, permutation() returns a new array and leaves the original intact.
Syntax
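numpy.random.Generator.permutation(x, axis=0)
Parameters:
- x: either
  - an integer n: returns a permutation of np.arange(n), or
  - a sequence or ndarray: returns a permuted copy of it
- axis: the axis along which to permute (default 0, the first axis)
Returns:
- A new ndarray with the permuted values (a list input also comes back as an ndarray)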
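The short check below illustrates both input forms. It assumes a fixed seed of 0 purely so the runs are repeatable; the seed value has no special meaning.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed only so runs are repeatable

# Integer input: a shuffled copy of np.arange(5)
from_int = rng.permutation(5)

# List input: the result comes back as an ndarray, not a list
from_list = rng.permutation([10, 20, 30])

print(type(from_int).__name__, sorted(from_int.tolist()))   # ndarray [0, 1, 2, 3, 4]
print(type(from_list).__name__, sorted(from_list.tolist())) # ndarray [10, 20, 30]
```

Sorting the output is just a way to show that no values were added or lost; only their order is random.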
Getting Started
import numpy as np
rng = np.random.default_rng()
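default_rng() creates a Generator, the API NumPy recommends for new code. Pass a seed (for example, default_rng(seed=42)) when you need reproducible results; without one, the output changes on every run.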
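A minimal sketch of what seeding buys you: two generators built with the same seed (123 here, an arbitrary choice) produce identical permutations, while an unseeded generator draws fresh entropy each run.

```python
import numpy as np

# Two generators created with the same seed produce the same sequence of draws
rng_a = np.random.default_rng(seed=123)
rng_b = np.random.default_rng(seed=123)

perm_a = rng_a.permutation(10)
perm_b = rng_b.permutation(10)

print(np.array_equal(perm_a, perm_b))  # True

# An unseeded generator is seeded from OS entropy, so its output differs between runs
rng_unseeded = np.random.default_rng()
print(rng_unseeded.permutation(10))
```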
Examples
Example 1: Permute a Range
# Permute numbers 0 through 9
permuted = rng.permutation(10)
print(permuted)
Output:
[2 9 1 7 3 6 0 8 5 4]
(varies each run)
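Because the exact order varies, a more useful check than eyeballing the output is to confirm that every value from 0 to 9 appears exactly once:

```python
import numpy as np

rng = np.random.default_rng()
permuted = rng.permutation(10)

# A permutation contains each of 0..9 exactly once, just in a different order
assert np.array_equal(np.sort(permuted), np.arange(10))
assert len(np.unique(permuted)) == 10
print(permuted)
```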
Example 2: Permute a List or Array
arr = np.array([10, 20, 30, 40])
result = rng.permutation(arr)
print("Original:", arr)
print("Permuted:", result)
Output:
Original: [10 20 30 40]
Permuted: [20 10 40 30]
The original array remains unchanged.
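To see that the result is a genuinely separate copy rather than a view, you can write into it and inspect the original afterwards. np.shares_memory is used here only as an extra check:

```python
import numpy as np

rng = np.random.default_rng()
arr = np.array([10, 20, 30, 40])
result = rng.permutation(arr)

# permutation() returned a separate copy, so writing to it leaves arr untouched
result[0] = -1
print(arr)                            # [10 20 30 40]
print(np.shares_memory(arr, result))  # False
```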
Example 3: Permute 2D Array Rows
data = np.array([[1, 2], [3, 4], [5, 6]])
shuffled = rng.permutation(data)
print("Shuffled Rows:\n", shuffled)
This shuffles the rows, not individual elements.
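One way to confirm that rows move as whole units is to compare the set of rows before and after, ignoring their order:

```python
import numpy as np

rng = np.random.default_rng()
data = np.array([[1, 2], [3, 4], [5, 6]])
shuffled = rng.permutation(data)

# Each row survives intact; only the order of rows changes
original_rows = sorted(map(tuple, data.tolist()))
shuffled_rows = sorted(map(tuple, shuffled.tolist()))
print(original_rows == shuffled_rows)  # True
```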
Comparison: permutation() vs shuffle()
Feature | permutation() | shuffle() |
---|---|---|
Returns | New array | None |
Modifies in-place | ❌ No | ✅ Yes |
Suitable for pure functions | ✅ Yes | ❌ No |
Safer for preserving original | ✅ Yes | ❌ No |
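The first two rows of the table can be seen directly in a few lines. The seed is arbitrary and only makes the run repeatable:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

arr = np.arange(5)
copy = rng.permutation(arr)   # new array; arr is unchanged
print(arr)                    # [0 1 2 3 4]

ret = rng.shuffle(arr)        # shuffles arr in place
print(ret)                    # None
print(sorted(arr.tolist()))   # [0, 1, 2, 3, 4]  (same values, new order)
```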
✅ Practical Use Case: Shuffle a Dataset
import numpy as np
rng = np.random.default_rng(seed=42)
# Sample dataset (features and labels)
features = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
labels = np.array([0, 1, 0, 1])
# Permutation indices
indices = rng.permutation(len(features))
# Apply to both arrays
shuffled_features = features[indices]
shuffled_labels = labels[indices]
print("Shuffled Features:\n", shuffled_features)
print("Shuffled Labels:\n", shuffled_labels)
This is a common technique in machine learning preprocessing to ensure features and labels stay in sync after shuffling.
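The same index trick extends naturally to the dataset splitting mentioned at the start. The sketch below is one possible approach, not a standard NumPy function: split_train_test and train_fraction are hypothetical names, and the 75/25 split is an arbitrary assumption.

```python
import numpy as np

def split_train_test(features, labels, train_fraction, rng):
    """Hypothetical helper: shuffle with one index permutation, then slice."""
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    indices = rng.permutation(len(features))
    n_train = int(len(features) * train_fraction)
    train_idx, test_idx = indices[:n_train], indices[n_train:]
    return (features[train_idx], labels[train_idx],
            features[test_idx], labels[test_idx])

rng = np.random.default_rng(seed=42)
features = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
labels = np.array([0, 1, 0, 1])

X_train, y_train, X_test, y_test = split_train_test(features, labels, 0.75, rng)
print(X_train.shape, X_test.shape)  # (3, 2) (1, 2)
```

Using a single index array for both slices is what keeps each feature row paired with its label.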
Tips
- ✅ Use default_rng() to access the modern permutation() API.
- ✅ Use permutation() when you don’t want to modify the original data.
- ✅ For in-place shuffling, use rng.shuffle().
- ✅ For reproducible shuffles, set a seed.
⚠️ Common Pitfalls
Pitfall | Why it's wrong |
---|---|
❌ Using np.random.permutation() instead of default_rng() | Legacy API; NumPy recommends the Generator method for new code |
❌ Assuming permutation() modifies the original array | It returns a new array; the original is unchanged |
❌ Applying permutation() to a multidimensional array and expecting an element-wise shuffle | It permutes along a single axis only (axis 0 by default, e.g., the rows of a 2D array) |
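If you really do want every element scattered, one simple approach (an illustrative choice, not the only one) is to permute a flattened copy and reshape it back:

```python
import numpy as np

rng = np.random.default_rng()
data = np.array([[1, 2], [3, 4], [5, 6]])

# permutation() only reorders along one axis, so flatten first,
# permute every element, then restore the original shape
elementwise = rng.permutation(data.ravel()).reshape(data.shape)
print(elementwise)
```

The values can now end up in any row or column, so rows are no longer preserved.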
Bonus: Shuffle Columns Instead of Rows
To shuffle the columns of a 2D array (not the default behavior), index it with a permutation of the column positions:
arr = np.array([[1, 2, 3], [4, 5, 6]])
shuffled_columns = arr[:, rng.permutation(arr.shape[1])]
print(shuffled_columns)
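Generator.permutation also accepts an axis argument, so the same column shuffle can be written without building the index array yourself. The check at the end confirms each column moved as a whole:

```python
import numpy as np

rng = np.random.default_rng()
arr = np.array([[1, 2, 3], [4, 5, 6]])

# Generator.permutation accepts an axis argument; axis=1 reorders columns
shuffled_columns = rng.permutation(arr, axis=1)
print(shuffled_columns)

# Every output column is one of the original columns, kept intact
original_cols = sorted(map(tuple, arr.T.tolist()))
print(sorted(map(tuple, shuffled_columns.T.tolist())) == original_cols)  # True
```

The explicit indexing form is handy when you need the permutation itself (for example, to reorder column names alongside the data); axis=1 is shorter when you only need the shuffled array.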
Conclusion
NumPy’s random.permutation() is a powerful and safe way to shuffle arrays or generate random orderings. It’s ideal for:
- Shuffling datasets
- Generating random orders
- Maintaining original data integrity
Whether you're preprocessing for machine learning, simulating random sampling, or simply scrambling sequences, permutation() is the tool of choice when you need a non-destructive shuffle.