Python NumPy random.permutation: A Complete Guide


Tags: Python, NumPy

In data manipulation, machine learning, and statistics, you often need to shuffle data, whether to split datasets, randomize the order of records, or avoid ordering bias. NumPy’s random.permutation() function is a standard tool for this.

This article explains what random.permutation() does, how to use it effectively, and common mistakes to avoid.


What Is random.permutation()?

numpy.random.Generator.permutation() returns a randomly permuted sequence, either of:

  • A range of numbers

  • An array you supply

Unlike shuffle(), which modifies the input array in-place, permutation() returns a new array and leaves the original intact.
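The difference can be checked side by side. The following is a minimal sketch; the seed value and the variable names are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily for this sketch

original = np.arange(5)

# permutation() returns a new array; `original` is not touched
permuted = rng.permutation(original)

# shuffle() reorders its argument in place and returns None
in_place = original.copy()
returned = rng.shuffle(in_place)

print(original)   # still 0..4 in their original order
print(permuted)   # the same values in a new order
print(returned)   # None
```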


Syntax

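numpy.random.Generator.permutation(x, axis=0)

Parameters:

  • x: Can be either:

    • An integer n – returns a permutation of np.arange(n)

    • A sequence or ndarray – returns a permuted copy of the sequence

  • axis: The axis along which x is permuted (default 0). It only matters when x is an array with more than one dimension.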

Returns:

  • A permuted ndarray (a list or other sequence is converted to an array first)
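As a quick check of both input forms, the sketch below passes an integer and then a plain Python list. The variable names are illustrative; in both cases the result comes back as a NumPy array.

```python
import numpy as np

rng = np.random.default_rng()

from_int = rng.permutation(5)               # permutes np.arange(5)
from_list = rng.permutation([10, 20, 30])   # the list is converted to an array

# The order varies each run, so print the type and the sorted values
print(type(from_int).__name__, sorted(from_int.tolist()))    # ndarray [0, 1, 2, 3, 4]
print(type(from_list).__name__, sorted(from_list.tolist()))  # ndarray [10, 20, 30]
```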


Getting Started

import numpy as np

rng = np.random.default_rng()

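default_rng() creates a Generator, the interface NumPy recommends for new code. Called without arguments it is seeded from the operating system, so results differ on each run; pass a seed when you need reproducible output.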
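To illustrate how seeding affects reproducibility, here is a small sketch. The seed value 12345 is an arbitrary assumption; any integer works.

```python
import numpy as np

# Two generators built from the same seed produce the same permutation
rng_a = np.random.default_rng(12345)
rng_b = np.random.default_rng(12345)

first = rng_a.permutation(10)
second = rng_b.permutation(10)

print(np.array_equal(first, second))  # True
```

Without a seed, two generators would almost always disagree, which is what you want for ordinary randomization but not for tests or repeatable experiments.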


Examples

Example 1: Permute a Range

# Permute numbers 0 through 9
permuted = rng.permutation(10)
print(permuted)

Output: [2 9 1 7 3 6 0 8 5 4] (varies each run)

Example 2: Permute a List or Array

arr = np.array([10, 20, 30, 40])
result = rng.permutation(arr)

print("Original:", arr)
print("Permuted:", result)

Output (the permuted order varies each run):

Original: [10 20 30 40]
Permuted: [20 10 40 30]

The original array remains unchanged.

Example 3: Permute 2D Array Rows

data = np.array([[1, 2], [3, 4], [5, 6]])
shuffled = rng.permutation(data)

print("Shuffled Rows:\n", shuffled)

This shuffles the rows, not individual elements.
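One way to confirm that rows move as whole units is to compare the set of rows before and after. This is a minimal sketch; the set comparison is just an illustrative check, not part of the NumPy API.

```python
import numpy as np

rng = np.random.default_rng()

data = np.array([[1, 2], [3, 4], [5, 6]])
shuffled = rng.permutation(data)

# Each shuffled row is still one of the original rows, only reordered
original_rows = {tuple(row) for row in data.tolist()}
shuffled_rows = {tuple(row) for row in shuffled.tolist()}

print(shuffled_rows == original_rows)  # True
```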


Comparison: permutation() vs shuffle()

Feature                       | permutation() | shuffle()
Returns                       | New array     | None
Modifies in-place             | ❌ No         | ✅ Yes
Suitable for pure functions   | ✅ Yes        | ❌ No
Safer for preserving original | ✅ Yes        | ❌ No

✅ Practical Use Case: Shuffle a Dataset

import numpy as np

rng = np.random.default_rng(seed=42)

# Sample dataset (features and labels)
features = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
labels = np.array([0, 1, 0, 1])

# Permutation indices
indices = rng.permutation(len(features))

# Apply to both arrays
shuffled_features = features[indices]
shuffled_labels = labels[indices]

print("Shuffled Features:\n", shuffled_features)
print("Shuffled Labels:\n", shuffled_labels)

This is a common technique in machine learning preprocessing to ensure features and labels stay in sync after shuffling.
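The same index array can also drive a train/test split. The sketch below reuses the sample data; the 75/25 ratio and the names n_train, train_idx, and test_idx are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

features = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
labels = np.array([0, 1, 0, 1])

indices = rng.permutation(len(features))

# Hypothetical 75/25 split; the ratio is an assumption for this sketch
n_train = int(0.75 * len(features))
train_idx, test_idx = indices[:n_train], indices[n_train:]

# Index both arrays with the same indices so rows and labels stay paired
X_train, y_train = features[train_idx], labels[train_idx]
X_test, y_test = features[test_idx], labels[test_idx]

print(X_train.shape, X_test.shape)  # (3, 2) (1, 2)
```

Because the two index slices do not overlap, no sample ends up in both sets.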


Tips

  1. ✅ Use default_rng() to access the modern permutation() API.

  2. ✅ Use permutation() when you don’t want to modify the original data.

  3. ✅ For in-place shuffling, use rng.shuffle().

  4. ✅ For reproducible shuffles, pass a seed to default_rng().


⚠️ Common Pitfalls

Pitfall | Why it's wrong
❌ Using np.random.permutation() instead of a Generator from default_rng() | Legacy API; not recommended in new code
❌ Assuming permutation() modifies the original array | It returns a new array; the original is unchanged
❌ Applying permutation() directly to multidimensional arrays expecting an element-wise shuffle | By default it permutes only along the first axis (e.g., rows in 2D)
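If an element-wise shuffle of a multidimensional array is what you actually want, one possible approach is to flatten the array, permute it, and restore the shape. This is a sketch of that idea, not the only way to do it; the names grid and scrambled are illustrative.

```python
import numpy as np

rng = np.random.default_rng()

grid = np.array([[1, 2, 3], [4, 5, 6]])

# Flatten to 1D, permute all six values, then restore the original shape
scrambled = rng.permutation(grid.ravel()).reshape(grid.shape)

print(scrambled)  # same six values, scattered across both rows
```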

Bonus: Shuffle Columns Instead of Rows

To shuffle columns in a 2D array (not default behavior):

arr = np.array([[1, 2, 3], [4, 5, 6]])
shuffled_columns = arr[:, rng.permutation(arr.shape[1])]
print(shuffled_columns)
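An alternative sketch uses the axis argument of Generator.permutation(), assuming a NumPy version in which that argument is available. With axis=1 the columns are reordered as whole units, which matches the indexing approach.

```python
import numpy as np

rng = np.random.default_rng()

arr = np.array([[1, 2, 3], [4, 5, 6]])

# axis=1 permutes along the column axis; each column moves as a unit
shuffled_columns = rng.permutation(arr, axis=1)

print(shuffled_columns)
```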

Conclusion

NumPy’s random.permutation() is a non-destructive way to shuffle arrays or generate random orderings. It’s ideal for:

  • Shuffling datasets

  • Generating random orders

  • Maintaining original data integrity

Whether you're preprocessing for machine learning, simulating random sampling, or simply scrambling sequences, permutation() is the tool of choice when you need a non-destructive shuffle.