Filtering is a technique for extracting the elements of a dataset that meet certain conditions. In NumPy, array filtering lets you isolate the data points you care about without writing explicit Python loops, which makes your code shorter, easier to read, and usually much faster.
This article covers:
- What filtering means in NumPy
- How to filter arrays using Boolean indexing
- Combining multiple conditions
- Full working examples
- Tips and common pitfalls
What is Array Filtering?
In NumPy, filtering means selecting the elements of an array that satisfy a condition. It is usually done with Boolean indexing: you index the array with a Boolean array of the same shape (often called a mask), and NumPy returns the elements at the positions where the mask is `True`.
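To show that a mask is just an ordinary Boolean array, here is a minimal sketch that writes one by hand instead of deriving it from a condition (the values are illustrative):

```python
import numpy as np

arr = np.array([3, 7, 1, 9])
mask = np.array([True, False, False, True])  # hand-written mask, same shape as arr

selected = arr[mask]  # keeps positions 0 and 3, where the mask is True
print(selected)  # [3 9]
```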
Creating a Filter in NumPy
You create a filter by applying a comparison to a NumPy array. The comparison is evaluated element by element and returns a Boolean array, which you then use as an index to retrieve the values that meet the condition.
✅ Basic Example:
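```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

filter_arr = arr > 25   # Boolean mask: True where the condition holds
print(filter_arr)
print(arr[filter_arr])  # keep only the elements where the mask is True
```
Output:
```
[False False  True  True  True]
[30 40 50]
```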
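One detail worth knowing: Boolean indexing returns a new array (a copy), not a view, so changing the filtered result leaves the original array untouched. A minimal sketch:

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])
subset = arr[arr > 25]  # Boolean indexing produces a copy

subset[0] = -1          # modifies the copy only
print(subset)  # [-1 40 50]
print(arr)     # [10 20 30 40 50], the original is unchanged
```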
Direct Filtering (One-liner)
You can also write the condition directly inside the brackets instead of storing the mask in a separate variable:
```python
arr = np.array([5, 10, 15, 20])
print(arr[arr >= 10])  # condition written inline as the index
```
Output:
```
[10 15 20]
```
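The same mask can also appear on the left-hand side of an assignment, which modifies only the matching elements in place. A short sketch that caps values at an arbitrarily chosen threshold of 15:

```python
import numpy as np

arr = np.array([5, 10, 15, 20])
arr[arr > 15] = 15  # overwrite every element greater than 15
print(arr)  # [ 5 10 15 15]
```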
Combining Multiple Conditions
You can combine multiple conditions with the element-wise operators `&` (and), `|` (or), and `~` (not).
✅ AND condition (`&`):
```python
arr = np.array([5, 10, 15, 20, 25])
filtered = arr[(arr > 10) & (arr < 25)]  # both conditions must be True
print(filtered)
```
Output:
```
[15 20]
```
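If you prefer named functions over operators, `np.logical_and` computes the same element-wise AND, and because the conditions are passed as separate arguments, the precedence issue does not arise. A minimal sketch:

```python
import numpy as np

arr = np.array([5, 10, 15, 20, 25])
mask = np.logical_and(arr > 10, arr < 25)  # same as (arr > 10) & (arr < 25)
print(arr[mask])  # [15 20]
```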
✅ OR condition (`|`):
```python
filtered = arr[(arr == 10) | (arr == 25)]  # either condition may be True
print(filtered)
```
Output:
```
[10 25]
```
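When an OR condition only checks membership in a set of values, `np.isin` expresses it more compactly than chaining `==` comparisons. A sketch that selects the same elements as the OR example above:

```python
import numpy as np

arr = np.array([5, 10, 15, 20, 25])
mask = np.isin(arr, [10, 25])  # True where the element is 10 or 25
print(arr[mask])  # [10 25]
```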
⚠️ Note: Always wrap each condition in parentheses, because `&` and `|` have higher precedence than comparison operators such as `>` and `==`.
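To see what goes wrong without parentheses, consider `arr > 10 & arr < 25`. Python evaluates `10 & arr` first and then treats the rest as the chained comparison `arr > (10 & arr) < 25`, which needs the truth value of a whole array and raises a `ValueError`. A small sketch contrasting the two forms:

```python
import numpy as np

arr = np.array([5, 10, 15, 20, 25])

error_message = None
try:
    # No parentheses: & binds first, so this parses as arr > (10 & arr) < 25,
    # a chained comparison that needs the truth value of a whole array
    arr[arr > 10 & arr < 25]
except ValueError as err:
    error_message = str(err)
    print("ValueError:", error_message)

# With parentheses, each comparison is evaluated before & combines them
filtered = arr[(arr > 10) & (arr < 25)]
print(filtered)  # [15 20]
```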
Filtering with Custom Functions
You can also build a Boolean mask from a custom Python function by wrapping it with `np.vectorize()`:
```python
def is_even(n):
    return n % 2 == 0

arr = np.array([1, 2, 3, 4, 5, 6])
even_mask = np.vectorize(is_even)(arr)  # apply is_even to every element
print(arr[even_mask])
```
Output:
```
[2 4 6]
```
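Keep in mind that `np.vectorize` is a convenience wrapper: NumPy's documentation describes it as essentially a Python `for` loop, so it does not deliver the speed of true array operations. When the condition can be written with array operators, that form is usually preferable. A sketch comparing the two masks:

```python
import numpy as np

def is_even(n):
    return n % 2 == 0

arr = np.array([1, 2, 3, 4, 5, 6])

loop_mask = np.vectorize(is_even)(arr)  # calls is_even once per element
array_mask = arr % 2 == 0               # one array operation, no Python-level loop

print(np.array_equal(loop_mask, array_mask))  # True
print(arr[array_mask])  # [2 4 6]
```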
Full Working Example
```python
import numpy as np

# Original array
arr = np.array([10, 15, 20, 25, 30, 35, 40])

# Filter values greater than 20
greater_than_20 = arr[arr > 20]
print("Greater than 20:", greater_than_20)

# Filter even numbers
even = arr[arr % 2 == 0]
print("Even numbers:", even)

# Filter numbers between 20 and 35 (inclusive)
between_20_35 = arr[(arr >= 20) & (arr <= 35)]
print("Between 20 and 35:", between_20_35)
```
Output:
```
Greater than 20: [25 30 35 40]
Even numbers: [10 20 30 40]
Between 20 and 35: [20 25 30 35]
```
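A mask is useful beyond extracting values. As a sketch, the same condition can count the matches with `np.count_nonzero` or report their positions with `np.nonzero`:

```python
import numpy as np

arr = np.array([10, 15, 20, 25, 30, 35, 40])
mask = arr > 20

count = np.count_nonzero(mask)   # number of True entries in the mask
(positions,) = np.nonzero(mask)  # indices where the mask is True (1D array -> 1-tuple)
print("Count:", count)           # Count: 4
print("Positions:", positions)   # Positions: [3 4 5 6]
```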
Real-World Use Case: Filtering Missing or Invalid Values
```python
arr = np.array([10, np.nan, 20, 0, np.nan, 30])

# Remove NaNs: ~ inverts the mask returned by np.isnan
cleaned = arr[~np.isnan(arr)]
print("Without NaNs:", cleaned)
```
Output:
```
Without NaNs: [10. 20.  0. 30.]
```
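The array above also contains a `0`. If zero counts as invalid in your data (an assumption that depends on the dataset), you can combine the NaN check with a second condition using `&`:

```python
import numpy as np

arr = np.array([10, np.nan, 20, 0, np.nan, 30])

valid = ~np.isnan(arr) & (arr != 0)  # not NaN AND not zero
cleaned = arr[valid]
print(cleaned)  # [10. 20. 30.]
```

If your data can also contain infinities, `np.isfinite(arr)` is an alternative to `~np.isnan(arr)` that excludes both NaN and infinite values.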
Tips for Efficient Filtering
Tip | Benefit |
---|---|
Use Boolean indexing instead of loops | More concise and readable, and typically much faster |
Use the element-wise operators `&` and `\|` to combine conditions | They combine arrays element by element; Python's `and`/`or` do not |
Always wrap each condition in parentheses | Prevents operator precedence errors |
Use `~` to invert a condition | E.g., `~np.isnan(arr)` selects the non-NaN values |
Common Pitfalls
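Pitfall | How to Avoid |
---|---|
Using `and`/`or` instead of `&`/`\|` | Use `&`, `\|`, and `~` (or `np.logical_and`, `np.logical_or`, `np.logical_not`); `and`/`or` raise a `ValueError` on arrays with more than one element |
Not wrapping conditions in parentheses | Always wrap each comparison: `(arr > 10) & (arr < 50)` |
Using a mask whose shape does not match the array | Build the mask from the same array (or the same axis) you are indexing; a mismatched mask raises an `IndexError` |
Applying a full-shape mask to a multidimensional array and expecting rows back | A full-shape mask returns a flattened 1D result; to select whole rows, index with a 1D mask such as `data[data[:, 0] > 5]` |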
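For the multidimensional pitfall, a small sketch makes the difference concrete: a full-shape mask flattens the result, while a 1D mask over the rows keeps whole rows (the data and the variable name `data` are illustrative):

```python
import numpy as np

data = np.array([[1, 8],
                 [6, 2],
                 [9, 7]])

flat = data[data > 5]        # full-shape mask: returns a 1D array in row-major order
print(flat)                  # [8 6 9 7]

rows = data[data[:, 0] > 5]  # 1D mask built from column 0: selects whole rows
print(rows)                  # [[6 2]
                             #  [9 7]]
```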
Conclusion
Filtering arrays in NumPy is:
- Fast
- Easy to read
- Very powerful
You can use it to clean data, extract features, and perform conditional operations, all without writing explicit loops.
Once you're comfortable with Boolean indexing, NumPy filtering becomes one of your best tools for manipulating and analyzing large datasets.
What’s Next?