Extracting data is a core task in any data analysis workflow. Whether you want to retrieve specific rows, columns, or values based on conditions, Pandas offers powerful tools to extract exactly what you need, quickly and efficiently.
In this article, you'll learn:
- ✅ What extracting data means in Pandas
- ✅ How to extract columns, rows, and individual values
- ✅ Extracting data with conditions
- ✅ Using .loc[], .iloc[], and Boolean indexing
- ✅ Full working examples
- ✅ Tips and common pitfalls
What Is Data Extraction in Pandas?
In Pandas, data extraction refers to retrieving subsets of data from a DataFrame or Series. This includes:
- Columns (features)
- Rows (records)
- Cell values (individual items)
- Subsets based on logical conditions
Step 1: Import Pandas and Create Sample Data
import pandas as pd
# Sample DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [25, 30, 35, 40, 45],
    'Score': [88, 92, 85, 90, 95]
}
df = pd.DataFrame(data)
print(df)
Output:
      Name  Age  Score
0    Alice   25     88
1      Bob   30     92
2  Charlie   35     85
3    David   40     90
4      Eva   45     95
Extracting Columns
Single Column:
df['Name']
Multiple Columns:
df[['Name', 'Score']]
Use a list inside brackets to select multiple columns.
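The bracket style also determines the type of the result. The sketch below rebuilds the sample DataFrame and shows that single brackets return a Series, while a list of labels returns a DataFrame even when the list holds only one column. The variable names are illustrative only.

```python
import pandas as pd

# Same sample data as in Step 1
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [25, 30, 35, 40, 45],
    'Score': [88, 92, 85, 90, 95],
})

name_series = df['Name']             # single label -> Series
name_frame = df[['Name']]            # list with one label -> one-column DataFrame
name_score = df[['Name', 'Score']]   # list with two labels -> two-column DataFrame

print(type(name_series).__name__)
print(type(name_frame).__name__)
print(name_score.shape)
```

Knowing which type you get back matters because Series and DataFrame expose different methods and print differently.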
Extracting Rows
Using Slicing:
df[1:4] # Rows at positions 1, 2, and 3 (the end position is excluded)
Using .loc[] (label-based):
df.loc[0:2] # Includes both 0 and 2
Using .iloc[] (position-based):
df.iloc[1:4] # Includes index positions 1, 2, and 3
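With the default 0, 1, 2, ... index, labels and positions coincide, so the difference between .loc[] and .iloc[] is easy to miss. The sketch below assigns hypothetical index labels (10 to 50, chosen only for illustration) to a copy of the sample data so the two behaviors can be told apart.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [25, 30, 35, 40, 45],
    'Score': [88, 92, 85, 90, 95],
})

# Hypothetical labels so that labels and positions no longer match
relabelled = df.copy()
relabelled.index = [10, 20, 30, 40, 50]

by_label = relabelled.loc[10:30]     # labels 10, 20, 30: the end label is included
by_position = relabelled.iloc[0:2]   # positions 0 and 1: the end position is excluded

print(by_label)
print(by_position)
```

Label slices include both endpoints, while positional slices follow the usual Python half-open convention.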
Extracting Specific Values
Extract a cell value using .loc[]:
df.loc[2, 'Score'] # Score of row with label 2
Using .iloc[]:
df.iloc[2, 2] # Value at row position 2, column position 2 (both counted from 0)
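For single-cell lookups, pandas also provides the scalar accessors .at[] (label-based) and .iat[] (position-based). They are not covered above, so treat the following as a supplementary sketch using the same sample data.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [25, 30, 35, 40, 45],
    'Score': [88, 92, 85, 90, 95],
})

score_by_label = df.at[2, 'Score']    # row label 2, column label 'Score'
score_by_position = df.iat[2, 2]      # third row, third column

# Look up a column's position instead of hard-coding it
score_col_pos = df.columns.get_loc('Score')

print(score_by_label, score_by_position, score_col_pos)
```

Both accessors return the same cell here (Charlie's score), because row label 2 is also the row at position 2 in the default index.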
Extracting Data with Conditions (Boolean Indexing)
Example: Extract rows where Score > 90
df[df['Score'] > 90]
Example: Extract rows where Age is between 30 and 40
df[(df['Age'] >= 30) & (df['Age'] <= 40)]
Example: Extract only the names of students with Score > 90
df[df['Score'] > 90]['Name']
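The same filters can also be written with a single .loc[] call that takes the Boolean mask and the column label together, and with Series.between() for range checks. This is an alternative sketch, not a replacement for the examples above; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [25, 30, 35, 40, 45],
    'Score': [88, 92, 85, 90, 95],
})

# Boolean mask: one True/False value per row
high = df['Score'] > 90

# Row mask and column label in one .loc[] call
top_names = df.loc[high, 'Name']

# between() includes both endpoints by default
mid_age = df[df['Age'].between(30, 40)]

print(top_names.tolist())
print(mid_age['Name'].tolist())
```

Selecting rows and columns in one step avoids the intermediate DataFrame that chained brackets create.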
Full Working Example
import pandas as pd
# Create sample DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [25, 30, 35, 40, 45],
    'Score': [88, 92, 85, 90, 95]
}
df = pd.DataFrame(data)
# Extract a single column
names = df['Name']
# Extract rows 1 to 3
subset_rows = df.iloc[1:4]
# Extract a specific cell value
value = df.loc[2, 'Score']
# Extract rows where Score > 90
high_scores = df[df['Score'] > 90]
# Extract only names of those with Score > 90
top_scorers = df[df['Score'] > 90]['Name']
# Display extracted data
print("Names Column:\n", names)
print("\nRows 1 to 3:\n", subset_rows)
print("\nScore at row 2:", value)
print("\nHigh Scorers:\n", high_scores)
print("\nNames with High Scores:\n", top_scorers)
Tips & Best Practices
- Always use .loc[] for label-based extraction.
- Use .iloc[] when you want to access by position, not labels.
- Chain operations carefully to avoid performance issues and SettingWithCopyWarning.
- Combine Boolean conditions with & (and) and | (or), and wrap each condition in parentheses ().
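The chaining tip matters most when you modify extracted data. A minimal sketch, assuming you want either an independent subset or an in-place update: take an explicit .copy() for the former, and use one .loc[] call for the latter. The 'Passed' column and the value 100 are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [25, 30, 35, 40, 45],
    'Score': [88, 92, 85, 90, 95],
})

# Independent subset: changes here do not affect df
high_scores = df[df['Score'] > 90].copy()
high_scores['Passed'] = True   # hypothetical column, added only to the copy

# In-place update on df: row condition and column in a single .loc[] call
df.loc[df['Age'] >= 40, 'Score'] = 100

print(high_scores)
print(df)
```

Writing the assignment as one .loc[] call makes it unambiguous that the original DataFrame is the target.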
⚠️ Common Pitfalls
Problem | Solution |
---|---|
KeyError for column name | Make sure column names are correct and case-sensitive |
Chained indexing issues | Use .loc[] or .iloc[] to avoid ambiguous assignments |
Forgetting parentheses in multiple conditions | Always wrap each condition in () when using & or \| |
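Two of these pitfalls can be reproduced on the sample data. The sketch below catches the exceptions so the script runs to completion; the flag names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [25, 30, 35, 40, 45],
    'Score': [88, 92, 85, 90, 95],
})

# Pitfall 1: column labels are case-sensitive, so 'name' is not 'Name'
try:
    df['name']
    key_missing = False
except KeyError:
    key_missing = True

# Pitfall 2: & binds tighter than >= and >, so the unparenthesized
# expression is not evaluated as two separate comparisons
try:
    df[df['Age'] >= 30 & df['Score'] > 90]
    precedence_error = False
except ValueError:
    precedence_error = True

# Correct form: each condition wrapped in its own parentheses
correct = df[(df['Age'] >= 30) & (df['Score'] > 90)]

print(key_missing, precedence_error)
print(correct['Name'].tolist())
```

Without the parentheses, Python evaluates `30 & df['Score']` first and then attempts a chained comparison, which is why the expression fails instead of filtering.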
Summary
Pandas makes extracting data simple and efficient, whether you're retrieving a single value, a full column, or rows that meet a condition. Mastering these techniques is crucial for filtering, transforming, and analyzing your data.
Key Takeaways:
- Use direct slicing, .loc[], .iloc[], and Boolean indexing for flexible extraction
- Always confirm the type of data you're working with (row, column, value)
- Combine extraction with filtering for powerful data operations