JSON (JavaScript Object Notation) is a popular format for exchanging data — commonly used in APIs, web services, and configuration files. In Python, working with JSON is simple and efficient using the Pandas library.
In this guide, you’ll learn:
-
✅ What JSON is and why it matters
-
✅ How to read JSON files with
pandas.read_json()
-
✅ Handling nested JSON
-
✅ Common parameters and use cases
-
✅ Full working code examples
-
✅ Tips and common pitfalls
What is JSON?
JSON represents data in a hierarchical format using key-value pairs. For example:
{
"name": "Alice",
"age": 25,
"skills": ["Python", "SQL"]
}
This format is readable by both humans and machines and is widely used in web APIs and configuration files.
Why Use Pandas to Read JSON?
Pandas offers powerful methods to convert JSON directly into a DataFrame — making it easier to analyze, filter, and manipulate structured data.
pandas.read_json()
reads JSON strings or files and returns a DataFrame or Series.
Basic Syntax
import pandas as pd
df = pd.read_json('data.json')
print(df.head())
-
If
data.json
is a local file, Pandas will read it and convert it into a DataFrame. -
If the JSON is properly formatted, you'll get a table with columns and rows ready to work with.
Reading JSON from Different Sources
1️⃣ Reading from a Local File
df = pd.read_json('employees.json')
2️⃣ Reading from a JSON String
import json
json_data = '''
[
{"name": "Alice", "age": 25},
{"name": "Bob", "age": 30}
]
'''
df = pd.read_json(json_data)
print(df)
File Structure Matters
Example 1: JSON Array (Common Case)
[
{"id": 1, "name": "Alice"},
{"id": 2, "name": "Bob"}
]
✅ Works directly with pd.read_json()
.
Example 2: Nested JSON
{
"employees": [
{"name": "Alice", "age": 25},
{"name": "Bob", "age": 30}
]
}
❌ This won’t work directly — use json_normalize()
:
import json
from pandas import json_normalize
with open('nested.json') as f:
data = json.load(f)
df = json_normalize(data['employees'])
print(df)
Common Parameters in read_json()
Parameter | Description |
---|---|
path_or_buf |
File path, URL, or string |
orient |
Format of JSON string (records , split , index , columns , values , table ) |
typ |
Return type: frame (default) or series |
convert_dates |
Try to parse datetimes |
lines |
Set to True for JSON lines format |
encoding |
Encoding for the input (e.g., 'utf-8' ) |
Reading JSON Lines (.jsonl
or .ndjson
)
Some large files are stored as JSON Lines — one record per line.
{"id":1,"name":"Alice"}
{"id":2,"name":"Bob"}
Use:
df = pd.read_json('data.jsonl', lines=True)
Full Working Example
Let’s say you have a JSON file students.json
:
[
{"name": "Alice", "scores": {"math": 90, "english": 85}},
{"name": "Bob", "scores": {"math": 78, "english": 82}}
]
Step-by-step Code
import json
import pandas as pd
from pandas import json_normalize
# Load the JSON data from file
with open('students.json') as file:
data = json.load(file)
# Normalize nested fields
df = json_normalize(data, sep='_')
print(df)
Output
name scores_math scores_english
0 Alice 90 85
1 Bob 78 82
You can now perform any Pandas operations on this DataFrame!
⚠️ Common Pitfalls
Problem | Solution |
---|---|
❌ ValueError: Expected object or value |
File might be empty or invalid JSON |
❌ Nested JSON isn’t flattened | Use json_normalize() to flatten |
❌ JSON lines format not working | Set lines=True in read_json() |
❌ Wrong orientation | Try changing orient='records' or 'index' |
Pro Tips
-
Use
json_normalize()
for nested structures and hierarchical JSON. -
Combine with
requests
to load JSON from web APIs:import requests url = 'https://api.example.com/users' response = requests.get(url) df = pd.json_normalize(response.json())
-
Always inspect your DataFrame with
.info()
and.head()
before doing analysis.
Summary
Reading JSON with Pandas is a fast and efficient way to load and analyze structured data. Whether it's a flat file or nested structure, Pandas provides all the tools you need.
✅ Key Takeaways:
-
Use
pd.read_json()
for flat JSON -
Use
json_normalize()
for nested JSON -
Use
lines=True
for.jsonl
or newline-delimited JSON -
Handle API data or web sources using
requests