Bulk inserting data from CSV files is a common and efficient method to load large datasets into PostgreSQL databases. This guide walks you through importing CSV data using two popular libraries:
- ✅ `psycopg2` (`COPY` and `execute_values`)
- ✅ SQLAlchemy with Pandas
## Table of Contents

- Prerequisites
- Sample CSV Data
- Using `psycopg2` with `COPY`
- Using `psycopg2.extras.execute_values()`
- Using SQLAlchemy + `pandas.to_sql()`
- Full Examples
- Common Pitfalls
## ✅ 1. Prerequisites

Install the required packages:

```bash
pip install psycopg2-binary sqlalchemy pandas
```
Prepare a PostgreSQL database and a table to insert into.
## 2. Sample CSV Data: `students.csv`

```csv
name,age
Alice,22
Bob,24
Carol,23
```
## 3. Method 1 – `psycopg2` + `COPY` (Fastest for Pure Bulk)

```python
import psycopg2

conn = psycopg2.connect(
    dbname="school",
    user="postgres",
    password="your_pass",
    host="localhost",
    port="5432"
)
cur = conn.cursor()

with open('students.csv', 'r') as f:
    next(f)  # skip the header line
    cur.copy_from(f, 'students', sep=',', columns=('name', 'age'))

conn.commit()
cur.close()
conn.close()
```
⚠️ Ensure the table `students(name, age)` already exists and matches the CSV structure.
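Note that `copy_from` does a simple split on the separator, so it can misparse CSV files containing quoted fields or embedded commas. A sketch of a more robust alternative using psycopg2's `copy_expert`, which hands the file to PostgreSQL's own CSV parser (same placeholder connection parameters as above):

```python
import psycopg2

conn = psycopg2.connect(
    dbname="school",
    user="postgres",
    password="your_pass",
    host="localhost",
    port="5432"
)
cur = conn.cursor()

with open('students.csv', 'r') as f:
    # HEADER tells PostgreSQL to skip the first line itself,
    # and CSV mode handles quoting and embedded commas.
    cur.copy_expert(
        "COPY students (name, age) FROM STDIN WITH CSV HEADER",
        f
    )

conn.commit()
cur.close()
conn.close()
```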
## ⚙️ 4. Method 2 – `psycopg2.extras.execute_values()` (Flexible)

This approach reads the CSV into memory and inserts the rows in batches:
```python
import csv
import psycopg2
from psycopg2.extras import execute_values

conn = psycopg2.connect(
    dbname="school",
    user="postgres",
    password="your_pass",
    host="localhost",
    port="5432"
)
cur = conn.cursor()

with open('students.csv', 'r') as f:
    reader = csv.DictReader(f)
    data = [(row['name'], int(row['age'])) for row in reader]

execute_values(
    cur,
    "INSERT INTO students (name, age) VALUES %s",
    data
)

conn.commit()
cur.close()
conn.close()
```
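One knob worth knowing: `execute_values` groups rows into multi-row `INSERT` statements, and its `page_size` argument (default 100) controls how many rows go into each statement. A sketch of the same call with a larger batch, reusing the `cur` and `data` from the example above:

```python
from psycopg2.extras import execute_values

# Assumes the `cur` and `data` built in the example above.
# A larger page_size means fewer statements and round trips;
# tune it to your row width and memory budget.
execute_values(
    cur,
    "INSERT INTO students (name, age) VALUES %s",
    data,
    page_size=1000
)
```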
## 5. Method 3 – Using SQLAlchemy and Pandas
```python
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('postgresql://postgres:your_pass@localhost/school')

df = pd.read_csv('students.csv')
df.to_sql('students', engine, if_exists='append', index=False)
```
✅ This method is very readable and integrates cleanly with Pandas-based data processing.
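On large frames, `to_sql`'s defaults can be slow because rows are sent one `INSERT` at a time. Two parameters worth experimenting with are `chunksize` (rows per batch) and `method='multi'` (many rows per `INSERT` statement); a sketch under the same placeholder credentials:

```python
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('postgresql://postgres:your_pass@localhost/school')

df = pd.read_csv('students.csv')

# chunksize limits how many rows are written per batch;
# method='multi' packs many rows into each INSERT statement.
df.to_sql('students', engine, if_exists='append', index=False,
          chunksize=1000, method='multi')
```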
## 6. Full Working Example with `COPY`
```python
import psycopg2

conn = psycopg2.connect(
    dbname="school",
    user="postgres",
    password="your_pass",
    host="localhost",
    port="5432"
)
cur = conn.cursor()

# Ensure the table exists:
cur.execute("""
    CREATE TABLE IF NOT EXISTS students (
        id SERIAL PRIMARY KEY,
        name TEXT,
        age INTEGER
    )
""")

with open('students.csv', 'r') as f:
    next(f)  # skip the header
    cur.copy_from(f, 'students', sep=',', columns=('name', 'age'))

conn.commit()
cur.close()
conn.close()
```
## ⚠️ 7. Common Pitfalls

| Problem | Solution |
|---|---|
| CSV has a header row | Skip it with `next(f)` before `copy_from` |
| Incorrect column types | Validate CSV data before inserting |
| Encoding issues (e.g., UTF-8) | Use `open(file, encoding='utf-8')` if needed |
| Mismatched column names | Ensure CSV headers match the DB column names |
| `to_sql` doesn't auto-create a primary key | Use a SQLAlchemy model for full control (if needed) |
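To guard against the type problems above before anything touches the database, it can help to validate rows in Python first. A minimal sketch (`load_rows` is a hypothetical helper written for this guide, not part of any library):

```python
import csv

def load_rows(path):
    """Read the CSV, skipping any row whose age isn't an integer."""
    rows = []
    with open(path, 'r', encoding='utf-8') as f:
        for row in csv.DictReader(f):
            try:
                rows.append((row['name'], int(row['age'])))
            except (KeyError, TypeError, ValueError):
                # Malformed or incomplete row: report it and move on.
                print(f"Skipping bad row: {row}")
    return rows

data = load_rows('students.csv')  # ready for execute_values()
```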
## Conclusion

For large data loads:

- Use `COPY` for the best performance
- Use `execute_values()` for more control
- Use `pandas.to_sql()` for quick integration with data analysis