Python PostgreSQL – Bulk Insert from CSV File


Bulk inserting data from a CSV file is a common and efficient way to load large datasets into PostgreSQL. This guide walks through three methods using two popular libraries:

  • psycopg2 (COPY and execute_values)

  • SQLAlchemy with Pandas


Table of Contents

  1. Prerequisites

  2. Sample CSV Data

  3. Using psycopg2 with COPY

  4. Using psycopg2.extras.execute_values()

  5. Using SQLAlchemy + pandas.to_sql()

  6. Full Working Example

  7. Common Pitfalls


✅ 1. Prerequisites

Install required packages:

pip install psycopg2-binary sqlalchemy pandas

Prepare a PostgreSQL database and a table to insert into.


2. Sample CSV Data: students.csv

name,age
Alice,22
Bob,24
Carol,23

3. Method 1 – psycopg2 + COPY (Fastest for Pure Bulk Loads)

import psycopg2

conn = psycopg2.connect(
    dbname="school",
    user="postgres",
    password="your_pass",
    host="localhost",
    port="5432"
)
cur = conn.cursor()

with open('students.csv', 'r') as f:
    # Skip header line with next(f)
    next(f)
    cur.copy_from(f, 'students', sep=',', columns=('name', 'age'))

conn.commit()
cur.close()
conn.close()

⚠️ Ensure the table students(name, age) already exists and matches the CSV structure.
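Note that copy_from splits each line on the separator only; it does not understand CSV quoting, so fields containing embedded commas or quotes will break it. If your file uses quoting, psycopg2's copy_expert with PostgreSQL's CSV mode parses quoting and skips the header natively (a sketch, reusing the connection and cursor from above):

with open('students.csv', 'r') as f:
    # COPY ... FROM STDIN in CSV mode handles quoted fields and the header row
    cur.copy_expert(
        "COPY students (name, age) FROM STDIN WITH (FORMAT csv, HEADER true)",
        f
    )

conn.commit()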


⚙️ 4. Method 2 – psycopg2.extras.execute_values() (Flexible)

This approach reads the entire CSV into memory and inserts all rows in one batched statement:

import csv
import psycopg2
from psycopg2.extras import execute_values

conn = psycopg2.connect(
    dbname="school",
    user="postgres",
    password="your_pass",
    host="localhost",
    port="5432"
)
cur = conn.cursor()

with open('students.csv', 'r') as f:
    reader = csv.DictReader(f)
    data = [(row['name'], int(row['age'])) for row in reader]

execute_values(cur,
    "INSERT INTO students (name, age) VALUES %s",
    data
)

conn.commit()
cur.close()
conn.close()
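
Building one giant list can exhaust memory on very large files. execute_values() also accepts a page_size argument (default 100) that controls how many rows each generated INSERT statement carries, so the load can be chunked (a sketch, reusing the conn and cur from above; BATCH_SIZE is an arbitrary choice):

from itertools import islice

BATCH_SIZE = 1000

with open('students.csv', 'r') as f:
    reader = csv.DictReader(f)
    while True:
        # Read at most BATCH_SIZE rows at a time instead of the whole file
        batch = [(row['name'], int(row['age']))
                 for row in islice(reader, BATCH_SIZE)]
        if not batch:
            break
        execute_values(cur,
            "INSERT INTO students (name, age) VALUES %s",
            batch,
            page_size=BATCH_SIZE
        )

conn.commit()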

5. Method 3 – Using SQLAlchemy and Pandas

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('postgresql://postgres:your_pass@localhost/school')

df = pd.read_csv('students.csv')
df.to_sql('students', engine, if_exists='append', index=False)

✅ This method is very readable and integrates well with pandas-based data processing, though it is slower than COPY for large files.
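
For larger DataFrames, to_sql's chunksize and method parameters batch the inserts instead of sending one statement per row (a sketch; tune chunksize to your data):

df.to_sql(
    'students',
    engine,
    if_exists='append',
    index=False,
    chunksize=1000,   # write rows in batches of 1,000
    method='multi'    # one multi-row INSERT per batch instead of one per row
)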


6. Full Working Example with COPY

import psycopg2

conn = psycopg2.connect(
    dbname="school",
    user="postgres",
    password="your_pass",
    host="localhost",
    port="5432"
)

cur = conn.cursor()

# Ensure table exists:
cur.execute("""
CREATE TABLE IF NOT EXISTS students (
    id SERIAL PRIMARY KEY,
    name TEXT,
    age INTEGER
)
""")

with open('students.csv', 'r') as f:
    next(f)  # Skip the header
    cur.copy_from(f, 'students', sep=',', columns=('name', 'age'))

conn.commit()
cur.close()
conn.close()
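
In production it is worth guarding the load with a transaction so a bad file rolls back cleanly. A psycopg2 connection used as a context manager commits on success and rolls back on error (a minimal sketch of the same COPY load):

conn = psycopg2.connect(
    dbname="school",
    user="postgres",
    password="your_pass",
    host="localhost",
    port="5432"
)

try:
    with conn, conn.cursor() as cur:
        with open('students.csv', 'r') as f:
            next(f)  # Skip the header
            cur.copy_from(f, 'students', sep=',', columns=('name', 'age'))
except psycopg2.Error as e:
    # The with block has already rolled the transaction back at this point
    print(f"Bulk insert failed: {e}")
finally:
    conn.close()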

⚠️ 7. Common Pitfalls

Problem                            Solution
CSV has a header row               Skip it with next(f) before calling copy_from
Incorrect column types             Validate and convert CSV values before inserting
Encoding issues (non-UTF-8 data)   Open the file with an explicit encoding argument
Mismatched column names            Make sure CSV headers match the DB column names
to_sql doesn't auto-create a PK    Define the table (or a SQLAlchemy model) yourself for full control
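
For the encoding pitfall in particular, pass the file's encoding explicitly when opening it (here assuming UTF-8):

with open('students.csv', 'r', encoding='utf-8') as f:
    next(f)  # Skip the header
    cur.copy_from(f, 'students', sep=',', columns=('name', 'age'))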

Conclusion

For large data loads:

  • Use COPY for the best raw performance

  • Use execute_values() when you need per-row transformation or more control

  • Use pandas.to_sql() for quick integration with data-analysis workflows