Benchmarking Bulk Inserts in PostgreSQL with Python

Benchmarking insert performance helps you choose the right strategy for loading large volumes of data into PostgreSQL. Let’s walk through how to benchmark different bulk insert methods in Python, with timing comparisons between:

  • psycopg2.executemany()

  • psycopg2.extras.execute_values()

  • psycopg2.copy_from()

  • SQLAlchemy + pandas.to_sql()


Goal:

Measure and compare the time it takes to insert 10,000 rows with each of the four methods above.


⚙️ Setup

Install dependencies:

pip install psycopg2-binary sqlalchemy pandas
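
The examples below also assume a students table with name and age columns in a database called school. If you need to create one, here is a minimal sketch (the id column and the connection credentials are assumptions; adjust them to your setup):

import psycopg2

conn = psycopg2.connect(dbname="school", user="postgres", password="your_pass", host="localhost")
cur = conn.cursor()
# Minimal assumed schema: any table with name/age columns will do
cur.execute("""
    CREATE TABLE IF NOT EXISTS students (
        id   SERIAL PRIMARY KEY,
        name TEXT NOT NULL,
        age  INTEGER
    )
""")
conn.commit()
conn.close()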

Sample Data Generator:

import random

def generate_data(n):
    # Build n (name, age) rows with random ages between 18 and 30
    return [(f"Name{i}", random.randint(18, 30)) for i in range(n)]

⏱️ Benchmark Helper

import time

def benchmark(func, label, *args):
    start = time.perf_counter()  # monotonic clock, better suited to timing than time.time()
    func(*args)
    elapsed = time.perf_counter() - start
    print(f"{label:<35}: {elapsed:.4f} seconds")

Method 1: executemany() (Slowest)

def insert_executemany(data):
    import psycopg2
    conn = psycopg2.connect(dbname="school", user="postgres", password="your_pass", host="localhost")
    cur = conn.cursor()
    cur.execute("DELETE FROM students")  # clean slate
    # executemany() issues one INSERT per row, paying a network round trip each time
    cur.executemany("INSERT INTO students (name, age) VALUES (%s, %s)", data)
    conn.commit()
    conn.close()
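
If your code is committed to the executemany() style, psycopg2.extras.execute_batch() is a drop-in middle ground: it groups many statements into each network round trip. A sketch, reusing the same connection settings (the page_size of 1000 is just a starting value to tune):

def insert_execute_batch(data):
    import psycopg2
    from psycopg2.extras import execute_batch
    conn = psycopg2.connect(dbname="school", user="postgres", password="your_pass", host="localhost")
    cur = conn.cursor()
    cur.execute("DELETE FROM students")
    # Sends statements in batches of page_size per round trip
    execute_batch(cur, "INSERT INTO students (name, age) VALUES (%s, %s)", data, page_size=1000)
    conn.commit()
    conn.close()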

Method 2: execute_values() (Faster)

def insert_execute_values(data):
    import psycopg2
    from psycopg2.extras import execute_values
    conn = psycopg2.connect(dbname="school", user="postgres", password="your_pass", host="localhost")
    cur = conn.cursor()
    cur.execute("DELETE FROM students")
    # Expands the single %s placeholder into one multi-row VALUES list
    execute_values(cur, "INSERT INTO students (name, age) VALUES %s", data)
    conn.commit()
    conn.close()
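
execute_values() also takes a page_size argument (default 100) that controls how many rows go into each generated statement; raising it can squeeze out more speed on large batches. For example (1000 is an arbitrary value to experiment with):

execute_values(cur, "INSERT INTO students (name, age) VALUES %s", data, page_size=1000)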

Method 3: copy_from() (Fastest for clean data)

def insert_copy_from(data):
    import os
    import psycopg2
    conn = psycopg2.connect(dbname="school", user="postgres", password="your_pass", host="localhost")
    cur = conn.cursor()
    cur.execute("DELETE FROM students")

    # COPY streams from a file-like object, so write the rows out as CSV first
    with open("temp_data.csv", "w") as f:
        for row in data:
            f.write(f"{row[0]},{row[1]}\n")

    with open("temp_data.csv", "r") as f:
        cur.copy_from(f, "students", sep=",", columns=("name", "age"))
    conn.commit()
    conn.close()
    os.remove("temp_data.csv")  # remove the temp file once loaded
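
The trip through the filesystem is optional: copy_from() accepts any file-like object, so an in-memory buffer works too. A sketch of that variant:

import io

def insert_copy_from_buffer(data):
    import psycopg2
    conn = psycopg2.connect(dbname="school", user="postgres", password="your_pass", host="localhost")
    cur = conn.cursor()
    cur.execute("DELETE FROM students")
    # Build the CSV in memory and hand the buffer straight to COPY
    buf = io.StringIO("".join(f"{name},{age}\n" for name, age in data))
    cur.copy_from(buf, "students", sep=",", columns=("name", "age"))
    conn.commit()
    conn.close()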

Method 4: pandas.to_sql() (Good for DataFrames)

def insert_pandas_to_sql(data):
    import pandas as pd
    from sqlalchemy import create_engine, text

    engine = create_engine("postgresql://postgres:your_pass@localhost/school")
    df = pd.DataFrame(data, columns=["name", "age"])
    with engine.begin() as conn:
        # SQLAlchemy 2.x requires raw SQL strings to be wrapped in text()
        conn.execute(text("DELETE FROM students"))
        df.to_sql("students", con=conn, if_exists="append", index=False)
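
By default, to_sql() emits one INSERT per row. pandas also supports method="multi" (multi-row INSERT statements) and chunksize to batch the work, which is worth trying before falling back to COPY (1000 is just a starting point to tune):

df.to_sql("students", con=conn, if_exists="append", index=False,
          method="multi", chunksize=1000)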

Run the Benchmark

if __name__ == "__main__":
    data = generate_data(10000)

    benchmark(insert_executemany, "executemany()", data)
    benchmark(insert_execute_values, "execute_values()", data)
    benchmark(insert_copy_from, "copy_from()", data)
    benchmark(insert_pandas_to_sql, "pandas.to_sql()", data)

Example Output (On Average)

Method              Time (10k rows)
executemany()       6.5s
execute_values()    0.8s
copy_from()         0.4s
pandas.to_sql()     1.0s

Your actual results may vary depending on system specs and PostgreSQL tuning.


✅ Summary

Method            Best For           Speed       Notes
executemany()     Simplicity         Slow        Avoid for large datasets
execute_values()  Balanced choice    Fast        Flexible and fast
copy_from()       CSV-style data     ⚡ Fastest   Needs CSV-style file-like input, no validation
pandas.to_sql()   Pandas workflows   ✅ Good      Easy for DataFrames