Benchmarking Bulk Inserts in PostgreSQL with Python

Benchmarking insert performance helps you choose the right strategy for loading large volumes of data into PostgreSQL. Let’s walk through how to benchmark different bulk insert methods in Python, with timing comparisons between:

  • psycopg2.executemany()

  • psycopg2.extras.execute_values()

  • psycopg2.copy_from()

  • SQLAlchemy + pandas.to_sql()


Goal:

Measure and compare the time it takes to insert 10,000 rows with each of the four methods above.


⚙️ Setup

Install dependencies:

pip install psycopg2-binary sqlalchemy pandas
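
The examples below also assume a students table with name and age columns in a database called school. If you need to create one, here is a minimal sketch (the id column and the connection credentials are assumptions; adjust them to your setup):

import psycopg2

conn = psycopg2.connect(dbname="school", user="postgres", password="your_pass", host="localhost")
cur = conn.cursor()
# Minimal assumed schema: any table with name/age columns will do
cur.execute("""
    CREATE TABLE IF NOT EXISTS students (
        id   SERIAL PRIMARY KEY,
        name TEXT NOT NULL,
        age  INTEGER
    )
""")
conn.commit()
conn.close()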

Sample Data Generator:

import random

def generate_data(n):
    # Build n (name, age) rows with random ages between 18 and 30
    return [(f"Name{i}", random.randint(18, 30)) for i in range(n)]

⏱️ Benchmark Helper

import time

def benchmark(func, label, *args):
    start = time.perf_counter()  # monotonic clock, better suited to timing than time.time()
    func(*args)
    elapsed = time.perf_counter() - start
    print(f"{label:<35}: {elapsed:.4f} seconds")

Method 1: executemany() (Slowest)

def insert_executemany(data):
    import psycopg2
    conn = psycopg2.connect(dbname="school", user="postgres", password="your_pass", host="localhost")
    cur = conn.cursor()
    cur.execute("DELETE FROM students")  # clean slate
    # executemany() issues one INSERT per row, paying a network round trip each time
    cur.executemany("INSERT INTO students (name, age) VALUES (%s, %s)", data)
    conn.commit()
    conn.close()
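
If your code is committed to the executemany() style, psycopg2.extras.execute_batch() is a drop-in middle ground: it groups many statements into each network round trip. A sketch, reusing the same connection settings (the page_size of 1000 is just a starting value to tune):

def insert_execute_batch(data):
    import psycopg2
    from psycopg2.extras import execute_batch
    conn = psycopg2.connect(dbname="school", user="postgres", password="your_pass", host="localhost")
    cur = conn.cursor()
    cur.execute("DELETE FROM students")
    # Sends statements in batches of page_size per round trip
    execute_batch(cur, "INSERT INTO students (name, age) VALUES (%s, %s)", data, page_size=1000)
    conn.commit()
    conn.close()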

Method 2: execute_values() (Faster)

def insert_execute_values(data):
    import psycopg2
    from psycopg2.extras import execute_values
    conn = psycopg2.connect(dbname="school", user="postgres", password="your_pass", host="localhost")
    cur = conn.cursor()
    cur.execute("DELETE FROM students")
    # Expands the single %s placeholder into one multi-row VALUES list
    execute_values(cur, "INSERT INTO students (name, age) VALUES %s", data)
    conn.commit()
    conn.close()
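
execute_values() also takes a page_size argument (default 100) that controls how many rows go into each generated statement; raising it can squeeze out more speed on large batches. For example (1000 is an arbitrary value to experiment with):

execute_values(cur, "INSERT INTO students (name, age) VALUES %s", data, page_size=1000)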

Method 3: copy_from() (Fastest for clean data)

def insert_copy_from(data):
    import os
    import psycopg2
    conn = psycopg2.connect(dbname="school", user="postgres", password="your_pass", host="localhost")
    cur = conn.cursor()
    cur.execute("DELETE FROM students")

    # COPY streams from a file-like object, so write the rows out as CSV first
    with open("temp_data.csv", "w") as f:
        for row in data:
            f.write(f"{row[0]},{row[1]}\n")

    with open("temp_data.csv", "r") as f:
        cur.copy_from(f, "students", sep=",", columns=("name", "age"))
    conn.commit()
    conn.close()
    os.remove("temp_data.csv")  # remove the temp file once loaded
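
The trip through the filesystem is optional: copy_from() accepts any file-like object, so an in-memory buffer works too. A sketch of that variant:

import io

def insert_copy_from_buffer(data):
    import psycopg2
    conn = psycopg2.connect(dbname="school", user="postgres", password="your_pass", host="localhost")
    cur = conn.cursor()
    cur.execute("DELETE FROM students")
    # Build the CSV in memory and hand the buffer straight to COPY
    buf = io.StringIO("".join(f"{name},{age}\n" for name, age in data))
    cur.copy_from(buf, "students", sep=",", columns=("name", "age"))
    conn.commit()
    conn.close()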

Method 4: pandas.to_sql() (Good for DataFrames)

def insert_pandas_to_sql(data):
    import pandas as pd
    from sqlalchemy import create_engine, text

    engine = create_engine("postgresql://postgres:your_pass@localhost/school")
    df = pd.DataFrame(data, columns=["name", "age"])
    with engine.begin() as conn:
        # SQLAlchemy 2.x requires raw SQL strings to be wrapped in text()
        conn.execute(text("DELETE FROM students"))
        df.to_sql("students", con=conn, if_exists="append", index=False)
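
By default, to_sql() emits one INSERT per row. pandas also supports method="multi" (multi-row INSERT statements) and chunksize to batch the work, which is worth trying before falling back to COPY (1000 is just a starting point to tune):

df.to_sql("students", con=conn, if_exists="append", index=False,
          method="multi", chunksize=1000)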

Run the Benchmark

if __name__ == "__main__":
    data = generate_data(10000)

    benchmark(insert_executemany, "executemany()", data)
    benchmark(insert_execute_values, "execute_values()", data)
    benchmark(insert_copy_from, "copy_from()", data)
    benchmark(insert_pandas_to_sql, "pandas.to_sql()", data)

Example Output (On Average)

Method              Time (10k rows)
executemany()       6.5s
execute_values()    0.8s
copy_from()         0.4s
pandas.to_sql()     1.0s

Your actual results may vary depending on system specs and PostgreSQL tuning.


✅ Summary

Method            Best For           Speed       Notes
executemany()     Simplicity         Slow        Avoid for large datasets
execute_values()  Balanced choice    Fast        Flexible and fast
copy_from()       CSV-style data     ⚡ Fastest   Needs CSV-style file-like input, no validation
pandas.to_sql()   Pandas workflows   ✅ Good      Easy for DataFrames