Benchmarking insert performance helps you choose the best strategy for loading large volumes of data into PostgreSQL. Let’s walk through how to benchmark different bulk insert methods in Python, with timing comparisons between:
- psycopg2.executemany()
- psycopg2.extras.execute_values()
- psycopg2.copy_from()
- SQLAlchemy + pandas.to_sql()
Benchmarking Bulk Inserts in PostgreSQL with Python
Goal: Measure and compare the time it takes to insert 10,000 rows using each method.
⚙️ Setup
Install dependencies:
```bash
pip install psycopg2-binary sqlalchemy pandas
```
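The benchmarks below assume a students table with name and age columns in a database called school. If you don’t have one yet, a minimal setup sketch might look like this (the connection parameters are placeholders, and the id column is an assumption, not something the benchmarks require):

```python
import psycopg2

# Placeholder connection parameters -- adjust dbname, user, password, and host for your setup.
conn = psycopg2.connect(dbname="school", user="postgres", password="your_pass", host="localhost")
cur = conn.cursor()

# Minimal table assumed by the examples below: name and age, plus an assumed surrogate key.
cur.execute("""
    CREATE TABLE IF NOT EXISTS students (
        id SERIAL PRIMARY KEY,
        name TEXT NOT NULL,
        age INT NOT NULL
    )
""")
conn.commit()
conn.close()
```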
Sample Data Generator:
```python
import random

def generate_data(n):
    return [(f"Name{i}", random.randint(18, 30)) for i in range(n)]
```
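For example, generate_data(3) returns something like this (the ages are random):

```python
>>> generate_data(3)
[('Name0', 22), ('Name1', 29), ('Name2', 18)]
```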
⏱️ Benchmark Helper
```python
import time

def benchmark(func, label, *args):
    start = time.time()
    func(*args)
    end = time.time()
    print(f"{label:<35}: {end - start:.4f} seconds")
```
Method 1: executemany() (Slowest)

```python
def insert_executemany(data):
    import psycopg2
    conn = psycopg2.connect(dbname="school", user="postgres", password="your_pass", host="localhost")
    cur = conn.cursor()
    cur.execute("DELETE FROM students")  # clean slate
    # executemany runs one INSERT per row, so every tuple pays its own round trip
    cur.executemany("INSERT INTO students (name, age) VALUES (%s, %s)", data)
    conn.commit()
    conn.close()
```
Method 2: execute_values() (Faster)

```python
def insert_execute_values(data):
    import psycopg2
    from psycopg2.extras import execute_values
    conn = psycopg2.connect(dbname="school", user="postgres", password="your_pass", host="localhost")
    cur = conn.cursor()
    cur.execute("DELETE FROM students")  # clean slate
    # execute_values batches rows into multi-row INSERT statements
    execute_values(cur, "INSERT INTO students (name, age) VALUES %s", data)
    conn.commit()
    conn.close()
```
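execute_values groups rows into pages of 100 by default; its page_size argument controls how many rows go into each multi-row INSERT, which you can tune for large loads. A quick sketch (the value 1000 is just an example, not a recommendation):

```python
# Larger pages mean fewer statements; tune page_size for your row width and memory budget.
execute_values(
    cur,
    "INSERT INTO students (name, age) VALUES %s",
    data,
    page_size=1000,  # default is 100
)
```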
Method 3: copy_from() (Fastest for clean data)

```python
def insert_copy_from(data):
    import psycopg2
    conn = psycopg2.connect(dbname="school", user="postgres", password="your_pass", host="localhost")
    cur = conn.cursor()
    cur.execute("DELETE FROM students")  # clean slate
    # Write the rows to a temporary CSV file...
    with open("temp_data.csv", "w") as f:
        for row in data:
            f.write(f"{row[0]},{row[1]}\n")
    # ...then stream it into the table with COPY
    with open("temp_data.csv", "r") as f:
        cur.copy_from(f, "students", sep=",", columns=("name", "age"))
    conn.commit()
    conn.close()
```
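If you’d rather skip the temporary file, copy_from accepts any file-like object, so an in-memory buffer works too. A sketch of that variant, using the same assumed connection and table:

```python
import io
import psycopg2

def insert_copy_from_buffer(data):
    conn = psycopg2.connect(dbname="school", user="postgres", password="your_pass", host="localhost")
    cur = conn.cursor()
    cur.execute("DELETE FROM students")  # clean slate
    # Build the CSV payload in memory instead of on disk
    buffer = io.StringIO()
    for name, age in data:
        buffer.write(f"{name},{age}\n")
    buffer.seek(0)  # rewind so copy_from reads from the start
    cur.copy_from(buffer, "students", sep=",", columns=("name", "age"))
    conn.commit()
    conn.close()
```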
Method 4: pandas.to_sql() (Good for DataFrames)

```python
def insert_pandas_to_sql(data):
    import pandas as pd
    from sqlalchemy import create_engine, text
    engine = create_engine("postgresql://postgres:your_pass@localhost/school")
    df = pd.DataFrame(data, columns=["name", "age"])
    with engine.begin() as conn:
        # SQLAlchemy 1.4+/2.x requires raw SQL strings to be wrapped in text()
        conn.execute(text("DELETE FROM students"))
        df.to_sql("students", con=conn, if_exists="append", index=False)
```
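By default, to_sql hands rows to the driver one statement at a time. Passing method="multi" together with a chunksize (both standard pandas parameters) usually narrows the gap with execute_values. A variant sketch, with 1000 as an example chunk size:

```python
import pandas as pd
from sqlalchemy import create_engine, text

def insert_pandas_to_sql_multi(data):
    engine = create_engine("postgresql://postgres:your_pass@localhost/school")
    df = pd.DataFrame(data, columns=["name", "age"])
    with engine.begin() as conn:
        conn.execute(text("DELETE FROM students"))
        # method="multi" packs many rows into each INSERT statement;
        # chunksize caps the rows per statement (1000 here is just an example).
        df.to_sql("students", con=conn, if_exists="append", index=False,
                  method="multi", chunksize=1000)
```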
Run the Benchmark
```python
if __name__ == "__main__":
    data = generate_data(10000)
    benchmark(insert_executemany, "executemany()", data)
    benchmark(insert_execute_values, "execute_values()", data)
    benchmark(insert_copy_from, "copy_from()", data)
    benchmark(insert_pandas_to_sql, "pandas.to_sql()", data)
```
Example Output (On Average)
| Method | Time (10k rows) |
|---|---|
| executemany() | 6.5s |
| execute_values() | 0.8s |
| copy_from() | 0.4s |
| pandas.to_sql() | 1.0s |
Your actual results may vary depending on system specs and PostgreSQL tuning.
✅ Summary
| Method | Best For | Speed | Notes |
|---|---|---|---|
| executemany() | Simplicity | Slow | Avoid for large datasets |
| execute_values() | Balanced choice | Fast | Flexible and fast |
| copy_from() | CSV-style data | ⚡ Fastest | Requires file I/O, no validation |
| pandas.to_sql() | Pandas workflows | ✅ Good | Easy for DataFrames |