Once your BigQuery table is created, the next step is to insert data. Using Python and the BigQuery client library, you can insert rows programmatically—perfect for ETL pipelines, data automation, or ingestion workflows.
In this article, you’ll learn:
- Different methods to insert data
- How to insert single or multiple rows
- Best practices for batch inserts
- Full code example
- Tips and common pitfalls
✅ Prerequisites
Before starting:
- Google Cloud project with billing enabled
- BigQuery API enabled
- Python installed
- Service account with the BigQuery Data Editor or Admin role
Install Required Library
pip install google-cloud-bigquery
Step 1: Authenticate and Initialize Client
import os
from google.cloud import bigquery
# Authenticate using your service account key
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "path/to/your-key.json"
# Initialize the BigQuery client
client = bigquery.Client()
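If you'd rather not set an environment variable, you can pass the service account key to the client explicitly. This is an alternative sketch using google-auth's service_account module; the key path and project are placeholders:
from google.cloud import bigquery
from google.oauth2 import service_account

# Load the service account key directly (the path is a placeholder)
credentials = service_account.Credentials.from_service_account_file("path/to/your-key.json")

# Pass credentials and project explicitly instead of relying on the environment variable
client = bigquery.Client(credentials=credentials, project=credentials.project_id)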
Step 2: Define Table ID
Use the format: project_id.dataset_id.table_id
table_id = "your-project-id.my_dataset.customers"
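Optionally, you can confirm the table exists and inspect its schema before inserting. A small sketch, continuing with the client and table_id defined above; client.get_table() raises NotFound if the table does not exist:
# Fetch the table's metadata; raises google.api_core.exceptions.NotFound if the table is missing
table = client.get_table(table_id)
print(f"Table {table.full_table_id} has {len(table.schema)} columns.")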
Step 3: Insert Data Rows
Method: insert_rows_json()
This method takes a list of JSON-serializable Python dictionaries (one per row) and streams them into the table.
Example: Insert Multiple Rows
rows_to_insert = [
    {"id": 1, "name": "Alice", "email": "[email protected]", "signup_date": "2023-01-01"},
    {"id": 2, "name": "Bob", "email": "[email protected]", "signup_date": "2023-02-15"},
]
Insert the Rows
errors = client.insert_rows_json(table_id, rows_to_insert)

if errors == []:
    print("✅ Data inserted successfully.")
else:
    print("❌ Errors occurred while inserting rows:")
    print(errors)
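If any rows are rejected, insert_rows_json() returns one entry per failed row: each entry is a mapping whose "index" key gives the row's position in the submitted list and whose "errors" key holds the error details. A minimal sketch of per-row error logging, continuing from the errors list above:
for entry in errors:
    # "index" points back to the failing row in rows_to_insert
    row_index = entry["index"]
    for err in entry["errors"]:
        print(f"Row {row_index} failed: {err.get('reason')}: {err.get('message')}")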
Full Example: Insert Data into BigQuery Table
import os
from google.cloud import bigquery
# Set credentials
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "path/to/your-key.json"
# Initialize client
client = bigquery.Client()
# Define table ID
table_id = "your-project-id.my_dataset.customers"
# Define rows to insert
rows_to_insert = [
    {"id": 1, "name": "Alice", "email": "[email protected]", "signup_date": "2023-01-01"},
    {"id": 2, "name": "Bob", "email": "[email protected]", "signup_date": "2023-02-15"},
    {"id": 3, "name": "Charlie", "email": "[email protected]", "signup_date": "2023-03-10"},
]
# Insert data
errors = client.insert_rows_json(table_id, rows_to_insert)
# Check for errors
if errors == []:
    print("✅ Data inserted successfully.")
else:
    print("❌ Errors occurred:")
    for error in errors:
        print(error)
⚙️ Alternative: Load Data from a CSV/JSON File
You can also load data from a local file by running a load job with load_table_from_file().
from google.cloud import bigquery
# Configure load job
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,
)

with open("customers.csv", "rb") as source_file:
    load_job = client.load_table_from_file(source_file, table_id, job_config=job_config)

load_job.result()  # Wait for the job to complete
print("✅ Loaded data from CSV file.")
Tips for Inserting Data into BigQuery
| Tip | Why It Matters |
|---|---|
| Use insert_rows_json() for small batches | Best for real-time inserts |
| Use load_table_from_file() for bulk inserts | More efficient for large datasets |
| Validate data against the schema | Data must match the table schema exactly |
| Use autodetect=True only when loading files | BigQuery will infer types |
| Handle and log errors | Always check the errors list |
⚠️ Common Pitfalls
| Issue | Solution |
|---|---|
| Invalid field type | Ensure each field matches the table schema |
| 403 Permission Denied | Grant the BigQuery Data Editor or Admin role |
| Row insert fails silently | Always check the errors returned |
| Too many rows in one insert | Keep batches small (<10,000 rows) |
| Timezone mismatch in date/time | Use ISO 8601 format (e.g., 2023-01-01T12:00:00Z) |
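One way to stay under the batch-size guideline above is to split a large list of rows into chunks before calling insert_rows_json(). A minimal sketch; the 500-row chunk size is an arbitrary choice for illustration, not a BigQuery limit:
def insert_in_batches(client, table_id, rows, batch_size=500):
    """Insert rows in smaller chunks and collect any per-row errors."""
    all_errors = []
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        all_errors.extend(client.insert_rows_json(table_id, batch))
    return all_errors

errors = insert_in_batches(client, table_id, rows_to_insert)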
Optional: Use Pandas DataFrame to Insert Data
import pandas as pd
df = pd.DataFrame([
{"id": 4, "name": "Diana", "email": "[email protected]", "signup_date": "2023-04-01"},
])
job = client.load_table_from_dataframe(df, table_id)
job.result()
print("✅ Inserted data from DataFrame.")
Conclusion
Inserting data into BigQuery using Python is simple and scalable. Whether you're working with real-time data, bulk loading, or streaming inserts, Python and the BigQuery API provide robust tools to get the job done.
This foundational knowledge will help you build pipelines that insert clean, structured data into your cloud warehouse automatically.