BigQuery supports the SQL UPDATE statement for modifying existing rows in a table. This is useful for correcting data, enriching rows, or applying transformations over time. With the BigQuery Python client, you can run these updates programmatically.
Table of Contents
- Prerequisites
- BigQuery UPDATE Syntax
- Update Data in BigQuery Using Python
- Parameterized UPDATE Queries
- Full Working Example
- Tips and Best Practices
- Common Pitfalls
- Conclusion
✅ Prerequisites
Before performing updates:
- Ensure you have a Google Cloud project
- Enable the BigQuery API
- Set up a service account with the BigQuery Data Editor role on the target dataset, plus a role that allows it to run query jobs, such as BigQuery Job User
- Install the required Python package:
pip install google-cloud-bigquery
BigQuery UPDATE Syntax
UPDATE `project.dataset.table`
SET column1 = new_value, column2 = new_value
WHERE condition
Example
UPDATE `my-project.sales.customers`
SET status = 'active'
WHERE last_login_date >= '2024-01-01'
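Because an UPDATE changes data in place, it can help to count the rows a condition matches before running it. The sketch below builds a `SELECT COUNT(*)` and an `UPDATE` from the same table name and `WHERE` text, so the two cannot drift apart. The helper names `build_preview_sql` and `build_update_sql` are hypothetical, and the table and condition are assumed to be trusted constants written by the developer, never user input.

```python
def build_preview_sql(table: str, where: str) -> str:
    """Return a SELECT that counts the rows an UPDATE with this WHERE would touch."""
    return f"SELECT COUNT(*) AS matching_rows FROM `{table}` WHERE {where}"


def build_update_sql(table: str, assignments: str, where: str) -> str:
    """Return the UPDATE statement that uses the same table and WHERE clause."""
    return f"UPDATE `{table}` SET {assignments} WHERE {where}"


# Values taken from the example above.
TABLE = "my-project.sales.customers"
WHERE = "last_login_date >= '2024-01-01'"

preview_sql = build_preview_sql(TABLE, WHERE)
update_sql = build_update_sql(TABLE, "status = 'active'", WHERE)

print(preview_sql)
print(update_sql)
```

Running the preview statement first and comparing the count with what you expect is a cheap check before the UPDATE itself is executed.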
Step-by-Step: Update Data Using Python
Step 1: Import Libraries and Authenticate
from google.cloud import bigquery
import os
# Set your service account credentials
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "path/to/your-key.json"
# Initialize BigQuery client
client = bigquery.Client()
Step 2: Write and Run the Update Query
query = """
UPDATE `my-project.sales.customers`
SET status = 'inactive'
WHERE last_login_date < '2023-01-01'
"""
query_job = client.query(query)
query_job.result() # Wait for the job to complete
print("✅ Records updated successfully.")
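A success message alone does not say how many rows changed. One possible approach is to read the job's `num_dml_affected_rows` attribute once the job has finished. `run_update` is a hypothetical helper name, and `client` is assumed to be a `bigquery.Client` like the one created in Step 1.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # Imported for type hints only
    from google.cloud import bigquery


def run_update(client: "bigquery.Client", sql: str) -> int:
    """Run a DML statement and return the number of rows it modified."""
    query_job = client.query(sql)
    query_job.result()  # Block until the job finishes; raises if it failed
    # num_dml_affected_rows is populated for DML statements such as UPDATE.
    affected = query_job.num_dml_affected_rows
    return affected if affected is not None else 0
```

For example, `rows = run_update(client, query)` followed by a check such as `if rows == 0:` makes a condition that matched nothing visible right away.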
Using Parameterized UPDATE Queries
Using parameters in your queries helps prevent SQL injection and makes your code more dynamic.
query = """
UPDATE `my-project.sales.customers`
SET status = @new_status
WHERE last_login_date < @cutoff_date
"""
job_config = bigquery.QueryJobConfig(
query_parameters=[
bigquery.ScalarQueryParameter("new_status", "STRING", "inactive"),
bigquery.ScalarQueryParameter("cutoff_date", "DATE", "2023-01-01")
]
)
query_job = client.query(query, job_config=job_config)
query_job.result()
print("✅ Parameterized update completed.")
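To illustrate the injection risk, the sketch below contrasts string interpolation with a parameterized statement. It is pure string handling and runs without a BigQuery connection; the hostile input value and the variable names are made up for the demonstration.

```python
# Hypothetical hostile input, for example from a web form.
user_status = "inactive' WHERE TRUE --"

# Unsafe: the input becomes part of the SQL text itself.
unsafe_sql = (
    "UPDATE `my-project.sales.customers` "
    f"SET status = '{user_status}' "
    "WHERE last_login_date < '2023-01-01'"
)

# Safer: the SQL text is fixed, and the value would travel separately
# as a query parameter (see the ScalarQueryParameter usage above).
safe_sql = (
    "UPDATE `my-project.sales.customers` "
    "SET status = @new_status "
    "WHERE last_login_date < @cutoff_date"
)

print(unsafe_sql)
print(safe_sql)
```

In the interpolated version, the injected quote closes the string literal and `--` comments out the intended filter, so every row would be updated. With parameters, the same input would simply be stored as an unusual status value.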
Full Working Example
import os
from google.cloud import bigquery
# Authenticate
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "path/to/your-key.json"
client = bigquery.Client()
# SQL UPDATE statement
query = """
UPDATE `my-project.crm.customers`
SET email_verified = TRUE
WHERE email IS NOT NULL AND email_verified = FALSE
"""
# Run the query
query_job = client.query(query)
query_job.result() # Wait for job to complete
print("✅ Marked all customers with an email address as verified.")
Tips and Best Practices
Tip | Why It’s Important |
---|---|
Use WHERE conditions carefully | Prevent accidental updates to all rows |
Test with a SELECT first | Confirm that the correct rows are targeted |
Use parameters | Improves security and flexibility |
Partition large tables | Increases efficiency and lowers cost |
Keep backups | UPDATE has no one-step undo; time travel can recover recent table states, but a copy or snapshot is a safer fallback |
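One way to act on the backup tip is to copy the table before a large update. The sketch below uses the client's `copy_table` method; `backup_table`, `backup_table_id`, and the `_backup_` naming scheme are hypothetical, and the destination dataset is assumed to already exist.

```python
from datetime import datetime, timezone
from typing import TYPE_CHECKING, Optional

if TYPE_CHECKING:  # Imported for type hints only
    from google.cloud import bigquery


def backup_table_id(table_id: str, now: Optional[datetime] = None) -> str:
    """Build a timestamped backup table ID next to the source table."""
    now = now or datetime.now(timezone.utc)
    return f"{table_id}_backup_{now:%Y%m%d_%H%M%S}"


def backup_table(client: "bigquery.Client", table_id: str) -> str:
    """Copy table_id to a timestamped backup table and return the backup's ID."""
    destination = backup_table_id(table_id)
    copy_job = client.copy_table(table_id, destination)
    copy_job.result()  # Wait for the copy to finish before updating the source
    return destination
```

Calling `backup_table(client, "my-project.crm.customers")` before the UPDATE leaves a full copy to restore from; the copy is billed as ordinary storage, so old backups are worth cleaning up.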
⚠️ Common Pitfalls
Pitfall | Solution |
---|---|
No WHERE clause | Always include a condition unless you intend to update every row |
Query runs but updates 0 rows | Check your condition logic |
Permission denied | Ensure your service account has BigQuery Data Editor (or higher) on the dataset and can create query jobs, for example via BigQuery Job User |
Trying to update a view or external table | UPDATE only works on native BigQuery tables |
Updates are slow on large tables | Use partitioning or clustering for better performance |
How to Check the Update
Run a SELECT query to verify changes:
verify_query = """
SELECT COUNT(*) AS verified_rows
FROM `my-project.crm.customers`
WHERE email_verified = TRUE
"""
# Iterate over the result rows directly; to_dataframe() would also require pandas
for row in client.query(verify_query).result():
    print(f"Verified rows: {row.verified_rows}")
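A total count does not show how many rows the update itself changed. Assuming the table is still within BigQuery's time travel window, one option is to run the same count against the table as it looked shortly before the update and compare the two numbers. `count_as_of_sql` is a hypothetical helper, and the 60-minute offset is an arbitrary example.

```python
def count_as_of_sql(table: str, where: str, minutes_ago: int = 0) -> str:
    """Build a COUNT(*) query, optionally against an earlier state of the table."""
    time_travel = ""
    if minutes_ago > 0:
        time_travel = (
            " FOR SYSTEM_TIME AS OF "
            f"TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL {int(minutes_ago)} MINUTE)"
        )
    return f"SELECT COUNT(*) AS row_count FROM `{table}`{time_travel} WHERE {where}"


TABLE = "my-project.crm.customers"
after_sql = count_as_of_sql(TABLE, "email_verified = TRUE")
before_sql = count_as_of_sql(TABLE, "email_verified = TRUE", minutes_ago=60)

print(before_sql)
print(after_sql)
```

The difference between the two counts approximates the number of rows the update flipped, provided no other writes touched the table in that interval.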
Conclusion
The UPDATE statement, run through the BigQuery Python client, lets you modify your data programmatically. Just remember:
✅ Use WHERE filters
✅ Leverage parameters
✅ Test before running updates on large or production tables