Python BigQuery: How to UPDATE Data in a Table

Last updated 1 month ago

Tags: Python BigQuery

BigQuery supports the UPDATE SQL statement to modify existing records in a table. This is useful for correcting data, enriching rows, or applying transformations over time. With the BigQuery Python client, you submit an UPDATE as an ordinary query job, so updates can be run programmatically from scripts and pipelines.


Table of Contents

  1. Prerequisites

  2. BigQuery UPDATE Syntax

  3. Update Data in BigQuery Using Python

  4. Parameterized UPDATE Queries

  5. Full Working Example

  6. Tips and Best Practices

  7. Common Pitfalls

  8. Conclusion


✅ Prerequisites

Before performing updates:

  • Ensure you have a Google Cloud Project

  • Enable the BigQuery API

  • Set up a Service Account with the BigQuery Data Editor role (to modify table data) and the BigQuery Job User role (to run query jobs)

  • Install the required Python package:

pip install google-cloud-bigquery

BigQuery UPDATE Syntax

UPDATE `project.dataset.table`
SET column1 = new_value, column2 = new_value
WHERE condition

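The WHERE clause is mandatory: BigQuery rejects an UPDATE statement without one. To update every row on purpose, write WHERE TRUE.

Example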

UPDATE `my-project.sales.customers`
SET status = 'active'
WHERE last_login_date >= '2024-01-01'

Step-by-Step: Update Data Using Python

Step 1: Import Libraries and Authenticate

from google.cloud import bigquery
import os

# Set your service account credentials
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "path/to/your-key.json"

# Initialize BigQuery client
client = bigquery.Client()

Step 2: Write and Run the Update Query

query = """
    UPDATE `my-project.sales.customers`
    SET status = 'inactive'
    WHERE last_login_date < '2023-01-01'
"""

query_job = client.query(query)
query_job.result()  # Wait for the job to complete

print("✅ Records updated successfully.")
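The success message above prints whether or not any row actually matched. As a minimal sketch, a small helper (the name run_update is hypothetical, not part of the library) could return the number of rows the statement changed by reading the job's num_dml_affected_rows property once result() has returned:

```python
def run_update(client, sql, job_config=None):
    """Run a DML statement and return how many rows it modified.

    `client` is expected to be a google.cloud.bigquery.Client. The helper
    name and signature are illustrative, not part of the library.
    """
    query_job = client.query(sql, job_config=job_config)
    query_job.result()  # Block until the job finishes; raises if the job failed
    # num_dml_affected_rows is populated for DML jobs such as UPDATE
    return query_job.num_dml_affected_rows
```

Calling run_update(client, query) with the client and query from Step 2 and printing the return value shows at a glance whether the WHERE condition matched anything. A result of 0 usually means the condition needs another look.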

Using Parameterized UPDATE Queries

Query parameters keep values out of the SQL text itself. This helps prevent SQL injection and lets you reuse one statement with different values.

query = """
    UPDATE `my-project.sales.customers`
    SET status = @new_status
    WHERE last_login_date < @cutoff_date
"""

job_config = bigquery.QueryJobConfig(
    query_parameters=[
        bigquery.ScalarQueryParameter("new_status", "STRING", "inactive"),
        bigquery.ScalarQueryParameter("cutoff_date", "DATE", "2023-01-01")
    ]
)

query_job = client.query(query, job_config=job_config)
query_job.result()

print("✅ Parameterized update completed.")
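Query parameters can stand in for values only; BigQuery does not accept a parameter in place of a table or column name. When the table name itself comes from configuration or user input, one option is to validate it before building the SQL string. The sketch below assumes a hypothetical helper, quoted_table, and a deliberately conservative whitelist that is stricter than BigQuery's actual naming rules:

```python
import re

# Conservative patterns: an assumption for this sketch, narrower than what BigQuery allows.
_PROJECT_RE = re.compile(r"[a-z][a-z0-9-]{4,28}[a-z0-9]")
_NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def quoted_table(project, dataset, table):
    """Return a backtick-quoted project.dataset.table reference or raise ValueError."""
    if not _PROJECT_RE.fullmatch(project):
        raise ValueError(f"Unexpected project ID: {project!r}")
    for name in (dataset, table):
        if not _NAME_RE.fullmatch(name):
            raise ValueError(f"Unexpected dataset or table name: {name!r}")
    return f"`{project}.{dataset}.{table}`"


# The table reference is validated text; the values still travel as parameters.
table_ref = quoted_table("my-project", "sales", "customers")
query = f"""
    UPDATE {table_ref}
    SET status = @new_status
    WHERE last_login_date < @cutoff_date
"""
```

The resulting query string can be passed to client.query together with the same job_config shown above.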

Full Working Example

import os
from google.cloud import bigquery

# Authenticate
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "path/to/your-key.json"
client = bigquery.Client()

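# SQL UPDATE statement: mark customers that have an email address as verified.
# Rows where email_verified is NULL are not matched by `email_verified = FALSE`.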
query = """
    UPDATE `my-project.crm.customers`
    SET email_verified = TRUE
    WHERE email IS NOT NULL AND email_verified = FALSE
"""

# Run the query
query_job = client.query(query)
query_job.result()  # Wait for job to complete

print("✅ Marked customers with an email address as verified.")

Tips and Best Practices

Tip | Why It’s Important
Use WHERE conditions carefully | An overly broad condition (or WHERE TRUE) updates every row
Test with a SELECT first | Confirms that the correct rows are targeted
Use parameters | Improves security and flexibility
Partition large tables | An UPDATE that filters on the partition column scans less data, which lowers cost
Keep backups | DML has no undo command, and time travel only reaches back a limited window (7 days by default)
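The "test with a SELECT first" tip can be sketched as a small helper that counts the rows a condition matches before any UPDATE runs. The name count_matching_rows is hypothetical, and the sketch assumes the table reference and WHERE condition are trusted strings written by you, not user input:

```python
def count_matching_rows(client, table_ref, where_condition):
    """Count the rows an UPDATE with the same WHERE condition would touch.

    `client` is expected to be a google.cloud.bigquery.Client. `table_ref`
    and `where_condition` are pasted into the SQL, so pass trusted text only.
    """
    sql = f"""
        SELECT COUNT(*) AS matching_rows
        FROM `{table_ref}`
        WHERE {where_condition}
    """
    rows = client.query(sql).result()
    # COUNT(*) without GROUP BY always yields exactly one row
    return next(iter(rows))["matching_rows"]
```

For the full example in this article, count_matching_rows(client, "my-project.crm.customers", "email IS NOT NULL AND email_verified = FALSE") should report the same number of rows that the UPDATE later changes, provided the table is not modified in between.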

⚠️ Common Pitfalls

Pitfall | Solution
No WHERE clause | BigQuery rejects the statement; add a condition, or use WHERE TRUE if you really intend to update every row
Query runs but updates 0 rows | Check your condition logic
Permission denied | Ensure your service account has BigQuery Data Editor (or higher) on the table and a role that can run query jobs, such as BigQuery Job User
Trying to update a view or external table | UPDATE only works on native BigQuery tables
Updates are slow on large tables | Use partitioning or clustering for better performance

How to Check the Update

Run a SELECT query to verify changes:

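verify_query = """
    SELECT COUNT(*) AS verified_rows
    FROM `my-project.crm.customers`
    WHERE email_verified = TRUE
"""

# result() returns an iterator of rows; COUNT(*) yields exactly one row.
# Note: this counts every verified row, including rows verified before the update.
row = next(iter(client.query(verify_query).result()))
print(f"Rows with email_verified = TRUE: {row.verified_rows}")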
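The count above includes rows that were already verified before the update ran. To isolate what changed, one option is to count the same condition against the table as it was shortly before the UPDATE, using BigQuery time travel (FOR SYSTEM_TIME AS OF), and compare the two numbers. A sketch, assuming a hypothetical helper count_as_of_sql and a look-back interval that falls inside the table's time-travel window and after the table was created:

```python
def count_as_of_sql(table_ref, condition, minutes_ago):
    """Build a query that counts matching rows as of `minutes_ago` minutes in the past.

    Hypothetical helper: `table_ref` and `condition` are pasted into the SQL,
    so pass trusted text only.
    """
    minutes = int(minutes_ago)  # Raises ValueError for non-numeric input
    return f"""
        SELECT COUNT(*) AS rows_before
        FROM `{table_ref}`
          FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL {minutes} MINUTE)
        WHERE {condition}
    """


before_query = count_as_of_sql("my-project.crm.customers", "email_verified = TRUE", 30)
```

Running before_query with client.query and subtracting rows_before from the current count gives the number of rows that became verified in the last 30 minutes, assuming no other job changed the table in that period.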

Conclusion

The UPDATE operation in BigQuery using Python allows you to modify your dataset programmatically and safely. Just remember:

✅ Use WHERE filters
✅ Leverage parameters
✅ Test before running updates on large or production tables