Google BigQuery: The Complete Guide to the Serverless Data Warehouse
Learn the fundamentals and advanced techniques of Google BigQuery, from serverless architecture to machine learning integration and practical implementation strategies for modern data analytics.
Table of Contents
- Introduction to Google BigQuery
- Serverless Architecture Overview
- Data Ingestion and Loading
- Query Performance Optimization
- Machine Learning with BigQuery ML
- Data Governance and Security
- Cost Management and Optimization
- Integration with Google Cloud Services
- Best Practices and Patterns
- Migration to BigQuery
- Real-World Case Studies
1. Introduction to Google BigQuery
What is Google BigQuery?
Google BigQuery is a fully managed, serverless data warehouse that enables scalable data analytics without infrastructure management. It offers real-time analytics, machine learning integration and enterprise-grade security.
Serverless Architecture
No servers to manage, automatic scaling, pay-per-query pricing
Real-time Analytics
Streaming inserts, sub-second query response, live dashboards
Built-in ML
BigQuery ML for machine learning using standard SQL
Enterprise Security
End-to-end encryption, IAM integration, data masking
| Feature | Traditional Data Warehouse | Google BigQuery | Benefits |
|---|---|---|---|
| Infrastructure | Managed servers | Serverless | Zero operations |
| Scaling | Manual scaling | Automatic scaling | Elastic capacity |
| Pricing | Per server/hour | Per query/TB | Pay-per-use |
| Performance | Hours for large queries | Seconds to minutes | 10-100x faster |
| ML Integration | External tools needed | Built-in ML | Simplified workflows |
| Data Types | Structured only | All data types | More insights |
2. Serverless Architecture Overview
BigQuery Architecture Principles
BigQuery's serverless architecture combines Google's infrastructure with advanced query optimization for exceptional performance and scalability.
Core Architecture Components
-- GOOGLE BIGQUERY ARCHITECTURE OVERVIEW
-- 1. STORAGE LAYER: Colossus (Google's distributed storage)
-- Data is stored in a columnar format (Capacitor)
CREATE TABLE `project-id.dataset.sales_data`
PARTITION BY DATE(transaction_timestamp)
CLUSTER BY customer_id, product_category
AS
SELECT * FROM `project-id.dataset.raw_sales`;
-- 2. COMPUTE LAYER: Dremel (Query execution engine)
-- Distributed query execution across thousands of nodes
SELECT
customer_id,
COUNT(*) AS transaction_count,
SUM(sales_amount) AS total_spent
FROM `project-id.dataset.sales_data`
WHERE DATE(transaction_timestamp) >= '2024-01-01'
GROUP BY customer_id;
-- 3. JUPITER NETWORK: Petabit-scale networking
-- High-speed data transfer between storage and compute
-- Automatically optimized by BigQuery
-- 4. BORG: Cluster management system
-- Resource allocation and scheduling
-- BIGQUERY SLOTS: Compute resources
-- Default: 2,000 slots per project (on-demand)
-- Reservations for guaranteed capacity
-- BI ENGINE: In-memory acceleration service
-- For sub-second query performance
-- BI Engine capacity is configured per project and location:
ALTER BI_CAPACITY `project-id.region-eu.default`
SET OPTIONS(
  size_gb = 10
);
-- 5. EXTERNAL TABLES: Query data without loading it
CREATE EXTERNAL TABLE `project-id.dataset.external_sales`
OPTIONS(
format = 'PARQUET',
uris = ['gs://bucket-name/sales/*.parquet']
);
-- 6. MATERIALIZED VIEWS: Pre-computed aggregations
CREATE MATERIALIZED VIEW `project-id.dataset.mv_daily_sales`
AS
SELECT
DATE(transaction_timestamp) AS sales_date,
product_category,
COUNT(*) AS transaction_count,
SUM(sales_amount) AS total_sales
FROM `project-id.dataset.sales_data`
GROUP BY 1, 2;
3. Data Ingestion and Loading
Data Loading Strategies
Data ingestion in BigQuery supports batch loading, streaming inserts and federated queries for flexible data integration patterns.
Batch Loading and Streaming
-- BATCH DATA LOADING STRATEGIES
-- 1. LOAD DATA FROM GCS (Google Cloud Storage)
LOAD DATA OVERWRITE `project-id.dataset.customers`
FROM FILES(
format = 'PARQUET',
uris = ['gs://customer-data-bucket/raw/customers_*.parquet']
);
-- 2. LOAD DATA FROM LOCAL FILE (via bq CLI)
# bq load --source_format=CSV dataset.customers customers.csv
# bq load --source_format=NEWLINE_DELIMITED_JSON dataset.logs logs.json
-- 3. LOAD DATA WITH SCHEMA AUTODETECTION
LOAD DATA INTO `project-id.dataset.sales`
FROM FILES(
format = 'CSV',
uris = ['gs://sales-data-bucket/2024/*.csv'],
skip_leading_rows = 1
);
-- 4. LOAD DATA WITH EXPLICIT SCHEMA
LOAD DATA INTO `project-id.dataset.products`
(
product_id INT64,
product_name STRING,
category STRING,
price FLOAT64,
created_at TIMESTAMP
)
FROM FILES(
format = 'CSV',
uris = ['gs://product-data-bucket/products.csv']
);
-- 5. MERGE DATA (UPSERT) OPERATIONS
MERGE `project-id.dataset.customers` AS target
USING `project-id.dataset.customers_staging` AS source
ON target.customer_id = source.customer_id
WHEN MATCHED THEN
UPDATE SET
email = source.email,
last_updated = CURRENT_TIMESTAMP()
WHEN NOT MATCHED THEN
INSERT (customer_id, name, email, created_at)
VALUES (customer_id, name, email, CURRENT_TIMESTAMP());
-- STREAMING DATA INSERTION
-- 1. REAL-TIME STREAMING INSERTS (Python example)
"""
from google.cloud import bigquery
import datetime

client = bigquery.Client()
table_id = "project-id.dataset.realtime_events"

rows_to_insert = [
    {
        "event_id": "event_001",
        "event_type": "purchase",
        "user_id": "user_123",
        "amount": 99.99,
        "timestamp": datetime.datetime.now().isoformat()
    },
    {
        "event_id": "event_002",
        "event_type": "click",
        "user_id": "user_456",
        "amount": None,
        "timestamp": datetime.datetime.now().isoformat()
    }
]

errors = client.insert_rows_json(table_id, rows_to_insert)
if not errors:
    print("New rows have been added.")
else:
    print("Encountered errors while inserting rows: {}".format(errors))
"""
-- 2. STREAMING BUFFER ANALYSIS
SELECT
COUNT(*) AS buffered_rows,
MIN(insert_timestamp) AS earliest_buffered,
MAX(insert_timestamp) AS latest_buffered
FROM `project-id.dataset.realtime_events`
WHERE _PARTITIONTIME IS NULL; -- Streaming buffer rows
-- 3. DATAFLOW FOR STREAM PROCESSING
-- Apache Beam pipeline for real-time ETL
"""
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
def run_pipeline():
options = PipelineOptions()
with beam.Pipeline(options=options) as p:
(p
| 'Read from PubSub' >> beam.io.ReadFromPubSub(topic='projects/project-id/topics/events')
| 'Parse JSON' >> beam.Map(lambda x: json.loads(x))
| 'Filter Events' >> beam.Filter(lambda x: x['event_type'] == 'purchase')
| 'Window' >> beam.WindowInto(beam.window.FixedWindows(60)) # 1-minute windows
| 'Write to BigQuery' >> beam.io.WriteToBigQuery(
table='project-id:dataset.events',
schema='event_id:STRING,event_type:STRING,user_id:STRING,amount:FLOAT64,timestamp:TIMESTAMP',
write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)
)
"""
-- 4. CLOUD FUNCTIONS FOR EVENT-DRIVEN LOADING
-- Trigger loading when new files land in GCS
"""
const {BigQuery} = require('@google-cloud/bigquery');
const {Storage} = require('@google-cloud/storage');
const bigquery = new BigQuery();
const storage = new Storage();

exports.loadToBigQuery = async (event, context) => {
const file = event;
const datasetId = 'dataset';
const tableId = 'events';
const metadata = {
sourceFormat: 'NEWLINE_DELIMITED_JSON',
autodetect: true,
writeDisposition: 'WRITE_APPEND'
};
const [job] = await bigquery
.dataset(datasetId)
.table(tableId)
.load(storage.bucket(file.bucket).file(file.name), metadata);
console.log(`Job ${job.id} completed.`);
};
"""
4. Query Performance Optimization
BigQuery Performance Tuning
Query performance optimization in BigQuery requires an understanding of partitioning, clustering, query patterns and cost management techniques.
Performance Optimization Techniques
-- BIGQUERY PERFORMANCE OPTIMIZATION TECHNIQUES
-- 1. PARTITIONING FOR DATA SEGMENTATION
-- Time-based partitioning (most common)
CREATE TABLE `project-id.dataset.sales_partitioned`
PARTITION BY DATE(transaction_timestamp)
CLUSTER BY customer_id, product_category
AS
SELECT * FROM `project-id.dataset.sales_raw`;
-- Integer-range partitioning
CREATE TABLE `project-id.dataset.customers_partitioned`
PARTITION BY RANGE_BUCKET(customer_id, GENERATE_ARRAY(1, 1000000, 10000))
AS
SELECT * FROM `project-id.dataset.customers_raw`;
-- 2. CLUSTERING FOR DATA ORDERING
CREATE TABLE `project-id.dataset.web_events`
PARTITION BY DATE(event_timestamp)
CLUSTER BY user_id, event_type, page_url
AS
SELECT * FROM `project-id.dataset.events_raw`;
-- 3. QUERY OPTIMIZATION PATTERNS
-- Use WHERE clauses on partitioned columns
SELECT
customer_id,
SUM(sales_amount) AS total_spent
FROM `project-id.dataset.sales_partitioned`
WHERE DATE(transaction_timestamp) BETWEEN '2024-01-01' AND '2024-01-31'
AND customer_id = 12345
GROUP BY customer_id;
-- Avoid SELECT *
SELECT -- BAD
*
FROM `project-id.dataset.sales_partitioned`;
SELECT -- GOOD
customer_id,
product_id,
sales_amount,
transaction_timestamp
FROM `project-id.dataset.sales_partitioned`;
-- 4. JOIN OPTIMIZATION
-- Use appropriate JOIN types
SELECT
s.customer_id,
c.customer_name,
SUM(s.sales_amount) AS total_spent
FROM `project-id.dataset.sales_partitioned` s
INNER JOIN `project-id.dataset.customers` c
ON s.customer_id = c.customer_id
WHERE DATE(s.transaction_timestamp) = '2024-01-01'
GROUP BY s.customer_id, c.customer_name;
-- Use APPROX_COUNT_DISTINCT for large datasets
SELECT
APPROX_COUNT_DISTINCT(customer_id) AS unique_customers,
COUNT(*) AS total_transactions
FROM `project-id.dataset.sales_partitioned`;
-- 5. MATERIALIZED VIEWS FOR PERFORMANCE
CREATE MATERIALIZED VIEW `project-id.dataset.mv_daily_metrics`
PARTITION BY metric_date
CLUSTER BY metric_name
AS
SELECT
DATE(transaction_timestamp) AS metric_date,
'total_sales' AS metric_name,
COUNT(*) AS metric_value
FROM `project-id.dataset.sales_partitioned`
GROUP BY 1
UNION ALL
SELECT
DATE(transaction_timestamp),
'total_revenue',
SUM(sales_amount)
FROM `project-id.dataset.sales_partitioned`
GROUP BY 1;
-- 6. BI ENGINE FOR SUB-SECOND QUERIES
-- Enable BI Engine for critical dashboards
-- Automatic caching and in-memory acceleration
-- 7. QUERY CACHING STRATEGIES
-- BigQuery automatically caches query results for 24 hours
-- Avoid non-deterministic functions (e.g. CURRENT_TIMESTAMP()) in queries that should hit the cache
-- 8. PERFORMANCE MONITORING
SELECT
job_id,
query,
total_bytes_processed,
total_slot_ms,
creation_time,
end_time
FROM `region-eu`.INFORMATION_SCHEMA.JOBS
WHERE job_type = 'QUERY'
AND creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
ORDER BY total_slot_ms DESC
LIMIT 10;
-- 9. SLOT RESERVATIONS FOR PREDICTABLE PERFORMANCE
CREATE CAPACITY `project-id.region-eu.bq_reservation`
OPTIONS(
  slot_count = 100,
  edition = 'ENTERPRISE'
);
-- 10. QUERY PLAN ANALYSIS
-- BigQuery has no EXPLAIN statement; inspect a job's plan in the console's
-- Execution details tab or via the Jobs API, and estimate scanned bytes with a dry run:
-- bq query --dry_run --use_legacy_sql=false 'SELECT ...'
-- Example query to analyze:
SELECT
customer_id,
COUNT(*) AS purchase_count
FROM `project-id.dataset.sales_partitioned`
WHERE DATE(transaction_timestamp) >= '2024-01-01'
GROUP BY customer_id;
5. Machine Learning with BigQuery ML
ML in BigQuery
BigQuery ML enables machine learning with standard SQL syntax, allowing data scientists and analysts to build models without moving data out of the warehouse.
Machine Learning Implementation
-- BIGQUERY ML: MACHINE LEARNING WITH STANDARD SQL
-- 1. LINEAR REGRESSION: Predicting continuous values
CREATE OR REPLACE MODEL `project-id.dataset.customer_lifetime_value`
OPTIONS(
model_type = 'LINEAR_REG',
input_label_cols = ['lifetime_value'],
data_split_method = 'AUTO_SPLIT',
learn_rate_strategy = 'LINE_SEARCH'
) AS
SELECT
age,
income,
tenure_months,
avg_monthly_spend,
lifetime_value
FROM `project-id.dataset.customer_features`;
-- 2. LOGISTIC REGRESSION: Binary classification
CREATE OR REPLACE MODEL `project-id.dataset.churn_prediction`
OPTIONS(
model_type = 'LOGISTIC_REG',
input_label_cols = ['will_churn'],
auto_class_weights = TRUE,
data_split_method = 'RANDOM',
data_split_eval_fraction = 0.2
) AS
SELECT
customer_id,
days_since_last_purchase,
avg_purchase_value,
purchase_frequency,
support_tickets,
CASE WHEN churned = 1 THEN 'yes' ELSE 'no' END AS will_churn
FROM `project-id.dataset.customer_behavior`;
-- 3. K-MEANS CLUSTERING: Customer segmentation
CREATE OR REPLACE MODEL `project-id.dataset.customer_segments`
OPTIONS(
model_type = 'KMEANS',
num_clusters = 5,
distance_type = 'EUCLIDEAN',
standardize_features = TRUE
) AS
SELECT
total_spent,
purchase_count,
avg_transaction_value,
days_since_last_purchase,
product_variety_score
FROM `project-id.dataset.customer_metrics`;
-- 4. MATRIX FACTORIZATION: Recommendation system
CREATE OR REPLACE MODEL `project-id.dataset.product_recommendations`
OPTIONS(
model_type = 'MATRIX_FACTORIZATION',
feedback_type = 'IMPLICIT',
user_col = 'user_id',
item_col = 'product_id',
rating_col = 'interaction_score',
l2_reg = 0.1,
num_factors = 10
) AS
SELECT
user_id,
product_id,
LOG(1 + views) + (2 * purchases) AS interaction_score
FROM `project-id.dataset.user_product_interactions`;
-- 5. TIME SERIES FORECASTING: ARIMA_PLUS model
CREATE OR REPLACE MODEL `project-id.dataset.sales_forecast`
OPTIONS(
model_type = 'ARIMA_PLUS',
time_series_timestamp_col = 'date',
time_series_data_col = 'sales',
holiday_region = 'NL',
auto_arima = TRUE,
data_frequency = 'DAILY'
) AS
SELECT
DATE(transaction_timestamp) AS date,
SUM(sales_amount) AS sales
FROM `project-id.dataset.sales_data`
WHERE DATE(transaction_timestamp) >= '2023-01-01'
GROUP BY 1;
-- 6. MODEL EVALUATION AND METRICS
SELECT
*
FROM ML.EVALUATE(MODEL `project-id.dataset.churn_prediction`);
SELECT
*
FROM ML.CENTROIDS(MODEL `project-id.dataset.customer_segments`);
SELECT
*
FROM ML.ARIMA_COEFFICIENTS(MODEL `project-id.dataset.sales_forecast`);
-- 7. MAKING PREDICTIONS
SELECT
customer_id,
predicted_will_churn,
predicted_will_churn_probs
FROM ML.PREDICT(
MODEL `project-id.dataset.churn_prediction`,
(
SELECT
customer_id,
days_since_last_purchase,
avg_purchase_value,
purchase_frequency,
support_tickets
FROM `project-id.dataset.active_customers`
)
);
-- 8. GENERATING RECOMMENDATIONS
-- Output contains the user column, the item column and a predicted score per pair
SELECT
*
FROM ML.RECOMMEND(MODEL `project-id.dataset.product_recommendations`);
-- 9. FORECASTING FUTURE VALUES
SELECT
forecast_timestamp,
forecast_value,
prediction_interval_lower_bound,
prediction_interval_upper_bound
FROM ML.FORECAST(
MODEL `project-id.dataset.sales_forecast`,
STRUCT(30 AS horizon, 0.8 AS confidence_level)
);
-- 10. IMPORT EXTERNAL TENSORFLOW MODEL
CREATE OR REPLACE MODEL `project-id.dataset.image_classifier`
OPTIONS(
model_type = 'TENSORFLOW',
model_path = 'gs://bucket-name/models/image_classifier/*'
);
6. Data Governance and Security
Enterprise Governance
Data governance and security in BigQuery provide enterprise-grade controls for data protection, compliance and access management; a short SQL sketch follows the table below.
| Security Feature | Implementation | Use Case | Best Practices |
|---|---|---|---|
| IAM Roles | Role-based access control | Granular permissions management | Principle of least privilege |
| Column-Level Security | Policy tags and data masking | PII data protection | Classify sensitive columns |
| Row-Level Security | Row access policies and authorized views | Data segmentation per department | Use separate datasets |
| Data Encryption | Encryption at rest/in transit | Compliance requirements | Use CMEK for key control |
| Audit Logging | Cloud Audit Logs | Compliance and forensics | Export to BigQuery |
| Data Catalog | Metadata management | Data discovery and lineage | Tag assets consistently |
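Most of these controls can be applied directly with SQL DDL. The snippet below is a minimal sketch, assuming a hypothetical region column on the sales_data table, an analysts@example.com Google group and a shared dataset dataset_shared; policy tags for column-level masking are attached via Data Catalog rather than SQL.
-- ROW-LEVEL SECURITY: only EU rows are visible to the analysts group
CREATE ROW ACCESS POLICY eu_analysts_only
ON `project-id.dataset.sales_data`
GRANT TO ('group:analysts@example.com')
FILTER USING (region = 'EU');
-- AUTHORIZED VIEW: expose only non-sensitive columns in a shared dataset
CREATE OR REPLACE VIEW `project-id.dataset_shared.sales_eu` AS
SELECT
  customer_id,
  product_category,
  sales_amount
FROM `project-id.dataset.sales_data`
WHERE region = 'EU';
-- Authorize the view on the source dataset via IAM / dataset access settings (console or bq CLI)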
7. Cost Management and Optimization
BigQuery Cost Control
Cost optimization in BigQuery requires an understanding of the pricing models, monitoring tools and best practices for keeping spend under control; a short cost-analysis sketch follows the lists below.
Pricing Models (indicative)
- On-demand: $6.25 per TiB scanned
- Flat-rate / capacity: from approximately $2,000 per month
- Storage: $0.02 per GB per month (active storage)
- Streaming inserts: $0.01 per 200 MB
Cost Monitoring
- Billing reports
- Information Schema
- Cloud Monitoring
- Custom dashboards
Optimization Tactics
- Partition elimination
- Clustering optimization
- Query caching
- Materialized views
Alerting
- Budget alerts
- Anomaly detection
- Cost thresholds
- Usage reports
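As a starting point for cost monitoring, the query below is a minimal sketch, assuming the EU region and the indicative on-demand price of $6.25 per TiB; it approximates spend per user over the last 30 days from INFORMATION_SCHEMA.
-- APPROXIMATE ON-DEMAND SPEND PER USER (LAST 30 DAYS)
SELECT
  user_email,
  COUNT(*) AS query_count,
  ROUND(SUM(total_bytes_billed) / POW(1024, 4), 2) AS tib_billed,
  ROUND(SUM(total_bytes_billed) / POW(1024, 4) * 6.25, 2) AS approx_cost_usd
FROM `region-eu`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE job_type = 'QUERY'
  AND creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY user_email
ORDER BY approx_cost_usd DESC;
-- Cap how much a single query may bill (shell example with the bq CLI):
-- bq query --maximum_bytes_billed=10000000000 --use_legacy_sql=false 'SELECT ...'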
8. Integration with Google Cloud Services
GCP Ecosystem Integration
BigQuery's integration with other Google Cloud services creates an end-to-end data platform for modern analytics workflows.
Integration Patterns
-- BIGQUERY INTEGRATION WITH GCP SERVICES
-- 1. GOOGLE CLOUD STORAGE (GCS) INTEGRATION
-- External tables for querying GCS data in place
CREATE EXTERNAL TABLE `project-id.dataset.gcs_sales`
OPTIONS(
format = 'PARQUET',
uris = ['gs://sales-data-bucket/2024/*.parquet']
);
-- For Hive-partitioned data, add a WITH PARTITION COLUMNS clause
-- together with the hive_partition_uri_prefix option
-- 2. GOOGLE CLOUD DATAFLOW (Apache Beam)
-- Real-time data processing into BigQuery
"""
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

pipeline_options = PipelineOptions()
with beam.Pipeline(options=pipeline_options) as p:
(p
| 'Read from PubSub' >> beam.io.ReadFromPubSub(topic='projects/project/topics/sales')
| 'Parse JSON' >> beam.Map(json.loads)
| 'Filter Valid' >> beam.Filter(lambda x: x['amount'] > 0)
| 'Write to BigQuery' >> beam.io.WriteToBigQuery(
'project:dataset.sales',
schema='customer_id:STRING,product_id:STRING,amount:FLOAT64',
write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)
)
"""
-- 3. GOOGLE CLOUD FUNCTIONS
-- Event-driven data processing
"""
const {BigQuery} = require('@google-cloud/bigquery');
const bigquery = new BigQuery();

exports.processNewFile = async (event, context) => {
const file = event;
const [metadata] = await bigquery
.dataset('dataset')
.table('processed_files')
.insert([{
file_name: file.name,
processed_at: new Date().toISOString(),
status: 'processed'
}]);
console.log(`Processed ${file.name}`);
};
"""
-- 4. GOOGLE CLOUD AI PLATFORM
-- Train models on BigQuery data
"""
gcloud ai-platform jobs submit training my_training_job \
--region=europe-west4 \
--master-image-uri=gcr.io/cloud-aiplatform/training/tf-cpu.2-3:latest \
--scale-tier=BASIC \
-- \
--query="SELECT * FROM \`project.dataset.training_data\`" \
--model-dir=gs://bucket/models
"""
-- 5. GOOGLE DATA STUDIO (now Looker Studio)
-- Real-time dashboards on BigQuery data
-- Direct connection for live reporting
-- 6. GOOGLE CLOUD COMPOSER (Airflow)
-- Orchestration of BigQuery workflows
"""
from datetime import datetime
from airflow import DAG
from airflow.providers.google.cloud.operators.bigquery import (
    BigQueryExecuteQueryOperator
)

dag = DAG('bigquery_etl', start_date=datetime(2024, 1, 1), schedule_interval='@daily')

extract = BigQueryExecuteQueryOperator(
    task_id='extract_data',
    sql='SELECT * FROM `project.dataset.raw_sales`',
    destination_dataset_table='project.dataset.staging_sales',
    write_disposition='WRITE_TRUNCATE',
    use_legacy_sql=False,
    dag=dag
)
"""
-- 7. GOOGLE CLOUD DATAPROC
-- Spark jobs for complex transformations
"""
gcloud dataproc jobs submit spark \
--cluster=my-cluster \
--region=europe-west4 \
--jar=gs://spark-lib/bigquery/spark-bigquery-latest.jar \
-- \
--table=project.dataset.sales \
--output=project.dataset.transformed_sales
"""
9. Best Practices and Patterns
BigQuery Best Practices
Best practices and patterns help you build scalable, performant and cost-efficient BigQuery solutions; a minimal SQL sketch of the data lake layering follows the pattern cards below.
Design Patterns for BigQuery
Data Lake Pattern
- Bronze/Silver/Gold layers
- External tables
- Parquet/AVRO format
- Data Catalog tags
Real-time Analytics
- Streaming inserts
- Materialized views
- BI Engine caching
- Looker Studio dashboards
ML Ops Pattern
- BigQuery ML models
- Model registry
- Prediction serving
- Monitoring pipelines
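The Bronze/Silver/Gold layering from the data lake pattern maps naturally onto external tables, CTAS statements and aggregate tables. The snippet below is a minimal sketch under assumed names (datasets dataset_bronze, dataset_silver and dataset_gold, plus a bucket data-lake-bucket), not a prescribed implementation.
-- BRONZE: raw files queried in place via an external table
CREATE EXTERNAL TABLE `project-id.dataset_bronze.orders_raw`
OPTIONS(
  format = 'PARQUET',
  uris = ['gs://data-lake-bucket/orders/*.parquet']
);
-- SILVER: cleaned, typed, partitioned and clustered copy
CREATE OR REPLACE TABLE `project-id.dataset_silver.orders`
PARTITION BY DATE(order_timestamp)
CLUSTER BY customer_id
AS
SELECT
  CAST(order_id AS INT64) AS order_id,
  customer_id,
  SAFE_CAST(amount AS NUMERIC) AS amount,
  TIMESTAMP(order_timestamp) AS order_timestamp
FROM `project-id.dataset_bronze.orders_raw`
WHERE order_id IS NOT NULL;
-- GOLD: business-level aggregate for reporting (could also be a materialized view)
CREATE OR REPLACE TABLE `project-id.dataset_gold.daily_revenue`
AS
SELECT
  DATE(order_timestamp) AS order_date,
  COUNT(*) AS order_count,
  SUM(amount) AS daily_revenue
FROM `project-id.dataset_silver.orders`
GROUP BY 1;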
10. Migration to BigQuery
Migration Strategies
Migrating to BigQuery requires careful planning, assessment and execution to guarantee data integrity and performance; a validation query sketch follows the phase overview below.
Assessment
- Data volume analysis
- Query complexity
- Performance requirements
- Security compliance
Planning
- Migration roadmap
- Phased approach
- Testing strategy
- Rollback planning
Execution
- Data transfer service
- Schema conversion
- Query optimization
- Performance testing
Validation
- Data quality checks
- Performance benchmarks
- Cost optimization
- User acceptance
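For the validation phase, row counts and simple column checksums are a quick first data quality check. The query below is a minimal sketch, assuming a hypothetical staging table staging.sales_legacy_export loaded from the legacy export alongside the migrated dataset.sales_data table.
-- MIGRATION VALIDATION: compare row counts and a sum-based checksum
WITH source AS (
  SELECT
    COUNT(*) AS row_count,
    SUM(CAST(sales_amount AS NUMERIC)) AS amount_sum
  FROM `project-id.staging.sales_legacy_export`
),
target AS (
  SELECT
    COUNT(*) AS row_count,
    SUM(CAST(sales_amount AS NUMERIC)) AS amount_sum
  FROM `project-id.dataset.sales_data`
)
SELECT
  source.row_count AS source_rows,
  target.row_count AS target_rows,
  source.row_count = target.row_count AS row_counts_match,
  ABS(source.amount_sum - target.amount_sum) AS amount_diff
FROM source
CROSS JOIN target;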
11. Real-World Case Studies
Real-World Implementations
These case studies show how Google BigQuery is implemented across industries, with measurable business outcomes.
Case Study: E-commerce Retailer
Challenge: slow reporting, high IT costs, limited personalization.
Solution: BigQuery with real-time analytics and ML-based recommendations.
| Metric | Before | After | Improvement |
|---|---|---|---|
| Query Performance | Hours to days | Seconds to minutes | 100x faster |
| IT Costs | $500k/year | $150k/year | 70% reduction |
| Personalization | Static recommendations | Real-time ML models | Dynamic personalization |
| Data Freshness | 24+ hours | 5 minutes | 288x faster |
| Uptime | 99.5% | 99.99% | Higher reliability |
Conclusion and Key Takeaways
Key Lessons Learned
DO's
- Use partitioning and clustering
- Implement cost controls early
- Use BigQuery ML for in-database analytics
- Monitor query performance
- Implement data governance
DON'Ts
- Ignoring cost management
- SELECT * on large tables
- Forgetting about query caching
- Underestimating security
- Copying on-premise patterns one-to-one
Emerging Trends
- BigQuery Omni (multi-cloud)
- BigQuery ML advancements
- Real-time ML serving
- Enhanced data governance
- AI-powered analytics
Frequently Asked Questions (FAQ)
Q: What is the difference between BigQuery and Bigtable?
A: BigQuery is a data warehouse for analytical workloads (OLAP), whereas Bigtable is a NoSQL database for low-latency, high-throughput workloads (OLTP). BigQuery is optimized for complex queries over large datasets, Bigtable for real-time applications.
Q: How do I implement CI/CD for BigQuery?
A: CI/CD for BigQuery: 1) Infrastructure as Code (Terraform), 2) Schema management (Liquibase/Flyway), 3) GitHub Actions/Cloud Build, 4) Testing framework (Dataform), 5) Environment promotion, 6) Rollback procedures, 7) Monitoring and alerting.
Q: What are Google BigQuery's SLAs?
A: BigQuery SLAs: 1) Monthly uptime percentage: 99.9% (on-demand), 99.99% (flat-rate), 2) Storage durability: 99.999999999%, 3) Data availability: 99.99%, 4) Query availability: 99.9%. Enterprise support offers higher SLAs.
Q: How do I monitor BigQuery performance?
A: Monitoring tools: 1) INFORMATION_SCHEMA views, 2) Cloud Monitoring metrics, 3) Cloud Audit Logs, 4) BigQuery Admin panel, 5) Custom dashboards, 6) Query plan explain, 7) Slot utilization monitoring, 8) Cost analysis reports.