Optimize for Performance and Cost Google Professional Data Engineer GCP
Partitioned table
- a special table that is divided into segments, called partitions,
- make it easier to manage and query data.
- By dividing a large table improve query performance, and control costs
- partition tables by:
- Ingestion time
- Date/timestamp
Clustered tables
- table data is automatically organized based on the contents of one or more columns in the table’s schema.
- The columns specified are used to colocate related data.
- order of columns is important for the sort order of the data.
- Clustering improve the performance of queries
- During write data is sorted using the values in clustering columns.
- clustering is allowed over a partitioned table.
- Use clustering if
- data is already partitioned on a date, timestamp, or integer column.
- using filters or aggregation against particular columns in queries.
- clustering is based on ingestion time
- date/timestamp
- integer range
Views
- a virtual table defined by a SQL query.
- A view is queried it in the same way you query a table.
- query results of view contain data only from the tables and fields specified in the query that defines the view.
- can query views in BigQuery by
- Query editor box in the Cloud Console
- Compose Query option in the classic BigQuery web UI
- BigQuery command-line tool’s bq query command
- BigQuery REST API to programmatically call the jobs.query or query-type jobs.insert methods
- BigQuery client libraries
- Also use view as a data source for a visualization
- views limitations
- The dataset that contains view and the dataset that contains the tables referenced by the view must be in the same location.
- cannot run a BigQuery job that exports data from a view.
- cannot use the TableDataList JSON API method to retrieve data from a view.
- cannot reference query parameters in views.
- If schema change after view is created, the reported schema will be inaccurate
- cannot include a user-defined function in the SQL query that defines a view.
- cannot reference a view in a wildcard table query.
Reservations
- two pricing models
- On-demand pricing: You pay only for the data scanned by queries.
- Flat-rate pricing: Offers predictable and consistent month-to-month costs.
- By default, billed as per on-demand pricing model.
- With BigQuery Reservations, can switch to the flat-rate pricing model by purchasing commitments
- commitments are dedicated portions of query processing capacity measured in slots with cost of all bytes processed is included in the flat-rate price.
INFORMATION_SCHEMA
- For on-demand pricing, queries against INFORMATION_SCHEMA views 10 MB is the minimum billing amount
- For flat-rate pricing, queries against INFORMATION_SCHEMA views and tables consume purchased BigQuery slots.
- INFORMATION_SCHEMA queries are not cached
- charged each time you run an INFORMATION_SCHEMA query,
- not charged storage fees for the INFORMATION_SCHEMA views.
Google Professional Data Engineer (GCP) Free Practice TestTake a Quiz