Cloud Spanner Instances: Google Professional Data Engineer (GCP)
- To use Cloud Spanner, first create a Cloud Spanner instance within your Google Cloud project.
- An instance allocates the resources used by Cloud Spanner.
- Creating an instance requires choosing an instance configuration and a node count.
- An instance configuration defines the geographic placement and replication of the databases in that instance.
- The node count is the number of nodes to allocate to that instance.
- Each node provides up to 2 TB of storage.
- After creating an instance, you can add or remove nodes later.
- You cannot remove nodes if:
  - the removal would leave the instance storing more than 2 TB of data per node.
  - Cloud Spanner has created a large number of splits for the instance's data.
- To change the node count, use:
  - the Cloud Console
  - the gcloud command-line tool
  - the client libraries
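
The storage condition for removing nodes can be checked with a little arithmetic. The following is a minimal sketch, assuming the 2 TB per node limit stated above. The function names are hypothetical, and the check does not cover the second condition (a large number of splits), which cannot be computed from the data size alone.

```python
import math

# Assumption taken from the notes above: each node stores at most 2 TB.
STORAGE_PER_NODE_TB = 2.0


def min_nodes_for_storage(data_tb: float,
                          per_node_tb: float = STORAGE_PER_NODE_TB) -> int:
    """Return the smallest node count whose combined storage covers data_tb."""
    if data_tb < 0 or per_node_tb <= 0:
        raise ValueError("data_tb must be >= 0 and per_node_tb must be > 0")
    # An instance always has at least one node, even when it holds no data.
    return max(1, math.ceil(data_tb / per_node_tb))


def can_scale_down(data_tb: float, target_nodes: int,
                   per_node_tb: float = STORAGE_PER_NODE_TB) -> bool:
    """True if target_nodes still satisfies the per-node storage limit."""
    return target_nodes >= min_nodes_for_storage(data_tb, per_node_tb)


if __name__ == "__main__":
    # Hypothetical instance holding 5 TB of data.
    print(min_nodes_for_storage(5))   # 3
    print(can_scale_down(5, 3))       # True
    print(can_scale_down(5, 2))       # False
```

With 5 TB of data, for example, the instance needs at least 3 nodes, so a request to shrink it to 2 nodes would fail the storage check.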
Nodes versus replicas
- To scale up the serving and storage resources in an instance, add more nodes to that instance.
- Adding a node does not increase the number of replicas; it increases the resources available to each replica.
- The total number of servers in a Cloud Spanner instance is the number of nodes in the instance multiplied by the number of replicas in the instance.
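
The server-count rule above is a single multiplication. The sketch below illustrates it with a hypothetical helper, `total_servers`, and an example replica count of 3 chosen only for illustration.

```python
def total_servers(node_count: int, replica_count: int) -> int:
    """Servers in an instance = nodes x replicas (rule from the notes above)."""
    if node_count < 1 or replica_count < 1:
        raise ValueError("node_count and replica_count must both be >= 1")
    return node_count * replica_count


if __name__ == "__main__":
    replicas = 3  # hypothetical replica count, fixed by the instance configuration
    for nodes in (1, 2, 4):
        # Adding nodes grows the server count; the replica count stays the same.
        print(f"{nodes} node(s) x {replicas} replicas = "
              f"{total_servers(nodes, replicas)} servers")
```

Doubling the nodes from 2 to 4 doubles the servers from 6 to 12 while the number of replicas stays at 3.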
Data Management
- Use the default database editor to create, alter, and delete tables and indexes.
- Use the Cloud Console to insert, edit, and delete data.
- You can run DML statements using the client libraries, the Google Cloud Console, and the gcloud command-line tool.
- Execute DML statements inside read-write transactions.
- When reading data, Cloud Spanner acquires shared read locks on limited portions of the row ranges being read.
- When writing with DML statements, Cloud Spanner acquires exclusive locks on the data being modified.
- Cloud Spanner executes all the SQL statements (SELECT, INSERT, UPDATE, and DELETE) within a transaction sequentially, not concurrently; the only exception is that multiple SELECT statements can execute concurrently.
- A transaction that includes DML statements has the same limits as any other transaction.
- Use Partitioned DML for large-scale changes
- If a transaction results in more than 20,000 mutations, a BadUsage error is returned.
- If a transaction is larger than 100 MB, a BadUsage error is returned.
- Partitioned DML is designed for bulk updates and deletes, particularly periodic cleanup and backfilling.
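
When Partitioned DML does not fit and changes go through ordinary transactions, the work has to be split on the client so that each transaction stays under the mutation limit. The following is a rough sketch of that client-side batching, not of Partitioned DML itself. It assumes the 20,000-mutation limit quoted above and the simplification that each written row costs one mutation per column. Secondary indexes can add further mutations, so treat the result as an upper bound. The helper names are hypothetical.

```python
from typing import Iterator, List, Sequence

# Per-transaction mutation limit quoted in the notes above.
MUTATION_LIMIT = 20_000


def rows_per_transaction(columns_per_row: int,
                         limit: int = MUTATION_LIMIT) -> int:
    """Largest number of rows that fits in one transaction.

    Simplifying assumption: one mutation per column written per row.
    """
    if columns_per_row < 1:
        raise ValueError("columns_per_row must be >= 1")
    if columns_per_row > limit:
        raise ValueError("a single row already exceeds the mutation limit")
    return limit // columns_per_row


def chunk_rows(rows: Sequence, columns_per_row: int,
               limit: int = MUTATION_LIMIT) -> Iterator[List]:
    """Yield consecutive slices of rows, each small enough for one transaction."""
    size = rows_per_transaction(columns_per_row, limit)
    for start in range(0, len(rows), size):
        yield list(rows[start:start + size])


if __name__ == "__main__":
    # Hypothetical workload: 12,001 rows with 4 columns each.
    rows = list(range(12_001))
    print([len(batch) for batch in chunk_rows(rows, columns_per_row=4)])
```

Each yielded batch would then be written in its own read-write transaction.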
Query
- DQL statements are used to query data.
- A query execution plan is the set of steps that describes how the results are obtained.
- You can retrieve a query plan using the Cloud Console, the client libraries, and the gcloud command-line tool.
Query Best Practices
- Use query parameters to speed up frequently executed queries
- Use secondary indexes to speed up common queries
- Avoid large reads inside read-write transactions
- Use ORDER BY to ensure the ordering of SQL results
- Use STARTS_WITH instead of LIKE to speed up parameterized SQL queries
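
Several of these practices can be combined in one query. The sketch below builds a parameterized prefix search that uses STARTS_WITH with a `@prefix` query parameter and an explicit ORDER BY. The helper name and the `Singers` / `LastName` identifiers are hypothetical. Only the SQL text and the parameter dictionary are produced here; with a client library they would be passed to its query call together with the matching parameter types.

```python
import re
from typing import Dict, Tuple

# Plain identifiers only; identifiers are interpolated into the SQL text,
# so anything else is rejected rather than escaped.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def prefix_query(table: str, column: str, prefix: str) -> Tuple[str, Dict[str, str]]:
    """Return (sql, params) for a parameterized prefix search."""
    for name in (table, column):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"invalid identifier: {name!r}")
    sql = (
        f"SELECT {column} FROM {table} "
        f"WHERE STARTS_WITH({column}, @prefix) "  # instead of LIKE with a pattern
        f"ORDER BY {column}"                      # explicit ordering of results
    )
    # The value travels separately from the SQL text as a query parameter.
    return sql, {"prefix": prefix}


if __name__ == "__main__":
    sql_a, params_a = prefix_query("Singers", "LastName", "Sm")
    sql_b, params_b = prefix_query("Singers", "LastName", "Jo")
    print(sql_a)
    print(params_a, params_b)
    # The SQL text is identical for both searches; only the parameter differs.
    print(sql_a == sql_b)
```

Because the SQL text stays the same from one search to the next, repeated executions differ only in the parameter value, which is what makes query parameters useful for frequently executed queries.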