Schema and data model Google Professional Data Engineer GCP

Can contain one or more tables
Tables look like relational database tables
Table structured with rows, columns, and values
table have primary keys.
Data is strongly typed
must define a schema for each database and specify the data types of each column.
Allowable data types include scalar and array types
Avoid hot spotting like bigtable
Use nested tables with primary keys called interleaving for faster access.
Do not use sequential numbers, timestamps.
If needed, store timestamps in descending order.

Parent-child table relationships

Two ways to define parent-child relationships – table interleaving and foreign keys.
Table interleaving is good if child table’s primary key includes the parent table’s primary key columns.
Foreign keys
- are not limited to primary key columns
- tables can have multiple foreign key relationships

Primary keys

Every table must have a primary key
primary key can be composed of zero or more columns of that table.
For child, the primary key column(s) of the parent table must be the prefix of the primary key of the child table.
Spanner stores rows in sorted order by primary key values
child rows also inserted between parent rows that share the same primary key prefix.
Spanner can physically co-locate rows of related tables.

Choosing a primary key

primary key uniquely identifies each row in a table.
Options for primary key
- Hash the key and store it in a column.
- Use a Universally Unique Identifier (UUID).
- Bit-reverse sequential values.

Database splits

can define hierarchies of parent-child relationships between tables up to seven layers deep
Spanner divides data into chunks called “splits”
individual splits can move independently from each other and assigned to different servers
A split is a range of rows in a top-level (in other words, non-interleaved) table, where the rows are ordered by primary key.
The start and end keys of this range are called “split boundaries”.
Spanner automatically adds and removes split boundaries

Key columns

Storing NULLs

Disallowed types

These cannot be of type ARRAY:

Replication

Spanner automatically gets replication at the byte level
Spanner writes database mutations to files in this filesystem
Spanner creates replicas of each database split
3 types of replicas: read-write replicas, read-only replicas, and witness replicas.
Single-region instances use only read-write replicas,
multi-region instance configurations use a combination of all three types

For multi-region

Design a schema that prevents hotspots and other performance issues.
Place compute resources for write-heavy workloads within or close to the default leader region for optimal write latency
Use staleness of at least 15 seconds for optimal read performance outside of the default leader region
place critical compute resources in at least two regions to avoid single-region dependency
keep high priority total CPU utilization under 45% in each region.