Storage types
- Bigtable offers two storage types: solid-state drives (SSD) and hard disk drives (HDD).
- SSD storage is the most efficient and cost-effective choice for most use cases.
- HDD storage is sometimes appropriate for very large data sets (>10 TB) that are not latency-sensitive or are infrequently accessed.
- HDD is worth considering when all of the following apply:
- You expect to store at least 10 TB of data.
- The data will not back a user-facing or latency-sensitive application.
- The workload is a batch workload or data archival.
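The rules of thumb above can be condensed into a small decision helper. This is only a sketch: the function name `suggest_storage_type` and its parameters are hypothetical, and the 10 TB threshold comes from these notes rather than from any Bigtable API.

```python
def suggest_storage_type(data_tb: float,
                         latency_sensitive: bool,
                         batch_or_archival: bool) -> str:
    """Return "HDD" only when every HDD criterion from the notes holds.

    Hypothetical helper for illustration; SSD is the default choice.
    """
    hdd_ok = (
        data_tb >= 10              # at least 10 TB of data
        and not latency_sensitive  # not user-facing or latency-sensitive
        and batch_or_archival      # batch workload or data archival
    )
    return "HDD" if hdd_ok else "SSD"
```

For example, a 50 TB archival data set with no latency requirement would map to HDD, while any latency-sensitive workload maps to SSD regardless of size.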
- Application profiles (app profiles):
- For instances using replication, app profiles control how applications connect to the instance’s clusters.
- Without replication, app profiles can still provide a separate identifier for each application.
- A cluster represents the Bigtable service in a specific location.
- Each cluster belongs to a single instance.
- An instance can have up to 4 clusters.
- Application requests are handled by one of the clusters in the instance.
- Each cluster is located in a single zone.
- An instance’s clusters must each be in unique zones.
- You can create an additional cluster in any zone where Bigtable is available.
- Instances with only 1 cluster do not use replication.
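The cluster rules above (1 to 4 clusters, each in a unique zone, replication only with more than 1 cluster) can be checked with a few lines of plain Python. This is a sketch of the rules as stated in these notes, not a call into the Bigtable client library; `validate_clusters` and `MAX_CLUSTERS` are hypothetical names.

```python
MAX_CLUSTERS = 4  # limit as stated in these notes


def validate_clusters(cluster_zones: dict) -> bool:
    """Check a proposed instance layout.

    cluster_zones maps a cluster ID to the zone it lives in.
    Returns True if the layout uses replication (more than 1 cluster).
    Raises ValueError if a rule from the notes is violated.
    """
    if not 1 <= len(cluster_zones) <= MAX_CLUSTERS:
        raise ValueError(f"an instance needs 1 to {MAX_CLUSTERS} clusters")
    zones = list(cluster_zones.values())
    if len(set(zones)) != len(zones):
        raise ValueError("each cluster must be in a unique zone")
    return len(cluster_zones) > 1
```

For example, `validate_clusters({"c1": "us-east1-b", "c2": "us-west1-a"})` reports that replication is in use, while two clusters in the same zone are rejected.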
- Each cluster in an instance has 1 or more nodes.
- Nodes are compute resources that Bigtable uses to manage data.
- Bigtable splits all of the data in a table into smaller tablets.
- Tablets are stored on disk, separate from the nodes but in the same zone as the nodes.
- A tablet is associated with a single node.
- Each node is responsible for:
- Keeping track of specific tablets on disk.
- Handling incoming reads and writes for its tablets.
- Performing maintenance tasks on its tablets.
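The tablet-to-node relationship described above can be pictured with a toy model. The round-robin assignment below is an assumption made purely for illustration (these notes do not describe how Bigtable actually balances tablets), and both function names are hypothetical.

```python
from collections import defaultdict


def assign_tablets(tablets, nodes):
    """Map each tablet to exactly one node (round-robin, for illustration only)."""
    return {tablet: nodes[i % len(nodes)] for i, tablet in enumerate(tablets)}


def tablets_per_node(assignment):
    """Invert the mapping: which tablets does each node keep track of?"""
    per_node = defaultdict(list)
    for tablet, node in assignment.items():
        per_node[node].append(tablet)
    return dict(per_node)
```

The point of the model is the shape of the data: every tablet has one owning node, while a node may own many tablets and serves the reads and writes for all of them.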
- Select or create a GCP project.
- A project name must be between 4 and 30 characters.
- A suggested project ID is provided and can be edited; it must be 6 to 30 characters long, start with a lowercase letter, and not end with a hyphen.
- Make sure billing is enabled for the Google Cloud project.
- Enable the Cloud Bigtable and Cloud Bigtable Admin APIs.
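The naming constraints above can be sketched as two small validators. The length limits and the first/last character rules come from these notes; the allowed character set for project IDs (lowercase letters, digits, and hyphens) is an addition not stated in the notes. The function names are hypothetical.

```python
import re

# 6 to 30 characters: a lowercase letter first, then lowercase letters,
# digits, or hyphens, and no hyphen as the last character.
# The character set is an assumption beyond what the notes state.
_PROJECT_ID_RE = re.compile(r"^[a-z][a-z0-9-]{4,28}[a-z0-9]$")


def is_valid_project_id(project_id: str) -> bool:
    """Check a candidate project ID against the rules sketched above."""
    return _PROJECT_ID_RE.fullmatch(project_id) is not None


def is_valid_project_name(name: str) -> bool:
    """Check only the length rule from the notes (4 to 30 characters)."""
    return 4 <= len(name) <= 30
```

For example, `my-bigtable-proj` passes the ID check, while `my-project-` (trailing hyphen) and `1project` (leading digit) do not.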
Labels:
- A label is a key-value pair.
- Labels help you organize GCP resources.
- You can attach a label to each resource.
- You can then filter resources based on their labels.
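Label-based filtering amounts to matching key-value pairs. The sketch below models resources as plain dictionaries with a `labels` field; that shape and the function name `filter_by_labels` are assumptions for illustration, not the format returned by any Google Cloud API.

```python
def filter_by_labels(resources, required):
    """Return the resources whose labels contain every required key-value pair."""
    return [
        r for r in resources
        if all(r.get("labels", {}).get(k) == v for k, v in required.items())
    ]


# Hypothetical inventory used only to demonstrate the filter.
instances = [
    {"name": "orders-bt", "labels": {"env": "prod", "team": "sales"}},
    {"name": "scratch-bt", "labels": {"env": "dev"}},
    {"name": "unlabeled-bt"},
]
prod_only = filter_by_labels(instances, {"env": "prod"})
```

Here `prod_only` contains just the `orders-bt` entry; resources without the requested label are skipped.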
- By default, you can provision a maximum of 30 Cloud Bigtable nodes per zone in each Google Cloud project.
- To provision more, use the node request form.
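Because the default quota applies per zone across the whole project, it helps to total the planned nodes zone by zone before provisioning. The helper below is a planning sketch only: `zones_over_quota` is a hypothetical name, and the default of 30 is the figure quoted in these notes.

```python
from collections import Counter

DEFAULT_NODES_PER_ZONE = 30  # default quota quoted in these notes


def zones_over_quota(clusters, quota=DEFAULT_NODES_PER_ZONE):
    """Return the zones whose planned node total exceeds the quota.

    clusters is an iterable of (zone, node_count) pairs covering
    every cluster in one project.
    """
    totals = Counter()
    for zone, nodes in clusters:
        totals[zone] += nodes
    return sorted(zone for zone, total in totals.items() if total > quota)
```

Any zone returned by this check would need a quota increase through the node request form before the plan could be provisioned.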
- After creating a Bigtable instance, you can update the following settings:
- number of nodes in each cluster
- number of clusters in the instance
- application profiles for the instance
- labels for the instance
- display name for the instance
- You can add clusters to an existing instance.
- An instance can have a maximum of 4 clusters.
- Clusters can be in any region where Bigtable is available.
- You can delete all but 1 of an instance’s clusters if needed.
- Deleting all but 1 cluster automatically disables replication.
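The deletion rules above can be modeled as a small state change. This is a sketch of the behavior described in these notes, not the Bigtable admin API; `delete_cluster` is a hypothetical function operating on a plain list of cluster IDs.

```python
def delete_cluster(clusters, cluster_id):
    """Remove one cluster and report whether replication is still active.

    Returns (remaining_clusters, replication_enabled).
    Raises KeyError for an unknown cluster and ValueError when the
    deletion would leave the instance with no clusters.
    """
    if cluster_id not in clusters:
        raise KeyError(cluster_id)
    if len(clusters) == 1:
        raise ValueError("an instance must keep at least 1 cluster")
    remaining = [c for c in clusters if c != cluster_id]
    # Replication only applies while more than 1 cluster remains.
    return remaining, len(remaining) > 1
```

Deleting one of two clusters leaves a single cluster with replication reported as disabled, which mirrors the last bullet above.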
- You can monitor a Bigtable instance using Cloud Console and Cloud Monitoring.
- These give a high-level overview of the instance.
- The Key Visualizer tool lets you drill down into usage patterns.