Google Products and Storage Options Google Professional Data Engineer GCP
This section covers the storage systems available in Google Cloud and their typical uses.
Cloud SQL
- A fully managed relational database service
- Easily set up and manage relational databases (PostgreSQL, MySQL, and SQL Server) in GCP
- Well suited to WordPress sites, application backends, CRM tools, and existing MySQL, PostgreSQL, or Microsoft SQL Server workloads
Cloud Spanner
- A scalable relational database service
- Full transaction support
- Provides strong consistency and high availability
- Useful for mission-critical applications
- Provides scale insurance
Cloud Bigtable
- NoSQL database service from GCP
- Provides low latency reads
- Supports high throughput writes
- Enables scalability and reliability
- Suitable for large analytical workloads and low-latency applications
- Stores large amounts of structured objects.
- No support for SQL queries or multi-row transactions.
- Provides petabytes of capacity, with a maximum unit size of 10 megabytes per cell and 100 megabytes per row.
Cloud Memorystore
- A managed in-memory data store service for Redis
- Useful for sub-millisecond data access using Redis
- Can build application caches
- Provides scalable, secure and highly available GCP infrastructure.
Cloud Firestore
- Managed, serverless, cloud-native NoSQL document database.
- Useful for client-side mobile and web applications and gaming leaderboards
Firebase Realtime Database
- A NoSQL database from GCP to store and sync data between users in real time.
- Useful for
  - creating onboarding flows
  - rolling out new features
  - building serverless apps
BigQuery
- Serverless, highly scalable, and cost-effective data warehouse service
- Lowers data warehouse costs because GCP manages all of the infrastructure
- Useful for
  - real-time analytics
  - advanced and predictive analytics
  - large-scale events
Cloud Datastore
- NoSQL document database service
- Fully-managed service by GCP
- Easy scalability without configuration or downtime.
- Useful for
  - user profiles
  - product catalogues
  - mobile games
- Useful for web and mobile applications that may require massive scale in the future
- Supports storage of unstructured objects, transactions, and SQL-like queries.
- Provides terabytes of capacity, with a maximum unit size of one megabyte per entity.
Cloud Storage
- For storing immutable blobs larger than 10 megabytes like images or movies.
- Provides huge capacity with a maximum unit size of five terabytes per object.
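The maximum unit sizes quoted above for Cloud Datastore, Cloud Bigtable, and Cloud Storage can be gathered into a small lookup. The following is a minimal sketch, not an official API: the names `MAX_UNIT_BYTES` and `services_that_fit` are hypothetical, and decimal units (1 MB = 10^6 bytes) are assumed because the notes do not say which convention applies.

```python
# Maximum unit sizes quoted in these notes, expressed in bytes.
# Assumption: decimal units (1 MB = 10**6 bytes, 1 TB = 10**12 bytes).
MB = 10**6
TB = 10**12

MAX_UNIT_BYTES = {
    "Cloud Datastore (per entity)": 1 * MB,
    "Cloud Bigtable (per cell)": 10 * MB,
    "Cloud Bigtable (per row)": 100 * MB,
    "Cloud Storage (per object)": 5 * TB,
}


def services_that_fit(size_bytes):
    """Return the storage units whose quoted maximum size can hold size_bytes."""
    return [name for name, limit in MAX_UNIT_BYTES.items() if size_bytes <= limit]


# A 50 MB video clip exceeds the entity and cell limits,
# so only a Bigtable row or a Cloud Storage object can hold it.
print(services_that_fit(50 * MB))
```

Fitting within a size limit is only one criterion. The notes above recommend Cloud Storage for immutable blobs larger than 10 megabytes even where another unit could technically hold them.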
Select the right GCP database service
- Existing database – GCP database service
- Redis – Cloud Memorystore for Redis
- Memcached – App Engine for Memcached
- MySQL – Cloud SQL for MySQL
- PostgreSQL – Cloud SQL for PostgreSQL
- SQL Server – Cloud SQL for SQL Server
- HBase – Cloud Bigtable
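The mapping above is a straightforward lookup from an existing database to its GCP counterpart. A minimal sketch follows; the dictionary and the function name `gcp_target` are hypothetical helpers for illustration, and the values simply restate the list.

```python
# Existing database -> GCP database service, as listed in the notes above.
MIGRATION_TARGETS = {
    "redis": "Cloud Memorystore for Redis",
    "memcached": "App Engine for Memcached",
    "mysql": "Cloud SQL for MySQL",
    "postgresql": "Cloud SQL for PostgreSQL",
    "sql server": "Cloud SQL for SQL Server",
    "hbase": "Cloud Bigtable",
}


def gcp_target(existing_db):
    """Return the listed GCP service for an existing database, or None if unlisted."""
    key = existing_db.strip().lower()  # tolerate case and surrounding whitespace
    return MIGRATION_TARGETS.get(key)


print(gcp_target("HBase"))   # Cloud Bigtable
print(gcp_target("Oracle"))  # None: not covered by the list above
```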
Use Case
- If you need full SQL support with OLTP, use Cloud SQL or Cloud Spanner. Cloud SQL provides terabytes of capacity and Cloud Spanner provides petabytes of capacity.
- For big data analysis and interactive queries, use BigQuery.
- For semi-structured application data, use Cloud Datastore.
- For analytical data with heavy read/write events, such as advertising technology, financial, or IoT data, use Cloud Bigtable.
- To store structured and unstructured binary data, such as images, multimedia files, and backups, use Cloud Storage.
- For popular web frameworks, use Cloud SQL.
- For very large database applications of more than 2 terabytes, such as financial trading and e-commerce, use Cloud Spanner.
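The use-case rules above can be read as a simple decision procedure. The sketch below encodes them under stated assumptions: the workload labels (`"oltp_sql"`, `"interactive_analytics"`, and so on) and the function name `suggest_service` are invented for illustration, and the 2-terabyte cutoff between Cloud SQL and Cloud Spanner is taken from the last rule in the list.

```python
# Hypothetical workload labels mapped to the service named in the rules above.
SIMPLE_RULES = {
    "interactive_analytics": "BigQuery",
    "semi_structured": "Cloud Datastore",
    "heavy_read_write": "Cloud Bigtable",
    "binary_objects": "Cloud Storage",
    "web_framework": "Cloud SQL",
}


def suggest_service(workload, size_tb=0.0):
    """Suggest a GCP storage service for a workload label, or None if no rule applies."""
    if workload == "oltp_sql":
        # Full SQL with OLTP: Cloud Spanner above 2 TB, Cloud SQL otherwise.
        return "Cloud Spanner" if size_tb > 2 else "Cloud SQL"
    return SIMPLE_RULES.get(workload)


print(suggest_service("oltp_sql", size_tb=0.5))  # Cloud SQL
print(suggest_service("oltp_sql", size_tb=10))   # Cloud Spanner
print(suggest_service("heavy_read_write"))       # Cloud Bigtable
```

Real selection involves more factors than a single label and a size, so this is a starting point for reasoning rather than a complete selector.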
Feature Comparison Table
Relational | NoSQL / Nonrelational | |||||
Cloud Spanner | Cloud SQL | Cloud Bigtable | Cloud Firestore | Firebase Realtime Database | Cloud Memorystore | |
Scale insurance | Yes | Yes | Yes | |||
Data distribution | Regional or global | zonal | Regional or global | Regional or global | zonal | zonal |
OSS compatibility | Yes | Yes | Yes | |||
Replica consistency | strong | strong | eventual | strong | n/a | eventual |
Multi-primary | Yes | Yes | Yes | |||
Transactions (strong consistency) | Yes | Yes | Yes | |||
Joins and complex queries | Yes | Yes | ||||
Ultra low latency (microsecond or single-digit ms) | Yes | Yes | Yes | |||
Serverless | Yes | Yes | ||||
Realtime sync to clients | Yes | Yes | ||||
In-memory | Yes | |||||
Direct client access | Yes | Yes | ||||
Game state | Yes | Yes | Yes | |||
Gaming leaderboard/player profile | Yes | Yes | Yes |
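As an illustration of querying the comparison table, the sketch below encodes only the two rows that list a value for every service (data distribution and replica consistency) and filters on them. The names `TABLE` and `filter_services` are hypothetical, and the other rows are left out because their column positions cannot be read reliably from the table as shown.

```python
# Two rows of the comparison table, keyed by service name.
# Each value is (data distribution, replica consistency) as listed in the table.
TABLE = {
    "Cloud Spanner": ("regional or global", "strong"),
    "Cloud SQL": ("zonal", "strong"),
    "Cloud Bigtable": ("regional or global", "eventual"),
    "Cloud Firestore": ("regional or global", "strong"),
    "Firebase Realtime Database": ("zonal", "n/a"),
    "Cloud Memorystore": ("zonal", "eventual"),
}


def filter_services(distribution=None, consistency=None):
    """Return services matching the given criteria; None means 'any'."""
    matches = []
    for name, (dist, cons) in TABLE.items():
        if distribution is not None and dist != distribution:
            continue
        if consistency is not None and cons != consistency:
            continue
        matches.append(name)
    return matches


# Services that are both regional or global and strongly consistent.
print(filter_services(distribution="regional or global", consistency="strong"))
```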
Evaluating Cloud Storage Options
The key considerations when evaluating GCP data storage options are:
- Scalability – Able to scale as requirements grow
- Durability – High availability and consistency for storing critical data
- Able to store unstructured, semi-structured, and structured data
- Free from a fixed schema
- Separation of components – Able to decouple storage, compute, and other components so that each can scale independently
- Cost-effective – Should be cost-effective and offer a pay-as-you-use model
Functional requirements
- Data format – Type of data to store, such as transactional data, JSON objects, telemetry, search indexes, or flat files
- Scale and structure – Total storage capacity needed and whether the data must be partitioned
- Data size supported – Size of the entities to store, and whether each must be stored as a single document or can be split across multiple documents
- Data relationships – Whether one-to-one, one-to-many, or many-to-many relationships must be supported
- Consistency model – Whether ACID guarantees are needed or eventual consistency is acceptable
- Concurrency – Whether pessimistic or optimistic concurrency control is needed during data updates and synchronization
- Schema flexibility – Whether a fixed schema or a schemaless model is needed
- Data lifecycle.
- Data movement – How much data must be moved, for example through ETL into data stores or data warehouses
Non-functional requirements
- Performance and scalability – Required data ingestion and processing rates, and acceptable query response times
- Level of fault tolerance needed, along with backup and restore capabilities
- Replication capabilities needed, such as distributing data among multiple replicas or regions
Management and cost
- Managed service – Whether a managed service is available for easier management
- Region availability – Whether the service is available in all regions or only selected ones
- Whether existing data needs to be migrated
- Proprietary versus OSS usage and licensing
- Overall cost
Security
- Type of encryption used and authentication mechanism needed
- Level of detail in audit logs and what can be audited
- Networking requirements – Any restrictions on access to data
DevOps
- Skill set – Specific programming languages, operating systems, or other technologies needed
- Clients – Client support for the required development languages