Storing Data: Google Professional Data Engineer (GCP)
Data comes in various types, and Google Cloud offers a storage option for each, as described below.
Object Storage
The tools for object storage are listed below.
Cloud Storage
- A managed object storage service
- Durable and highly-available storage for structured and unstructured data
- Can store
- log files
- database backup
- export files
- images
- binary files.
- Files are organized by project into individual buckets.
- Buckets can support either custom ACLs or IAM controls.
- Logging is handled by Cloud Logging.
- Use cases
- Data backup and disaster recovery
- Content distribution – store and deliver media files
- Storing ETL data
- Storing data for MapReduce jobs
- Storing query data
- Seeding machine learning
- Archiving cold data
- Multiple storage classes offered
- Standard Storage offers the highest availability and low-latency access for frequently accessed data, such as serving website content, interactive storage workloads, data supporting mobile and gaming apps, data-intensive computations, and big data processing.
- Nearline Storage is low-cost, highly durable storage for data accessed about once a month. It gives sub-second response times and is apt for data archiving, online backup, or disaster recovery.
- Coldline Storage is very-low-cost, highly durable storage for data accessed about once a quarter. It gives sub-second response times and is apt for data archiving, online backup, and disaster recovery.
- Archive Storage is the lowest-cost, highly durable storage for data accessed about once a year. It gives sub-second response times and is suitable for data archiving, online backup, and disaster recovery.
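The access-frequency guidance above can be sketched as a small helper. This is a minimal sketch, not an official rule: the function name `suggest_storage_class` and the numeric thresholds (expected reads per year) are assumptions chosen to mirror the once-a-month, once-a-quarter, and once-a-year descriptions in these notes. The returned strings are the storage class names as Cloud Storage spells them.

```python
def suggest_storage_class(reads_per_year: float) -> str:
    """Map an expected read frequency to a Cloud Storage class name.

    The thresholds are illustrative assumptions based on the notes above:
    more than monthly -> STANDARD, up to monthly -> NEARLINE,
    up to quarterly -> COLDLINE, yearly or less -> ARCHIVE.
    """
    if reads_per_year < 0:
        raise ValueError("reads_per_year must be non-negative")
    if reads_per_year > 12:
        return "STANDARD"
    if reads_per_year > 4:
        return "NEARLINE"
    if reads_per_year > 1:
        return "COLDLINE"
    return "ARCHIVE"
```

For example, data read daily maps to `STANDARD`, while a backup read roughly once a quarter maps to `COLDLINE`. Real decisions should also weigh retrieval fees and minimum storage durations, which this sketch ignores.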
Cloud Storage for Firebase
- Scalable storage service for mobile app developers
- Designed to scale with user base.
- Also good for storing and retrieving assets such as images, audio, video, and other user-generated content in mobile and web apps.
- Provides Firebase SDKs for uploads and downloads
- Stores files in a Cloud Storage bucket
- Can do server-side processing such as image filtering or video transcoding
Storing database data
The tools for databases, both RDBMS and NoSQL, are listed below.
Cloud SQL
- A managed service offering MySQL and PostgreSQL engines
- Has built-in support for replication
- Suited to low-latency, transactional, relational database workloads
- Supports standard APIs for connectivity.
- Has built-in backup and restoration, high availability, and read replicas.
- Supports RDBMS workloads up to 30 TB for both MySQL and PostgreSQL.
- Accessible from apps running on App Engine, GKE, or Compute Engine.
- Also supports standard connection drivers and app frameworks (like Django, Ruby on Rails).
- Data stored is encrypted in transit and at rest.
- Also has built-in support for access control, using network firewalls.
- Use cases for Cloud SQL (OLTP workloads)
- Financial transactions
- User credentials
- Customer orders
- Less suitable for OLAP workloads or data needing dynamic schemas on a per-object basis.
- For dynamic schemas, use Datastore. For OLAP, use BigQuery. For wide-column schemas, use Bigtable. For ETL, use Dataflow or Dataproc.
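The service recommendations above can be captured in a simple lookup. This is only a sketch: the table `SERVICE_BY_WORKLOAD`, its keys, and the function `pick_service` are hypothetical names introduced for illustration, and the mapping restates the guidance in these notes rather than any official decision tool.

```python
# Hypothetical lookup table built from the guidance in the notes above.
SERVICE_BY_WORKLOAD = {
    "oltp": "Cloud SQL",
    "olap": "BigQuery",
    "dynamic-schema": "Datastore",
    "wide-column": "Bigtable",
    "etl": "Dataflow or Dataproc",
}


def pick_service(workload: str) -> str:
    """Return the suggested Google Cloud service for a workload label."""
    key = workload.strip().lower()
    try:
        return SERVICE_BY_WORKLOAD[key]
    except KeyError:
        # Unknown labels are rejected rather than silently defaulted.
        raise ValueError(f"unknown workload type: {workload!r}") from None
```

Calling `pick_service("OLAP")` returns `"BigQuery"`, which matches the point that analytical workloads belong in the data warehouse rather than in Cloud SQL.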
Bigtable
- A managed service for wide-column NoSQL
- Designed for terabyte- to petabyte-scale workloads.
- Built on Google’s internal Bigtable database infrastructure
- Provides consistent, low-latency, and high-throughput storage for large-scale NoSQL data. Supports real-time app serving and large-scale analytical workloads.
- Uses a single indexed row key associated with a series of columns
- Queries are based on the row key
- Schemas are structured as tall or wide
- The style of schema depends on the downstream use cases. To maximize performance, consider data locality and the distribution of reads and writes.
- Tall schemas are used for time-series events, where data is keyed by a timestamp and each row has relatively few columns.
- Wide schemas use a simple identifier as the row key, along with a large number of columns.
- Use cases
- Real-time app data
- Stream processing
- IoT time series data
- Adtech workloads
- Data ingestion
- Analytical workloads
- Apache HBase replacement
- No support for multi-row transactions, SQL queries or joins.
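Because queries are based on the row key, key design decides which lookups are cheap. The sketch below illustrates tall and wide keys with plain Python strings and an in-memory sorted list standing in for a table, relying on the fact that Bigtable keeps rows sorted by row key. The helper names, the `#` separator, and the choice to put the device identifier before the timestamp are assumptions made for illustration, not a prescribed layout.

```python
from bisect import bisect_left


def tall_row_key(device_id: str, epoch_seconds: int) -> str:
    """Tall schema: one row per event, keyed by identifier plus timestamp."""
    # Zero-pad so that lexicographic order matches chronological order.
    return f"{device_id}#{epoch_seconds:010d}"


def wide_row_key(user_id: str) -> str:
    """Wide schema: one row per entity; the data lives in many columns."""
    return user_id


def prefix_scan(sorted_keys, prefix):
    """Return every key that starts with prefix from a sorted key list."""
    start = bisect_left(sorted_keys, prefix)
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # Sorted order means no later key can match.
        matches.append(key)
    return matches


# Hypothetical events from two devices, stored in row-key order.
keys = sorted(
    tall_row_key(device, ts)
    for device, ts in [
        ("sensor-b", 1700000060),
        ("sensor-a", 1700000120),
        ("sensor-a", 1700000000),
    ]
)
sensor_a_events = prefix_scan(keys, "sensor-a#")
```

Here `sensor_a_events` holds the two `sensor-a` rows in time order, found with a single contiguous scan. Leading with the identifier rather than the raw timestamp is one common way to spread writes across the key space, which relates to the note above about the distribution of reads and writes.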
Spanner
- A horizontally scalable relational database service
- Has strong consistency, high availability, and global scale.
- Combines the ease of use and familiarity of an RDBMS with the scalability of a NoSQL database.
- Spanner supports
- Schemas
- ACID transactions
- SQL queries (ANSI 2011)
- Scales horizontally within a region and can also scale across regions
- Performs automatic sharding and delivers millisecond latencies.
- Security includes data-layer encryption, audit logging, and Cloud IAM integration.
- Use cases
- Financial services
- Ad tech
- Retail and global supply chain
Firestore
- A flexible, scalable NoSQL database service
- Stores JSON data
- JSON data can be synchronized in real time to connected clients
- The Firestore API lets apps persist data to local disk
- Has a flexible, expression-based rules language
- Firestore Security Rules control access and work with Firebase Authentication
- Use cases
- Chat and social media
- Mobile games
Ecosystem databases
- You can deploy your own database software on Compute Engine VMs
- Supported traditional RDBMS options include EnterpriseDB and Microsoft SQL Server
- Supported NoSQL database systems include MongoDB and Cassandra
Storing data warehouse data
A data warehouse stores large quantities of data for query and analysis instead of transactional processing. For data-warehouse workloads, Google Cloud provides BigQuery.
BigQuery
- A managed data warehouse service
- Supports ingestion through the web interface, command-line tools, and REST API calls.
- Supports bulk loading from CSV, JSON, or Avro files.
- For streaming data, use Pub/Sub and Dataflow
- Can also stream data directly into BigQuery
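For bulk loading JSON, BigQuery load jobs expect newline-delimited JSON, with one object per line. The following is a minimal sketch of preparing such a file body using only the standard library. The function name `to_ndjson` and the sample fields `order_id` and `amount` are hypothetical, and the sketch stops before the actual upload or load job.

```python
import json


def to_ndjson(records):
    """Serialize an iterable of dicts as newline-delimited JSON."""
    # One compact JSON object per line; keys are sorted for stable output.
    return "".join(
        json.dumps(record, sort_keys=True) + "\n" for record in records
    )


# Hypothetical rows destined for a BigQuery table.
rows = [
    {"order_id": 1, "amount": 19.99},
    {"order_id": 2, "amount": 5.0},
]
payload = to_ndjson(rows)
```

The resulting `payload` could be written to a file in a Cloud Storage bucket and then loaded through the web interface, the command-line tools, or the REST API mentioned above.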