Dataproc Best Practices: Google Professional Data Engineer (GCP)
- Specify cluster image versions.
- Know when to use custom images.
- Use the Jobs API for submissions.
- Control the location of initialization actions.
- Keep an eye on Dataproc release notes.
- Know how to investigate failures.
- Use Cloud Storage as the primary data source and sink.
- Persist information on how clusters are built.
- Identify a source control mechanism.
- Externalize the Hive metastore database with Cloud SQL.
- Use Cloud IAM for authentication and authorization policies.
- Know your way around Cloud Monitoring and Cloud Logging (formerly Stackdriver).
- Transform YARN queues into workflow templates.
- Start small and enable autoscaling.
- Consolidate job history across multiple clusters.
- Take advantage of managed GCP services.
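The first two bullets (pinning a cluster image version and submitting work through the Jobs API) can be sketched with `gcloud`. Cluster, bucket, region, and file names below are placeholders, not values from this document:

```shell
# Pin the cluster to a specific Dataproc image version so the runtime does
# not silently change when the default image advances (names are placeholders).
gcloud dataproc clusters create example-cluster \
    --region=us-central1 \
    --image-version=2.1-debian11

# Submit work through the Jobs API (via gcloud) instead of SSH'ing into the
# master node: submissions are then logged, retryable, and auditable.
gcloud dataproc jobs submit pyspark gs://example-bucket/jobs/etl_job.py \
    --cluster=example-cluster \
    --region=us-central1 \
    -- --input=gs://example-bucket/raw/ --output=gs://example-bucket/out/
```

Arguments after the bare `--` are passed through to the PySpark job itself rather than to `gcloud`. These commands require an authenticated project, so they are shown as a configuration sketch rather than a runnable script.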
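"Start small and enable autoscaling" usually means creating a modest primary worker pool and letting an autoscaling policy add secondary workers under load. A hypothetical policy, with all sizes and names as assumptions:

```shell
# Hypothetical autoscaling policy: keep 2 primary workers fixed and scale
# secondary workers between 0 and 10 based on pending YARN memory.
cat > autoscaling-policy.yaml <<'EOF'
workerConfig:
  minInstances: 2
  maxInstances: 2
secondaryWorkerConfig:
  minInstances: 0
  maxInstances: 10
basicAlgorithm:
  cooldownPeriod: 4m
  yarnConfig:
    scaleUpFactor: 0.5
    scaleDownFactor: 1.0
    gracefulDecommissionTimeout: 1h
EOF

# Register the policy, then attach it to a small cluster at creation time.
gcloud dataproc autoscaling-policies import example-policy \
    --source=autoscaling-policy.yaml \
    --region=us-central1

gcloud dataproc clusters create example-cluster \
    --region=us-central1 \
    --autoscaling-policy=example-policy
```

`gracefulDecommissionTimeout` gives YARN time to finish work on nodes before they are removed, which matters for long-running Spark stages.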
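"Transform YARN queues into workflow templates" means replacing a long-running cluster with queued jobs by a template that spins up an ephemeral managed cluster per run. A sketch, with all names hypothetical:

```shell
# Hypothetical workflow template standing in for a YARN queue: jobs are
# declared once and run on a short-lived managed cluster.
gcloud dataproc workflow-templates create example-workflow --region=us-central1

# Use an ephemeral managed cluster instead of a persistent one; it is
# created for the run and deleted when the workflow finishes.
gcloud dataproc workflow-templates set-managed-cluster example-workflow \
    --region=us-central1 \
    --cluster-name=ephemeral-cluster \
    --image-version=2.1-debian11

gcloud dataproc workflow-templates add-job pyspark gs://example-bucket/jobs/etl_job.py \
    --workflow-template=example-workflow \
    --region=us-central1 \
    --step-id=etl-step

# Instantiate to run the template's job graph end to end.
gcloud dataproc workflow-templates instantiate example-workflow --region=us-central1
```

Because the cluster exists only for the duration of the workflow, this pattern also pairs naturally with the earlier advice to keep data in Cloud Storage rather than on cluster-local HDFS.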
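Externalizing the Hive metastore into Cloud SQL is commonly done with Google's published Cloud SQL proxy initialization action. The project, region, and instance names below are assumptions, and the initialization-action path follows Google's regional-bucket convention:

```shell
# Hypothetical: point the cluster's Hive metastore at a Cloud SQL instance
# via the cloud-sql-proxy initialization action (all names are assumptions).
REGION=us-central1
gcloud dataproc clusters create example-cluster \
    --region=${REGION} \
    --scopes=sql-admin \
    --initialization-actions=gs://goog-dataproc-initialization-actions-${REGION}/cloud-sql-proxy/cloud-sql-proxy.sh \
    --metadata=hive-metastore-instance=example-project:${REGION}:hive-metastore-sql
```

With the metastore outside the cluster, table definitions survive cluster deletion, so ephemeral clusters (as in the workflow-template pattern) can share one catalog. Per the "control the location of initialization actions" bullet, production clusters should reference a copy of the script in a bucket you own rather than the public one.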