Dataproc Best Practices: Google Professional Data Engineer (GCP)

  • Specify cluster image versions (example after this list).
  • Know when to use custom images.
  • Use the Jobs API for submissions (example after this list).
  • Control the location of initialization actions.
  • Keep an eye on Dataproc release notes.
  • Know how to investigate failures.
  • Use Google Cloud Storage as the primary data source and sink (the job example after this list reads and writes gs:// paths).
  • Persist information on how to build clusters.
  • Identify a source control mechanism.
  • Externalize the Hive metastore database with Cloud SQL (example after this list).
  • Use cloud authentication and authorization policies.
  • Know your way around Stackdriver, now Cloud Logging and Cloud Monitoring (example after this list).
  • Transform YARN queues into workflow templates (example after this list).
  • Start small and enable autoscaling (example after this list).
  • Consolidate job history across multiple clusters (example after this list).
  • Take advantage of GCP services.
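
Pinning the image version makes cluster behavior reproducible across recreations. Below is a minimal sketch using the google-cloud-dataproc Python client; the project ID, region, cluster name, machine types, and the "2.1-debian11" version string are illustrative placeholders, not values from this article.

```python
from google.cloud import dataproc_v1

PROJECT_ID = "my-project"   # placeholder
REGION = "us-central1"      # placeholder

cluster_client = dataproc_v1.ClusterControllerClient(
    client_options={"api_endpoint": f"{REGION}-dataproc.googleapis.com:443"}
)

cluster = {
    "project_id": PROJECT_ID,
    "cluster_name": "analytics-cluster",
    "config": {
        # Pin the image so recreated clusters run the same OS and
        # Hadoop/Spark component versions.
        "software_config": {"image_version": "2.1-debian11"},
        "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-4"},
        "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-4"},
    },
}

operation = cluster_client.create_cluster(
    request={"project_id": PROJECT_ID, "region": REGION, "cluster": cluster}
)
print(operation.result().cluster_name)
```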
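
Submitting through the Jobs API, rather than SSH-ing into the master node and running spark-submit by hand, gives you job state, history, and driver output in one place, and it pairs naturally with Cloud Storage as the primary source and sink. A sketch under the same placeholder project, region, and bucket assumptions:

```python
from google.cloud import dataproc_v1

PROJECT_ID = "my-project"   # placeholder
REGION = "us-central1"      # placeholder

job_client = dataproc_v1.JobControllerClient(
    client_options={"api_endpoint": f"{REGION}-dataproc.googleapis.com:443"}
)

job = {
    "placement": {"cluster_name": "analytics-cluster"},
    "pyspark_job": {
        # Reading from and writing to gs:// keeps data off cluster-local
        # HDFS, so clusters stay disposable.
        "main_python_file_uri": "gs://my-bucket/jobs/etl.py",
        "args": ["gs://my-bucket/input/", "gs://my-bucket/output/"],
    },
}

finished = job_client.submit_job_as_operation(
    request={"project_id": PROJECT_ID, "region": REGION, "job": job}
).result()

# The driver's output lands in Cloud Storage, which is the first place
# to look when investigating failures.
print(finished.driver_output_resource_uri)
```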
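
One documented way to externalize the Hive metastore is Google's published cloud-sql-proxy initialization action backed by a Cloud SQL instance; the instance name and regional bucket below are placeholders. Copying the script into a bucket you own is the same "control the location of initialization actions" advice from the list above.

```python
# These fragments would be merged into the "config" dict of the
# create_cluster sketch above.
metastore_cluster_config = {
    "gce_cluster_config": {
        "metadata": {
            # PROJECT:REGION:INSTANCE of the Cloud SQL instance that
            # hosts the shared Hive metastore database.
            "hive-metastore-instance": "my-project:us-central1:hive-metastore",
        }
    },
    "initialization_actions": [
        {
            # Google's published action; for production, copy it into a
            # bucket you control and reference that copy instead.
            "executable_file": (
                "gs://goog-dataproc-initialization-actions-us-central1/"
                "cloud-sql-proxy/cloud-sql-proxy.sh"
            )
        }
    ],
}
```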
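
Stackdriver has since been split into Cloud Logging and Cloud Monitoring, and Dataproc cluster logs are queryable there. A sketch with the google-cloud-logging client; the project and cluster name in the filter are placeholders.

```python
from google.cloud import logging

client = logging.Client(project="my-project")  # placeholder project

# Pull recent error-level entries for one cluster.
log_filter = (
    'resource.type="cloud_dataproc_cluster" '
    'resource.labels.cluster_name="analytics-cluster" '
    "severity>=ERROR"
)

for entry in client.list_entries(filter_=log_filter, max_results=20):
    print(entry.timestamp, entry.payload)
```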
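
Replacing a long-lived, YARN-queue-partitioned cluster with workflow templates means each workload gets its own ephemeral managed cluster instead of a slice of a shared one. A sketch, again with placeholder names:

```python
from google.cloud import dataproc_v1

PROJECT_ID = "my-project"   # placeholder
REGION = "us-central1"      # placeholder
PARENT = f"projects/{PROJECT_ID}/regions/{REGION}"

wt_client = dataproc_v1.WorkflowTemplateServiceClient(
    client_options={"api_endpoint": f"{REGION}-dataproc.googleapis.com:443"}
)

template = {
    "id": "nightly-etl",
    "placement": {
        # A managed cluster is created per run and deleted afterwards,
        # so there is no shared cluster to partition with YARN queues.
        "managed_cluster": {
            "cluster_name": "nightly-etl-cluster",
            "config": {"software_config": {"image_version": "2.1-debian11"}},
        }
    },
    "jobs": [
        {
            "step_id": "etl",
            "pyspark_job": {"main_python_file_uri": "gs://my-bucket/jobs/etl.py"},
        }
    ],
}

wt_client.create_workflow_template(request={"parent": PARENT, "template": template})

# Each instantiation creates the cluster, runs the DAG, and tears it down.
wt_client.instantiate_workflow_template(
    request={"name": f"{PARENT}/workflowTemplates/nightly-etl"}
).result()
```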
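
"Start small" translates to modest minimum worker counts plus an autoscaling policy that adds capacity when YARN reports pending memory. A sketch; the policy id, instance bounds, and scaling factors are placeholder values to tune for your workload:

```python
from google.cloud import dataproc_v1

PROJECT_ID = "my-project"   # placeholder
REGION = "us-central1"      # placeholder

policy_client = dataproc_v1.AutoscalingPolicyServiceClient(
    client_options={"api_endpoint": f"{REGION}-dataproc.googleapis.com:443"}
)

policy = {
    "id": "start-small",
    # Begin with two primary workers and let the policy scale out.
    "worker_config": {"min_instances": 2, "max_instances": 10},
    "secondary_worker_config": {"min_instances": 0, "max_instances": 20},
    "basic_algorithm": {
        "yarn_config": {
            "scale_up_factor": 0.5,    # add capacity for half of pending memory
            "scale_down_factor": 1.0,  # release all idle capacity
            "graceful_decommission_timeout": {"seconds": 3600},
        }
    },
}

policy_client.create_autoscaling_policy(
    request={"parent": f"projects/{PROJECT_ID}/regions/{REGION}", "policy": policy}
)

# Clusters opt in by referencing the policy in their config:
#   "autoscaling_config": {"policy_uri": ".../autoscalingPolicies/start-small"}
```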
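
Job history can be consolidated by pointing every ephemeral cluster's Spark event logs and MapReduce job history at one shared bucket, then running a single long-lived history-server cluster that reads from it. The bucket name below is a placeholder, and the property keys follow the documented persistent-history pattern:

```python
# Properties to set under "software_config" -> "properties" on every
# job cluster, so history outlives the cluster itself.
shared_history_properties = {
    "spark:spark.eventLog.enabled": "true",
    "spark:spark.eventLog.dir": "gs://my-history-bucket/spark-events",
    "spark:spark.history.fs.logDirectory": "gs://my-history-bucket/spark-events",
    "mapred:mapreduce.jobhistory.done-dir": "gs://my-history-bucket/mr-done",
    "mapred:mapreduce.jobhistory.intermediate-done-dir": (
        "gs://my-history-bucket/mr-intermediate"
    ),
}
```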