Cloud Composer Overview: Google Professional Data Engineer GCP
In this, we will get Cloud Composer Overview.
Cloud Composer Overview:
- Is a managed workflow orchestration service that is built on Airflow
- deploys multiple components to run Airflow.
- Composer relies on certain configurations to successfully execute workflows.
- Altering configurations can have unintended consequences or break Airflow deployment.
Environments
- Airflow is a micro-service architected framework.
- To deploy Airflow, provision many GCP components, called Cloud Composer environment.
- Can create one or more Cloud Composer environments inside of a project.
- Environments are self-contained Airflow deployments based on GKE.
- environments work with Google Cloud services.
- create Cloud Composer environments in supported regions
- environments run within a Compute Engine zone.
- Airflow communicates with other Google Cloud products through the products’ public APIs.
Architecture
Cloud Composer distributes the environment’s resources between a Google-managed tenant project and a customer project, as
Tenant project resources
- For unified Cloud IAM, access control and data security, Cloud Composer deploys Cloud SQL and App Engine in the tenant project.
Cloud SQL
- Cloud SQL stores the Airflow metadata.
- Composer limits database access to the default or the specified custom service account used to create the environment.
- Composer backs up the Airflow metadata daily to minimize potential data loss.
- Only service account used to create the Composer environment can access data in the Cloud SQL database.
App Engine
- Its flexible environment hosts the Airflow web server.
- By default, the Airflow web server is integrated with Identity-Aware Proxy.
- Also enables you to use the Cloud Composer IAM policy to manage web server access.
- Composer also supports deploying a self-managed Airflow web server in the customer project.
Customer project resources
Composer deploys following in customer project.
- Cloud Storage: provides the storage bucket for staging DAGs, plugins, data dependencies, and logs.
- Google Kubernetes Engine: By default, Cloud Composer deploys core components—such as Airflow scheduler, worker nodes, and CeleryExecutor—in a GKE. Composer also supports VPC-native clusters using alias IPs.
- Redis, the message broker for the CeleryExecutor, runs as a StatefulSet application so that messages persist across container restarts.
- Cloud Logging and Cloud Monitoring: Composer integrates with Cloud Logging and Cloud Monitoring, to view all Airflow service and workflow logs.
Cloud Composer Environment Component
Components for each environment:
- Web server: The web server runs the Apache Airflow web interface, and Identity-Aware Proxy protects the interface.
- Database: The database holds the Apache Airflow metadata.
- Cloud Storage bucket: bucket stores the DAGs, logs, custom plugins, and data for the environment.
Airflow management:
Use following Airflow-native tools for management
- Web interface: Access Airflow web interface from the Google Cloud Console or by direct URL.
- Command line tools: run gcloud composer commands to issue Airflow command-line commands.
- Cloud Composer REST and RPC APIs
Airflow configuration:
- configurations Composer provides for Apache Airflow are the same as the configurations for a locally-hosted Airflow deployment.
- Some Airflow configurations are preconfigured
- cannot change the configuration properties.
- Other configurations, to be specified when creating or updating environment.
Airflow DAGs (workflows):
- An Apache Airflow DAG is a workflow: a collection of tasks with additional task dependencies.
- Cloud Storage used to store DAGs.
- To add or remove DAGs add or remove the DAGs from the Cloud Storage bucket
- Can schedule DAGs
- can trigger DAGs manually or in response to events
Plugins
- can install custom plugins, into Cloud Composer environment.
Python dependencies
- can install Python dependencies from Python Package Index.
Access control
- manage security at the Google Cloud project level
- assign Cloud IAM roles for control.
- Without appropriate Cloud Composer IAM role, no access to any of environments.
Logging and monitoring:
- can view Airflow logs that are associated with single DAG tasks
- View in the Airflow web interface and
- the logs folder in the associated Cloud Storage bucket.
- Streaming logs are available for Cloud Composer.
- access streaming logs in Logs Viewer in Google Cloud Console.
- Also has audit logs, such as Admin Activity audit logs, for Google Cloud projects.
Networking and security:
During environment creation, following configuration options available
- Cloud Composer environment with a route-based GKE cluster (default)
- Private IP Cloud Composer environment
- Cloud Composer environment with a VPC Native GKE cluster using alias IP addresses
- Shared VPC
Create a Project
To create a project and enable the Cloud Composer API:
- In the Cloud Console, select or create a project.
- Make sure that billing is enabled for project.
- To activate the Cloud Composer API in a new or existing project, go to the API Overview page for Cloud Composer.
- Click Enable.
Cloud Composer Versioning
- Airflow follows the semantical software versioning schema.
- Composer supports last two stable minor Airflow releases and latest two patch versions for those minor releases.
Google Professional Data Engineer (GCP) Free Practice TestTake a Quiz