Cloud Pub/Sub Overview Google Professional Data Engineer GCP
- use Pub/Sub as messaging-oriented middleware
- use as event ingestion and delivery for streaming analytics pipelines.
- offers durable message storage and real-time message delivery
- gives high availability and consistent performance at scale.
- It is a publish/subscribe (Pub/Sub) service
- senders of messages are decoupled from the receivers of messages
- Main terms
- Message: the data that moves through the service.
- Topic: a named entity that represents a feed of messages.
- Subscription: a named entity that represents an interest in receiving messages on a particular topic.
- Publisher (also called a producer): creates messages and sends (publishes) them to the messaging service on a specified topic.
- Subscriber (also called a consumer): receives messages on a specified subscription.
- A publisher creates and sends messages to a topic.
- Subscriber applications create a subscription to a topic to receive messages from it.
- Communication can be
- one-to-many (fan-out)
- many-to-one (fan-in),
- many-to-many.
The flow of messages through Pub/Sub is as follows.
In the figure:
- There are two publishers publishing messages on a single topic.
- There are two subscriptions to the topic.
- The first subscription has two subscribers, so messages will be load-balanced across them
- each subscriber receiving a subset of the messages
- The second subscription has one subscriber that will receive all of the messages.
- The bold letters are messages.
- Message A comes from Publisher 1 and is sent to Subscriber 2 via Subscription 1, and to Subscriber 3 via Subscription 2.
- Message B comes from Publisher 2 and is sent to Subscriber 1 via Subscription 1 and to Subscriber 3 via Subscription 2.
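The flow described above can be sketched with a small in-memory model. This is only an illustration, not how Pub/Sub is implemented: the `Topic` class and the subscriber names are hypothetical, and round-robin is an assumed stand-in for the service's load balancing, so the exact subscriber that receives a given message may differ from the figure.

```python
from collections import defaultdict
from itertools import cycle


class Topic:
    """Toy in-memory topic: every subscription sees every message."""

    def __init__(self):
        self.subscriptions = {}

    def subscribe(self, subscription, subscribers):
        # Round-robin is an illustrative stand-in for load balancing.
        self.subscriptions[subscription] = cycle(subscribers)

    def publish(self, message, inboxes):
        # Each subscription gets the message once; one of its
        # subscribers is picked to receive it.
        for subscription, next_subscriber in self.subscriptions.items():
            inboxes[next(next_subscriber)].append((subscription, message))


inboxes = defaultdict(list)
topic = Topic()
topic.subscribe("subscription-1", ["subscriber-1", "subscriber-2"])
topic.subscribe("subscription-2", ["subscriber-3"])

for message in ["A", "B", "C", "D"]:
    topic.publish(message, inboxes)

for name in sorted(inboxes):
    print(name, [m for _, m in inboxes[name]])
```

Subscribers 1 and 2 split the messages between them, while subscriber 3 receives all of them.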
Publisher and subscriber endpoints
- Publishers can be any application that can make HTTPS requests to pubsub.googleapis.com, for example
- an App Engine app
- a web service hosted on Google Compute Engine or any other third-party network
- an app installed on a desktop or mobile device, or even a browser.
- Pull subscribers make HTTPS requests to pubsub.googleapis.com.
- Push subscribers must be Webhook endpoints that can accept POST requests over HTTPS.
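As a minimal sketch of what a push endpoint has to handle, the function below decodes the JSON body of a push delivery. The function name `parse_push_request` and the sample values are hypothetical; the envelope layout (a `message` object with base64 `data`, `attributes`, and `messageId`, plus a `subscription` name) follows the documented push format. A real webhook would also return a success status code to acknowledge the message, which is not shown here.

```python
import base64
import json


def parse_push_request(body: bytes) -> dict:
    """Decode the JSON envelope of a push delivery into its parts."""
    envelope = json.loads(body)
    message = envelope["message"]
    return {
        "subscription": envelope.get("subscription"),
        "message_id": message.get("messageId"),
        "attributes": message.get("attributes", {}),
        # The message body arrives base64-encoded.
        "data": base64.b64decode(message.get("data", "")),
    }


# Hypothetical request body, shaped like a push delivery.
sample_body = json.dumps({
    "message": {
        "data": base64.b64encode(b"hello").decode("ascii"),
        "attributes": {"origin": "example"},
        "messageId": "1",
    },
    "subscription": "projects/myproject/subscriptions/mysubscription",
}).encode("utf-8")

parsed = parse_push_request(sample_body)
print(parsed["data"], parsed["attributes"])
```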
Common use cases
- Balancing workloads in network clusters.
- Implementing asynchronous workflows.
- Distributing event notifications.
- Refreshing distributed caches.
- Logging to multiple systems.
- Data streaming from various processes or devices.
Architecture
- Publisher and subscriber clients are not aware of the location of the servers to which they connect or how those services route the data.
- Load balancing directs publisher traffic to the nearest GCP data center.
- An individual message is stored in a single region.
- A topic may have messages stored in many regions.
- When a subscriber client requests messages published to a topic, it connects to the nearest server.
- Pub/Sub is divided into two primary parts:
- the data plane, which handles moving messages between publishers and subscribers,
- the control plane, which handles the assignment of publishers and subscribers to servers on the data plane.
- Data plane servers are called forwarders.
- Control plane servers are called routers.
- Publishers and subscribers are connected to their assigned forwarders,
- so the control plane of Pub/Sub can be upgraded without affecting any clients.
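- Each message consists of a base64-encoded message body and an arbitrary set of key-value pairs called attributes.
- Pub/Sub imposes no structure on the message body, so any format (for example JSON or XML) must be agreed on and enforced by the publishers and subscribers. Each message has a message ID, assigned by the service and unique within the topic, which a subscriber can use to identify whether the message has already been processed.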
- Messages may be up to 10 MB in total size
- Two methods for message delivery: push subscriptions and pull subscriptions
- In a push subscription, server sends a request to the subscriber app at a preconfigured URL endpoint.
- In the pull model, the subscriber requests messages from the server and acknowledges receipt.
- Push subscriptions have limits of 10,000 messages per second and 10,000 concurrent message deliveries.
- By default, subscriptions are created with an ack deadline of 10 seconds; the deadline can be increased to a maximum of 600 seconds.
- By default, subscriptions expire after 31 days of inactivity.
- The inactivity duration can be configured using subscription expiration policies.
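The message layout described above (base64 body plus string attributes) can be sketched as follows. The helper names and the sample payload are hypothetical, and the size check is a rough assumption: it reads "10 MB" as 10,000,000 bytes and only measures the body, not the attributes.

```python
import base64
import json

MAX_MESSAGE_BYTES = 10 * 1000 * 1000  # assumed reading of "10 MB"


def build_publish_body(payload: dict, attributes: dict) -> str:
    """Build a JSON publish request body holding one message."""
    data = json.dumps(payload).encode("utf-8")
    if len(data) > MAX_MESSAGE_BYTES:
        raise ValueError("message body exceeds the size limit")
    message = {
        # Pub/Sub does not interpret the body; JSON is our own convention.
        "data": base64.b64encode(data).decode("ascii"),
        "attributes": attributes,
    }
    return json.dumps({"messages": [message]})


def decode_message(message: dict) -> dict:
    """Reverse the encoding on the subscriber side."""
    return json.loads(base64.b64decode(message["data"]))


body = build_publish_body({"sensor": "s1", "value": 3}, {"format": "json"})
first = json.loads(body)["messages"][0]
print(first["attributes"], decode_message(first))
```

Carrying the format in an attribute, as in this sketch, is one way for subscribers to know how to parse a body that the service itself treats as opaque bytes.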
Topic and Message Management
- You can create, delete, and view topics using
- the API
- the Google Cloud Console
- the gcloud command-line tool
- You must create a subscription to a topic before subscribers can receive messages published to the topic.
- You can create subscriptions with
- the API
- the Google Cloud Console
- the gcloud command-line tool
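For the API route, a hedged sketch of the two REST calls involved is shown below. It only builds request descriptions and sends nothing; the function names are hypothetical, and a real call would also need an OAuth access token in the request headers. The project, topic, and subscription names are placeholders.

```python
import json

BASE_URL = "https://pubsub.googleapis.com/v1"


def create_topic_request(project: str, topic: str) -> dict:
    """Describe the REST call that creates a topic (nothing is sent)."""
    return {
        "method": "PUT",
        "url": f"{BASE_URL}/projects/{project}/topics/{topic}",
        "body": json.dumps({}),
    }


def create_subscription_request(project: str, subscription: str, topic: str,
                                ack_deadline_seconds: int = 10) -> dict:
    """Describe the REST call that attaches a subscription to a topic."""
    body = {
        "topic": f"projects/{project}/topics/{topic}",
        "ackDeadlineSeconds": ack_deadline_seconds,
    }
    return {
        "method": "PUT",
        "url": f"{BASE_URL}/projects/{project}/subscriptions/{subscription}",
        "body": json.dumps(body),
    }


requests_to_send = [
    create_topic_request("myproject", "mytopic"),
    create_subscription_request("myproject", "mysubscription", "mytopic"),
]
for request in requests_to_send:
    print(request["method"], request["url"], request["body"])
```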
Resource Name
- A resource name uniquely identifies a Pub/Sub resource
- The resource can be a subscription or a topic
- The name must fit the following format: projects/project-identifier/collection/relative-name
- The project-identifier must be the project ID, available from the Google Cloud Console. For example, projects/myproject/topics/mytopic.
- The collection must be one of subscriptions or topics.
- The relative-name must:
- Not begin with the string "goog"
- Start with a letter
- Contain between 3 and 255 characters
- Contain only the following characters:
- Letters: [A-Za-z]
- Numbers: [0-9]
- Dashes: -
- Underscores: _
- Periods: .
- Tildes: ~
- Plus signs: +
- Percent signs: %
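The naming rules above can be checked with a short validator. This is a sketch based only on the rules listed here; the function names are hypothetical and the service may apply further checks.

```python
import re

# A letter first, then 2 to 254 more allowed characters (3 to 255 in total).
_RELATIVE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9\-_.~+%]{2,254}")


def is_valid_relative_name(name: str) -> bool:
    """Return True if the name follows the rules listed above."""
    if name.startswith("goog"):
        return False
    return _RELATIVE_NAME.fullmatch(name) is not None


def resource_name(project_id: str, collection: str, relative_name: str) -> str:
    """Build projects/project-identifier/collection/relative-name."""
    if collection not in ("topics", "subscriptions"):
        raise ValueError("collection must be 'topics' or 'subscriptions'")
    if not is_valid_relative_name(relative_name):
        raise ValueError(f"invalid relative name: {relative_name!r}")
    return f"projects/{project_id}/{collection}/{relative_name}"


print(resource_name("myproject", "topics", "mytopic"))
for candidate in ["mytopic", "my", "googtopic", "1topic", "my topic"]:
    print(candidate, is_valid_relative_name(candidate))
```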
Message Storage Security
- If you publish messages to a global endpoint, they are automatically stored in the nearest Google Cloud region.
- A topic's message storage policy ensures that all data published to the topic is persisted in a specific region or set of regions, regardless of the publish request's origin.
- When multiple regions are allowed by the policy, Pub/Sub chooses the nearest allowed region.
- To configure all of the topics in an organization-wide scope, use the Resource Location Restriction organization policy.
- For fine-grained control, configure a topic’s message storage policy at topic creation, or via the UpdateTopic operation.
- You can configure the policy using the:
- Topic details view
- gcloud command-line tool
- Service API (using client libraries)
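As a rough sketch of the API route, the function below builds the body of an UpdateTopic request that restricts storage to a set of regions. Nothing is sent, the function name and region values are hypothetical, and the field names (`messageStoragePolicy`, `allowedPersistenceRegions`, `updateMask`) are written from the REST reference as best understood here, so check them against the current API documentation before use.

```python
import json

BASE_URL = "https://pubsub.googleapis.com/v1"


def update_storage_policy_request(topic_name: str, regions: list) -> dict:
    """Describe an UpdateTopic call that sets the allowed storage regions."""
    if not regions:
        raise ValueError("at least one region is required")
    body = {
        "topic": {
            "name": topic_name,
            "messageStoragePolicy": {"allowedPersistenceRegions": list(regions)},
        },
        # Only the storage policy is updated; other topic fields stay as-is.
        "updateMask": "messageStoragePolicy",
    }
    return {
        "method": "PATCH",
        "url": f"{BASE_URL}/{topic_name}",
        "body": json.dumps(body),
    }


request = update_storage_policy_request(
    "projects/myproject/topics/mytopic", ["europe-west1", "europe-west4"]
)
print(request["method"], request["url"])
print(request["body"])
```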
Authentication and Access
- The following authentication methods are allowed
- Service accounts
- User accounts: you can authenticate users directly to your application when the application needs to access resources on behalf of an end user.
- Pub/Sub uses Cloud IAM for access control
- Access control can be configured at the project level and at the individual resource level.
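Resource-level access control can be sketched as a setIamPolicy request on a single topic. The function name and the service account address are hypothetical, and nothing is sent. Note that setIamPolicy replaces the whole policy of the resource, so a real caller would normally read the current policy first and add the binding to it.

```python
import json


def topic_publisher_policy_request(topic_name: str, service_account_email: str) -> dict:
    """Describe a setIamPolicy call granting publish rights on one topic."""
    policy = {
        "bindings": [
            {
                "role": "roles/pubsub.publisher",
                "members": [f"serviceAccount:{service_account_email}"],
            }
        ]
    }
    return {
        "method": "POST",
        "url": f"https://pubsub.googleapis.com/v1/{topic_name}:setIamPolicy",
        "body": json.dumps({"policy": policy}),
    }


iam_request = topic_publisher_policy_request(
    "projects/myproject/topics/mytopic",
    "publisher@myproject.iam.gserviceaccount.com",  # placeholder account
)
print(iam_request["url"])
print(iam_request["body"])
```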