Cloud Data Loss Prevention (DLP): GCP Data Engineer
In this article, we will learn about Cloud Data Loss Prevention (DLP).
What is Cloud Data Loss Prevention (DLP)?
- Helps to manage sensitive data.
- Provides fast, scalable classification and redaction for sensitive data elements.
- sensitive data elements may include
- credit card numbers
- names
- social security numbers
- phone numbers
- Classifies data using more than 120 predefined information type ("infoType") detectors.
- Can identify patterns, formats, checksums, and contextual clues.
- Can de-identify data using masking, secure hashing, tokenization, bucketing, and format-preserving encryption.
- Can define custom infoType detectors
- Has De-identification techniques
- Inspects a base64-encoded image for text
- Can detect sensitive data within data streams, structured text, and images.
- Can inspect for and redact sensitive text in an image according to defined criteria.
- Can analyze structured data for risk of being re-identified
- Inspection: Cloud DLP inspects the submitted data for the specified infoTypes and returns the detected findings.
- Redaction: After inspection, Cloud DLP redacts any sensitive findings; in images, findings are masked with opaque rectangles.
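The inspect-then-redact flow can be sketched in plain Python, with a regex standing in for a real infoType detector (this is only an illustration of the concept, not the DLP API itself):

```python
import re

# A toy "detector" for US Social Security numbers. The real service ships
# 120+ built-in infoType detectors; this regex is only a stand-in.
SSN_DETECTOR = ("US_SOCIAL_SECURITY_NUMBER", re.compile(r"\b\d{3}-\d{2}-\d{4}\b"))

def inspect(text):
    """Return (infoType name, matched quote) findings, like an inspection job."""
    name, pattern = SSN_DETECTOR
    return [(name, m.group()) for m in pattern.finditer(text)]

def redact(text, masking_character="#"):
    """Mask every finding, like a character-masking transformation."""
    _, pattern = SSN_DETECTOR
    return pattern.sub(lambda m: masking_character * len(m.group()), text)

record = "Patient SSN: 123-45-6789"
print(inspect(record))  # [('US_SOCIAL_SECURITY_NUMBER', '123-45-6789')]
print(redact(record))   # Patient SSN: ###########
```

The real service applies the same two-step logic (detect, then transform) at scale, with checksum and context checks on top of pattern matching.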
InfoType and InfoType detector
- An infoType is a type of sensitive data, such as a name, email address, or telephone number.
- Every infoType has a corresponding detector.
- InfoType detectors are used in scan configurations to determine what to inspect for and how to transform findings.
- InfoType names are used when displaying or reporting scan results.
- Cloud DLP uses infoType detectors and OCR (for text in images).
- Scans can use a combination of built-in and custom infoType detectors.
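As a sketch, an inspection configuration combining a built-in detector with a custom regex detector could look like this (field names follow the DLP API's InspectConfig message; the `EMPLOYEE_ID` name and its pattern are hypothetical examples, not real detectors):

```python
# Sketch of an InspectConfig mixing built-in and custom infoType detectors.
inspect_config = {
    "info_types": [
        {"name": "EMAIL_ADDRESS"},        # built-in infoType detector
        {"name": "CREDIT_CARD_NUMBER"},   # built-in infoType detector
    ],
    "custom_info_types": [
        {
            "info_type": {"name": "EMPLOYEE_ID"},  # hypothetical custom type
            "regex": {"pattern": r"EMP-\d{6}"},    # hypothetical pattern
            "likelihood": "POSSIBLE",
        }
    ],
    "min_likelihood": "LIKELY",  # drop findings below this likelihood
}
```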
Actions
- Actions occur after a DLP job completes successfully.
- 2 types of actions:
- Saving the DLP scan job results to BigQuery.
- Publishing the DLP scan results to a Pub/Sub topic.
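The two action types appear in a job's `actions` list. A minimal sketch, with placeholder project, dataset, table, and topic names:

```python
# Sketch of the two DLP action types (all resource names are placeholders).
actions = [
    {   # 1. Save scan findings to a BigQuery table
        "save_findings": {
            "output_config": {
                "table": {
                    "project_id": "my-project",   # placeholder
                    "dataset_id": "dlp_results",  # placeholder
                    "table_id": "findings",       # placeholder
                }
            }
        }
    },
    {   # 2. Publish a completion notification to a Pub/Sub topic
        "pub_sub": {"topic": "projects/my-project/topics/dlp-done"}  # placeholder
    },
]
```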
Jobs and job triggers
- A job is an action that Cloud DLP runs to either scan content or calculate the risk of re-identification. Cloud DLP creates and runs a job resource whenever instructed.
- 2 types of Cloud DLP jobs:
- Inspection jobs – inspect content for sensitive data
- Risk analysis jobs – analyze de-identified data
- Jobs can be scheduled by creating job triggers.
- A job trigger is an event that automates the creation of DLP jobs.
- Triggers can be scheduled by setting the interval at which each trigger fires.
- Triggers can be configured to look only for new findings.
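A job trigger bundles an inspect job with a schedule. A minimal sketch of a daily trigger (field names follow the DLP API's JobTrigger message; the display name is a placeholder):

```python
# Sketch of a JobTrigger that re-runs an inspection every 24 hours.
job_trigger = {
    "display_name": "daily-scan",  # placeholder
    "inspect_job": {
        "inspect_config": {"info_types": [{"name": "EMAIL_ADDRESS"}]},
    },
    "triggers": [
        # recurrence_period_duration: seconds between runs ("86400s" = 1 day)
        {"schedule": {"recurrence_period_duration": "86400s"}}
    ],
    "status": "HEALTHY",
}
```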
Match likelihood
- Scan results are categorized according to how likely they match an infoType.
- Likelihood reflects how many matching elements a result contains for an infoType.
- Cloud DLP uses a bucketized representation of likelihood:
| ENUM | Description |
|------|-------------|
| LIKELIHOOD_UNSPECIFIED | Default value; same as POSSIBLE. |
| VERY_UNLIKELY | It is very unlikely that the data matches the given infoType. |
| UNLIKELY | It is unlikely that the data matches the given infoType. |
| POSSIBLE | It is possible that the data matches the given infoType. |
| LIKELY | It is likely that the data matches the given infoType. |
| VERY_LIKELY | It is very likely that the data matches the given infoType. |
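The buckets form an ordered scale, with LIKELIHOOD_UNSPECIFIED behaving like POSSIBLE. A minimal sketch of filtering findings by a minimum likelihood, as `min_likelihood` does in an inspect request:

```python
# Ordered likelihood scale, from least to most likely.
LIKELIHOOD_ORDER = ["VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY"]

def rank(likelihood):
    if likelihood == "LIKELIHOOD_UNSPECIFIED":
        likelihood = "POSSIBLE"  # default value behaves like POSSIBLE
    return LIKELIHOOD_ORDER.index(likelihood)

def filter_findings(findings, min_likelihood="POSSIBLE"):
    """Keep only findings at or above the minimum likelihood."""
    return [f for f in findings if rank(f["likelihood"]) >= rank(min_likelihood)]

findings = [
    {"infoType": "EMAIL_ADDRESS", "likelihood": "VERY_LIKELY"},
    {"infoType": "PERSON_NAME", "likelihood": "UNLIKELY"},
]
print(filter_findings(findings, "LIKELY"))
# [{'infoType': 'EMAIL_ADDRESS', 'likelihood': 'VERY_LIKELY'}]
```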
Method types
- DLP includes different methods to inspect or transform data.
- methods can
- inspect data both on and off GCP
- optimize Cloud DLP behavior for different types of workloads.
- 2 method types: Content methods and Storage methods
- Content methods
- are synchronous, stateless methods.
- data is sent directly in the request to the DLP API.
- findings or transformed data is returned in the API response.
- Request data is encrypted in transit and is not persisted.
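A sketch of a content method request: the data travels inline in the request and the transformed text comes back in the response. Field names follow the DLP API's content de-identify request shape; the project path and sample text are placeholders:

```python
# Sketch of a synchronous content de-identification request body.
deidentify_request = {
    "parent": "projects/my-project",  # placeholder project
    "item": {"value": "Contact me at jane@example.com"},  # data sent inline
    "inspect_config": {"info_types": [{"name": "EMAIL_ADDRESS"}]},
    "deidentify_config": {
        "info_type_transformations": {
            "transformations": [
                {   # mask every character of each finding with "*"
                    "primitive_transformation": {
                        "character_mask_config": {"masking_character": "*"}
                    }
                }
            ]
        }
    },
}
```

Because the payload is part of the request, content methods suit small, latency-sensitive workloads; nothing is persisted server-side.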
- Storage methods
- inspect data stored on GCP
- For inspection, create a Cloud DLP job using the dlpJobs resource.
- Each job runs as a managed service that inspects the data and then executes any configured DLP actions.
- Jobs can be managed via the DLP API or from Cloud DLP in the Google Cloud Console.
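A sketch of a storage-method job configuration that scans a Cloud Storage bucket; a `dlpJobs` create call would carry this as the inspect job (the bucket URL and BigQuery names are placeholders):

```python
# Sketch of an inspect job over Cloud Storage with a save-findings action.
inspect_job = {
    "storage_config": {
        "cloud_storage_options": {
            "file_set": {"url": "gs://my-bucket/**"}  # placeholder bucket
        }
    },
    "inspect_config": {"info_types": [{"name": "CREDIT_CARD_NUMBER"}]},
    "actions": [
        {"save_findings": {"output_config": {"table": {
            "project_id": "my-project",  # placeholder
            "dataset_id": "dlp",         # placeholder
            "table_id": "findings",      # placeholder
        }}}}
    ],
}
```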