Cloud Data Loss Prevention (DLP): GCP Data Engineer
In this article, we will learn about Cloud Data Loss Prevention (DLP).
What is Cloud Data Loss Prevention (DLP)?
- Helps to manage sensitive data.
- Provides fast, scalable classification and redaction for sensitive data elements.
- sensitive data elements may include
- credit card numbers
- names
- social security numbers
- phone numbers
- Classifies data using more than 120 predefined information type ("infoType") detectors.
- Can identify patterns, formats, checksums, and contextual clues.
- Can de-identify data using masking, secure hashing, tokenization, bucketing, and format-preserving encryption.
- Can define custom infoType detectors
- Has De-identification techniques
- Inspects a base64-encoded image for text
- Can detect sensitive data within data streams, structured text, and images.
- Can inspect for and redact sensitive text in an image according to defined criteria.
- Can analyze structured data for risk of being re-identified
- Inspection: Cloud DLP inspects the submitted data for the specified infoTypes and returns the detected findings.
- Redaction: After inspection, Cloud DLP redacts any sensitive findings; in images, findings are masked with opaque rectangles.
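The inspect-then-redact flow can be sketched in plain Python, with a regex standing in for a real infoType detector (this is only an illustration of the concept, not the DLP API itself):

```python
import re

# A toy "detector" for US Social Security numbers. The real service ships
# 120+ built-in infoType detectors; this regex is only a stand-in.
SSN_DETECTOR = ("US_SOCIAL_SECURITY_NUMBER", re.compile(r"\b\d{3}-\d{2}-\d{4}\b"))

def inspect(text):
    """Return (infoType name, matched quote) findings, like an inspection job."""
    name, pattern = SSN_DETECTOR
    return [(name, m.group()) for m in pattern.finditer(text)]

def redact(text, masking_character="#"):
    """Mask every finding, like a character-masking transformation."""
    _, pattern = SSN_DETECTOR
    return pattern.sub(lambda m: masking_character * len(m.group()), text)

record = "Patient SSN: 123-45-6789"
print(inspect(record))  # [('US_SOCIAL_SECURITY_NUMBER', '123-45-6789')]
print(redact(record))   # Patient SSN: ###########
```

The real service applies the same two-step logic (detect, then transform) at scale, with checksum and context checks on top of pattern matching.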
InfoType and InfoType detector
- An infoType is a type of sensitive data, such as a name, email address, or telephone number.
- Every infoType has a corresponding detector.
- InfoType detectors are used in scan configurations to determine what to inspect for and how to transform findings.
- InfoType names are used when displaying or reporting scan results.
- Cloud DLP uses infoType detectors and OCR (for text in images).
- Scans can use a combination of built-in and custom infoType detectors.
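As a sketch, an inspection configuration combining a built-in detector with a custom regex detector could look like this (field names follow the DLP API's InspectConfig message; the `EMPLOYEE_ID` name and its pattern are hypothetical examples, not real detectors):

```python
# Sketch of an InspectConfig mixing built-in and custom infoType detectors.
inspect_config = {
    "info_types": [
        {"name": "EMAIL_ADDRESS"},        # built-in infoType detector
        {"name": "CREDIT_CARD_NUMBER"},   # built-in infoType detector
    ],
    "custom_info_types": [
        {
            "info_type": {"name": "EMPLOYEE_ID"},  # hypothetical custom type
            "regex": {"pattern": r"EMP-\d{6}"},    # hypothetical pattern
            "likelihood": "POSSIBLE",
        }
    ],
    "min_likelihood": "LIKELY",  # drop findings below this likelihood
}
```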
Actions
- Actions occur after a DLP job completes successfully.
- 2 types of actions:
- Saving the DLP scan job results to BigQuery.
- Publishing the DLP scan results to a Pub/Sub topic.
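The two action types appear in a job's `actions` list. A minimal sketch, with placeholder project, dataset, table, and topic names:

```python
# Sketch of the two DLP action types (all resource names are placeholders).
actions = [
    {   # 1. Save scan findings to a BigQuery table
        "save_findings": {
            "output_config": {
                "table": {
                    "project_id": "my-project",   # placeholder
                    "dataset_id": "dlp_results",  # placeholder
                    "table_id": "findings",       # placeholder
                }
            }
        }
    },
    {   # 2. Publish a completion notification to a Pub/Sub topic
        "pub_sub": {"topic": "projects/my-project/topics/dlp-done"}  # placeholder
    },
]
```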
Jobs and job triggers
- A job is an action that Cloud DLP runs to either scan content or calculate the risk of re-identification. Cloud DLP creates and runs a job resource whenever instructed.
- 2 types of Cloud DLP jobs:
- Inspection jobs – inspect content for sensitive data
- Risk analysis jobs – analyze de-identified data
- Jobs can be scheduled by creating job triggers.
- A job trigger is an event that automates the creation of DLP jobs.
- Triggers can be scheduled by setting the interval at which each trigger fires.
- Triggers can be configured to look only for new findings.
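A job trigger bundles an inspect job with a schedule. A minimal sketch of a daily trigger (field names follow the DLP API's JobTrigger message; the display name is a placeholder):

```python
# Sketch of a JobTrigger that re-runs an inspection every 24 hours.
job_trigger = {
    "display_name": "daily-scan",  # placeholder
    "inspect_job": {
        "inspect_config": {"info_types": [{"name": "EMAIL_ADDRESS"}]},
    },
    "triggers": [
        # recurrence_period_duration: seconds between runs ("86400s" = 1 day)
        {"schedule": {"recurrence_period_duration": "86400s"}}
    ],
    "status": "HEALTHY",
}
```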
Match likelihood
- Scan results are categorized according to how likely they match an infoType.
- Likelihood reflects how many matching elements a result contains for an infoType.
- Cloud DLP uses a bucketized representation of likelihood:
| ENUM | Description |
|------|-------------|
| LIKELIHOOD_UNSPECIFIED | Default value; same as POSSIBLE. |
| VERY_UNLIKELY | It is very unlikely that the data matches the given infoType. |
| UNLIKELY | It is unlikely that the data matches the given infoType. |
| POSSIBLE | It is possible that the data matches the given infoType. |
| LIKELY | It is likely that the data matches the given infoType. |
| VERY_LIKELY | It is very likely that the data matches the given infoType. |
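The buckets form an ordered scale, with LIKELIHOOD_UNSPECIFIED behaving like POSSIBLE. A minimal sketch of filtering findings by a minimum likelihood, as `min_likelihood` does in an inspect request:

```python
# Ordered likelihood scale, from least to most likely.
LIKELIHOOD_ORDER = ["VERY_UNLIKELY", "UNLIKELY", "POSSIBLE", "LIKELY", "VERY_LIKELY"]

def rank(likelihood):
    if likelihood == "LIKELIHOOD_UNSPECIFIED":
        likelihood = "POSSIBLE"  # default value behaves like POSSIBLE
    return LIKELIHOOD_ORDER.index(likelihood)

def filter_findings(findings, min_likelihood="POSSIBLE"):
    """Keep only findings at or above the minimum likelihood."""
    return [f for f in findings if rank(f["likelihood"]) >= rank(min_likelihood)]

findings = [
    {"infoType": "EMAIL_ADDRESS", "likelihood": "VERY_LIKELY"},
    {"infoType": "PERSON_NAME", "likelihood": "UNLIKELY"},
]
print(filter_findings(findings, "LIKELY"))
# [{'infoType': 'EMAIL_ADDRESS', 'likelihood': 'VERY_LIKELY'}]
```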
Method types
- DLP includes different methods to inspect or transform data.
- methods can
- inspect data both on and off GCP
- optimize Cloud DLP behavior for different types of workloads.
- 2 method types: Content methods and Storage methods
- Content methods
- are synchronous, stateless methods.
- data is sent directly in the request to the DLP API.
- findings or transformed data is returned in the API response.
- Request data is encrypted in transit and is not persisted.
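A sketch of a content method request: the data travels inline in the request and the transformed text comes back in the response. Field names follow the DLP API's content de-identify request shape; the project path and sample text are placeholders:

```python
# Sketch of a synchronous content de-identification request body.
deidentify_request = {
    "parent": "projects/my-project",  # placeholder project
    "item": {"value": "Contact me at jane@example.com"},  # data sent inline
    "inspect_config": {"info_types": [{"name": "EMAIL_ADDRESS"}]},
    "deidentify_config": {
        "info_type_transformations": {
            "transformations": [
                {   # mask every character of each finding with "*"
                    "primitive_transformation": {
                        "character_mask_config": {"masking_character": "*"}
                    }
                }
            ]
        }
    },
}
```

Because the payload is part of the request, content methods suit small, latency-sensitive workloads; nothing is persisted server-side.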
- Storage methods
- inspect data stored on GCP
- For inspection, create a Cloud DLP job using the dlpJobs resource.
- Each job runs as a managed service that inspects the data and then executes any configured DLP actions.
- Jobs can be managed via the DLP API or from Cloud DLP in the Google Cloud Console.
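A sketch of a storage-method job configuration that scans a Cloud Storage bucket; a `dlpJobs` create call would carry this as the inspect job (the bucket URL and BigQuery names are placeholders):

```python
# Sketch of an inspect job over Cloud Storage with a save-findings action.
inspect_job = {
    "storage_config": {
        "cloud_storage_options": {
            "file_set": {"url": "gs://my-bucket/**"}  # placeholder bucket
        }
    },
    "inspect_config": {"info_types": [{"name": "CREDIT_CARD_NUMBER"}]},
    "actions": [
        {"save_findings": {"output_config": {"table": {
            "project_id": "my-project",  # placeholder
            "dataset_id": "dlp",         # placeholder
            "table_id": "findings",      # placeholder
        }}}}
    ],
}
```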