Cloud Data Loss Prevention (DLP): GCP Data Engineer

In this article, we will learn about Cloud Data Loss Prevention (DLP).

What is Cloud Data Loss Prevention (DLP)?
  • helps to manage sensitive data.
  • provides fast, scalable classification and redaction for sensitive data elements
  • sensitive data elements may include
    • credit card numbers
    • names
    • social security numbers
    • phone numbers
  • Classifies data using more than 120 predefined information type ("infoType") detectors.
  • Can identify sensitive data using pattern matching, format and checksum validation, and contextual clues.
  • Can redact or de-identify data using masking, secure hashing, tokenization, bucketing, and format-preserving encryption.
  • Can define custom infoType detectors
  • Provides de-identification techniques
  • Can inspect a base64-encoded image for text
  • Can detect sensitive data within data streams, structured text, and images.
  • Can inspect for and redact sensitive text in an image according to defined criteria
  • Can analyze structured data for the risk of re-identification
  • Inspection: Cloud DLP inspects the submitted data for the specified infoTypes and returns the detected findings (a minimal example follows this list).
  • Redaction: after inspection, Cloud DLP redacts sensitive findings; in images, findings are masked with opaque rectangles.
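
As a minimal sketch of the inspection flow, the snippet below sends a short text string to the DLP API using the google-cloud-dlp Python client and prints any findings. The project ID and sample text are placeholders.

```python
from google.cloud import dlp_v2

# Client for the Cloud DLP API.
dlp = dlp_v2.DlpServiceClient()

# Placeholder project ID.
parent = "projects/my-project/locations/global"

# The text to inspect, sent directly in the request (a content method).
item = {"value": "Call Jane Doe at 555-867-5309 or mail jane@example.com"}

inspect_config = {
    # Built-in infoType detectors to look for.
    "info_types": [
        {"name": "PERSON_NAME"},
        {"name": "PHONE_NUMBER"},
        {"name": "EMAIL_ADDRESS"},
    ],
    "include_quote": True,  # return the matched text with each finding
}

response = dlp.inspect_content(
    request={"parent": parent, "inspect_config": inspect_config, "item": item}
)

for finding in response.result.findings:
    print(finding.info_type.name, finding.likelihood, finding.quote)
```
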
InfoType and InfoType detector
  • An infoType is a type of sensitive data, such as a name, email address, or telephone number.
  • Every infoType has a corresponding detector.
  • infoTypes are used in scan configurations to determine what to inspect for and how to transform findings.
  • InfoType names are used when displaying or reporting scan results.
  • Uses infoType detectors and OCR (for text in images)
  • Can use a combination of built-in and custom infoType detectors
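
As a minimal sketch, the snippet below combines a built-in detector with a regex-based custom infoType. The EMPLOYEE_ID name and its pattern are invented for illustration, and the project ID is a placeholder.

```python
from google.cloud import dlp_v2

dlp = dlp_v2.DlpServiceClient()
parent = "projects/my-project/locations/global"  # placeholder project

inspect_config = {
    # A built-in detector...
    "info_types": [{"name": "EMAIL_ADDRESS"}],
    # ...combined with a custom regex-based detector.
    # EMPLOYEE_ID and its pattern are invented for this example.
    "custom_info_types": [
        {
            "info_type": {"name": "EMPLOYEE_ID"},
            "regex": {"pattern": r"EMP-\d{6}"},
            "likelihood": dlp_v2.Likelihood.LIKELY,  # likelihood assigned to matches
        }
    ],
    "include_quote": True,
}

item = {"value": "Ticket raised by EMP-004532 (emp4532@example.com)"}

response = dlp.inspect_content(
    request={"parent": parent, "inspect_config": inspect_config, "item": item}
)
for finding in response.result.findings:
    print(finding.info_type.name, finding.quote)
```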

Actions

  • An action occurs after a DLP job completes successfully.
  • 2 types of actions:
    • Saving the DLP scan job results to BigQuery
    • Publishing the DLP scan job results to a Pub/Sub topic
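
As a rough sketch, the job configuration below attaches both actions to an inspection job that scans a Cloud Storage bucket. The project, bucket, dataset, table, and topic names are placeholders.

```python
from google.cloud import dlp_v2

dlp = dlp_v2.DlpServiceClient()
parent = "projects/my-project/locations/global"  # placeholder

inspect_job = {
    "inspect_config": {"info_types": [{"name": "CREDIT_CARD_NUMBER"}]},
    # Scan objects in a Cloud Storage bucket (placeholder bucket name).
    "storage_config": {
        "cloud_storage_options": {"file_set": {"url": "gs://my-bucket/**"}}
    },
    "actions": [
        # Save findings to a BigQuery table.
        {
            "save_findings": {
                "output_config": {
                    "table": {
                        "project_id": "my-project",
                        "dataset_id": "dlp_results",
                        "table_id": "findings",
                    }
                }
            }
        },
        # Publish a notification to a Pub/Sub topic when the job completes.
        {"pub_sub": {"topic": "projects/my-project/topics/dlp-notifications"}},
    ],
}

job = dlp.create_dlp_job(request={"parent": parent, "inspect_job": inspect_job})
print(job.name)
```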

Jobs and job triggers
  • A job is an action that Cloud DLP runs either to scan content for sensitive data or to calculate the risk of re-identification. DLP creates and runs a job resource whenever instructed.
  • 2 types of Cloud DLP jobs:
    • Inspection jobs – inspect content for sensitive data
    • Risk analysis jobs – analyze de-identified data
  • Jobs can be scheduled by creating job triggers.
  • A job trigger is an event that automates the creation of DLP jobs.
  • Triggers can be scheduled by setting the intervals at which each trigger goes off (see the sketch after this list).
  • Triggers can be configured to look only for new findings since the last scan.
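
A minimal sketch of creating a job trigger that runs an inspection job on a recurring schedule (here, every 24 hours). The project ID, bucket name, and display name are placeholders.

```python
from google.cloud import dlp_v2

dlp = dlp_v2.DlpServiceClient()
parent = "projects/my-project/locations/global"  # placeholder

job_trigger = {
    "display_name": "daily-gcs-scan",  # hypothetical name
    "inspect_job": {
        "inspect_config": {"info_types": [{"name": "US_SOCIAL_SECURITY_NUMBER"}]},
        "storage_config": {
            "cloud_storage_options": {"file_set": {"url": "gs://my-bucket/**"}}
        },
    },
    # Fire the trigger every 24 hours (86400 seconds).
    "triggers": [{"schedule": {"recurrence_period_duration": {"seconds": 86400}}}],
    "status": dlp_v2.JobTrigger.Status.HEALTHY,
}

trigger = dlp.create_job_trigger(request={"parent": parent, "job_trigger": job_trigger})
print(trigger.name)
```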

Match likelihood
  • Scan results are categorized by how likely they are to match a given infoType.
  • Likelihood is determined by the number of matching elements a result contains for that infoType.
  • DLP uses a bucketized representation of likelihood
  • LIKELIHOOD_UNSPECIFIED – Default value; same as POSSIBLE.
  • VERY_UNLIKELY – It is very unlikely that the data matches the given infoType.
  • UNLIKELY – It is unlikely that the data matches the given infoType.
  • POSSIBLE – It is possible that the data matches the given infoType.
  • LIKELY – It is likely that the data matches the given infoType.
  • VERY_LIKELY – It is very likely that the data matches the given infoType.
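
In an inspection request, a minimum likelihood can be set so that only findings at or above that bucket are returned. A small sketch, with a placeholder project ID and sample text:

```python
from google.cloud import dlp_v2

dlp = dlp_v2.DlpServiceClient()
parent = "projects/my-project/locations/global"  # placeholder

# Only return findings rated LIKELY or VERY_LIKELY.
inspect_config = {
    "info_types": [{"name": "PHONE_NUMBER"}],
    "min_likelihood": dlp_v2.Likelihood.LIKELY,
}

response = dlp.inspect_content(
    request={
        "parent": parent,
        "inspect_config": inspect_config,
        "item": {"value": "Maybe a number: 5558675309"},
    }
)
print(len(response.result.findings))
```
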
Method types
  • DLP includes different methods to inspect or transform data.
  • methods can
    • inspect data both on and off GCP
    • optimize Cloud DLP behavior for different types of workloads.
  • 2 method types: Content methods and Storage methods
  • Content methods
    • are synchronous, stateless methods.
    • data is sent directly in the request to the DLP API.
    • findings or transformed data is returned in the API response.
    • Request data is encrypted in transit and is not persisted.
  • Storage methods
    • inspect data stored on GCP
    • For inspection, create a Cloud DLP job using the dlpJobs resource.
    • Each job runs as a managed service that inspects the data and then executes any configured DLP actions.
    • Jobs can be managed through the DLP API or Cloud DLP in the Google Cloud Console.
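
As an example of a content method, the sketch below sends text directly in the request and de-identifies detected email addresses with character masking; the transformed text comes back in the response and nothing is persisted by the service. The project ID and sample text are placeholders.

```python
from google.cloud import dlp_v2

dlp = dlp_v2.DlpServiceClient()
parent = "projects/my-project/locations/global"  # placeholder

deidentify_config = {
    "info_type_transformations": {
        "transformations": [
            {
                # Mask every character of each detected finding with "#".
                "primitive_transformation": {
                    "character_mask_config": {"masking_character": "#"}
                }
            }
        ]
    }
}

response = dlp.deidentify_content(
    request={
        "parent": parent,
        "deidentify_config": deidentify_config,
        "inspect_config": {"info_types": [{"name": "EMAIL_ADDRESS"}]},
        "item": {"value": "Reach me at jane@example.com"},
    }
)
print(response.item.value)  # e.g. "Reach me at ################"
```

A storage method, by contrast, runs asynchronously as a dlpJobs resource that reads content from storage, as in the job example under Actions above.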
