Commercial (Sensitive data, personally identifiable information (PII))
In this tutorial, we will learn about protecting commercial sensitive data, including personally identifiable information (PII), on Google Cloud.
Scan for sensitive data in just a few clicks
Preventing the exposure of sensitive data is of critical importance for many businesses, particularly those in industries like finance and healthcare. Cloud Data Loss Prevention (DLP) lets you protect sensitive data by building an additional layer of data security and privacy into your data workloads. It also provides native services for large-scale inspection, discovery, and classification of data in storage repositories like Cloud Storage and BigQuery.
Scanning Cloud Storage Buckets
Cloud Storage is highly scalable object storage for developers and enterprises, who use it as an integral part of their applications and data workloads. These workloads can include sensitive data such as credit card numbers, medical information, Social Security numbers, driver’s license numbers, addresses, full names, and service account credentials. Using Cloud DLP with your Cloud Storage repositories lets you identify where sensitive data is stored, and then use tools to redact those sensitive identifiers. With the DLP UI in the Cloud Console, you can discover and inspect your data in a few steps.
- Firstly, define what you want to scan, such as a Cloud Storage bucket, folder, or individual file.
- Then, filter that data by adding include or exclude patterns to narrow down the files you want to inspect.
- Thirdly, scale your scans by turning on sampling to increase efficiency and reduce cost:
- Sample storage objects
- Sample bytes per object
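As an illustration, the steps above can be sketched as a plain Python dictionary in the shape of a DLP inspect-job request. The bucket URL, sampling limits, and infoTypes here are hypothetical placeholders, not values prescribed by the service:

```python
# Sketch of a DLP inspect-job request for a Cloud Storage scan.
# All names and limits below are example placeholders.
inspect_job = {
    "storage_config": {
        "cloud_storage_options": {
            # Step 1: define what to scan (bucket, folder, or file).
            "file_set": {"url": "gs://example-bucket/exports/**"},
            # Step 3a: sample a share of the storage objects...
            "files_limit_percent": 10,
            # Step 3b: ...and cap the bytes inspected per object (1 MiB).
            "bytes_limit_per_file": 1048576,
        }
    },
    # Which detectors to run over the sampled content.
    "inspect_config": {
        "info_types": [
            {"name": "CREDIT_CARD_NUMBER"},
            {"name": "US_SOCIAL_SECURITY_NUMBER"},
        ]
    },
}
```

Turning both sampling knobs down is what keeps large-bucket scans fast and inexpensive; an exhaustive scan simply omits them.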
Scanning BigQuery
BigQuery is a serverless, highly scalable, and cost-effective cloud data warehouse. It can help you analyze your company’s most critical data assets and natively delivers powerful features like BI Engine and machine learning. As with Cloud Storage, this data may contain sensitive or regulated information such as personally identifiable information (PII). Using Cloud DLP with BigQuery can help you discover and classify this information. Here’s how.
- Firstly, define the BigQuery table you want to scan.
- Secondly, decide whether to perform an exhaustive scan or a sampled scan.
- Thirdly, since BigQuery data is “structured” tabular data, your findings will include additional metadata such as column names. You can optionally specify an identifying field such as a row or record number so that you can pinpoint findings and map them back to your source tables.
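The same three steps can be sketched as the BigQuery portion of an inspect-job request; the project, dataset, table, and field names are hypothetical placeholders:

```python
# Sketch of a DLP inspect-job request for a BigQuery table scan.
# Project, dataset, table, and field names are example placeholders.
bq_inspect_job = {
    "storage_config": {
        "big_query_options": {
            # Step 1: the table to scan.
            "table_reference": {
                "project_id": "example-project",
                "dataset_id": "customer_data",
                "table_id": "orders",
            },
            # Step 2: a sampled scan caps the rows inspected;
            # omit the limit for an exhaustive scan.
            "rows_limit": 10000,
            # Step 3: fields echoed back with each finding so results
            # can be mapped back to rows in the source table.
            "identifying_fields": [{"name": "row_id"}],
        }
    },
    "inspect_config": {"info_types": [{"name": "EMAIL_ADDRESS"}]},
}
```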
Scanning Cloud Datastore
Cloud Datastore is a highly scalable NoSQL database for web and mobile applications. Cloud DLP enables you to inspect data stored in Datastore by simply specifying the namespace and kind.
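Assuming a hypothetical project, namespace, and kind, the Datastore case reduces to an even smaller request sketch:

```python
# Sketch of a DLP inspect-job request for a Datastore scan.
# The project, namespace, and kind are example placeholders.
datastore_inspect_job = {
    "storage_config": {
        "datastore_options": {
            # The namespace that partitions the entities to inspect.
            "partition_id": {
                "project_id": "example-project",
                "namespace_id": "prod",
            },
            # The kind (entity type) to inspect.
            "kind": {"name": "Customer"},
        }
    },
    "inspect_config": {"info_types": [{"name": "PHONE_NUMBER"}]},
}
```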
View findings and take action
Whether you want to generate detailed findings to power an audit report or investigation, or use summary findings to trigger automated actions and alerts, it’s easy to do so from the Cloud DLP UI. When an inspection job is completed, Cloud DLP can automatically trigger the following actions:
- Firstly, Save to BigQuery: write detailed findings to a table.
- Secondly, Publish to Cloud Pub/Sub: emit a Pub/Sub notification when a job is completed. This can trigger custom logic in, for instance, a Cloud Function.
- Thirdly, Publish to Cloud Security Command Center.
- Then, Publish to Data Catalog.
- Lastly, Notify by email: send an email with job completion details.
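The five actions above can be attached to a job as a list; in the request they take roughly the following shape (the table and topic names are hypothetical placeholders):

```python
# Sketch of the actions list attached to a DLP inspection job.
# Table and topic names below are example placeholders.
job_actions = [
    # Save detailed findings to a BigQuery table.
    {"save_findings": {"output_config": {"table": {
        "project_id": "example-project",
        "dataset_id": "dlp_results",
        "table_id": "findings",
    }}}},
    # Emit a Pub/Sub notification when the job completes.
    {"pub_sub": {"topic": "projects/example-project/topics/dlp-done"}},
    # Send summary findings to Cloud Security Command Center.
    {"publish_summary_to_cscc": {}},
    # Tag the scanned resource in Data Catalog.
    {"publish_findings_to_cloud_data_catalog": {}},
    # Send an email with job completion details.
    {"job_notification_emails": {}},
]
```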
Take charge of your sensitive data with the Cloud Data Loss Prevention (DLP) API
The DLP API is a flexible and robust tool that helps identify sensitive data like credit card numbers, Social Security numbers, names, and other forms of personally identifiable information (PII). Once you know where this data lives, the service gives you the option to de-identify that data using techniques such as redaction, masking, and tokenization. These features help protect sensitive data while still allowing you to use it for important business functions like running analytics and customer support operations.
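As a sketch of one of those techniques, a masking transformation can be expressed as a de-identify configuration; the choice of infoType and masking character here is illustrative, not prescribed:

```python
# Sketch of a DLP de-identify configuration using character masking:
# every character of a detected credit card number is replaced by '#'.
deidentify_config = {
    "info_type_transformations": {
        "transformations": [{
            # Which findings this transformation applies to.
            "info_types": [{"name": "CREDIT_CARD_NUMBER"}],
            "primitive_transformation": {
                # Masking: overwrite matched characters with '#'.
                "character_mask_config": {"masking_character": "#"}
            },
        }]
    }
}
```

Redaction and tokenization follow the same pattern, swapping in a different primitive transformation.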
Identify sensitive data with flexible predefined and custom detectors
Backed by a variety of techniques, including machine learning, pattern matching, mathematical checksums, and context analysis, the DLP API provides over 70 predefined detectors for sensitive data like PII and GCP service account credentials. You can also define your own custom types using:
- Firstly, Dictionaries — find new types or augment the predefined infoTypes
- Secondly, Regex patterns — find your own patterns and define a default likelihood score
- Thirdly, Detection rules — enhance your custom dictionaries and regex patterns with rules that can boost or reduce the likelihood score based on nearby context or indicator hotwords like “banking,” “taxpayer,” and “passport.”
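The three custom-detector mechanisms above can be sketched as request fragments; the type names, word list, patterns, and likelihood values are hypothetical placeholders:

```python
# Sketch of custom infoTypes for a DLP inspect configuration.
# All names, patterns, and example words are placeholders.
custom_info_types = [
    # 1. Dictionary: a word list defining a new type.
    {"info_type": {"name": "EMPLOYEE_ID"},
     "dictionary": {"word_list": {"words": ["EMP-001", "EMP-002"]}}},
    # 2. Regex: a pattern with a default likelihood score.
    {"info_type": {"name": "TICKET_NUMBER"},
     "regex": {"pattern": r"TKT-\d{6}"},
     "likelihood": "POSSIBLE"},
]

# 3. Detection rule: a hotword rule that boosts the likelihood of a
#    TICKET_NUMBER match when "taxpayer" appears shortly before it.
rule_set = [{
    "info_types": [{"name": "TICKET_NUMBER"}],
    "rules": [{"hotword_rule": {
        "hotword_regex": {"pattern": "taxpayer"},
        "proximity": {"window_before": 50},
        "likelihood_adjustment": {"fixed_likelihood": "VERY_LIKELY"},
    }}],
}]
```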
Native discovery for Google Cloud storage products
The DLP API has native support for data classification in Cloud Storage, Cloud Datastore and BigQuery. Just point the API at your Cloud Storage bucket or BigQuery table, and we handle the rest. The API supports:
- Firstly, Periodic scans — trigger a scan job to run daily or weekly
- Secondly, Notifications — launch jobs and receive Cloud Pub/Sub notifications when they finish; this is great for serverless workloads using Cloud Functions
- Thirdly, Integration with Cloud Security Command Center (Alpha)
- Lastly, SQL data analysis — write the results of your DLP scan into the BigQuery dataset of your choice. Then, use the power of SQL to analyze your findings. You can build custom reports in Google Data Studio or export the data to your preferred data visualization or analysis system.
Integrate the DLP API into your workloads across the cloud ecosystem
The DLP API is built to be flexible and scalable, and includes several features to help you integrate it into your workloads, wherever they may be.
- Firstly, DLP templates — Templates allow you to configure and persist how you inspect your data and define how you want to transform it. You can then simply reference the template in your API calls and workloads, allowing you to easily update templates without having to redeploy new API calls or code.
- Secondly, Triggers — Triggers allow you to set up jobs to scan your data on a periodic basis, for example, daily, weekly or monthly.
- Lastly, Actions — When a large scan job is done, you can configure the DLP API to send a notification with Cloud Pub/Sub. This is a great way to build a robust system that plays well within a serverless, event-driven ecosystem.
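Templates, triggers, and actions come together in a job trigger; a sketch under hypothetical names (the template, bucket, and topic are placeholders) might look like this:

```python
# Sketch of a DLP job trigger combining a template reference, a daily
# schedule, and a Pub/Sub action. All resource names are placeholders.
job_trigger = {
    "display_name": "daily-bucket-scan",
    "inspect_job": {
        # Reference a stored template instead of inlining the inspect
        # config, so the template can change without redeploying code.
        "inspect_template_name":
            "projects/example-project/inspectTemplates/pii-default",
        "storage_config": {"cloud_storage_options": {
            "file_set": {"url": "gs://example-bucket/**"}}},
        # Action: notify via Pub/Sub when each run completes.
        "actions": [{"pub_sub": {
            "topic": "projects/example-project/topics/dlp-done"}}],
    },
    # Trigger: run once every 24 hours (period given in seconds).
    "triggers": [{"schedule": {
        "recurrence_period_duration": {"seconds": 86400}}}],
}
```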
Reference: Google Documentation, Doc 2