Datalab Overview (Google Professional Data Engineer, GCP)
- Cloud Datalab is an interactive tool for large-scale data exploration, analysis, and visualization.
- It is based on the open source Jupyter project.
- Cloud Datalab is packaged as a container and runs in a VM instance.
- It uses notebooks instead of text files containing code.
- A notebook combines code, documentation written as Markdown, and the results of code execution.
- Notebooks let you write and execute code in an interactive, iterative manner and render the results.
- Notebooks can be shared with team members.
- Import data from flat files, databases, or distributed storage systems
- Locate and remove or modify missing or mismatched data
- Unnest complex data structures
- Identify statistical outliers in data for review and management
- Perform lookups from one dataset into another reference dataset
- Aggregate columnar data using a variety of aggregation functions
- Normalize column values for more consistent usage and statistical modeling
- Merge datasets with joins
- Append one dataset to another through union operations
- Notebooks can be stored in Google Cloud Source Repositories, a Git repository service.
- The Git repository is cloned onto a persistent disk attached to the VM.
- Notebooks are automatically saved to the persistent disk periodically.
- Do not delete the persistent disk, because the notebooks are saved on it.
- The VM used for running Cloud Datalab is a shared resource accessible to all members of the associated cloud project.
- Results saved in the notebook persist on the disk.
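
The data preparation tasks listed above (appending, cleaning missing values, flagging outliers, lookups, aggregation, normalization, unnesting, and joins) can be illustrated with a minimal sketch of the kind of code one might run in a notebook cell. Everything here is an assumption made for illustration: the sample records, the field names, the `region_names` and `shipments` datasets, and the "5x the median" outlier rule are hypothetical, and the sketch uses only the Python standard library so it stays self-contained. In practice a notebook would more likely use a data-frame library for the same steps.

```python
from collections import defaultdict
from statistics import median

# Hypothetical sample data standing in for an imported flat file.
orders = [
    {"order_id": 1, "region": "us", "amount": 20.0, "items": ["a", "b"]},
    {"order_id": 2, "region": "eu", "amount": None, "items": ["c"]},
    {"order_id": 3, "region": "us", "amount": 22.0, "items": []},
    {"order_id": 4, "region": "eu", "amount": 500.0, "items": ["a"]},
    {"order_id": 5, "region": "us", "amount": 18.0, "items": ["b"]},
]
more_orders = [{"order_id": 6, "region": "eu", "amount": 21.0, "items": ["c"]}]
region_names = {"us": "United States", "eu": "Europe"}  # reference dataset
shipments = [
    {"order_id": 1, "carrier": "x"},
    {"order_id": 6, "carrier": "y"},
    {"order_id": 9, "carrier": "z"},
]

# Append one dataset to another (union).
rows = orders + more_orders

# Locate and remove rows with a missing amount.
clean = [r for r in rows if r["amount"] is not None]

# Flag outliers for review; the 5x-median threshold is an arbitrary example rule.
limit = 5 * median(r["amount"] for r in clean)
outliers = [r for r in clean if r["amount"] > limit]
kept = [r for r in clean if r["amount"] <= limit]

# Look up each region code in the reference dataset.
kept = [{**r, "region_name": region_names[r["region"]]} for r in kept]

# Aggregate: total amount per region.
totals = defaultdict(float)
for r in kept:
    totals[r["region"]] += r["amount"]
totals = dict(totals)

# Normalize amounts to the range [0, 1] (min-max scaling).
lo = min(r["amount"] for r in kept)
hi = max(r["amount"] for r in kept)
kept = [{**r, "amount_scaled": (r["amount"] - lo) / (hi - lo)} for r in kept]

# Unnest the nested item lists into one (order_id, item) pair per item.
unnested = [(r["order_id"], item) for r in kept for item in r["items"]]

# Inner join with the shipments dataset on order_id.
by_id = {s["order_id"]: s for s in shipments}
joined = [{**r, **by_id[r["order_id"]]} for r in kept if r["order_id"] in by_id]

print("outlier order ids:", [r["order_id"] for r in outliers])
print("totals per region:", totals)
print("joined order ids:", [r["order_id"] for r in joined])
```

Each step produces a new list rather than mutating the input, which makes it easy to rerun and inspect intermediate results, in keeping with the iterative style of working in a notebook.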