Google Professional Data Engineer (GCP) Online Course
This course is a comprehensive guide to the Google Cloud Platform, with ~20 hours of content and ~60 demos. The Google Cloud Platform is not currently the most popular cloud offering out there (that's AWS, of course), but it is possibly the best cloud offering for high-end machine learning applications. That's because TensorFlow, the super-popular deep learning technology, is also from Google.
Course Salient Features
- Certification stuff - Covers pretty much all of the material you ought to need to get past the Google Data Engineer and Cloud Architect certification tests
- Compute and Storage - App Engine, Container Engine (aka Kubernetes) and Compute Engine
- Big Data and Managed Hadoop - Dataproc, Dataflow, BigTable, BigQuery, Pub/Sub
- TensorFlow on the Cloud - what neural networks and deep learning really are, how neurons work and how neural networks are trained.
- DevOps stuff - StackDriver logging, monitoring, Cloud Deployment Manager
- Security - Identity and Access Management, Identity-Aware proxying, OAuth, API Keys, service accounts
- Networking - Virtual Private Clouds, shared VPCs, Load balancing at the network, transport and HTTP layer; VPN, Cloud Interconnect and CDN Interconnect
- Hadoop Foundations: A quick look at the open-source cousins (Hadoop, Spark, Pig, Hive and HBase)
In this course, you will learn how to:
- Deploy Managed Hadoop apps on the Google Cloud
- Build deep learning models in the cloud using TensorFlow
- Make informed decisions about Containers, VMs and App Engine
- Use big data technologies such as BigTable, Dataflow, Apache Beam and Pub/Sub
Course Curriculum
1. Introduction
- Theory, Practice and Tests
- Why Cloud?
- Hadoop and Distributed Computing
- On-premise, Colocation or Cloud?
- Introducing the Google Cloud Platform
- Lab: Setting Up A GCP Account
- Lab: Using The Cloud Shell
2. Compute Choices
- Compute Options
- Google Compute Engine (GCE)
- More GCE
- Lab: Creating a VM Instance
- Lab: Editing a VM Instance
- Lab: Creating a VM Instance Using The Command Line
- Lab: Creating And Attaching A Persistent Disk
- Google Container Engine - Kubernetes (GKE)
- More GKE
- Lab: Creating A Kubernetes Cluster And Deploying A Wordpress Container
- App Engine
- Contrasting App Engine, Compute Engine and Container Engine
- Lab: Deploy and Run An App Engine App
3. Storage
- Storage Options
- Quick Take
- Cloud Storage
- Lab: Working With Cloud Storage Buckets
- Lab: Bucket And Object Permissions
- Lab: Lifecycle Management On Buckets
- Lab: Running a Program On a VM Instance And Storing Results on Cloud Storage
- Transfer Service
- Lab: Migrating Data Using the Transfer Service
4. Cloud SQL, Cloud Spanner ~ OLTP ~ RDBMS
- Cloud SQL
- Lab: Creating A Cloud SQL Instance
- Lab: Running Commands On Cloud SQL Instance
- Lab: Bulk Loading Data Into Cloud SQL Tables
- Cloud Spanner
- More Cloud Spanner
- Lab: Working With Cloud Spanner
5. BigTable ~ HBase = Columnar Store
- BigTable Intro
- Columnar Store
- Denormalised
- Column Families
- BigTable Performance
- Lab: BigTable demo
6. Datastore ~ Document Database
- Datastore
- Lab: Datastore demo
7. BigQuery ~ Hive ~ OLAP
- BigQuery Intro
- BigQuery Advanced
- Lab: Loading CSV Data Into BigQuery
- Lab: Running Queries On BigQuery
- Lab: Loading JSON Data With Nested Tables
- Lab: Public Datasets In BigQuery
- Lab: Using BigQuery Via The Command Line
- Lab: Aggregations And Conditionals In Aggregations
- Lab: Subqueries And Joins
- Lab: Regular Expressions In Legacy SQL
- Lab: Using The With Statement For SubQueries
8. Dataflow ~ Apache Beam
- Dataflow Intro
- Apache Beam
- Lab: Running A Python Dataflow Program
- Lab: Running A Java Dataflow Program
- Lab: Implementing Word Count In Dataflow Java
- Lab: Executing The Word Count Dataflow
- Lab: Executing MapReduce In Dataflow In Python
- Lab: Executing MapReduce In Dataflow In Java
- Lab: Dataflow With BigQuery As Source And Side Inputs
- Lab: Dataflow With BigQuery As Source And Side Inputs 2
9. Dataproc ~ Managed Hadoop
- Dataproc
- Lab: Creating And Managing A Dataproc Cluster
- Lab: Creating A Firewall Rule To Access Dataproc
- Lab: Running A PySpark Job On Dataproc
- Lab: Running The PySpark REPL Shell And Pig Scripts On Dataproc
- Lab: Submitting A Spark Jar To Dataproc
- Lab: Working With Dataproc Using The GCloud CLI
10. Pub/Sub for Streaming
- Pub/Sub
- Lab: Working With Pub/Sub On The Command Line
- Lab: Working With Pub/Sub Using The Web Console
- Lab: Setting Up A Pub/Sub Publisher Using The Python Library
- Lab: Setting Up A Pub/Sub Subscriber Using The Python Library
- Lab: Publishing Streaming Data Into Pub/Sub
- Lab: Reading Streaming Data From Pub/Sub And Writing To BigQuery
- Lab: Executing A Pipeline To Read Streaming Data And Write To BigQuery
- Lab: Pub/Sub Source BigQuery Sink
11. Datalab ~ Jupyter
- Datalab
- Lab: Creating And Working On A Datalab Instance
- Lab: Importing And Exporting Data Using Datalab
- Lab: Using the Charting API In Datalab
12. TensorFlow and Machine Learning
- Introducing Machine Learning
- Representation Learning
- NN Introduced
- Introducing TF
- Lab: Simple Math Operations
- Computation Graph
- Tensors
- Lab: Tensors
- Linear Regression Intro
- Placeholders and Variables
- Lab: Placeholders
- Lab: Variables
- Lab: Linear Regression with Made-up Data
- Image Processing
- Images As Tensors
- Lab: Reading and Working with Images
- Lab: Image Transformations
- Introducing MNIST
- K-Nearest Neighbors as Unsupervised Learning
- One-hot Notation and L1 Distance
- Steps in the K-Nearest-Neighbors Implementation
- Lab: K-Nearest-Neighbors
- Learning Algorithm
- Individual Neuron
- Learning Regression
- Learning XOR
- XOR Trained
13. Regression in TensorFlow
- Lab: Access Data from Yahoo Finance
- Non TensorFlow Regression
- Lab: Linear Regression - Setting Up a Baseline
- Gradient Descent
- Lab: Linear Regression
- Lab: Multiple Regression in TensorFlow
- Logistic Regression Introduced
- Linear Classification
- Lab: Logistic Regression - Setting Up a Baseline
- Logit
- Softmax
- Argmax
- Lab: Logistic Regression
- Estimators
- Lab: Linear Regression using Estimators
- Lab: Logistic Regression using Estimators
14. Vision, Translate, NLP and Speech: Trained ML APIs
- Lab: Taxicab Prediction - Setting up the dataset
- Lab: Taxicab Prediction - Training and Running the model
- Lab: The Vision, Translate, NLP and Speech API
- Lab: The Vision API for Label and Landmark Detection
15. Networking
- Virtual Private Clouds
- VPC and Firewalls
- XPC or Shared VPC
- VPN
- Types of Load Balancing
- Proxy and Pass-through load balancing
- Internal load balancing
16. Ops and Security
- StackDriver
- StackDriver Logging
- Cloud Deployment Manager
- Cloud Endpoints
- Security and Service Accounts
- Auth and End-user accounts
- Identity and Access Management
- Data Protection
17. Appendix: Hadoop Ecosystem
- Introducing the Hadoop Ecosystem
- Hadoop
- HDFS
- MapReduce
- Yarn
- Hive
- Hive vs. RDBMS
- HQL vs. SQL
- OLAP in Hive
- Windowing Hive
- Pig
- More Pig
- Spark
- More Spark
- Streams Intro
- Microbatches
- Window Types