Queries Google Professional Data Engineer GCP

  1. Home
  2. Queries Google Professional Data Engineer GCP
  • BigQuery supports two types of queries:
    • Interactive queries
    • Batch queries
  • By default, BigQuery runs interactive queries, or query is executed as soon as possible.
  • BigQuery queues each batch query on behalf and starts the query as soon as idle resources are available
  • Run queries by
    • Query editor in the Cloud Console
    • Compose Query option in the BigQuery web UI
    • BigQuery command-line tool’s bq query command
    • BigQuery REST API to programmatically call the jobs.query or query-type jobs.insert methods
    • BigQuery client libraries
  • Query execution involves root servers which interpret the query and send it to intermediate servers, which orchestrate the execution to many leaf servers. Leaf servers read from disk and execute.
  • Analytical window functions:
    • Aggregate – sum, count
    • Navigation – lead, lag
    • Ranking, numbering – rank, cume_dist
    • “Partition by” is similar to “group by” but doesnt aggregate. This is different from bq partitions how data is stored.
  • Data Types – struct, array, timestamp, int64, float64, string
  • Inner table can be using WITH
  • ARRAY_AGG – creates array. UNNEST – break array.
  • STRUCT – creates struct
  • User defined functions – sql udf as well as javascript udfs is possible
  • Udf has constraints – size of udf output is limited, native javascript not supported
  • Unnest – takes an array and returns table

 

Query jobs

  • Jobs are actions that BigQuery runs on behalf to
    • load data
    • export data
    • query data
    • copy data.
  • With Cloud Console, the classic BigQuery web UI, or the CLI for above jobs, a job resource is automatically created, scheduled, and run.
  • If create a job programmatically, BigQuery schedules and runs the job for you.
  • jobs execute asynchronously
  • jobs can be polled for their status.

Saving and sharing queries

  • If save a query, it can be
    • private (visible only to you)
    • shared at the project level (visible to project members)
    • public (anyone can view it).
Menu