Performance and latency


In this tutorial, we will look at performance and latency on Google Cloud: how to evaluate requirements, scale with demand, accelerate workloads, and tune applications.

Strategies
  • Evaluate performance requirements. Determine the priority of your various applications and the minimum performance each requires.
  • Use scalable design patterns. Improve scalability and performance with autoscaling, appropriate compute choices, and storage configurations.
  • Identify and implement cost-saving approaches. Evaluate the cost of each running service, weighing its priority to optimize for both availability and cost.

Use autoscaling and data processing

Use autoscaling so that as load increases or decreases, the services add or release resources to match.

Compute Engine autoscaling

Managed instance groups (MIGs) let you scale stateless apps across multiple identical VMs, so that a group of Compute Engine instances is launched from a single template. Autoscaling policies can scale based on CPU utilization, load-balancing capacity, Cloud Monitoring metrics, or, for zonal MIGs, a queue-based workload such as Pub/Sub.
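As a sketch, a MIG with a CPU-utilization autoscaling policy can be set up with gcloud. The group name, template name, zone, and thresholds below are placeholder assumptions, not values from this tutorial.

```shell
# Create a managed instance group from an existing instance template
# ("web-mig" and "web-template" are placeholder names).
gcloud compute instance-groups managed create web-mig \
    --template=web-template \
    --size=2 \
    --zone=us-central1-a

# Attach an autoscaling policy that targets 60% average CPU utilization.
gcloud compute instance-groups managed set-autoscaling web-mig \
    --zone=us-central1-a \
    --min-num-replicas=2 \
    --max-num-replicas=10 \
    --target-cpu-utilization=0.60 \
    --cool-down-period=90
```

The cool-down period gives new instances time to initialize before their CPU metrics are used in scaling decisions.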

Google Kubernetes Engine autoscaling

You can use the cluster autoscaler feature in Google Kubernetes Engine (GKE) to manage your cluster's node pools as workload demand varies. The cluster autoscaler automatically increases or decreases the size of a node pool based on the resource requests of the Pods running on that node pool's nodes.
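A minimal sketch of enabling the cluster autoscaler at cluster creation time; the cluster name, zone, and node limits are placeholder assumptions.

```shell
# Create a cluster whose default node pool scales between 1 and 5 nodes
# based on Pod resource requests ("demo-cluster" is a placeholder name).
gcloud container clusters create demo-cluster \
    --zone=us-central1-a \
    --num-nodes=2 \
    --enable-autoscaling \
    --min-nodes=1 \
    --max-nodes=5
```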

Serverless autoscaling

Serverless compute options include Cloud Run, App Engine, and Cloud Functions, each of which provides built-in autoscaling. Use these serverless options to scale your microservices or functions automatically with traffic.
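For example, a Cloud Run deployment can bound its autoscaling with instance limits; the service name, image path, and region below are placeholders, not values from this tutorial.

```shell
# Deploy a container image to Cloud Run. With --min-instances=0 the
# service scales to zero when idle; --max-instances caps scale-out.
gcloud run deploy hello-service \
    --image=gcr.io/PROJECT_ID/hello \
    --region=us-central1 \
    --min-instances=0 \
    --max-instances=20
```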

Data processing

Dataproc and Dataflow offer autoscaling options for your data pipelines and data processing. Use these options to let your pipelines acquire more computing resources as the processing load grows.
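On the Dataproc side, autoscaling is driven by a policy resource attached to the cluster. A sketch, with placeholder names and a local YAML policy definition assumed to exist:

```shell
# Import an autoscaling policy from a local YAML definition, then create
# a cluster that uses it ("my-policy" and "my-cluster" are placeholders).
gcloud dataproc autoscaling-policies import my-policy \
    --source=policy.yaml \
    --region=us-central1

gcloud dataproc clusters create my-cluster \
    --region=us-central1 \
    --autoscaling-policy=my-policy
```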

Use GPUs and TPUs to increase performance

Google Cloud provides specialized hardware platforms to accelerate your workloads. You can use them to increase the performance of your applications and data processing.

Graphics Processing Unit (GPU)

Compute Engine provides GPUs that you can attach to your virtual machine instances. You can use these GPUs to accelerate specific workloads on your instances, such as machine learning and data processing.
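As a sketch, attaching a GPU at instance creation looks like the following; the instance name, zone, machine type, and GPU model are placeholder assumptions (available GPU types vary by zone).

```shell
# Create a VM with one NVIDIA T4 GPU attached. GPU instances must use
# a TERMINATE maintenance policy, since they cannot live-migrate.
gcloud compute instances create gpu-vm \
    --zone=us-central1-a \
    --machine-type=n1-standard-4 \
    --accelerator=type=nvidia-tesla-t4,count=1 \
    --maintenance-policy=TERMINATE \
    --image-family=debian-11 \
    --image-project=debian-cloud
```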

Tensor Processing Unit (TPU)

A TPU is a matrix processor designed by Google specifically for machine learning workloads. TPUs are best suited for massive matrix operations in a large pipeline, with significantly less memory access.
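A minimal sketch of provisioning a TPU VM; the name, zone, accelerator type, and runtime version are placeholder assumptions, and availability varies by zone.

```shell
# Create a TPU VM with a v2-8 accelerator ("my-tpu" is a placeholder;
# check which accelerator types and runtime versions your zone offers).
gcloud compute tpus tpu-vm create my-tpu \
    --zone=us-central1-b \
    --accelerator-type=v2-8 \
    --version=tpu-vm-base
```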


Identify apps to tune

Application Performance Management (APM) includes tools to help you reduce latency and cost, so that you can run more efficient applications. With Cloud Trace, Cloud Debugger, and Cloud Profiler, you gain insight into how your code and services function.

Instrumentation

Latency plays a big role in determining your users' experience. As your application backend grows more complex, or as you adopt a microservices architecture, it becomes challenging to identify latencies in inter-service communication or to pinpoint bottlenecks. Cloud Trace helps by collecting latency data from your instrumented applications.

Debugging

Cloud Debugger helps you inspect and analyze your production code behavior in real time without affecting its performance or slowing it down.

Profiling

Poorly performing code increases the latency and cost of applications and web services. Cloud Profiler helps you identify and address performance issues by continuously analyzing the performance of CPU- or memory-intensive functions executed across an application.

Analyze your costs and optimize

  • The first step in optimizing your cost is to understand your current usage and spending. Google Cloud provides an Export Billing to BigQuery feature that gives you a detailed way to analyze your billing data. You can connect BigQuery to Google Data Studio for visual analysis, or to third-party business intelligence (BI) tools such as Tableau, Qlik, or Looker.
  • Sustained use discounts are automatic discounts for running specific Compute Engine resources for a significant portion of the billing month.
  • Committed use discounts are ideal for workloads with predictable resource needs. When you purchase a committed use contract, you buy a certain amount of vCPUs, memory, GPUs, and local SSDs at a discounted price, in return for committing to pay for those resources for one or three years.
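Once billing export is enabled, cost per service can be queried directly. A sketch using the bq CLI; the project, dataset, and table names are placeholders (the exported table name depends on your billing account ID), though the `service.description`, `cost`, and `invoice.month` columns follow the export schema.

```shell
# Total cost per service for the current invoice month, from the
# exported billing table (replace the table path with your own).
bq query --use_legacy_sql=false '
SELECT
  service.description AS service,
  ROUND(SUM(cost), 2) AS total_cost
FROM `my_project.billing_export.gcp_billing_export_v1_XXXXXX`
WHERE invoice.month = FORMAT_DATE("%Y%m", CURRENT_DATE())
GROUP BY service
ORDER BY total_cost DESC'
```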

Reference: Google Documentation
