Database and Replication
Understand the concepts of Database and Replication.
- Cloud continues to drive down the cost of storage and compute
- New applications need databases that store terabytes to petabytes of data
- New types of data need to be stored
- Access to the data with millisecond latency is needed
- Millions of requests per second must be processed
- Scale to support millions of users anywhere in the world
- Both relational and non-relational databases are needed
- AWS offers the broadest range of databases.
AWS Managed database services
- Removes the constraints that come with licensing costs
- Supports diverse database engines
- Providing access to the information stored in these databases is a core purpose of cloud computing.
There are three different categories of databases to keep in mind while architecting:
- Relational databases – Data is normalized into tables. They also provide
- a powerful query language
- flexible indexing capabilities
- strong integrity controls
- the ability to combine data from multiple tables in a fast and efficient manner.
- Relational databases can be scaled vertically and are highly available during failovers.
- NoSQL databases – They have a flexible data model that seamlessly scales horizontally. NoSQL databases utilize a variety of data models, including graphs, key-value pairs, and JSON documents. They offer
- ease of development
- scalable performance
- high availability
- resilience
- Data warehouse – A specialized type of relational database, optimized for analysis and reporting on large amounts of data. It can be used to combine transactional data from disparate sources, making it available for analysis and decision-making.
- Horizontal scaling is usually based on partitioning of the data, i.e. each node contains only part of the data
- In vertical scaling the data resides on a single node, and scaling is done through multi-core hardware, i.e. spreading the load between the CPU and RAM resources of that machine.
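A minimal sketch of hash-based horizontal partitioning, where a stable hash of each record's key selects the node that owns it. The node count and key names are illustrative, not from any specific database:

```python
import hashlib

NUM_NODES = 4  # illustrative cluster size

def node_for_key(key: str, num_nodes: int = NUM_NODES) -> int:
    """Map a record key to the partition (node) that stores it.

    A stable hash keeps the mapping deterministic, so every client
    routes the same key to the same node.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_nodes
```

With this scheme each node holds roughly 1/NUM_NODES of the keys; real systems layer techniques such as consistent hashing on top so that changing the node count does not remap every key.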
Database Engines
- Amazon Aurora
- PostgreSQL
- MySQL
- MariaDB
- Oracle
- Microsoft SQL Server
Relational Database
- Relational databases store data with a pre-defined schema
- Relationships between tables are defined
- Designed to support ACID transactions
- Maintain referential integrity
- Maintain data consistency
- Used for
- Traditional applications
- ERP
- CRM
- e-commerce
- AWS Offerings
- Amazon Aurora
- MySQL, PostgreSQL
- Amazon RDS
- MySQL, PostgreSQL, MariaDB, Oracle, SQL Server
- Amazon Redshift
Key-value Databases
- Optimized to store and retrieve key-value pairs
- Handle large data volumes with millisecond retrieval, without the performance overhead and scale limitations of relational databases
- Used for
- Internet-scale applications
- real-time bidding
- shopping carts
- customer preferences
- AWS Offering
- Amazon DynamoDB
Document Databases
- Designed to store semi-structured data
- Stores documents
- Data is typically represented as a readable document
- Used for
- Content management
- Personalization
- mobile applications
- AWS Offering
- Amazon DocumentDB (with MongoDB compatibility)
In-memory Databases
- Used for applications that require real-time access to data.
- Storing data directly in memory provides microsecond latency
- Used for
- Caching
- gaming leaderboards
- real-time analytics
- AWS Offerings:
- Amazon ElastiCache for Redis
- Amazon ElastiCache for Memcached
Graph Databases
- Used for applications needing to query and navigate relationships between highly connected, graph datasets with millisecond latency.
- Used for
- Fraud detection
- social networking
- recommendation engines
- AWS Offering
- Amazon Neptune
Time Series Databases
- Used to efficiently
- Collect
- Synthesize
- Derive insights from enormous amounts of data that changes over time (known as time-series data)
- Used for
- IoT applications
- DevOps
- industrial telemetry
- AWS Offering:
- Amazon Timestream
Ledger Databases
- Used for a centralized, trusted authority to maintain a scalable, complete and cryptographically verifiable record of transactions.
- Used for
- Systems of record
- supply chain
- registrations
- banking transactions
- AWS Offering:
- Amazon Quantum Ledger Database (QLDB)
Amazon RDS
- RDS stands for Relational Database Service
- Eliminates much of the work of relational database management
- Can be scaled independently
- CPU
- memory
- storage
- IOPS
- AWS manages
- backups
- software patching
- automatic failure detection
- recovery
- Supports both manual and automated backups
- Provides high availability with a primary instance that, if it fails, fails over to a secondary instance
- Has a soft limit of 40 Amazon RDS DB instances per account
- Of the 40, up to 10 can be Oracle or Microsoft SQL Server DB instances under the License Included model.
- Customers can use the Bring Your Own License (BYOL) model to run all 40 DB instances as Oracle or Microsoft SQL Server
- Supports database engines
- MySQL
- MariaDB
- PostgreSQL
- Oracle
- Microsoft SQL Server
- MySQL-compatible Amazon Aurora
- AWS IAM controls access to Amazon RDS resources and databases.
For security:
- Put the database in an Amazon Virtual Private Cloud (Amazon VPC)
- Use Secure Sockets Layer (SSL) for data in transit
- Use encryption for data at rest
Further,
- RDS APIs and the AWS Management Console provide a management interface that allows you to create, delete, modify, and terminate RDS DB instances; to create DB snapshots; and to perform point-in-time restores
- There is no AWS data API for Amazon RDS.
- Once a database is created, RDS provides a DNS endpoint for the database which can be used to connect to the database.
- The endpoint does not change over the lifetime of the instance, even during a failover in a Multi-AZ configuration
- RDS leverages Amazon EBS volumes as its data store
- RDS provides database backups, which are replicated across multiple AZs
- The RDS Multi-AZ feature synchronously replicates data between a primary RDS DB instance and a standby instance in another Availability Zone, which prevents data loss
- RDS provides a DNS endpoint and, in case of a failure on the primary, automatically fails over to the standby instance
- RDS also supports read replicas for the supported databases; replication to read replicas is asynchronous
- With RDS Provisioned IOPS, the IOPS can be specified when the instance is launched and are guaranteed over the life of the instance
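The DNS endpoint mentioned above can be retrieved through the RDS API. A minimal boto3 sketch, with a hypothetical instance identifier (`mydb` is an example, not from the source):

```python
def format_endpoint(endpoint: dict) -> str:
    """Build host:port from the Endpoint structure RDS returns."""
    return f"{endpoint['Address']}:{endpoint['Port']}"

def get_rds_endpoint(instance_id: str) -> str:
    """Look up the DNS endpoint of an RDS DB instance.

    Requires AWS credentials; boto3 is imported lazily so the rest of
    the sketch runs without the SDK installed.
    """
    import boto3
    rds = boto3.client("rds")
    resp = rds.describe_db_instances(DBInstanceIdentifier=instance_id)
    return format_endpoint(resp["DBInstances"][0]["Endpoint"])

# Example (not run here): get_rds_endpoint("mydb")  # hypothetical identifier
```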
| Resource | Default Limit |
| --- | --- |
| Cross-region snapshot copy requests | 5 |
| DB instances | 40 |
| Event subscriptions | 20 |
| Manual snapshots | 100 |
| Option groups | 20 |
| Parameter groups | 50 |
| Read replicas per master | 5 |
| Reserved instances | 40 |
| Rules per DB security group | 20 |
| Rules per VPC security group | 50 inbound, 50 outbound |
| DB security groups | 25 |
| VPC security groups | 5 |
| Subnet groups | 50 |
| Subnets per subnet group | 20 |
| Tags per resource | 50 |
| Total storage for all DB instances | 100 TiB |
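The Multi-AZ and Provisioned IOPS options can be sketched as a hypothetical CreateDBInstance request. All identifiers, sizes, and credentials below are example placeholders, not from the source:

```python
# Hypothetical sketch: CreateDBInstance parameters enabling Multi-AZ and
# Provisioned IOPS. Every value here is an example placeholder.

db_params = {
    "DBInstanceIdentifier": "mydb",     # hypothetical name
    "Engine": "mysql",
    "DBInstanceClass": "db.m5.large",   # example instance class
    "AllocatedStorage": 100,            # GiB
    "StorageType": "io1",               # Provisioned IOPS storage
    "Iops": 1000,                       # guaranteed over the instance's life
    "MultiAZ": True,                    # synchronous standby in another AZ
    "MasterUsername": "admin",
    "MasterUserPassword": "change-me",  # placeholder; store real secrets securely
}

def create_instance(params: dict):
    """Launch the DB instance (requires AWS credentials; not run here)."""
    import boto3  # imported lazily so the sketch loads without the SDK
    return boto3.client("rds").create_db_instance(**params)
```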
Amazon DynamoDB
- It is a fully managed NoSQL database service
- It provides fast and predictable performance with seamless scalability.
- Offload the administrative burdens of operating and scaling a distributed database
- It also offers encryption at rest
- Create database tables that can store and retrieve any amount of data
- Serve any level of request traffic.
- Scale up or down tables’ throughput capacity without downtime or performance degradation
- Use the AWS Management Console to monitor resource utilization.
- Provides on-demand backup capability with full backups of tables for long-term retention and archival.
- tables, items, and attributes are the core components
- A table is a collection of items
- item is a collection of attributes.
- primary keys are used to uniquely identify each item in a table
- secondary indexes provide more querying flexibility
- DynamoDB Streams capture data modification events in DynamoDB tables.
DynamoDB components
Tables
- Similar to other database systems, DynamoDB stores data in tables.
- A table is a collection of data
- For example, a table called People could store personal contact information about friends, family, or anyone else of interest. You could also have a Cars table to store information about vehicles that people drive.
Items
- Each table contains zero or more items
- An item is a group of attributes that is uniquely identifiable among all of the other items.
- In a People table, each item represents a person. For a Cars table, each item represents one vehicle.
- Items are similar in many ways to rows, or tuples in other database systems.
- There is no limit to the number of items you can store in a table.
Attributes
- Each item is composed of one or more attributes.
- An attribute is a fundamental data element, something that does not need to be broken down any further.
- For example, an item in a People table contains attributes called PersonID, LastName, FirstName, and so on.
- Attributes are similar to fields or columns in other database systems.
Consider a table named People with some example items and attributes.
Note the following about the People table:
- Each item in the table has a unique identifier, or primary key, that distinguishes the item from all of the others in the table. In the People table, the primary key consists of one attribute (PersonID).
- Other than the primary key, the People table is schemaless, which means that neither the attributes nor their data types need to be defined beforehand. Each item can have its own distinct attributes.
- Most of the attributes are scalar, which means that they can have only one value. Strings and numbers are common examples of scalars.
- Some of the items have a nested attribute (Address). DynamoDB supports nested attributes up to 32 levels deep.
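As a sketch of the People table's schemaless items, here is a hypothetical item with a nested Address attribute, written through the boto3 resource interface (which accepts plain Python types). All attribute values are illustrative:

```python
# Hypothetical sketch: a People item with a scalar primary key (PersonID)
# and a nested Address attribute. All values are example data.

person = {
    "PersonID": 101,             # primary key, must be unique in the table
    "LastName": "Doe",
    "FirstName": "Jane",
    "Address": {                 # nested attribute (up to 32 levels deep)
        "Street": "123 Main St",
        "City": "Anytown",
    },
}

def put_person(item: dict):
    """Write the item to the People table (requires AWS credentials)."""
    import boto3  # imported lazily so the sketch loads without the SDK
    table = boto3.resource("dynamodb").Table("People")
    return table.put_item(Item=item)
```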
DynamoDB supports two different kinds of primary keys:
- Partition key – A simple primary key, composed of one attribute known as the partition key. Partition key’s value is input to an internal hash function. The output from the hash function determines the partition (physical storage internal to DynamoDB) in which the item will be stored. In a table that has only a partition key, no two items can have the same partition key value. The People table has a simple primary key (PersonID) to access any item in the table directly by providing it.
- Partition key and sort key – Known as a composite primary key. It has two attributes: the first is the partition key, and the second is the sort key. The partition key value is input to an internal hash function, whose output determines the partition (physical storage internal to DynamoDB) in which the item will be stored. All items with the same partition key value are stored together, in sorted order by sort key value.
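A composite primary key can be sketched as a hypothetical CreateTable request; the Music table and the Artist/SongTitle key names are illustrative, not from the source:

```python
# Hypothetical sketch: CreateTable parameters for a composite primary key.
# Table and attribute names are example values.

music_table = {
    "TableName": "Music",
    "KeySchema": [
        {"AttributeName": "Artist", "KeyType": "HASH"},      # partition key
        {"AttributeName": "SongTitle", "KeyType": "RANGE"},  # sort key
    ],
    "AttributeDefinitions": [
        {"AttributeName": "Artist", "AttributeType": "S"},
        {"AttributeName": "SongTitle", "AttributeType": "S"},
    ],
    "BillingMode": "PAY_PER_REQUEST",  # no provisioned throughput to manage
}

def create_table(spec: dict):
    """Create the table (requires AWS credentials; not run here)."""
    import boto3  # imported lazily so the sketch loads without the SDK
    return boto3.client("dynamodb").create_table(**spec)
```

Items that share an Artist value land in the same partition and are stored sorted by SongTitle, which is what makes range queries on the sort key efficient.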
Secondary Indexes
- One or more secondary indexes can be created on a table.
- A secondary index lets you query the table using an alternate key, in addition to the primary key.
- DynamoDB doesn't require indexes, but they give applications more flexibility when querying data.
- With a secondary index on a table, data can be read from the index in much the same way as from the table.
DynamoDB index types
- Global secondary index – Partition key and sort key that can be different from those on the table.
- Local secondary index – Same partition key as the table, but a different sort key.
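A global secondary index can be sketched as follows: a hypothetical Orders table keyed on OrderID gains a GSI so it can also be queried by CustomerID. All names are illustrative:

```python
# Hypothetical sketch: a GSI definition with its own partition and sort key,
# plus a query routed through the index. All names are example values.

gsi = {
    "IndexName": "CustomerIndex",
    "KeySchema": [
        {"AttributeName": "CustomerID", "KeyType": "HASH"},  # differs from the table's key
        {"AttributeName": "OrderDate", "KeyType": "RANGE"},
    ],
    "Projection": {"ProjectionType": "ALL"},  # copy all attributes into the index
}

def query_by_customer(customer_id: str):
    """Query through the index instead of the table's primary key.

    Requires AWS credentials; not run here.
    """
    import boto3  # imported lazily so the sketch loads without the SDK
    from boto3.dynamodb.conditions import Key
    table = boto3.resource("dynamodb").Table("Orders")  # hypothetical table
    return table.query(
        IndexName="CustomerIndex",
        KeyConditionExpression=Key("CustomerID").eq(customer_id),
    )
```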
DynamoDB supports eventually consistent and strongly consistent reads.
- Eventually Consistent Reads – When you read data from a DynamoDB table, the response might not reflect the results of a recently completed write operation. The response might include some stale data. If you repeat read request after a short time, the response should return the latest data.
- Strongly Consistent Reads – When you request a strongly consistent read, DynamoDB returns a response with the most up-to-date data, reflecting the updates from all prior write operations that were successful. A strongly consistent read might not be available if there is a network delay or outage. Strongly consistent reads are not supported on global secondary indexes (GSIs).
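The read-consistency choice is a per-request flag. A minimal sketch of a strongly consistent GetItem, with hypothetical table and key names:

```python
# Hypothetical sketch: GetItem parameters requesting a strongly consistent
# read. Table name and key are example values.

request = {
    "TableName": "People",
    "Key": {"PersonID": {"N": "101"}},
    "ConsistentRead": True,  # default is False (eventually consistent)
}

def read_item(req: dict):
    """Perform the read (requires AWS credentials; not run here)."""
    import boto3  # imported lazily so the sketch loads without the SDK
    return boto3.client("dynamodb").get_item(**req)
```

Strongly consistent reads cost more read capacity and can have higher latency, so the flag is usually reserved for read-after-write paths that truly need it.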
Amazon Redshift
- It is a fast, scalable data warehouse
- Used to analyze all data across your data warehouse and data lake.
- Uses machine learning, parallel query execution, and columnar storage.
- Setup and deploy a new data warehouse in minutes
- Run queries across petabytes in Redshift, and exabytes in data lake.
- uses columnar storage, data compression, and zone maps to reduce I/O to perform queries.
- Has a massively parallel processing (MPP) architecture to parallelize and distribute SQL operations
- Stores three copies of data: all data written to a node in the cluster is automatically replicated to other nodes within the cluster, and all data is continuously backed up to Amazon S3.
- Snapshots are automated, incremental, and continuous and stored for a user-defined period (1-35 days)
- Manual snapshots can be created and are retained until deleted.
- Continuously monitors health of cluster
- Automatically re-replicates data from failed drives and replaces nodes as necessary.
Three pricing components:
- Data warehouse node hours – the total number of hours run across all compute nodes
- backup storage – storage cost for automated and manual snapshots
- data transfer
The number of nodes can be easily scaled up or down as demand changes
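Scaling the node count is a single API call. A minimal sketch using boto3's `modify_cluster`, with a hypothetical cluster identifier and node count:

```python
# Hypothetical sketch: resizing a Redshift cluster by changing its node
# count. Identifier and count are example values.

resize = {
    "ClusterIdentifier": "my-cluster",  # hypothetical cluster name
    "NumberOfNodes": 4,                 # desired compute node count
}

def resize_cluster(params: dict):
    """Trigger the resize (requires AWS credentials; not run here)."""
    import boto3  # imported lazily so the sketch loads without the SDK
    return boto3.client("redshift").modify_cluster(**params)
```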
Monitoring Amazon RDS
- Amazon RDS Events – Subscribe to Amazon RDS events to be notified when changes occur with a DB instance, DB cluster, DB snapshot, DB cluster snapshot, DB parameter group, or DB security group.
- Database log files – View, download, or watch database log files using the Amazon RDS console or Amazon RDS API actions. You can also query some database log files that are loaded into database tables.
- Amazon RDS Enhanced Monitoring — Look at metrics in real-time for the operating system that DB instance or DB cluster runs on.
- Amazon CloudWatch Metrics – Amazon RDS automatically sends metrics to CloudWatch every minute for each active database instance and cluster.
- Amazon CloudWatch Alarms – Watch a single Amazon RDS metric over a specific time period, and perform actions based on the value of the metric relative to a threshold you set.
- Amazon CloudWatch Logs – MariaDB, MySQL, and Aurora MySQL enable you to monitor, store, and access database log files in CloudWatch Logs.
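As a sketch of the alarm workflow, here is a hypothetical CloudWatch alarm on an RDS instance's CPUUtilization metric; the alarm name, instance identifier, and threshold are all example values:

```python
# Hypothetical sketch: PutMetricAlarm parameters watching an RDS instance's
# CPU. All names and thresholds are example values.

alarm = {
    "AlarmName": "rds-high-cpu",
    "Namespace": "AWS/RDS",
    "MetricName": "CPUUtilization",
    "Dimensions": [{"Name": "DBInstanceIdentifier", "Value": "mydb"}],
    "Statistic": "Average",
    "Period": 300,               # evaluate 5-minute averages
    "EvaluationPeriods": 2,      # two consecutive breaches before alarming
    "Threshold": 80.0,           # percent CPU
    "ComparisonOperator": "GreaterThanThreshold",
}

def create_alarm(params: dict):
    """Create the alarm (requires AWS credentials; not run here)."""
    import boto3  # imported lazily so the sketch loads without the SDK
    return boto3.client("cloudwatch").put_metric_alarm(**params)
```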