Databases-2

Magnetic Storage
RDS offers magnetic storage for backward compatibility with older instances. It’s limited to a maximum size of 4 TB and 1,000 IOPS.

Read Replicas
If your database instance doesn’t meet your performance requirements, you have a few options, depending on where the bottleneck is. As mentioned previously, if your memory, compute, network speed, or disk throughput are the issue, you can scale your database instance vertically by upgrading to a larger
instance class. Scaling vertically—also called scaling up—is a straightforward approach. You simply throw more resources at your database instance and don’t have to make any changes to your application or databases. Scaling horizontally, also known as scaling out, entails creating additional database instances called read replicas. All database engines except for Oracle and Microsoft SQL Server support read replicas. Aurora exclusively supports a specific type of read replica
called an Aurora replica. A read replica is another database instance that services only queries against the database. A read replica takes some of the query load off of the master database instance, which remains solely responsible for writing data to the database. Read replicas are useful for read-heavy applications. You can have up to five read replicas and up to fifteen Aurora replicas. Data from the master is asynchronously replicated to each read replica, meaning that there’s a delay between when the data is written to the database by the master and when it shows up on the replica. This makes read replicas unsuitable for disaster recovery. For MySQL, you can set a replication delay. When you create a read replica, RDS gives you a read-only endpoint which is a domain name that resolves only to your read replica. If you have multiple replicas, RDS will load balance the connection to one of them. If you have reporting and analysis tools that only
need to read data, you would point them to a read-only endpoint.
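
The sketch below is a minimal illustration using the AWS SDK for Python (boto3): it creates a read replica from an existing source instance and prints the replica's read-only endpoint once it becomes available. The instance identifiers and instance class are hypothetical placeholders.

import boto3

rds = boto3.client('rds')

# Create a read replica from an existing source instance
# (identifiers and instance class below are placeholders).
rds.create_db_instance_read_replica(
    DBInstanceIdentifier='mydb-replica-1',
    SourceDBInstanceIdentifier='mydb',
    DBInstanceClass='db.t3.medium'
)

# Wait for the replica to become available, then print its read-only endpoint.
rds.get_waiter('db_instance_available').wait(DBInstanceIdentifier='mydb-replica-1')
replica = rds.describe_db_instances(DBInstanceIdentifier='mydb-replica-1')['DBInstances'][0]
print(replica['Endpoint']['Address'])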

High Availability (Multi-AZ)
To keep your database continuously available in the event of a database instance outage, you can deploy multiple database instances in different availability zones using what RDS calls a multi-AZ deployment. In a multi-AZ deployment, you have a primary database instance in one availability zone that handles reads and writes to the database, and you have a standby database instance in a different availability zone. If the primary instance experiences an outage, it will fail over to the standby instance, usually within two minutes. Here are a few possible causes for a database instance outage:
■ Availability zone outage
■ Changing a database instance type
■ Patching of the instance’s operating system
You can configure multi-AZ when you create a database instance or later. All database engines support multi-AZ but implement it slightly differently.
If you enable multi-AZ after creating your instance, you’ll experience a significant performance hit, so be sure to do it during a maintenance window.
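
If you manage your instances programmatically, a minimal boto3 sketch for converting an existing instance to multi-AZ might look like the following; the instance identifier is a placeholder, and ApplyImmediately=False defers the change to the next maintenance window.

import boto3

rds = boto3.client('rds')

# Convert an existing single-AZ instance to a multi-AZ deployment.
# Deferring the change to the maintenance window avoids the
# performance hit during busy hours.
rds.modify_db_instance(
    DBInstanceIdentifier='mydb',
    MultiAZ=True,
    ApplyImmediately=False
)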

Multi-AZ with Oracle, PostgreSQL, MariaDB, MySQL, and Microsoft SQL Server
In this multi-AZ deployment, all instances must reside in the same region. RDS synchronously replicates data from the primary to the standby instance. This replication can introduce some latency, so be sure to use EBS-optimized instances and provisioned IOPS SSD storage. The standby instance is not a read replica and cannot serve read traffic. Your application connects to the endpoint domain name for the primary. When a failover occurs, RDS changes the DNS record of the endpoint to point to the standby. The only thing your application has to do is reconnect to the endpoint. If using the bring-your-own-license model for Oracle, you must possess a license for both the primary and standby instances. For MySQL and MariaDB, you can create a multi-AZ read replica in a different region. This lets you fail over to a different region.

Multi-AZ with Amazon Aurora
Amazon Aurora handles multi-AZ a bit differently. An Amazon Aurora cluster consists of a primary instance. Aurora gives you a cluster endpoint that always points to the primary instance. An Aurora cluster also may include Aurora replicas. The primary and all replicas share a single cluster volume, which is synchronously replicated across three availability zones. This cluster volume automatically expands as needed, up to 64 TB. In the event the primary instance fails, one of two things will happen. If no Aurora replicas exist, Aurora will create a new primary instance. If an Aurora replica does exist, Aurora will promote the replica to the primary.

Backup and Recovery
RDS gives you the ability to take EBS volume snapshots of your database instances. Snapshots include all databases on the instance and are stored in S3, just like regular EBS snapshots. Snapshots are kept in multiple zones in the same region for redundancy. Taking a snapshot suspends all I/O operations for a few seconds unless you’re using multi-AZ with a database engine other than Microsoft SQL Server. Be sure to take your snapshots during off-peak times. When considering your backup and recovery needs, there are two metrics you should understand. The recovery time objective (RTO) is the maximum acceptable time to recover data and resume processing after a failure. The recovery point objective (RPO) is the maximum period of acceptable data loss. Consider your own RTO and RPO requirements when choosing your RDS backup options. When you restore from a snapshot, RDS restores it to a new instance. The time to restore a snapshot can take several minutes, depending on its size. The more provisioned IOPS you allocate to your new instance, the faster the recovery time.
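
As a rough boto3 sketch (instance and snapshot names are placeholders), taking a manual snapshot and restoring it to a new instance looks like this:

import boto3

rds = boto3.client('rds')

# Take a manual snapshot of an existing instance.
rds.create_db_snapshot(
    DBInstanceIdentifier='mydb',
    DBSnapshotIdentifier='mydb-manual-snapshot'
)

# Restoring always creates a brand-new instance; the original
# instance is left untouched.
rds.restore_db_instance_from_db_snapshot(
    DBInstanceIdentifier='mydb-restored',
    DBSnapshotIdentifier='mydb-manual-snapshot'
)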

Automated Snapshots
RDS can automatically create snapshots of your instances daily during a 30-minute backup window. You can customize this window or let RDS choose it for you. Because taking a snapshot impacts performance, choose a time when your database is least busy. If you let RDS select the backup window, it will randomly select a 30-minute window within an 8-hour block that varies by region. Enabling automatic backups enables point-in-time recovery, which archives database
change logs to S3 every 5 minutes. In the event of a failure, you’ll lose only up to five minutes’ worth of data. Restoring to a point-in-time can take hours, depending on how much data is in the transaction logs. RDS keeps automated snapshots for a limited period of time and then deletes them. You can choose a retention period between 1 day and 35 days. The default is 7 days. To disable automated snapshots, set the retention period to zero. Note that disabling automated snapshots immediately deletes all existing automated snapshots and disables point-in-time recovery. Also, if you change the retention period from zero to any other value, it will trigger an immediate snapshot. You can also manually take a snapshot of your database instance. Unlike automated snapshots, manual snapshots stick around until you delete them. If you delete an instance, RDS will prompt you to take a final snapshot. It will also prompt you to retain automated snapshots. RDS will keep the final snapshot and all manual snapshots. If you choose not to retain automated backups, it will immediately delete any automated snapshots.
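
The following boto3 sketch (placeholder identifiers) sets a 7-day retention period and an explicit backup window, then performs a point-in-time restore to a new instance using the latest restorable time.

import boto3

rds = boto3.client('rds')

# Enable automated snapshots: 7-day retention and a 30-minute
# backup window.
rds.modify_db_instance(
    DBInstanceIdentifier='mydb',
    BackupRetentionPeriod=7,
    PreferredBackupWindow='04:00-04:30',
    ApplyImmediately=True
)

# Point-in-time recovery always restores to a new instance.
rds.restore_db_instance_to_point_in_time(
    SourceDBInstanceIdentifier='mydb',
    TargetDBInstanceIdentifier='mydb-pitr',
    UseLatestRestorableTime=True
)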

Maintenance Items
Because RDS is a managed service, it’s the responsibility of AWS to handle patching and upgrades. AWS routinely performs these maintenance items on your database instances. Maintenance items include operating system security and reliability patches. These generally occur once every few months. Database engine upgrades also may occur during a maintenance window. When AWS begins to support a new version of a database engine, you can choose to upgrade to it. Major version upgrades may contain database changes that aren’t backward compatible. As such, if you want a major version upgrade, you must apply it manually. AWS may automatically apply minor version changes that are nonbreaking. You can determine when these maintenance tasks take place by specifying a 30-minute weekly maintenance window. The window cannot overlap with the backup window. Even though the maintenance window is 30 minutes, it’s possible for tasks to run beyond this.
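
A minimal boto3 sketch for setting the weekly maintenance window (placeholder identifier; the window shown is arbitrary and must not overlap the backup window):

import boto3

rds = boto3.client('rds')

# Define a 30-minute weekly maintenance window and allow AWS to apply
# nonbreaking minor version upgrades automatically.
rds.modify_db_instance(
    DBInstanceIdentifier='mydb',
    PreferredMaintenanceWindow='sun:06:00-sun:06:30',
    AutoMinorVersionUpgrade=True,
    ApplyImmediately=False
)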

Amazon Redshift
Redshift is a managed data warehouse solution designed for OLAP databases. Although it’s based on PostgreSQL, it’s not part of RDS. Redshift uses columnar storage, meaning that it stores the values for a column close together. This improves storage speed and efficiency and makes it faster to query data from individual columns. Redshift supports ODBC and JDBC database connectors. Redshift uses compression encodings to reduce the amount of space each column takes up in storage. You can apply compression manually on a column-by-column basis, or if you use the COPY command to import data from a file into a Redshift database, Redshift will determine which columns to compress.
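
As an illustration of loading data with automatic compression, the COPY statement below (bucket, table, and IAM role names are placeholders) loads CSV data from S3 into an empty table and lets Redshift analyze the data and choose a compression encoding for each column; it would be run through any SQL client or the Redshift Data API connected to the cluster.

# Hypothetical COPY statement; COMPUPDATE ON makes the automatic
# compression analysis explicit.
copy_sql = """
COPY sales
FROM 's3://my-bucket/sales/'
IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
FORMAT AS CSV
COMPUPDATE ON;
"""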

Compute Nodes
A Redshift cluster contains one or more compute nodes that are divided into two categories. Dense compute nodes can store up to 326 TB of data on fast SSDs. Dense storage nodes can store up to 2 PB of data on magnetic storage. If your cluster contains more than one compute node, Redshift also includes a leader
node to coordinate communication among the compute nodes, as well as to communicate with clients. A leader node doesn’t incur any additional charges.
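
A minimal boto3 sketch of creating a cluster with three dense compute nodes (all identifiers and credentials below are placeholders); because the cluster has more than one compute node, Redshift adds a leader node automatically at no extra charge.

import boto3

redshift = boto3.client('redshift')

# Three SSD-backed dense compute nodes plus an automatically added
# leader node.
redshift.create_cluster(
    ClusterIdentifier='analytics-cluster',
    NodeType='dc2.large',
    ClusterType='multi-node',
    NumberOfNodes=3,
    MasterUsername='admin',
    MasterUserPassword='ChangeMe123!',
    DBName='warehouse'
)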

Data Distribution Styles
Rows in a Redshift database are distributed across compute nodes. How the data is distributed depends on the distribution style. In EVEN distribution, the leader node spreads the data out evenly across all compute nodes. This is the default style. KEY distribution spreads the data according to the values in a single column. Rows with the same value in that column are stored on the same node. In ALL distribution, a complete copy of the table is stored on every compute node.
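
The DDL below sketches what each distribution style looks like when creating a table; table and column names are hypothetical, and the statements would be executed through a SQL client connected to the cluster.

# EVEN (the default): the leader node spreads rows evenly across nodes.
even_dist = """
CREATE TABLE web_clicks (click_time TIMESTAMP, url VARCHAR(2048))
DISTSTYLE EVEN;
"""

# KEY: rows are placed according to the values in one column.
key_dist = """
CREATE TABLE orders (order_id BIGINT, customer_id BIGINT, total DECIMAL(10,2))
DISTSTYLE KEY DISTKEY (customer_id);
"""

# ALL: a full copy of the table is stored on every compute node.
all_dist = """
CREATE TABLE countries (country_code CHAR(2), country_name VARCHAR(64))
DISTSTYLE ALL;
"""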

Nonrelational (No-SQL) Databases
Nonrelational databases are designed to consistently handle tens of thousands of transactions per second. Although they can store the same data you’d find in a relational database, they’re optimized for so-called unstructured data. Unstructured data is an unfortunate term, as all data you store in any database has some structure. A more accurate description would be multistructured data. The data you store in a nonrelational database can vary in structure, and that structure can change over time. Nonrelational and relational databases have many elements in common. Nonrelational databases—also known as no-SQL databases—consist of collections that are confusingly also sometimes called tables. Within a table, you store items, which are similar to rows or tuples in a relational database. Each item consists of at least one attribute, which is analogous to a column in a SQL database. An attribute consists of a unique name called a key, a data type, and a value. Attributes are sometimes called key-value pairs.

Storing Data
One of the biggest differences between a relational and nonrelational database is that nonrelational databases are schemaless and don’t require all items in a table to have the same attributes. Each item requires a primary key attribute whose value must be unique within the table. The purpose of the primary key is to uniquely identify an item and provide a value by which to sort items. Nonrelational databases are flexible when it comes to the type of data you can store. With the exception of the primary key attribute, you don’t have to define attributes when you create a table. You create attributes on the fly when you create or modify an item. These attributes are unordered and hence have no relation to each other, which is why they’re called nonrelational. Nonrelational databases do not give you a way to split data across tables and then merge it together at query time. Therefore, an application will generally keep all of its data in one table. This can lead to the duplication of data, which in a large database can incur substantial storage costs.

Querying Data
The trade-off for the flexibility to store unstructured data is that you’re more limited in your queries. Nonrelational databases are optimized for queries based on the primary key. Queries against other attributes are slower, making nonrelational databases inappropriate for complex or arbitrary queries. Prior to creating a table, you need to understand the exact queries that you’re going to need to perform against the data. Consider the following item:

Key                        Type    Value
Employee ID (primary key)  Number  101
Department                 String  Information technology
Last Name                  String  Smith
First Name                 String  Charlotte

If you wanted to list every department that has an employee named Charlotte, it would be difficult to obtain this using a nonrelational database. Because items are sorted by Employee ID, the system would have to scan through every item to locate all items that have an attribute First Name with a value of Charlotte. And because the data in each item is unstructured, it may require searching through every attribute. It would then have to determine which of these items contain a Department attribute. Such a query would be slow and computationally expensive.


Types of Nonrelational Databases
You may hear nonrelational databases divided into categories such as key-value stores, document-oriented stores, and graph databases. But all nonrelational databases are key-value store databases. A document-oriented store is a particular application of a nonrelational database that analyzes the contents of a document stored as a value and extracts metadata from it. A graph database analyzes relationships between attributes in different items. This is different from a relational database, which enforces relationships between records. A graph database discovers these relationships in unstructured data.

DynamoDB
DynamoDB is a managed nonrelational database service that can handle thousands of reads and writes per second. It achieves this level of performance by spreading your data across multiple partitions. A partition is an allocation of storage for a table, and it’s backed by solid-state drives in multiple availability zones.

Partition and Hash Keys
When you create a table, you must specify a primary key and a data type. Because the primary key uniquely identifies an item in the table, its value must be unique within the table. There are two types of primary keys you can create. A partition key, also known as a hash key, is a primary key that contains a single value. When you use only a partition key as a primary key, it’s called a simple primary key. Good candidates for a partition key would be an email address, a unique username, or even a randomly generated identifier. A partition key can store no more than 2,048 B. A primary key can also be a combination of two values: a partition key and a sort (or range) key. This is called a composite primary key. The partition key doesn’t have to be unique, but the combination of the partition key and sort key must be unique. For example, a person’s last name could be the partition key, while the first name could be the sort key. Using this approach, you could use the following values for a composite primary key for a table:

Last name (partition key)  First name (sort key)
Lewis                      Clive
Lewis                      Warren
Williams                   Warren

Neither the last name Lewis nor the first name Warren is unique in the table. But combining the partition and sort keys together creates a unique primary key.
DynamoDB distributes your items across partitions based on the primary key. Using the preceding example, items with the last name Lewis would all be stored on the same partition. DynamoDB would arrange the items in ascending order by the sort key. Note that a sort key can store no more than 1,024 B. When a lot of read or write activity occurs against an item, the partition the item exists in is said to be a hot partition. This can negatively affect performance. To avoid hot partitions, try to make your partition keys as unique as possible. For example, if you’re storing log entries, using the current date as a partition key will result in a different hot partition every day. Instead, you may consider using a timestamp that changes frequently.
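
The boto3 sketch below creates a table with this composite primary key; the table name and throughput values are placeholders.

import boto3

dynamodb = boto3.client('dynamodb')

# LastName is the partition (hash) key and FirstName is the sort
# (range) key, mirroring the example above.
dynamodb.create_table(
    TableName='Author',
    AttributeDefinitions=[
        {'AttributeName': 'LastName', 'AttributeType': 'S'},
        {'AttributeName': 'FirstName', 'AttributeType': 'S'},
    ],
    KeySchema=[
        {'AttributeName': 'LastName', 'KeyType': 'HASH'},    # partition key
        {'AttributeName': 'FirstName', 'KeyType': 'RANGE'},  # sort key
    ],
    ProvisionedThroughput={'ReadCapacityUnits': 5, 'WriteCapacityUnits': 5}
)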

Attributes and Items
Each key-value pair composes an attribute, and one or more attributes make up an item. DynamoDB can store an item size of up to 400 KB, which is roughly equivalent to 50,000 English words! At a minimum, every item contains a primary key and corresponding value. When you create an attribute, you must define the data type. Data types fall into the following three categories:

Scalar A scalar data type can have only one value. The string data type can store up to 400 KB of Unicode data with UTF-8 encoding. A string’s length must always be greater than zero. The number data type stores positive or negative numbers up to 38 significant digits. DynamoDB trims leading and trailing zeros. The binary data type stores binary data in base-64 encoded format. Like the string type, it’s limited by the maximum item size to 400 KB. The Boolean data type can store a value of either true or false. The null data type is for representing an attribute with an undefined or unknown value. Oddly, it must contain a value of null.

Set A set data type holds an unordered list of scalar values. The values must be unique within a set, and a set must contain at least one value. You can create number sets, string sets, and binary sets.

Document Document data types are designed to hold different types of data that fall outside the constraints of scalar and set data types. You can nest document types together up to 32 levels deep. A list document type can store an ordered collection of values of any type. For example, you could include the following in the value of a list document:

Chores: ["Make coffee", Groceries: ["milk", "eggs", "cheese"], "Pay bills", Bills: [water: [60], electric: [100]]]

Notice that the Chores list contains string data, numeric data, and nested lists. A map data type can store an unordered collection of key-value pairs in a format similar to JavaScript Object Notation (JSON). As with a list, there are no restrictions on the type of data you can include. The following is an example of a map that contains a nested list and a nested map:

{
  Day: "Friday",
  Chores: [
    "Make coffee",
    "Groceries", {
      milk: { Quantity: 1 },
      eggs: { Quantity: 12 }
    },
    "Mow the lawn"
  ]
}
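
As a sketch of how such an item could be written, the boto3 resource interface accepts native Python dictionaries and lists and converts them to DynamoDB map and list types; the table and attribute values below are hypothetical.

import boto3

table = boto3.resource('dynamodb').Table('Author')  # placeholder table

table.put_item(Item={
    'LastName': 'Lewis',    # partition key
    'FirstName': 'Clive',   # sort key
    'Day': 'Friday',
    'Chores': [
        'Make coffee',
        {'Groceries': {'milk': {'Quantity': 1}, 'eggs': {'Quantity': 12}}},
        'Mow the lawn',
    ],
})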

Throughput Capacity
When creating a table, you must specify the number of reads and writes per second your application will require. This is called provisioned throughput. DynamoDB reserves partitions based on the number of read capacity units (RCUs) and write capacity units (WCUs) you specify when creating a table. When you read an item from a table, that read may be strongly consistent or eventually consistent. A strongly consistent read always gives you the most up-to-date data, while an eventually consistent read may produce stale data that does not reflect data from a recent write operation. Whether you use strongly or eventually consistent reads depends on whether your application can tolerate reading stale data, and you need to know which you’ll use when deciding how much throughput to provision. For an item up to 4 KB in size, one RCU buys you one strongly consistent read per second.
To read an 8 KB item every second using a strongly consistent read, you’d need two RCUs. If you use an eventually consistent read, one RCU buys you two eventually consistent reads per second. To read an 8 KB item every second using an eventually consistent read, you’d need only one RCU. When it comes to writing data, one WCU gives you one write per second for an item up to 1 KB in size. This means if you need to write 100 items per second, each item being less
than 1 KB, you’d need to provision 100 WCUs. If you need to write 10 items per second, each item being 2 KB, then you’d need 20 WCUs. The throughput capacity you specify is an upper limit of what DynamoDB delivers. If you exceed your capacity, DynamoDB may throttle your request and yield an “HTTP 400 (Bad request)” error. AWS SDKs have built-in logic to retry throttled requests, so having a request throttled won’t prevent your application from reading or writing data, but it will slow it down.
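
A small helper that mirrors the arithmetic above (4 KB per read capacity unit, half cost for eventually consistent reads, 1 KB per write capacity unit) can make provisioning estimates easier to check:

import math

def rcus_needed(item_size_kb, reads_per_second, strongly_consistent=True):
    # One RCU covers one strongly consistent read per second of an
    # item up to 4 KB; eventually consistent reads cost half.
    units_per_read = math.ceil(item_size_kb / 4)
    if not strongly_consistent:
        units_per_read /= 2
    return math.ceil(units_per_read * reads_per_second)

def wcus_needed(item_size_kb, writes_per_second):
    # One WCU covers one write per second of an item up to 1 KB.
    return math.ceil(item_size_kb) * writes_per_second

print(rcus_needed(8, 1))                             # 2 RCUs
print(rcus_needed(8, 1, strongly_consistent=False))  # 1 RCU
print(wcus_needed(2, 10))                            # 20 WCUs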

Auto Scaling
If you’re unsure exactly how much throughput you need to provision for a table or if you anticipate your throughput needs will vary over time, you can configure Auto Scaling to automatically increase your provisioned throughput when it gets close to hitting a defined threshold. To configure Auto Scaling, you specify a minimum and maximum RCU and WCU. You also specify a desired utilization percentage. DynamoDB will automatically adjust your RCU and WCU to keep your utilization at this percentage. For example, suppose you set a utilization of 70 percent, a minimum RCU of 10, and a maximum RCU of 50. If you consume 21 RCU, Auto Scaling will adjust your provisioned capacity to around 30 RCU. If your consumption drops to 14, Auto Scaling will reduce your provisioned throughput to 20 RCU. Setting the right utilization is a balancing act. The higher you set your utilization, the more likely you are to exceed your provisioned capacity. If that happens, your requests may get throttled. On the other hand, if you set your utilization too low, you will end up paying for capacity you don’t need.
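
A boto3 sketch of that configuration using the Application Auto Scaling API (table name and policy name are placeholders):

import boto3

autoscaling = boto3.client('application-autoscaling')

# Register the table's read capacity as a scalable target with a
# minimum of 10 and a maximum of 50 RCUs.
autoscaling.register_scalable_target(
    ServiceNamespace='dynamodb',
    ResourceId='table/Author',
    ScalableDimension='dynamodb:table:ReadCapacityUnits',
    MinCapacity=10,
    MaxCapacity=50
)

# Target-tracking policy that keeps consumed reads at roughly
# 70 percent of provisioned capacity.
autoscaling.put_scaling_policy(
    PolicyName='AuthorReadScaling',
    ServiceNamespace='dynamodb',
    ResourceId='table/Author',
    ScalableDimension='dynamodb:table:ReadCapacityUnits',
    PolicyType='TargetTrackingScaling',
    TargetTrackingScalingPolicyConfiguration={
        'TargetValue': 70.0,
        'PredefinedMetricSpecification': {
            'PredefinedMetricType': 'DynamoDBReadCapacityUtilization'
        }
    }
)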

Reserved Capacity
If you need 100 or more WCU or RCU, you can purchase reserved throughput capacity to save money. You must reserve RCU and WCU separately, and you’re limited to 100,000 units of each. You have to pay a one-time fee and commit to a period of one or three years.

Reading Data
DynamoDB provides two different operations to let you read data from a table. A scan lists all items in a table. It’s a read-intensive operation and can potentially consume all of your provisioned capacity units. A query returns an item based on the value of the partition key. When performing a query, the value of the partition key you search for must exactly match that of an item. If your table contains a sort key, you may optionally query by the sort key as well. For the sort key, you have more flexibility. You can search by exact value, a value greater than or less than the key, a range of values, or the beginning of the value.
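
The boto3 sketch below contrasts the two operations against the hypothetical Author table used earlier; the key values are illustrative.

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource('dynamodb').Table('Author')  # placeholder table

# Query: the partition key must match exactly; the sort key supports
# ranges, comparisons, and begins_with.
response = table.query(
    KeyConditionExpression=Key('LastName').eq('Lewis') &
                           Key('FirstName').begins_with('W')
)
print(response['Items'])

# Scan: reads every item in the table and can consume a large share
# of your provisioned read capacity.
all_items = table.scan()['Items']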

Secondary Indexes
Secondary indexes solve two issues with querying data from DynamoDB. When you query for a particular item, you must specify a partition key exactly. For example, the Author table you created earlier has LastName as the partition key and FirstName as the sort key. Secondary indexes let you look up data by an attribute other than the table’s primary key. Think of a secondary index as a copy of some of the attributes in a table. The table that the index gets its data from is called the base table. When you create a secondary index, you can choose which attributes get copied from the base table into the index. These are called projected attributes. A secondary index always includes the partition and sort key attributes from the base table. You can choose to copy just the partition and sort keys and their values, the keys plus other attributes, or everything. This lets you extract only the data you need. There are two types of secondary indexes.

Global Secondary Index
You can create a global secondary index (GSI) any time after creating a table. In a global secondary index, the partition and sort keys can be different from those of the base table. The same rules for choosing a primary key still apply. You want the primary key of your index to be as unique as possible. If you use a composite primary key, items having partition keys with the same value will be stored on the same partition. When reading from a global secondary index, reads are always eventually consistent. If you add an item to a table, it may not immediately get copied to the secondary index.
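
The boto3 sketch below adds a GSI keyed on a hypothetical Department attribute to the existing Author table, projecting only the key attributes:

import boto3

dynamodb = boto3.client('dynamodb')

dynamodb.update_table(
    TableName='Author',
    AttributeDefinitions=[
        {'AttributeName': 'Department', 'AttributeType': 'S'},
    ],
    GlobalSecondaryIndexUpdates=[{
        'Create': {
            'IndexName': 'Department-index',
            'KeySchema': [{'AttributeName': 'Department', 'KeyType': 'HASH'}],
            'Projection': {'ProjectionType': 'KEYS_ONLY'},  # copy only key attributes
            'ProvisionedThroughput': {'ReadCapacityUnits': 5,
                                      'WriteCapacityUnits': 5}
        }
    }]
)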

Local Secondary Index
A local secondary index (LSI) must be created at the same time as the base table. You also cannot delete a local secondary index after you’ve created it. The partition key must always be the same as the base table, but the sort key can be different. For example, if the base table has LastName as the partition key and FirstName as the sort key, you can create a local secondary index with a partition key of LastName and a sort key of Birthyear. Reads from a local secondary index can be strongly or eventually consistent, depending on what you specify at read time.
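
Because an LSI can only be defined when the table is created, the sketch below recreates the Author table from the earlier example with Birthyear as the index's sort key; names and throughput values are placeholders.

import boto3

dynamodb = boto3.client('dynamodb')

dynamodb.create_table(
    TableName='Author',
    AttributeDefinitions=[
        {'AttributeName': 'LastName', 'AttributeType': 'S'},
        {'AttributeName': 'FirstName', 'AttributeType': 'S'},
        {'AttributeName': 'Birthyear', 'AttributeType': 'N'},
    ],
    KeySchema=[
        {'AttributeName': 'LastName', 'KeyType': 'HASH'},
        {'AttributeName': 'FirstName', 'KeyType': 'RANGE'},
    ],
    LocalSecondaryIndexes=[{
        'IndexName': 'LastName-Birthyear-index',
        'KeySchema': [
            {'AttributeName': 'LastName', 'KeyType': 'HASH'},
            {'AttributeName': 'Birthyear', 'KeyType': 'RANGE'},
        ],
        'Projection': {'ProjectionType': 'ALL'},
    }],
    ProvisionedThroughput={'ReadCapacityUnits': 5, 'WriteCapacityUnits': 5}
)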
