Databases-1

Relational Databases
A relational database contains at least one table, which you can visualize as a spreadsheet with columns and rows. In a relational database table, columns may also be called attributes, and rows may also be called records or tuples.

Columns and Attributes
Before you can add data to a relational database table, you must predefine each column’s name and what data types it can accept. Columns are ordered, and you can’t change the order after you create the table. The ordering creates a relationship between attributes in the table, which is where the term relational database comes from. Refer to Table 5.1 for an example of a relational database table containing employee records. Data must match the type defined under each column. For example, the Employee ID can’t be a letter, because it’s defined as a numeric data type. One advantage of a relational database is that you don’t have to understand up front how you’re going to query the data. As long as the data is there in a consistent format, you can craft queries to get the data you want, the way you want. This makes relational databases good for applications that need to query data in arbitrary columns and customize how that data is presented. For example, you could query the database to return the birthdate of every employee whose first name is Charlotte. You can add more columns to a table after creating it. You can also delete columns, but deleting a column entails deleting all of the data stored under the column. If you delete the First Name column, it will remove first name data for all employees in the table.

Using Multiple Tables
Storing all data in a single table can lead to unnecessary duplication, needlessly increasing the size of the database and making queries slower. Hence, it’s common for applications to use multiple related tables. Using the preceding example, if 50 employees work in the information technology department, the string “Information technology” appears in the table 50 times—once for each record. Instead of putting the department names in every employee record, you would create a single record for each department in the Departments table. In this relationship, the Departments table is the parent table , and the Employees table is the child table . Each value in the Department column in the Employees table refers to the Department ID in the Departments table. Notice that the data type for the Department column is still a string. Although you could change the data type to a number, it’s not necessary. The Department ID in the Departments table is called the primary key , and it must be unique in the table so that it can uniquely identify a row. The Employees table refers to this as a foreign key . You must defi ne primary and foreign keys so that the database knows how the columns in different tables are related. The database will enforce foreign key constraints to ensure that when a child table references a foreign key, that key also exists in the parent.

Structured Query Language
You use the Structured Query Language (SQL) with relational databases to store and query data and perform database maintenance tasks. For this reason, relational databases are often called SQL databases. SQL statements differ slightly depending on the specific relational database management system (RDBMS) you’re using. As an AWS architect, you don’t need to know SQL, as all major programming languages have libraries that construct SQL statements and interact
with the database. But when dealing with AWS-managed database offerings, you do need to understand the concepts behind a few common SQL terms.

Querying Data
The SELECT statement is used to query data from a SQL database. It allows you to query based on the value in any column, as well as specify the specific columns you want the database to return. Thanks to the predictable structure of tables and the enforcement of foreign key constraints, you can use a JOIN clause with a SELECT statement to join together data from different tables.

Storing Data
The INSERT statement allows you to insert data directly into a table. If you need to load a large number of records, you can use the COPY command to copy data from a properly formatted file into the table you specify.

Online Transaction Processing vs. Online Analytic Processing
Depending on its configuration, a relational database can fall into one of two categories: online transaction processing (OLTP) or online analytic processing (OLAP).

OLTP
OLTP databases are suited to applications that read and write data frequently, on the order of multiple times per second. They are optimized for fast queries, and those queries tend to be regular and predictable. Depending on the size of the database and its performance requirements, an OLTP database may have intense memory requirements so that it can store frequently accessed portions of tables in memory for quick access. Generally, a single server with ample memory and compute power handles all writes to an OLTP database. An OLTP database would be a good candidate for backing an online ordering system that processes hundreds of orders a minute.

OLAP
OLAP databases are optimized for complex queries against large data sets. As a result, OLAP databases tend to have heavy compute and storage requirements. In data warehousing applications, it’s common to aggregate multiple OLTP databases into a single OLAP database. For example, in an OLTP database for an employee management system, employee data may be spread out across multiple tables. At regular but infrequent intervals, a data warehouse would aggregate these tables into a single table in an OLAP database. This makes it easier to write queries against the data and reduces the amount of time it takes to process such a query. With a large OLAP database, it’s common for multiple database servers to share the computational load of complex queries. In a process called partitioning, each server gets a portion of the database for which it’s responsible.

Amazon Relational Database Service
The Amazon Relational Database Service (RDS) is a managed database service that lets you run relational database systems in the cloud. RDS takes care of setting up the database system, performing backups, ensuring high availability, and patching the database software and the underlying operating system. RDS also makes it easy to recover from database failures, restore data, and scale your databases to achieve the level of performing and availability that your application requires. To deploy a database using RDS, you start by configuring a database instance, which is an isolated database environment. A database instance exists in a virtual private cloud (VPC) that you specify, but unlike an EC2 instance, AWS fully manages database instances. You can’t SSH into them, and they don’t show up under your EC2 instances.

Database Engines
A database engine is simply the software that stores, organizes, and retrieves data in a database. Each database instance runs only one database engine. RDS offers the following six database engines to choose from: MySQL MySQL is designed for OLTP applications such as blogs and ecommerce. RDS offers the latest MySQL Community Edition versions, including 5.5, 5.6, and 5.7. MySQL offers two storage engines—MyISAM and InnoDB—but you should use the latter as it’s the only one compatible with RDS-managed automatic backups. MariaDB MariaDB is a drop-in binary replacement for MySQL. It was created over concerns about MySQL’s future after Oracle acquired the company that developed it. RDS offers many versions of MariaDB, ranging from 10.0.17 through 10.2. MariaDB supports the XtraDB and InnoDB storage engines, but AWS recommends using the latter for maximum compatibility with RDS. Oracle Oracle is one of the most widely deployed relational database management systems. Some applications expressly require an Oracle database. RDS provides the following Oracle Database editions: ■ Standard Edition One (SE1)
■ Standard Edition Two (SE2)
■ Standard Edition (SE)
■ Enterprise Edition (EE) PostgreSQL PostgreSQL advertises itself as the most Oracle-compatible open source database. This is a good choice when you have in-house applications that were developed for Oracle but want to keep costs down. RDS offers versions of PostgreSQL ranging from 9.3.12-R1 through 10.4-R1.
Amazon Aurora Aurora is Amazon’s drop-in binary replacement for MySQL and PostgreSQL. Aurora offers better write performance than both by using a virtualized storage layer that reduces the number of writes to the underlying storage. It provides three editions. ■ MySQL 5.6-compatible
■ MySQL 5.7-compatible
■ PostgreSQL compatible Depending on the edition you choose, Aurora is compatible with PostgreSQL or MySQL import and export tools and snapshots. Aurora is designed to let you seamlessly migrate from an existing deployment that uses either of those two open source databases. For MySQL-compatible editions, Aurora supports only the InnoDB storage engine. Also, the Aurora Backtrack feature for MySQL lets you, within a matter of seconds, restore your database to any point in time within the last 72 hours. Microsoft SQL Server RDS offers multiple Microsoft SQL Server versions: 2008 R2, 2012, 2014, 2016, and 2017. For the edition, you can choose Express, Web, Standard, and Enterprise. The variety of flavors makes it possible to migrate an existing SQL Server database from an on-prem deployment to RDS without having to perform any database upgrades.

Also read this topic: Introduction to Cloud Computing and AWS -1

Licensing Considerations
RDS provides two models for licensing the database engine software you run. The license included model covers the cost of the license in the pricing for an RDS instance. The bring your own license (BYOL) model requires you to obtain a license for the database engine you run.
License Included MariaDB and MySQL use the GNU General Public License (GPL) v2.0, and PostgreSQL uses the PostgreSQL license, all of which allow for free use of the respective software. All versions and editions of Microsoft SQL Server that you run on RDS include a license, as do Oracle Database Standard Edition One (SE1) and Standard Edition Two (SE2).
Bring Your Own License The following Oracle Database editions allow you to bring your own license: ■ Enterprise Edition (EE)
■ Standard Edition (SE)
■ Standard Edition One (SE1)
■ Standard Edition Two (SE2)

Database Option Groups
Database engines offer various features to help you manage your databases and improve security. Option groups let you specify these features—called options—and apply them to one or more instances. Options require more memory, so make sure your instances have ample memory and enable only the options you need. The options available for a database option group depend on the engine. Microsoft SQL Server and Oracle offer transparent data encryption (TDE), which causes the engine to encrypt data before writing it to storage. MySQL and MariaDB offer an audit plugin that lets you log user logons and queries run against your databases.

Database Instance Classes
When launching a database instance, you must decide how much processing power, memory, network bandwidth, and disk throughput it needs. RDS offers a variety of database instance classes to meet the diverse performance needs of different databases. If you get it wrong or if your needs change, you can switch your instance to a different class. RDS divides database instance classes into the following three types.

Standard
Standard instance classes meet the needs of most databases. The latest-generation instance class is db.m4, which provides up to:
■ 256 GB memory
■ 64 vCPU
■ 25 Gbps network bandwidth
■ 10,000 Mbps (1,280 MBps) disk throughput

Memory Optimized
Memory-optimized instance classes are for databases that have hefty performance requirements. Providing more memory to a database allows it to store more data in memory, which can result in faster query times. The latest-generation instance class is db.x1e, and it provides up to:
■ 3,904 GB memory
■ 128 vCPU
■ 25 Gbps network bandwidth
■ 14,000 Mbps (1,750 MBps) disk throughput
Database instances use EBS storage. Both the standard and memory-optimized instance class types are EBS-optimized, meaning they provide dedicated bandwidth for transfers to and from EBS storage.

Burst Capable (Burstable)
Burstable instances are for development, test, and other nonproduction databases. The only burstable instance class available is db.t2, and it gives you up to:
■ 32 GB memory
■ 8 vCPU
AWS indicates that the network performance is “moderate,” which in most cases corresponds to less than 1 Gbps. AWS doesn’t provide stats on disk throughput, but you shouldn’t expect to get more than 3,200 Mbps (400 MBps).

Storage
Selecting the right storage for your database instance is about more than just ensuring you have enough disk space. You also have to decide how fast the storage must be to meet the performance requirements of your database-backed application.

Understanding Input/Output Operations Per Second
AWS measures storage performance in input/output operations per second (IOPS). An input/output (I/O) operation is either a read from or write to storage. All things being equal, the more IOPS you can achieve, the faster your database can store and retrieve data. RDS allocates you a number of IOPS depending on the type of storage you select, and you can’t exceed this threshold. The speed of your database storage is limited by the number of IOPS allocated to it. The amount of data you can transfer in a single I/O operation depends on the page size that the database engine uses. To understand how many IOPS you
need, you fi rst need to understand how much disk throughput you need. MySQL and MariaDB have a page size of 16 KB. Hence, writing 16 KB of data to disk
would constitute one I/O operation. Oracle, PostgreSQL, and Microsoft SQL Server use a page size of 8 KB. Writing 16 KB of data using one of those database engines would consume two I/O operations. The larger the page size, the more data you can transfer in a single I/O operation. Assuming a 16 KB page size, suppose your database needed to read 102,400 KB (100MB) of data every second. To achieve this level of performance, your database would have to be able to read 6,400 16 KB pages every second. Because each page read counts as one I/O operation, your storage and instance class would need to be able to sustain 6,400 IOPS. Notice the inverse relationship between IOPS and page size: The larger your page size, the fewer IOPS you need to achieve the same level of throughput.

General-Purpose SSD
For most databases, general-purpose SSD (gp2) storage is sufficient. It’s fast, giving you single-digit millisecond latency. You can allocate a volume of up to 16 TB. For each gigabyte of data that you allocate to a volume, RDS allocates that volume a baseline performance of three IOPS, up to a total of 10,000 IOPS per volume. A 20 GB volume would get 60 IOPS, while a 100 GB volume would get 300 IOPS. This means that the larger your volume, the better performance you’ll get. Note that the minimum storage volume you can create depends on the database engine. For SQL Server Enterprise and Standard, it’s 200 GB, and for all others, it’s 20 GB. The maximum throughput the gp2 storage type offers is 1,280 Mbps (160 MBps). To achieve this, two things have to fall into place. First, your instance must support a disk throughput of at least that much. For example, the db.m4.4xlarge instance class has a maximum throughput of 2,000 Mbps, so it would suffice. Also, you must allocate a sufficient number of IOPS to sustain this throughput. Suppose you’re running MariaDB with a page size of 16 KB (128 Kb or 0.128 Mb). To determine the number of IOPS you’ll need to sustain 1,280 Mbps of disk throughput, you’d divide the bandwidth by the page size, as follows: 1280 Mbps/0.128 Mb = 10,000 IOPS To achieve 1,280 Mbps disk throughput to a volume, it would need 10,000 IOPS allocated. Note that this is the maximum number of IOPS possible with gp2. To achieve this many IOPS, your volume would have to be 3,333.3 GB or about 3.34 TB.
If you think you might occasionally need up to 3000 IOPS, but don’t need a lot of storage, you don’t have to over-allocate storage just to get your desired number of IOPS. Volumes smaller than 1 TB can temporarily burst to 3,000 IOPS. The duration of the burst is determined by the following formula: Burst duration in seconds = (Credit balance)/[3,000 – 3 * (storage size in GB)] When you initially boot a database instance, you get a credit balance of 5,400,000
IOPS. Anytime your instance uses IOPS above and beyond its baseline, it will dip into the credit balance. Once the credit balance is depleted, you can no longer burst. For example, with a 200 GB volume, the burst duration would be 2,250 seconds or 37.5 minutes. The credit balance is replenished at a rate of one baseline IOPS every second. For example, if you have a 200 GB volume with a baseline IOPS of 600, your credit balance increases by 600 IOPS per second, up to the maximum of 5,400,000. To get some practice with these concepts

Provisioned IOPS SSD (io1)
If you’re not excited about the rather confusing math involved with using gp2 storage, RDS provides a more straightforward option. Provisioned IOPS SSD lets you simply allocate the number of IOPS you need when you create your instance. There’s no concept of bursting in io1 storage. The number of IOPS you provision is what you get and what you pay for, whether you use it or not. This makes it useful for OLTP databases that require consistent low-latency performance. If you use a standard or memory optimized instance class, RDS guarantees that you’ll achieve within 10 percent of the provisioned IOPS for 99.9 percent of the year. That means you may get less than your specified number of IOPS for only about 2 hours and 45 minutes out of the year. Assuming that you pick a standard instance with 4,000 Mbps throughput and use a database engine with a 16 KB page size, you can achieve up to 31,250 IOPS. To get this, you should provision 32,000 IOPS when you create the instance, as you must specify provisioned IOPS in increments of 1,000. The maximum number of IOPS you can achieve and how much storage you can allocate are constrained by the database engine you select. Oracle, PostgreSQL, MariaDB, MySQL, and Aurora let you choose 100 GB to 16 TB of storage and allocate 1,000 to 40,000 provisioned IOPS. Microsoft SQL Server gives you up to 16 TB of storage and lets you choose between 1,000 and 32,000 provisioned IOPS. The ratio of storage in gigabytes to IOPS must be at least 50:1. For example, if you want 32,000 IOPS, you must provision at least 640 GB of storage.

People also ask this Questions

Leave a Comment Cancel Reply