The Reliability Pillar-1

Calculating Availability
A common way of quantifying and expressing reliability is in terms of availability. Availability is the percentage of time an application is working as expected. Note that this measurement is subjective, and “working as expected” implies that you have a certain expectation of how the application should work. Generally, you want application availability to be very high, 99 percent or greater.

Availability Differences in Traditional vs. Cloud-Native Applications
As an AWS architect, you must understand how application design decisions affect reliability. The applications you run in the cloud will fall into one of two broad categories: traditional or cloud-native.

Traditional Applications
Traditional applications are those written to run on and use the capabilities of traditional Linux or Windows servers. To deploy such an application on AWS, you’ll need to run it on one or more EC2 instances. If the application uses a database, you’ll either run and manage your own database software on an EC2 instance or use the AWS-managed equivalent. If your application requires a relational database, you can use Amazon Relational Database Service (RDS). If it uses a nonrelational database such as Redis, you can use ElastiCache for Redis. Traditional applications operate the same whether they’re running in the cloud or in a data center. If you can “lift and shift” an application from the data center to the cloud without making any changes to the application’s code, then it’s a traditional application.

Suppose you have a traditional application running on a single EC2 instance backed by a multi-availability zone (AZ) RDS deployment. The application’s availability depends on the availability of both the EC2 instance and the RDS instance. Both of these are called hard dependencies. To calculate the total availability of the application, you simply multiply the availabilities of those hard dependencies. AWS advertises the availability of the EC2 service in a region as 99.99 percent, and multi-AZ RDS as 99.95 percent. If you multiply these availabilities together (.9999 × .9995), you get 99.94 percent, which is about 5 hours and 15 minutes of downtime per year.

To increase availability, you can use redundant components. Let’s start with EC2. Suppose that instead of one EC2 instance, you use two EC2 instances, each in a different availability zone. Each EC2 instance runs the application, and an application load balancer (ALB) distributes connections to the instances in a target group. If the application on one instance fails or if the EC2 instance itself fails, the ALB will route connections to the remaining healthy instance. In this sense, the EC2 instances are redundant. To calculate the availability of these redundant components, you take 100 percent minus the product of the instances’ failure rates. If the availability of an EC2 instance is 99.99 percent, then the failure rate of that instance is .01 percent. The availability of two EC2 instances would therefore be as follows:
100% – (0.01% × 0.01%) = 99.999999%
This is less than one second of downtime per year! But that’s just for EC2. You also have to consider the database. The database represents a hard dependency, so you must multiply the database’s availability—99.95 percent—by the availability of the EC2 instances—99.999999 percent. The product is about 99.949 percent, which is about 4 hours and 22 minutes of downtime a year. That’s not a big improvement. Notice that because the RDS instance is a hard dependency and its availability is 99.95 percent, you won’t be able to do better than this level of availability without introducing a redundant database. Later in this chapter, we’ll show you how to do that.
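If you want to check these figures yourself, the arithmetic is easy to script. Here is a minimal Python sketch (the helper names are illustrative, not AWS tooling) that reproduces the calculation above; the same redundant() helper covers the multi-instance examples later in this section.

```python
# A minimal sketch of the availability arithmetic. Helper names are
# illustrative, not AWS tooling.

HOURS_PER_YEAR = 365.25 * 24

def redundant(*availabilities):
    """Redundant components: the service is up if at least one is up."""
    failure = 1.0
    for a in availabilities:
        failure *= 1.0 - a
    return 1.0 - failure

def serial(*availabilities):
    """Hard dependencies: the service is up only if all are up."""
    total = 1.0
    for a in availabilities:
        total *= a
    return total

def downtime_hours(availability):
    return (1.0 - availability) * HOURS_PER_YEAR

ec2_pair = redundant(0.9999, 0.9999)   # two instances behind an ALB
app = serial(ec2_pair, 0.9995)         # multi-AZ RDS is a hard dependency
print(f"{app:.5%} available, about {downtime_hours(app):.1f} hours down/year")
# 99.95000% available, about 4.4 hours down/year
```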

Cloud-Native Applications
Cloud-native applications are written to use the resources of a specific cloud platform like AWS. These may be serverless applications written to use Lambda functions. Or they may run on EC2 instances and store objects in S3 or use DynamoDB instead of a relational database for storing data. This is especially likely to be true if the application requires low-latency access to data stored in a nonrelational data format such as JavaScript Object Notation (JSON). DynamoDB is also a popular choice for storing session state data.

Suppose you have a Linux application that runs on a single EC2 instance—just as a traditional app would—but is designed to use DynamoDB for database storage. In a single region, DynamoDB has an availability of 99.99 percent. Again, EC2 also has an availability of 99.99 percent. So considering just these two factors, the total application availability is as follows:
99.99% × 99.99% = 99.98%
This is about 1 hour and 45 minutes of downtime per year. Much better! But suppose you want even greater availability. Instead of running one EC2 instance, you run two, each in a different AZ. Again, you’ll use an ALB to distribute traffic to these instances. To calculate the availability of these instances, you’d do this:
100% – (0.01% × 0.01%) = 99.999999%
To get the total application availability, multiply 99.999999 percent by the availability of DynamoDB—99.99 percent—which would give you 99.989 percent, or 57 minutes of downtime per year.
Believe it or not, you can make this even better by using two regions. In one region, you would still have two instances, each in a different AZ, and an ALB to distribute traffic to the application instances. But now you replicate this setup in a different region and then use a Route 53 weighted routing policy to distribute traffic between the two regions. Now, with a total of four instances, your calculation would be like this:
100% – (.01% × .01% × .01% × .01%)
This is about 16 nines (99.99999999999999 percent). Since you’d be running the application across regions, you could take advantage of DynamoDB global tables, which replicate your DynamoDB tables across regions. DynamoDB global tables have an availability of 99.999 percent. So, to calculate the total availability for the application, you’d multiply 99.99999999999999 percent by 99.999 percent to get about 99.9989 percent, or 5 minutes of downtime per year. Being able to achieve this level of availability is one of the reasons organizations choose to use cloud-native applications instead of just running traditional applications in the cloud.

Building Serverless Applications with Lambda
Although as an AWS architect you don’t need to know how to program, it’s important to understand how serverless applications that run on Lambda differ from executables that run on EC2 instances. Lambda allows you to create a function written in one of a variety of languages, including the following:
■ C#
■ Go
■ Java
■ JavaScript
■ PowerShell
■ Python
Lambda functions are useful for performing intermittent tasks that don’t warrant keeping a running EC2 instance around. For example, you can create a Lambda function that retrieves an image from an S3 source bucket, resizes it, and then uploads the processed image to a destination bucket. When a new image is uploaded to the source bucket, S3 triggers the Lambda function. You’re billed only for the time the function runs. Lambda executes functions using a highly available distributed computing platform. Unlike an EC2 instance, which can fail or be stopped or terminated, Lambda is always available.
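To make the image-resizing example concrete, here is a minimal sketch of such a handler. It assumes a Python Lambda runtime with the Pillow imaging library available (for example, through a Lambda layer); the destination bucket name and target size are placeholders.

```python
# A minimal sketch of the image-resizing function described above, assuming
# a Python Lambda runtime with the Pillow library available (for example,
# via a Lambda layer). The destination bucket and size are placeholders.
import io

import boto3
from PIL import Image

s3 = boto3.client("s3")
DEST_BUCKET = "my-resized-images"  # hypothetical destination bucket

def handler(event, context):
    # S3 invokes the function with one record per uploaded object.
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]

        obj = s3.get_object(Bucket=bucket, Key=key)
        image = Image.open(io.BytesIO(obj["Body"].read()))
        image.thumbnail((256, 256))  # shrink in place, keeping aspect ratio

        buffer = io.BytesIO()
        image.save(buffer, format=image.format or "PNG")
        buffer.seek(0)
        s3.put_object(Bucket=DEST_BUCKET, Key=key, Body=buffer)
```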

Know Your Limits
Although one of the big selling points of the cloud is the ability to grow with you, cloud capacity is not unlimited. AWS imposes limits to prevent anyone from accidentally or intentionally consuming all resources, effectively resulting in a denial of service for other customers. The limits depend on the service and include things such as network throughput, S3 PUT requests per second, the number of instances per region, the number of elastic IP addresses per region, and so on. Many of these limits can be increased upon request.
Use AWS Trusted Advisor to see which service limits apply to your account. You may also consider setting CloudWatch alarms to let you know when you’re getting close to hitting a limit so you can react to avoid it. Avoiding a limit may involve requesting a limit increase from AWS support, adding another availability zone, or even shifting some of the load to another region.
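As one way to keep an eye on a limit programmatically, here is a rough boto3 sketch that compares running EC2 instances against a Service Quotas value. The quota code shown is an assumption; check the Service Quotas console for the code that matches the limit you care about.

```python
# A minimal sketch, using boto3, of comparing usage against a service quota.
# The quota code below is an assumption; it refers to the vCPU limit for
# running on-demand standard instances. Look up the code for your quota.
import boto3

quotas = boto3.client("service-quotas")
ec2 = boto3.client("ec2")

quota = quotas.get_service_quota(ServiceCode="ec2", QuotaCode="L-1216C47A")
limit = quota["Quota"]["Value"]

# Count running instances as a rough usage proxy (the quota itself counts
# vCPUs, so adjust for your instance types).
paginator = ec2.get_paginator("describe_instances")
running = sum(
    len(reservation["Instances"])
    for page in paginator.paginate(
        Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
    )
    for reservation in page["Reservations"]
)

if running >= 0.8 * limit:
    print(f"Approaching limit: {running} running vs. quota of {limit:.0f}")
```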

Increasing Availability
The availability numbers you’ve looked at thus far are based only on the availability of AWS services. But the actual availability of your application could be worse. There are many things that can impact availability. For example, your app can crash, perhaps because of a bug, memory leak, or data corruption.
The best way to maximize availability is to avoid failure in the first place. Instead of having a single monster instance hosting a web application, have multiple smaller instances and spread them out across different availability zones. This way, the failure of a single instance or even an entire availability zone won’t render your application unavailable. Rather than depending on one instance to be highly available, a distributed application design spreads the work across multiple smaller resources that are not dependent on one another.

But that’s not the whole story. If an instance fails, the other instances have to pick up the slack. If enough instances fail, the few that remain will be servicing a lot of traffic, resulting in poor performance and likely in those remaining instances also crashing. Therefore, you need a way to re-create failed instances when such a crash occurs.

Another issue is that increased demand could place such a load on your instances that they become unusably slow, or even crash altogether. An advantage of a distributed application design is that it makes it easier to add capacity nondisruptively. Instead of having to upgrade to a more powerful instance class, which requires downtime, you simply add more instances. When you have a distributed system, getting more resources just involves scaling out—adding more of the same.

EC2 Auto Scaling
The EC2 Auto Scaling service offers a way to both avoid application failure and recover from it when it happens. Auto Scaling works by provisioning and starting, on your behalf, a specified number of EC2 instances. It can dynamically add more instances to keep up with increased demand, and when an instance fails or gets terminated, Auto Scaling will automatically replace it.

EC2 Auto Scaling uses either a launch configuration or a launch template to automatically configure the instances that it launches. Both perform the same basic function of defining the configuration parameters of the instance as well as what scripts (if any) run on it at launch time. Launch configurations have been around longer and will be more familiar if you’ve been using AWS for a while. You’re also more likely to encounter them if you’re going into an existing AWS environment. Launch templates are newer and are what AWS now recommends. You’ll learn about both, but which you use is up to you.


Launch Configurations
When you create an instance manually, you have to specify many configuration parameters, including an Amazon Machine Image (AMI), instance type, SSH key pair, security group, instance profile, block device mapping, whether it’s EBS optimized, placement tenancy, and User Data, such as custom scripts to install and configure your application. A launch configuration is essentially a named document that contains the same information you’d provide when manually provisioning an instance.

You can create a launch configuration from an existing EC2 instance. Auto Scaling will copy the settings from the instance for you, but you can customize them as needed. You can also create a launch configuration from scratch.

Launch configurations are for use only with EC2 Auto Scaling, meaning you can’t manually launch an instance using a launch configuration. Also, once you create a launch configuration, you can’t modify it. If you want to change any of the settings, you have to create an entirely new launch configuration.
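For reference, here is a minimal boto3 sketch of creating a launch configuration from scratch; every identifier in it is a placeholder.

```python
# A minimal sketch, using boto3, of creating a launch configuration from
# scratch. The AMI, key pair, and security group are placeholders.
import boto3

autoscaling = boto3.client("autoscaling")

autoscaling.create_launch_configuration(
    LaunchConfigurationName="web-app-lc",
    ImageId="ami-0123456789abcdef0",          # placeholder AMI
    InstanceType="t3.micro",
    KeyName="my-key-pair",                    # placeholder SSH key pair
    SecurityGroups=["sg-0123456789abcdef0"],  # placeholder security group
    UserData="#!/bin/bash\nyum install -y httpd\nsystemctl start httpd\n",
)
```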

Launch Templates
Launch templates are similar to launch configurations in that you can specify the same settings. But the uses for launch templates are more versatile. You can use a launch template with Auto Scaling, of course, but you can also use it for spinning up one-off EC2 instances or even creating a Spot fleet.
Launch templates are also versioned, allowing you to change them after creation. Any time you need to make changes to a launch template, you create a new version of it. AWS keeps all versions, and you can then flip back and forth between versions as needed. This makes it easier to track your launch template changes over time.
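Here is a minimal boto3 sketch of the same idea: create a launch template, then record a change as a new version.

```python
# A minimal sketch, using boto3, of creating a launch template and then a
# second version of it. All identifiers are placeholders.
import boto3

ec2 = boto3.client("ec2")

ec2.create_launch_template(
    LaunchTemplateName="web-app-lt",
    LaunchTemplateData={
        "ImageId": "ami-0123456789abcdef0",  # placeholder AMI
        "InstanceType": "t3.micro",
    },
)

# Changes go into a new version; AWS keeps every earlier version.
ec2.create_launch_template_version(
    LaunchTemplateName="web-app-lt",
    SourceVersion="1",
    LaunchTemplateData={"InstanceType": "t3.small"},  # only what changes
)
```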

Auto Scaling Groups
An Auto Scaling group is a group of EC2 instances that Auto Scaling manages. When creating an Auto Scaling group, you must specify either the launch configuration or the launch template you created, as well as how many running instances you want Auto Scaling to provision and maintain. Specifically, you have to specify the minimum and maximum size of the Auto Scaling group. You may also optionally set the desired number of instances you want Auto Scaling to provision and maintain.
Minimum Auto Scaling will ensure the number of healthy instances never goes below the minimum. If you set this to zero, Auto Scaling will not spawn any instances and will terminate any running instances in the group.
Maximum Auto Scaling will make sure the number of healthy instances never exceeds this. This might seem strange, but remember that AWS imposes service limits on how many instances you can run simultaneously. Setting your maximum to less than or equal to your limit ensures you never exceed it.
Desired capacity The desired capacity is an optional setting that must lie within the minimum and maximum values. If you don’t specify a desired capacity, Auto Scaling will launch the number of instances given by the minimum value. If you specify a desired capacity, Auto Scaling will add or terminate instances to stay at the desired capacity. For example, if you set the minimum to 1, the maximum to 10, and the desired capacity to 4, then Auto Scaling will create 4 instances. If one of those instances gets terminated—for example, because of human action or a host crash—Auto Scaling will replace it to maintain the desired capacity of 4. In the web console, desired capacity is also called the group size.
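Putting the minimum, maximum, and desired capacity together, here is a minimal boto3 sketch of creating an Auto Scaling group from the launch template shown earlier; the names and availability zones are placeholders.

```python
# A minimal sketch, using boto3, of creating an Auto Scaling group from the
# launch template above. The group name and availability zones are
# placeholders.
import boto3

autoscaling = boto3.client("autoscaling")

autoscaling.create_auto_scaling_group(
    AutoScalingGroupName="web-app-asg",
    LaunchTemplate={"LaunchTemplateName": "web-app-lt", "Version": "$Latest"},
    MinSize=1,
    MaxSize=10,
    DesiredCapacity=4,
    AvailabilityZones=["us-east-1a", "us-east-1b"],  # placeholder AZs
)
```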

Specifying an Application Load Balancer Target Group
If you want to use an application load balancer to distribute traffic to instances in your Auto Scaling group, just plug in the name of the ALB target group when creating the Auto Scaling group. Whenever Auto Scaling creates a new instance, it will automatically add it to the ALB target group.
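In the API, the target group is referenced by its ARN rather than its name. Here is a minimal boto3 sketch of attaching a target group to an existing Auto Scaling group, with a placeholder ARN:

```python
# A minimal sketch, using boto3, of attaching a target group to an existing
# Auto Scaling group. The target group ARN is a placeholder.
import boto3

autoscaling = boto3.client("autoscaling")

autoscaling.attach_load_balancer_target_groups(
    AutoScalingGroupName="web-app-asg",
    TargetGroupARNs=[
        "arn:aws:elasticloadbalancing:us-east-1:123456789012:"
        "targetgroup/web-app-tg/0123456789abcdef"
    ],
)
```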

Health Checks Against Application Instances
When you create an Auto Scaling group, Auto Scaling will strive to maintain the minimum number of instances, or the desired number if you’ve specified it. If an instance becomes unhealthy, Auto Scaling will terminate and replace it. By default, Auto Scaling determines an instance’s health based on EC2 health checks.
Recall from Chapter 7, “CloudTrail, CloudWatch, and AWS Config,” that EC2 automatically performs system and instance status checks. These checks monitor for instance problems such as memory exhaustion, filesystem corruption, or an incorrect network or startup configuration, and system problems that require AWS involvement to repair. Although these checks can catch a variety of instance and host-related problems, they won’t necessarily catch application-specific problems.
If you’re using an application load balancer to route traffic to your instances, you can configure health checks for the load balancer’s target group. Target group health checks can check for HTTP response codes from 200 to 499. You can then configure your Auto Scaling group to use the results of these health checks to determine if an instance is healthy.
If an instance fails the ALB health check, the ALB will route traffic away from the failed instance, ensuring users don’t reach it. At the same time, Auto Scaling will remove the instance, create a replacement, and add the new instance to the load balancer’s target group. The load balancer will then route traffic to the new instance.
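To have Auto Scaling act on those target group health checks rather than only the EC2 status checks, you switch the group’s health check type to ELB. A minimal boto3 sketch:

```python
# A minimal sketch, using boto3, of switching an Auto Scaling group from
# EC2 status checks to the load balancer's health checks.
import boto3

autoscaling = boto3.client("autoscaling")

autoscaling.update_auto_scaling_group(
    AutoScalingGroupName="web-app-asg",
    HealthCheckType="ELB",         # trust the target group's health checks
    HealthCheckGracePeriod=300,    # seconds for a new instance to warm up
)
```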

Auto Scaling Options
Once you create an Auto Scaling group, you can leave it be and it will continue to maintain the minimum or desired number of instances indefinitely. However, maintaining the current number of instances is just one option. Auto Scaling provides several other options to scale out the number of instances to meet demand.

Manual Scaling
If you change the minimum, desired, or maximum values at any time after creating the group, Auto Scaling will immediately adjust. For example, if you have the desired capacity set to 2 and change it to 4, Auto Scaling will launch two more instances. If you have four instances and set the desired capacity to 2, Auto Scaling will terminate two instances. Think of the desired capacity as a thermostat!
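Manual scaling maps to a single API call. Here is a minimal boto3 sketch:

```python
# A minimal sketch of manual scaling with boto3: adjusting the thermostat.
import boto3

autoscaling = boto3.client("autoscaling")

# Raising the desired capacity from 2 to 4 launches two more instances.
autoscaling.set_desired_capacity(
    AutoScalingGroupName="web-app-asg",
    DesiredCapacity=4,
)
```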

Dynamic Scaling Policies
Most AWS-managed resources are elastic; that is, they automatically scale to accommodate increased load. Some examples include S3, load balancers, Internet gateways, and network address translation (NAT) gateways. Regardless of how much traffic you throw at them, AWS is responsible for ensuring they remain available while continuing to perform well.

But when it comes to your EC2 instances, you’re responsible for ensuring that they’re powerful and plentiful enough to meet demand. Running out of instance resources—be it CPU utilization, memory, or disk space—will almost always result in the failure of whatever you’re running on them. To ensure that your instances never become overburdened, dynamic scaling policies automatically provision more instances before they hit that point. Auto Scaling generates the following aggregate metrics for all instances within the group:
■ Aggregate CPU utilization
■ Average request count per target
■ Average network bytes in
■ Average network bytes out
You’re not limited to using just these native metrics. You can also use metric filters to extract metrics from CloudWatch Logs and use those. As an example, your application may generate logs that indicate how long it takes to complete a process. If the process takes too long, you could have Auto Scaling spin up new instances. Dynamic scaling policies work by monitoring a CloudWatch alarm and scaling out—by increasing the desired capacity—when the alarm is breaching. There are three dynamic scaling policies to choose from: simple, step, and target tracking.
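As an example of the target tracking flavor, here is a minimal boto3 sketch of a policy that keeps the group’s average CPU utilization near 50 percent:

```python
# A minimal sketch, using boto3, of a target tracking policy that keeps the
# group's average CPU utilization near 50 percent. Names are placeholders.
import boto3

autoscaling = boto3.client("autoscaling")

autoscaling.put_scaling_policy(
    AutoScalingGroupName="web-app-asg",
    PolicyName="cpu-target-50",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        "TargetValue": 50.0,  # scale out/in to hold average CPU near 50%
    },
)
```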
