Amazon Simple Storage Service and Amazon Glacier Storage-1

S3 Service Architecture

You organize your S3 files into buckets. By default, you’re allowed to create as many as 100 buckets for each of your AWS accounts. As with other AWS services, you can ask AWS to raise that limit. Although an S3 bucket and its contents exist within only a single AWS region, the name you choose for your bucket must be globally unique within the entire S3 system. There’s some logic to this: you’ll often want your data located in a particular geographical region to satisfy operational or regulatory needs. But at the same time, being able to reference a bucket without having to specify its region simplifies the process. Here is the URL you would use to access a file called filename that’s in a bucket called bucketname over HTTPS:
https://s3.amazonaws.com/bucketname/filename
Naturally, this assumes you’ll be able to satisfy the object’s permissions requirements. This is how that same file would be addressed using the AWS CLI:
s3://bucketname/filename
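
To make the two addressing styles concrete, here is a minimal sketch using the boto3 SDK for Python; it assumes your AWS credentials and default region are already configured, and the bucket and object names are the placeholders carried over from the example above:
import boto3

BUCKET = "bucketname"   # placeholder names from the example above
KEY = "filename"

s3 = boto3.client("s3")

# Download the object to a local file; roughly the API equivalent of the
# CLI command: aws s3 cp s3://bucketname/filename .
s3.download_file(BUCKET, KEY, "filename")

# Generate a time-limited HTTPS URL for the same object, handy when the
# bucket is private but you need to hand out access to one file briefly.
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": BUCKET, "Key": KEY},
    ExpiresIn=3600,  # link validity in seconds
)
print(url)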

Prefixes and Delimiters

As you’ve seen, S3 stores objects within a bucket on a flat surface without subfolder hierarchies. However, you can use prefixes and delimiters to give your buckets the appearance of a more structured organization. A prefix is a common text string that indicates an organization level. For example, the word contracts when followed by the delimiter / would tell S3 to treat a file with a name like contracts/acme.pdf as an object that should be grouped together with a second file named contracts/dynamic.pdf. S3 recognizes folder/directory structures as they’re uploaded and emulates their hierarchical design within the bucket, automatically converting slashes to delimiters. That’s why you’ll see the correct folders whenever you view your S3-based objects through the console or the API.
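
To illustrate how a prefix and delimiter behave, the following sketch lists the contracts/ grouping described above using boto3; the bucket name is a placeholder, and the call assumes the two PDF objects already exist:
import boto3

s3 = boto3.client("s3")

# List everything sharing the "contracts/" prefix, treating "/" as the
# delimiter so S3 groups deeper levels into CommonPrefixes (pseudo-folders).
response = s3.list_objects_v2(
    Bucket="bucketname",          # placeholder bucket name
    Prefix="contracts/",
    Delimiter="/",
)

for obj in response.get("Contents", []):
    print(obj["Key"])             # e.g. contracts/acme.pdf, contracts/dynamic.pdf

for folder in response.get("CommonPrefixes", []):
    print(folder["Prefix"])       # any deeper "subfolder" prefixes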

Working with Large Objects

While there’s no theoretical limit to the total amount of data you can store within a bucket, a single object may be no larger than 5 TB, and an individual upload operation may be no larger than 5 GB. To reduce the risk of data loss or aborted uploads, AWS recommends that you use a feature called Multipart Upload for any object larger than 100 MB. As the name suggests, Multipart Upload breaks a large object into multiple smaller parts and transmits them individually to their S3 target. If one transmission fails, it can be retried without affecting the others. Multipart Upload is used automatically when the upload is initiated by the AWS CLI or a high-level API, but you’ll need to break up your object manually if you’re working with a low-level API.

An application programming interface (API) is a programmatic interface through which operations can be run from code or from the command line. AWS maintains APIs as the primary method of administration for each of its services. AWS provides low-level APIs for cases where your S3 uploads require hands-on customization, and it provides high-level APIs for operations that can be more readily automated. This page contains the specifics: https://docs.aws.amazon.com/AmazonS3/latest/dev/uploadobjusingmpu.html
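
For reference, here is a minimal high-level sketch with boto3: its upload_file call switches to Multipart Upload automatically once the file crosses the threshold set below. The file, bucket, and threshold values are placeholders, and the snippet assumes configured credentials:
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Use multipart transfers for anything over 100 MB, in 25 MB parts; the
# high-level transfer manager splits, retries, and reassembles the parts.
config = TransferConfig(
    multipart_threshold=100 * 1024 * 1024,
    multipart_chunksize=25 * 1024 * 1024,
)

s3.upload_file(
    "backup-archive.tar.gz",      # placeholder local file
    "bucketname",                 # placeholder bucket
    "archives/backup-archive.tar.gz",
    Config=config,
)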
As an exercise, try creating a bucket of your own and uploading a few objects to it.

Encryption
Unless it’s intended to be publicly available—perhaps as part of a website—data stored on S3 should always be encrypted. You can use encryption keys to protect your data while it’s at rest within S3 and—by using only Amazon’s encrypted API endpoints for data transfers—protect data during its journeys between S3 and other locations. Data at rest can be protected using either server-side or client-side encryption.

Server-Side Encryption
The “server-side” here is the S3 platform itself: AWS encrypts your data objects as they’re saved to disk and decrypts them when you send properly authenticated requests for retrieval.

You can use one of three encryption options (see the sketch after this list):
■ Server-Side Encryption with Amazon S3-Managed Keys (SSE-S3), where AWS uses its own enterprise-standard keys to manage every step of the encryption and decryption process

■ Server-Side Encryption with AWS KMS-Managed Keys (SSE-KMS), where, beyond the SSE-S3 features, the use of an envelope key is added along with a full audit trail for tracking key usage. You can optionally import your own keys through the AWS KMS service.


■ Server-Side Encryption with Customer-Provided Keys (SSE-C), which lets you provide your own keys for S3 to apply to its encryption
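
As a quick illustration of how these options surface in practice, here is a minimal boto3 sketch that uploads one object under SSE-S3 and another under SSE-KMS; the bucket, keys, filenames, and KMS key alias are placeholders, and the snippet assumes configured credentials (SSE-C additionally requires you to send your own key material with every request, which is omitted here):
import boto3

s3 = boto3.client("s3")
BUCKET = "bucketname"    # placeholder bucket

# SSE-S3: S3 manages the keys end to end.
with open("q1.pdf", "rb") as f:          # placeholder local file
    s3.put_object(
        Bucket=BUCKET,
        Key="reports/q1.pdf",
        Body=f,
        ServerSideEncryption="AES256",
    )

# SSE-KMS: encryption uses a KMS key, adding an audit trail of key usage.
with open("q2.pdf", "rb") as f:          # placeholder local file
    s3.put_object(
        Bucket=BUCKET,
        Key="reports/q2.pdf",
        Body=f,
        ServerSideEncryption="aws:kms",
        SSEKMSKeyId="alias/my-s3-key",   # placeholder KMS key alias
    )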

Client-Side Encryption
It’s also possible to encrypt data before it’s transferred to S3. This can be done using an AWS KMS–Managed Customer Master Key (CMK), which produces a unique key for each object before it’s uploaded. You can also use a Client-Side Master Key, which you provide through the Amazon S3 encryption client. Server-side encryption can greatly reduce the complexity of the process and is often preferred. Nevertheless, in some cases, your company (or regulatory oversight body) might require that you maintain full control over your encryption keys, leaving client-side as the only option.
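
To show the general shape of the client-side approach, here is a simplified envelope-encryption sketch that generates a data key with AWS KMS, encrypts the file locally, and uploads only ciphertext. It is not the official Amazon S3 encryption client; the names are placeholders, and it assumes boto3 plus the third-party cryptography package:
import os
import boto3
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

kms = boto3.client("kms")
s3 = boto3.client("s3")

# Ask KMS for a fresh data key: Plaintext encrypts the file locally, while
# CiphertextBlob is stored alongside the object so the key can be
# re-decrypted by KMS later.
data_key = kms.generate_data_key(KeyId="alias/my-client-key", KeySpec="AES_256")

nonce = os.urandom(12)
with open("secrets.csv", "rb") as f:              # placeholder local file
    ciphertext = AESGCM(data_key["Plaintext"]).encrypt(nonce, f.read(), None)

# Only encrypted bytes ever leave the client; S3 never sees the plaintext key.
s3.put_object(
    Bucket="bucketname",                          # placeholder bucket
    Key="client-encrypted/secrets.csv",
    Body=nonce + ciphertext,
    Metadata={"wrapped-key": data_key["CiphertextBlob"].hex()},
)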

Logging
Tracking S3 events to log files is disabled by default—S3 buckets can see a lot of activity, and not every use case justifies the log data that S3 can generate.
When you enable logging, you’ll need to specify both a source bucket (the bucket whose activity you’re tracking) and a target bucket (the bucket to which you’d like the logs saved). Optionally, you can also specify delimiters and prefixes to make it easier to identify and organize logs from multiple source buckets that are saved to a single target bucket. S3-generated logs, which sometimes only appear after a short delay, will contain basic operation details, including the following:
■ The account and IP address of the requestor
■ The source bucket name
■ The action that was requested (GET, PUT, etc.)
■ The time the request was issued
■ The response status (including error code)
S3 buckets are also used by other AWS services—including CloudWatch and CloudTrail—to store their logs or other objects (like EBS Snapshots).
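
Server access logging is enabled per bucket; the following minimal boto3 sketch turns it on with placeholder source and target bucket names, and assumes the target bucket already grants the S3 log delivery service permission to write to it:
import boto3

s3 = boto3.client("s3")

# Route access logs for the source bucket into the target bucket, grouped
# under a per-bucket prefix so several sources can share one target.
s3.put_bucket_logging(
    Bucket="source-bucketname",                    # placeholder source bucket
    BucketLoggingStatus={
        "LoggingEnabled": {
            "TargetBucket": "log-bucketname",      # placeholder target bucket
            "TargetPrefix": "logs/source-bucketname/",
        }
    },
)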

S3 Durability and Availability
S3 offers more than one class of storage for your objects. The class you choose will depend on how critical it is that the data survives no matter what (durability), how quickly you might need to retrieve it (availability), and how much money you have to spend.

Durability

S3 measures durability as a percentage. For instance, the 99.999999999 percent durability guarantee for most S3 classes and Amazon Glacier is as follows:
“. . . corresponds to an average annual expected loss of 0.000000001% of objects. For example, if you store 10,000,000 objects with Amazon S3, you can on average expect to incur a loss of a single object once every 10,000 years.”
Source: https://aws.amazon.com/s3/faqs
In other words, realistically, there’s pretty much no way that you can possibly lose data stored on one of the standard S3/Glacier platforms because of infrastructure failure. However, it would be irresponsible to rely on your S3 buckets as the only copies of important data. After all, there’s a real chance that a misconfiguration, account lockout, or unanticipated external attack could permanently block access to your data. And, as crazy as it might sound right now, it’s not unthinkable to suggest that AWS could one day go out of business. Kodak and Blockbuster Video once dominated their industries, right?
The high durability rates delivered by S3 are largely due to the fact that it automatically replicates your data across at least three availability zones. That means that even if an entire AWS facility were suddenly wiped off the map, copies of your data would be restored from a different zone. There are, however, two storage classes that aren’t quite so resilient. Amazon S3 One Zone-Infrequent Access (S3 One Zone-IA), as the name suggests, stores data in only a single availability zone. Reduced Redundancy Storage (RRS) is rated at only 99.99 percent durability (because it’s replicated across fewer servers than other classes). You can balance increased/decreased durability against other features like availability and cost to get the balance that’s right for you.
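
If you want to sanity-check the arithmetic behind the eleven-nines figure quoted above, the numbers work out as in this small Python calculation:
# 0.000000001% expressed as a fraction is 1e-11.
annual_loss_fraction = 0.000000001 / 100

objects_stored = 10_000_000
expected_losses_per_year = objects_stored * annual_loss_fraction   # 0.0001 objects/year
years_per_single_loss = 1 / expected_losses_per_year               # 10,000 years

print(expected_losses_per_year, years_per_single_loss)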

Availability
Object availability is also measured as a percentage; this time, though, it’s the percentage of a full year during which you can expect a given object to be instantly available on request. The Amazon S3 Standard class, for example, guarantees that your data will be ready whenever you need it (meaning: it will be available) for 99.99% of the year. That works out to less than an hour of downtime each year. If you feel downtime has exceeded that limit within a single year, you can apply for a service credit. Amazon’s durability guarantee, by contrast, is designed to provide 99.999999999% data protection. This means there’s practically no chance your data will be lost, even if you might sometimes not have instant access to it.
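
The conversion from an availability percentage to expected annual downtime is simple arithmetic, sketched below for a few sample figures (only 99.99% is tied to a specific class in this section; the others are included just to show the scale):
hours_per_year = 365.25 * 24

for availability in (99.99, 99.9, 99.5):
    downtime_hours = hours_per_year * (1 - availability / 100)
    print(f"{availability}% availability ~= {downtime_hours:.2f} hours of downtime per year")

# 99.99% comes out to roughly 0.88 hours, i.e. about 53 minutes a year.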

Eventually Consistent Data
It’s important to bear in mind that S3 replicates data across multiple locations. As a result, there might be brief delays while updates to existing objects propagate across the system. Uploading a new version of a file or, alternatively, deleting an old file altogether can result in one site reflecting the new state while another is still unaware of any changes. To ensure that there’s never a conflict between versions of a single object—which could lead to serious data and application corruption—you should treat your data according to an Eventually Consistent standard. That is, you should expect a delay (usually just two seconds or less) and design your operations accordingly. Because a newly created object has no older copies that could conflict, S3 provides read-after-write consistency for the creation (PUT) of new objects.

S3 Object Lifecycle
Many of the S3 workloads you’ll launch will probably involve backup archives. But the thing about backup archives is that, when properly designed, they’re usually followed regularly by more backup archives. Maintaining some previous archive versions is critical, but you’ll also want to retire and delete older versions to keep a lid on your storage costs. S3 lets you automate all this with its versioning and lifecycle features.

Versioning
Within many file system environments, saving a file using the same name and location as a pre-existing file will overwrite the original object. That ensures you’ll always have the most recent version available to you, but you will lose access to older versions—including versions that were overwritten by mistake. By default, objects on S3 work the same way. But if you enable versioning at the bucket level, then older overwritten copies of an object will be saved and remain accessible indefinitely. This solves the problem of accidentally losing old data, but it replaces it with the potential for archive bloat. Here’s where lifecycle management can help.
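
Versioning is switched on at the bucket level; a minimal boto3 sketch (with a placeholder bucket name, and assuming configured credentials) looks like this:
import boto3

s3 = boto3.client("s3")

# Once enabled, overwrites and deletes keep the older object versions around.
s3.put_bucket_versioning(
    Bucket="bucketname",                       # placeholder bucket
    VersioningConfiguration={"Status": "Enabled"},
)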

Lifecycle Management
You can configure lifecycle rules for a bucket that will automatically transition an object’s storage class after a set number of days. You might, for instance, have new objects remain in the S3 Standard class for their first 30 days, after which they’re moved to the cheaper One Zone-IA class for another 30 days. If regulatory compliance requires that you maintain older versions, your files could then be moved to the low-cost, long-term storage service Glacier for 365 more days before being permanently deleted.
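
A rule matching that example schedule might look like the following boto3 sketch; the bucket name and rule ID are placeholders, and the day counts mirror the 30/30/365 example above:
import boto3

s3 = boto3.client("s3")

# Standard for 30 days, One Zone-IA until day 60, Glacier until day 425,
# then permanent deletion.
s3.put_bucket_lifecycle_configuration(
    Bucket="bucketname",                       # placeholder bucket
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-rotation",      # placeholder rule name
                "Status": "Enabled",
                "Filter": {"Prefix": ""},      # apply to every object
                "Transitions": [
                    {"Days": 30, "StorageClass": "ONEZONE_IA"},
                    {"Days": 60, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": 425},
            }
        ]
    },
)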
