S3 Glacier
Author: Bowen Y
What is Amazon S3 Glacier?
Amazon S3 Glacier is a cloud storage service from Amazon Web Services (AWS) designed for long-term data archiving and backup. It is part of Amazon Simple Storage Service (Amazon S3), and it is available both as a set of S3 storage classes and through the original vault-based Glacier API. S3 Glacier is known for its very low cost and high durability, which makes it an attractive option for data that is infrequently accessed.
Key features of Amazon S3 Glacier include:
Cost-Effectiveness: It offers a very low cost per gigabyte of storage, which is significantly lower than typical high-availability storage. This makes it suitable for data that doesn't need to be retrieved often.
Data Durability: Amazon S3 Glacier is designed for 99.999999999% durability. It achieves this by automatically replicating data across multiple facilities and conducting regular, systematic data integrity checks.
Scalability: It allows users to store large or small amounts of data with ease. There are no limits to how much data you can store.
Security: Amazon S3 Glacier provides comprehensive security and compliance capabilities that can help meet even the most stringent regulatory requirements. It includes data encryption at rest.
Flexible Retrieval Times: S3 Glacier offers several retrieval options, ranging from a few minutes (expedited) to several hours (standard and bulk). Faster retrievals cost more.
Integration with AWS Services: It is seamlessly integrated with other AWS services, allowing for various data lifecycle policies and management strategies.
S3 Glacier is commonly used for archiving data such as medical records, media archives, financial records, and any other data that must be retained over the long term but is not needed for frequent access. The service is especially beneficial for organizations that need to comply with regulatory requirements for data retention.
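The lifecycle integration mentioned in the feature list can be sketched in Python. This is a minimal sketch, assuming boto3 is installed and AWS credentials are configured. The helper names, the rule ID, the `logs/` prefix, and the 90-day and 365-day thresholds are illustrative assumptions, not values from this article.

```python
def build_archive_lifecycle(prefix, glacier_after_days=90,
                            deep_archive_after_days=365,
                            rule_id="archive-old-objects"):
    """Build an S3 lifecycle configuration dict.

    Hypothetical helper: the day thresholds and rule ID are example values.
    """
    if not 0 <= glacier_after_days < deep_archive_after_days:
        raise ValueError("the Glacier transition must come before the Deep Archive one")
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},  # only objects under this prefix
                "Status": "Enabled",
                "Transitions": [
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                    {"Days": deep_archive_after_days, "StorageClass": "DEEP_ARCHIVE"},
                ],
            }
        ]
    }


def apply_lifecycle(bucket, config):
    """Attach the configuration to a bucket (needs boto3 and AWS credentials)."""
    # Imported lazily so the builder above works without boto3 installed.
    import boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(Bucket=bucket, LifecycleConfiguration=config)
```

A call might look like `apply_lifecycle("example-bucket", build_archive_lifecycle("logs/"))`, where `example-bucket` is a placeholder name. Keeping the dict builder separate from the AWS call makes the rule easy to inspect before it is applied.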
If storage cost is not a major concern, why should I use S3 Glacier instead of S3?
If the cost is not a concern for you, there are still several scenarios where Amazon S3 Glacier might be more suitable than Amazon S3. Here are some considerations:
Long-Term Data Archiving: S3 Glacier is specifically designed for long-term data preservation. If you have data that you need to keep for years or even decades, but you don't need frequent access to it, S3 Glacier is an optimal choice. This is because it's tailored for data that is accessed infrequently but needs to be retained for long periods, like legal archives, historical data, or scientific research data.
Regulatory Compliance: Certain industries have regulatory requirements for data retention that can span several years. S3 Glacier provides features that help in meeting these compliance needs, especially in terms of data durability and immutability.
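Data Durability: S3 and S3 Glacier are both designed for the same 99.999999999% durability, so Glacier is not more durable than S3. What Glacier adds is a storage model built around write-once archives that are rarely touched, which lowers the risk of accidental modification over long periods.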
Digital Preservation: For digital preservation purposes, where the data may not be accessed frequently but needs to be retained without alteration, S3 Glacier offers immutability options such as Glacier Vault Lock. This allows you to enforce compliance controls on the data stored in Glacier.
Backup and Disaster Recovery: In a comprehensive backup strategy or disaster recovery plan, S3 Glacier can be used for the deepest layer of backup, where the data is not expected to be accessed unless there is a significant need, such as a critical system failure.
Lower Cost of Ownership Over Time: Even if immediate costs are not a concern, the long-term cost savings of using S3 Glacier can be substantial, especially for very large datasets that do not require frequent access.
Data Egress Management: Since retrieving data from Glacier can take longer and might have additional costs compared to S3, it naturally discourages frequent data downloads, which can be a way to manage and control data egress.
In summary, if your use case involves long-term data retention, regulatory compliance, or infrequent access patterns, and you're looking for a solution that emphasizes data preservation and durability, S3 Glacier could be a more appropriate choice than standard S3, regardless of the cost differences.
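To make the Vault Lock point concrete, here is a hedged sketch of a lock policy that denies archive deletion until each archive reaches a minimum age. It assumes boto3 and configured credentials for the AWS call. The helper names, the statement ID, and the retention period are illustrative assumptions.

```python
import json


def build_vault_lock_policy(vault_arn, retention_days):
    """Return a Vault Lock policy (JSON string) that blocks early deletion.

    Hypothetical helper: the Sid and retention period are example values.
    """
    if retention_days <= 0:
        raise ValueError("retention_days must be positive")
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "deny-early-delete",
                "Principal": "*",
                "Effect": "Deny",
                "Action": "glacier:DeleteArchive",
                "Resource": vault_arn,
                "Condition": {
                    # Deny deletion while the archive is younger than the limit.
                    "NumericLessThan": {"glacier:ArchiveAgeInDays": str(retention_days)}
                },
            }
        ],
    }
    return json.dumps(policy)


def start_vault_lock(vault_name, policy_json):
    """Begin the lock process and return the lock ID (needs boto3 and credentials)."""
    # Imported lazily so the policy builder works without boto3 installed.
    import boto3

    glacier = boto3.client("glacier")
    response = glacier.initiate_vault_lock(
        accountId="-",  # "-" means the account that owns the credentials
        vaultName=vault_name,
        policy={"Policy": policy_json},
    )
    return response["lockId"]
```

Initiating the lock only puts the policy into a test state. It becomes permanent when `complete_vault_lock` is called with the returned lock ID, so the policy is worth reviewing carefully before that second step.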
How is data organized in S3 Glacier?
The organization of data in Amazon S3 Glacier is somewhat similar to Amazon S3, but there are key differences due to the nature and use-cases of the services. Here's an overview of how data is organized in S3 Glacier:
Vaults: In S3 Glacier, data is stored in "vaults," which are analogous to buckets in Amazon S3. Vaults are the primary container for storing data in Glacier. Each vault is a unique namespace within your AWS account and region.
Archives: Within these vaults, data is stored as "archives." An archive can be any data such as a photo, video, document, or even an entire directory or database. Each archive is essentially a discrete, immutable blob of data. Unlike S3, where an object can be easily overwritten or modified, in Glacier, once you upload an archive, it cannot be altered; it must be deleted and re-uploaded for any changes.
Archive IDs: Each archive is identified by a unique archive ID assigned by Glacier upon creation. Unlike S3, you don't get to assign a human-readable key to each file. This archive ID is essential for retrieving or deleting the archive later on.
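No Direct, Real-Time Access: Unlike S3, where you can immediately list and access your objects, Glacier does not give you direct, real-time access to archives. The list of archives in a vault comes from a vault inventory that Glacier updates roughly once a day. To retrieve data, you must first initiate a retrieval job, which takes from a few minutes to several hours depending on the retrieval option, and then download the job output once it completes.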
Retrieval Options: Glacier supports several retrieval options, each with different costs and access times. You can choose from expedited (typically 1 to 5 minutes), standard (typically 3 to 5 hours), or bulk (typically 5 to 12 hours) retrievals depending on how quickly you need the data and how much you're willing to pay.
Metadata: When you upload data to Glacier, you can attach an optional description to each archive, limited to 1,024 printable ASCII characters. This is far more limited than the object metadata and tagging available in S3.
Integration with S3 Lifecycle Policies: Although the vault-based Glacier API is separate from the S3 API, S3 lifecycle policies can automatically transition objects to the S3 Glacier or S3 Glacier Deep Archive storage classes for cost-effective, long-term storage. Objects transitioned this way stay in their S3 bucket and are managed through S3. They do not appear in Glacier vaults.
Security and Compliance: Like S3, Glacier offers encryption for data at rest and supports various compliance needs. However, it adds features like Vault Lock for imposing stricter compliance controls on data.
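The vault, archive, and retrieval-job workflow described above can be sketched with boto3's Glacier client. This is a minimal sketch, assuming boto3 is installed, credentials are configured, and the vault already exists. The helper names are hypothetical, and `Standard` as the default tier is an arbitrary choice.

```python
RETRIEVAL_TIERS = ("Expedited", "Standard", "Bulk")


def build_retrieval_job(archive_id, tier="Standard", description=None):
    """Build the jobParameters dict for an archive retrieval (hypothetical helper)."""
    if tier not in RETRIEVAL_TIERS:
        raise ValueError(f"tier must be one of {RETRIEVAL_TIERS}, got {tier!r}")
    params = {"Type": "archive-retrieval", "ArchiveId": archive_id, "Tier": tier}
    if description:
        params["Description"] = description
    return params


def upload_archive(vault_name, payload, description):
    """Upload bytes as one archive and return the Glacier-assigned archive ID."""
    # Imported lazily so the parameter builder works without boto3 installed.
    import boto3

    glacier = boto3.client("glacier")
    response = glacier.upload_archive(
        vaultName=vault_name,
        archiveDescription=description,
        body=payload,
    )
    # Store this ID somewhere durable: it is the only handle to the archive.
    return response["archiveId"]


def request_retrieval(vault_name, archive_id, tier="Standard"):
    """Start an asynchronous retrieval job and return its job ID."""
    import boto3

    glacier = boto3.client("glacier")
    job = glacier.initiate_job(
        vaultName=vault_name,
        jobParameters=build_retrieval_job(archive_id, tier),
    )
    return job["jobId"]
```

Once the job is started, `describe_job` reports whether it has completed and `get_job_output` returns the archive bytes. Because archive IDs are opaque, a common design is to keep your own index that maps human-readable names to archive IDs.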
In summary, while both S3 and Glacier store data as objects (archives in the case of Glacier), Glacier is geared more towards long-term storage with less frequent access, and its design reflects this through its unique identifiers, slower retrieval times, and different access methodologies.