
Simple Storage Service (S3)

AWS is revolutionizing many aspects of the livelihood of many folks in the tech industry. Industries have aligned their processes around the many AWS services and depend on the availability and consistency of those services. As a data person, I am super delighted and totally on board the AWS journey to a brave new world in the cloud.

For those of you who are getting on board with AWS, Simple Storage Service (S3) is a good place to start the journey. I have outlined a few points on S3 that can help you on your way.



  • Provides read-after-write consistency for PUTs of new objects (since December 2020, S3 is strongly consistent for overwrites and deletes as well)
  • Is object-based storage
  • Default limit of 100 S3 buckets per account
  • Storage tiers
    • S3 - Durable, immediately available, frequently accessed
    • S3-IA - Durable, immediately available, infrequently accessed
    • S3 Reduced Redundancy Storage - For data that is easily reproducible, such as thumbnails and watermarks; 99.99% availability
    • Glacier - Cheaper than S3, but retrieving data is slower and more costly. This should be used as the long-term archival location.
  • Core fundamentals of an S3 object
    • Key
    • Value
    • Version ID
    • Metadata
    • Access control list
  • What is S3 best used for
    • Object-based storage (files)
    • Not for installing an operating system; use block-based storage such as Elastic Block Store (EBS) for that
    • Versioning of a file
      • Versioning can’t be disabled once enabled, only suspended
      • Good for backups; deleting an object adds a delete marker rather than removing the earlier versions
    • Cross-region replication
      • At the time of writing, encrypted data could not take advantage of this
    • Lifecycle management
  • Edge location
    • This is where content is cached so that it can be loaded quickly after it has first been fetched from the origin
    • Edge locations are not read-only; you can also write to them
  • Origin
    • This is the origin of all the files that the CDN will distribute. It could be an S3 bucket, an EC2 instance, an Elastic Load Balancer or Route 53
  • Distribution
    • This is the name given to the CDN, which consists of a collection of edge locations
    • Web distribution - Typically used for websites
    • RTMP - Used for media streaming
    • Objects are cached for the lifetime of the TTL (Time to Live)
    • You can clear cached objects, but you will be charged
  • Security
    • By default all buckets are private
    • Access control is implemented using
      • Bucket policies - at the bucket level
      • Access control lists
    • Logging on S3 buckets
      • Logs can be kept in the same bucket or in a different bucket
  • Encryption
    • In transit
      • SSL/TLS
    • At rest
      • Server-side encryption
        • S3-managed keys - SSE-S3
        • AWS Key Management Service managed keys - SSE-KMS
        • Server-side encryption with customer-provided keys - SSE-C
      • Client-side encryption
  • Storage Gateway
    • An on-premises virtual appliance that can be used to cache S3 data locally at the customer site
    • File Gateway
    • Volume Gateway
      • Stored volumes - The data is stored on site to keep latency low
      • Cached volumes - The entire data set is stored in S3, and the most frequently accessed data is cached on site
    • Gateway Virtual Tape Library (VTL)
  • Snowball
    • Snowball
    • Snowball Edge - can run Lambda functions
    • Snowmobile
  • Can be used to host a static website
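To make the key / value / version ID idea and the versioning notes above a bit more concrete, here is a minimal in-memory sketch. It is not the S3 API: `ToyVersionedBucket` and `ObjectVersion` are made-up names, and the `v1`, `v2` style version IDs are an assumption for readability (real version IDs are opaque strings). It only illustrates that a delete on a versioned bucket adds a delete marker while older versions stay retrievable.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ObjectVersion:
    version_id: str
    value: Optional[bytes]  # None means this version is a delete marker
    metadata: Dict[str, str] = field(default_factory=dict)

    @property
    def is_delete_marker(self) -> bool:
        return self.value is None


class ToyVersionedBucket:
    """Hypothetical in-memory stand-in for a versioning-enabled bucket."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[ObjectVersion]] = {}
        self._counter = itertools.count(1)

    def _next_id(self) -> str:
        # Assumption: simple sequential IDs, purely for illustration.
        return f"v{next(self._counter)}"

    def put(self, key: str, value: bytes,
            metadata: Optional[Dict[str, str]] = None) -> str:
        # Every PUT appends a new version instead of overwriting the old one.
        version = ObjectVersion(self._next_id(), value, dict(metadata or {}))
        self._versions.setdefault(key, []).append(version)
        return version.version_id

    def delete(self, key: str) -> str:
        # A plain delete only adds a delete marker on top of the history.
        marker = ObjectVersion(self._next_id(), None)
        self._versions.setdefault(key, []).append(marker)
        return marker.version_id

    def get(self, key: str, version_id: Optional[str] = None) -> Optional[bytes]:
        versions = self._versions.get(key, [])
        if version_id is None:
            # Latest version wins; a delete marker makes the key look gone.
            if not versions or versions[-1].is_delete_marker:
                return None
            return versions[-1].value
        for version in versions:
            if version.version_id == version_id:
                return version.value
        return None
```

With this toy model, putting the same key twice and then deleting it leaves `get(key)` returning `None`, while `get(key, first_version_id)` still returns the first body, which is why versioning works well for backups.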

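The storage tiers and lifecycle management points above fit together: a lifecycle rule moves objects from S3 to S3-IA to Glacier as they age. Below is a rough sketch that only builds the rule as a Python dictionary, in the shape I understand boto3's `put_bucket_lifecycle_configuration` to expect. The helper name `lifecycle_rule`, the rule ID format and the day counts are my own assumptions, and nothing here calls AWS.

```python
from typing import Any, Dict, Optional


def lifecycle_rule(prefix: str, ia_after_days: int, glacier_after_days: int,
                   expire_after_days: Optional[int] = None) -> Dict[str, Any]:
    """Build a lifecycle configuration dict (hypothetical helper)."""
    if not 0 < ia_after_days < glacier_after_days:
        raise ValueError("expected 0 < ia_after_days < glacier_after_days")

    rule: Dict[str, Any] = {
        # Assumption: the ID is just a readable label derived from the prefix.
        "ID": f"archive-{prefix.strip('/') or 'all'}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            # Frequently accessed -> infrequently accessed tier.
            {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
            # Infrequently accessed -> long-term archive.
            {"Days": glacier_after_days, "StorageClass": "GLACIER"},
        ],
    }
    if expire_after_days is not None:
        # Optionally delete the object altogether after this many days.
        rule["Expiration"] = {"Days": expire_after_days}
    return {"Rules": [rule]}
```

For example, `lifecycle_rule("logs/", 30, 90, 365)` describes objects under `logs/` moving to S3-IA after 30 days, to Glacier after 90 days, and expiring after a year. The resulting dictionary would be passed as the `LifecycleConfiguration` argument if you applied it with boto3.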
  
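The edge location, origin and TTL notes above describe a cache in front of the origin. The following toy sketch shows that behaviour in plain Python. `ToyEdgeCache` is a made-up class, not how the CDN is actually implemented; the injectable `clock` parameter is an assumption added so the expiry can be demonstrated without waiting.

```python
import time
from typing import Callable, Dict, Tuple


class ToyEdgeCache:
    """Hypothetical model of one edge location caching objects for a TTL."""

    def __init__(self, fetch_from_origin: Callable[[str], str],
                 ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch_from_origin
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}  # path -> (expiry, body)
        self.origin_fetches = 0

    def get(self, path: str) -> str:
        now = self._clock()
        cached = self._store.get(path)
        if cached is not None and cached[0] > now:
            # Still within the TTL: serve from the edge, origin is not touched.
            return cached[1]
        # First request or expired entry: go back to the origin and re-cache.
        body = self._fetch(path)
        self.origin_fetches += 1
        self._store[path] = (now + self._ttl, body)
        return body

    def invalidate(self, path: str) -> None:
        # Clearing a cached object before its TTL runs out.
        self._store.pop(path, None)
```

The first `get` for a path goes to the origin, later calls within the TTL are served from the cache, and `invalidate` forces the next request back to the origin, which is the operation the notes say you are charged for.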

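The three server-side encryption options listed above differ mainly in who holds the key, and that shows up in the request parameters. As a hedged sketch, the helper below returns the extra keyword arguments I would expect to pass to boto3's `put_object` for each option. `sse_put_object_args` is a hypothetical name, the mode strings are my own labels, and the function only builds a dictionary without contacting AWS.

```python
from typing import Dict, Optional


def sse_put_object_args(mode: str, kms_key_id: Optional[str] = None,
                        customer_key: Optional[bytes] = None) -> Dict[str, object]:
    """Return encryption-related keyword arguments for an upload (sketch)."""
    if mode == "SSE-S3":
        # S3 manages the keys itself.
        return {"ServerSideEncryption": "AES256"}
    if mode == "SSE-KMS":
        # Keys are managed through AWS Key Management Service.
        args: Dict[str, object] = {"ServerSideEncryption": "aws:kms"}
        if kms_key_id:
            # Without an explicit key ID, the account's default KMS key is used.
            args["SSEKMSKeyId"] = kms_key_id
        return args
    if mode == "SSE-C":
        # The customer supplies the key with every request.
        if customer_key is None or len(customer_key) != 32:
            raise ValueError("SSE-C needs a 32-byte (256-bit) key")
        return {"SSECustomerAlgorithm": "AES256", "SSECustomerKey": customer_key}
    raise ValueError(f"unknown encryption mode: {mode}")
```

The result is meant to be unpacked into an upload call, along the lines of `put_object(Bucket=..., Key=..., Body=..., **sse_put_object_args("SSE-S3"))`. Client-side encryption is not covered here, since in that case the data is encrypted before it ever reaches S3.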