
Simple Storage Service (S3)

AWS is revolutionizing many aspects of the livelihood of many folks in the tech industry. Companies have aligned their processes around the many AWS services and depend on their availability and consistency. As a data person, I am super delighted and totally on board the AWS journey to a brave new world in the cloud.

For those of you who are getting on board with AWS, Simple Storage Service (S3) is a good place to start the journey. I have outlined a few points on S3 that can help you on your way.



  • Offers read-after-write consistency for PUTs of new objects (overwrite PUTs and DELETEs are eventually consistent)
  • Is known as object based storage
  • Default S3 bucket limit per account = 100
  • Storage tiers

S3 (Standard)
Durable, immediately available, frequently accessed data
S3-IA (Infrequent Access)
Durable, immediately available, infrequently accessed data
S3-RRS (Reduced Redundancy Storage)
For data that is easily reproducible, such as thumbnails and watermarks; 99.99% availability
Glacier
Cheaper to store than S3, but more costly and time-consuming to retrieve. This should be used as the long-term archival location.
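The storage tier is chosen per object at upload time. Below is a minimal sketch (bucket and key names are hypothetical) of mapping each tier above to the `StorageClass` value that would be passed to boto3's `put_object`:

```python
# Map of the tiers described above to S3 StorageClass values.
STORAGE_CLASSES = {
    "frequent": "STANDARD",                 # S3 Standard
    "infrequent": "STANDARD_IA",            # S3-IA
    "reproducible": "REDUCED_REDUNDANCY",   # RRS: thumbnails, watermarks, etc.
    "archive": "GLACIER",                   # long-term archival
}

def put_params(bucket, key, body, access_pattern):
    """Build the keyword arguments for s3.put_object for a given tier."""
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        "StorageClass": STORAGE_CLASSES[access_pattern],
    }

params = put_params("my-example-bucket", "thumbs/logo.png", b"...", "reproducible")
# With boto3 this would be: boto3.client("s3").put_object(**params)
```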
  • Core fundamentals of an S3 object
    • Key
    • Value
    • Version ID
    • Metadata
    • Access Control List
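The parts listed above can be pictured as a simple record. A sketch, with all values purely illustrative:

```python
# Anatomy of an S3 object (all values illustrative).
s3_object = {
    "Key": "reports/2019/q1.csv",              # the object's name: its full path within the bucket
    "Value": b"date,amount\n2019-01-01,42\n",  # the data itself, stored as bytes
    "VersionId": "null",                       # stays "null" until versioning is enabled on the bucket
    "Metadata": {"uploaded-by": "etl-job"},    # user-defined key/value pairs stored with the object
    "ACL": "private",                          # access control list; objects are private by default
}
```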
  • What is S3 best used for
    • For object based storage ( a file )
    • Not for installing an operating system; use Elastic Block Store (EBS) to install an OS
    • Versioning of a file
Versioning can’t be disabled, only suspended.
  • Good for backups; deleting an object adds a delete marker rather than removing previous versions
  • Cross region replication
Encrypted data can't take advantage of this feature at the moment
  • Lifecycle management
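Versioning and lifecycle rules are both set at the bucket level. A sketch (bucket name and rule details hypothetical) of the parameters that would be passed to boto3:

```python
# Enable versioning: it can only ever be "Enabled" or "Suspended", never disabled.
versioning_params = {
    "Bucket": "my-backup-bucket",
    "VersioningConfiguration": {"Status": "Enabled"},
}

# A lifecycle rule that moves aging objects into cheaper tiers.
lifecycle_params = {
    "Bucket": "my-backup-bucket",
    "LifecycleConfiguration": {
        "Rules": [{
            "ID": "archive-old-logs",
            "Filter": {"Prefix": "logs/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},  # infrequent access after 30 days
                {"Days": 90, "StorageClass": "GLACIER"},      # archive after 90 days
            ],
        }],
    },
}
# boto3.client("s3").put_bucket_versioning(**versioning_params)
# boto3.client("s3").put_bucket_lifecycle_configuration(**lifecycle_params)
```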
  • Edge location
    • This is where content is cached so that it can be served quickly after it is first loaded from the origin
    • They are not limited to reads only; objects can also be written to them
  • Origin
    • This is the source of all the files the CDN distributes. It could be an S3 bucket, an EC2 instance, an Elastic Load Balancer, or Route 53
  • Distribution
    • This is the name given to the CDN, which consists of a collection of edge locations
    • Web distribution - typically used for websites
    • RTMP - used for media streaming
    • Objects are cached for the lifetime of the TTL (Time To Live)
    • You can clear cached objects, but you will be charged
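Clearing a cached object ahead of its TTL is called an invalidation, and it is the operation AWS charges for. A sketch (distribution ID and path hypothetical) of the request parameters for boto3's CloudFront `create_invalidation` call:

```python
import time

# Parameters for invalidating one cached object on a CloudFront distribution.
invalidation_params = {
    "DistributionId": "EDFDVBD6EXAMPLE",  # hypothetical distribution ID
    "InvalidationBatch": {
        "Paths": {"Quantity": 1, "Items": ["/images/logo.png"]},
        "CallerReference": str(time.time()),  # must be unique per request
    },
}
# boto3.client("cloudfront").create_invalidation(**invalidation_params)
```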
  • Security
    • By default all buckets are private
    • Access controls are implemented using
      • Bucket policies - at bucket level
      • Access control list
    • Logging for S3 buckets
      • Logs can be kept in the same bucket or in a different bucket
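A bucket policy is a JSON document attached at the bucket level. A minimal sketch (bucket name hypothetical) of a policy that grants public read access to every object in a bucket:

```python
import json

# A bucket-level policy allowing anyone to GET objects (use with care).
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "PublicReadGetObject",
        "Effect": "Allow",
        "Principal": "*",                                  # any requester
        "Action": "s3:GetObject",                          # read objects only
        "Resource": "arn:aws:s3:::my-example-bucket/*",    # every key in the bucket
    }],
}
# boto3.client("s3").put_bucket_policy(
#     Bucket="my-example-bucket", Policy=json.dumps(policy))
```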
  • Encryption
    • In transit
      • SSL/TLS
    • At rest
      • Server side encryption
        • S3 Managed Keys - SSE-S3
        • AWS key management Service, managed keys - SSE-KMS
        • Server Side Encryption with customer provided keys - SSE-C
      • Client Side encryption
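Each server-side encryption mode above is requested through extra parameters on the upload. A sketch (bucket, key, and KMS alias hypothetical) of what would be merged into a boto3 `put_object` call:

```python
# Extra put_object parameters per server-side encryption mode.
sse_s3 = {"ServerSideEncryption": "AES256"}              # SSE-S3: S3-managed keys
sse_kms = {"ServerSideEncryption": "aws:kms",
           "SSEKMSKeyId": "alias/my-app-key"}            # SSE-KMS: KMS-managed key (alias hypothetical)
sse_c = {"SSECustomerAlgorithm": "AES256",
         "SSECustomerKey": "<your 32-byte key>"}         # SSE-C: customer-provided key

params = {"Bucket": "my-example-bucket", "Key": "secret.txt",
          "Body": b"...", **sse_kms}
# boto3.client("s3").put_object(**params)
```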
  • Storage Gateways
An on-premises virtual appliance that can be used to cache S3 data locally at the customer site
    • File Gateways
    • Volume Gateways
      • Stored volumes - data is stored on site for low latency
      • Cached volumes - the entire data set is stored in S3, and the most frequently accessed data is cached on site
      • Gateway virtual tape library(VTL)
  • Snowball
    • Snowball
    • Snowball edge  - can run lambda functions
    • Snowmobile
  • Can be used to host a static website
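Static website hosting is another bucket-level setting. A sketch (bucket name hypothetical) of the configuration that would be passed to boto3's `put_bucket_website`:

```python
# Turn a bucket into a static website host by naming its index and error pages.
website_params = {
    "Bucket": "my-site-bucket",
    "WebsiteConfiguration": {
        "IndexDocument": {"Suffix": "index.html"},  # served for directory-style requests
        "ErrorDocument": {"Key": "error.html"},     # served on 4xx errors
    },
}
# boto3.client("s3").put_bucket_website(**website_params)
# The site is then served from the bucket's region-specific website endpoint.
```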

  

The last few months it’s all been dbt. Dbt is a transform and load tool which is provided by fishtown analytics. For those that have created incremental models in dbt would have found the simplicity and easiness of how it drives the workload. Depending on the target datastore, the incremental model workload implementation changes. But all that said, the question is, should the incremental model use high-watermark as part of the implementation. How incremental models work behind the scenes is the best place to start this investigation. And when it’s not obvious, the next best place is to investigate the log after an test incremental model execution and find the implementation. Following are the internal steps followed for a datastore that does not support the merge statements. This was observed in the dbt log. - As the first step, It will copy all the data to a temp table generated from the incremental execution. - It will then delete all the data from the base table th...