How To Determine the Size of an S3 Bucket

This one came up while working on my home network / photo management setup. I’ve set my Synology DS216+ NAS to use Cloud Sync to back up my files to an Amazon S3 bucket (see this post for some more information on using S3 for backups). The problem was it was taking a very long time, and I needed to figure out how much had transferred.

Unfortunately, Amazon has no simple mechanism for determining the size of an S3 bucket. I found a couple posts on StackOverflow showing how to do it, but they seemed overly complex.

While you can get a bucket size using several third-party GUI tools, the command line approach is quick and easy. It does require the Amazon command line tools (the AWS CLI) to be installed and access keys to be generated, but once that’s done, you can quickly query Amazon for just about anything.

Here’s the command I used to determine the size of my bucket. This is on a Mac:

$ aws s3 ls s3://BUCKETNAME --recursive | awk '{total+=$3} END{print "total =",total/1024/1024," MB"}'

This will give back something like:

total = 245032  MB

Voila! How long the command takes depends on the size of the bucket. For me, with around 20,000 photos stored, it takes about 20 seconds.
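
If you run this often, the awk step can be wrapped in a small shell function. What follows is a minimal sketch, not an official tool: the function name s3_listing_total is made up for illustration, and it assumes the listing format used above, where the third column is the object size in bytes.

```shell
# Sum the size column (field 3) of `aws s3 ls --recursive` output read on stdin.
# Prints the number of objects and the total size in MB and GB.
# s3_listing_total is a hypothetical helper name, not part of the AWS CLI.
s3_listing_total() {
    awk '{ total += $3; count++ }
         END { printf "objects = %d, total = %.1f MB (%.2f GB)\n",
                      count, total/1024/1024, total/1024/1024/1024 }'
}

# Usage (BUCKETNAME is a placeholder, as in the command above):
#   aws s3 ls s3://BUCKETNAME --recursive | s3_listing_total
```

As far as I know, recent versions of the AWS CLI can also print a total on their own with aws s3 ls s3://BUCKETNAME --recursive --summarize --human-readable, which adds "Total Objects" and "Total Size" lines at the end of the listing.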

Backing up your Photos – A Cautionary Tale

A recent article appeared on Petapixel regarding a Montreal photojournalist having all his photos stolen by burglars:

A photographer’s worst nightmare just happened to a well-known photographer: on Monday, Montreal-based photojournalist Jacques Nadeau returned home to find that burglars had stolen all the photos he has taken during his life and career.

CBC News reports that Nadeau, a photojournalist for the newspaper Le Devoir, walked into his home to find that five of his hard drives had been stolen.

They contained an estimated 30,000 to 50,000 photos captured over the course of his 35 year photography career.

This is a terrible story, and absolutely devastating to the photographer.  My heart goes out to him.  But we can take a lesson from this…

Embrace the Paranoia.  Always ask “What if….”

Take a look around you.  At your life, at your belongings, at things you hold dear.  Ask yourself “What would happen if this were lost or destroyed?”  If the answer is “This is irreplaceable”, then move on to the next question “How can I protect these things in a way that makes sure they’re never lost?”

For anyone in the digital world, the answer is simple.  Backups.  There are myriad sites singing the song “Always do your backups!” and “Here’s how to back up your things!”  I won’t go into detail here.  But people should extend that idea to other things of value.  Important documents.  Printed photos.  Artwork.  That doll from your youth.  Look at these things of value and be a little paranoid.  “How could this be destroyed?”  Some china inherited from a relative – is it on a shelf that can be knocked over easily?  A doll you once cuddled as a child, perhaps putting it out of reach of the dog would be a good idea?

Yeah, yeah, okay.  So how do YOU do it?

I’m glad you asked!  This article happened to appear while I was in the middle of backing up my photo library!

Currently, I do all my photo work in Aperture.  Apple has announced that this product is being end-of-lifed, so no matter what, I’ll need to do a bunch of work migrating photos.  I keep my photo library on an external 1TB USB3 drive, and I’m acutely aware of how fragile that is.  Hard drives fail constantly, and having all my eggs in one basket is never a good idea.  The challenge is, photo libraries are BIG.  Hundreds of gigabytes of data.  If I were to try to back up my Aperture library onto DVD+R DS (the largest ‘consumer level’ long term storage medium available, at 17GB per disc), I’d need 31-odd discs.  That’s too many, and cumbersome as heck to work with.

I considered Dropbox, Box.net, Google Drive, and Amazon Drive, but I feel these are targeted at a desktop user who just wants a drive out in the cloud.  While I use Dropbox extensively for making photos available to customers, its sync mechanism is quite tricky if what you’re storing on Dropbox is much larger than what you can store locally.  I’m also not confident these systems will last, unchanged and accessible, for the long term.  Google, in particular, has a dreadful record for keeping products and offerings available for the long run.

In the end I decided to use a pretty technical solution:  Amazon S3 storage.

Backing up to S3 and Glacier

Amazon has a bulk storage system called S3, coupled with a ‘long term storage’ system called Glacier.  S3 is in essence a big storage bucket where you can drop files and retrieve them at will.  Glacier allows you to take S3 elements and put them in, as you might guess from the name, ‘cold storage’.  The cost of S3 storage is extremely low ($0.0240 per GB per month, or for my 600GB of photo data, about $14/mo).  If I move those files into Glacier, it drops to $6/mo.  The difference is that restoring data from Glacier may not be immediate – it may take a few hours for your files to be available.  For this sort of long term storage, that’s fine by me!
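
The arithmetic behind those figures can be checked with a quick sketch.  The function name monthly_cost is hypothetical, the S3 rate is the one quoted above, and the Glacier rate of $0.01 per GB is only inferred from the $6/mo figure, not taken from a price list; current rates will differ.

```shell
# Monthly storage cost in dollars: size in GB times price per GB per month.
# monthly_cost is a hypothetical helper name used only for this example.
monthly_cost() {
    awk -v gb="$1" -v rate="$2" 'BEGIN { printf "%.2f\n", gb * rate }'
}

monthly_cost 600 0.0240   # S3 at the rate quoted above; prints 14.40
monthly_cost 600 0.01     # Glacier, rate inferred from $6/mo; prints 6.00
```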

This is not as cheap as current offerings from Amazon Prime (Unlimited storage free with Prime and Amazon Drive).  But I’m still very skeptical of the ‘drive’ offerings from the big players.  Everyone is trying to get into the “cloud drive” market with custom clients and apps.  My storage needs are exceedingly simple.  About 300 very large files (copies of each of my photo projects).  S3 is extremely well established, and used widely in the industry.

With S3, to back up my library, I go through these (for me) straightforward steps:

  • In Aperture, I select a project, and say “Export to library”.  I locate that library on my external drive.  This is an exact copy of my original masters / RAW images, as well as all the ‘versions’ I may have created (all in JPG form).  It also includes metadata and Aperture edit notes.  While I know Aperture is not long for the world, I at least have things backed up.  This results in a directory that contains a mini ‘aplibrary’ containing all my files.
  • From the command line, I make a ‘tgz’ of that directory.  This compresses the directory down into a single file.  If I were so inclined, I could do this on the Mac just by selecting the directory and choosing ‘Compress’ – that will create a .zip file containing the entire library.
  • Next, I copy the file up to S3.  Because I’m a super-geek, I do this right on the command line using the Amazon credentials I created a while back.  If you’re a GUI person, you can use any number of S3 clients for the Mac or PC.  For me, I do:
    aws s3 cp 2014-09-23\ CA\ Over\ 15k.aplibrary.tgz s3://daveshevettphotos/ --profile personal

After some time (some of the libraries are quite large; a 25GB wedding archive took 85 minutes to upload), I have an offsite backup of that photo library!  Hurray!  At any point I can go to the Amazon S3 console and put these files into Glacier for long term storage, or download them as needed.
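
The tar-and-upload steps above can be sketched as one shell function.  This is a rough sketch under stated assumptions, not the exact commands used here: the function name backup_library, the DRY_RUN switch, and any bucket name you pass in are all hypothetical.

```shell
# Archive one exported library directory and copy the archive to S3.
# backup_library and DRY_RUN are hypothetical names for this sketch.
# Arguments: library directory, bucket name, optional AWS CLI profile.
backup_library() {
    dir=$1
    bucket=$2
    profile=${3:-default}
    archive="${dir%/}.tgz"      # archive is written next to the library

    # -C keeps paths inside the archive relative to the library's parent dir
    tar -czf "$archive" -C "$(dirname "$dir")" "$(basename "$dir")" || return 1

    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the upload command instead of running it
        echo "aws s3 cp $archive s3://$bucket/ --profile $profile"
    else
        aws s3 cp "$archive" "s3://$bucket/" --profile "$profile"
    fi
}
```

Something like backup_library "2014-09-23 CA Over 15k.aplibrary" BUCKETNAME personal would then create the .tgz and upload it.  The aws s3 cp command also accepts a --storage-class option, so it may be possible to send archives straight to a colder storage class instead of moving them in the console afterwards; check the current CLI documentation before relying on that.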

I realize this process is not for everyone.  I share it here simply to raise awareness that in the modern age, many of our most important things are stored in an ephemeral, easily lost way.  Take the time to look around and see what you could lose if something were to happen.  Something as simple as your laptop being stolen, a broken water pipe, or even a home fire.  Always ask: “What if…”

How not to compliment a photographer

Not too long ago an acquaintance of mine asked if I would do them a favor and come photograph their event.  No problem, I enjoy shooting, and any chance to work is an opportunity to improve my skill.  I went to the event, spent a few hours taking pictures, and had a great interaction with everyone.  Later on I sat down and did all my post processing, tuning, and polishing – a process that can take hours, depending on the size of the shoot and the complexity of the imagery.

Zach at Arisia
This particular event wasn’t that difficult, and I ended up with several dozen shots I was pretty happy with.   I published the pictures and sent the link out.  Over the next day or two, I got good feedback from the event coordinator and several attendees.

One message I got was simply this…

“These pictures are beautiful!  That sure is a great camera!”

Needless to say, this pushed my buttons.

If you’re a photographer, and understand why this statement could be irritating, feel free to skip the following rant.

In the modern age of high pixel count cell phones, cheap high resolution point and shoot cameras, and “entry level” DSLRs, even the simplest, auto-everything, “shoot and post” pictures can come out looking great.   But whether you get a good picture or not with these tools alone is, frankly, luck.  Sure, you could get a great picture – but that’s mostly the result of chance.  Please don’t assume that’s what I do.

I am a photographer (among other things).  I spend a lot of time thinking about framing, light, setting, angles, subjects, and timing.  When I take pictures, sure, I take zillions (a typical hour or two shoot can result in 500+ exposures).  But to me a photographer’s art consists of an end-to-end process that may take days.  The camera is one of the tools in that process, but saying things like “that sure is a great camera!”, while it may be true, really diminishes the work that goes into creating really good imagery.

So folks, next time you see a picture by someone you know is a photographer, compliment them on the picture, or better yet, on their skill, not on the camera.