Because the blogging system that I use doesn’t apply fine-grained object-level caching rules, I end up with objects such as images that cache appropriately but an index.html page that does not. I don’t want client browsers to hang on to the main index.html page for more than an hour or so, because its content changes much more frequently than that.
It’s possible that I could dig around under the hood of Hexo and create a version that applies customized caching rules. Instead, I make a second pass over the content, adjusting the Cache-Control and other metadata according to my needs. For this task I use the Amazon Web Services command-line interface (AWS CLI).
Installing the AWS CLI is straightforward. On the platform I use (OS X), it’s just:
$ curl "https://s3.amazonaws.com/aws-cli/awscli-bundle.zip" -o "awscli-bundle.zip"
After installation, you will want to configure the AWS CLI. Installing your AWS credentials is an important step, which you can do via the aws configure command. It prompts for your keys and defaults; the values shown below are the placeholder credentials from the AWS documentation, with illustrative choices for region and output format:
AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
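AWS Secret Access Key [None]: wJalrXUzEtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Default region name [None]: us-east-1
Default output format [None]: json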
Once installed, you can use the AWS CLI to perform a variety of operations on your S3 buckets. It’s worth reading the documentation, which is very detailed, to get familiar with the command structure.
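For example, to list the contents of the blog’s bucket:
$ aws s3 ls s3://ojisanseiuchi.com/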
Using AWS CLI to adjust image caching
To compute new Cache-Control header dates for the aws command, I use a little Python script. For images, I want to maximize caching in the request/reply chain: since images are the heaviest objects traveling over the wire, I want to minimize how often they have to be reloaded, so I set a long cache time for these objects. Here’s a sketch of how I compute the new dates and build up the aws command (using datetime for the expiry and subprocess to shell out):
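import datetime
import subprocess

# Sketch: cache images for two weeks (1209600 seconds) from now.
weeks = 2
expires = (datetime.datetime.now() + datetime.timedelta(weeks=weeks)).isoformat()
max_age = weeks * 7 * 24 * 3600

cmd = ('aws s3 cp s3://ojisanseiuchi.com/ s3://ojisanseiuchi.com/ '
       '--exclude "*" --include "*.jpg" --include "*.png" '
       '--recursive --metadata-directive REPLACE '
       '--expires {0} --acl public-read '
       '--content-encoding "gzip" '
       '--cache-control "public, max-age={1}"'.format(expires, max_age))
subprocess.call(cmd, shell=True)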
This will build and execute the following command:
aws s3 cp s3://ojisanseiuchi.com/ s3://ojisanseiuchi.com/ --exclude "*" --include "*.jpg" --include "*.png" --recursive --metadata-directive REPLACE --expires 2016-04-05T11:37:16.181141 --acl public-read --content-encoding "gzip" --cache-control "public, max-age=1209600"
This will recursively manipulate the metadata for all jpg and png files in the bucket. The
weeks parameter can be adjusted to any duration you would like.
Using AWS CLI to adjust index.html caching
The main index page should be reloaded frequently; otherwise users have no idea that the page has changed. For this part, I’ll drop down to the lower-level s3api command for illustration. Here’s a sketch of the Python script that makes this work (again using datetime and subprocess):
hours = 1
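# Sketch: the rest of the script computes a one-hour expiry and
# shells out to aws, as in the image script above.
import datetime
import subprocess

expires = (datetime.datetime.now() + datetime.timedelta(hours=hours)).isoformat()
max_age = hours * 3600

cmd = ('aws s3api copy-object --copy-source ojisanseiuchi.com/index.html '
       '--key index.html --bucket ojisanseiuchi.com '
       '--metadata-directive "REPLACE" '
       '--expires {0} --acl public-read '
       '--content-type "text/html; charset=UTF-8" '
       '--content-encoding "gzip" '
       '--cache-control "public, max-age={1}"'.format(expires, max_age))
subprocess.call(cmd, shell=True)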
When run, this will build and execute the following command:
aws s3api copy-object --copy-source ojisanseiuchi.com/index.html --key index.html --bucket ojisanseiuchi.com --metadata-directive "REPLACE" --expires 2016-03-22T12:42:44.706536 --acl public-read --content-type "text/html; charset=UTF-8" --content-encoding "gzip" --cache-control "public, max-age=3600"
This ensures that index.html is cached for only one hour.
Automating the post-processing
As I’ve written before, I use Grunt to automate blogging tasks. To run the post-processing I’ve described above, I simply add it as a task in the Gruntfile.
To initialize the post-processing task (a sketch assuming the grunt-shell plugin; the Python script names are placeholders):
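// A sketch, not the exact Gruntfile: the two Python scripts
// (placeholder filenames) run through the grunt-shell plugin,
// alongside the existing clean and generate targets.
shell: {
    cacheImages: {
        command: 'python adjust_image_cache.py'
    },
    cacheIndex: {
        command: 'python adjust_index_cache.py'
    }
}

// Then alias the two steps as a single s3 task:
grunt.registerTask('s3', ['shell:cacheImages', 'shell:cacheIndex']);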
To register the task:
grunt.registerTask('deploy', ['shell:clean', 'shell:generate', 'sitemap:production', 'robotstxt:production', 's3']);
Now I can deploy the blog and run the post-processing using grunt deploy.