Writing Hexo filters

Hexo, the static blogging system that I use, is very extensible and provides numerous hooks into the generation pipeline.

While working on a Russian language blog that’s coming online soon, I had the opportunity to write a filter to render Cyrillic text in a different font than the rest of the body text.

Markup filter use case

I wanted to set the Cyrillic text apart in color, typeface, and font weight. Although I could have extended Hexo with a new tag, I decided to use a filter so that, after the HTML is rendered, items demarcated by double pipes || anywhere on the blog would be replaced by a new <span>.

I used an npm module to deploy the filter. You can find it on npm and at its GitHub repo.

Here’s the very short code for the filter itself:

hexo.extend.filter.register('after_render:html', function(str, data) {
    var re = /(\|{2})((.|\n)+?)(\|{2})/gm;
    var result = str.replace(re, '<span class="rsb">$2</span>');
    return result;
});

The regex in the second line identifies a block of text fenced by double pipes, and the replacement swaps the fence for a span whose class carries the styling to be applied. In the future, I’d like to identify Cyrillic text with a regex directly and not have to use a fence at all.
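The substitution is easy to sanity-check outside Hexo. Here’s a quick Python sketch (my own, not part of the plugin) that performs the equivalent replacement:

```python
import re

# Equivalent of the JavaScript pattern: text fenced by double pipes,
# matched non-greedily so multiple fences in one page stay separate.
fence = re.compile(r'\|\|(.+?)\|\|', re.DOTALL)

def mark_cyrillic(html):
    # Wrap each fenced run in a styled span, as the Hexo filter does.
    return fence.sub(r'<span class="rsb">\1</span>', html)

print(mark_cyrillic('Privet means ||привет|| in Russian.'))
# → Privet means <span class="rsb">привет</span> in Russian.
```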

Fine-tuning caching for S3-hosted static blogs using AWS CLI

Because the blogging system that I use doesn’t apply fine-grained, object-level caching rules, I end up with objects such as images that cache appropriately but an index.html page that does not. I don’t want client browsers to hang on to the main index.html page for more than an hour or so, because it should update much more frequently than that as its content changes.

It’s possible that I could dig around under the hood of Hexo and create a version that applies customized caching rules. Instead, I make a second pass over the content, adjusting the Cache-Control and other metadata according to my needs. For this task I use the Amazon Web Services command line interface (AWS CLI).

Installation

Installing the AWS CLI is straightforward. On the platform I use (OS X), it’s just:

$ curl "https://s3.amazonaws.com/aws-cli/awscli-bundle.zip" -o "awscli-bundle.zip"
$ unzip awscli-bundle.zip
$ sudo ./awscli-bundle/install -i /usr/local/aws -b /usr/local/bin/aws

After installation, you will want to configure the AWS CLI. Installing your AWS credentials is an important step, which you can do via the aws configure command:

AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Default region name [None]: us-west-2
Default output format [None]: ENTER

Once installed, you can use the AWS CLI to perform a variety of operations on your S3 buckets. It’s worth reading the documentation, which is very detailed, to get familiar with the command structure.

Using AWS CLI to adjust image caching

To compute new Cache-Control header dates for the aws command, I wrote a little Python script to do the job. For images, I want to maximize caching in the request/reply chain. Since images are the heaviest objects traveling on the wire, I want to minimize how many of them I need to reload, so I set a long cache time for these objects. Here’s how I compute new dates and build up the aws command:

#!/usr/bin/python

import datetime
from dateutil.relativedelta import relativedelta
import subprocess

weeks = 2
seconds = weeks * 7 * 24 * 60 * 60

today = datetime.datetime.now()
new_date = today + relativedelta(weeks=weeks)

command = '''aws s3 cp s3://ojisanseiuchi.com/ s3://ojisanseiuchi.com/ --exclude "*" '''
command += '''--include "*.jpg" '''
command += '''--include "*.png" '''
command += '''--recursive '''
command += '''--metadata-directive REPLACE '''
command += '''--expires {0} '''.format(new_date.isoformat())
command += '''--acl public-read '''
command += '''--content-encoding "gzip" '''
command += '''--cache-control "public, max-age={0}"'''.format(seconds)

subprocess.call(command, shell=True)

This will build and execute the following command:

aws s3 cp s3://ojisanseiuchi.com/ s3://ojisanseiuchi.com/ --exclude "*" --include "*.jpg" --include "*.png" --recursive --metadata-directive REPLACE --expires 2016-04-05T11:37:16.181141 --acl public-read --content-encoding "gzip" --cache-control "public, max-age=1209600"

This recursively rewrites the metadata for all .jpg and .png files in the bucket. The weeks parameter can be adjusted to any duration you like.
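One caveat with building the command as a single string for shell=True is that the glob patterns pass through a shell at all. A variant worth considering (a sketch, not the script I actually use) builds the same command as an argument list, so no shell is involved:

```python
import datetime
import subprocess  # uncomment the final line to actually run the command

weeks = 2
seconds = weeks * 7 * 24 * 60 * 60
new_date = datetime.datetime.now() + datetime.timedelta(weeks=weeks)

# Same flags as above, but as a list: with no shell involved,
# "*.jpg" reaches aws verbatim and can't be expanded locally.
cmd = [
    'aws', 's3', 'cp',
    's3://ojisanseiuchi.com/', 's3://ojisanseiuchi.com/',
    '--exclude', '*',
    '--include', '*.jpg',
    '--include', '*.png',
    '--recursive',
    '--metadata-directive', 'REPLACE',
    '--expires', new_date.isoformat(),
    '--acl', 'public-read',
    '--content-encoding', 'gzip',
    '--cache-control', 'public, max-age={0}'.format(seconds),
]
# subprocess.call(cmd)
```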

Using AWS CLI to adjust the index.html caching

The main index page should get reloaded frequently; otherwise users have no idea that the page has changed. For this part, I’ll drop down to the lower-level s3api command for illustration. Here’s the rest of the Python script:

# continuation of the script above: today, relativedelta, and
# subprocess are already defined/imported
hours = 1
seconds = hours * 60 * 60   # seconds in one hour
new_date = today + relativedelta(hours=hours)
command = '''aws s3api copy-object  --copy-source ojisanseiuchi.com/index.html --key index.html --bucket ojisanseiuchi.com '''
command += '''--metadata-directive "REPLACE" '''
command += '''--expires {0} '''.format(new_date.isoformat())
command += '''--acl public-read '''
command += '''--content-type "text/html; charset=UTF-8" '''
command += '''--content-encoding "gzip" '''
command += '''--cache-control "public, max-age={0}"'''.format(seconds)

subprocess.call(command,shell=True)

When run, this will build and execute the following command:

aws s3api copy-object  --copy-source ojisanseiuchi.com/index.html --key index.html --bucket ojisanseiuchi.com --metadata-directive "REPLACE" --expires 2016-03-22T12:42:44.706536 --acl public-read --content-type "text/html; charset=UTF-8" --content-encoding "gzip" --cache-control "public, max-age=3600"

This ensures the index page is cached for only one hour.
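Both passes share the same date arithmetic, so it could be factored into a small helper. A sketch using only the standard library (the scripts above use dateutil, but timedelta does the same job for fixed offsets; the function name is my own):

```python
import datetime

def cache_headers(lifetime):
    """Return (expires_iso, cache_control) for a datetime.timedelta."""
    expires = datetime.datetime.now() + lifetime
    max_age = int(lifetime.total_seconds())
    return expires.isoformat(), 'public, max-age={0}'.format(max_age)

# Long-lived images vs. a short-lived index page:
img_expires, img_cc = cache_headers(datetime.timedelta(weeks=2))
idx_expires, idx_cc = cache_headers(datetime.timedelta(hours=1))
print(img_cc)  # public, max-age=1209600
print(idx_cc)  # public, max-age=3600
```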

Automating the post-processing

As I’ve written before, I use Grunt to automate blogging tasks. To run the post-processing I’ve described above, I simply add it as a task in the Gruntfile.js.

To initialize the post-processing task:

grunt.initConfig({
    shell: {
        fixImageCacheHeaders: {
            options: {
                stdout: true,
                execOptions: {
                    cwd: '.'
                }
            },
            command: 'python fixCacheHeaders.py'
        }
    }
    //  etc...
});

To register the task:

grunt.registerTask('deploy', ['shell:clean', 'shell:generate', 'sitemap:production', 'robotstxt:production', 's3']);
grunt.registerTask('logpre', function() {
    grunt.log.writeln('*** Fix metadata ***');
});
grunt.registerTask('logpost', function() {
    grunt.log.writeln('*** Fixed metadata ***');
});
grunt.registerTask('deployf', function() {
    grunt.task.run(['shell:clean', 'shell:generate', 'sitemap:production', 'robotstxt:production', 's3']);
    grunt.task.run('logpre');
    grunt.task.run('shell:fixImageCacheHeaders');
    grunt.task.run('logpost');
});

Now I can deploy the blog and run the post-processing using grunt deployf.

The entire metadata post-processing script is available as a gist, as is my updated Gruntfile.js.

Modern textbook design: an architecture for distraction

The design of textbooks in common use at all levels from elementary school through high school is appallingly bad. I’ve come to this conclusion after several years of carefully looking at my sons' books as they went through public middle and high school. What follows is a critique of very common design “features” in these books, with reference to visual information design principles. Since I’m not a subject expert in the content of the disciplines presented, I’ll confine myself to visual design, typography, and information design principles in general.

Trump, the conspiracy theorist

One of the most striking features of the GOP front-runner is his special fondness for conspiracy theories. From the (non-existent) connection between vaccines and autism to the “real culprits” behind 9/11, he shows the typical clustered endorsement of multiple conspiracy theories. The question of whether this is a form of pandering or a genuinely held set of perspectives is interesting, though barely relevant. In the former case, the abandonment of reason to achieve a political goal is an egregious fault.

The spread of anger in social networks and its implications for political violence

An ingenious study using the massive Weibo network revealed insights into how certain emotions spread through social networks. Weibo is a social network platform not unlike Twitter; it is hugely popular in China, with millions of users, making it an ideal platform for understanding how the emotional states of socially connected users correlate with each other. The highest correlation by far was among angry users. Rui and co say anger strongly influences the neighbourhood in which it appears, spreading on average by about 3 hops or degrees.

anki_tool: low level manipulation of Anki databases

Speaking of Anki, here’s a Swiss Army knife of database utilities that provides searching, moving, and renaming functions from the command line. On GitHub. You can do things like this to rename and collect tags:

$ anki_tool mv_tags '(dinosaur|mammal)' animal

Looks cool.

JavaScript in Anki cards

[N.B. 2016-03-26 Nathan Ifill pointed out that it is possible to use Anki’s built-in conditional replacement feature to do what I’m illustrating. I’ll have to work on another example!] Anki is a widely-used flashcard application. If you’re learning a foreign language and you’re not using Anki, you should be. If you are using Anki and are picky about the appearance of the cards, you should know that JavaScript can be used in the card template.

Organizing knowledge for memorization

Memorization has a bad reputation in education today, but it underpins the abilities of all sorts of high-performing people. I often refer to this article from 1999 about how to better organize information for memorization. My favorite pieces of advice: do not learn (memorize) if you do not understand; stick to the minimum information principle; use imagery; avoid sets and enumerations; use mnemonic techniques.

Observation: Facebook groups don't work

I’m reluctant about using Facebook. I recently returned after a five-year sabbatical, and it seems about the same as it was when I left. But I had never really used Facebook groups before, so when a friend launched a group around a topic of interest to me, I joined enthusiastically. Watching the numbers grow quickly in the first few days, I realized what a difficult platform it is for having any kind of meaningful discussion.

Detecting Russian letters with regex

How do you identify Russian letters in a string? The short answer is [А-Яа-яЁё], but depending on your regex flavor, [\p{Cyrillic}] might work. What in the world does this regex mean? It’s just like [A-Za-z], with a twist: the Ёё at the end adds support for ё (“yo”), which falls outside the contiguous А-я range of Cyrillic characters in Unicode. See this question on Stack Overflow.
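To try the character class out, here’s a quick sketch (note that Python’s built-in re module doesn’t support \p{Cyrillic}, so the explicit range is used; the helper name is my own):

```python
import re

# The class from above: the basic А-я block plus Ё/ё, which sit
# outside that contiguous range in Unicode.
cyrillic = re.compile(r'[А-Яа-яЁё]')

def has_cyrillic(text):
    return bool(cyrillic.search(text))

print(has_cyrillic('hello'))   # False
print(has_cyrillic('привет'))  # True
print(has_cyrillic('ёлка'))    # True — ё is caught by the Ёё at the end
```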