Using AppleScript with MailTags

I’m a fan of using metadata to classify and file things rather than rigid hierarchies of nested folders. Most of the documents and data that I store for personal use are in DEVONthink, which has robust support for metadata. On the email side, there’s MailTags, which lets you apply metadata to emails. Since MailTags also supports AppleScript, I began to wonder whether it might be possible to script workflows around email processing. Indeed it is, once you discover the trick of which dictionary to use.

The key is to use MailTagsHelper for the dictionary. To access the terms from that dictionary, you need to embed the code in the following block:

using terms from application "MailTagsHelper"
	-- access MailTags properties here
end using terms from
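
As a slightly fuller sketch, here’s the shape of a script that tags the currently selected messages. Note that the keywords property name is an assumption on my part; browse the MailTagsHelper dictionary in Script Editor for the exact terms it defines.

using terms from application "MailTagsHelper"
	tell application "Mail"
		-- "keywords" is an assumed MailTags property name; verify it
		-- against the MailTagsHelper dictionary before relying on it
		repeat with theMessage in (get selection)
			set keywords of theMessage to {"receipts", "2016"}
		end repeat
	end tell
end using terms from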


Using AdBlock Plus to block YouTube comments

YouTube comments are some of the most offensive on the web. Even serious videos attract trolls bent on inscribing their offensiveness and cruelty on the web.

Here’s one method of dealing with YouTube comments. Treat the comments block as an advertisement and block it.[1]

1. Download AdBlock Plus

Download the AdBlock Plus extension for the browser you use and install it.

2. Create a custom ad filter

In this step you will create a filter that treats the entire comments section of a YouTube page as an advertisement.

  • Navigate to YouTube and load any video page.
  • Click on the AdBlock Plus icon in the toolbar to bring up its contextual menu.
[Figure: AdBlock Plus contextual menu]
  • Choose “Block an ad on this page”.
  • Move the cursor to the area of the page just above the “COMMENTS” header. Once the entire comments area of the page is highlighted, click there.
[Figure: blocking the YouTube comments block]
  • AdBlock Plus will ask you to confirm the block. If it looks right to you, agree.[2]

If all goes well, you’ll have comment-free YouTube pages now.
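
If you’d rather skip the point-and-click steps, AdBlock Plus also accepts hand-written element-hiding rules. Based on the div identified in footnote 2 below, adding this one-line rule to your custom filter list should have the same effect:

youtube.com###watch-discussion

The ### looks odd but is correct: ## introduces an element-hiding rule, and the trailing #watch-discussion is an ordinary CSS id selector.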


  1. There are other ways of avoiding YouTube comments. I've used ViewPure, but it's hard to find content that way, even though they seem to be working on making the jump from YouTube to ViewPure more seamless.

  2. The div to be blocked is <DIV id="watch-discussion" class="branded-page-box yt-card scrolldetect">. Don't be surprised if you need to update your filter as YouTube changes its page structure.

Introducing AnkiStats & AnkiStatsServer

The spaced repetition software system Anki is the de facto standard for foreign language vocabulary learning. Its algorithm requires lots of performance data to schedule flashcards in the most efficient way. Anki displays these statistics in a group of thorough and informative graphs accompanied by descriptive text.

However, the underlying data aren’t easily available for the end user to export. Hence the companion projects AnkiStats and AnkiStatsServer.

The premise is that you can run your own more extensive experiments and statistical tests on the data once you have it in hand. A bit of technical expertise is needed to get it operational, but if you’re up to it, clone the GitHub repos above and go for it.
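
To give a flavor of what’s possible once the data are in hand, here’s a minimal sketch that tallies reviews per day straight from an Anki collection database. It’s a sketch only: the path is a placeholder, and the revlog schema noted in the comments should be verified against your Anki version.

#!/usr/bin/python
# Minimal sketch: count Anki reviews per day from the collection database.
# Assumes the standard Anki schema in which revlog.id is the review
# timestamp in epoch milliseconds; verify against your Anki version.
import sqlite3
import datetime

conn = sqlite3.connect('/path/to/collection.anki2')  # placeholder path
per_day = {}
for (ms,) in conn.execute('SELECT id FROM revlog'):
    day = datetime.date.fromtimestamp(ms / 1000)
    per_day[day] = per_day.get(day, 0) + 1

for day in sorted(per_day):
    print('%s %d' % (day, per_day[day]))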

Waking the computer to allow AppleScript to run

I have a number of AppleScript applications that need to run at odd times. These maintenance tasks often attempt to run while the computer is sleeping, and those that rely on UI scripting in particular do not function during this period.

The most flexible way of dealing with this is to manipulate the power management settings directly via the pmset(1) command.

The variety of options available using pmset is staggering and beyond the scope of this post. Here’s what I do to wake the computer up at specific times so that scheduled AppleScripts can run:

~|⇒ sudo pmset repeat wakeorpoweron MTWRFSU 12:29:00
~|⇒ sudo pmset repeat wakeorpoweron MTWRFSU 23:49:00

Now my 12:30 PM and 11:50 PM scripts will run just fine.

Edit 2016-04-18: Actually my scripts don’t run just fine, because pmset doesn’t allow setting multiple wakeorpoweron events like this; only the last one set is retained. However, you can use root’s crontab to do it, as long as you schedule the cron jobs to deliver the pmset schedule before it’s needed. Here’s the idea:

@reboot pmset repeat wakeorpoweron MTWRFSU 23:49:00
00 12 * * * pmset repeat wakeorpoweron MTWRFSU 12:01:00
02 12 * * * pmset repeat wakeorpoweron MTWRFSU 23:49:00
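
Either way, you can confirm which wake events are actually on the books with pmset’s query mode:

~|⇒ pmset -g sched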

An easier way to automate synchronization of Anki profiles with AppleScript

After waking up this morning with my mouse locked onto the Anki icon in the Dock, and after trying to figure out how to get Activity Monitor up and running so I could force quit the Automator application that I described yesterday, I figured it was back to the drawing board.

I’d have liked to use the Accessibility Inspector to manipulate the PyQt objects in Anki’s windows, but they aren’t exposed in a way that you can script them. System Events, though, rules all.

When Anki launches, it offers a dialog box with profiles to sync (assuming you have multiple profiles). Using AppleScript and System Events scripting, you can drive the keyboard as it manipulates the PyQt interface. Here’s my solution. Yours may vary depending on where the profile in question lies in the list.

tell application "Anki" to launch
delay 2.0

tell application "System Events"
	key code 125 -- down arrow to point at the profile to sync
	key code 36 -- Enter key
	delay 10.0 -- time to sync
	key code 12 using {command down} -- ⌘Q to quit
end tell

Much less painful than Automator.

Scheduling synchronization of Anki databases on OS X

While working on a project to automatically collect statistics on my Anki databases (stay tuned…), I worked out a system for scheduling synchronization from my desktop OS X machine.

Prerequisites

  • LaunchControl is a GUI application that lets you create and manage user services on OS X
  • Anki is a spaced repetition memorization software system

The solution relies on Automator. Normally, I don’t care much for Automator. It has too many limits on what tasks I can accomplish, and workflows created with it are often fragile. However, in this case, we take advantage of its workflow recording feature. We’re going to record the process of opening Anki, selecting the profile to sync, then quitting Anki. This sequence of events ensures that the database on the local system is synchronized with the remote version.
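
Since LaunchControl is essentially a front end for launchd, the scheduled job it manages boils down to a property list along these lines. This is only a sketch: the label, the schedule, and the path to the saved Automator application are hypothetical placeholders.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
	<key>Label</key>
	<string>com.example.anki-sync</string>
	<!-- Launch the recorded Automator workflow, saved as an application -->
	<key>ProgramArguments</key>
	<array>
		<string>/usr/bin/open</string>
		<string>/Users/yourname/Applications/SyncAnki.app</string>
	</array>
	<!-- Run every night at 11:50 PM -->
	<key>StartCalendarInterval</key>
	<dict>
		<key>Hour</key>
		<integer>23</integer>
		<key>Minute</key>
		<integer>50</integer>
	</dict>
</dict>
</plist>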


Resizing of images for Anki with Hazel and ImageMagick

I use Anki to study foreign language vocabulary. It’s the de facto spaced repetition software for memorization.[1] When making flashcards for language learning, I try to use imagery as much as possible. So a card may have a Russian word on one side and just an image on the opposite side. (Since I already know the English word that the image represents, why not try to engage a different part of the brain to help with memorization?)

If you use Anki on multiple devices, then synchronization is a key step. However, image size becomes a limiting factor for sync speed. Since a small image is often all that’s needed to convey the intended meaning, we can improve sync efficiency by using smaller images without sacrificing meaning. Bulk, efficient resizing of images for Anki cards is therefore an important part of the process for me.

Here I’ll describe a way of automatically processing images for use on Anki cards using Hazel and ImageMagick. Sorry, PC and Linux users: this is OS X only.
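
The full post walks through the Hazel rule, but as a sketch of the ImageMagick half of the pipeline, a command like this is the kind of thing Hazel can run on each new image. The filename and target size are just illustrations; the trailing > in the geometry tells ImageMagick to only shrink images larger than the target, never enlarge them:

$ mogrify -resize '400x400>' -quality 80 card-image.jpg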


Writing Hexo filters

Hexo, the static blogging system that I use, is very extensible and provides numerous hooks into its generation pipeline.

While working on a Russian language blog that’s coming online soon, I had the opportunity to write a filter to render Cyrillic text in a different font than the rest of the body text.

Markup filter use case

I wanted to set the Cyrillic text apart in color, typeface, and font weight. Although I could have extended Hexo with a new tag, I decided to use a filter so that, after HTML rendering anywhere on the blog, items demarcated by double pipes || would be replaced by a new <span>.

I packaged the filter as an npm module. You can find it on npm and at its GitHub repo.

Here’s the very short code for the filter itself:

hexo.extend.filter.register('after_render:html', function(str, data) {
    var re = /(\|{2}?)((.|\n)+?)(\|{2}?)/gm;
    var result = str.replace(re, '<span class="rsb">$2</span>');
    return result;
});

The regex in the second line just identifies a block of text fenced by double pipes and replaces it with a span with the class that specifies the styling to be applied. In the future, I’d like to identify Cyrillic text with a regex and not have to use a fence at all.
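
As a sketch of what that future version might look like: the basic Cyrillic block occupies U+0400 through U+04FF, so a character-class regex can find the runs directly. Note that this naive version would also match Cyrillic inside tag attributes, so it’s a starting point rather than a drop-in replacement.

hexo.extend.filter.register('after_render:html', function(str, data) {
    // Match runs of Cyrillic words, including internal whitespace,
    // without requiring the || fence. U+0400 through U+04FF is the
    // basic Cyrillic block.
    var re = /[\u0400-\u04FF]+(?:\s+[\u0400-\u04FF]+)*/g;
    return str.replace(re, '<span class="rsb">$&</span>');
});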

Fine-tuning caching for S3-hosted static blogs using AWS CLI

Because the blogging system that I use doesn’t apply fine-grained object-level caching rules, I end up with objects such as images that cache appropriately but an index.html page that does not. I don’t want client browsers to hang on to the main index.html page for more than an hour or so, because it should update much more frequently than that as its content changes.

It’s possible that I could dig around under the hood of Hexo and create a version that applies customized caching rules. Instead, I make a second pass over the content, adjusting the Cache-Control and other metadata according to my needs. For this task I use the Amazon Web Services command line interface, AWS CLI.

Installation

Installing the AWS CLI is straightforward. On the platform I use (OS X), it’s just:

$ curl "https://s3.amazonaws.com/aws-cli/awscli-bundle.zip" -o "awscli-bundle.zip"
$ unzip awscli-bundle.zip
$ sudo ./awscli-bundle/install -i /usr/local/aws -b /usr/local/bin/aws

After installation, you will want to configure the AWS CLI. Installing the credentials for AWS is an important step, which you can do via the aws configure command:

$ aws configure
AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
AWS Secret Access Key [None]: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Default region name [None]: us-west-2
Default output format [None]: ENTER

Once configured, you can use the AWS CLI to perform a variety of operations on your S3 buckets. It’s worth reading the documentation, which is very detailed, to get familiar with the command structure.

Using AWS CLI to adjust image caching

To compute new Cache-Control header dates for the aws command, I wrote a little Python script. For images, I want to maximize caching in the request/reply chain. Since images are the heaviest objects traveling over the wire, I want to minimize how many of them need to be reloaded, so I set a long cache time for these objects. Here’s how I compute new dates and build up the aws command:

#!/usr/bin/python

import datetime
from dateutil.relativedelta import relativedelta
import subprocess

weeks = 2
seconds = weeks * 7 * 24 * 60 * 60

today = datetime.datetime.now()
new_date = today + relativedelta(weeks=weeks)

command = '''aws s3 cp s3://ojisanseiuchi.com/ s3://ojisanseiuchi.com/ --exclude "*" '''
command += '''--include "*.jpg" '''
command += '''--include "*.png" '''
command += '''--recursive '''
command += '''--metadata-directive REPLACE '''
command += '''--expires {0} '''.format(new_date.isoformat())
command += '''--acl public-read '''
command += '''--content-encoding "gzip" '''
command += '''--cache-control "public, max-age={0}"'''.format(seconds)

subprocess.call(command, shell=True)

This will build and execute the following command:

aws s3 cp s3://ojisanseiuchi.com/ s3://ojisanseiuchi.com/ --exclude "*" --include "*.jpg" --include "*.png" --recursive --metadata-directive REPLACE --expires 2016-04-05T11:37:16.181141 --acl public-read --content-encoding "gzip" --cache-control "public, max-age=1209600"

This will recursively manipulate the metadata for all jpg and png files in the bucket. The weeks parameter can be adjusted to any duration you would like.

Using AWS CLI to adjust the index.html caching

The main index page should get reloaded frequently; otherwise users have no idea that the page has changed. For this part, I’ll drop down to the lower-level s3api command for illustration. Here’s the continuation of the Python script that makes this work:

# Continues the script above; the datetime/relativedelta/subprocess
# imports and `today` are already defined there.
hours = 1
seconds = hours * 60 * 60  # seconds in an hour
new_date = today + relativedelta(hours=hours)

command = '''aws s3api copy-object --copy-source ojisanseiuchi.com/index.html --key index.html --bucket ojisanseiuchi.com '''
command += '''--metadata-directive "REPLACE" '''
command += '''--expires {0} '''.format(new_date.isoformat())
command += '''--acl public-read '''
command += '''--content-type "text/html; charset=UTF-8" '''
command += '''--content-encoding "gzip" '''
command += '''--cache-control "public, max-age={0}"'''.format(seconds)

subprocess.call(command, shell=True)

When run, this will build and execute the following command:

aws s3api copy-object --copy-source ojisanseiuchi.com/index.html --key index.html --bucket ojisanseiuchi.com --metadata-directive "REPLACE" --expires 2016-03-22T12:42:44.706536 --acl public-read --content-type "text/html; charset=UTF-8" --content-encoding "gzip" --cache-control "public, max-age=3600"

This ensures that the page is cached for only one hour.

Automating the post-processing

As I’ve written before, I use Grunt to automate blogging tasks. To run the post-processing I’ve described above, I simply add it as a task in the Gruntfile.js.

To initialize the post-processing task:

grunt.initConfig({
    shell: {
        fixImageCacheHeaders: {
            options: {
                stdout: true,
                execOptions: {
                    cwd: '.'
                }
            },
            command: 'python fixCacheHeaders.py'
        }
    }
    // etc...
});

To register the task:

grunt.registerTask('deploy', ['shell:clean', 'shell:generate', 'sitemap:production', 'robotstxt:production', 's3']);
grunt.registerTask('logpre', function() {
    grunt.log.writeln('*** Fix metadata ***');
});
grunt.registerTask('logpost', function() {
    grunt.log.writeln('*** Fixed metadata ***');
});
grunt.registerTask('deployf', function() {
    grunt.task.run(['shell:clean', 'shell:generate', 'sitemap:production', 'robotstxt:production', 's3']);
    grunt.task.run('logpre');
    grunt.task.run('shell:fixImageCacheHeaders');
    grunt.task.run('logpost');
});

Now I can deploy the blog and run the post-processing using grunt deployf.

The entire metadata post-processing script is available as a gist. My updated Gruntfile.js is too.

Modern textbook design: an architecture for distraction

The design of textbooks in common use at all levels from elementary school through high school is appallingly bad. I’ve come to this conclusion after several years of carefully looking at my sons’ books as they went through public middle and high school. What follows is a critique of very common design “features” in these books with reference to visual information design principles. Since I’m not a subject expert in the content of the disciplines presented, I’ll confine myself to visual design, typography, and information design principles in general.

I’ll start with a mathematics textbook used in Canada in Grade 6, the Pearson “Math Makes Sense” text. A sample page is depicted below.

[Figure: sample page]

The most obvious design abuse is the heavy graphical “fluff” on the page. When the margin is included, the top banner takes up 19% of the vertical extent of the page, and its sole purpose is to identify the page as the beginning of the third lesson, which is about multiples.

[Figure: photograph of a radio announcer]

Within the body of the page, the most egregious offense is an enormous photograph of a radio announcer with a speech bubble that says nothing about the mathematical concept being presented. This gratuitous figure takes up about 18% of the content area of the page. It would be a minor offense if it wasted only paper, but it wastes a far scarcer resource: the student’s attention. The figure adds nothing to the concept that the authors are trying to present, so it should be removed. This is a textbook for 6th graders, who are in no need of infantilization. Unnecessary silly graphics degrade the importance of the content and invariably lead students to conclude that the content is as unimportant as a radio call-in contest.

The sections of each lesson are demarcated by heavy section graphics connected to bold, garish leader lines. The cheap three-dimensional effects, garish colors, and unnecessary boldness are distracting. This is a telling example of the structure of the content overwhelming the content itself. The section header graphics are probably meant to resemble buttons on a web page circa 1994, but printed material has a mode of consumption different from that of the web, and its format should respect the difference.

The page footer is less intrusive but unnecessarily complex. There is no need for the strangely fading, amateurish blue lozenge behind the “Lesson focus.” If this is the focus of the lesson, shouldn’t the student be aware of it first? Placing the goal at the bottom of the page hides the purpose from the reader and leads him to assume that it’s busy work. The goal should be obvious before the student begins the section.

Other examples of violations of good taste in typography and color can easily be cleaned up. A more serious issue is how the authors chose to present the identification of common multiples. Look closely at the following chart:

[Figure: the original table]

It depicts a “one hundred board”, a graphic that will be familiar to most students. But no key is provided. It takes a bit of detective work to figure out that the multiples of 6 are circled and the multiples of 4 are set on a green background. But what about the numbers with a yellow background? They mean nothing. The yellow was used gratuitously to give a splash of color to the page. But again, it’s worse than gratuitous: it could be misleading, or it could simply slow the student down in understanding the concept. We can easily reformat the table to remove extraneous signals and to apply the principle of minimum necessary difference:

[Figure: improved table]

The new table is not alarmingly large. Its grid is just distinct enough to see the structure without making the figures appear to be imprisoned at Alcatraz. And the typeface is the more legible Gill Sans. Instead of using mixed signals for cell color, I’ve used a consistent white background with only the multiples of 4 colored medium red. I don’t have a problem with the ragged edge border, although a more authentic representation would show the last row of the standard table with figures to 100. But this redesign will suffice.

By cleaning up the typography, removing gratuitous graphics, simplifying the table and linking it more logically to the text, the page has a less distracting appearance.

[Figure: improved page]