Extracting mp3 file from web page with Python and AppleScript

As I’ve mentioned before, I use Anki extensively to memorize and practice Russian vocabulary. With language learning in particular, adding spoken pronunciations to the cards makes an enormous difference. Since I use Open Russian extensively to provide information to build my Anki cards, it’s a natural source of audio data, too. To optimize my learning time, I built two small scripts to grab and rename the audio files from the Open Russian site. First, I’ll describe my workflow.

My vocabulary workflow

Each morning, I pull 6 words from a Russian word frequency list to add to my Anki deck. With each word, I use Open Russian to look up the complete definition, example sentences, syllabic stress, and other pieces of information that go on the flashcard. To have OpenRussian.org open in its own dedicated browser window, I built a Fluid application out of it. Having common workflow-related sites like this in their own dedicated applications makes a lot of sense for task isolation.

Finally, for many words, I like to extract the audio from the site and add it to the card that I’m building. It turns out to be a cumbersome step because the audio doesn’t play in QuickTime or another player that allows me to save the file. The source sound files can be downloaded from Shtooka but this is yet another step. This is where my enhanced workflow comes in.

What should the enhanced workflow do?

Optimally, I should be able to grab the URL that is displayed in the Open Russian Fluid application. Using the content of that page, I should be able to obtain the URL of the mp3 file for that word and save it to the desktop using the Russian word as the filename.

The solution

First is a Python application that grabs the URL from the Fluid app, extracts the audio file URL, and downloads it to the desktop.

#!/usr/bin/python
# -*- coding: utf-8 -*-

import re
import urllib2
from os.path import expanduser, normpath, basename, join
from subprocess import Popen, PIPE


def getOpenRussianURL():
    """ Obtain the URL from the OpenRussian application,
    which is just a Fluid browser application.
    If obtaining URL from Safari:
    scpt = '''
    tell application "Safari"
    set theURL to URL of current tab of window 1
    end tell'''
    """
    scpt = '''
    tell application "OpenRussian"
    set theURL to URL of browser window 1
    end tell'''

    # with no file argument, osascript reads the script from stdin
    p = Popen(['osascript'], stdin=PIPE, stdout=PIPE, stderr=PIPE)
    stdout, stderr = p.communicate(scpt)
    # osascript ends its output with a newline; strip it from the URL
    return stdout.strip()


def audioURL(html):
    """ Extract the audio file mp3 from
    the content of the OpenRussian.org page.
    """
    m = re.search("<audio.+(http.+mp3)", html)
    return m.group(1)


def saveMP3(url, path):
    # download the mp3 and write it to disk as binary data
    mp3file = urllib2.urlopen(url)
    with open(path, 'wb') as output:
        output.write(mp3file.read())


def fetchMP3(aURL):
    """ Fetch mp3 to which aURL points and save
    it to the Desktop using the word as the filename
    """
    response = urllib2.urlopen(aURL)
    content = response.read()

    url = audioURL(content)
    path = join(expanduser("~"), 'Desktop', basename(normpath(url)))
    saveMP3(url, path)


url = getOpenRussianURL()
fetchMP3(url)
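The script above targets Python 2 (urllib2). As a rough Python 3 sketch of its two pure steps, finding the mp3 URL in the page and choosing the filename, something like the following might work. The sample HTML, the example.org URL, and the helper names audio_url and destination_path are all hypothetical; I haven't verified the pattern against the actual Open Russian markup. The sketch also decodes percent-escapes, on the assumption that a Cyrillic word may arrive URL-encoded.

```python
import re
from os.path import basename, expanduser, join
from urllib.parse import unquote, urlparse

# Lazy quantifiers stop at the first mp3 URL that follows an <audio tag.
# The pattern is an assumption about the page markup, not a verified fact.
AUDIO_RE = re.compile(r'<audio.+?(https?://[^"\'\s<>]+?\.mp3)', re.DOTALL)


def audio_url(html):
    """Return the first mp3 URL after an <audio tag, or None if absent."""
    match = AUDIO_RE.search(html)
    return match.group(1) if match else None


def destination_path(url, directory=None):
    """Build the save path, decoding percent-escapes in the filename."""
    if directory is None:
        directory = join(expanduser("~"), "Desktop")
    filename = unquote(basename(urlparse(url).path))
    return join(directory, filename)


# Hypothetical page fragment, for illustration only.
sample = ('<audio controls><source '
          'src="https://example.org/audio/%D0%B4%D0%BE%D0%BC.mp3"></audio>')
url = audio_url(sample)
print(url)
print(destination_path(url, "/tmp"))
```

Returning None when nothing matches lets the caller decide what to do with a page that has no audio, rather than failing on m.group(1).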

To make this even faster, I assigned the script to a Quicksilver keystroke trigger. It’s that simple. One little twist that I discovered was the difficulty in launching a Python application from a Quicksilver trigger. Although there must be an easier way, I haven’t found it. Instead, I just wrote an AppleScript that runs the application in question and I used that as the triggered script in Quicksilver:

--
-- Created by: Alan Duncan
-- Created on: 2016-11-05
--
-- Copyright (c) 2016 Ojisan Seiuchi
-- All Rights Reserved
--

use AppleScript version "2.4" -- Yosemite (10.10) or later
use scripting additions

do shell script "/Users/alan/Documents/dev/scripts+tools/fetchOpenRussianMP3.py"

There may be a way to finish the process and add this to the Anki card in one step. I’ll have to work on that.


How to tell if you're being pandered to


You might be the subject of political pandering if:

  1. Fear, uncertainty, and doubt are the main tricks in the politician’s kit.

    A politician who never tires of scapegoating a feared group or a feared outcome is undoubtedly pandering. Or a demagogue. Or both. Whether it’s Mexicans, or Jews, or Muslims, or gay people, they never seem to stop talking about why you should be afraid of someone or something.

    Or they intentionally raise doubts around the edges of established facts. Donald Trump, for example, continues to plant seeds of doubt about President Obama’s birthplace, years after proof has been established.

    The cure, of course, is to turn the doubt around 180° and ask for data and context. You aren’t likely to be mowed down by an Islamic extremist. You’re more likely to succumb to cardiovascular disease because you’re inactive, smoke, and don’t eat a proper diet. If a politician makes a claim without citing data, you should disregard what they say and look it up. Recently a jetliner crashed in the Mediterranean Sea en route from Paris to Cairo. Within 24 hours, Trump was on record as saying the crash was an act of terror. Anyone with an ounce of sense knows that accident investigations are lengthy fact-gathering, hypothesis-testing, and data-analyzing procedures. Short-circuiting these fact-checking exercises is a timely and convenient tool of the panderer.

    The vaccine against pandering is readily available. It’s a vaccine of the mind. Read opposing points of view. Seek primary data. Understand how the branches of government work. Look for sources of bias. Be skeptical about everything. Every. Single. Thing.

  2. The politician appeals to commonality of religious belief

    Indisputably one of the foundational principles of the U.S. is religious liberty. The First Amendment explicitly protects free exercise of religious belief. But it also protects the integrity of government by ensuring that the government is not a tool of religion. The Establishment Clause is widely understood to prevent the use of government authority to promote religious belief and practice.

    Religious belief is a private affair. Most religious groups gather in a communal practice but in an official sense, they are private, not public groups. Politicians who go out of their way to emphasize their religious affiliations are almost certainly pandering. Adherence to many different creeds brings people to do good things. We’re better off demanding to know exactly what a politician has done in the public sphere and what he or she intends to do in the future than about what church they attend.

    The panderer can also turn this sort of pandering around in the ugliest sort of way by scapegoating and denigrating particular religious groups. Sometimes, though, this isn’t pandering but honest hateful demagoguery.

    We should demand that politicians base their arguments in the broadest, most foundational terms. The proscription against baseless harm to others, for example, is common to practically all cultures. Let’s make sure that our appeals to goodness, fairness, and justice appeal to those ideals in human terms.

  3. There is an incoherence between stated positions and documented actions

    Honest people exhibit a coherence between what they say and what they do. They don’t go out of their way to create an image, let alone one that differs wildly from their easily observed actions. But panderers are crafty. With some, the gulf between their public works and their language is vast. Often there is even an incoherence between statements they make. Most of us in day-to-day conversation use language in such a way that our stated principles pervade our speech. Not so with the panderer where inconsistencies of all sorts abound.

  4. The politician claims to be misunderstood

    Occasionally the panderer is caught in an inconsistency. That’s the way deception works. The internet has a long memory. A common escape is to claim that he was misunderstood. It is more likely, however, that the message was simply shifting to suit the audience.

  5. If you really, really like a candidate

    There are some political candidates with whom we identify because of ideological similarities, or some other factor. Before committing to a candidate we should look again for opposing data-driven viewpoints, sources of bias, and other ways in which we might be influenced through personal appeal.

Pandering is a pervasive tool of the political trade. There’s also a fine line separating the genuine effort to put language into the right context for the audience from the purposeful manipulation of a group through deception. By applying a few heuristics, panderers aren’t hard to uncover.

Well that has a familiar ring to it


The U.S. has become well-rehearsed in its response to mass shootings. An event. The pondering over terrorism vs. generalized craziness. The outpouring of prayers and support. Then the internet outrage. And more internet outrage. More meme pictures about guns and love. More color-your-profile picture trends. Empty scripted responses from pious politicians. A week or two, then back to our regularly scheduled programming.

News flash: this isn’t getting better. It’s not going to get better.

Why?

  1. We focus on single causal factors.

    It’s the guns. No, it’s evil people/crazy people/bad guys with guns. Stop with it. Just stop. Have you people never heard of the fallacy of the excluded middle? The false dilemma?

    Only a hair-splitting fool would claim that guns have nothing to do with a crime in which a gun was used to kill people. Likewise it takes a different kind of fool to claim that guns have everything to do with it. (Fortunately, there aren’t too many of the latter. But of the former…)

    We stand no chance of reducing these incidents if we don’t think systemically. (See, I didn’t say stopping these incidents.)

  2. As a species, our default programming seems to make us conflate mythology with truth.

    This is the part that everyone is too polite to do. The simpleton nationalists grab their giant brush and paint every Muslim a terrorist. The liberals make the equally ridiculous claim that Islam is the religion of peace. Remember that fallacy of the excluded middle? How about Islam is the religion of both peace and violence?

    The problem isn’t Islam. It’s the conflation of mythology with truth. Remember there was a time when people were dead serious about Zeus up there on Mt. Olympus hurling lightning bolts? It wasn’t a mythology back then. It was the real deal. Now it’s a new “real deal.” It’s a new improved monotheistic flavor of Zeus and his buddies toying with us.

    Most of us are too polite to say it. Here are the questions we should be asking: “What do you claim?”, “How, exactly, do you know that?”, “Do you have any corroborating evidence?”, “On what grounds do your claims create an exception to the general prohibition against harm to others?” We ought to be willing to say: “Gee, that sounds an awful lot like nonsense to me. Perhaps you’d be willing to explain that logically to me.” The Trumpsters go astray by attacking people. People are just instances of a bigger problem.

    Our recorded history is just too short to see a lot of evolution on this front. Maybe in another 10,000 years or so.

  3. Gun ownership is baked into the U.S. Constitution.

    This is a tough nut to crack because the weapons of choice for mass killers are a coddled, protected entity in the U.S. And for some reason, we can’t admit in large enough numbers that like the entire Constitution, the Second Amendment is purposely ambiguous.[1] It’s ambiguous because the authors of the Constitution imagined we would sit down and have reasonable debates and compromise over some foundational principle. But we can’t get beyond “Ban all guns.” and “There should be no restrictions on gun ownership.”

    We worship the Founders like secular heroes. But I’m calling the Second Amendment a bone-head move on their part. No, I’m not for banning guns in the U.S. There’s a giant middle ground here. And no, dear conspiracy theorists, High Priests of the NRA, and other lunatics, Obama’s not coming for your guns.

So let the grief turn to internet outrage. Let the impotent internet outrage turn to nothing. I’ve lost my idealism about any of this. I’m not changing my Facebook picture. I’m not posting any meme pictures of politicians saying stuff about guns or terrorists. It’s all empty.


  1. Perhaps you don't agree that it's ambiguous. Then what does "a well-regulated militia" have to do with private gun ownership?

EC: An Environment Canada data plugin for Indigo

Environment Canada

Indigo is a well-known home automation controller software package for Mac OS X. I’ve written a plugin for Indigo 6 that allows you to create a virtual weather station from Environment Canada data. If you live in Canada, this will be a useful way of using weather data in your Indigo rules. For example, you could use wind and temperature data to adjust your irrigation schedule.

You can download the plugin from its git repo. After downloading the files, you’ll just need to configure them as a plugin. To do this, create a new folder and rename it EC.indigoPlugin. Copy the Contents folder that you just downloaded. Right-click on the EC.indigoPlugin bundle and Show Package Contents. Paste the Contents folder here. To install in Indigo, double-click the bundle file.
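If you'd rather script the bundle assembly than do it in Finder, here is a minimal Python 3 sketch of the same steps. The function name make_plugin_bundle and its arguments are hypothetical; the only details taken from the steps above are the bundle name EC.indigoPlugin and the Contents folder that goes inside it.

```python
import shutil
from pathlib import Path


def make_plugin_bundle(contents_dir, dest_dir, bundle_name="EC.indigoPlugin"):
    """Copy a downloaded Contents folder into a new plugin bundle folder."""
    bundle = Path(dest_dir) / bundle_name
    # Refuse to overwrite an existing bundle rather than merging into it.
    bundle.mkdir(parents=True, exist_ok=False)
    shutil.copytree(str(contents_dir), str(bundle / "Contents"))
    return bundle
```

Calling make_plugin_bundle with the path to the downloaded Contents folder and a destination such as the Desktop should leave a bundle you can double-click to install.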

Using Python and AppleScript to get notified if a site is down

I manage a handful of websites, like this one. Having built a few on other platforms, such as Drupal, I’m familiar with the dreaded error “The website encountered an unexpected error. Please try again later.” On sites that I don’t check on frequently, it can be an embarrassment when people begin emailing you with questions about the site being down.

I wrote the following Python script to deal with the problem:

#!/usr/bin/python

import urllib
from subprocess import Popen, PIPE

RECIPIENT = "your.recipient@me.com"
URL_TO_CHECK = "http://www.example.com"
ERR_MSG = "Your website is down."

def sendMessage(message):
    scpt = '''
    tell application "Messages" to send "{0}" to buddy "{1}" of (service 1 whose service type is iMessage)
    '''.format(message,RECIPIENT)

    args = []
    p = Popen(['osascript', '-'] + args, stdin=PIPE, stdout=PIPE, stderr=PIPE)
    stdout, stderr = p.communicate(scpt)

try:
    fh = urllib.urlopen(URL_TO_CHECK)
except IOError:
    sendMessage(ERR_MSG)
else:
    # handle database type errors from Drupal sites
    site_content = fh.read()
    target_str = "The website encountered an unexpected error. Please try again later."
    if site_content.find(target_str) != -1:
        sendMessage(ERR_MSG)
    else:
        print "No error"

I run this as a scheduled job using launchd and as long as I have a Messages-capable device with me, I’ll get notifications of issues with the site.
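The script above is Python 2. As a hedged Python 3 sketch of just the detection step, the check could be pulled into a function that takes the fetcher as a parameter, which makes it easy to try out without touching the network. The names site_is_down and fetch are hypothetical, and the notification side is left out.

```python
from urllib.request import urlopen

DRUPAL_ERROR = ("The website encountered an unexpected error. "
                "Please try again later.")


def site_is_down(url, fetch=None, timeout=10):
    """Return True if the URL can't be fetched or shows the Drupal error."""
    if fetch is None:
        # Default fetcher. urlopen raises URLError (an OSError subclass)
        # on connection failures and on HTTP 4xx/5xx responses.
        def fetch(target):
            with urlopen(target, timeout=timeout) as response:
                return response.read().decode("utf-8", errors="replace")
    try:
        body = fetch(url)
    except OSError:
        return True
    # handle database type errors from Drupal sites
    return DRUPAL_ERROR in body
```

One difference worth knowing about: in Python 3, urlopen treats HTTP error statuses as exceptions, so this version reports a site that answers with a 500 as down even when the page body doesn't contain the Drupal message.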

Dynamic UI lists in Indigo 6

Indigo 6 is a popular home automation controller software package on the Mac. Extensibility is one of its main features and it allows users to add a range of features to suit their needs.

Using Python scripting, users can create plugins that provide extended functionality. These plugins can provide a custom configuration UI to the user. Since the documentation around one particular feature, dynamic lists, was lacking, I’ve written up my approach here.

Since I live in Canada, the excellent NOAA plugin doesn’t work for me. However, Environment Canada provides an XML-based weather data API that we could package into an Indigo plugin. Since the number of Environment Canada station locations is large, I would like the user to select a province first, then select locations within that province. This means that I must use a dynamic list for the locations and reload the location list dynamically when the province changes. The solution turned out to be simple. Perhaps it could be even simpler. This is just what I came up with.

Devices.xml configuration

<?xml version="1.0"?>
<Devices>
    <!-- define devices -->
    <Device type="custom" id="station">
        <Name>Weather station</Name>
        <ConfigUI>
            <!-- choose location -->
            <Field id="province" type="menu">
                <Label>Province:</Label>
                <List class="self" filter="" method="listProvinces"/>
                <CallbackMethod>provinceChanged</CallbackMethod>
            </Field>
            <!-- choose location within province -->
            <Field id="location" type="menu">
                <Label>Location:</Label>
                <List class="self" filter="" method="listStations" dynamicReload="true"/>
            </Field>
        </ConfigUI>
    </Device>
</Devices>

In the device configuration I’ve specified a province field and a location field. The former provides a callback method provinceChanged where I can deal with filtering the locations based on the province selection. The other key here is to make the location field dynamically reloadable (dynamicReload="true"). By doing this, we get another call to the list generator method listStations when the province is selected.

Province selection callback

In plugin.py, I must provide a callback method provinceChanged to save my selection:

def provinceChanged(self, valuesDict, typeId, devId):
    self.selectedProvince = valuesDict['province']

Here’s where the solution might be simpler. The documentation is ambiguous about the status of valuesDict if the device hasn’t been saved yet. Based on that ambiguity, I decided to save the selected province as an instance variable of my Plugin class.

Providing a filtered location list

My dynamic list generator for the locations takes the selected province instance variable into consideration so that when the list is dynamically reloaded, I get a chance to filter the list by province.

def listStations(self, filter="", valuesDict=None, typeId="", targetId=0):
    locations = []
    stations = []
    self.debugLog(u"Generating stations")
    stations = self.locationDB.stationsForProvice(self.selectedProvince)
    for loc in stations:
        option = loc[0]
        city,province = loc[1].encode('utf-8'),loc[2].encode('utf-8')
        stationName = "{0} ({1})".format(city,province)
        locations.append(stationName)
    return locations
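To see the callback and the list generator working together outside Indigo, here is a small self-contained Python 3 sketch. StationPicker is a hypothetical stand-in for the Plugin class, the station tuples are made-up sample data, and returning (id, label) pairs is my assumption about what a menu list can use; none of this is Indigo's own API.

```python
class StationPicker:
    """Stand-in for the plugin class; it only mimics the two methods."""

    def __init__(self, stations):
        # stations: list of (station_id, city, province) tuples
        # (a hypothetical shape chosen for this sketch)
        self.stations = stations
        self.selectedProvince = None

    def provinceChanged(self, valuesDict, typeId=None, devId=None):
        # Remember the selection so the next list reload can filter on it.
        self.selectedProvince = valuesDict["province"]
        return valuesDict

    def listStations(self, filter="", valuesDict=None, typeId="", targetId=0):
        # Called again after the province changes (the dynamic reload).
        return [
            (station_id, "{0} ({1})".format(city, province))
            for station_id, city, province in self.stations
            if province == self.selectedProvince
        ]


picker = StationPicker([
    ("s1", "Ottawa", "ON"),
    ("s2", "Toronto", "ON"),
    ("s3", "Halifax", "NS"),
])
picker.provinceChanged({"province": "ON"})
print(picker.listStations())
```

The point of the sketch is only the ordering: the callback stores the province first, and the reloaded list generator reads it afterwards.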

I have a suspicion there’s an easier way. If you know of one, let me know and I’ll share it.

Import and tag with Hazel and DEVONthink Pro Office

Hazel and DEVONthink make a great pair as I’ve written before. Using AppleScript, it’s possible to take the import workflow even further by tagging incoming files automatically.

Use case

I download a lot of mp3 files containing pronunciation of words in a language I’ve been learning. I keep a record of these words and tag them appropriately using my hierarchical tagging system.

I’d like to download the files to a directory on the desktop, keep them there for a few minutes until I’m done working with them, then import them into DEVONthink Pro Office, tag them there, and delete the originals.

Read on to see how the Hazel rule is written, including the AppleScript to make it happen.


Using AppleScript with MailTags

I’m a fan of using metadata to classify and file things rather than declarative systems of nested folders. Most of the documents and data that I store for personal use are in DEVONthink, which has robust support for metadata. On the email side, there’s MailTags, which lets you apply metadata to emails. Since MailTags also supports AppleScript, I began to wonder whether it might be possible to script workflows around email processing. Indeed it is, once you discover the trick of which dictionary to use.

The key is to use MailTagsHelper for the dictionary. To access the terms from that dictionary, you need to embed the code in the following block:

using terms from application "MailTagsHelper"
    -- access MailTags properties here
end using terms from


Using AdBlock Plus to block YouTube comments

YouTube comments are some of the most offensive on the web. Even serious videos attract trolls bent on inscribing their offensiveness and cruelty on the web.

Here’s one method of dealing with YouTube comments. Treat the comments block as an advertisement and block it.[1]

1. Download AdBlock Plus

Download the AdBlock Plus extension for the browser you use and install it.

2. Create a custom ad filter

In this step you will create a filter that treats the entire comments section of a YouTube page as an advertisement.

  • Navigate to YouTube and load any video page.
  • Click on the AdBlock icon in the toolbar to bring up its contextual menu.
AdBlock Plus contextual menu
  • Choose “Block an ad on this page”
  • Move the pointer to an area of the page just above the “COMMENTS” header. Once the entire comments area of the page is highlighted, click there.
Block YouTube comments
  • AdBlock will ask you to confirm the block. If it looks right to you, agree.[2]

If all goes well, you’ll have comment-free YouTube pages now.


  1. There are other ways of avoiding YouTube comments. I've used ViewPure but it's hard to find content that way even though they seem to be working on making it more seamless to get from YouTube to ViewPure.

  2. The div to be blocked is <DIV id="watch-discussion" class="branded-page-box yt-card scrolldetect" >. Don't be surprised if that changes and you need to update your filter as YouTube changes their page structure.

Introducing AnkiStats & AnkiStatsServer

The spaced repetition software system Anki is the de facto standard for foreign language vocabulary learning. Its algorithm requires lots of performance data to schedule flashcards in the most efficient way. Anki displays these statistics in a group of thorough and informative statistical graphs and descriptive text.

However, they aren’t easily available for the end user to export. That is the reason behind the companion projects AnkiStats and AnkiStatsServer.

The premise is that you can run your own more extensive experiments and statistical tests on the data once you have it in hand. A bit of technical expertise is needed to get it operational, but if you are up to it, clone the GitHub repos above and go for it.