Working with the Anki database on macOS using Python

Not long ago I ran across this post detailing a method for opening and inspecting the Anki database using Python outside the Anki application environment. However, that approach requires importing the Anki code base, which isn’t directly accessible on macOS because the Python code there is packaged inside the Mac application.

The solution I’ve found is inelegant, but it works: download the Anki code base to a location on your file system where your script can import it. You can find the Anki code here on GitHub.

Once that’s done, you’re ready to load an Anki collection. First, the preliminaries:

#!/usr/bin/python

import sys

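#   paths
ANKI_PATH = 'path to where you downloaded the anki codebase'
COLLECTION_PATH = "path to the Anki collection"  # the collection.anki2 file

# make the downloaded Anki code importable, then load the Collection class
sys.path.append(ANKI_PATH)
from anki import Collection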

Now we’re ready to open the collection:

col = Collection(COLLECTION_PATH)

And execute a simple query to print out the total number of cards in the collection:

query = """SELECT COUNT(id) from cards"""
totalCards = col.db.scalar(query)

# plain {} formatting: '{:.5g}' would switch to scientific notation above 99,999 cards
print('There are {} total cards.'.format(totalCards))

Then close the collection:

col.close()

That’s it. Ideally, we’d be able to import the Anki code bundled with the Mac application; maybe there’s a way. In the meantime, here’s the entire little script:

#!/usr/bin/python

import sys

#   paths
ANKI_PATH = '/Users/alan/Documents/dev/projects/PersonalProjects/anki'
COLLECTION_PATH = "/Users/alan/Documents/Anki/Alan - Russian/collection.anki2"

sys.path.append(ANKI_PATH)
from anki import Collection

col = Collection(COLLECTION_PATH)

query = """SELECT COUNT(id) from cards"""
totalCards = col.db.scalar(query)

print('There are {} total cards.'.format(totalCards))

col.close()
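A .anki2 collection is an ordinary SQLite database, so one possible workaround for the import problem is to skip the Anki code entirely and query the file with Python’s built-in sqlite3 module. This is a different technique from the one in this post, offered as a minimal sketch. `count_cards` is a hypothetical helper name. The sketch opens the file read-only, and it assumes Anki isn’t running, so the collection isn’t being modified while you read it.

```python
import sqlite3
from pathlib import Path


def count_cards(collection_path):
    """Count rows in the `cards` table of an Anki collection (an SQLite file).

    count_cards is a hypothetical helper, not part of Anki. The file is opened
    read-only through an SQLite URI, so the collection is never modified.
    """
    # as_uri() needs an absolute path and percent-encodes characters such as
    # the spaces in ".../Alan - Russian/collection.anki2"
    uri = Path(collection_path).resolve().as_uri() + '?mode=ro'
    conn = sqlite3.connect(uri, uri=True)
    try:
        (total,) = conn.execute('SELECT COUNT(id) FROM cards').fetchone()
        return total
    finally:
        conn.close()
```

Usage would look like `count_cards(COLLECTION_PATH)`. The trade-off is that you lose Anki’s own model and scheduling helpers, so this suits simple read-only queries like the card count above.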

Process automation in building Anki vocabulary cards

For the last two years, I’ve been working through a 10,000-word Russian vocabulary list ordered by frequency. My goal is to finish the list before the end of 2019. This requires not only stubborn persistence but also an efficient process for collecting the information that goes onto my Anki flash cards.

My manual process has been to work from a Numbers spreadsheet: as I collect information about each word from several websites, I log it in the spreadsheet.

For each word, I do the following:

  1. From Open Russian, I obtain an example sentence or two.
  2. From Wiktionary, I obtain the definition, more example phrases, any grammatical information I need, and audio of the pronunciation if it is available. I also copy the page’s URL onto my flash card.
  3. From the Russian National Corpus, I record the word’s frequency according to its listing, in case I want to reorder my frequency list in the future.

This involves lots of cutting, pasting, and tab-switching, so I devised an automated approach to loading up this information. The most complicated part was downloading the Russian pronunciation from Wiktionary, which I did with Python.

Downloading pronunciation files from Wiktionary

import re
import requests

class WikiPage(object):
    """Wiktionary page - source for the extraction"""
    def __init__(self, ruWord):
        super(WikiPage, self).__init__()
        self.word = ruWord
        self.baseURL = u'https://en.wiktionary.org/wiki/'
        self.anchor = u'#Russian'
    def url(self):
        # e.g. https://en.wiktionary.org/wiki/<word>#Russian
        return self.baseURL + self.word + self.anchor

First, we create a WikiPage object, which builds the page URL from the Russian word we want to capture. Then we can fetch the page source and search it for the direct link to the audio file we want:

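def page(self):
    # fetch the raw HTML of the Wiktionary page
    return requests.get(self.url())
def audioLink(self):
    # Commons path such as "/x/xy/Ru-<word>.ogg"; segments exclude "/" and '"'
    # so one match can't span several links on the same line
    searchObj = re.search(r'commons(/[^/"]+/[^/"]+/Ru-[^/"]+?\.ogg)', self.page().text)
    # no match -> searchObj is None, so .group raises AttributeError
    return searchObj.group(1)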

The audioLink method returns the path of the .ogg file we want, relative to Wikimedia Commons. Now we just have to download the file:

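from os.path import join, expanduser

def downloadAudio(self):
    # save the pronunciation as ~/Downloads/<word>.ogg
    path = join(expanduser("~"), 'Downloads', self.word + '.ogg')
    try:
        # fullAudioLink comes from the full script; AttributeError means the page has no audio link
        response = requests.get(self.fullAudioLink())
    except AttributeError:
        print("There appears to be no audio.")
        # notify comes from the full script (a desktop-notification helper)
        notify("No audio", "Wiktionary has no pronunciation", "Pronunciation is not available for download.", sound=True)
    else:
        with open(path, 'wb') as output:
            output.write(response.content)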
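downloadAudio calls fullAudioLink, which is defined in the full script rather than in the excerpts above. Since audioLink captures only the part of the path after “commons” (something like /x/xy/Ru-&lt;word&gt;.ogg), the full URL has to be rebuilt from the Wikimedia Commons upload address. The following is a minimal standalone sketch of that step under stated assumptions:

- `full_audio_link`, `AUDIO_PATH_RE`, and `COMMONS_BASE` are hypothetical names.
- It assumes original uploads are served from https://upload.wikimedia.org/wikipedia/commons.
- It returns None instead of raising AttributeError when the page has no Russian .ogg link.

```python
import re

# Assumed base URL for original (non-transcoded) files on Wikimedia Commons
COMMONS_BASE = 'https://upload.wikimedia.org/wikipedia/commons'

# Capture "/<x>/<xy>/Ru-<name>.ogg" right after "commons". Each segment
# excludes "/" and '"', so a match cannot run across several links on one
# line, and transcoded copies (commons/transcoded/...) are skipped.
AUDIO_PATH_RE = re.compile(r'commons(/[^/"]+/[^/"]+/Ru-[^/"]+?\.ogg)')


def full_audio_link(html):
    """Return an absolute URL for the first Russian .ogg file in the HTML, or None."""
    match = AUDIO_PATH_RE.search(html)
    if match is None:
        return None
    return COMMONS_BASE + match.group(1)
```

Taking the HTML as a parameter keeps the function testable without a network connection. Inside the WikiPage class you would pass it `self.page().text`.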

Now, to kick off the process, we just have to get the word from the macOS pasteboard, instantiate a WikiPage object, and call downloadAudio on it:

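import xerox  # third-party clipboard module

DEBUG = False  # set to True to print the page and audio URLs

# the word copied to the macOS pasteboard, kept as text (not UTF-8 bytes)
# so it can be joined to the unicode URL strings in WikiPage
word = xerox.paste()
wikipage = WikiPage(word)
if DEBUG:
    print(wikipage.url())
    print(wikipage.fullAudioLink())
wikipage.downloadAudio()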

If you’d like to see the entire Python script, the gist is here.

Automating Google Chrome

Next, we want to automate Chrome to pull up the word on the reference websites. We’ll do this in AppleScript.

set searchTerm to the clipboard as text
set openRussianURL to "https://en.openrussian.org/ru/" & searchTerm
set wiktionaryURL to "https://en.wiktionary.org/wiki/" & searchTerm & "#Russian"

Here we grab the word off the clipboard and build the URLs for both sites. Next, we look for the tab that has the Russian National Corpus frequency list (dict.ruslang.ru) open and run a page search for our target word, so I can easily grab the word frequency from the page.

tell application "Google Chrome" to activate

-- initiate the word find process in dict.ruslang.ru
tell application "Google Chrome"
	--	find the tab with the frequency list
	set i to 0
	repeat with t in (every tab of window 1)
		set i to i + 1
		set searchURLText to (URL of t) as text
		if searchURLText begins with "http://dict.ruslang.ru/" then
			set active tab index of window 1 to i
			exit repeat
		end if
	end repeat
end tell

delay 1

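tell application "System Events"
	tell process "Google Chrome"
		-- open Chrome's find bar
		keystroke "f" using command down
		delay 0.5
		-- paste the word from the clipboard into it
		keystroke "v" using command down
		delay 0.5
		-- press Return (key code 36) to run the search
		key code 36
	end tell
end tell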

Then we need to load the word definition pages using the URLs that we built earlier:

-- load word definitions
tell application "Google Chrome"
	activate
	set i to 0
	set tabList to every tab of window 1
	repeat with theTab in tabList
		set i to i + 1
		set textURL to (URL of theTab) as text
		-- load the word in open russian
		if textURL begins with "https://en.openrussian.org" then
			set URL of theTab to openRussianURL
		end if
		-- load the word in wiktionary
		if textURL begins with "https://en.wiktionary.org" then
			set URL of theTab to wiktionaryURL
			--	make the wiktionary tab the active tab
			set active tab index of window 1 to i
		end if
	end repeat
end tell
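The URLs loaded into these tabs are built at the start of the AppleScript by plain string concatenation. For comparison, here is the same URL-building step as a small Python function. It is a sketch, not part of the original workflow. `build_reference_urls` is a hypothetical name. Unlike the AppleScript, it strips stray whitespace and percent-encodes the Cyrillic word with `urllib.parse.quote` (Python 3), a step the AppleScript leaves to Chrome. The site URL patterns are the ones the AppleScript uses.

```python
from urllib.parse import quote

# URL patterns taken from the AppleScript above
OPEN_RUSSIAN_BASE = 'https://en.openrussian.org/ru/'
WIKTIONARY_BASE = 'https://en.wiktionary.org/wiki/'


def build_reference_urls(search_term):
    """Return (open_russian_url, wiktionary_url) for a Russian word."""
    # drop any newline or spaces copied along with the word
    word = search_term.strip()
    # UTF-8 percent-encoding, e.g. "дом" -> "%D0%B4%D0%BE%D0%BC"
    encoded = quote(word)
    open_russian_url = OPEN_RUSSIAN_BASE + encoded
    wiktionary_url = WIKTIONARY_BASE + encoded + '#Russian'
    return open_russian_url, wiktionary_url
```

Calling `build_reference_urls` on the clipboard word yields the two strings the AppleScript assigns to openRussianURL and wiktionaryURL, with the word percent-encoded.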

Finally, using AppleScript’s do shell script command, we can fire off the Python script that downloads the audio. (In practice, my AppleScript does this step first, to allow time to process the audio as I’ve described previously.) To tie it all together, I use a Quicksilver trigger that starts the entire process from a single keystroke.

Granted, I have a very specific use case here, but hopefully you’ve gleaned something useful about automating Chrome and using Python to download pronunciation files from Wiktionary. Cheers.

More JavaScript with Anki

I wrote a piece previously about using JavaScript in Anki cards. Although I haven’t found many uses for the idea, it does come up from time to time, including the recent use case I’m writing about now. After downloading a popular French frequency-list deck for my daughter to use, I noticed that it omits the gender of nouns in the French prompt. In school, I was always taught to memorize the gender along with the noun.

An approach to dealing with spurious sensor data in Indigo

Spurious sensor data can wreak havoc in an otherwise finely tuned home automation system. I use temperature data from an Aeotec MultiSensor 6 to monitor the environment in our greenhouse. Living in Canada, I cannot rely solely on passive systems to maintain the temperature, particularly at night. So, using the temperature and humidity measurements transmitted back to the controller over Z-Wave, I control devices inside the greenhouse that heat and humidify the environment.

Follow the intent.

With Trump, the usual advice to “follow the money” doesn’t work, because Congress refuses to force him to disclose his conflicts of interest. As enormous and material as those conflicts must be, I’m just going to focus on what I can see with my own eyes: the man’s apparent intent. In his public life, Donald Trump has never done anything that did not personally and directly benefit him. Most of us, as we go through life, assemble a collection of acts that are variously self-serving and other-serving.

They're just paid protesters

In an effort to strip protesters of their legitimacy, Trump and Fox News claim that protesters are simply there because they’re paid by powerful opposing interests. Never mind that Trump has no evidence for his claim; he has no evidence for practically anything that emerges from his loud mouth. What is more interesting to me is that if money delegitimizes authenticity, then presumably we can apply the same logic to reach additional conclusions.

@realDonaldTrump Russian Twitter bot

Someday, when I have time to burn, I’m going to write a Twitter bot that takes all of Trump’s vacuous tweets and translates them into Russian. It’ll look like this:

There’s something ludicrous about the idea of Trump, who is distractible, impatient, and incurious, being able to learn Russian, an incredibly difficult language.

marking time

marking time,
eyes glazed, pupils constricted
to the head of a pin
from facing the blue white sterile light
for too long
a zombie tribe
numbering in the millions
if not more
waits.
this throng, agitated
in a subdued anesthetized
way,
crowns one of its own
a clown of sorts
knowing little of the past
less of the present
and practically nothing
of the future.
“why not? it could be worse.”
in a strange unreality
a vaudeville show becomes
its own rehearsal,
a dreamish state from which
only an atomic flash
can awaken a person.

13 Random thoughts about Canada after living here for a year.

On January 1, 2016, we packed up all our earthly goods and headed south to Canada. (Yes, it’s true. When you live in Minnesota, it’s possible to move south to Canada. Look at the map!) Now that we’ve lived here for a little over a year, here are some thoughts about life here, in no particular order:

“Sorry” is more of a greeting than just an apology.