2018 is my year of experiments. (Why? TL;DR: New Year’s resolutions are overrated and have a high failure rate. Anyone can run an experiment for a month.) My first experiment (No news for a month) is nearly done, and I’ll declare it a success.

Background

The round-the-clock sensational news cycle exists in large part to create wealth for the already-too-wealthy. Little of it is actionable, leaving us both outraged and impotent. Mostly I decided to give up on the news because of Donald Trump, the demented psychopathic moron who managed to get elected president.[1] Since Trump took office, like others, I’ve found myself cycling repeatedly through the stages of grief. But mostly I’ve been stuck on anger. There’s something about willful ignorance that does that to me.

Experiment

The methodology was simple: I willed myself to avoid the news for an entire month. After briefly considering tools that would block news websites, I decided to go cold turkey.

Results

Some of the things that I noticed:

  • Airports are saturated with news. I travelled a bit during the month, and with TVs blaring the news in every terminal area, it’s impossible to avoid hearing it. I learned that a book highly critical of Trump was published and that the man himself was displeased. I learned that Congressional Republicans are trying to stop Special Counsel Robert Mueller’s investigation without looking like that’s what they’re doing.
  • Social media can be a significant vector of news. The sidebar on Facebook likes to trumpet the latest bus crash, earthquake, and political twist. But I discovered that you can resize your browser window to make the sidebar go away. Presto!
  • I tended to want to look at the news when I was bored. In an idle moment, I’d think about the news. Given that the news is supposed to serve the factual needs of an informed electorate, seeking it out of boredom is in keeping with the values of the entertainment industry, not those of journalism.
  • Outsourcing the news to others slows down the cycle. It was impossible to avoid the news completely. I heard others talking about political happenings and other current events. In fact, I even asked about them. But by outsourcing the news-seeking to others, I was able to slow down the process and keep it at a distance in a way that made it seem more abstract. I didn’t feel as outraged.
  • I felt more productive. Once I eliminated the desire to read the news, I was able to stay with purposeful tasks longer.

Conclusions

After a month of no news, I miss reading good journalism. I may go back to it. Or I may not. The experiment was such a success that it would be hard to go back. The real problem for most of us is that the overlap between our circle of interest (what’s going on in the world) and our circle of influence is very small. David Cain noticed the same thing when he quit the news: “Being concerned makes us feel like we’re doing something when we’re not.”

Now off to my next experiment - a month of practicing a secular technology “sabbath”.


  1. I use these terms very carefully. Many have speculated that he suffers from some form of dementia, owing to events where he slurs his words and perseverates. His sociopathic or psychopathic behaviours are well-documented; he is a man devoid of empathy. And finally, his lack of reading is well-known. For all I can tell, the man is a functional illiterate. In contrast, his predecessor is a bibliophile who read widely and voraciously throughout his tenure.

That title is a mouthful!

TL;DR: One approach to developing good second-language pronunciation and rhythm is to repeat a sentence many times while simultaneously listening to a native speaker. If you do this while gradually reducing the source amplitude, you end up speaking on your own, without help. Here’s an AppleScript that automates the process on the Mac.

Background

For adult learners of a second language (L2), pronunciation and prosody (the rhythm and cadence of language) can be difficult. A method devised by Swedish linguist and medical doctor Olle Kjellin seeks to remedy this problem through chorus repetition of sentences in the L2. While listening to the sentence over and over, the learner repeats it aloud, attempting to match the native speaker’s pronunciation and cadence. By gradually reducing the volume of the native speaker, the learner gradually hears more of his own voice. This shaping process has sound neurocognitive underpinnings, and Kjellin’s explanation of the method is definitely worth reading.

Automating the process

One of the ideas that Kjellin discusses is gradual reduction of the native speaker’s volume. The rationale is that as the learner begins to hear less of the native speaker’s voice, he begins to hear more of his own. In this way, he learns to shape his pronunciation and developing prosody while the auditory stimulus is gradually withdrawn.

It is possible to do this automatically on the Mac platform.[1] My approach uses AppleScript to ask the user for the intended track duration in minutes; the script then begins playing the current track, gradually reducing the volume over the course of the desired duration. To simplify the choices the user must make, the script asks only for the duration. The minimum volume is hard-coded, as is the linear shape of the decay. With a little ingenuity, these choices could be modified; for example, the volume decay could be faster, leaving some of the remaining time at the minimum volume.[2]

Installing

You’ll need to grab the source code from GitHub and paste it into a new empty script in AppleScript Editor.app.[3] From AppleScript Editor, save it to the iTunes script directory, which is located at ~/Library/Scripts/Applications/iTunes.[4] Sorry this is a little cumbersome, but I can help you. Just send me a note via my Shortwhale link.
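
If that folder doesn’t exist yet, you can create it with a one-liner. Run this in AppleScript Editor (or type the mkdir command itself into Terminal):

do shell script "mkdir -p ~/Library/Scripts/Applications/iTunes"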

Source code

For the intrepid and the techies, here’s the gist of the script. What follows is a minimal sketch of the logic rather than the published version; grab the real source from GitHub as described above:
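
-- A sketch only: assumes a hard-coded minimum volume of 10 and a strictly
-- linear, one-point-per-step decay; the published script differs in details.
display dialog "Track duration in minutes:" default answer "5"
set durationMinutes to (text returned of result) as number

set minimumVolume to 10
set stepDelay to (durationMinutes * 60) / (100 - minimumVolume)

tell application "iTunes"
    set sound volume to 100
    play
    -- walk the volume down one point at a time over the requested duration
    repeat with v from 100 to minimumVolume by -1
        set sound volume to v
        delay stepDelay
    end repeat
    stop -- playback stops once the minimum volume is reached (see note 2)
end tell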


  1. Sorry Windows and Linux users, this approach relies on AppleScript which of course doesn't run on these other platforms. Almost certainly there are platform-specific approaches there but that's for someone else to figure out!

  2. Currently when the minimum volume is finally reached, playback stops.

  3. You have it, it's just hard to find. Look in /Applications/Utilities.

  4. You can access the iTunes scripts folder while iTunes is the frontmost application: go to the scripts menu > Open Scripts Folder > Open iTunes Scripts Folder. That’s where you need to save the script.

New Year’s resolution time is at hand. But not for me; at least not in a traditional sense. I was inspired by David Cain’s experiments. In short, he conducts monthly experiments in self-improvement.

The idea of an experiment is appealing in ways that a resolution is not. A resolution presumes an outcome and relies only on the long application of will to see it through. An experiment, on the other hand, makes only a conjecture about the outcome and can be conducted over a shorter period.

Here’s my list of experiments for 2018, month by month. Some of these experiments are only about pushing the limits of my own personal projects. For example, I have an obsessional interest in becoming more fluent in Russian, so two of the experiments are very specific to that. Otherwise, they are commonsense ideas that apply to all of us. Some are connected by a theme of reducing the influence of technology on my life.

January

No news for one month - Reading the news every day is like watching an accident that never stops happening. The U.S. is a disaster. The U.S. president and his ilk are going to say and do outrageous things. Apart from voting, there’s little I can do. So I say, skip it. I tend toward the negative; so I’m curious about how kicking the news habit will affect my mood.

February

Technology sabbath - Since we are a secular family, the idea of a sabbath is more like “time away from the mundane.” The intent here is to take a break from technology (computer, cell phone, iPad, etc.) one day each week. This experiment extends the last month’s efforts to break the cell phone habit.

March

No phone day - One day a week, I will power down my cell phone and put it in the drawer. Since we don’t have a landline, that effectively means others will have to find different ways of contacting me. Or wait. I’m curious about whether having a single day a week away from the smartphone is enough to break the habit of picking it up and looking at it during the week.

April

Use reading as sole Anki input - I use a spaced-repetition application called Anki to memorize and practice Russian vocabulary and grammar. On an ordinary day I can add about 10 words while reviewing all of the old material. This experiment is about changing the source of my input to find out whether it is sustainable and whether it affects my long-term retention rates. Typically, I work from a frequency list; but for this month I’ll shift to using reading as the source.

May

Pronounce 10,000 sentences - This month, I’ll log 10,000 sentence utterances. Swedish neuroscientist and linguist Olle Kjellin, a champion of patterned repetition of canonical L2 sentences, recommends this way of practicing pronunciation and prosody.

June

No complaining - This is one of David Cain’s experiments. Having a negative bias myself, I’m curious about how forcing myself to reframe events and people’s actions in a neutral or positive way will affect me.

July

Social media once a day (or less) - The degree to which social media frames our perspectives in algorithmic and involuntary ways is frightening. Nonetheless, it has the ability to connect people in ways that can be interesting and touching. This is about trying to slow down the input and make it manageable rather than a burden.

August

No alcohol or caffeine - Simple. No dependencies.

September

Meditate for 10 or more minutes daily - The benefits of meditation are well-known.

October

Declutter daily - Spend 15 minutes a day on organizing and decluttering to observe how it affects my mood, my perceptions of order, and how it fits into the day’s workload.

November

Aerobic exercise daily - Not too many years ago, I was a committed road cyclist. I’ve ridden up almost all of the major mountain passes in Colorado. Now I’m a couch potato. Time to get moving.

December

Lift weights daily - We lose muscle mass as we age. This experiment is about trying to blunt the effects of age on this reduction by lifting a modest amount of weight every day.

Wish me luck. I’ll need it.

[Figure: hours-per-month Anki review time, the plot produced by the script below (hrplot.png)]

Yet another diversion to keep me from focusing on actually using Anki to learn Russian: I stumbled on the R programming language, a language focused on statistical analysis.

Here are a couple of snippets that begin to scratch the surface of what’s possible. Important caveat: I’m an R novice at best. There are probably much better ways of doing some of this…

Counting notes with a particular model type

Here we’ll use R to do what we did previously with Python.

First load some of the libraries we’ll need:

library(rjson)
library(RSQLite)
library(DBI)

Next we’ll connect to the database and extract the model information:

# connect to the Anki database
dbpath <- "path to your collection"
con = dbConnect(RSQLite::SQLite(),dbname=dbpath)

# get information about the models
modelInfo <- as.character(dbGetQuery(con,'SELECT models FROM col'))
models <- fromJSON(modelInfo)

Since the model information is stored as JSON, we’ll need to parse the JSON to build a data frame that we can use to extract the model ID that we’ll need.

names <- c()
mid <- names(models)
for(i in 1:length(mid)) {
  names[i] <- models[[mid[i]]]$name
}
models <- data.frame(cbind(mid, names))

Next we’ll extract the model ID (mid) from this data frame so that we can find all of the notes with that model ID:

verbmid <- as.numeric(as.character(models[models$names=="Русский - глагол","mid"]))

# query the notes database for notes with this model
query <- paste("SELECT COUNT(id) FROM notes WHERE mid =",verbmid)
res <- as.numeric(dbGetQuery(con,query))

And of course, close the connection to the Anki SQLite database:

dbDisconnect(con)

As of this writing, res tells me I have 702 notes with the verb model type (named "Русский - глагол" in my collection).

Counting hours per month in Anki

Ever wonder how many hours per month you spend reviewing in Anki? Here’s an R program that will grab review time information from the database and plot it for you. I ran across the original idea in this blog post by Gene Dan, but did a little work on the x-axis scale to get it to display correctly.

library(RSQLite)
library(DBI)
library(rjson)
library(anytime)
library(sqldf)
library(zoo)
library(ggplot2)

dbpath <- "/Users/alan/Library/Application Support/Anki2/Alan - Russian/collection.anki2"
con = dbConnect(RSQLite::SQLite(),dbname=dbpath)
#get reviews
rev <- dbGetQuery(con,'select CAST(id as TEXT) as id
, CAST(cid as TEXT) as cid
, time
from revlog')


cards <- dbGetQuery(con,'select CAST(id as TEXT) as cid, CAST(did as TEXT) as did from cards')

#Get deck info - from the decks field in the col table
deckinfo <- as.character(dbGetQuery(con,'select decks from col'))
decks <- fromJSON(deckinfo)

names <- c()
did <- names(decks)
for(i in 1:length(did)) {
  names[i] <- decks[[did[i]]]$name
}

decks <- data.frame(cbind(did,names))
#decks$names <- as.character(decks$names)

cards_w_decks <- merge(cards,decks,by="did")
#Date is UNIX timestamp in milliseconds, divide by 1000 to get seconds
rev$revdate <- as.yearmon(anydate(as.numeric(rev$id)/1000))

# Assign deck info to reviews
rev_w_decks <- merge(rev,cards_w_decks,by="cid")
time_summary <- sqldf("select revdate, sum(time) as Time from rev_w_decks group by revdate")
time_summary$Time <- time_summary$Time/3.6e+6 # convert milliseconds to hours

ggplot(time_summary, aes(x = revdate, y = Time)) +
  geom_bar(stat = "identity", fill = "#d93d2a") +
  scale_x_yearmon() +
  ggtitle("Hours per Month") +
  xlab("Review Date") +
  ylab("Time (hrs)") +
  theme(axis.text.x = element_text(hjust = 2, size = rel(1))) +
  theme(plot.title = element_text(size = rel(1.5), vjust = .9, hjust = .5)) +
  guides(fill = guide_legend(reverse = TRUE))

dbDisconnect(con)

You should get a plot like the one at the top of the post.

I’m anxious to learn more about R and apply it to understanding my performance in Anki.

Since one of the cornerstones of my approach to learning the Russian language has been to track how many words I’ve learned and their frequencies, I was intrigued to read the following statistics today:

  • The 15 most frequent words in the language account for 25% of all the words in typical texts.
  • The first 100 words account for 60% of the words appearing in texts.
  • 97% of the words one encounters in an ordinary text will be among the first 4000 most frequent words.

In other words, if you learn the first 4000 words of a language, you’ll recognize about 97% of the words in a typical text - nearly everything.

Source: “Five Cornerstones for Second-Language Acquisition - the Neurophysiological Opportunist’s Way” by Olle Kjellin, M.D., Ph.D.; the figures are originally from The Cambridge Encyclopedia of Language (Crystal, 1995).

Continuing my series on accessing the Anki database outside of the Anki application environment, here’s a piece on accessing the note type model. You may wish to start with the first article on accessing the Anki database. This is geared toward mac OS. (If you’re not on mac OS, then start here instead.)

The note type model

Since notes in Anki contain flexible fields, the model for a note type is stored as JSON. A best-guess definition of the JSON structure is:

{
  "css": "CSS, shared for all templates",
  "did": "Long specifying the id of the deck that cards are added to by default",
  "flds": [
    "JSONArray containing object for each field in the model as follows:",
    {
      "font": "display font",
      "media": "array of media. appears to be unused",
      "name": "field name",
      "ord": "ordinal of the field - goes from 0 to num fields - 1",
      "rtl": "boolean, right-to-left script",
      "size": "font size",
      "sticky": "sticky fields retain the value that was last added when adding new notes"
    }
  ],
  "id": "model ID, matches cards.mid",
  "latexPost": "String added to end of LaTeX expressions",
  "latexPre": "preamble for LaTeX expressions",
  "mod": "modification time in milliseconds",
  "name": "model name",
  "req": [
    "Array of arrays describing which fields are required for each card to be generated",
    [
      "array index, 0, 1, ...",
      "? string, all",
      "another array",
      ["appears to be the array index again"]
    ]
  ],
  "sortf": "Integer specifying which field is used for sorting (browser)",
  "tags": "Anki saves the tags of the last added note to the current model",
  "tmpls": [
    "JSONArray containing object of CardTemplate for each card in model",
    {
      "afmt": "answer template string",
      "bafmt": "browser answer format: used for displaying answer in browser",
      "bqfmt": "browser question format: used for displaying question in browser",
      "did": "deck override (null by default)",
      "name": "template name",
      "ord": "template number, see flds",
      "qfmt": "question format string"
    }
  ],
  "type": "Integer specifying what type of model. 0 for standard, 1 for cloze",
  "usn": "Update sequence number: used in same way as other usn values in db",
  "vers": "Legacy version number (unused)"
}
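
Incidentally, you don’t have to take this schema on faith; you can pull the raw JSON from the col table and inspect it yourself. Here’s a minimal sketch, assuming the col collection object from the first article in this series:

import json

# models is a dict keyed by model ID; each value describes one note type
modelsJSON = col.db.scalar("SELECT models FROM col")
models = json.loads(modelsJSON)
for mid, model in models.items():
    print mid, model['name']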

Our goal today is to count all of the notes that have a given note type. Fortunately, there’s a built-in method for this:

verbModel = col.models.byName(u'Русский - глагол')

Here we find the model object (a Python dictionary) named ‘Русский - глагол’ (that’s “Russian verb,” by the way). To access its id:

modelID = verbModel['id']

Now we just have to count:

query = """SELECT COUNT(id) from notes WHERE mid = {}""".format(verbModel['id'])
verbNotes = col.db.scalar(query)

print 'There are {:.5g} verb notes.'.format(verbNotes)

And that’s it for this little adventure in the Anki database.


I previously wrote about accessing the Anki database using Python on mac OS. Extending that article, I’ll show how to work with a specific deck in this short post.

To use a named deck you’ll need its deck ID. Fortunately there’s a built-in method for finding a deck ID by name:

col = Collection(COLLECTION_PATH)
dID = col.decks.id(DECK_NAME)

Now in queries against the cards and notes tables we can apply the deck ID to restrict them to a certain deck. For example, to find all of the cards currently in the learning stage:

query = """SELECT COUNT(id) FROM cards where type = 1 AND did = dID"""
learningCards = col.db.scalar(query)

print 'There are {:.5g} learning cards.'.format(learningCards)

And close the collection:

col.close()


Not long ago I ran across this post detailing a method for opening and inspecting the Anki database using Python outside the Anki application environment. However, the approach requires linking to the Anki code base, which is inaccessible on mac OS since the Python code is packaged into a Mac app on this platform.

The solution I’ve found is inelegant, but it just involves downloading the Anki code base to a location on your file system where you can link to it in your code. You can find the Anki code here on GitHub.

Once that’s done, you’re ready to load an Anki collection. First, the preliminaries:

#!/usr/bin/python

import sys

# paths
ANKI_PATH = 'path to where you downloaded the anki codebase'
COLLECTION_PATH = "path to the Anki collection"

sys.path.append(ANKI_PATH)
from anki import Collection

Now we’re ready to open the collection:

col = Collection(COLLECTION_PATH)

And execute a simple query to print out the total number of cards in the collection:

query = """SELECT COUNT(id) from cards"""
totalCards = col.db.scalar(query)

print 'There are {:.5g} total cards.'.format(totalCards)

Then close the collection:

col.close()

That’s it. Ideally, we’d be able to link to the Anki code bundled with the Mac application. Maybe there’s a way. In the meantime, here’s the entire little app:

#!/usr/bin/python

import sys

# paths
ANKI_PATH = '/Users/alan/Documents/dev/projects/PersonalProjects/anki'
COLLECTION_PATH = "/Users/alan/Documents/Anki/Alan - Russian/collection.anki2"

sys.path.append(ANKI_PATH)
from anki import Collection

col = Collection(COLLECTION_PATH)

query = """SELECT COUNT(id) from cards"""
totalCards = col.db.scalar(query)

print 'There are {:.5g} total cards.'.format(totalCards)

col.close()

For the last two years, I’ve been working through a 10,000-word Russian vocabulary list ordered by frequency. I have a goal of finishing the list before the end of 2019. This requires not only stubborn persistence but also an efficient process for collecting the information that goes onto my Anki flash cards.

My manual process has been to work from a Numbers spreadsheet. As I collect information about each word from several websites, I log it in this table.

[Figure: the Numbers spreadsheet used to track each word (numbers-sheet-ru.png)]

For each word, I do the following:

  1. From Open Russian I obtain an example sentence or two.
  2. From Wiktionary I obtain the definition, more example phrases, any particular grammatical information I need, and audio of the pronunciation if it is available. I also capture the URL from this site onto my flash card.
  3. From the Russian National Corpus I capture the frequency according to their listing in case I want to reorder my frequency list in the future.

This involves lots of cutting, pasting, and tab-switching, so I devised an automated approach to loading up this information. The most complicated part was downloading the Russian pronunciation from Wiktionary; I did this with Python.

Downloading pronunciation files from Wiktionary

# imports needed by the excerpts that follow
import re
import urllib2

import requests
import xerox  # clipboard access

from os.path import expanduser, join

class WikiPage(object):
    """Wiktionary page - source for the extraction"""
    def __init__(self, ruWord):
        super(WikiPage, self).__init__()
        self.word = ruWord
        self.baseURL = u'http://en.wiktionary.org/wiki/'
        self.anchor = u'#Russian'
    def url(self):
        return self.baseURL + self.word + self.anchor

First, we initialize a WikiPage object by building the main page URL from the Russian word we want to capture. We can then fetch the page source and look for the direct link to the audio file that we want. (These are methods of WikiPage; the fullAudioLink helper shown here is a plausible reconstruction - the definitive version is in the full gist linked below.)

    def page(self):
        return requests.get(self.url())
    def audioLink(self):
        searchObj = re.search("commons(\\/.+\\/.+\\/Ru-.+\\.ogg)", self.page().text, re.M)
        return searchObj.group(1)
    def fullAudioLink(self):
        # assumes Commons media are served from the standard Wikimedia upload host
        return u'https://upload.wikimedia.org/wikipedia/commons' + self.audioLink()

The function audioLink returns the Commons path of the .ogg file that we want, and fullAudioLink expands it to an absolute URL. Now we just have to download the file:

    def downloadAudio(self):
        path = join(expanduser("~"), 'Downloads', self.word + '.ogg')
        try:
            mp3file = urllib2.urlopen(self.fullAudioLink())
        except AttributeError:
            # no regex match means the page has no pronunciation audio
            print "There appears to be no audio."
            # notify() is a macOS notification helper from the full gist
            notify("No audio", "Wiktionary has no pronunciation", "Pronunciation is not available for download.", sound=True)
        else:
            with open(path, 'wb') as output:
                output.write(mp3file.read())

Now, to kick off the process, we just have to get the word from the mac OS pasteboard, instantiate a WikiPage object, and call downloadAudio on it:

# DEBUG is a module-level flag defined at the top of the full script
word = xerox.paste().encode('utf-8')
wikipage = WikiPage(word)
if DEBUG:
    print wikipage.url()
    print wikipage.fullAudioLink()
wikipage.downloadAudio()

If you’d like to see the entire Python script, the gist is here.

Automating Google Chrome

Next we want to automate Chrome to pull up the word in the reference websites. We’ll do this in AppleScript.

set searchTerm to the clipboard as text
set openRussianURL to "https://en.openrussian.org/ru/" & searchTerm
set wiktionaryURL to "https://en.wiktionary.org/wiki/" & searchTerm & "#Russian"

Here we grab the word off the clipboard and build the URLs for both sites. Next we’ll look for a tab that contains the Russian National Corpus site and execute a page search for our target word. That way I can easily grab the word frequency from the page.

tell application "Google Chrome" to activate

-- initiate the word find process in dict.ruslang.ru
tell application "Google Chrome"
-- find the tab with the frequency list
set i to 0
repeat with t in (every tab of window 1)
set i to i + 1
set searchURLText to (URL of t) as text
if searchURLText begins with "http://dict.ruslang.ru/" then
set active tab index of window 1 to i
exit repeat
end if
end repeat
end tell

delay 1

tell application "System Events"
tell process "Google Chrome"
keystroke "f" using command down
delay 0.5
keystroke "V" using command down
delay 0.5
key code 36
end tell
end tell

Then we need to load the word definition pages using the URLs that we built earlier:

-- load word definitions
tell application "Google Chrome"
    activate
    set i to 0
    set tabList to every tab of window 1
    repeat with theTab in tabList
        set i to i + 1
        set textURL to (URL of theTab) as text
        -- load the word in open russian
        if textURL begins with "https://en.openrussian.org" then
            set URL of theTab to openRussianURL
        end if
        -- load the word in wiktionary
        if textURL begins with "https://en.wiktionary.org" then
            set URL of theTab to wiktionaryURL
            -- make the wiktionary tab the active tab
            set active tab index of window 1 to i
        end if
    end repeat
end tell

Finally, using do shell script, we can fire off the Python script that downloads the audio. (Actually, I have the AppleScript do that first, to allow time to process the audio as I’ve described previously.) Then I create a Quicksilver trigger to start the entire process from a single keystroke.
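
The kickoff itself is a one-liner; the path below is hypothetical, so point it at wherever you saved the Python downloader:

-- the path is hypothetical - substitute the location of your copy of the script
do shell script "/usr/bin/python ~/Scripts/wiktionary_audio.py"

By default, do shell script waits for the command to finish; redirecting output and appending & (as in "… > /dev/null 2>&1 &") makes it return immediately, so the download can proceed while the tabs load.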

Granted, I have a very specific use case here, but hopefully you’ve been able to glean something useful about process automation of Chrome and using Python to download pronunciation files from Wiktionary. Cheers.