anki

Pre-processing Russian text for the AwesomeTTS add-on in Anki

The Anki add-on AwesomeTTS has been a vital tool for language learners using the Anki application on the desktop. It allows you to have elements of the card read aloud using text-to-speech capabilities. The new developer of the add-on has added a number of voice options, including the Microsoft Azure voices. The neural voices for Russian are quite good. But they have one major issue, syllabic stress marks that are sometimes seen in text intended for language learners cause the Microsoft Azure voices to grossly mispronounce the word.

Factor analysis of failed language cards in Anki

After developing a rudimentary approach to detecting resistant language learning cards in Anki, I began teasing out individual factors. Once I was able to adjust the number of lapses for the age of the card, I could examine the effect of different factors on the difficulty score that I described previously. Findings Some of the interesting findings from this analysis: Prompt-answer direction - 62% of lapses were in the Russian → English (recognition) direction.

Refactoring Anki language cards

Regardless of how closely you adhere to the 20 rules for formating knowledge, there are cards that seem destined to leechdom. For me part of the problem is that with languages, straight-up vocabulary cards take words out of the rich context in which they exist in the wild. With my maturing collection of Russian decks, I recently started to go through these resistant cards and figure out why they are so difficult.

Parsing Russian Wiktionary content using XPath

As readers of this blog know, I’m an avid user of Anki to learn Russian. I have a number of sources for reference content that go onto my Anki cards. Notably, I use Wiktionary to get word definitions and the word with the proper syllabic stress marked. (This is an aid to pronunciation for Russian language learners.) Since I’m lazy to the core, I came up with a system way of grabbing the stress-marked word from the Wiktionary page using lxml and XPath.

Directly setting an Anki card's interval in the sqlite3 database

It’s always best to let Anki set intervals according to its view of your performance on testing. That said, there are times when directly altering the interval makes sense. For example, to build out a complete representation of the entire Russian National Corpus, I’m forced to enter vocabulary terms that should be obvious to even elementary Russian learners but which aren’t yet in my nearly 24,000 card database. Therefore, I’m entering these cards gradually.

Regex to match a cloze

Anki and some other platforms use a particular format to signify cloze deletions in flashcard text. It has a format like any of the following: {{c1::dog::}} {{c2::dog::domestic canine}} Here’s a regular expression that matches the content of cloze deletions in an arbitrary string, keeping only the main clozed word (in this case dog.) {{c\d::(.*?)(::[^:]+)?}} To see it in action, here it is in action in a Python script:

An alternative method for keyboard input switching on macOS

macOS offers a variety of virtual keyboard layouts which are accessible through System Preferences > Keyboard > Input Sources. Because I spend about half of my time writing in Russian and half in English, rapid switching between keyboard layouts is important. Optionally in the Input Sources preference pane, you can choose to use the Caps lock key to toggle between sources. This almost always works well with the exception of Anki.

Sunday, September 16, 2018

Regex 101 is a great online regex tester. Speaking of regular expressions, for the past year, I’ve used an automated process for building Anki flash cards. One of the steps in the process is to download Russian word pronunciations from Wiktionary. When Wiktionary began publishing transcoded mp3 files rather than just ogg files, they broke the URL scheme that I relied on to download content. The new regex for this scheme is: (?

Peering into Anki using R

Yet another diversion to keep me from focusing on actually using Anki to learn Russian. I stumbled on the R programming language, a language that focuses on statistical analysis. Here’s a couple snippets that begin to scratch the surface of what’s possible. Important caveat: I’m an R novice at best. There are probably much better ways of doing some of this… Counting notes with a particular model type Here we’ll use R to do what we did previously with Python.

Anki database adventures: Counting notes by model type

Continuing my series on accessing the Anki database outside of the Anki application environment, here’s a piece on accessing the note type model. You may wish to start here with the first article on accessing the Anki database. This is geared toward mac OS. (If you’re not on mac OS, then start here instead.) The note type model Since notes contain flexible fields in Anki, the model for a note type is in JSON.