Factor analysis of failed language cards in Anki

After developing a rudimentary approach to detecting resistant language learning cards in Anki, I began teasing out individual factors. Once I was able to adjust the number of lapses for the age of the card, I could examine the effect of different factors on the difficulty score that I described previously.

Findings

Some of the interesting findings from this analysis:

  • Prompt-answer direction - 62% of lapses were in the Russian → English (recognition) direction.1
  • Part of speech - Over half (51%) of lapses were among verbs. Since the Russian verbal system is rich and complex, it’s not surprising to find that verb cards often fail.
  • Noun gender - Neuter nouns accounted for 22% of all lapses; among lapses on noun cards alone, neuter nouns represented 69%. This, too, makes intuitive sense because neuter nouns often represent abstract concepts that are difficult to picture mentally. For example, the Russian words for community, representation, and indignation are all neuter nouns.

Interventions

With a better understanding of the factors that contribute to lapses, it is easier to anticipate failures before they accumulate. For example, I will immediately implement a plan to surround new neuter nouns with a larger variety of audio and sample sentence cards. For new verbs, I’ll do the same, ensuring that I include multiple forms of the verb, varying the examples by tense, number, person, aspect and so on.

Future directions

I’d like to extend this approach to a more statistically-rigorous prediction scheme, so that I can more accurately target efforts to prevent the accumulation of lapses.

References


  1. Note that the fractions in the “fx all lapses” column for the card direction group do not sum to 1.0 because I excluded a small number of image cards from the analysis. ↩︎

Refactoring Anki language cards

Regardless of how closely you adhere to the 20 rules for formatting knowledge, there are cards that seem destined to leechdom. For me, part of the problem is that with languages, straight-up vocabulary cards take words out of the rich context in which they exist in the wild. With my maturing collection of Russian decks, I recently started to go through these resistant cards and figure out why they are so difficult.

Why do some vocabulary cards resist learning?

  1. Rare words are rare in ordinary daily use, so they are not reinforced outside of Anki. - This, of course, is one of the reasons that some recommend simply deleting or suspending leeches. I’m unwilling to do that except in the rarest of cases; it feels like giving up. But the point stands: since you seldom, if ever, encounter the word outside of Anki reviews, the little hits of memory that would ordinarily prop up the forgetting curve don’t occur.
  2. The card may be badly formatted. - For example, a card with an image on one side and a Russian word on the other may be difficult simply because the meaning conveyed by the image is ambiguous.
  3. Synonyms are tough. - As your vocabulary grows in breadth, you will accumulate words with similar meanings. When faced with an English → Russian card (a production card), which variant is intended?
  4. Interference from similar words can be a problem. - For example, the two words угол (“corner”) and уголь (“coal”) are major interferers for me. Yes, the spelling is different and, yes, the pronunciation is somewhat different, but to the ears of a native English speaker, not different enough. So when I see one of them, I may come close, but not be exactly right.

Finally, some cards fail regularly for no discernible reason. For example, I have a vocabulary card for the verb pair полагаться/положиться (“to rely on”). There is no obvious reason why it should fail as often as it does for me. It’s used frequently in speech and writing, but something about that aspect pair in isolation draws a blank for me.

Bolstering vocabulary card performance

Before addressing how to improve card performance, I’ll take a step back and describe a way of finding resistant cards. The simplest approach is to browse the deck of interest in the card browser and sort by lapse count. This is a good start, but I noticed that it also surfaces cards that were resistant at one time but have since improved. If your goal is to rank resistant cards by difficulty in order to prioritize them, raw lapse counts aren’t enough: for any given level of difficulty, the lapse count will be greater for cards that have been around longer. Also, cards with a higher ease factor may be cards that accumulated many lapses early and then stabilized (and thus recovered their ease).

While completely arbitrary, the scoring formula that I’m using is given by:

where l is the number of lapses, d is the number of days since card creation, and f is the ease factor of the card in the format in which it is stored in the database (so 250% = 2.50). The formula is almost entirely arbitrary, except for the idea of scaling the number of lapses by the age of the card to approximate a sort of lapses-per-unit-time measurement. The factor by which the card’s ease influences the score was determined arbitrarily, by finding the smallest integer that gave positive scores across a sample of the most frequently lapsed cards. I haven’t yet come up with a better way of objectively modelling this.
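
As a rough sketch of the idea in Python - a lapse rate scaled by an integer weight, adjusted downward by the card’s ease - where the default weight k is an illustrative placeholder rather than the value I actually settled on:

```python
def difficulty_score(lapses: int, days: int, ease: float, k: int = 100) -> float:
    """Score a card's resistance to learning.

    lapses - lifetime lapse count (l)
    days   - days since the card was created (d)
    ease   - ease factor, e.g. 2.50 for 250% (f)
    k      - weight on the lapse rate; in principle, the smallest
             integer giving positive scores across a sample of
             high-lapse cards (the default here is illustrative)
    """
    lapse_rate = lapses / max(days, 1)  # approximate lapses per day
    return k * lapse_rate - ease        # higher score = more resistant
```

With this shape, a card with 20 lapses over a year outranks one with 10 lapses over the same period, and a young card with the same lapse count outranks an older one.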

To implement this formula, I exported all cards with more than 10 lapses to a CSV file and opened it in Numbers. (Excel will work too.) After sorting by this score, I was able to get a better idea of where my priorities should lie. I was also able to make a more educated guess as to why a given card was failing and implement strategies on a per-card (or per-note) basis.
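
The export step can also be scripted directly against the collection database. This is a sketch assuming the standard Anki 2.1 schema, in which a card’s id is its creation time in epoch milliseconds and the factor column stores ease as permille (2500 = 250%):

```python
import csv
import sqlite3
import time


def export_lapsed_cards(db_path: str, csv_path: str, min_lapses: int = 10) -> None:
    """Export cards with more than `min_lapses` lapses to a CSV file.

    Assumes the standard Anki 2.1 schema: card id = creation time in
    epoch milliseconds; `factor` = ease as permille (2500 = 250%).
    Work on a copy of the collection, not the live database.
    """
    con = sqlite3.connect(db_path)
    rows = con.execute(
        "SELECT id, lapses, factor FROM cards WHERE lapses > ?", (min_lapses,)
    ).fetchall()
    con.close()

    now = time.time()
    with open(csv_path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["card_id", "lapses", "age_days", "ease"])
        for cid, lapses, factor in rows:
            age_days = max((now - cid / 1000) / 86400, 1)  # card id -> age
            writer.writerow([cid, lapses, round(age_days, 1), factor / 1000])
```

The resulting CSV opens directly in Numbers or Excel, where the score column can be computed and sorted.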

Strategies for improving performance

  1. Ensure that there is at least one sample sentence card, preferably more. - One of the lessons I’ve learned through years of using Anki for language learning is that context matters a lot. So before I do anything else, I look through my sentence decks for any cards that contain the word in question. This in itself can be complicated in a highly inflected language like Russian, so I have a lemmas field on each sentence card that captures all of the relevant lemmas (uninflected root word forms) in the sentence. To lemmatize my several thousand sentences initially, I created a custom script that employs the natural language processing module stanza to extract all of the lemmas and load them into the lemmas field.1
  2. Add a synonyms field to straight vocabulary cards. - The mapping between concepts in two languages, like Russian and English, is not 1:1, so several Russian words may match an English prompt on a card. This ambiguity can be resolved by adding a field for synonyms on the card, so that you can at least see what the answer is not.
  3. Reduce the ambiguity of image cards. - Since some images can be ambiguous, clarifying additions such as comments, arrows, and other callouts can add enough specificity to make the card more useful.
  4. Add monolingual definition cards. - To make words appear more often and in a different form, I’ve added a third card type that displays the Russian-language definition of the Russian word. This adds memorability by piling on a little more context, albeit distributed in time.
  5. Add even more sentences that you generate yourself. - Sentences that are personally relevant or have other “interestingness” hooks2 can be valuable here. I use DeepL for translating English expressions into Russian.3
  6. Add audio-only sentence cards. - Create cards where the prompt is a spoken Russian sentence that employs the word in question. You can use the AwesomeTTS add-on in Anki to generate the audio for these cards. Several text-to-speech providers with natural-sounding voices are available. I use Microsoft Azure and Google Cloud TTS in addition to the built-in macOS text-to-speech system.
  7. Make certain that your language-learning “diet” is as diverse as possible. - Since it is unreasonable to expect that the only time you’re going to encounter a word is inside of Anki, don’t rely solely on Anki. A language is much more than the sum of all of its words. Setting aside ample time for reading, listening to podcasts, reading fiction, reading news, speaking and so forth is more important than any intervention that you can implement inside of Anki.
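
The lemmatization step in strategy 1 can be sketched with stanza. My actual script is too entangled with my card layout to be generally useful, so the following is only a minimal illustration; format_lemma_field is a made-up helper for joining the lemmas into the card field:

```python
def format_lemma_field(lemmas) -> str:
    """Join unique lemmas, preserving order, for the card's lemmas field."""
    seen = []
    for lemma in lemmas:
        if lemma and lemma not in seen:
            seen.append(lemma)
    return " ".join(seen)


def lemmatize_sentence(text: str):
    """Extract lemmas from a Russian sentence with stanza.

    Requires `pip install stanza` and a one-time stanza.download("ru");
    imported lazily so the rest of the module works without it.
    """
    import stanza

    nlp = stanza.Pipeline("ru", processors="tokenize,pos,lemma", verbose=False)
    doc = nlp(text)
    return [word.lemma for sent in doc.sentences for word in sent.words]
```

In practice, the script loads each sentence note, calls lemmatize_sentence on its text, and writes format_lemma_field(...) back into the lemmas field.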

References

  1. Anki / Spaced Repetition Tip: Review your Weak Flashcards - A similar post on refactoring weak Anki cards, how to detect weak cards, etc.

  1. I would post the script here but it is so deeply specific to my card setup that I can’t imagine it would be very useful to anyone. But feel free to contact me if you are interested. ↩︎

  2. This is one of the principles used in the Method of Loci for enhancing memory. The more raw, vivid, and odd the thing or situation being described, the stickier it is. ↩︎

  3. I’ve found that DeepL translations are more natural-sounding than those from Google Translate and also allow you to choose alternate words in the resulting sentence. ↩︎

Parsing Russian Wiktionary content using XPath

As readers of this blog know, I’m an avid user of Anki to learn Russian. I have a number of sources for reference content that goes onto my Anki cards. Notably, I use Wiktionary to get word definitions and the word with the proper syllabic stress marked. (This is an aid to pronunciation for Russian language learners.) Since I’m lazy to the core, I came up with a systematic way of grabbing the stress-marked word from the Wiktionary page using lxml and XPath.
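
The extraction boils down to parsing the page and applying an XPath query with lxml. The snippet below runs against a simplified HTML fragment; the XPath needed for the live Russian Wiktionary markup is more involved and depends on the page structure:

```python
from lxml import html


def first_bold_headword(page_html: str) -> str:
    """Return the first bold headword from an HTML fragment.

    A simplified stand-in for the real extraction: Wiktionary renders
    the stress-marked headword in bold near the top of the entry, but
    the exact XPath depends on the live page's structure.
    """
    matches = html.fromstring(page_html).xpath("//b/text()")
    return matches[0] if matches else ""
```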

Being grateful for those who push our buttons

“We need people to push our buttons, otherwise how are we to know what buttons we have?” (Jetsunma Tenzin Palmo, Ten Percent Happier podcast, February 8, 2021.) Jetsunma Tenzin Palmo is a Buddhist nun interviewed on the excellent Ten Percent Happier podcast. It’s always possible to reframe situations where someone “pushes our buttons” as opportunities to better understand that there are these buttons, these sensitivities that otherwise evade our awareness.

Directly setting an Anki card's interval in the sqlite3 database

It’s always best to let Anki set intervals according to its view of your performance on testing. That said, there are times when directly altering the interval makes sense. For example, to build out a complete representation of the entire Russian National Corpus, I’m forced to enter vocabulary terms that should be obvious to even elementary Russian learners but which aren’t yet in my nearly 24,000-card database. Therefore, I’m entering these cards gradually.

Where the power lies in 2021

From a recent article on the BBC Russian Service: Блокировка уходящего президента США в “Твиттере” и “Фейсбуке” привела к необычной ситуации: теоретически Трамп еще может начать ядерную войну, но не может написать твит. “Blocking the outgoing U.S. President from Twitter and Facebook has led to an unusual situation: theoretically, Trump can still start a nuclear war, but cannot write a tweet.” In only a week, he won’t be able to do either.

More on integrating Hazel and DEVONthink

Since DEVONthink is my primary knowledge-management and repository tool on the macOS desktop, I constantly work with mechanisms for efficiently getting data into and out of it. I previously wrote about using Hazel and DEVONthink together. This post extends those ideas and looks into options for preprocessing documents in Hazel before importing them into DEVONthink, as a way of sidestepping some of the limitations of Smart Rules in the latter. I’m going to work from a particular use-case to illustrate some of the options.

Undoing the Anki new card custom study limit

Recently I hit an extra digit when setting up a custom new card session and was stuck with hundreds of new cards to review. Desperate to fix this, I started poking around the Anki collection SQLite database and found the collection data responsible for the extra cards. In the col table, find the newToday key and you’ll find the extra card count expressed as a negative integer. Just change that to zero and you’ll be good.
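
Scripted, the fix looks roughly like this - assuming the older Anki schema in which the col table’s decks column holds per-deck state as a JSON blob, with newToday stored as a [day, count] pair:

```python
import json


def reset_new_today(decks_json: str) -> str:
    """Zero out any negative newToday counts in a col.decks JSON blob.

    Each deck dict holds "newToday" as [day_number, count]; a negative
    count is what produces the flood of extra new cards.
    """
    decks = json.loads(decks_json)
    for deck in decks.values():
        day, count = deck.get("newToday", [0, 0])
        if count < 0:
            deck["newToday"] = [day, 0]
    return json.dumps(decks)
```

After writing the modified JSON back to col.decks (with Anki closed, and on a backed-up collection), the new-card queue returns to normal.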

Copy Zettel as link in DEVONthink

Following up on my recent article on cleaning up Zettelkasten WikiLinks in DEVONthink, here’s another script, this one to solve the problem of linking notes. Backing up to the problem: in the Zettelkasten (or archive), Zettel (or notes) are stored as a list of Markdown files. But what happens when I want to add a link to another note in the one I’m writing? Since DEVONthink recognizes WikiLinks, I can just start typing, but then I have to remember the exact date so that I can pick the item out of the contextual list that DEVONthink offers as links.

Cleaning up Zettelkasten WikiLinks in DEVONthink Pro

Organizing and reorganizing knowledge is one of my seemingly endless tasks. For years, I’ve used DEVONthink as my primary knowledge repository. Recently, though, I began to lament the fact that while I seemed to be collecting and storing knowledge in raw form in DEVONthink, I wasn’t really processing and engaging with it intellectually.1 In other words, I found myself collecting content but not really synthesizing, personalizing, and using it. While researching note-taking systems in search of a better way to process and absorb the information I had been collecting, I discovered the Zettelkasten method.