Welcome to Part III of a deep dive into my Anki language learning decks. In Part I I covered the principles that guide how I set up my decks and the overall deck structure. In the lengthy Part II I delved into my vocabulary deck. In this installment, Part III, we’ll cover my sentence decks.
Principles
First, sentences (and still larger units of language) should eventually take precedence in language study. What help is it to know the word for “tomato” in your L2 if you don’t know how to slice a tomato, how to eat a tomato, how to grow a tomato plant? Focusing on larger units of language increases your success rate in integrating vocabulary into daily use.
Second, I don’t want sentence learning to require a lot of extra effort. If I’m learning a new word, I don’t want to have to create a separate sentence card while I’m making a vocabulary card. (Fortunately I have a solution for that!)
Types of sentence cards
Cloze deletion cards
My sentence cards almost all have a cloze deletion format.
Above is the front side of a typical cloze deletion card. The card is asking to recall the bracketed word(s) and the back will reveal them.
On the back, we reveal the clozed text and also expose any notes, such as usage notes, alternative translations and so forth. And that’s the essence of the cloze deletion card.
Two methods of generating cloze deletion cards
There are two ways that cloze deletion cards come about. The standard method relies on Anki’s built-in cloze mechanism. The other uses a script, Anki Cloze Anything, that simulates a cloze card. The outcome looks the same but the generation mechanism differs.
Straight cloze deletion cards
My straight cloze deletion cards use a version of the built-in cloze deletion note type. I’ve added several fields that are specific to my purposes, but it’s basically a cloze deletion note. To designate a block of text to be clozed out, the format looks like this:
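For example, a sentence field using the built-in markup might look like this (an illustrative sentence, not one of my actual cards):

Я забыл {{c1::нарезать}} помидоры.

The c1 labels the deletion; Anki hides the text after the double colon on the front of the card and reveals it on the back. An optional hint can follow a second pair of colons, as in {{c1::нарезать::to slice}}.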
Anki Cloze Anything cards
I hinted at this idea when I wrote about vocabulary cards in Part II. The problem that this solves is the need to use a cloze deletion note type in order to access cloze deletion functionality. Since Anki notes are capable of generating multiple different card types, this seems an unnecessary distinction. Anki Cloze Anything (ACA) is a JavaScript-based solution that lets you bypass this limitation by simulating a cloze deletion card in any standard note type.
In this way, I can add a sentence cloze deletion inside my standard vocabulary cards.
Note that the format that ACA uses is slightly different from the built-in cloze. Instead of curly braces, it uses parentheses. But the outcome is the same; the resulting cards look exactly like a built-in cloze deletion card!
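For comparison, the same sentence marked up for ACA would look something like this — the whole field is wrapped in backticks and the deletions use doubled parentheses (this is exactly the markup that the cleanup script below strips out):

`Я забыл ((c1::нарезать)) помидоры.`

As with the built-in syntax, an optional hint can follow a second pair of colons inside the parentheses.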
There are two important issues with the ACA mechanism: the appearance of the sentence on non-cloze cards, and the pronunciation of the sentence when using AwesomeTTS. Both can be solved, but they require some work.
ACA sentences and display on non-cloze cards
If you try to display an example sentence that has ACA markup on a non-cloze card, it shows the markup still in place:
Fortunately, we can solve this easily by applying some text manipulation using a regular expression in JavaScript:
/*
  _fix_cloze_anything_example_sentence.js
  2022-06-04

  On any card that shows an example sentence that has
  Cloze Anything markup and is in a span
  that has rusentence class, strip that markup from it.
*/
function fix_cloze_anything_example_sentence() {
    document.querySelectorAll('span.rusentence').forEach((el, idx) => {
        let text = el.textContent;
        let re = /\(\(c\d::([^\)\(:]+)(?:::[^\):]+)?\)\)/g;
        text = text.replace(re, "$1");
        text = text.replace(new RegExp('`', 'g'), "");
        el.textContent = text;
    });
}
When this is wrapped in a <script></script> block in the card template, the marked-up sentence will appear normal.
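In other words, the card template ends up with something like this sketch, where the body of the function is the script above:

<script>
    // definition of fix_cloze_anything_example_sentence() goes here ...
    fix_cloze_anything_example_sentence();
</script>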
ACA sentences and AwesomeTTS
The harder problem to solve is with TTS. The pronunciation file seems to be generated before the Anki Cloze Anything script has a chance to process and strip the markup. As a result, AwesomeTTS pronounces the sentence with the markup in place. Needless to say, that won’t work.
The only solution I’ve found is to create a separate copy of the sentence without the ACA markup and use that for the pronunciation. Not ideal, but I’ve written a Keyboard Maestro macro that ingests the original marked-up sentence, removes the formatting, and pastes it into a dedicated pronounceable field on my template. It could be worse…
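For what it’s worth, the cleanup itself boils down to stripping the ACA markers and backticks; a rough shell approximation of what the macro does (my sketch, not the macro itself) is:

# strip ACA cloze markers and backticks from the sentence on the clipboard
pbpaste | sed -E 's/\(\(c[0-9]+::([^):]+)(::[^)]*)?\)\)/\1/g; s/`//g' | pbcopy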
Parting words on sentences
A few miscellaneous thoughts that have informed my use of sentence cards:
You don’t have to cloze single words. Prepositional phrases, whole clauses, even entire sentences are excellent candidates for cloze deletion, and clozing them pushes you in the direction of larger and larger units of language.
Sentences can come from anywhere. The best source is real life or whatever you’re reading in the target language; that’s where a majority of mine come from. But Tatoeba is a great source of sentences that are verified by native speakers.
Text-to-speech (TTS) is excellent. There are several ways to go about this. I use AwesomeTTS. The developers now are saying that it is being phased out in favour of HyperTTS. As of this writing, I haven’t made the transition.
I’ve begun adding images to my sentence cards. Anything to reinforce the learning using different sensory modalities.
That’s what I have on sentence cards. In the next article in the series, I’ll describe my grammar cards. In the meanwhile, if you would like to contact me about something in this article or any of my Anki-related posts, you can use this contact form.
In Part I of my series on my Anki language-learning setup, I described the philosophy that informs my Anki setup and touched on the deck overview. Now I’ll tackle the largest and most complex deck(s), my vocabulary decks.
First, some FAQs about my vocabulary deck:
Do you organize it as L1 → L2 or as L2 → L1, or both? Actually, it’s both and more. Keep reading.
Do you have separate subdecks by language level, or source, or some other characteristic? No, it’s just a single deck. First, I’m perpetually confused by how subdecks work. I’d rather subdecks just act as organizational, not functional, tools. But other users don’t see it that way. That’s why I use tags rather than subdecks to organize content.1
Do you use frequency lists? No, I extract words from content that I’m reading, that I encounter when listening to movies or podcasts, or words that my tutor mentions in conversation. That’s what goes in Anki.
Since this is a big topic, I’m going to start with a quick overview of the fields in the main note type that populates my vocabulary deck and then go into each one in more detail and how they fit together in each of my many card types. At the very end of the post, I’ll talk about verb cards, which are similar in most ways to the straight vocabulary cards but which account for the complexities of the Russian verbal system.2
Before diving in, just note that I’m on version 2.1.49. For reasons.3
Let’s start with the fields:
Note ID - like it says, the note ID. It exists because I don’t want to uniquely identify notes by the word itself, but by some other factor. Basically I reserve the right to have what Anki would otherwise consider duplicates.
Front - this is the Russian (L2) word
Pronunciation - the link to the audio file of the pronunciation
Back - the English (L1) definition
sentence_ru - an example sentence in Russian (L2)
sentence_en - the English (L1) translation of the sentence_ru
Notes - notes about usage, always in Markdown. More on that later.
Frequency - part of an abandoned idea, just too lazy to remove it.
URL - if I have a link for more content on a word, it goes here. Mostly unused.
Hint - This field is used to help disambiguate synonyms. It’s usually the first two letters + ellipsis.
Synonyms - a list of synonyms
RNC_frequency - the numerical frequency in the Russian National Corpus. If the word doesn’t appear in that corpus, just “NA”
RussianDef - the Russian language definition of the word
recognition_only - if non-empty, this field turns the card into a recognition-only card, meaning only L2 → L1
ExpressionCloze1, ExpressionCloze2, ExpressionCloze3 - these are flags for the Anki Cloze Anything script. More on this in a bit. It’s an important add-on that hardly anyone seems to know about.
image - if the word can be depicted in an image, this is where it goes
@antonyms - like synonyms, but the opposite
pronounce_sentence_ru - complicated to explain. For now, I’ll just say that it’s needed for the AwesomeTTS functionality.
@is_numeral_card - if non-empty, this becomes a card for testing numbers. I’ll discuss that in a future post.
@numeral_text - related to above
@is_sentence_translation_card - another field that’s a little complicated to explain. I’ll discuss it in a future post about my sentence deck.
Since all of my vocabulary cards have pronunciations, a replay button appears on the card so you can hear the audio again. Of course, the audio only plays on the back side of this card. (Otherwise it would give away the answer!) The styling of the replay button is custom because the base styling makes the button absurdly large.
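For reference, the resizing is just a bit of CSS in the card styling, along these lines (the selector matches the replay button in recent Anki versions and may need adjusting for 2.1.49):

/* shrink the default audio replay button */
.replay-button svg {
    width: 24px;
    height: 24px;
}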
L2 → L1 cards
These, of course, display Russian on the front and English on the reverse.
Now onto some of the more interesting card types. Note that all of these card types are built from the same note. They take the field list above and just format them in different ways.
And on the reverse (answer) side:
So far, this is the first time we’ve encountered a card with Notes. Whenever the Notes field is non-empty, we display it. This field is always assumed to contain HTML. But I don’t write out the HTML by hand. Instead, I write it in Markdown and I have a Keyboard Maestro macro that grabs the content from the editor field, transforms the Markdown content into HTML and pastes it into the HTML field editor. I’ve previously written in detail about the process of Generating HTML from Markdown in Anki fields.
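The conversion step itself is the easy part; the macro does something roughly equivalent to this sketch (assuming pandoc is installed — the macro’s actual implementation differs):

# convert Markdown on the clipboard to HTML and put it back on the clipboard
pbpaste | pandoc --from markdown --to html | pbcopy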
While we’re on a digression about Markdown and optional sections on my cards, I should mention two other optional sections: synonyms and antonyms. Not every card has these sections, but when present they look like this:
These sections collapse and expand by clicking on the disclosure triangle.
L2 definition → L2 word
This isn’t quite a true monolingual card, but the intent is the same. On the front side is the L2 definition and on the reverse is the L2 word (along with the L1 meaning and other information.)
The back is a somewhat denser presentation than we’ve seen so far:
One new feature here is the appearance of grammatical information. The way I get this data should probably be its own post, but for now, I’ll describe it at a high level. I run an instance of russian_grammar_server that provides a number of endpoints, one of which is /pos. This API accepts a Russian word and returns part of speech information. The template for this card just makes a call to that server and then formats the response on the card. If for some reason the server is unreachable, we just omit that info.
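Conceptually, the call amounts to something like the line below; the host, port, and query parameter are placeholders for illustration rather than the server’s documented interface:

# hypothetical request to a locally running russian_grammar_server
curl -G "http://localhost:8000/pos" --data-urlencode "word=помидор"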
Although I believe that making your own cards is the best way to acquire a useful vocabulary, not every part of the process is valuable. Time spent extracting and formatting information takes away from time I could spend studying. So my goal has been to automate these processes. Since almost all of these automations are finely tuned to my individual requirements, my hope is that you’ll be able to adopt some of the concepts that I use, if not the exact routines. Again, I use macOS, and the solutions that I employ are heavily dependent on applications that run only on that OS. With those caveats, let’s talk about my process.
Researching words
When I research a new word, I want to know:
the English language definition
the Russian language (monolingual) definition
the pronunciation
an example sentence or two
For each of these elements, I have a single go-to site:
Because each of these sites has a predictable URL pattern for loading words, I can use AppleScript to load the word into four adjacent tabs.
set searchTerm to the clipboard as text
set openRussianURL to "https://en.openrussian.org/ru/" & searchTerm
set wiktionaryURL to "https://en.wiktionary.org/wiki/" & searchTerm & "#Russian"
set forvoURL to "https://forvo.com/search/" & searchTerm & "/ru/"
set ruWiktionaryURL to "https://ru.wiktionary.org/wiki/" & searchTerm

tell application "Safari" to activate

-- load word definitions
tell application "Safari"
    activate
    set i to 0
    set tabList to every tab of window 1
    set tabCount to count of tabList
    repeat tabCount times
        tell window 1
            set i to i + 1
            set textURL to (URL of tab i) as text
            -- load the word in open russian
            if textURL begins with "https://en.openrussian.org" then
                set encodedURL to urlEncode(openRussianURL) of me
                set URL of tab i to encodedURL
            end if
            -- load the word in wiktionary
            if textURL begins with "https://en.wiktionary.org" then
                set URL of tab i to urlEncode(wiktionaryURL) of me
                -- make the wiktionary tab the active tab
                try
                    set current tab of window 1 to tab i
                end try
            end if
            if textURL begins with "https://forvo.com" then
                set URL of tab i to urlEncode(forvoURL) of me
            end if
            if textURL begins with "https://ru.wiktionary.org" then
                set URL of tab i to urlEncode(ruWiktionaryURL) of me
            end if
        end tell
    end repeat
end tell

-- encode Cyrillic text as "%D0" type strings
on urlEncode(input)
    tell current application's NSString to set rawUrl to stringWithString_(input)
    set theEncodedURL to rawUrl's stringByAddingPercentEscapesUsingEncoding:4 -- 4 is NSUTF8StringEncoding
    return theEncodedURL as Unicode text
end urlEncode
I have a Keyboard Maestro macro that responds to ⌃L. For any word, I just copy it to the clipboard with ⌘C and then ⌃L to research. It’s a little more complicated if I encounter a word that’s in an inflected form. Then I have to use Wiktionary or some other source first to find the uninflected form.
The research macro does one more action which is to download the pronunciation file from Forvo. This is a little beyond the scope of what I wanted to present in this post; but I promise to write something about the forvodl tool that I wrote for this purpose.
Extracting word research data into Anki
Most of the heavy lifting here is done again by a Keyboard Maestro macro. All that’s necessary is to copy the word of interest to the clipboard; then, in the Anki new card editor, I activate the macro with ⇧⌃L and the macro takes over. It uses UI navigation to move between fields in the note editor, stopping at each field to extract the relevant piece of information for that field. I’ll walk through some of those “stops” to discuss how I get the data.
Front field - Headword
The first field into which I extract research data is the Front field of the card. That’s the Russian (L2) word. To extract this word, which Wiktionary calls the headword, I use a custom tool I wrote called rheadword. It works by parsing the HTML of the Wiktionary page and extracting the element for the headword.
#!/usr/bin/env python3

from urllib.request import urlopen, Request
import urllib.parse
from random_user_agent.user_agent import UserAgent
from random_user_agent.params import SoftwareName, OperatingSystem, HardwareType
import copy
import re
import sys
from bs4 import BeautifulSoup

__version__ = 0.9

# accept word as either argument or on stdin
try:
    raw_word = sys.argv[1]
except IndexError:
    raw_word = sys.stdin.read()
raw_word = raw_word.replace(" ", "_").strip()
word = urllib.parse.quote(raw_word)
url = f'https://en.wiktionary.org/wiki/{word}#Russian'

hn = [HardwareType.COMPUTER.value]
user_agent_rotator = UserAgent(hardware_types=hn, limit=20)
user_agent = user_agent_rotator.get_random_user_agent()
headers = {'user-agent': user_agent}

try:
    response = urlopen(Request(url, headers=headers))
except urllib.error.HTTPError as e:
    if e.code == 404:
        print("Error - no such word")
    else:
        print(f"Error: status {e.code}")
    sys.exit(1)

# first extract the Russian content because
# we may have other languages. This just
# simplifies the parsing for the headword
new_soup = BeautifulSoup('', 'html.parser')
soup = BeautifulSoup(response.read(), 'html.parser')
for h2 in soup.find_all('h2'):
    for span in h2.children:
        try:
            if 'Russian' in span['id']:
                new_soup.append(copy.copy(h2))
                # capture everything in the Russian section
                for curr_sibling in h2.next_siblings:
                    if curr_sibling.name == "h2":
                        break
                    else:
                        new_soup.append(copy.copy(curr_sibling))
                break
        except:
            pass

# use the derived soup to pick out the headword from
# the Russian-specific content
headwords = []
for strong in new_soup.find_all('strong'):
    node_lang = strong.get('lang')
    node_class = strong.get('class')
    if node_lang == "ru":
        if "Cyrl" in node_class:
            if "headword" in node_class:
                raw_headword = strong.text
                headwords.append(raw_headword)
try:
    print(headwords[0])
    sys.exit(0)
except SystemExit:
    # this just avoids triggering an exception due
    # to a normal exit
    pass
except IndexError:
    # we didn't find any words
    print("Error")
    sys.exit(1)
The macro simply pastes the output of the script and tabs to the next field.
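Since the script accepts the word either as an argument or on stdin, invoking it (assuming rheadword is on the PATH) looks like this:

rheadword помидоры
echo "помидоры" | rheadword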
Pronunciation field
Since we’ve already downloaded the pronunciation file from Forvo, what’s left is to insert it into the Anki note. Here we fire off a simple shell script that takes care of that.
#!/bin/zsh

USERDIR="/Users/$(whoami)"
APPSUPPORTDIR="$USERDIR/Library/Application Support/Anki2"
COLLDIR="$APPSUPPORTDIR/Alan - Russian"
MEDIARDIR="$COLLDIR/collection.media"

# locate the file we downloaded
FILE="$(ls $HOME/Documents/mp3 | head -1)"

# play it so we can hear
afplay "$HOME/Documents/mp3/$FILE"

# copy it to the collection.media directory
cp "$HOME/Documents/mp3/$FILE" "$MEDIARDIR/$FILE"

# insert the link in the Anki field
echo "[sound:$FILE]"
RussianDef field
Since one of my card types is monolingual (this one), I need to extract the Russian language definition. Again, another script. I use a technique similar to the one presented in this post; but of course the Russian Wiktionary page structure is different from the English version. There are also some interesting subtleties that have to be dealt with. Again, I promise to write about that, too!
Word frequency in Russian National Corpus
The macro now advances to the last field that we automatically complete. That field is the RNC_frequency field. The Russian National Corpus (RNC) is a comprehensive (though seemingly incomplete!) collection of words used in the Russian language. In the post Searching the Russian National Corpus I described the creation of a sqlite3 database of terms from the RNC. Essentially, this step in the macro is just a script that searches that database for the term and fetches the frequency.
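The lookup boils down to a single query; the database, table, and column names in this sketch are hypothetical — the real schema is described in the RNC post:

# hypothetical schema: fetch the corpus frequency for a lemma
sqlite3 rnc.sqlite "SELECT frequency FROM frequencies WHERE lemma = 'помидор';"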
And that ends the process. In all, it takes a couple seconds to run through all of the extractions. The only work left is to identify an example sentence. While I could probably automate the extraction of a sentence from Tatoeba, I’d prefer not to leave it to chance as some sentences are more suitable than others. So here, I do my own search. It’s also my chance to add one additional touch, which involves creating a cloze-type card from the sentence. But I’ll discuss that in the next installment on sentence cards.
Stuff I promised to write in more detail about
Process for extracting Russian language definitions from ru.wiktionary.org
Extracting audio from the Forvo API
Longer term promises
Release a sample deck - when 🤷🏻‍♂️
This isn’t always true as I have some legacy non-vocabulary decks that are organized by source. But there’s a reason for that which I’ll get to in a later post. ↩︎
Since Russian verbs mostly come in aspect pairs… ↩︎
As we go on, you’ll see that many of the scripts I use to efficiently create cards rely on AppleScript UI scripting. The > 2.1.49 updates completely break those features. At some point I will have to go through all of these scripts and update them. But for now, I’m staying with 2.1.49 because I’m too lazy to go through all of that. ↩︎
Although I’ve been writing about Anki for years, it’s been in bits and pieces. Solving little problems. Creating efficiencies. But I realized that I’ve never taken a top-down approach to my Anki language learning system. So consider this post the launch of that overdue effort.
Caveats
A few caveats at the outset:
I’m not a professional language tutor or pedagogue of any sort, really. Much of what I’ve developed, I’ve done through trial-and-error, some intuition, and some reading on relevant topics.
In my perpetual attempt to make my language learning process using Anki more efficient, I’ve written a tool to extract English-language definitions of Russian words from Wiktionary. I wrote about the idea previously in Scraping Russian word definitions from Wikitionary: utility for Anki, but it relied on the WiktionaryParser module, which is good but misses some important edge cases. So I rolled up my sleeves and crafted my own solution. As with WiktionaryParser, the heavy lifting is done by the Beautiful Soup parser.
A few years ago, I wrote about my problems with HTML in Anki fields. If you check out that previous post you’ll get the backstory about my objection.
The gist is this: If you copy something from the web, Anki tries to maintain the formatting. Basically it just pastes the HTML off the clipboard. Supposedly, Anki offers to strip the formatting with Shift-paste, but I’ve pointed out to the developer specific examples where this fails.
I would like to propose a constitutional amendment that prohibits Sen. Ted Cruz (F-TX)1 from speaking or tweeting for seven days after a national tragedy. I’d also be fine with an amendment that prohibits him from speaking ever.
The “F” designation stands for Fascist. The party to which Cruz nominally belongs is more aligned with WW2-era Axis dictatorships than those of a legitimate free civil democracy. ↩︎
I was using a REST API at https://textance.herokuapp.com/title but it seems awfully fragile. Sure enough this morning, the entire application is down. It’s also not open-source and I have no idea who actually runs this thing.
Here’s the solution:
#!/bin/bash
url=$(pbpaste)
curl $url -so - | pup 'meta[property=og:title] attr{content}'

It does require pup. On macOS, you can install via brew install pup.
There are other approaches that use regular expressions and avoid the dependency on pup, but parsing HTML with regex is not such a good idea.
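For completeness, a regex-based version might look like the sketch below; it’s fragile because it assumes the og:title meta tag sits on a single line with its attributes in exactly that order:

url=$(pbpaste)
curl "$url" -so - | sed -nE 's/.*property="og:title" content="([^"]*)".*/\1/p' | head -1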