Regex to match a cloze

Anki and some other platforms use a particular format to signify cloze deletions in flashcard text. A cloze deletion looks like either of the following:

  • {{c1::dog::}}
  • {{c2::dog::domestic canine}}

Here’s a regular expression that matches cloze deletions in an arbitrary string and captures only the main clozed word (in this case, dog).

{{c\d::(.*?)(::[^:]+)?}}

Here it is in action in a Python script:

import re

def stripCloze(searchText):
    return re.sub(r'{{c\d::(.*?)(::[^:]+)?}}', r"\1", searchText)

print(stripCloze("The {{c1::passengers::tourist riders}} spotted a breaching {{c2::whale}}."))

It should print: The passengers spotted a breaching whale.
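The pattern above accepts only a single-digit cloze number and a hint without colons. A slightly more forgiving variant might look like the sketch below. The name strip_cloze and the looser pattern are my own assumptions for illustration, not something Anki itself defines.

```python
import re

# Hypothetical, more permissive pattern (an assumption, not Anki's own definition):
#   c\d+          allows cloze numbers of any length (c1, c12, ...)
#   (.*?)         lazily captures the clozed text
#   (?:::.*?)?    optionally skips a hint, which may be empty or contain single colons
CLOZE_PATTERN = re.compile(r"\{\{c\d+::(.*?)(?:::.*?)?\}\}")

def strip_cloze(search_text):
    """Replace every cloze deletion with just its clozed text."""
    return CLOZE_PATTERN.sub(r"\1", search_text)

print(strip_cloze("The {{c1::passengers::tourist riders}} spotted a breaching {{c12::whale}}."))
```

Like the original, this does not handle clozes that span multiple lines or clozed text that itself contains a double colon.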

Removing stress marks from Russian text

Previously, I wrote about adding syllabic stress marks to Russian text. Here’s a method for doing the opposite - that is, removing such marks (ударение) from Russian text.

Although there may well be a more sophisticated approach, regex is well-suited to this task. The problem is that

def string_replace(mapping, text):
   # Replace each stressed vowel (vowel + combining acute accent) with the bare vowel
   for stressed, plain in mapping.items():
      text = text.replace(stressed, plain)
   return text

# Named stress_map rather than dict to avoid shadowing the built-in type
stress_map = { "а́" : "а", "е́" : "е", "о́" : "о", "у́" : "у",
      "я́" : "я", "ю́" : "ю", "ы́" : "ы", "и́" : "и",
      "ё́" : "ё", "А́" : "А", "Е́" : "Е", "О́" : "О",
      "У́" : "У", "Я́" : "Я", "Ю́" : "Ю", "Ы́" : "Ы",
      "И́" : "И", "Э́" : "Э", "э́" : "э"
   }

print(string_replace(stress_map, "Существи́тельные в шве́дском обычно де́лятся на пять склоне́ний."))

This should print: Существительные в шведском обычно делятся на пять склонений.
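If the stress marks are encoded as the combining acute accent (U+0301) following the vowel, which is an assumption about the input text, then a shorter sketch is to delete that one code point. The function name remove_stress is hypothetical.

```python
COMBINING_ACUTE = "\u0301"  # the combining acute accent used as a stress mark

def remove_stress(text):
    """Drop every combining acute accent, leaving the base letters untouched.

    No Unicode normalization is applied, so precomposed letters such as
    ё and й keep their own diacritics.
    """
    return text.replace(COMBINING_ACUTE, "")

print(remove_stress("шве\u0301дском"))  # prints шведском
```

This would also strip acute accents from any non-Russian text in the same string, so the explicit mapping above is the safer choice for mixed-language input.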

"Delete any app that makes money off your attention."

While listening to Cal Newport being interviewed on a recent podcast, I heard something that resonated. I’m probably paraphrasing, but a key piece of advice was: “Delete any app that makes money off your attention.”

Seems like really good advice. A smartphone is a collection of tools embedded in a tool. Use it like a tool and not an entertainment device and you’ll be fine. For a while, in an effort to pry myself loose from the psychic hold of the smartphone, I went back to using an old flip phone. But I realized that I had gone too far. So much of our communication is via text now that it was really hard to communicate. Now, though I’m back to using a smartphone, I find that I’m much more careful about what I install on it:

URL-encoding URLs in AppleScript

The AppleScript Safari API is apparently quite finicky and rejects Russian Cyrillic characters when loading URLs.

For example, the URL https://en.wiktionary.org/wiki/стоять#Russian throws an error in AppleScript. Instead, Safari requires URLs of the form https://en.wiktionary.org/wiki/%D1%81%D1%82%D0%BE%D1%8F%D1%82%D1%8C#Russian, whereas Chrome happily consumes whatever comes along. So we just need to encode the URL thusly:

use framework "Foundation"

-- encode Cyrillic text as "%D0" type strings
on urlEncode(input)
   tell current application's NSString to set rawUrl to stringWithString_(input)
   -- 4 is NSUTF8StringEncoding
   set theEncodedURL to rawUrl's stringByAddingPercentEscapesUsingEncoding:4 
   return theEncodedURL as Unicode text
end urlEncode

When researching Russian words for vocabulary study, I use the URL encoding handler to load the appropriate words into several reference sites in sequential Safari tabs.
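For comparison, the same kind of percent-encoding can be sketched in Python with the standard library. This is not part of the AppleScript workflow. The function name url_encode simply mirrors the handler above, and the choice of which characters to leave untouched is my own assumption.

```python
from urllib.parse import quote, urlsplit, urlunsplit

def url_encode(url):
    """Percent-encode the non-ASCII parts of a URL as UTF-8.

    "%" is treated as safe so that an already-encoded URL is not encoded twice
    (an assumption that suits this use case, not a general rule).
    """
    parts = urlsplit(url)
    return urlunsplit((
        parts.scheme,
        parts.netloc,
        quote(parts.path, safe="/%"),
        quote(parts.query, safe="=&%"),
        quote(parts.fragment, safe="%"),
    ))

print(url_encode("https://en.wiktionary.org/wiki/стоять#Russian"))
```

The host name is passed through unchanged, so a non-ASCII domain would need separate handling.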

Consume media outside one's bubble?

That “reality bubbles” contribute heavily to increasing political polarization is well-known. Customized media diets at scale and social media feeds that are tailored to individual proclivities progressively narrow our understanding of perspectives other than our own. Yet, the cures are difficult and uncertain. Often, though, we’re advised to consume media from the other side of the political divide.

A sentence from a recent piece in The Atlantic encapsulates why I think this is such a fraught idea:

Свидетельство того или тому?

I was puzzled by this sentence on the BBC Russian Service:

Нет свидетельств тому, что на нынешних выборах дело обстоит иначе.

ББС
  Мошенничество на выборах в США? Проверяем факты в речи Трампа

It means “There is no evidence that in the current election things are any different.” The puzzle isn’t the meaning, though; it’s the grammatical case of the demonstrative pronoun то, which appears here in the dative: тому. You see examples where either the genitive or the dative follows свидетельство. So what’s the difference?

Escaping "Anki hell" by direct manipulation of the Anki sqlite3 database

There’s a phenomenon that veteran Anki users are familiar with - the so-called “Anki hell” or “ease hell.”

Origins of ease hell

The descent into ease hell has to do with the way Anki handles correct and incorrect answers when it presents cards for review. Ease is a numerical score associated with every card in the database and represents a valuation of the difficulty of the card. By default, when cards graduate from the learning phase, an ease of 250% is applied to the card. If you continue to get the card correct, then the ease remains at 250% in perpetuity. As you see the card at its increasing intervals, the ease will remain the same. All good. Sort of.
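To get a feel for what a constant 250% ease means, here is a rough sketch. It makes the simplifying assumption that each correct answer multiplies the previous interval by the ease. Anki's real scheduler applies further adjustments, and the function name project_intervals is hypothetical.

```python
def project_intervals(first_interval_days, ease_percent, reviews):
    """Project review intervals under a constant ease.

    Simplified model (an assumption): next interval = previous interval * ease.
    """
    ease = ease_percent / 100
    intervals = []
    interval = float(first_interval_days)
    for _ in range(reviews):
        intervals.append(interval)
        interval *= ease
    return intervals

print(project_intervals(1, 250, 5))  # [1.0, 2.5, 6.25, 15.625, 39.0625]
```

Under this simplified model, a card that starts at a one-day interval is more than a month out after five correct reviews, which is why a lower ease makes such a difference to the workload.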

Typing Russian stress marks on macOS

While Russian text intended for native speakers doesn’t show accented vowel characters to point out the syllabic stress (ударение), texts intended for learners often do have these marks. But how to apply these marks when typing?

Typically, for Latin keyboards on macOS, you can hold down a key (like a long-press on iOS) and a popup dialog will show you options for that character. But on the standard Russian phonetic keyboard it doesn’t work. Hold down the e key and you’ll get the option for the letter ё (yes, it’s regarded as a separate letter in Russian - the essential but misbegotten ё).

Stripping surveillance parameters from Facebook and Google links

While largely opaque to most users, Facebook and Google massage any links that you acquire on their sites to include data used to track you around the web. This script attempts to strip these surveillance parameters from the URLs. It is by no means all-inclusive. Presumably there are links that I haven’t yet encountered and that will need to be considered in a future version. So consider this a proof of concept.
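The general idea can be sketched in a few lines of Python. This is only an illustration and not the script described in this post. The parameter names fbclid, gclid and the utm_ prefix are my assumptions about common tracking parameters, and strip_tracking is a hypothetical name.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed examples of tracking parameters; a real list would be longer.
TRACKING_KEYS = {"fbclid", "gclid"}
TRACKING_PREFIXES = ("utm_",)

def strip_tracking(url):
    """Return the URL with known tracking query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_KEYS and not key.startswith(TRACKING_PREFIXES)
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))

print(strip_tracking("https://example.com/page?id=7&fbclid=abc123&utm_source=news"))
# prints https://example.com/page?id=7
```

Redirect-style links, where the real destination is itself wrapped inside a query parameter, would need an extra unwrapping step that this sketch does not attempt.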

The problem

For example, I performed a Google search for “Smarties”. Inspecting the first link, to Wikipedia, I see:

Predictions 2021

Predictions for 2021

Humans are notoriously poor at assigning probabilities to events, even those that are highly relevant to their daily lives. This year I’m making a deliberate attempt to calibrate my prediction abilities by correlating predictions with reality. The judgments of truth of these outcomes will be made on December 31, 2021, although some of the outcomes will have been decided substantially in advance of that.
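One way the end-of-year scoring could be done is sketched below, using a Brier score plus a simple per-probability tally. The function names are hypothetical and the sample outcomes are made-up placeholders, not actual results.

```python
from collections import defaultdict

def brier_score(results):
    """Mean squared difference between stated probability and outcome (0 is perfect)."""
    return sum((p - (1.0 if happened else 0.0)) ** 2 for p, happened in results) / len(results)

def calibration_table(results):
    """Map each stated probability to the fraction of those predictions that came true."""
    buckets = defaultdict(list)
    for p, happened in results:
        buckets[p].append(happened)
    return {p: sum(outcomes) / len(outcomes) for p, outcomes in sorted(buckets.items())}

# Placeholder data only: (stated probability, did it happen?)
sample = [(0.8, True), (0.8, False), (0.2, False), (0.6, True)]
print(brier_score(sample))
print(calibration_table(sample))
```

A well-calibrated forecaster would see roughly 80% of the 80% predictions come true, and so on for each bucket. With only a handful of predictions per bucket, the observed fractions will be noisy.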

Coronavirus

  1. An effective vaccine will be widely available in Canada: 70%.
  2. I will have received a coronavirus vaccine: 65%
  3. I will have personally contracted coronavirus infection: 20%
  4. Someone in my household will have contracted coronavirus: 20%
  5. Schools in London-Middlesex will close due to coronavirus outbreak: 30%
  6. U.S. deaths from COVID-19 > 300,000: 60%
  7. YAPCA will resume in-person activities before end of term because of lifting coronavirus restrictions: 15%
  8. Violin lessons will resume in-person before the end of term because of lifting coronavirus restrictions: 20%
  9. Daily case counts exceed 30 on any day in 2021 for London-Middlesex: 50%.

Politics

  1. Joe Biden will be elected to the U.S. Presidency: 80%
  2. Donald Trump will officially concede the election if he is defeated: 10%
  3. The U.S. Senate will change to Democratic control: 60%
  4. The U.S. House of Representatives will remain in Democratic control: 99%
  5. Joe Biden will die or become impaired in office: 10%
  6. Florida’s electoral votes go to Biden: 45%
  7. Michigan’s electoral votes go to Biden: 50%
  8. Pennsylvania’s electoral votes go to Biden: 60%
  9. Ohio’s electoral votes go to Biden: 20%
  10. Wisconsin’s electoral votes go to Biden: 40%
  11. Arizona’s electoral votes go to Biden: 55%
  12. Lindsey Graham is defeated: 30%
  13. Mitch McConnell is defeated: 10%
  14. Susan Collins is defeated: 45%
  15. Results of election are known by November 5, 2020: 60%
  16. Donald Trump attends the Inauguration ceremonies: 20%
  17. Boris Johnson is still UK PM: 60%
  18. Justin Trudeau is still Canadian PM: 70%
  19. Queen Elizabeth dies: 20%
  20. Prince Philip dies: 30%
  21. Roe v. Wade is overturned: 10%
  22. Coney Barrett is confirmed: 100%

Family

  1. [redacted]: 70%
  2. [redacted]: 50%
  3. [redacted]: 60%
  4. [redacted]: 80%
  5. [redacted]: 30%
  6. [redacted]: 50%
  7. We own a third dog: 25%
  8. [redacted]: 20%
  9. [redacted]: 20%
  10. Any member of our immediate family household travels on an airliner: 40%
  11. Audra has a new car: 25%
  12. [redacted]: 20%
  13. Interlochen holds in-person summer camp: 40%

Russian

  1. I complete Anki reviews on 100% of days: 70%
  2. I complete Anki reviews on at least 80% of days: 80%
  3. My tutor-rated speaking ability is improved by at least 25% on a 0-10 scale: 70%
  4. I’ve read at least 6 short stories in Russian: 25%
  5. I do prosody practice on at least 50% of days: 10%

Writing

  1. I write more than 5 articles on Suzuki Experience: 40%
  2. I write more than 12 articles on Ojisanseiuichi.com: 60%

Technology/Economy

  1. I purchase a new laptop: 15%
  2. I purchase a new cell phone: 10%
  3. I set up a VPN for privacy purposes: 65%
  4. I cancel my Facebook account: 20%
  5. I check Facebook less than twice a day on 80% of days: 90%
  6. I resume using Instagram: 20%
  7. I’m using a text editor other than Sublime or Atom: 50%
  8. I unblock Twitter: 10%
  9. DJIA closes above 30,000: 60%
  10. I update to new major macOS version: 60%

Personal

  1. I work out on at least 80% of days: 20%
  2. I work out on at least 50% of days: 40%
  3. I work out on at least 25% of days: 50%
  4. I take an SSRI or related medication: 30%
  5. [redacted]: 60%
  6. I sit zazen on at least 80% of days: 10%
  7. I sit zazen on at least 50% of days: 30%
  8. I sit zazen on at least 25% of days: 40%
  9. I write 2021 goals: 95%
  10. I complete all 2021 goals: 10%
  11. I complete more than 50% of 2021 goals: 50%
  12. We begin kitchen renovation: 15%
  13. [redacted]: 60%
  14. I read more than 10 books: 20%
  15. I read more than 5 books: 90%
  16. I read more than 4 novels: 15%
  17. I travel anywhere on an airliner: 10%
  18. I install radio transceiver in back lock: 60%
  19. I install USB charger outlet behind office cabinet: 25%
  20. I can play Rachmaninoff partita transcription from memory: 30%