In my perpetual effort to get out of work, I’ve developed a suite of automation tools to help file statements that I download from banks, credit cards and others. While my setup described here is tuned to my specific needs, any of the ideas should be adaptable for your particular circumstances. For the purposes of this post, I’m going to assume you already have Hazel. None of what follows will be of much use to you without it.
In DEVONthink, I tag a lot. It’s an integral part of my strategy for finding things in my paperless environment. As I wrote about previously hierarchical tags are a big part of my organizational system in DEVONthink. For many years, I tagged subject matter with tags that emmanate from a single tag named topic_, but it was really an unnecessary top-level complication. So, the first item on my to-do list was to get rid of the all tags with a topic_ first level.
I have written previously about stripping syllabic stress marks from Russian text using a Perl-based regex tool. But I needed a means of doing in solely in Python, so this just extends that idea.
#!/usr/bin/env python3 def strip_stress_marks(text: str) -> str: b = text.encode('utf-8') # correct error where latin accented ó is used b = b.replace(b'\xc3\xb3', b'\xd0\xbe') # correct error where latin accented á is used b = b.replace(b'\xc3\xa1', b'\xd0\xb0') # correct error where latin accented é is used b = b.
For one-off projects that target Anki collections, I often use Python in a standalone application rather than an Anki add-on. Since I’m not going to distribute these little creations that are specific to my own needs, there’s no reason to create an add-on. These are just a few notes - nothing comprehensive - on the process.
One thing to be aware of is that there must be a perfect match between the Anki major and minor version numbers for the Python anki module to work.
This may be obvious to some, but visually-recognizing character encoding at a glance is not always obvious.
For example, pronunciation files downloaded form Forvo have the following appearance:
pronunciation_ru_оÑбÑвание.mp3
How can we extact the actual word from this gibberish? Optimally, the filename should reflect that actual word uttered in the pronunciation file, after all.
Step 1 - Extracting the interesting bits The gibberish begins after the pronunciation_ru_ and ends before the file extension.
I’ve written a lot about applying and removing syllabic stress marks in Russian text because I use it a lot when making Anki cards.
This iteration is a command line tool for applying the stress mark at a particular character index. The advantage of these little shell tools is that they can be composable, integrating into different tools as the need arises.
#!/usr/local/bin/zsh while getopts i:w: flag do case "${flag}" in i) index=${OPTARG};; w) word=${OPTARG};; esac done if [ $word ]; then temp=$word else read temp fi outword="" for (( i=0; i<${#temp}; i++ )); do thischar="${temp:$i:1}" if [ $i -eq $index ]; then thischar=$(echo $thischar | perl -C -pe 's/(.
Introducing sterilize-ng [GitHub link] - a URL sterilizer made to work flexibily on the command line.
Background The surveillance capitalist economy is built on the relentless tracking of users. Imagine going about town running errands but everywhere you go, someone is quietly following you. When you pop into the grocery, they examine your receipt. They look into the bags to see what you bought. Then they hop in the car with you and keep careful records of where you go, how fast you drive, whom you talk with on the phone.
One of the things that I love about Keyboard Maestro is the ability to chain together disparate technologies to achieve some automation goal on macOS.
In most of my previous posts about Keyboard Maestro macros, I’ve used Python or shell scripts, but I decided to draw on some decades-old experience with Perl to do a little text processing for a specific need.
Background I want this text from Wiktionary:
to look like this:
Russian text intended for learners sometimes contains marks that indicate the syllabic stress. It is usually rendered as a vowel + a combining diacritical mark, typically the combining acute accent \u301. Here are a couple ways of stripping these marks on the command line:
First is a version using Perl
#!/bin/bash f='покупа́ешья́'; echo $f | perl -C -pe 's/\x{301}//g;' And then another using the sd tool:
#!/bin/bash f='покупа́ешья́'; echo $f | sd "\u0301" "" Both rely on finding the combining diacritical mark and removing it with regex.
It seems like the command line is one of those places where you can accomplish crazy efficient things with one-liners.
Here’s a perfect use case for a CLI one-liner:
In Anki, I often add lists of synonyms and antonyms to my vocabulary cards, but I like them formatted as a bulleted list. My usual route to that involves Markdown. But how to convert this:
известный, точный, определённый, достоверный
to
известный
- точный
- определённый
- достоверный
After trying to come up with a single text replacement strategy to make this work, the best I could do was this: