programming

There are a couple features of the Dataview plugin for Obsidian that aren’t documented and are potentially useful. For the start of the week, use date(sow) and for the end of the week date(eow). Since there’s no documentation as of yet, I’ll venture a guess that they are locale-dependendent. For me (in Canada), sow is Monday. Since I do my weekly notes on Saturday, I have to subtract a couple days to point to them.

Most language learners are familiar with Forvo, a site that allows users to download and contribute pronunciations for words and phrases. For my Russian studies, I make daily use of the site. In fact, to facilitate my Anki card-making workflow, I am a paid user of the Forvo API. But that’s where the trouble started. When the Forvo API works, it works OK, often extremely slow. But lately, it has been down more than up.

Recently, someone asked a question on r/Anki about changing and existing cloze-type note to a regular note. Part of the solution involves stripping the cloze markup from the existing cloze’d field. A cloze sentence has the form Play {{c1::studid}} games. or Play {{c1::stupid::pejorative adj}} games. To handle both of these cases, the following regular expression will work. Just substitute for $1. {{c\d::([^:}]+)(?:::+[^}])}} However, the Cloze Anything markup is different. It uses ( and ) instead of curly braces.

I was using a REST API at https://textance.herokuapp.com/title but it seems awfully fragile. Sure enough this morning, the entire application is down. It’s also not open-source and I have no idea who actually runs this thing. Here’s the solution: #!/bin/bash url=$(pbpaste) curl $url -so - | pup 'meta[property=og:title] attr{content}' It does require pup. On macOS, you can install via brew install pup. There are other ways using regular expressions but no dependency on pup but parsing HTML with regex is not such a good idea.

Making the Hugo → S3 upload process much more efficient by tracking file hashes.

Dealing with PUNCT nodes in interlinear glossing.

Still thinking about interlinear glossing for my language learning project. The leizig.js library is great but my use case isn’t really what the author had in mind. I really just need to display a unit consisting of the word as it appears in the text, the lemma for that word form, and (possibly) the part of speech. For academic linguistics purposes, what I have in mind is completely non-standard. The other issue with leizig.

leipzig.js is a library for applying interlinear gloss to texts for linguistic analysis. In this post, I experiment a little with this libary to evaluate whether it would work for a little project of mine.

Starting a new devlog about Hedghog, a new language learning app and some thoughts about the interlinear display of lemmas.

Splitting text into sentences is one of those tasks that looks simple but on closer inspection is more difficult than you think. A common approach is to use regular expressions to divide up the text on punction marks. But without adding layers of complexity, that method fails on some sentences. This is a method using spaCy.

programming

Week functions in Dataview plugin for Obsidian

Scraping Forvo pronunciations

A regex to remove Anki's cloze markup

Extracting title title of a web page from the command line

Hugo static site upload woes and a way forward

Interlinear glossing dealing with punctuation

Three-line (though non-standard) interlinear glossing

Experimenting with leipzig.js for interlinear gloss

Hedghog and interlinear lemmas

Splitting text into sentences: Russian edition