Automated marking of Russian syllabic stress

One of the challenges that Russian learners face is the placement of syllabic stress, an essential determinant of pronunciation. Although most pedagogical texts for students have marks indicating stress, practically no texts intended for native speakers do. Native speakers infer the placement of stress from memory and context.

I was delighted to discover Dr. Robert Reynolds’ work on natural language processing of Russian text to mark stress based on grammatical analysis of the text. What follows is a brief description of the installation and use of this work. The project page on GitHub has installation instructions, but I found a number of items that needed to be addressed that were not covered there. For example, this project (UDAR) depends on Stanza, which in turn requires a language-specific (Russian) model.

sed matching whitespace on macOS

sed is such a useful pattern-matching and substitution tool for work on the command line. But there’s a little quirk on macOS that will trip you up. It tripped me up. On most platforms, \s is the character class for whitespace. It’s ubiquitous in regexes. But on macOS, it doesn’t work. In fact, it silently fails.

Consider this bash one-liner which looks like it should work but doesn’t:

# should print I am corrupt (W.Barr)
# instead it prints I am corrupt by W.Barr
echo "I am corrupt by W.Barr" | sed -E 's|^(.+)\sby\s(.+)|\1 (\2)|g'

What does work is the character class [:space:]:

# prints I am corrupt (W.Barr)
echo "I am corrupt by W.Barr" | sed -E 's|^(.+)[[:space:]]by[[:space:]](.+)|\1 (\2)|g'

A literal space, with no character class at all, also works:

# prints I am corrupt (W.Barr)
echo "I am corrupt by W.Barr" | sed -E 's|^(.+) by (.+)|\1 (\2)|g'

The [:blank:] character class works also:

# prints I am corrupt (W.Barr)
echo "I am corrupt by W.Barr" | sed -E 's|^(.+)[[:blank:]]by[[:blank:]](.+)|\1 (\2)|g'
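
The literal-space version and the [:blank:] version differ once tabs are involved. Here is a small sketch of that difference, assuming the separators around "by" are tabs rather than spaces; the variable name input is only for illustration:

```shell
# Input where the separators around "by" are tabs, not spaces.
input="$(printf 'I am corrupt\tby\tW.Barr')"

# A literal space does not match a tab, so the line passes through unchanged.
printf '%s\n' "$input" | sed -E 's|^(.+) by (.+)|\1 (\2)|'

# [[:blank:]] matches both space and tab, so the substitution applies.
# prints I am corrupt (W.Barr)
printf '%s\n' "$input" | sed -E 's|^(.+)[[:blank:]]by[[:blank:]](.+)|\1 (\2)|'
```

If the input might contain tabs, the character class is the safer choice.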

Bracket expressions in sed

It turns out that if you RTFM for sed, the explanation is clear. The sed manual documents several character classes, and each must appear inside a bracket expression, which is why [:space:] is written as [[:space:]] in a pattern. Pertinent to our issue, the [:space:] character class matches the following: tab, newline, vertical tab, form feed, carriage return, and space. On the other hand, [:blank:] is more restrictive, matching only space and tab. The manual is definitely worth looking at because other shorthand classes are simply not available. For example, \w is unusable, requiring [:alnum:] instead (which, unlike \w, does not match the underscore), as in:

# prints foobar
echo "foo        bar" | sed -E 's|^([[:alnum:]]+)[[:space:]]+([[:alnum:]]+)$|\1\2|g'
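
One caveat with that substitution: \w conventionally covers letters, digits, and the underscore, while [:alnum:] covers only letters and digits. A minimal sketch of a closer stand-in follows; the function name join_words is hypothetical and used only for illustration:

```shell
# Collapse the run of spaces or tabs between two "words" into a single space.
# [[:alnum:]_] stands in for \w (letters, digits, underscore);
# [[:blank:]] matches only space and tab.
join_words() {
  sed -E 's/^([[:alnum:]_]+)[[:blank:]]+([[:alnum:]_]+)$/\1 \2/'
}

# prints foo_1 bar
printf 'foo_1 \t  bar\n' | join_words
```

With plain [[:alnum:]]+ in place of [[:alnum:]_]+, the first group could not match foo_1 in full, so the pattern would fail and the line would pass through unchanged.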

References

  • macOS man page for sed - no mention of \s though.
  • This question about whitespace and sed on Superuser is worth reviewing.
  • The sed manual section on character classes and bracket expressions is a must-read. (Or the contents page of the sed manual.)

Partitioning a large directory into subdirectories by size

Since I’m not fond of carrying around all my photos on a cell phone where they’re perpetually at risk of loss, I periodically dump the image and video files to a drive on my desktop for later burning to optical disc. Saving these images in archival form is a hedge against the bet that my existing backup system won’t fail someday.

I’m using Blu-ray optical discs to archive these image and video files, and each stores 25 GB of data. So my plan was to split the iPhone image dump into 24 GB partitions.
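
As a rough illustration of the idea, and not necessarily the approach the full post takes, here is a minimal greedy sketch in bash. The function name partition_dir, the part_NNN directory naming, and the example path are all hypothetical, and the limit is assumed to be given in bytes:

```shell
#!/usr/bin/env bash
# Greedy partitioning sketch: walk the regular files in a directory and move
# each one into part_001, part_002, ... starting a new part whenever the next
# file would push the current part over the limit.
# partition_dir and the part_NNN naming are hypothetical, for illustration.
partition_dir() {
  local src="$1" limit="$2"          # limit is in bytes
  local part=1 used=0 size dest f
  for f in "$src"/*; do
    [ -f "$f" ] || continue          # skip anything that is not a regular file
    size=$(( $(wc -c < "$f") ))      # arithmetic strips the padding BSD wc adds
    # Start a new part if this file would overflow a non-empty one.
    if [ "$used" -gt 0 ] && [ $((used + size)) -gt "$limit" ]; then
      part=$((part + 1))
      used=0
    fi
    dest=$(printf '%s/part_%03d' "$src" "$part")
    mkdir -p "$dest"
    mv "$f" "$dest/"
    used=$((used + size))
  done
}

# Example (assumed path; 24 GB taken as decimal gigabytes):
# partition_dir "$HOME/iphone_dump" $((24 * 1000 * 1000 * 1000))
```

This fills parts in file-name order, so it will not pack as tightly as a size-sorted approach, and a single file larger than the limit still gets a part of its own.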

More chorus repetition macros for Audacity

In a previous post I described macros to support certain tasks in generating source material for L2 chorus repetition practice. Today, I’ll describe two other macros that automate this practice by slowing the playback speed of the repetition.

Background

I’ve described the rationale for chorus repetition practice in previous posts. The technique I describe here is to slow the sentence playback speed by applying the Change Tempo... effect in Audacity (see the Change Tempo page in the Audacity manual), which gives the learner time to build speed by practicing slower repetitions. In my own practice, I will often begin complex Russian sentences at -50% speed and progress to -25% speed before practicing the pronunciation at native-level speed. Practicing at slow speeds gives the learner time to appreciate how syllables are connected to each other. The prosody is more apparent.

Audacity macros to support chorus repetition practice

Achieving fluid, native-quality speech in a second language is a difficult task for adult learners. For several years, I’ve used Dr. Olle Kjellin’s method of “chorus repetition” for my Russian language study. In this post, I’m presenting a method for scripting Audacity to facilitate the development of audio source material to support his methodology.

Background

For detailed background on the methodology, I refer you to Kjellin’s seminal paper “Quality Practise Pronunciation with Audacity - The Best Method!” on the subject of chorus repetition practice. The first half of the paper outlines the neurophysiologic rationale for the method and the second half describes the practical use of the cross-platform tool Audacity to generate source material for this practice.

Scripting Apple Music on macOS for chorus repetition practice

This is an update to my previous post on automating iTunes on macOS to support chorus repetition practice. You can read the original post for the theory behind the idea; but in short, one way of developing prosody and quality pronunciation in a foreign language is to do mass repetitions in chorus with a recording of a native speaker.

Because in macOS 10.15, iTunes is no more, I’ve updated the script to work with the new Music app. It turns out that it’s a lot simpler. No need to dive into the application classes.

A meritocracy reading list

Meritocracy has been on everyone’s minds lately, it seems. Reading Daniel Markovits’ “The Meritocracy Trap,” I was fully ready to condemn the concept completely. I may be still; but I need to take a moment to think about it more fully.

Here’s the problem with condemning meritocracy outright: if we look at ability on a case-by-case basis, would you rather have a well-trained, accomplished pilot or a mediocre one? Would you rather go to a concert performed by a scratchy third-rate violinist or someone whose pedigree includes Juilliard, Curtis, or the like? Maybe the problem with meritocracy is simply that it doesn’t scale well in capitalist markets. (Don’t hold me to that idea; I’m not quite ready to embrace it fully.) In the process of scaling to the level of a large society, does any inherent rightness of merit confer a right to so distort the economic life of a country that only narrower and narrower slices of it garner larger and larger portions of the economic output?

A folder-based image gallery for Hugo

Hugo is the platform I use to publish this weblog. Occasionally I have the need to include a collection of images in a post. Mostly this comes up on other sites that I publish. Fancybox can do this; but it wasn’t immediately clear how to direct Fancybox to create a gallery of images in a page based on images in a directory. Previously, I’ve solved this in different ways, but I was anxious to find a simple shortcode-based method.

An alternative method for keyboard input switching on macOS

macOS offers a variety of virtual keyboard layouts which are accessible through System Preferences > Keyboard > Input Sources. Because I spend about half of my time writing in Russian and half in English, rapid switching between keyboard layouts is important. Optionally, in the Input Sources preference pane, you can choose to use the Caps Lock key to toggle between sources. This almost always works well, with the exception of Anki. Presumably Anki’s non-standard text management system thwarts the built-in Caps Lock toggle mechanism for reasons that are not clear to me. Equally unclear is why this worked previously but now does not. I’ve not updated either Anki or the system software. It’s a mystery. Nonetheless, I began to search for an alternative method for switching between keyboard layouts. What I developed relies on several tools: