Automated marking of Russian syllabic stress

One of the challenges that learners of Russian face is the placement of syllabic stress, an essential determinant of pronunciation. Although most pedagogical texts for students have marks indicating stress, practically no texts intended for native speakers do. Native readers infer the placement of stress from memory and context.

I was delighted to discover Dr. Robert Reynolds' work on natural language processing of Russian text, which marks stress based on grammatical analysis of the text. What follows is a brief description of the installation and use of this work. The project page on GitHub has installation instructions, but I found a number of items that needed to be addressed and were not covered there. For example, this project (UDAR) depends on Stanza, which in turn requires a language-specific (Russian) model.

Installation

The first step is to install a few dependencies:

  1. Install the pexpect module:
sudo pip3 install pexpect
  2. Install Stanza:
sudo pip3 install stanza
  3. Install Stanza’s Russian model:
#!/usr/local/bin/python3
import stanza
stanza.download('ru')

Note that my python3 is the Homebrew version, so your hashbang may be different.

  4. The project depends on hfst [1] and vislcg3 [2], which can be installed with a downloadable script. I had to download the script and run it in CodeRunner.

  5. Install udar:

sudo pip3 install --user git+https://github.com/reynoldsnlp/udar
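
Since the installation has several moving parts, it can help to confirm that everything is visible to Python before going further. The following is only a minimal sketch of such a check, not part of UDAR itself. The function name check_prerequisites is my own, and the executable names hfst-lookup and vislcg3 are assumptions about what the hfst and vislcg3 packages put on the PATH; adjust them if your installation differs.

```python
#!/usr/local/bin/python3
import importlib.util
import shutil

# Python modules installed in the steps above.
PYTHON_MODULES = ['pexpect', 'stanza', 'udar']
# Assumed executable names for hfst and vislcg3 (not confirmed by the project docs).
COMMANDS = ['hfst-lookup', 'vislcg3']


def check_prerequisites(modules=PYTHON_MODULES, commands=COMMANDS):
    """Return a dict mapping each prerequisite to True (found) or False (missing)."""
    status = {}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        status['module ' + name] = importlib.util.find_spec(name) is not None
    for name in commands:
        # shutil.which returns None when the executable is not on the PATH.
        status['command ' + name] = shutil.which(name) is not None
    return status


if __name__ == '__main__':
    for item, found in check_prerequisites().items():
        print(('ok      ' if found else 'MISSING ') + item)
```

Anything reported as MISSING points back to the corresponding installation step. This check does not verify that Stanza's Russian model has been downloaded.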

Basic usage

See the project page on GitHub for more comprehensive details. Following the documentation, I was quickly able to create my own example:

#!/usr/local/bin/python3
import udar
doc1 = udar.Document('Моя собака внезапно прыгнула на стол.')
print(doc1.stressed())

which prints the correctly marked sentence: Моя соба́ка внеза́пно пры́гнула на сто́л.
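
Once text has been marked, a few small helpers make it easier to inspect the result. The sketch below assumes that the stress marks are Unicode combining acute accents (U+0301), as they appear to be in the output above; I have not confirmed this against the UDAR source. The helper names strip_stress, count_stress_marks and unstressed_words are hypothetical and are not part of udar. Only mark_stress calls udar, using the same Document and stressed() calls as the example above.

```python
#!/usr/local/bin/python3
# Assumption: stress is marked with the combining acute accent.
ACUTE = '\u0301'


def strip_stress(text):
    """Remove all combining acute accents, recovering unmarked text."""
    return text.replace(ACUTE, '')


def count_stress_marks(text):
    """Count how many stress marks the text contains."""
    return text.count(ACUTE)


def unstressed_words(text):
    """List whitespace-separated words that carry no stress mark."""
    return [word for word in text.split() if ACUTE not in word]


def mark_stress(text):
    """Mark stress with udar (requires the installation described above)."""
    import udar  # imported here so the helpers above work without udar
    return udar.Document(text).stressed()
```

With udar installed, unstressed_words(mark_stress(sentence)) gives a quick list of the words that were left unmarked, which is a reasonable place to start when checking the output by hand.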

I’m looking forward to exploring the capabilities of this NLP tool further.

References

  1. Helsinki Finite-State Transducer.

  2. Constraint grammar - implementation CG-3.