Automated marking of Russian syllabic stress

August 18, 2020

One of the challenges that Russian learners face is the placement of syllabic stress, an essential determinant of pronunciation. Although most pedagogical texts for students have marks indicating stress, practically no texts intended for native speakers do. Native speakers infer the placement of stress from memory and context.

I was delighted to discover Dr. Robert Reynolds' work on natural language processing of Russian text, which marks stress based on grammatical analysis of the text. What follows is a brief description of the installation and use of this work. The project page on GitHub has installation instructions, but I found a number of items that needed to be addressed that were not covered there. For example, this project (UDAR) depends on Stanza, which in turn requires a language-specific (Russian) model.

Installation

The first step is to install a few dependencies:

Install the pexpect module:

sudo pip3 install pexpect

Install Stanza:

sudo pip3 install stanza

Install Stanza’s Russian model:

#!/usr/local/bin/python3
import stanza
stanza.download('ru')

Note that my python3 is the Homebrew version, so your shebang line may be different.

The project depends on hfst¹ and vislcg3², which can be installed by downloading and running an install script. I had to download the script and run it in CodeRunner.

Install udar:

sudo pip3 install --user git+https://github.com/reynoldsnlp/udar
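
With several moving parts, it can help to confirm that everything is in place before trying UDAR. The following is a minimal sketch using only the standard library. The executable names `hfst-lookup` and `vislcg3` are my assumptions about what the hfst and vislcg3 installs put on the PATH; adjust them if your system differs.

```python
import importlib.util
import shutil

# Python modules installed in the steps above.
PY_MODULES = ["pexpect", "stanza", "udar"]
# Assumed executable names for hfst and vislcg3; adjust if yours differ.
EXECUTABLES = ["hfst-lookup", "vislcg3"]


def missing_dependencies(modules=PY_MODULES, executables=EXECUTABLES):
    """Return the names of modules and executables that cannot be found."""
    # find_spec returns None when a top-level module is not importable.
    missing = [m for m in modules if importlib.util.find_spec(m) is None]
    # shutil.which returns None when an executable is not on the PATH.
    missing += [e for e in executables if shutil.which(e) is None]
    return missing


if __name__ == "__main__":
    missing = missing_dependencies()
    print("Missing:", ", ".join(missing) if missing else "nothing")
```

This only checks that the pieces can be found. It does not check that the Russian Stanza model has been downloaded.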

Basic usage

See the project page on GitHub for more comprehensive details, but I was quickly able to create my own example following the documentation. For example:

#!/usr/local/bin/python3
import udar
doc1 = udar.Document('Моя собака внезапно прыгнула на стол.')
print(doc1.stressed())

which prints the correctly-marked Моя соба́ка внеза́пно пры́гнула на сто́л.
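
Once the text is marked, it is sometimes useful to work with the marks themselves. The sketch below assumes the stress mark in the output is the combining acute accent (U+0301) placed directly after the stressed vowel. The helper names `strip_stress` and `stressed_syllable` are my own and are not part of UDAR.

```python
ACUTE = "\u0301"  # combining acute accent, assumed to be the stress mark
VOWELS = set("аеёиоуыэюяАЕЁИОУЫЭЮЯ")


def strip_stress(text):
    """Remove stress marks, recovering the unmarked text."""
    return text.replace(ACUTE, "")


def stressed_syllable(word):
    """Return the 1-based index of the stressed syllable, or None if unmarked."""
    syllable = 0
    for i, ch in enumerate(word):
        if ch in VOWELS:
            syllable += 1
            # The mark, if present, directly follows the stressed vowel.
            if i + 1 < len(word) and word[i + 1] == ACUTE:
                return syllable
    return None


marked = "соба\u0301ка"  # соба́ка, written with an explicit escape
print(strip_stress(marked))
print(stressed_syllable(marked))
```

Only U+0301 is removed and the text is not normalized, so letters such as й and ё are left untouched.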

I’m looking forward to exploring the capabilities of this NLP tool further.

References

Reynolds, Robert J. “Russian natural language processing for computer-assisted language learning: capturing the benefits of deep morphological analysis in real-life applications.” PhD diss., UiT The Arctic University of Norway, 2016. https://hdl.handle.net/10037/9685
UDAR - NLP system for applying syllabic stress markings

¹ Helsinki Finite-State Transducer.
² Constraint grammar - implementation CG-3.