Automated marking of Russian syllabic stress
One of the challenges that Russian learners face is the placement of syllabic stress, an essential determinant of pronunciation. Although most pedagogical texts for students have marks indicating stress, practically no texts intended for native speakers do. Native speakers infer the placement of stress from memory and context.
I was delighted to discover Dr. Robert Reynolds' work on natural language processing of Russian text, which marks stress based on grammatical analysis of the text. What follows is a brief description of how to install and use it. The project page on GitHub has installation instructions, but I found a number of items that needed to be addressed and were not covered there. For example, this project (UDAR) depends on Stanza, which in turn requires a language-specific (Russian) model.
Installation
The first step is to install a few dependencies:
- Install the pexpect module:
sudo pip3 install pexpect
- Install stanza
sudo pip3 install stanza
- Install Stanza’s Russian model:
#!/usr/local/bin/python3
import stanza
stanza.download('ru')
Note that my python3 is the Homebrew version, so your shebang line may be different.
- The project depends on hfst[1] and vislcg3[2], which are installed by a separate script. I had to download the script and run it in CodeRunner.
- Install udar:
sudo pip3 install --user git+https://github.com/reynoldsnlp/udar
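Since the installation touches several separate pieces, it can help to confirm that everything is visible to the interpreter you actually run. The following is a minimal sketch, not part of UDAR itself: it checks that the three Python modules named above can be found and that the external tools are on the PATH. The executable names `hfst-tokenize` and `vislcg3` are my assumption about what the install script provides, so adjust them to match your system.

```python
import importlib.util
import shutil

# Python modules named in the installation steps above.
PYTHON_MODULES = ["pexpect", "stanza", "udar"]
# Assumed executable names for hfst and vislcg3; adjust to your installation.
EXECUTABLES = ["hfst-tokenize", "vislcg3"]


def missing_dependencies(modules=PYTHON_MODULES, executables=EXECUTABLES):
    """Return (missing_modules, missing_executables) for this interpreter and PATH."""
    missing_modules = [m for m in modules if importlib.util.find_spec(m) is None]
    missing_executables = [e for e in executables if shutil.which(e) is None]
    return missing_modules, missing_executables


if __name__ == "__main__":
    mods, exes = missing_dependencies()
    print("Missing Python modules:", mods or "none")
    print("Missing executables:", exes or "none")
```

Run it with the same python3 that appears in your shebang line. A module installed for a different interpreter will show up as missing here.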
Basic usage
See the project page on GitHub for more comprehensive details. I was quickly able to create my own example by following the documentation:
#!/usr/local/bin/python3
import udar
doc1 = udar.Document('Моя собака внезапно прыгнула на стол.')
print(doc1.stressed())
which prints the correctly-marked Моя соба́ка внеза́пно пры́гнула на сто́л.
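The marked output is ordinary Unicode text, so it is easy to post-process. The sketch below assumes that stress is encoded as a combining acute accent (U+0301) placed after the stressed vowel, as it appears in the example output above. The helper names `strip_stress` and `stressed_vowels` are my own and are not part of udar.

```python
COMBINING_ACUTE = "\u0301"  # assumed stress mark: combining acute accent


def strip_stress(text):
    """Remove stress marks, recovering the unmarked text."""
    return text.replace(COMBINING_ACUTE, "")


def stressed_vowels(text):
    """Return the character that precedes each stress mark."""
    return [
        text[i - 1]
        for i, ch in enumerate(text)
        if ch == COMBINING_ACUTE and i > 0
    ]


# The example output from above, with the stress marks written as escapes.
marked = "Моя соба\u0301ка внеза\u0301пно пры\u0301гнула на сто\u0301л."

print(strip_stress(marked))     # Моя собака внезапно прыгнула на стол.
print(stressed_vowels(marked))  # ['а', 'а', 'ы', 'о']
```

Stripping the marks and comparing against the input is a quick way to confirm that the tool changed nothing except the stress.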
I’m looking forward to exploring the capabilities of this NLP tool further.
References
- Reynolds, Robert J. “Russian natural language processing for computer-assisted language learning: capturing the benefits of deep morphological analysis in real-life applications.” PhD diss., UiT–The Arctic University of Norway, 2016. https://hdl.handle.net/10037/9685
- UDAR - NLP system for applying syllabic stress markings
- [1] Helsinki Finite-State Transducer.
- [2] Constraint grammar - implementation CG-3.