Stripping Russian stress marks from text on the command line
Russian text intended for learners sometimes contains marks that indicate the stressed syllable. The stress is usually rendered as a vowel followed by a combining diacritical mark, typically the combining acute accent (U+0301). Here are a couple of ways to strip these marks on the command line:
First, a version using Perl:
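#!/bin/bash
f='покупа́ешья́'
# -CSD treats stdin, stdout, and file streams as UTF-8,
# so \x{301} matches the combining acute accent as one character.
echo "$f" | perl -CSD -pe 's/\x{301}//g'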
And then another using the sd tool:
#!/bin/bash
f='покупа́ешья́'
# Single quotes pass \u0301 to sd unchanged; sd's regex engine interprets the escape.
echo "$f" | sd '\u0301' ''
Both work by matching the combining diacritical mark with a regular expression and replacing it with nothing.
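The same find-and-remove idea can also be sketched in plain bash, with no external tool. This is only an illustration under a few assumptions: the text is UTF-8, the stress mark is always U+0301 (which encodes to the two bytes 0xCC 0x81), and the function name strip_stress is made up for this example.

```shell
#!/bin/bash
# strip_stress is a hypothetical helper name, not part of Perl or sd.
strip_stress() {
  local text=$1
  # Assumption: input is UTF-8, where U+0301 is the byte pair 0xCC 0x81.
  local acute=$'\xcc\x81'
  # ${var//pattern/} deletes every occurrence of the pattern.
  printf '%s\n' "${text//$acute/}"
}

strip_stress 'покупа́ешья́'
```

Spelling the mark out as bytes avoids depending on the shell's locale settings, at the cost of only working on UTF-8 input.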
(Yes, I know the example word is nonsense Russian; it’s just meant to illustrate stripping the stress marks from any vowel.)