Removing stress marks from Russian text
Previously, I wrote about adding syllabic stress marks to Russian text. Here’s a method for doing the opposite - that is, removing such marks (ударение) from Russian text.
Although there may well be a more sophisticated approach, regex is well-suited to this task. The problem is that
def string_replace(dict,text):
sorted_dict = {k: dict[k] for k in sorted(dict)}
for n in sorted_dict.keys():
text = text.replace(n,dict[n])
return text
dict = { "а́" : "а", "е́" : "е", "о́" : "о", "у́" : "у",
"я́" : "я", "ю́" : "ю", "ы́" : "ы", "и́" : "и",
"ё́" : "ё", "А́" : "А", "Е́" : "Е", "О́" : "О",
"У́" : "У", "Я́" : "Я", "Ю́" : "Ю", "Ы́" : "Ы",
"И́" : "И", "Э́" : "Э", "э́" : "э"
}
print(string_replace(dict, "Существи́тельные в шве́дском обычно де́лятся на пять склоне́ний."))
This should print: Существительные в шведском обычно делятся на пять склонений.