russian

Language word frequencies

Since one of the cornerstones of my approach to learning the Russian language has been to track how many words I’ve learned and their frequencies, I was intrigued by reading the following statistics today: The 15 most frequent words in the language account for 25% of all the words in typical texts. The first 100 words account for 60% of the words appearing in texts. 97% of the words one encounters in a ordinary text will be among the first 4000 most frequent words.

Detecting Russian letters with regex

How to identify Russian letters in a string? The short answer is: [А-Яа-яЁё] but depending on your regex flavor, [\p{Cyrillic}] might work. What in the word does this regex mean? It’s just like [A-Za-z] with a twist. The Ёё at the end adds support for ё (“yo”) which is in the Latin group of characters. See this question on Stack Overflow.