Registering a custom collation to query Anki database

While working on a project that requires querying the Anki database directly outside of the Anki desktop application, I encountered an interesting issue with sqlite3 collations. This is is just a short post about how I went about registering a collation in order to execute SQL queries against the Anki db.

CAUTION
Before you run anything that accesses the Anki database, you absolutely should backup your database first.

The problem

Let’s try a simple query. Open the Anki database:

sqlite3 collection.anki2
SQLite version 3.43.2 2023-10-10 13:08:14
Enter ".help" for usage hints.

Great, we’re in. Now let’s see what our decks are in this collection:

sqlite> SELECT name FROM decks;
Default

That works. But now let’s try a WHERE clause:

sqlite> SELECT name FROM decks WHERE name != "dogfish";
Parse error: no such collation sequence: unicase

This is the crux of the problem we need to address here. We can use a built-in collation such as NOCASE and it works in some, but not all cases. For example:

sqlite> SELECT name FROM decks WHERE name COLLATE NOCASE != "dogfish";
Default

But if we added a deck to our collection called Ёлка, then we can demonstrate how that fails. Since the е character can be interpreted as either ё or е this is an opportunity for the straight NOCASE collation to fail. Let’s see if it does:

sqlite> SELECT id, name FROM decks;
1|Default
1735994559972|Ёлка

So far, so good, but we aren’t doing in text comparisons yet. How about:

sqlite> SELECT name FROM decks WHERE name COLLATE NOCASE = "елка";
# crickets...

This is why we need a custom Unicode-sensitive collation when performing certain text comparisons in queries against the Anki database. Before outlining such a collation, let’s take one step back and discuss what sqlite3 collations are.

A brief introduction to SQLite collations

Collations are just rules about the ordering of text. Does a come before or after b? However the built-in collations NOCASE, RTRIM, and BINARY aren’t sufficient for dealing with comparisons in diverse internationalized text. This is why we often need to provide a custom collation.

A custom unicase collation for Rust

Since the project I’m working on is written in Rust, I’ll show you how to register a custom unicase collation in Rust, but it can easily be adapted to other languages as well through their own sqlite3 connector.

The Cargo.toml file will need to include unicase = "2.6.0" (or whatever version of that module is current.)

To register the collation:

fn register_unicase_collation(conn: &Connection) -> Result<()> {
    conn.create_collation("unicase", |s1: &str, s2: &str| {
        let s1_key = UniCase::new(s1);
        let s2_key = UniCase::new(s2);
        s1_key.cmp(&s2_key)
    })?;
    Ok(())
}

and to open the database with this collation:

fn open_database_with_collation(db_path: &str) -> Result<Connection> {
    let conn = Connection::open(db_path)?;
    register_unicase_collation(&conn)?;
    Ok(conn)
}

Now to open our database, it’s just:

let conn = open_database_with_collation(self.db_path.to_str().unwrap())?;

What Anki tables use the unicase collation?

To answer that question, we can query the schemas:

#!/bin/bash

if [ "$#" -ne 1 ]; then
    echo "Usage: $0 database_path"
    exit 1
fi

DB=$1

sqlite3 "$DB" "
SELECT m.tbl_name as 'Table',
       group_concat(p.name) as 'Columns with UNICASE'
FROM sqlite_master m
JOIN pragma_table_info(m.tbl_name) p
WHERE m.sql LIKE '%COLLATE unicase%'
  AND p.type LIKE 'text%'
  AND m.sql LIKE '%' || p.name || ' text%COLLATE unicase%'
GROUP BY m.tbl_name;" -box

which gives us:

┌─────────────┬──────────────────────┐
│    Table    │ Columns with UNICASE │
├─────────────┼──────────────────────┤
│ deck_config │ name                 │
│ decks       │ name                 │
│ fields      │ name                 │
│ notetypes   │ name                 │
│ tags        │ tag                  │
│ templates   │ name                 │
└─────────────┴──────────────────────┘

Whenever we are executing queries that involve comparisons that involve these tables and columns, we need to employ the unicase collation that we have register, e.g.:

// SQL query with the `COLLATE unicase` collation
let query = "
            SELECT name
            FROM decks
            WHERE name COLLATE unicase LIKE '%' || ?1 || '%'
            ORDER BY name COLLATE unicase;
        ";

// Open the database and register the `unicase` collation
let conn = open_database_with_collation(self.db_path.to_str().unwrap())?;
let mut stmt = conn.prepare(query)?;

let matching_decks = stmt
    .query_map(params![deck_name], |row| row.get(0))?
    .collect::<Result<Vec<String>, _>>()?;