Registering a custom collation to query Anki database
While working on a project that requires querying the Anki database directly outside of the Anki desktop application, I encountered an interesting issue with sqlite3
collations. This is is just a short post about how I went about registering a collation in order to execute SQL queries against the Anki db.
The problem
Let’s try a simple query. Open the Anki database:
sqlite3 collection.anki2
SQLite version 3.43.2 2023-10-10 13:08:14
Enter ".help" for usage hints.
Great, we’re in. Now let’s see what our decks are in this collection:
sqlite> SELECT name FROM decks;
Default
That works. But now let’s try a WHERE
clause:
sqlite> SELECT name FROM decks WHERE name != "dogfish";
Parse error: no such collation sequence: unicase
This is the crux of the problem we need to address here. We can use a built-in collation such as NOCASE
and it works in some, but not all cases. For example:
sqlite> SELECT name FROM decks WHERE name COLLATE NOCASE != "dogfish";
Default
But if we added a deck to our collection called Ёлка, then we can demonstrate how that fails. Since the е
character can be interpreted as either ё
or е
this is an opportunity for the straight NOCASE
collation to fail. Let’s see if it does:
sqlite> SELECT id, name FROM decks;
1|Default
1735994559972|Ёлка
So far, so good, but we aren’t doing in text comparisons yet. How about:
sqlite> SELECT name FROM decks WHERE name COLLATE NOCASE = "елка";
# crickets...
This is why we need a custom Unicode-sensitive collation when performing certain text comparisons in queries against the Anki database. Before outlining such a collation, let’s take one step back and discuss what sqlite3
collations are.
A brief introduction to SQLite collations
Collations are just rules about the ordering of text. Does a come before or after b? However the built-in collations NOCASE
, RTRIM
, and BINARY
aren’t sufficient for dealing with comparisons in diverse internationalized text. This is why we often need to provide a custom collation.
A custom unicase
collation for Rust
Since the project I’m working on is written in Rust, I’ll show you how to register a custom unicase
collation in Rust, but it can easily be adapted to other languages as well through their own sqlite3 connector.
The Cargo.toml
file will need to include unicase = "2.6.0"
(or whatever version of that module is current.)
To register the collation:
fn register_unicase_collation(conn: &Connection) -> Result<()> {
conn.create_collation("unicase", |s1: &str, s2: &str| {
let s1_key = UniCase::new(s1);
let s2_key = UniCase::new(s2);
s1_key.cmp(&s2_key)
})?;
Ok(())
}
and to open the database with this collation:
fn open_database_with_collation(db_path: &str) -> Result<Connection> {
let conn = Connection::open(db_path)?;
register_unicase_collation(&conn)?;
Ok(conn)
}
Now to open our database, it’s just:
let conn = open_database_with_collation(self.db_path.to_str().unwrap())?;
What Anki tables use the unicase
collation?
To answer that question, we can query the schemas:
#!/bin/bash
if [ "$#" -ne 1 ]; then
echo "Usage: $0 database_path"
exit 1
fi
DB=$1
sqlite3 "$DB" "
SELECT m.tbl_name as 'Table',
group_concat(p.name) as 'Columns with UNICASE'
FROM sqlite_master m
JOIN pragma_table_info(m.tbl_name) p
WHERE m.sql LIKE '%COLLATE unicase%'
AND p.type LIKE 'text%'
AND m.sql LIKE '%' || p.name || ' text%COLLATE unicase%'
GROUP BY m.tbl_name;" -box
which gives us:
┌─────────────┬──────────────────────┐
│ Table │ Columns with UNICASE │
├─────────────┼──────────────────────┤
│ deck_config │ name │
│ decks │ name │
│ fields │ name │
│ notetypes │ name │
│ tags │ tag │
│ templates │ name │
└─────────────┴──────────────────────┘
Whenever we are executing queries that involve comparisons that involve these tables and columns, we need to employ the unicase
collation that we have register, e.g.:
// SQL query with the `COLLATE unicase` collation
let query = "
SELECT name
FROM decks
WHERE name COLLATE unicase LIKE '%' || ?1 || '%'
ORDER BY name COLLATE unicase;
";
// Open the database and register the `unicase` collation
let conn = open_database_with_collation(self.db_path.to_str().unwrap())?;
let mut stmt = conn.prepare(query)?;
let matching_decks = stmt
.query_map(params![deck_name], |row| row.get(0))?
.collect::<Result<Vec<String>, _>>()?;