Using the Raspberry Pi to communicate over the I2C bus using C

I recently wrote about using the excellent bcm2835 library to communicate with peripheral devices over the SPI bus using C. In this post, I’ll talk about using the same library to communicate over the I2C bus. Nothing particularly fancy, but you’ll need to pay careful attention to the datasheet of the device you’re using. The TSL2561 is a sophisticated little light sensor that has a very high dynamic range and is available on a breakout board from Adafruit. I’m not going to delve into the hookup of this device, as you can take a look at the Adafruit tutorial for that. Note that we’re not going to use their library. (Well, I borrowed a bunch of their #define statements for device constants.)

TSL2561 functions

The TSL2561 has two analog-to-digital converter (ADC) channels. Channel 0 responds to broad-spectrum visible and IR wavelengths, whereas channel 1 responds to IR only. For most applications, you’ll address channel 0.

TSL2561 I2C interface

The TSL2561 datasheet is a little confusing because the device family also uses the SMBus and the format differences get lost between the text and the figures. The bottom line with the TSL2561 is that if you want to read a register, you write to the COMMAND register, then read a byte. It’s important to understand how the COMMAND register is configured so that you can read and write to the appropriate registers. Here is the COMMAND register format:

Note that the CMD bit (7) must always be set. For ordinary read/write operations, we’ll leave the CLEAR, WORD, and BLOCK bits unset. The remaining 3:0 ADDRESS bits specify the register that we are addressing. The registers are found in Table 2, reproduced below:

Editorial note: don’t be tempted to work out the bits and encode the command by hand. Always use symbolic references for bit positions and register addresses; they make your code much more readable. If you configure the COMMAND register as 0x8A, then I have to convert the hex to binary and refer back to the datasheet to understand what you’re trying to do. On the other hand, if you configure the command as TSL2561_COMMAND_BIT | TSL2561_REGISTER_ID, then I can immediately see that you are addressing the ID register.
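To make the editorial note concrete, here is a minimal sketch of composing the COMMAND byte from symbolic names. The constant names and values follow the Adafruit-style #define statements mentioned earlier; the helper tsl2561_command() is a hypothetical name of my own for illustration, not part of any library.

```c
#include <stdint.h>

/* CMD bit (bit 7) of the COMMAND register: must always be set. */
#define TSL2561_COMMAND_BIT      0x80

/* Register addresses, which occupy ADDRESS bits 3:0. */
#define TSL2561_REGISTER_CONTROL 0x00
#define TSL2561_REGISTER_TIMING  0x01
#define TSL2561_REGISTER_ID      0x0A

/* Hypothetical helper: build the COMMAND byte that selects a register.
 * The CLEAR, WORD, and BLOCK bits are left unset. */
static inline uint8_t tsl2561_command(uint8_t reg)
{
    return (uint8_t)(TSL2561_COMMAND_BIT | (reg & 0x0F));
}
```

With these definitions, tsl2561_command(TSL2561_REGISTER_ID) evaluates to the 0x8A from the note above, but the reader never has to decode the hex to see which register is meant.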

Sample code

I will go through a working example section by section and provide a github link at the end where you can grab the entire code.

char buf[3];
uint8_t err;

printf("Running ... \n");

if (!bcm2835_init())
{
    printf("bcm2835_init failed. Are you running as root??\n");
    return 1;
}

if (!bcm2835_i2c_begin())
{
    printf("bcm2835_i2c_begin failed. Are you running as root??\n");
    return 1;
}

In our main function, we begin by declaring variables we’ll need later and call two important functions on the bcm2835 library: bcm2835_init() and bcm2835_i2c_begin(). The former sets up our library and from the documentation:

Initialises the library by opening /dev/mem (if you are root) or /dev/gpiomem (if you are not) and getting pointers to the internal memory for BCM 2835 device registers. You must call this (successfully) before calling any other functions in this library (except bcm2835_set_debug). If bcm2835_init() fails by returning 0, calling any other function may result in crashes or other failures. If bcm2835_init() succeeds but you are not running as root, then only gpio operations are permitted, and calling any other functions may result in crashes or other failures.

bcm2835 library I2C module

The latter starts I2C operations by forcing P1-03 (SDA) and P1-05 (SCL) to their alternate function ALT0, thereby enabling them for I2C use. After all I2C operations are done, the program should call bcm2835_i2c_end() to return those pins to their regular functions. Note that for the purposes of this demonstration, I check all of the return codes and print an informative message on failure. In a robust application, we would want to handle failures in a more fault-tolerant way.

Next we’ll set up some features of the bus:

bcm2835_i2c_setSlaveAddress(TSL2561_ADDR_FLOAT);
bcm2835_i2c_setClockDivider(BCM2835_I2C_CLOCK_DIVIDER_150);

After that, we’re ready to work with the device. Let’s begin with a simple reading of the ID register. To simplify matters, we’ll create a reusable function readRegister():

uint8_t readRegister(uint8_t reg, uint8_t *fail) {
    uint8_t b[2];
    b[0] = TSL2561_COMMAND_BIT | reg;
    int err = bcm2835_i2c_write(b,1);
    if( err != BCM2835_I2C_REASON_OK ) {
        printf("Unable to write command register %02x\n",err);
        *fail = 1; return 1;
    }
    err = bcm2835_i2c_read(b,1);
    if( err != BCM2835_I2C_REASON_OK ) {
        printf("Unable to read last command response %02x\n",err);
        *fail = 1; return 1;
    }
    *fail = 0;
    return b[0];
}

When we want to read a register, we just need to pass the address of the register and a pointer to a uint8_t in which we’ll return the status (0 for success and 1 for failure). Why don’t we just return a status? It’s because we’re already returning the result of the read. When the caller passes the address of a status variable, we can fill it, and the caller just looks at it afterwards.

In lines 2-3, we build the COMMAND “register” value to send. Because the datasheet says to set the CMD bit, we do that, then bitwise OR the register address into bits 3:0. Then we write the COMMAND byte to the device and read a byte back. Remember that we’ve already set the device’s bus address with bcm2835_i2c_setSlaveAddress().

So calling readRegister() to read the hardware ID will look like:

//	Read the ID register

uint8_t id = readRegister(TSL2561_REGISTER_ID, &err);
if( err == 1) {
    printf("Check ID register failed.\n"); return 1;
}
printf("The ID is %02x.\n",id);

We can do something similar to read another register, such as the TIMING register (address 0x01):

//	Read the timing register

uint8_t tr = readRegister(TSL2561_REGISTER_TIMING,&err);
if(err == 1) {
    printf("Check timing register failed.\n");
    return 1;
}
printf("The timing register is %02x.\n",tr);

On my device I get a value of 0x03 which is the default power-up value according to the datasheet.

Now we need to get down to the business of writing to a register. Since we have to explicitly turn on the ADC, we’ll have to write to a control register. A generic writeRegister() should help with this. Again our design uses a pointer to a uint8_t to return the status. We don’t have to do this because a write operation has no useful return, but for API symmetry, I wrote the function the same way.

void writeRegister(uint8_t reg, uint8_t val, uint8_t *fail) {
    uint8_t b[2];
    b[0] = TSL2561_COMMAND_BIT | reg;
    int err = bcm2835_i2c_write(b,1);
    if( err != BCM2835_I2C_REASON_OK ) {
        printf("Unable to write command register %02x\n",err);
        *fail = 1; return;
    }
    b[0] = val;
    err = bcm2835_i2c_write(b,1);
    if( err != BCM2835_I2C_REASON_OK ) {
        printf("Unable to write register value %02x\n",err);
        *fail = 1; return;
    }
    err = bcm2835_i2c_read(b,1);
    if( err != BCM2835_I2C_REASON_OK ) {
        printf("Unable to read following write command register %02x\n",err);
        *fail = 1; return;
    }
    *fail = 0;
    return;
}

Writing to a register is similar to reading except that after addressing the register, we have to send it some data in a subsequent write operation. Following those two operations, we have an obligatory read and move on.

Lines 3-8 send the COMMAND byte as we did before. Lines 9-14 write the caller’s specified value to the register selected by the preceding COMMAND byte. Then we perform a read whose result we can disregard, and return to the caller.
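As an aside, the datasheet’s SMBus write-byte protocol places the command code and the data byte in one bus transaction. The following is a minimal sketch of that alternative, not the code used in this post and not tested on hardware. The helper tsl2561_write_frame() is a hypothetical name; it only builds the two-byte frame, which would then be handed to bcm2835_i2c_write() with a length of 2.

```c
#include <stddef.h>
#include <stdint.h>

#define TSL2561_COMMAND_BIT 0x80

/* Hypothetical helper: fill a 2-byte frame with the COMMAND byte followed
 * by the data byte, and return the number of bytes to send. */
static size_t tsl2561_write_frame(uint8_t reg, uint8_t val, char frame[2])
{
    frame[0] = (char)(TSL2561_COMMAND_BIT | (reg & 0x0F)); /* select register */
    frame[1] = (char)val;                                  /* value to store  */
    return 2;
}

/* Intended use with the bcm2835 library (assumption, not compiled here):
 *
 *   char frame[2];
 *   size_t n = tsl2561_write_frame(reg, val, frame);
 *   uint8_t err = bcm2835_i2c_write(frame, n);
 */
```

Keeping the frame construction separate from the bus call also makes the bit manipulation easy to check without a device attached.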

Turn on the ADC

Turning on the ADC couldn’t be easier; we just need to address the CONTROL register 0x00. The CONTROL register documentation tells us that we simply need to set the POWER bits (1:0) to 0x03 to power up the device or 0x00 to power it down.

Doing that in code using our generic write function couldn’t be simpler:

writeRegister(TSL2561_REGISTER_CONTROL,TSL2561_CONTROL_POWERON, &err );
if( err == 1 ) {
    printf("Unable to power on the TSL2561.\n"); return 1;
}

Take a broad spectrum reading on Channel 0

Now we come to the reason we started working with the device, to take a light measurement. We’re going to focus on the visible + IR channel (Channel 0) but the same principles apply to either channel. We’re just going to do sequential reads from the two channel 0 registers and assemble the result:

uint8_t LSB0 = readRegister(TSL2561_REGISTER_CHAN0_LOW, &err);
if( err == 1 ) {
    printf("Unable to read LSB0\n"); return 1;
}
uint8_t MSB0 = readRegister(TSL2561_REGISTER_CHAN0_HIGH, &err);
if( err == 1 ) {
    printf("Unable to read MSB0\n"); return 1;
}
// Raw 16-bit ADC count for channel 0; this is not yet a calibrated lux value
int ch0 = (int)(MSB0 << 8) | (int)LSB0;
printf("Channel 0 raw value is %d counts.\n",ch0);
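The assembly step is worth isolating, since every 16-bit reading on this device is built the same way. A minimal sketch follows; combine_bytes() is a hypothetical helper name. Note that the result is a raw ADC count. Converting it to lux takes the datasheet’s formula, which uses both channels, and is not shown here.

```c
#include <stdint.h>

/* Hypothetical helper: join a high byte and a low byte into one
 * unsigned 16-bit value. */
static uint16_t combine_bytes(uint8_t msb, uint8_t lsb)
{
    return (uint16_t)(((uint16_t)msb << 8) | lsb);
}
```

For example, a high byte of 0x01 and a low byte of 0x02 combine to 0x0102, or 258 counts.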

There’s a lot more that we could cover, both about the operation of the device and about using I2C on the Raspberry Pi in general, but this should be enough to get you started with luminosity measurement using the TSL2561 or in beginning to code your own I2C interfaces using the BCM2835 library on the Raspberry Pi.


Implementing ADC using Raspberry Pi and MCP3008

Several years ago I wrote about adding analog-to-digital capabilities to the Raspberry Pi. At that time, I used an ATtinyx61 series MCU to provide ADC capabilities, communicating with the RPi via an I2C interface. In retrospect it was much more complicated than necessary. What follows is an attempt to re-do that project using an MCP3008, a 10 bit ADC that communicates on the SPI bus.

MCP3008 device

The MCP3008 is an 8-channel 10-bit ADC with an SPI interface[1]. It has a 4 channel cousin, the MCP3004 that has similar operating characteristics. The device is capable of performing single-ended or differential measurements. For the purposes of this write-up, we’ll only concern ourselves with single-ended measurement. A few pertinent details about the MCP3008:

  • It is capable of conversion rates of around 200 kilosamples per second.
  • It operates on SPI modes 0,0 or 1,1[2]

If you have done any work with SPI, you’ll know that there are 4 signals. MOSI stands for “master out, slave in” whereas MISO stands for “master in, slave out”. The two other signals are the clock which provides a time standard for the digital transaction and the SS (slave select), also called CE (chip enable) or CS (chip select.)

SPI communication in 8-bit read/write frames

In this example, we are going to use an SPI library to communicate with the MCP3008 in 8-bit frames, so the pertinent section of the datasheet is on page 21, section 6.1 Using the MCP3004/MCP3008 with Microcontroller (MCU) SPI Ports. The Figure 6-1 (reproduced below) shows how we will go about communicating with the device over the SPI bus.

From the communication diagram above, we get an excellent overview of the entire transaction. First, we must drop CS to initiate the transaction. With the CS low, we begin clocking in and out data. Figure 6-1 shows that we must clock in a single start bit (0x01) followed by mode and channel select bits. Table 5-2 shows the configuration bits that we must clock-in to return an ADC reading.

For example, if we wish to make a single-ended reading on channel 0, we must clock in the bits 1000. Note from Figure 6-1 that we must shift these bits left by 4 places, so for a single-ended reading from channel 0 we would clock in 0b10000000, or 0x80.
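The same arithmetic works for any channel. Here is a minimal sketch, assuming single-ended mode only; the names MCP3008_SINGLE_ENDED and mcp3008_config_byte() are hypothetical and chosen just for illustration.

```c
#include <stdint.h>

/* Single-ended mode select bit, before shifting into the upper nibble. */
#define MCP3008_SINGLE_ENDED 0x08

/* Hypothetical helper: build the second byte of the transfer for a
 * single-ended reading on channel 0..7. */
static uint8_t mcp3008_config_byte(uint8_t chan)
{
    return (uint8_t)((MCP3008_SINGLE_ENDED | (chan & 0x07)) << 4);
}
```

Channel 0 gives 0x80 as computed above, and channel 7 gives 0xF0.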

Software implementation

I chose to implement this in C rather than Python this time. There are a handful of libraries for the BCM2835. I used the bcm2835 library, which is excellent. It is low-level enough that I can see what’s going on, but not completely “bare metal” programming. You can find out more about the SPI module of this library.

I will start with the code section-by-section then provide a link to the entire source code. First, of course, you’ll need to install the library. You can find a version-agnostic install script here. I used it; it works.

First, we’ll include a couple libraries that we need, and set up three constants. The first is the 0b00000001 that we need to transfer as the start bit. The second is the end bits 0b00000000 that we clock in to the MCP3008 so that we can clock out 8 bits of the ADC value. Finally, since I set up my test circuit to measure on channel 0, I just define a constant for that.

#include <stdio.h>
#include <bcm2835.h>

uint8_t start = 0x01;
uint8_t end = 0x00;
uint8_t chan = 0x00;

Next I declare my function prototypes. Just C business as usual.

int readADC(uint8_t chan);
float volts_adc(int adc);

In the body of main, I start by testing whether I can initiate the SPI interface on the Pi:

if (!bcm2835_init())
{
    printf("bcm2835_init failed. Are you running as root??\n");
    return 1;
}

if (!bcm2835_spi_begin())
{
    printf("bcm2835_spi_begin failed. Are you running as root??\n");
    return 1;
}

If we pass those tests, we’re ready to go. Let’s set up the interface.

bcm2835_spi_setBitOrder(BCM2835_SPI_BIT_ORDER_MSBFIRST);      // The default
bcm2835_spi_setDataMode(BCM2835_SPI_MODE0); // The default
bcm2835_spi_setClockDivider(BCM2835_SPI_CLOCK_DIVIDER_65536); // The default
bcm2835_spi_chipSelect(BCM2835_SPI_CS0); // The default
bcm2835_spi_setChipSelectPolarity(BCM2835_SPI_CS0, LOW); // the default

To read the ADC value, we have to prepare the bytes that we’ll clock in first. All of that is done in a function readADC.

int readADC(uint8_t chan){
    char buf[] = {start, (0x08|chan)<<4, end};
    char readBuf[3];
    bcm2835_spi_transfernb(buf, readBuf, 3);
    // Mask the low byte so a signed char cannot sign-extend into the result
    return (((int)readBuf[1] & 0x03) << 8) | ((int)readBuf[2] & 0xFF);
}

It looks like there’s a lot going on here, but basically we are performing bit manipulations to get the input bits in the right order, and the same for the output bits. First we declare an output buffer buf[] whose contents are three bytes. The first is the start bit 0b00000001, followed by the mode and channel selection byte, and terminated by a junk byte so that we can finish clocking out the resulting data. How do we interpret the value of (0x08|chan)<<4? Start from the inside of the parentheses. 0x08 is 0b00001000, where the 1 bit selects single-ended mode on the ADC. We bitwise OR that with the channel that we want to read. Finally, outside the parentheses, we shift it left by 4 bits so these bits are in the upper nibble. Remember, we have to clock in the data MSB first.

Next we declare an input buffer readBuf[3] to hold the data we’re reading in. Then we perform a 3-byte transfer. Now, what do we do with the results? Remember that we’re reading in 3 bytes. The first lines up with our start bit, so it’s junk and we’ll just ignore readBuf[0]. What about the next byte, readBuf[1]? From Figure 6-1 of the datasheet, you can see that we only care about the 2 lower bits of this second byte, which will become the upper two bits of the 10-bit ADC result. First we bitwise AND it with 0x03 (0b00000011) to get rid of anything above the lowest two bits. Then we shift it left by 8 bits, so that when we bitwise OR it with the lower 8 bits in readBuf[2], the two pieces combine into a single int holding the 10-bit value. The casts promote each byte to an int before we shift and combine them.
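The decoding step can be checked on its own, without any hardware attached. Below is a minimal sketch that takes the three received bytes as unsigned values; mcp3008_decode() is a hypothetical name, and using uint8_t rather than char is my assumption for clarity.

```c
#include <stdint.h>

/* Hypothetical helper: extract the 10-bit ADC result from a 3-byte
 * response. rx[0] is ignored, the low 2 bits of rx[1] are result bits
 * 9:8, and rx[2] holds result bits 7:0. */
static int mcp3008_decode(const uint8_t rx[3])
{
    return ((rx[1] & 0x03) << 8) | rx[2];
}
```

A response whose second byte ends in binary 11 and whose third byte is 0xFF decodes to 1023, the full-scale 10-bit value.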

Real life

So, does the software work? We can find out by putting a logic analyzer on the bus. I used an Intronix logic analyzer to watch the conversion. Here’s the result:

Compare the logic analyzer image to the datasheet. Looks similar! On the MISO line, we can ignore the first byte 0x07. With the second byte, 0xFB (0b11111011) we only care about the bottom two bits (11). In the third byte, we use all 8 bits. Putting those 10 bits together we have 0b1111111111 or 0x3FF, 1023 decimal. That’s the largest number we can express in 10 bits. That’s because I tied channel 0 to the 3.3v out of the Raspberry Pi. Now we can calculate the voltage. Using the reference of 3.3v, the ADC value of 1023 represents 3.3v and we can compute an arbitrary value using a function:

float volts_adc(int adc) {
    return (float)adc*3.3f/1023.0f;
}

And that’s it - a working example of reading the MCP3008 using C on the Raspberry Pi. If you’d like the entire code for the example application, you can find the gist here.

References


  1. Datasheet can be found here.

  2. The SPI bus can operate in different ways depending on the clock polarity and phase and how the data relates to clock transitions. "Mode 0,0" means that the clock polarity is 0 and its phase is 0 whereas "mode 1,1" means that the clock polarity and phase are both 1.

2018: Experiment No. 1

2018 is my year of experiments (Why? TL;DR: New Year’s resolutions are over-rated and have a high failure rate. Anyone can run an experiment for a month.) My first experiment (No news for a month) is nearly done and I’ll declare it a success.

Background

The round-the-clock sensational news cycle exists in large part to create wealth for the already-too-wealthy. Little of it is actionable, leaving us at the same time both outraged and impotent. Mostly I decided to give up on the news because of Donald Trump, the demented psychopathic moron who managed to get elected president.[1] Since Trump took office, like others, I’ve found myself cycling repeatedly through the stages of grief. But mostly I’ve been stuck on anger. There’s something about willful ignorance that does that to me.

Experiment

The methodology was simple. I simply willed myself to avoid the news for an entire month. After briefly considering the use of tools that would block news websites, I decided to go cold-turkey.

Results

Some of the things that I noticed:

  • Airports are saturated with news. I travelled a bit during the month. With TVs blaring the news in every terminal area, it’s impossible to avoid hearing the news. I learned that a book highly critical of Trump was published and that the man himself was displeased. I learned that Congressional Republicans are trying to stop Special Counsel Robert Mueller’s investigation without looking like that’s what they’re doing.
  • Social media can be a significant vector of news. The sidebar on Facebook likes to trumpet the latest bus crash, earthquake, and political twist. But I also discovered that you can resize your browser to make the sidebar go away. Presto!
  • I tended to want to look at the news when I was bored. If I had a moment of boredom, I’d think about the news. Given that the news is supposed to serve in large part the factual needs of an informed electorate, seeking it out of boredom is more in keeping with the values of the entertainment industry, not those of journalism.
  • Outsourcing the news to others slows down the cycle. It was impossible to avoid the news completely. I heard others talking about political happenings and other current events. In fact, I even asked about them. But by outsourcing the news-seeking to others, I was able to slow down the process and keep it at a distance in a way that made it seem more abstract. I didn’t feel as outraged.
  • I felt more productive. Once I eliminated the desire to read the news, I was able to stay with purposeful tasks longer.

Conclusions

After a month of no news, I miss reading good journalism. I may go back to it. Or I may not. The experiment was such a success that it would be hard to go back. The real problem for most of us is that the overlap between our circle of interest (what’s going on in the world) and our circle of influence is very small. David Cain noticed the same thing when he quit the news: “Being concerned makes us feel like we’re doing something when we’re not.”

Now off to my next experiment - a month of practicing a secular technology “sabbath”.


  1. I use these terms very carefully. Many have speculated that he suffers from some form of dementia owing to events where he slurs his words and perseverates. His sociopathic or psychopathic behaviours are well-documented; he is a man devoid of empathy. And finally, his lack of reading is well-known. For all I can tell, the man is a functional illiterate. In contrast, his predecessor is a bibliophile and read widely and voraciously throughout his tenure.

Automate iTunes for chorus repetitions of L2 pronunciation practice

That title is a mouthful!

TL;DR: One approach to developing good second language pronunciation and rhythm is to repeat a sentence many times while simultaneously listening to a native speaker. If you do this while gradually reducing the source amplitude, you will be speaking on your own without help. This is an AppleScript that automates this process on the Mac platform.

Background

For adult learners of a second language (L2), pronunciation and prosody (the rhythm and cadence of language) can be difficult. A method devised by Swedish linguist and medical doctor Olle Kjellin seeks to remedy this problem through chorus repetitions of sentences in the L2. While listening to the sentence over and over, the learner repeats the same sentence aloud, attempting to match the native speaker’s pronunciation and cadence. By gradually reducing the volume of the native speaker, the learner gradually hears more of his own voice. This shaping process has sound neurocognitive underpinnings and Kjellin’s explanation of the method is definitely worth reading.

Automating the process

One of the ideas that Kjellin discusses is gradual reduction in the native speaker’s volume. The rationale is that as the learner begins to hear less of the native speaker’s voice, he begins to hear more of his own. In this way, he learns to shape his pronunciation and develop prosody while the auditory stimulus is gradually withdrawn.

It is possible to do this automatically on the Mac platform.[1] For this approach, I use an AppleScript that asks the user for the intended track duration in minutes, then begins playing the current track, gradually reducing the volume over the course of the desired duration. To simplify the choices the user must make, the script only asks for the duration. The minimum volume is hard-coded, as is the linear shape of the decay. With a little ingenuity, these choices could be modified. For example, the volume decay could be faster, leaving some of the remaining time at the minimum volume.[2]

Installing

You’ll need to grab the source code from GitHub and paste it into a new empty script in AppleScript Editor.app.[3] From AppleScript Editor, save it to the iTunes scripts directory, which is located at ~/Library/Scripts/Applications/iTunes.[4] Sorry this is a little cumbersome, but I can help you. Just send me a note via my Shortwhale link.

Source code

For the intrepid and the techies, here’s the source code for you:

I’ve found chorus repetitions to be an excellent way of honing one’s pronunciation and prosody in L2 practice and I hope this approach of automating the process is helpful.


  1. Sorry Windows and Linux users, this approach relies on AppleScript which of course doesn't run on these other platforms. Almost certainly there are platform-specific approaches there but that's for someone else to figure out!

  2. Currently when the minimum volume is finally reached, playback stops.

  3. You have it, it's just hard to find. Look in /Applications/Utilities.

  4. You can access the iTunes scripts folder from the scripts menu when iTunes is the frontmost application by going to the scripts menu > Open Scripts Folder > Open iTunes Scripts Folder. That's where you need to save the script.

2018: A year of experiments

New Year’s resolution time is at hand. But not for me; at least not in a traditional sense. I was inspired by David Cain’s experiments. In short, he conducts monthly experiments in self-improvement.

The idea of an experiment is appealing in ways that a resolution is not. A resolution presumes an outcome and relies only on the long application of will to see it through. An experiment on the other hand, makes only a conjecture about the outcome and can be conducted for a shorter period.

Here’s my list of experiments for 2018, month by month. Some of these experiments are only about pushing the limits of my own personal projects. For example, I have an obsessional interest in becoming more fluent in Russian, so two of the experiments are very specific to that. Otherwise, they are commonsense ideas that apply to all of us. Some are connected by a theme of reducing the influence of technology on my life.

January

No news for one month - Reading the news every day is like watching an accident that never stops happening. The U.S. is a disaster. The U.S. president and his ilk are going to say and do outrageous things. Apart from voting, there’s little I can do. So I say, skip it. I tend toward the negative; so I’m curious about how kicking the news habit will affect my mood.

February

Technology sabbath - Since we are a secular family, the idea of sabbath is more like “time away from the mundane.” The intent here is to take a break one day each week away from technology (computer, cell phone, iPad, etc.) This experiment extends the last month’s efforts to break the cell phone habit.

March

No phone day - One day a week, I will power down my cell phone and put it in the drawer. Since we don’t have a landline, that effectively means others will have to find different ways of contacting me. Or wait. I’m curious about whether having a single day a week away from the smartphone is enough to break the habit of picking it up and looking at it during the week.

April

TBD

May

Pronounce 10,000 sentences - This month, I’ll log 10,000 sentence utterances. Swedish neuroscientist and linguist Olle Kjellin, a champion of patterned repetition of canonical L2 sentences, recommends this way of practicing pronunciation and prosody.

June

No complaining - This is one of David Cain’s experiments. Having a negative bias myself, I’m curious about how forcing myself to reframe events and people’s actions in a neutral or positive way will affect me.

July

Social media once a day (or less) - The degree to which social media frames our perspectives in algorithmic and involuntary ways is frightening. Nonetheless, it has the ability to connect people in ways that can be interesting and touching. This is about trying to slow down the input and make it manageable rather than a burden.

August

No alcohol or caffeine - Simple. No dependencies.

September

Meditate for 10 or more minutes daily - The benefits of meditation are well-known.

October

Declutter daily - Spend 15 minutes a day devoted to organizing and decluttering to observe how it affects my mood, perceptions of order and how it fits into the day’s workload.

November

Aerobic exercise daily - Not too many years ago, I was a committed road cyclist. I’ve ridden up almost all of the major mountain passes in Colorado. Now I’m a couch potato. Time to get moving.

December

Lift weights daily - We lose muscle mass as we age. This experiment is about trying to blunt the effects of age on this reduction by lifting a modest amount of weight every day.

Wish me luck. I’ll need it.

Peering into Anki using R

Yet another diversion to keep me from focusing on actually using Anki to learn Russian. I stumbled on the R programming language, a language that focuses on statistical analysis.

Here’s a couple snippets that begin to scratch the surface of what’s possible. Important caveat: I’m an R novice at best. There are probably much better ways of doing some of this…

Counting notes with a particular model type

Here we’ll use R to do what we did previously with Python.

First load some of the libraries we’ll need:

library(rjson)
library(RSQLite)
library(DBI)

Next we’ll connect to the database and extract the model information:

# connect to the Anki database
dbpath <- "path to your collection"
con = dbConnect(RSQLite::SQLite(),dbname=dbpath)

# get information about the models
modelInfo <- as.character(dbGetQuery(con,'SELECT models FROM col'))
models <- fromJSON(modelInfo)

Since the model information is stored as JSON, we’ll need to parse the JSON to build a data frame that we can use to extract the model ID that we’ll need.

names <- c()
mid <- names(models)
for(i in 1:length(mid))
{
    names[i] <- models[[mid[i]]]$name
}
models <- data.frame(cbind(mid,names))

Next we’ll extract the model ID (mid) from this data frame so that we can find all of the notes with that model ID:

verbmid <- as.numeric(as.character(models[models$names=="Русский - глагол","mid"]))

# query the notes database for notes with this model
query <- paste("SELECT COUNT(id) FROM notes WHERE mid =",verbmid)
res <- as.numeric(dbGetQuery(con,query))

And of course, close the connection to the Anki SQLite database:

dbDisconnect(con)

As of this writing, res tells me I have 702 notes with the verb model types (named “Русский - глагол” in my collection.)

Counting hours per month in Anki

Ever wonder how many hours per month you spend reviewing in Anki? Here’s an R program that will grab review time information from the database and plot it for you. I ran across the original idea in this blog post by Gene Dan, but did a little work on the x-axis scale to get it to display correctly.

library(RSQLite)
library(DBI)
library(rjson)
library(anytime)
library(sqldf)
library(zoo)
library(ggplot2)

dbpath <- "/Users/alan/Library/Application Support/Anki2/Alan - Russian/collection.anki2"
con = dbConnect(RSQLite::SQLite(),dbname=dbpath)
#get reviews
rev <- dbGetQuery(con,'select CAST(id as TEXT) as id
    , CAST(cid as TEXT) as cid
    , time
    from revlog')

cards <- dbGetQuery(con,'select CAST(id as TEXT) as cid, CAST(did as TEXT) as did from cards')

#Get deck info - from the decks field in the col table
deckinfo <- as.character(dbGetQuery(con,'select decks from col'))
decks <- fromJSON(deckinfo)

names <- c()
did <- names(decks)
for(i in 1:length(did))
{
    names[i] <- decks[[did[i]]]$name
}

decks <- data.frame(cbind(did,names))
#decks$names <- as.character(decks$names)

cards_w_decks <- merge(cards,decks,by="did")
#Date is UNIX timestamp in milliseconds, divide by 1000 to get seconds
rev$revdate <- as.yearmon(anydate(as.numeric(rev$id)/1000))

# Assign deck info to reviews
rev_w_decks <- merge(rev,cards_w_decks,by="cid")
time_summary <- sqldf("select revdate, sum(time) as Time from rev_w_decks group by revdate")
# Review time is stored in milliseconds; 3.6e+6 ms = 1 hour
time_summary$Time <- time_summary$Time/3.6e+6

ggplot(time_summary,aes(x=revdate,y=Time))+geom_bar(stat="identity",fill="#d93d2a")+
    scale_x_yearmon()+
    ggtitle("Hours per Month") +
    xlab("Review Date") +
    ylab("Time (hrs)") +
    theme(axis.text.x=element_text(hjust=2,size=rel(1))) +
    theme(plot.title=element_text(size=rel(1.5),vjust=.9,hjust=.5)) +
    guides(fill = guide_legend(reverse = TRUE))

dbDisconnect(con)

You should get a plot like the one at the top of the post.

I’m anxious to learn more about R and apply it to understanding my performance in Anki.

Language word frequencies

Since one of the cornerstones of my approach to learning the Russian language has been to track how many words I’ve learned and their frequencies, I was intrigued by reading the following statistics today:

  • The 15 most frequent words in the language account for 25% of all the words in typical texts.
  • The first 100 words account for 60% of the words appearing in texts.
  • 97% of the words one encounters in an ordinary text will be among the first 4000 most frequent words.

In other words, if you learn the first 4000 words of a language, you’ll be able to understand nearly everything.

Source - Five Cornerstones for Second-Language Acquisition - the Neurophysiological Opportunist’s Way - Olle Kjellin, M.D., Ph.D. but originally from The Cambridge Encyclopedia of Language (Crystal, 1995)

Anki database adventures: Counting notes by model type

Continuing my series on accessing the Anki database outside of the Anki application environment, here’s a piece on accessing the note type model. You may wish to start here with the first article on accessing the Anki database. This is geared toward macOS. (If you’re not on macOS, then start here instead.)

The note type model

Since notes contain flexible fields in Anki, the model for a note type is in JSON. The best guess definition of the JSON is:

{
  "css": "CSS, shared for all templates",
  "did":
    "Long specifying the id of the deck that cards are added to by default",
  "flds": [
    "JSONArray containing object for each field in the model as follows:",
    {
      "font": "display font",
      "media": "array of media. appears to be unused",
      "name": "field name",
      "ord": "ordinal of the field - goes from 0 to num fields -1",
      "rtl": "boolean, right-to-left script",
      "size": "font size",
      "sticky": "sticky fields retain the value that was last added \
        when adding new notes"
    }
  ],
  "id": "model ID, matches notes.mid",
  "latexPost": "String added to end of LaTeX expressions",
  "latexPre": "preamble for LaTeX expressions",
  "mod": "modification time in milliseconds",
  "name": "model name",
  "req": [
    "Array of arrays describing which fields are required \
      for each card to be generated",
    [
      "array index, 0, 1, ...",
      "? string, all",
      "another array",
      ["appears to be the array index again"]
    ]
  ],
  "sortf": "Integer specifying which field is used for sorting (browser)",
  "tags": "Anki saves the tags of the last added note to the current model",
  "tmpls": [
    "JSONArray containing object of CardTemplate for each card in model",
    {
      "afmt": "answer template string",
      "bafmt": "browser answer format: used for displaying answer in browser",
      "bqfmt": "browser question format: \
        used for displaying question in browser",
      "did": "deck override (null by default)",
      "name": "template name",
      "ord": "template number, see flds",
      "qfmt": "question format string"
    }
  ],
  "type": "Integer specifying what type of model. 0 for standard, 1 for cloze",
  "usn": "Update sequence number: used in same way as other usn values in db",
  "vers": "Legacy version number (unused)"
}

Our goal today is to count all of the notes that have a given note type. Fortunately, there’s a built-in method for this:

verbModel = col.models.byName(u'Русский - глагол')

Here we find the model object (a Python dictionary) named ‘Русский - глагол’ (that’s Russian verb, by the way.) To access its id:

modelID = verbModel['id']

Now we just have to count:

query = """SELECT COUNT(id) from notes WHERE mid = {}""".format(verbModel['id'])
verbNotes = col.db.scalar(query)

print('There are {} verb notes.'.format(verbNotes))

And that’s it for this little adventure in the Anki database.


Accessing the Anki database with Python: Working with a specific deck

I previously wrote about accessing the Anki database using Python on macOS. In this short post, I’ll extend that one and show how to work with a specific deck.

To use a named deck you’ll need its deck ID. Fortunately there’s a built-in method for finding a deck ID by name:

col = Collection(COLLECTION_PATH)
dID = col.decks.id(DECK_NAME)

Now in queries against the cards and notes tables we can apply the deck ID to restrict them to a certain deck. For example, to find all of the cards currently in the learning stage:

# Interpolate the deck ID into the query; the literal text "dID" is not a column or value
query = """SELECT COUNT(id) FROM cards where type = 1 AND did = {}""".format(dID)
learningCards = col.db.scalar(query)

print('There are {} learning cards.'.format(learningCards))

And close the collection:

col.close()
