Extracting title title of a web page from the command line

May 26, 2022

I was using a REST API at https://textance.herokuapp.com/title but it seems awfully fragile. Sure enough this morning, the entire application is down. It’s also not open-source and I have no idea who actually runs this thing.

Here’s the solution:

#!/bin/bash

url=$(pbpaste)
curl $url -so - | pup 'meta[property=og:title] attr{content}'

It does require pup. On macOS, you can install via brew install pup.

There are other ways using regular expressions but no dependency on pup but parsing HTML with regex is not such a good idea.