How License to Gill Works

Every couple of weeks, License to Gill downloads several database files from MusicBrainz containing about 2.6 million musicians and about 40 million records and individual tracks. From these files, it builds a large graph of performers and recordings and stores that graph in a 1.1 GB database file.
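The text doesn't describe the database file's format. One plausible sketch, assuming each MusicBrainz credit is reduced to a (musician, recording) pair of integer ids, stores the graph as two compressed adjacency arrays: one listing each musician's recordings, and one listing each recording's performers. The names `Credit`, `Csr`, and `csr_build` are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical input record: one musician credited on one recording. */
typedef struct {
    int musician;   /* 0 .. n_musicians - 1 */
    int recording;  /* 0 .. n_recordings - 1 */
} Credit;

/* Compressed adjacency ("CSR") index for one side of the graph:
 * the neighbours of node i are adj[offset[i]] .. adj[offset[i + 1] - 1]. */
typedef struct {
    int n;        /* number of nodes indexed */
    int *offset;  /* n + 1 entries */
    int *adj;     /* one entry per credit */
} Csr;

/* Build an index keyed by musician (by_musician != 0), listing each
 * musician's recordings, or keyed by recording (by_musician == 0),
 * listing each recording's musicians. Allocation failures are not
 * handled in this sketch. */
static Csr csr_build(const Credit *credits, int m, int n, int by_musician) {
    Csr g;
    g.n = n;
    g.offset = calloc((size_t)n + 1, sizeof *g.offset);
    g.adj = malloc(((size_t)m + 1) * sizeof *g.adj);  /* +1 avoids malloc(0) */
    int *next = malloc(((size_t)n + 1) * sizeof *next);
    /* Pass 1: count each node's degree into offset[key + 1]. */
    for (int i = 0; i < m; i++) {
        int key = by_musician ? credits[i].musician : credits[i].recording;
        assert(key >= 0 && key < n);
        g.offset[key + 1]++;
    }
    /* Prefix sums turn degrees into start positions. */
    for (int i = 0; i < n; i++)
        g.offset[i + 1] += g.offset[i];
    /* Pass 2: drop each neighbour into its node's slice. */
    for (int i = 0; i < n; i++)
        next[i] = g.offset[i];
    for (int i = 0; i < m; i++) {
        int key = by_musician ? credits[i].musician : credits[i].recording;
        int val = by_musician ? credits[i].recording : credits[i].musician;
        g.adj[next[key]++] = val;
    }
    free(next);
    return g;
}

static void csr_free(Csr *g) {
    free(g->offset);
    free(g->adj);
}
```

Arrays like these cost one `int` per credit plus one offset per node, with no per-node pointers, which matters at tens of millions of recordings.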

A database service runs at all times and holds the database file in memory. It is written in C for maximum CPU and memory efficiency and handles requests to find links between any pair of musicians. PHP pages on the License to Gill web server connect to the database service using JSON over HTTPS.
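The text says only that the service keeps the database file in memory. A minimal sketch, assuming the simplest approach of reading the whole file into one heap buffer at startup (memory-mapping it with `mmap` would be a common alternative), might look like this; `load_whole_file` is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Read an entire open file into one heap buffer so that every later
 * lookup is a memory access rather than a disk read. Returns NULL on
 * failure; on success stores the size in *len, and the caller frees
 * the buffer. */
static unsigned char *load_whole_file(FILE *f, size_t *len) {
    assert(f != NULL && len != NULL);
    if (fseek(f, 0, SEEK_END) != 0)
        return NULL;
    long size = ftell(f);
    if (size < 0 || fseek(f, 0, SEEK_SET) != 0)
        return NULL;
    unsigned char *buf = malloc(size > 0 ? (size_t)size : 1);
    if (buf == NULL)
        return NULL;
    if (fread(buf, 1, (size_t)size, f) != (size_t)size) {
        free(buf);
        return NULL;
    }
    *len = (size_t)size;
    return buf;
}
```

`ftell` returns a `long`, so on platforms where `long` is 32 bits this sketch tops out just under 2 GiB; on 64-bit Linux that limit does not arise in practice.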

The database service uses breadth-first search (BFS) to find the shortest path between a pair of musicians. If you want to dig further into how shortest-path algorithms work, I recommend Introduction to Algorithms by Cormen, Leiserson, Rivest, and Stein as an excellent place to start; if that book isn't available, most other algorithms textbooks cover the subject as well. You may also look at the materials I wrote to explain graph algorithms (including BFS) to Duke undergraduate CS students here.
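To illustrate the technique, here is a minimal textbook BFS with path reconstruction, assuming musicians and recordings are nodes of one undirected graph stored in compressed adjacency arrays. It is not the service's actual code; `Graph` and `bfs_path` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

/* Undirected graph in compressed adjacency form. Musicians and
 * recordings are both nodes, so a path between two musicians
 * alternates musician, recording, musician, ... */
typedef struct {
    int n;              /* total number of nodes */
    const int *offset;  /* n + 1 entries */
    const int *adj;     /* neighbours of v: adj[offset[v] .. offset[v+1]-1] */
} Graph;

/* Find a shortest path from src to dst. Writes the node sequence into
 * path (capacity g->n) and returns its length, or 0 if unreachable. */
static int bfs_path(const Graph *g, int src, int dst, int *path) {
    assert(src >= 0 && src < g->n && dst >= 0 && dst < g->n);
    int *parent = malloc((size_t)g->n * sizeof *parent);
    int *queue = malloc((size_t)g->n * sizeof *queue);
    for (int v = 0; v < g->n; v++)
        parent[v] = -1;                /* -1 means "not yet discovered" */
    int head = 0, tail = 0, found = (src == dst);
    parent[src] = src;
    queue[tail++] = src;
    while (head < tail && !found) {
        int u = queue[head++];
        for (int i = g->offset[u]; i < g->offset[u + 1]; i++) {
            int v = g->adj[i];
            if (parent[v] != -1)
                continue;
            parent[v] = u;             /* first discovery is a shortest route */
            if (v == dst) {
                found = 1;
                break;
            }
            queue[tail++] = v;         /* each node is enqueued at most once */
        }
    }
    int len = 0;
    if (found) {
        /* Walk parent pointers back from dst, then reverse in place. */
        for (int v = dst;; v = parent[v]) {
            path[len++] = v;
            if (v == src)
                break;
        }
        for (int i = 0, j = len - 1; i < j; i++, j--) {
            int t = path[i];
            path[i] = path[j];
            path[j] = t;
        }
    }
    free(parent);
    free(queue);
    return len;
}
```

Because BFS discovers nodes in order of distance from the source, the first time it reaches `dst` it has found a shortest path. A path of k nodes alternates musicians and recordings, so its two end musicians are (k - 1) / 2 links apart.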

Whenever License to Gill answers a query, the result is cached so that future requests involving the same musician are answered more quickly. About 95% of all queries are served instantly from the results cache.
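The text doesn't say how the cache is keyed or evicted. A minimal sketch, assuming each answer is stored as a link count per pair of musicians in a fixed-size direct-mapped table (a new entry simply overwrites whatever shared its slot), could look like this; `CACHE_SLOTS`, `cache_get`, and `cache_put` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define CACHE_SLOTS 1024   /* hypothetical size; must be a power of two */

typedef struct {
    int from, to;   /* musician ids; from == -1 marks an empty slot */
    int distance;   /* cached answer: number of links between them */
} CacheEntry;

static CacheEntry cache[CACHE_SLOTS];

static void cache_init(void) {
    for (int i = 0; i < CACHE_SLOTS; i++)
        cache[i].from = -1;
}

/* Links are symmetric, so (a, b) and (b, a) share one entry. */
static void order_pair(int *a, int *b) {
    if (*a > *b) {
        int t = *a;
        *a = *b;
        *b = t;
    }
}

/* Mix the pair into a slot index with a multiplicative hash. */
static unsigned slot_for(int from, int to) {
    uint64_t key = ((uint64_t)(uint32_t)from << 32) | (uint32_t)to;
    key *= 0x9E3779B97F4A7C15ULL;
    return (unsigned)((key >> 32) & (CACHE_SLOTS - 1));
}

/* Returns 1 and fills *distance on a hit, 0 on a miss. */
static int cache_get(int from, int to, int *distance) {
    order_pair(&from, &to);
    const CacheEntry *e = &cache[slot_for(from, to)];
    if (e->from == from && e->to == to) {
        *distance = e->distance;
        return 1;
    }
    return 0;
}

/* Store an answer, overwriting whatever occupied the slot. */
static void cache_put(int from, int to, int distance) {
    assert(from >= 0 && to >= 0);
    order_pair(&from, &to);
    CacheEntry *e = &cache[slot_for(from, to)];
    e->from = from;
    e->to = to;
    e->distance = distance;
}
```

A production cache would probably store the full path and size its table to a memory budget; direct mapping is used here only because it keeps the sketch short.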

The database service runs on an Amazon EC2 instance with Ubuntu Linux. It uses about 2 GB of RAM, roughly a quarter of which goes to the results cache.
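As a rough check, assuming the in-memory graph takes about the same 1.1 GB as the database file (the text doesn't say), the stated figures split as follows:

```latex
\begin{align*}
\text{results cache}   &\approx \tfrac{1}{4} \times 2\ \text{GB} = 0.5\ \text{GB} \\
\text{everything else} &\approx 2\ \text{GB} - 0.5\ \text{GB} - 1.1\ \text{GB} = 0.4\ \text{GB}
\end{align*}
```

On that assumption, the remaining 0.4 GB would cover items such as BFS scratch space and process overhead; this breakdown is an estimate, not a measurement.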