My Quest to Build the Ultimate Music Player

Over the past few years, I have been slowly but surely building my own music player. It's been a wild ride. The codebase has radically changed several times, but is always converging on a better music listening experience.

In this article my goal is to take you along for the ride.

See the project on GitHub

I <3 Amarok 1.4

Back in 2009, my music player of choice was Amarok 1.4. This was by far the best music player I had ever used on Windows, Mac, or Linux, especially when combined with the wonderful ReplayGain plugin. Here's a screenshot:

One way you can tell how much people loved this music player is by looking at the comments on the release blog articles for Amarok 2.0, which rebuilt the player from scratch and took it in a completely different direction. Arguably, they should have picked a different project name and logo. Look at some of these comments, how angry and vitriolic they are: 2.0 2.1 2.2

Even now, 4 years later, the project is at version 2.8 and the release name is titled "Return To The Origin":

Amarok 2.8 is titled "Return To The Origin" as we are bringing back the polish that many users loved from the original 1.x series!

That should give you an idea of how much respect Amarok 1.4 commanded.

Even so, it was not perfect. Notably, the ReplayGain plugin I mentioned above had several shortcomings. Before I get into that, however, let me take a detour and explain what ReplayGain, or more generally, loudness compensation, is and some of its implications.

A Short Explanation of Loudness Compensation

Have you ever seen this 2-minute video explaining the Loudness War?

The video demonstrates a trend in digital audio mastering where songs are highly compressed to sound louder, and how this can compromise the integrity of the music.

While thinking about building a music player, we're not going to make moral judgments about whether or not compression is ruining music for everybody. If users want to listen to highly compressed music, that's a valid use case. So we have to consider a music library which contains both compressed songs and dynamic songs.

Here is a song called The Happiest Days of Our Lives by Pink Floyd, mastered in 1979:

Here is a song called Saying Sorry by Hawthorne Heights, mastered in 2006:

It is immediately obvious that the second one is much louder than the other. So what happens when they are played one after the other in a music player?

When the quieter song comes on first, the user reaches for the volume knob to turn it up so they can hear. Oops. When the next song begins, a surge of adrenaline shoots through the user's body as they scramble to turn the volume down. This goes beyond poor usability; this problem can cause hearing loss.

The solution is to analyze each song before playing it to figure out how "loud" it sounds to humans. Then the music player adjusts the playback volume of each track to compensate for the perceived loudness. This way, the user does not have to adjust the volume for each track that comes on.

The idea is simple enough, but it poses a few subtle challenges.

For one, the loudness of an individual track might be different than the loudness of the album as a whole. A complete loudness compensation solution has to take this into account, both during scanning and playback.

An even trickier problem is avoiding clipping. Music is composed of samples which have a fixed range. For example, in floating point format, samples can be between -1.0 and 1.0. Even quiet songs usually have some samples which peak at 1.0, for example on the drums. But we need to turn the volume up on these quiet songs to make them sound as loud as the highly compressed ones.

If we naïvely increased the volume on such a song, we would end up with something like this:

The grey bars above the red lines represent clipping. This causes distortion and generally sounds awful.

The solution is not to increase the volume of the quiet song, but to decrease the volume of the loud song. In order to do this, we introduce an amount called pre-gain. All songs are turned down by this amount, which gives us the headroom we need to turn the quieter ones back up.

It's not a perfect solution though.

The lower the pre-gain, the quieter the music player will sound compared to other applications on the computer. The higher the pre-gain, the more likely it is that there will not be enough headroom to turn a quiet song up far enough.
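A minimal sketch of how a player might combine these ideas. The function names, the track fields, and the choice to express pre-gain as a signed decibel offset (negative values turn everything down) are assumptions for illustration, not part of any specification.

```javascript
// Convert a decibel offset into a linear multiplier for samples.
function dbToLinear(db) {
  return Math.pow(10, db / 20);
}

// Hypothetical helper: the multiplier to apply to every sample of a track.
//   track.loudnessDb - loudness measured by the scanner
//   track.peak       - largest absolute sample value in the track, in (0, 1]
//   targetDb         - loudness every track should end up at
//   preGainDb        - offset applied to all tracks; negative adds headroom
function playbackGain(track, targetDb, preGainDb) {
  const wanted = dbToLinear(targetDb - track.loudnessDb + preGainDb);
  // Never amplify past the point where the loudest sample would exceed 1.0.
  const maxWithoutClipping = 1 / track.peak;
  return Math.min(wanted, maxWithoutClipping);
}

const quiet = { loudnessDb: -20, peak: 1.0 };
const loud = { loudnessDb: -8, peak: 1.0 };

// With no pre-gain the quiet track wants a boost it cannot have, so it is
// clamped and stays quieter than intended.
console.log(playbackGain(quiet, -14, 0), playbackGain(loud, -14, 0));

// With 6 dB of headroom both tracks can reach the same perceived loudness.
console.log(playbackGain(quiet, -14, -6), playbackGain(loud, -14, -6));
```

The clamp is what makes the trade-off visible: without enough headroom, the quiet track simply cannot be turned up as far as the loudness target asks.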

The ReplayGain 1.0 Specification outlines this in more detail.

In 2010, the European Broadcasting Union introduced a new standard called R128. This standard outlines a strategy for analyzing media and determining how loud it is. There is a motion to make ReplayGain 2.0 use this standard.

I recommend this excellent Introduction to EBU R128 by Florian Camerer:

Shortcomings of Amarok 1.4

As much as I loved Amarok 1.4, it did not even attempt to address these loudness issues. There is no built-in loudness compensation.

The ReplayGain plugin I mentioned earlier was great, but it was limited in usefulness:

It had to scan every time the playlist updated; it didn't cache the data.
Each format that you wanted to scan had a different command-line utility which had to be installed. This means that the set of songs that Amarok 1.4 could play was completely different than the set of songs that it could scan.
It applied the volume changes on a gradient instead of instantly, and timing was not precise. This means that it might erroneously turn up the loudness far too high in the transition time to the next track. This behavior was distracting and sometimes ear-piercingly painful.
You had to manually decide between track and album mode. This is a pointless chore that the music player should do automatically. Here's a simple algorithm:
- If the previous item in the playlist is the previous item from the same album, or the next item in the playlist is the next item from the same album, use the album ReplayGain information.
- Otherwise, use the track ReplayGain information.
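The algorithm above can be sketched in a few lines. The playlist item shape (an `album` identifier plus a `track` number) is an assumption made for the example.

```javascript
// True when `b` is the track that directly follows `a` on the same album.
function isAlbumNeighbor(a, b) {
  return Boolean(a && b) && a.album === b.album && b.track === a.track + 1;
}

// Decide which ReplayGain value to use for the item at `index`.
function gainMode(playlist, index) {
  const prev = playlist[index - 1]; // undefined at the start of the playlist
  const item = playlist[index];
  const next = playlist[index + 1]; // undefined at the end of the playlist
  const inAlbumRun = isAlbumNeighbor(prev, item) || isAlbumNeighbor(item, next);
  return inAlbumRun ? 'album' : 'track';
}

const playlist = [
  { album: 'A', track: 1 },
  { album: 'A', track: 2 },
  { album: 'B', track: 5 },
  { album: 'A', track: 3 },
];
console.log(playlist.map((_, i) => gainMode(playlist, i)));
```

Note that the last item falls back to track mode even though it belongs to album A, because it is not adjacent to its album neighbors in the playlist.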

Aside from the loudness compensation, I had a couple other nits to pick:

Dynamic Mode was a useful feature that could continually play random songs from the library. But the random selection was too random; it would often queue the same song within a short period of time.
If the duration tag was incorrect in a song, or if it was a variable bit rate MP3, the song would seemingly end before it had actually reached the end. In other words, the reported duration was incorrect and seeking would be broken.

I've spent some time criticizing; now let me be more constructive and actually specify some features that I think music players should have.

My Laundry List of Music Player Features

Loudness Compensation using the same scanner as decoder

This is absolutely crucial. If you want to solve the loudness compensation problem, the set of songs which you can decode and play back must be the same set of songs which you can scan for loudness. I should never have to manually adjust the volume because a different song or album came on.

Ideally, loudness scanning should occur lazily when items are added to the play queue and then the generated values should be saved so that the loudness scanning would not have to be repeated.

Do not trust duration tags

A music player already must scan songs to determine loudness compensation values. At the same time, it should determine the true duration of the file and use that information instead of a tag which could be wrong.

If my friends come over, they can control the music playback

Friends should be able to upload and download music, as well as queue, skip, pause, and play.

Ability to listen to my music library even when I'm not home

I should be able to run the music player on my home computer and listen to a real-time stream from work, for example.

Gapless Playback

Many albums are created in order to be a listening experience that transcends tracks. When listening to an album, songs should play seamlessly and without volume changes at the seams. This means that loudness scanning must automatically take into account albums.

Robust codec support

You know how when you need to play some obscure video format, you can always rely on VLC to play it? That must be true for the ultimate music player as well. A music player must be able to play music. If you don't have a wide range of codecs supported, you don't have a music player.

Keyboard Shortcuts for Everything

I should be allowed to never touch the mouse when operating the music player.

Clean up my messy files

One thing that Amarok 1.4 got right is library organization. It offered a powerful way to specify the canonical location for a music file, and then it had an option to organize properly tagged music files into the correct file location.

I don't remember the exact format, but you could specify a format something like this:

%artist/%album/%track %title%extension
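A rough sketch of how such a format string might be expanded into a path. The placeholder names are taken from the format above; `expandTemplate` and the shape of the `tags` object are hypothetical, and this is not Amarok's actual implementation.

```javascript
// Replace each known %placeholder with the matching tag value.
// Missing tags expand to an empty string.
function expandTemplate(template, tags) {
  return template.replace(
    /%(artist|album|track|title|extension)/g,
    (match, key) => (tags[key] == null ? '' : String(tags[key]))
  );
}

const tags = { artist: 'Artist', album: 'Album', track: 7, title: 'Title', extension: '.mp3' };
console.log(expandTemplate('%artist/%album/%track %title%extension', tags));
```

A real implementation would also need to escape characters that are not allowed in file names, which this sketch leaves out.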

Filter Search

There should be a text box where I can type search terms and instantly see the search results live. And it should ignore diacritics. For example, I could type "jonsi ik" and match the song Boy Lilikoi by Jónsi.
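One way to get this behavior is to normalize both the query and the song metadata before comparing them. The sketch below assumes each song has `artist`, `album`, and `title` fields and that every search term must appear somewhere in the combined text; both are illustrative choices.

```javascript
// Lowercase the text and strip diacritics. NFD normalization splits a letter
// such as "ó" into "o" plus a combining accent, which is then removed.
function fold(text) {
  return text.normalize('NFD').replace(/[\u0300-\u036f]/g, '').toLowerCase();
}

// A song matches when every whitespace-separated term is a substring of
// its folded metadata.
function matches(song, query) {
  const haystack = fold([song.artist, song.album, song.title].join(' '));
  return fold(query)
    .split(/\s+/)
    .filter(Boolean)
    .every((term) => haystack.includes(term));
}

const song = { artist: 'Jónsi', album: '', title: 'Boy Lilikoi' };
console.log(matches(song, 'jonsi ik'));
```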

Playlist Mode that Automatically Queues Songs

Some names for this feature are:

Dynamic Mode
Party Mode
DJ Mode

The idea is that it automatically queues songs - kind of like a real-time shuffle - so that you don't have to manually decide what to listen to.

One common flaw found in many players is using a truly random algorithm. With true randomness, it will frequently occur that a song which has recently been randomly chosen will be randomly chosen again.

A more sophisticated algorithm weights songs by how long it has been since they were last queued. Any song can still be chosen, but songs that have not been queued recently are much more likely to be picked. Queue date is used rather than play date because if a song is queued and the user skips it, this should still count against it being chosen again.
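A minimal sketch of such a weighted pick. The `lastQueued` field, the linear weighting, and the minimum weight of 1 are assumptions chosen to keep the example short; the random source is injectable so the behavior can be checked deterministically.

```javascript
// Pick a song with probability proportional to the time since it was last
// queued. Songs that were never queued are treated as queued at time 0.
function pickNextSong(songs, now, random = Math.random) {
  // A minimum weight of 1 keeps every song possible, even one queued just now.
  const weights = songs.map((song) => Math.max(now - (song.lastQueued || 0), 1));
  const total = weights.reduce((sum, w) => sum + w, 0);

  // Walk the cumulative weights until the random target is used up.
  let target = random() * total;
  for (let i = 0; i < songs.length; i++) {
    target -= weights[i];
    if (target < 0) return songs[i];
  }
  return songs[songs.length - 1]; // guards against floating point rounding
}

const library = [
  { id: 'recent', lastQueued: 900 },
  { id: 'stale', lastQueued: 0 },
];
const choice = pickNextSong(library, 1000);
choice.lastQueued = 1000; // record the queue date, even if the song is skipped later
console.log(choice.id);
```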

"PartyBeat"

It would be a long time before my wishlist of features would become a reality. Meanwhile, back in college my buddy made a fun little project which served as a music player that multiple people could control at the same time with a web interface. He installed it in my and my roommate's apartment, and the three of us used it in our apartment as a shared jukebox; anarchy deciding what we would listen to while we worked on our respective jobs, homework, or projects. We dubbed it "PartyBeat".

Here's a screenshot:

You might recognize that UI scheme - it is the Trontastic jQuery UI theme.

This project used Django and xmms2 and had a bug-ridden, barren, and clunky web-based user interface. It was so lacking compared to my usual Amarok 1.4 experience, yet somehow I could not give up the "shared jukebox" aspect. It was simply too fun to listen to music together, regardless of the interface.

So finally I decided to build the ultimate music player. It would still have a web-based interface, but it would behave like a native application - both in feel and in responsiveness. It should be nice enough that even when you want to listen alone it would still be your go-to player of choice.

Fumbling Around with Technology

In early 2011 I started investigating what technology to use to build this thing. I knew that I wanted a backend which could decode many audio formats, do gapless playback, and provide some kind of interface for a web server to control it.

I tinkered a bit with Qt and the Phonon framework, but I didn't get as far as having a web interface controlling it.

Eventually I stumbled upon Music Player Daemon. At the time this seemed like a perfect fit, especially since the XMMS2 wiki admitted that if they had known that MPD existed when they started the project, they would probably have just used it. MPD is a service - it has a config file which tells it, among other things, the location of your music library, and then it runs in the background, listening on a port (typically 6600), where you can issue commands via the protocol telling it to pause, play, skip, queue, unqueue, and all that jazz.

The first iteration of "PartyBeat2" was a small Python 3 server which was merely a proxy between the client-side JavaScript code and MPD, as well as a file server to serve the client-side HTML and JavaScript.

At this point I had a basic proof-of-concept. However, progress slowed for a few months as I embarked on a 12-day hiking trip followed immediately by the first day of work at Amazon, my first out-of-college job.

After a short hiatus, I revisited the project. This was right when socket.io was getting a lot of hype, and it seemed like the perfect fit for my design. Also I had just given Coffee-Script a real chance after snubbing it initially. So I ported over the proxy/file server to Node.js and got a prototype working:

I even did some design drawings on paper:

A week of iterating later, I had the basics of a user interface, and a name:

I named it Groove Basin, after the Sonic the Hedgehog 3 Azure Lake Remix by Rayza. As homage to the original project, I picked a jQuery UI theme for the UI, except this time I chose Dot Luv.

Growing Pains

Progress continued off and on over the period of about a year. As the feature set grew larger and more solidified, the server which used to only be a proxy and file server took on more and more responsibilities.

Dynamic Mode required that the server watch the main playlist and add songs to the end and remove them from the beginning as playback continued from one track to the next.

Last.fm scrobbling required that the server send scrobbles even when the client was not connected.

As the server took on more responsibilities, it began to make less and less sense for it to directly proxy the MPD protocol for the client.

Meanwhile, I became unsatisfied with Coffee-Script. Perhaps digressing too much, some of the issues I took with it were:

Error-prone variable scoping

If you accidentally name a local variable the same as one in an outer scope, you mutate the value of the outer one rather than shadowing it. For example if you path = require('path') and then later innocently use path as the name of a local variable, you're in for a wild ride:

path = require('path')

# ...

makePathName = (song) ->
  path = song.title
  if song.artist
    path = song.artist + '/' + path
  return path

# ...

basename = makePathName(title: 'hello')
path.join(musicDir, basename) # TypeError: Object  has no method 'join'

Messy and inefficient output code

The JavaScript that Coffee-Script produces is frankly quite ugly. Specifically, it does not reuse temporary variables, so you end up with _len, _len1, _len2 and so on:

arr = [1, 2, 3]
f = ->
  for x in arr
    console.log x
  for x in arr
    console.log x
  for x in arr
    console.log x

Produces:

(function() {
  var arr, f;

  arr = [1, 2, 3];

  f = function() {
    var x, _i, _j, _k, _len, _len1, _len2, _results;
    for (_i = 0, _len = arr.length; _i < _len; _i++) {
      x = arr[_i];
      console.log(x);
    }
    for (_j = 0, _len1 = arr.length; _j < _len1; _j++) {
      x = arr[_j];
      console.log(x);
    }
    _results = [];
    for (_k = 0, _len2 = arr.length; _k < _len2; _k++) {
      x = arr[_k];
      _results.push(console.log(x));
    }
    return _results;
  };

}).call(this);

If you look closely at that output JavaScript code, you'll notice something even more annoying. Every function returns a value unless you explicitly put a return statement at the end. This can have surprising side effects. In our example code, Coffee-Script decided to put the output of console.log into an array. This is a controversial feature, but it is not going to change.

Inability to declare functions

In Coffee-Script you cannot declare a function; you can only assign an anonymous function to a variable. This makes it impossible to do what I consider to be the cleanest organization of callback code.
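The text does not spell out which organization is meant, so the following is only one plausible reading, written in plain JavaScript. Function declarations are hoisted, which lets the high-level flow come first and the callbacks it refers to be declared underneath it. The `loadSong` and `readFile` names are hypothetical.

```javascript
// The main flow reads top to bottom; the callback is declared after its use.
function loadSong(readFile, path, done) {
  readFile(path, onRead);

  // Hoisted declaration: usable above even though it is written below.
  function onRead(err, data) {
    if (err) return done(err);
    done(null, { path: path, size: data.length });
  }
}

// Usage with a stand-in for an asynchronous file read.
function fakeReadFile(path, callback) {
  callback(null, 'abcd');
}
loadSong(fakeReadFile, 'song.mp3', (err, song) => console.log(err, song));
```

With only anonymous functions assigned to variables, `onRead` would have to be defined before the line that uses it, which pushes the main flow to the bottom of the function.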

Playing with coco

What I should have done at this point was go back to plain old JavaScript. But I was still seduced by the features that compile-to-js languages bring to the table. So instead I switched the codebase over to coco. This project solved some of Coffee-Script's problems including all the ones I listed above, and satyr seemed to have a better understanding of compiler design than jashkenas given that coco ran twice as fast.

coco lasted about a year before I removed it. satyr started taking coco in a pretty wild direction, adding things like c = a-b compiling to c = aB rather than c = a - b. The syntax became so complicated that if you made a typo in the source code, you had more of a chance of introducing a subtle bug than of introducing a syntax error.

In the end, though, the biggest factor, and this goes for Coffee-Script as well as coco as well as any other compile-to-js language, is that it alienates possible contributors.

As I gained more experience with Node.js, I realized that most developers used JavaScript directly instead of a compile-to-js language. By using a language that significantly fewer people were familiar with, I made Groove Basin a less attractive project to contribute to.

Rejecting MPD

All this compile-to-js stuff was meta work; it had no fundamental effect on how well the music player performed. Meanwhile there lurked a more substantial problem with the way Groove Basin was designed.

At first, MPD seemed like a great choice. It is available in the package managers of most popular Linux distributions, including those that run on the Raspberry Pi. It has been around long enough that there are multiple free iPhone and Android apps available to act as controllers. It can play most audio formats. But in the end, there are some critical issues that prevent it from being the right choice for Groove Basin.

Demands control of your music library

The only way to play a song is to add it to the library, and then queue it. If you want to implement your own music database and use MPD as a simple playlist for playback, you still have to keep MPD's library up to date too.

ReplayGain support is laughable

MPD supports only APEv2 ReplayGain tags. This must be some kind of joke, because obviously not every format supports APEv2 tags, and most ReplayGain scanners actually write to ID3v2 tags. But more importantly, MPD misses the entire point. Relying on external ReplayGain scanning makes the set of songs you can play different from the set of songs you can scan. This leads to an inconsistent and rather unpleasant listening experience. Further, relying on tags makes it impossible to store ReplayGain data for songs in the library in a container format which does not support tags. And finally... what the hell? Is the user supposed to set up their own cron job to scan their music collection? How about the music player app does it for the user, silently, in the background, no questions asked?

No tag editing

After demanding control of your music library, MPD provides no way to edit tags of a song.

If you're going to read and write audio tags, the same library should be in charge of both. Otherwise, like the ReplayGain problem I outlined earlier, you end up with discrepancies in what you can read and write.

Protocol is severely limited and poorly designed

MPD is controlled via the MPD Protocol that can be used to control playback and query information. For the most part, all the information that you need is there. However the protocol is massively inefficient.

For one example, consider the use case of a client which wants to keep an index of the music database. That is, it wants to have an updated copy of all the music metadata, so that it can do things such as queuing a random track, or allowing the user to quickly search for a song. In this use case, the MPD client would have to request a copy of the music database index, and then subscribe to a notification when the database changes. Once that notification is sent, the client would then have to request the entire index again, an operation which can be upwards of 3MB of data for a music library with 9000 songs. A better behavior would be if the notification included a delta of exactly what changed, so that the client could keep its copy updated without that massive payload.

This same problem exists with the main playlist - you have to request the entire playlist instead of receiving a delta of what changed.
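To make the idea of a delta concrete, here is a small sketch of a client applying one to its local copy of the index. The message shape (`removed` and `changed`) is invented for this example; it is not part of the MPD protocol.

```javascript
// Apply a hypothetical delta message to a client-side index.
//   index         - Map from song id to song metadata
//   delta.removed - ids of songs that no longer exist
//   delta.changed - object mapping ids to new or updated metadata
function applyDelta(index, delta) {
  for (const id of delta.removed || []) index.delete(id);
  for (const [id, song] of Object.entries(delta.changed || {})) index.set(id, song);
  return index;
}

const index = new Map([
  ['1', { title: 'First' }],
  ['2', { title: 'Second' }],
]);
applyDelta(index, { removed: ['1'], changed: { '3': { title: 'Third' } } });
console.log([...index.keys()]);
```

The payload now scales with the size of the change instead of the size of the library.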

Another problem with the MPD protocol is that although it is intended to support multiple concurrent users controlling the same server, it is riddled with race conditions.

For example, it packages multiple state updates into one. Consider the status update message:

status
volume: 100
repeat: 0
random: 0
single: 0
consume: 1
playlist: 0
playlistlength: 23
xfade: 0
mixrampdb: 0.000000
mixrampdelay: nan
state: play
song: 10
songid: 0
nextsong: 11
nextsongid: 1
time: 69:224
elapsed: 68.985
bitrate: 192
audio: 44100:24:2
OK

Now imagine that one user adjusts volume while another user toggles the repeat state. Because volume and repeat state are sent in the same message, at least one user will receive a status message with incorrect information before receiving a new one with correct information. In practice, this means that the UI on clients will momentarily display bogus state when things change which makes clients feel "glitchy".

As another example, audio files are indexed by filename. This means that if you rename a file, every client which had a handle on that file now has an invalid handle. Even after you download the entire new music library index, there is no way to tell which file got renamed.

One final example. Consider the use case where the user presses the volume up button several times quickly.

The problem is that you receive an event saying that the status has changed. The only reasonable response to this is to ask what the new status is. The new status tells us that the volume is at 3. Now, there are 2 more messages that will arrive soon telling us that the new volume is 4 and then 5. But before they arrive, the UI is updated, the user sees an invalid value, and when they press volume up again, it is from the invalid base position of 3 instead of 5. Consider a simpler alternative which solves this problem. When the client sends a volume update, the server accepts the message and then only notifies other clients that the volume changed.
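A sketch of that simpler alternative, with the network replaced by plain callbacks. `createVolumeServer` and its methods are hypothetical names used only for illustration.

```javascript
// The server accepts a volume change and notifies every client except the
// one that sent it, so the sender never receives a stale echo of its own input.
function createVolumeServer() {
  const clients = new Set();
  let volume = 0;
  return {
    connect(onVolume) {
      const client = { onVolume };
      clients.add(client);
      return client;
    },
    setVolume(sender, value) {
      volume = value;
      for (const client of clients) {
        if (client !== sender) client.onVolume(value);
      }
    },
    getVolume() {
      return volume;
    },
  };
}

const server = createVolumeServer();
const seenByA = [];
const seenByB = [];
const a = server.connect((v) => seenByA.push(v));
server.connect((v) => seenByB.push(v));

// Client A presses volume up three times in quick succession.
[3, 4, 5].forEach((v) => server.setVolume(a, v));
console.log(seenByA, seenByB, server.getVolume());
```

Client A keeps trusting its own local value, while client B sees each change in order.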

Various audio playback glitches

MPD supports something called "stickers". These are simple pieces of data MPD clients can add to tracks to use for their own purposes. Groove Basin took advantage of stickers to store "last queued date" on each track in order to implement its random song selection which favors songs you haven't heard recently. Josh discovered that MPD apparently makes two stupid decisions with regard to stickers:

It uses the same thread for audio playback as it uses for updating sticker information.
It somehow is so inefficient at updating sticker information that it causes audio playback to skip a few tenths of a second if you try to do it for upwards of 8 songs at once.

The end result here was that audio playback would glitch if you queued an album.

In addition there were some basic audio playback issues. Sometimes after unpausing, audio playback would stutter quickly until the song ended. Sometimes the HTTP stream would not send audio data quickly enough, so the HTTP stream would have to stop to buffer repeatedly, and the behavior had to be fixed by disabling and re-enabling the audio output in MPD.

Now it is possible to submit patches to MPD to get these bugs fixed. But if I'm going to work on that layer of the problem, why not put that effort toward a project which better fulfills Groove Basin's goals?

We want more control over the HTTP audio stream

In the best case scenario, the music player server would know which of the connected browser clients are actually streaming music. With MPD in control of the HTTP audio stream, we have no access to or control over the HTTP request. We can only provide the URL. Also it requires running on a separate port from the main web interface which is its own can of worms.

We also want more control over the audio stream. When a song is skipped, for example, the best thing to do is flush the audio buffer so that the buffer can start filling up with data from the new song. This is precisely what happens in media players that play locally on your computer. However, with MPD this is not possible. When you skip a song you still have to wait for the audio stream to catch up.

Having direct control over the HTTP audio stream also enables us to experiment with some more creative ideas. For example, recently I updated the HTTP stream so that when a client connects, it receives a burst of 200 KB of encoded audio, followed by a steady stream of exactly as many bytes per second as the encoded audio contains. This gives clients enough data to begin playback immediately. You may have noticed this behavior when you watch a YouTube video - the buffering bar loads very quickly at first, but then slows down to a crawl once playback begins.
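The pacing rule can be expressed as a byte budget per client. This is a simplified sketch, not Groove Basin's code: the function names are made up, and treating 200 KB as 200 * 1024 bytes is an assumption.

```javascript
const BURST_BYTES = 200 * 1024; // initial burst sent as fast as possible

// Total bytes a client is allowed to have received `elapsedSeconds` after
// connecting, for a stream encoded at `bytesPerSecond`.
function bytesAllowed(elapsedSeconds, bytesPerSecond) {
  return BURST_BYTES + Math.floor(elapsedSeconds * bytesPerSecond);
}

// How many bytes to write right now, given what was already sent and how
// much encoded audio is currently available.
function bytesToSend(elapsedSeconds, bytesPerSecond, alreadySent, available) {
  const budget = bytesAllowed(elapsedSeconds, bytesPerSecond) - alreadySent;
  return Math.max(0, Math.min(budget, available));
}

// A 192 kbps stream carries 24000 bytes of encoded audio per second.
const rate = 192000 / 8;
console.log(bytesToSend(0, rate, 0, 1000000)); // the burst, at connect time
console.log(bytesToSend(1, rate, BURST_BYTES, 1000000)); // one second later
```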

In practice, I have observed the delay between connecting to the stream URL and playback beginning to be anywhere from seemingly instantaneous to 300ms with this method, depending on latency and bandwidth. Meanwhile, if you use MPD's HTTP audio stream, clients will take upwards of 10 seconds to buffer audio before playback begins.

Building a Music Player Backend

So, how hard could it be to build my own music player backend? Seems like it would be a matter of solving these things:

Use a robust library for audio decoding. How about the same one that VLC uses?
Support adding and removing entries on a playlist for gapless playback.
Support pause, play, and seek.
Per-playlist-item gain adjustment so that perfect loudness compensation can be implemented.
Support loudness scanning to make it easy to implement for example ReplayGain.
Support playback to a sound device chosen at runtime.
Support transcoding audio into another format so a player can implement, for example, HTTP streaming.
Give raw access to decoded audio buffers just in case a player wants to do something other than one of the built-in things.
Try to get other projects to use it to benefit from code reuse.
- Make the API generic enough to support other music players and other use cases.
- Get it packaged into Debian and Ubuntu.
- Make a blog post about it to increase awareness.

After reading up a little bit on the insane open-source soap-opera that was the forking of libav from ffmpeg (here are two sides to the story: libav side, ffmpeg side), I went with libav simply because it is what is in the Debian and Ubuntu package managers, and one of my goals is to get this music player backend into their package managers.

Several iterations later, I now have libgroove, a C library with what I think is a pretty solid API. How it works:

The API user creates a GroovePlaylist which spawns its own thread and is responsible for decoding audio. The user adds and removes items at will from this playlist. They can also call pause, play, and seek on the playlist. As the playlist decodes audio, where does the decoded audio go? This is where sinks come in.

A sink is a metaphor for a real-life sink that you would find in a bathroom or kitchen. Sinks can fill up with water, and unless the water is drained the sink will continue to fill until it overflows. Likewise, in audio processing, a sink is an object which collects audio buffers in a queue.

In libgroove, decoded audio is stored in reference-counted buffer objects and passed to each connected sink. Each sink does whatever processing it needs to do and then calls "unref" on the buffer. Typically each sink will have its own thread which hungrily waits for buffers and devours them as fast as possible. However, the playlist is also decoding audio as fast as possible and pushing it onto each sink's queue. It is quite possible that a sink's queue fills up faster than it can process the buffers. When the playlist discovers that all its sinks are full, it puts its thread to sleep, waiting to be woken up by a sink which has drained enough.
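The flow control described above can be illustrated with a toy, single-threaded model. This is not libgroove's API: the `Sink` class, `decodeStep`, and the "full at N buffers" threshold are all invented for the sketch, and real sinks run in their own threads.

```javascript
class Sink {
  constructor(fullAt) {
    this.fullAt = fullAt; // queue length at which this sink counts as full
    this.queue = [];
  }
  isFull() {
    return this.queue.length >= this.fullAt;
  }
  // Receive a shared buffer and take a reference to it.
  push(buffer) {
    buffer.refs += 1;
    this.queue.push(buffer);
  }
  // Process one buffer and release this sink's reference ("unref").
  drain() {
    const buffer = this.queue.shift();
    if (buffer) buffer.refs -= 1;
    return buffer;
  }
}

// One step of the decoder. Returns false when every sink is full, which is
// the point where the playlist thread would go to sleep.
function decodeStep(sinks, decode) {
  if (sinks.every((sink) => sink.isFull())) return false;
  const buffer = { data: decode(), refs: 0 };
  sinks.forEach((sink) => sink.push(buffer));
  return true;
}

const slow = new Sink(1);
const fast = new Sink(2);
let n = 0;
while (decodeStep([slow, fast], () => n++)) {} // decode until both are full
console.log(n, slow.queue.length, fast.queue.length);
```

Draining any one sink below its threshold is enough for `decodeStep` to succeed again, which mirrors the playlist being woken up by a sink that has drained enough.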

libgroove provides some higher-level sink types in addition to the basic sink. Each higher level sink runs in its own thread and is built using the basic sink. These include:

playback sink - opens a sound device and sends the decoded audio to it. This sink fills up with events that signal when the sink has started playing the next track, or when a buffer underflow occurs.
encoder sink - encodes the audio buffers it receives and fills up with encoded audio buffers. These encoded buffers can then be written to a file or streamed over the network, for example.
loudness scanner sink - uses the EBU R 128 standard to detect loudness. This sink fills up with information about each track, including loudness, peak, and duration.

The API is designed carefully such that even though the primary use case is for a music player backend, libgroove can be used for other use cases, such as transcoding audio, editing tags, or ReplayGain scanning. Here is an example of using libgroove for a simple transcode command line application:

/* transcode one or more files into one output file */

#include <groove/groove.h>
#include <groove/encoder.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>

static int usage(char *arg0) {
    fprintf(stderr, "Usage: %s file1 [file2 ...] --output outputfile [--bitrate 320] [--format name] [--codec name] [--mime mimetype]\n", arg0);
    return 1;
}

int main(int argc, char * argv[]) {
    // arg parsing
    int bit_rate_k = 320;
    char *format = NULL;
    char *codec = NULL;
    char *mime = NULL;

    char *output_file_name = NULL;

    groove_init();
    atexit(groove_finish);
    groove_set_logging(GROOVE_LOG_INFO);
    struct GroovePlaylist *playlist = groove_playlist_create();

    for (int i = 1; i < argc; i += 1) {
        char *arg = argv[i];
        if (arg[0] == '-' && arg[1] == '-') {
            arg += 2;
            if (i + 1 >= argc) {
                return usage(argv[0]);
            } else if (strcmp(arg, "bitrate") == 0) {
                bit_rate_k = atoi(argv[++i]);
            } else if (strcmp(arg, "format") == 0) {
                format = argv[++i];
            } else if (strcmp(arg, "codec") == 0) {
                codec = argv[++i];
            } else if (strcmp(arg, "mime") == 0) {
                mime = argv[++i];
            } else if (strcmp(arg, "output") == 0) {
                output_file_name = argv[++i];
            } else {
                return usage(argv[0]);
            }
        } else {
            struct GrooveFile * file = groove_file_open(arg);
            if (!file) {
                fprintf(stderr, "Error opening input file %s\n", arg);
                return 1;
            }
            groove_playlist_insert(playlist, file, 1.0, NULL);
        }
    }
    if (!output_file_name)
        return usage(argv[0]);

    struct GrooveEncoder *encoder = groove_encoder_create();
    encoder->bit_rate = bit_rate_k * 1000;
    encoder->format_short_name = format;
    encoder->codec_short_name = codec;
    encoder->filename = output_file_name;
    encoder->mime_type = mime;
    if (groove_playlist_count(playlist) == 1) {
        groove_file_audio_format(playlist->head->file, &encoder->target_audio_format);

        // copy metadata
        struct GrooveTag *tag = NULL;
        while((tag = groove_file_metadata_get(playlist->head->file, "", tag, 0))) {
            groove_encoder_metadata_set(encoder, groove_tag_key(tag), groove_tag_value(tag), 0);
        }
    }

    if (groove_encoder_attach(encoder, playlist) < 0) {
        fprintf(stderr, "error attaching encoder\n");
        return 1;
    }

    FILE *f = fopen(output_file_name, "wb");
    if (!f) {
        fprintf(stderr, "Error opening output file %s\n", output_file_name);
        return 1;
    }

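    struct GrooveBuffer *buffer;

    // The third argument (1) asks for a blocking call, so each iteration waits
    // for the next encoded buffer. The loop ends at the first result other
    // than GROOVE_BUFFER_YES.
    while (groove_encoder_buffer_get(encoder, &buffer, 1) == GROOVE_BUFFER_YES) {
        fwrite(buffer->data[0], 1, buffer->size, f);
        groove_buffer_unref(buffer);
    }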

    fclose(f);

    groove_encoder_detach(encoder);
    groove_encoder_destroy(encoder);

    struct GroovePlaylistItem *item = playlist->head;
    while (item) {
        struct GrooveFile *file = item->file;
        struct GroovePlaylistItem *next = item->next;
        groove_playlist_remove(playlist, item);
        groove_file_close(file);
        item = next;
    }
    groove_playlist_destroy(playlist);

    return 0;
}

Note that this code contains no threading. Even so, because of the way libgroove is designed, when this app is run, one thread will work on decoding the audio while the main thread seen in this code will work on writing the encoded buffers to disk.
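To make that concrete, here is a rough sketch of the general pattern, not libgroove's actual implementation: a library starts a worker thread internally and hands results to the caller through a blocking queue, so the calling code stays single-threaded. Every name here (sketch_queue, sketch_start, sketch_get, sketch_stop) is hypothetical, and plain integers stand in for encoded buffers.

```c
#include <pthread.h>
#include <stddef.h>

#define SKETCH_CAPACITY 4

/* Hypothetical bounded queue shared by the worker thread and the caller. */
struct sketch_queue {
    int items[SKETCH_CAPACITY];
    int head, count;
    int total;      /* how many items the worker will produce */
    int finished;   /* set by the worker once everything is produced */
    pthread_mutex_t lock;
    pthread_cond_t changed;
    pthread_t worker;
};

/* Worker thread: stands in for the decoding work done inside the library. */
static void *sketch_worker(void *arg) {
    struct sketch_queue *q = arg;
    for (int i = 1; i <= q->total; i += 1) {
        pthread_mutex_lock(&q->lock);
        while (q->count == SKETCH_CAPACITY)   /* queue full: wait for the caller */
            pthread_cond_wait(&q->changed, &q->lock);
        q->items[(q->head + q->count) % SKETCH_CAPACITY] = i;
        q->count += 1;
        pthread_cond_broadcast(&q->changed);
        pthread_mutex_unlock(&q->lock);
    }
    pthread_mutex_lock(&q->lock);
    q->finished = 1;
    pthread_cond_broadcast(&q->changed);
    pthread_mutex_unlock(&q->lock);
    return NULL;
}

/* Start the worker thread. Returns 0 on success. */
int sketch_start(struct sketch_queue *q, int total) {
    q->head = q->count = q->finished = 0;
    q->total = total;
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->changed, NULL);
    return pthread_create(&q->worker, NULL, sketch_worker, q);
}

/* Block until an item is ready. Returns 1 and stores the item in *out,
 * or returns 0 once the worker has finished and the queue is empty. */
int sketch_get(struct sketch_queue *q, int *out) {
    int got = 0;
    pthread_mutex_lock(&q->lock);
    while (q->count == 0 && !q->finished)
        pthread_cond_wait(&q->changed, &q->lock);
    if (q->count > 0) {
        *out = q->items[q->head];
        q->head = (q->head + 1) % SKETCH_CAPACITY;
        q->count -= 1;
        got = 1;
        pthread_cond_broadcast(&q->changed);
    }
    pthread_mutex_unlock(&q->lock);
    return got;
}

/* Wait for the worker to exit, then release the synchronization objects.
 * Call this only after sketch_get has returned 0. */
void sketch_stop(struct sketch_queue *q) {
    pthread_join(q->worker, NULL);
    pthread_mutex_destroy(&q->lock);
    pthread_cond_destroy(&q->changed);
}
```

The caller's side of this is a plain `while (sketch_get(&q, &value))` loop, the same shape as the `groove_encoder_buffer_get` loop in the listing: no thread code is visible to the caller, yet production and consumption overlap.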

Once I had this backend built, I needed to use it in Groove Basin, which you may recall is a Node.js app. To do this I built a native add-on node module called groove. It uses libuv and v8 to interface between C and Node.js. I wrote the majority of this code at Hacker School, an experience which I highly recommend.

With the groove node module complete, the new architecture looked like this:

No longer did Groove Basin need to run a third-party server to make everything work - just a single Node.js application with the correct libraries installed. And now I was in control of the audio backend code, which meant that I had the power to make everything work exactly the way I wanted it to.

Packaging

Nothing turns away potential users faster than a cumbersome install process. I knew that I had to make Groove Basin easy to install, so I took several steps to make it so.

One thing I did to make libgroove easy to install is bundle some of the harder-to-find dependencies along with it: libav10, libebur128, and SDL2. This way, users on a computer which does not have those packages readily available can still install libgroove.

This convenience is less desirable than relying on existing system dependencies, however, so if the configure script detects system libraries, it happily prefers them.

Next, I made a libgroove PPA for Ubuntu users. This makes installing libgroove as easy as:

sudo apt-add-repository ppa:andrewrk/libgroove
sudo apt-get update
sudo apt-get install libgroove-dev libgrooveplayer-dev libgrooveloudness-dev

Then I joined the Debian multimedia packaging team. This team is dedicated to making Debian a good platform for audio and multimedia work. They kindly accepted me and coached me while I worked on packaging up libebur128 and libgroove for Debian. After a few back and forths, a libebur128 Debian package is ready to be installed from testing, and a libgroove Debian package can be installed from experimental. Once the libav10 transition is complete, libgroove can be submitted to unstable, where it will move into testing, and then finally be released to all of Debian!

After a few more months of progress, I'd like to package up Groove Basin itself. This way, the entire installation process could be just an apt-get install away.

Contributions to libav

While working on libgroove, several issues came up which led me to contribute code to libav. The first thing I noticed is that if you asked libav for an encoder based on the .ogg extension, by default it would use the FLAC encoder. While it is true that .ogg files can contain FLAC audio, the Xiph.org Foundation recommends that .ogg only be used for Ogg Vorbis audio files.

Unfortunately, the built-in vorbis encoder is considered to not meet quality standards, so the compromise that we ended up implementing is to default .ogg to vorbis when libvorbis is available, and to FLAC when it is not. Fortunately for Debian and Ubuntu users, the libav package that is available in the repository does in fact depend on libvorbis.
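The rule itself is tiny. Here is a self-contained sketch of it, not the actual libav source: the function name default_audio_codec_for and the have_libvorbis parameter are hypothetical, and in a real build the libvorbis check would be settled at configure time rather than passed in at run time.

```c
#include <string.h>

/* Pick the default audio codec name for an output file extension.
 * Only the .ogg rule described in the text is modeled here; any other
 * extension returns NULL. have_libvorbis is a stand-in for a build-time
 * check on whether libvorbis is available. */
const char *default_audio_codec_for(const char *extension, int have_libvorbis) {
    if (strcmp(extension, "ogg") == 0)
        return have_libvorbis ? "vorbis" : "flac";
    return NULL;
}
```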

Another time I had to dig into libav code was when I discovered that some of my .wma songs would have broken, glitchy playback if you seek to the beginning. Seeking to any other location seemed to work fine. When I investigated this behavior, I noticed that it occurred in avplay, the command line audio playback tool that ships with libav. I proposed a patch which short circuited seeking to 0 and skipped all the complicated seeking code. Janne Grunau not only committed my patch but he took the opportunity to revisit the ASF seeking code, tidy it up, and fix a bunch of issues.

I'm pretty happy about this patch landing in libav as it makes a whole set of songs able to be played for me that previously could not. A pretty important "feature" for a music player.
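The shape of the short circuit I proposed can be sketched like this. It is an illustration only: the struct and function names are hypothetical stand-ins rather than libav's real ASF demuxer code, and the "complex" path is a placeholder.

```c
#include <stdint.h>

/* Hypothetical stand-in for a demuxer's state; not libav's real structures. */
struct sketch_demuxer {
    int64_t data_start;   /* byte offset of the first audio packet */
    int64_t position;     /* current byte offset in the file */
    int complex_seeks;    /* counts how often the general path was taken */
};

/* Stand-in for the format-specific seeking logic. */
static int sketch_complex_seek(struct sketch_demuxer *d, int64_t timestamp) {
    d->complex_seeks += 1;
    /* Placeholder mapping from timestamp to byte offset. */
    d->position = d->data_start + timestamp;
    return 0;
}

/* Seek entry point: a request for timestamp 0 rewinds directly to the
 * start of the audio data and never enters the general seeking code. */
int sketch_seek(struct sketch_demuxer *d, int64_t timestamp) {
    if (timestamp == 0) {
        d->position = d->data_start;
        return 0;
    }
    return sketch_complex_seek(d, timestamp);
}
```

The appeal of this approach is that seeking to the beginning, the most common seek of all, no longer depends on the correctness of the format-specific logic.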

Finally, while investigating the best way to implement loudness compensation, I realized that simply adjusting the volume on an audio stream is not enough. If a song is so quiet that the gain needed to compensate exceeds 1.0, amplifying it could push samples past full scale, and we would end up with clipping. The solution to this is to detect whether it is possible that the song could clip and, if so, use a compressor instead of a simple volume gain.

A compressor (a limiter is the aggressive, high-ratio form of one) allows us to turn the volume up without clipping. The tradeoff is that it distorts the audio signal. This is why we prefer a simple volume gain, but fall back to a compressor if we need to turn the volume up high enough that the signal could clip. This is what VLC does when you turn the volume up past 100% into the red.

In order to have this functionality, I ported the "compand" audio filter from ffmpeg. Now libgroove has the ability to turn the volume up beyond 100% like VLC, although I don't recommend doing it. It's much better for sound quality to turn the gain up on your physical speakers.
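The decision between the two strategies can be sketched as follows. This is a minimal illustration under stated assumptions, not libgroove's code: the names are hypothetical, and it assumes the track's peak sample value is known from a scan. When no peak is known, passing 1.0 for the peak gives the conservative rule from the text, where any gain above 1.0 is treated as potentially clipping.

```c
/* Which volume-adjustment strategy to apply to a track (hypothetical names). */
enum sketch_strategy {
    SKETCH_PLAIN_GAIN,   /* multiply samples by the gain */
    SKETCH_COMPRESSOR    /* plain gain could clip, so use a compressor */
};

/* gain: linear multiplier (1.0 leaves the signal unchanged).
 * peak: largest absolute sample value in the track, where 1.0 is full scale.
 * If the amplified peak would exceed full scale, plain gain would clip. */
enum sketch_strategy sketch_choose_strategy(double gain, double peak) {
    return (gain * peak > 1.0) ? SKETCH_COMPRESSOR : SKETCH_PLAIN_GAIN;
}
```

For example, a gain of 2.0 on a track that peaks at 0.25 stays at 0.5 of full scale, so the plain gain is safe, while the same gain on a track that peaks at full scale calls for the compressor.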

Conclusion

3 years, 6 months from git init and Groove Basin is still under active development. Here's what the UI looks like today:

Some of the features that it provides are:

Fast, responsive UI. It feels like a desktop app, not a web app.
Dynamic playlist mode which automatically queues random songs, favoring songs that have not been queued recently.
Drag and drop upload. Drag and drop playlist editing. Rich keyboard shortcuts.
Lazy multi-core EBU R128 loudness scanning (tags compatible with ReplayGain) and automatic switching between track and album mode. "Loudness Zen"
Streaming support. You can listen to your music library - or share it with your friends - even when you are not physically near your home speakers.
MPD protocol support. This means you already have a selection of clients which integrate with Groove Basin. For example MPDroid.
Last.fm scrobbling.
File system monitoring. Add songs anywhere inside your music directory and they instantly appear in your library in real time.
Supports GrooveBasin Protocol on the same port as MPD Protocol - use the `protocolupgrade` command to upgrade.

If you like, you can try out the web interface client of Groove Basin on the live demo site. It will probably be chaotic and unresponsive if there is a fair amount of traffic to this blog post, as it's not designed for a large number of anonymous people to use it together; it's more for groups of 10 or fewer people who actually know each other in person.

The roadmap moving forward looks like this:

Tag Editing
Music library organization
AcoustID Integration
Playlists
User accounts / permissions overhaul
Event history / chat
Finalize GrooveBasin protocol spec

Groove Basin still has lots of issues but it's already a solid music player and it's only improving over time.

At some point I plan to write a tutorial article detailing exactly how to get this application running on a Raspberry Pi. It's mostly straightforward but there are enough "gotchas" here and there that I think it could be a useful article.

(update 2014-June-19) I have now written this article: Turn Your Raspberry Pi into a Music Player Server

Feel free to star or watch the Groove Basin GitHub repository if you want to keep track of progress.