Vikram Aggarwal: December 2020

Tuesday, December 29, 2020

Offline-first, the design of great picture gallery websites

tl:dr; How to use service workers to make your website responsive and functional offline

Websites have come a long way from the boring world of static <html> markup. There's a lot of new functionality in browsers. This new functionality brings benefit for big websites, but these techniques are useful for small personal websites as well.

One of the more important changes has been the support of Service Worker in many browsers. A Service Worker is Javascript code that can intercept network requests, and satisfy them locally on the client machine, rather than making server calls. There's a lot you can do with Service Workers, but I was most interested in making my home picture gallery work offline.

I wanted to allow a person to load up the gallery and be able to view it on their mobile phone or desktop even if the connection is lost. My photographs are shared with static html that I create using sigal, a simple image gallery software written in Python. It uses the Galleria library to create web pages that are self-contained. Since the galleries are static html & Javascript, they make a great case study for simple and fast web pages. In the current gallery, the images are downloaded as they are needed, with the next few images prefetched so the user doesn't have to wait. I wanted to make this entirely offline-first, so the images are downloaded, stored offline. Each image in my gallery is 150kb, and galleries have 10-20 images in them. The entire gallery is roughly 4mb, which is tiny. As a result, it loads fast and can be cached offline.

You can always implement your own Service Worker, the interface is straight-forward. If you just want to use Service Workers for browser-side caching, there is a much simpler alternative. We're going to use the upup library, which is a Javascript library that can be configured to store content offline, and cache it.

First, Service Workers need HTTPS support. Get yourself a certificate with LetsEncrypt. This is a nonprofit that has issued 225 Million certificates already, and are valiantly helping the entire web move to 100% HTTPS. Get their certificate if you don't have it. Heck, get HTTPS support even if you don't want offline first on your website. I heartily endorse them, I have moved to HTTPS thanks to them. You should too.

Now, let's add UpUp to your website. It is important where you place it. The upup library can answer requests for a specific scope (subdirectory of your website). It sets its scope based on its location on your website. Since you want the library to serve content to your entire website, you want the library to be as close to the root level of your website. Let's see a concrete example in action.

Let's say your website is gallery.eggwall.com. If you put the javascript at gallery.eggwall.com/js/ then the javascript can only cache offline content for /js and not for gallery.eggwall.com/g_Aircrafts/. To serve content for the entire subdomain gallery.eggwall.com you want the Javascript at gallery.eggwall.com/

We're going to put it at gallery.eggwall.com, with this in the html:

    <script src="https://gallery.eggwall.com/upup.min.js"></script>

So download the upup library and put the contents of the dist/ directory at the root level of your website. Putting random javascript like this is usually a bad idea, so examine the source code, make sure you understand what the Javascript is doing. The service worker source is much more important than the framework code.

Now that you have the library in place, invoke the UpUp.start method, and give it the base web page (index.html) and all the content that you want it to cache. The references here have to be relative to the location of the upup.sw.min.js. If you put the library in the root of your page, all the references here have to be relative to your root page:

    <script>
      UpUp.start({
          'content-url': 'g_Aircraft/index.html',
          'assets': [
              "/static/jquery-3.3.1.min.js",
              "/static/galleria.min.js",
              "/static/themes/classic/galleria.classic.min.js",
              "/static/plugins/history/galleria.history.min.js",

For simple pages like this, I find it helpful to include the <base> tag to remind me that everything is relative to the root:

<base href="https://gallery.eggwall.com">

On this gallery, all images and content is stored in the subdirectory g_Aircraft/. All thumbnails are stored in g_Aircraft/thumbnails/. So you want to load up all the images in upup.start:

    <script>
      UpUp.start({
          'content-url': 'g_Aircraft/index.html',
          'assets': [
              "/static/jquery-3.3.1.min.js",
              "/static/galleria.min.js",
              "/static/themes/classic/galleria.classic.min.js",
              "/static/plugins/history/galleria.history.min.js",
                 'g_Aircraft/_DSC1984.JPG',
                 'g_Aircraft/thumbnails/_DSC1984.JPG',
                 'g_Aircraft/_DSC1986.JPG',
                 'g_Aircraft/thumbnails/_DSC1986.JPG',
                 'g_Aircraft/_DSC1989.JPG',
                 'g_Aircraft/thumbnails/_DSC1989.JPG',
                 'g_Aircraft/_DSC1991.JPG',
                 'g_Aircraft/thumbnails/_DSC1991.JPG',
                 'g_Aircraft/_DSC1992.JPG',
                 'g_Aircraft/thumbnails/_DSC1992.JPG',
                 ...
                 'g_Aircraft/_DSC2148.JPG',
                 'g_Aircraft/thumbnails/_DSC2148.JPG',

          ]
      });

    </script>

You don't need to change anything on the server for this. I use nginx, but anything serving out static pages will do just fine. Offline-first changes your server-side metrics because many requests are handled directly on the client. So you won't be able to see when the client loaded the page again. Browser-side caching messes with these numbers too, so if you will have to roll your own Javacript if you want perfect interaction tracking.

These changes are fine for browsers that don't support service workers. Older browsers will be served the static content. Since they don't initialize a Service Worker, all requests will go to the server, as before. The Upup.start section just gets ignored. Browser-side caching will continue working as before, too.

With this, the UpUp service worker will cache all the content specified in assets above. The user can go offline, and the page still functions normally. The gallery demo is available if you want to play with it.

Service Workers add complexity. You can debug the site using Chrome Developer tools -> "Application" -> "Service Workers" or "Web Developer" -> Application -> "Service Workers" in Firefox. You want to check if the service worker initialized and is storing content in "Cache storage" -> upup-cache.

Here's a demo video on Android. You can see the user load up the site on their mobile browser, go offline, and still navigate normally.

Monday, December 21, 2020

Audio feature extraction for Machine Learning

tl:dr; Books and papers for audio processing, for building Machine Learning models.

I've been experimenting with Machine Learning for audio files.

Much machine learning literature for music deals with MIDI files, which are a digital format for specifying notes, duration and loudness. This is the format to use for models that work on the level of individual notes. A simple introduction for training such models is using the book "Hands On Machine Learning 2nd edition" (2019), An exercise in Chapter 15 (RNNs) introduces you to Bach chorales, and shows how to generate chords from digial music. Google's Magenta project has datasets and models for such discrete note-level training, generation and inference.

While MIDI is a convenient format for discrete music, most music data is stored as waveforms rather than MIDI. These are either raw WAV / FLAC, or encoded with the MP3 or Ogg Vorbis encoding. Extracting features from these files is considerably harder and requires a good understanding of audio analysis and the kinds of features that these waveforms represent. Depending on the audio stream, some understanding of music structure might come in handy.

Essentia is a freely-available library for handling audio information for music analysis, with bindings for C++ and Python. A Python tutorial on Essentia covers some of the basics.

A survey paper "An Evaluation of Audio Feature Extraction toolboxes", by Moffat D., Ronan D., and Reiss J.D. (2015) covers some toolkits in a variety of languages.

In order to use any toolkit for feature extraction, you need to know what features to look for, and which algorithms to select. This is a fast-moving area of research. The book: "Fundamentals of Music Processing", by Meinard Müller (2016) covers all the background on audio encoding, music representation, and analysis algorithms. The first two chapters cover the core concepts, and chapters 3 onwards dive into individual topics, and can then be read in parallel. This allows software engineers to understand the basics, and then immediately focus on the task at hand. This book is dense and requires a firm understanding of Linear Algebra. Once you know the terminology, you can read the relevant papers on feature types and either use a readily available library, or write the extraction code yourself.

Finally, pre-trained models in Essentia allow you to use existing models for classification tasks. An online demo exists to test out the functionality in a browser.

Friday, December 11, 2020

Tailscale: the best, secure, private VPN you need

tl:dr; Tailscale easily sets up remote machines as though they were on your local network (VPN)

Tell me if this situation sounds familiar: you have a variety of machines, at home and at remote locations. All combinations: behind a proxy/NAT, connected directly, a mix of Linux and Mac/Windows systems, a mixture of physical hardware and cloud instances. You want these machines to behave like they're on a local network and to use them without jumping through hoops. You could access these machines with proxies like ngrok or other tunneling software like ssh, but that's complicated.

You could set up a VPN, but that is time consuming and difficult to manage. Setting up a VPN requires skill and is difficult to get right across host platforms or architectures.

Tailscale is an excellent, simple-to-setup and secure VPN, with clients available for all major systems and architectures. You use an existing authentication (like your Gmail address), download the client software for your platform, and authenticate by navigating to a web address. Setting it up is refreshingly simple. It even sets up a dns, so you can refer to your machines by their hostname: p31 can be used instead of the full VPN IP address like 100.107.137.29.

I've used a variety of VPN systems in the past; I've also set up my own tunnels using different providers; I've rolled my own tunnels from first principles. Compared to existing systems, Tailscale is easier to setup, efficient, and has great network performance. Network latencies are lower than traditional hub-and-spoke systems, which relay through a central server. If the central server is located far from both VPN'd machine, network performance is usually poor.

Right now I'm pinging machines that are behind a NAT, accessing web pages on a different physical network, all by referring to simple hostnames. There's arm32, arm64, x64, different operating systems, physical and cloud instances that all appear as a local Class-A network. This is like magic!

Tailscale is also great for working on projects on your cloud or local instance without exposing it to the wild Internet traffic.

Image courtesy: Wikipedia.