Tuesday, January 19, 2021

FreeBSD, the alternative

I like FreeBSD. There is something about a barebones, simple UNIX that is clean at the core. It is not a particularly pragmatic choice, I'll gladly admit. For most needs, a Linux installation is both simpler and more capable. More people are familiar with Linux, so this is a niche world.

But FreeBSD is a hobby, and a particularly gratifying hobby.


The simplicity of FreeBSD is a huge draw: you can get very far with an ancient Intel machine and 1 GB of RAM. The source code is a pleasure to read, the system is easy to modify, and there is relatively little of it on a base install. Full system upgrades are clean and easy, and the ports system is elegant, though often impractical.

For most people, the easiest way to run FreeBSD is to use a virtual machine image and boot it with QEMU (Linux) or Hyper-V (Windows). Both work very well and both support networking, which is really what you're looking for. Face it, you're not booting into FreeBSD for its giant catalog of games or media utilities.

Get the VM images from here. qcow2 works for QEMU, vhd works for Hyper-V, and vmdk works for VMware.

For QEMU, I use the following command line:

qemu-system-x86_64 \
        -m 2048 \
        -enable-kvm \
        -hda bsd-with-jail.img \
        -net user,hostfwd=tcp::10023-:22 \
        -net nic \
        -curses

That specifies:

  • (-m 2048) 2048 MB of RAM
  • (-enable-kvm) use the Linux KVM hypervisor
  • (-hda bsd-with-jail.img) use the disk image bsd-with-jail.img as the first hard disk
  • (-net nic) emulate a full network interface card, which makes FreeBSD setup trivial: the guest gets a network interface called em0
  • (-net user,hostfwd=tcp::10023-:22) use user-level network translation, so no root permissions are required, and forward port 10023 on the host to port 22 on the guest. This allows you to ssh to the FreeBSD guest by ssh'ing to port 10023 on the host.
  • (-curses) use the text-based libncurses interface rather than a graphical one. Great for running on cloud instances, or inside GNU screen so you can leave it on forever.

Other ports can be forwarded for running web servers, DNS servers, and so on.


The same setup can be replicated on Hyper-V. For networking, you can use the default switch. Port forwarding is more complicated; you could achieve it with ssh tunnels on the Windows host or in the FreeBSD guest.


FreeBSD runs fine with 1 GB of RAM, and is a great system to learn the fundamentals of computing. You can play with DTrace, create constrained environments called jails, or modify the behavior of a clean UNIX kernel. A lot of this knowledge translates to Mac OS, as the OS X kernel is a hybrid of FreeBSD and Mach, and the userland is very similar to FreeBSD.

You can easily run a dozen jails on 1 GB of RAM, depending on what you do with them. Each jail is separated from the others and can be a low-cost, throwaway computing environment. As with Docker or Linux namespaces, root inside a jail is confined to that jail and cannot damage the host FreeBSD system.

Set up your own UNIX sandbox today!




Thursday, January 07, 2021

Mac OS Kernel (XNU) source, on GitHub

Few people know that Apple releases the source code for many components of macOS. Many are surprised when they hear it.

Apple has a full site for their open source releases. Their most recent Mac OS release is for 11.0.1 (Big Sur). Not all source is released. There are the usual UNIX utilities: Perl, Python, Bash, and the userland tools that make a functioning system. Functioning, but not always updated, as Apple doesn't always track the latest versions. They still ship Python 2.7, and other tools can be many years old. This is why you need to get a functional, bugfixed UNIX userland with Homebrew or other tools.


The most interesting source there would probably be the kernel, called XNU. It is a combination of Mach, a microkernel, and BSD, a monolith. There's some Solaris code there, some Joyent code, some Sun code, NeXT, UC Berkeley, Carnegie Mellon code, and of course, Apple code.

Apple does have an official GitHub profile, which also hosts xnu. But it was last updated two years ago, and stands at version 4903, while the latest version is 7195.


Their open source code browser is passable but doesn't show every file properly. I find it much easier to read through code using Emacs, so I pulled it into a GitHub repository if anyone wants to clone it or browse it online.



Image credit: Apple Inc

Tuesday, December 29, 2020

Offline-first, the design of great picture gallery websites

tl;dr: How to use service workers to make your website responsive and functional offline


Websites have come a long way from the boring world of static <html> markup. There's a lot of new functionality in browsers. This new functionality benefits big websites, but the techniques are useful for small personal websites as well.

One of the more important changes has been the support for Service Workers in many browsers. A Service Worker is JavaScript code that can intercept network requests and satisfy them locally on the client machine, rather than making server calls. There's a lot you can do with Service Workers, but I was most interested in making my home picture gallery work offline.

I wanted to allow a person to load up the gallery and be able to view it on their mobile phone or desktop even if the connection is lost. My photographs are shared with static HTML that I create using sigal, a simple image gallery software written in Python. It uses the Galleria library to create web pages that are self-contained. Since the galleries are static HTML & JavaScript, they make a great case study for simple and fast web pages. In the current gallery, the images are downloaded as they are needed, with the next few images prefetched so the user doesn't have to wait. I wanted to make this entirely offline-first, so that the images are downloaded and stored offline. Each image in my gallery is 150 KB, and galleries have 10-20 images in them. The entire gallery is roughly 4 MB, which is tiny. As a result, it loads fast and can be cached offline.

You can always implement your own Service Worker; the interface is straightforward. But if you just want to use Service Workers for browser-side caching, there is a much simpler alternative. We're going to use the upup library, a JavaScript library that can be configured to store content offline and cache it.
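
For comparison, here is roughly what a hand-written worker for this job could look like. It is a minimal sketch of the common cache-first pattern, not the upup implementation: the cacheFirst helper and the cache name 'gallery-cache-v1' are names made up for illustration, and the file would have to be served from your own site and registered as a service worker.

```javascript
// Cache-first lookup: answer from the cache when possible, otherwise go to
// the network and keep a copy for next time. `cache` is a Cache object and
// `fetchFn` is the network fetch; both are passed in so the logic is easy to test.
async function cacheFirst(request, cache, fetchFn) {
  const cached = await cache.match(request);
  if (cached) {
    return cached;
  }
  const response = await fetchFn(request);
  if (response && response.ok) {
    // A response body can be read only once, so store a clone.
    await cache.put(request, response.clone());
  }
  return response;
}

// Hook the lookup into the worker's fetch event. The guard makes this a no-op
// anywhere other than inside a service worker.
if (typeof ServiceWorkerGlobalScope !== 'undefined' &&
    self instanceof ServiceWorkerGlobalScope) {
  self.addEventListener('fetch', (event) => {
    if (event.request.method !== 'GET') {
      return; // let the browser handle non-GET requests normally
    }
    event.respondWith(
      caches.open('gallery-cache-v1') // hypothetical cache name
        .then((cache) => cacheFirst(event.request, cache, fetch))
    );
  });
}
```

Even this small version leaves out pre-caching and cache cleanup, which is part of why a ready-made library is attractive for a simple gallery.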

First, Service Workers need HTTPS support. Get yourself a certificate from Let's Encrypt. This is a nonprofit that has already issued 225 million certificates, and is valiantly helping the entire web move to 100% HTTPS. Get their certificate if you don't have one. Heck, get HTTPS support even if you don't want offline-first on your website. I heartily endorse them; I have moved to HTTPS thanks to them. You should too.

Now, let's add UpUp to your website. It is important where you place it. The upup library can answer requests for a specific scope (subdirectory of your website). It sets its scope based on its location on your website. Since you want the library to serve content to your entire website, you want the library to be as close to the root level of your website as possible. Let's see a concrete example in action.

Let's say your website is gallery.eggwall.com. If you put the JavaScript at gallery.eggwall.com/js/ then it can only cache offline content for /js and not for gallery.eggwall.com/g_Aircraft/. To serve content for the entire subdomain gallery.eggwall.com you want the JavaScript at gallery.eggwall.com/
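
The scoping rule can be checked with a few lines of JavaScript. This sketch assumes the browser default, where a service worker's scope is the directory containing its script. The helpers defaultScope and inScope are hypothetical names for illustration, not part of upup.

```javascript
// Default scope of a service worker: the directory that contains its script.
function defaultScope(scriptUrl) {
  return new URL('./', scriptUrl).href;
}

// A page is controlled by the worker only if its URL starts with the scope.
function inScope(scriptUrl, pageUrl) {
  return new URL(pageUrl).href.startsWith(defaultScope(scriptUrl));
}

const page = 'https://gallery.eggwall.com/g_Aircraft/index.html';
console.log(inScope('https://gallery.eggwall.com/js/upup.sw.min.js', page)); // false
console.log(inScope('https://gallery.eggwall.com/upup.sw.min.js', page));    // true
```

A worker script under /js/ never sees requests for the gallery pages, while one at the root covers the whole subdomain.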


We're going to put it at gallery.eggwall.com, with this in the html:
    <script src="https://gallery.eggwall.com/upup.min.js"></script>


So download the upup library and put the contents of the dist/ directory at the root level of your website. Putting random JavaScript on your site like this is usually a bad idea, so examine the source code and make sure you understand what it is doing. The service worker source is much more important than the framework code.

Now that you have the library in place, invoke the UpUp.start method, and give it the base web page (index.html) and all the content that you want it to cache. The references here have to be relative to the location of upup.sw.min.js. If you put the library in the root of your website, all the references here have to be relative to your root page:
    <script>
      UpUp.start({
          'content-url': 'g_Aircraft/index.html',
          'assets': [
              "/static/jquery-3.3.1.min.js",
              "/static/galleria.min.js",
              "/static/themes/classic/galleria.classic.min.js",
              "/static/plugins/history/galleria.history.min.js"
          ]
      });
    </script>



For simple pages like this, I find it helpful to include the <base> tag to remind me that everything is relative to the root:
<base href="https://gallery.eggwall.com">

On this gallery, all images and content are stored in the subdirectory g_Aircraft/. All thumbnails are stored in g_Aircraft/thumbnails/. So you want to list all the images in UpUp.start:


    <script>
      UpUp.start({
          'content-url': 'g_Aircraft/index.html',
          'assets': [
              "/static/jquery-3.3.1.min.js",
              "/static/galleria.min.js",
              "/static/themes/classic/galleria.classic.min.js",
              "/static/plugins/history/galleria.history.min.js",
              'g_Aircraft/_DSC1984.JPG',
              'g_Aircraft/thumbnails/_DSC1984.JPG',
              'g_Aircraft/_DSC1986.JPG',
              'g_Aircraft/thumbnails/_DSC1986.JPG',
              'g_Aircraft/_DSC1989.JPG',
              'g_Aircraft/thumbnails/_DSC1989.JPG',
              'g_Aircraft/_DSC1991.JPG',
              'g_Aircraft/thumbnails/_DSC1991.JPG',
              'g_Aircraft/_DSC1992.JPG',
              'g_Aircraft/thumbnails/_DSC1992.JPG',
              ...
              'g_Aircraft/_DSC2148.JPG',
              'g_Aircraft/thumbnails/_DSC2148.JPG',
          ]
      });
    </script>
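
Typing out every image and thumbnail by hand gets tedious as a gallery grows. One way around it is to generate the list. The following is a sketch only: buildAssets is a hypothetical helper, not part of upup, and it assumes every image has a thumbnail of the same name under thumbnails/, as in the gallery above.

```javascript
// Build the 'assets' array: the shared static files first, then every image
// followed by its thumbnail. All paths are relative to the site root.
function buildAssets(galleryDir, imageNames, staticAssets) {
  const assets = [...staticAssets];
  for (const name of imageNames) {
    assets.push(`${galleryDir}/${name}`);
    assets.push(`${galleryDir}/thumbnails/${name}`);
  }
  return assets;
}

const assets = buildAssets(
  'g_Aircraft',
  ['_DSC1984.JPG', '_DSC1986.JPG'], // in practice, the full list of file names
  ['/static/jquery-3.3.1.min.js', '/static/galleria.min.js']
);
console.log(assets);
```

The resulting array can be passed as the 'assets' value in the UpUp.start call.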

You don't need to change anything on the server for this. I use nginx, but anything serving out static pages will do just fine. Offline-first changes your server-side metrics because many requests are handled directly on the client, so you won't be able to see when the client loaded the page again. Browser-side caching messes with these numbers too, so you will have to roll your own JavaScript if you want perfect interaction tracking.

These changes are fine for browsers that don't support service workers. Older browsers will be served the static content. Since they don't initialize a Service Worker, all requests will go to the server, as before. The UpUp.start section just gets ignored. Browser-side caching will continue working as before, too.
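
If you want to make that fallback explicit in your own page, the standard feature-detection idiom is to look for serviceWorker on the navigator object. This is a sketch of that idiom, not code taken from the UpUp source, and supportsServiceWorker is a hypothetical helper name.

```javascript
// True when the browser exposes the Service Worker API.
// `nav` is the browser's `navigator` object (or a stand-in for testing).
function supportsServiceWorker(nav) {
  return Boolean(nav) && 'serviceWorker' in nav;
}

// Only start offline caching where it can work; other browsers carry on
// fetching everything from the server.
if (typeof navigator !== 'undefined' && typeof UpUp !== 'undefined' &&
    supportsServiceWorker(navigator)) {
  UpUp.start({ 'content-url': 'g_Aircraft/index.html' });
}
```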

With this, the UpUp service worker will cache all the content specified in assets above. The user can go offline, and the page still functions normally. The gallery demo is available if you want to play with it.

Service Workers add complexity. You can debug the site using Chrome Developer tools -> "Application" -> "Service Workers" or "Web Developer" -> Application -> "Service Workers" in Firefox. You want to check if the service worker initialized and is storing content in "Cache storage" -> upup-cache.

Here's a demo video on Android. You can see the user load up the site on their mobile browser, go offline, and still navigate normally.




Monday, December 21, 2020

Audio feature extraction for Machine Learning

tl;dr: Books and papers for audio processing, for building Machine Learning models.


I've been experimenting with Machine Learning for audio files.

Much of the machine learning literature for music deals with MIDI files, a digital format for specifying notes, duration and loudness. This is the format to use for models that work on the level of individual notes. A simple introduction to training such models is the book "Hands On Machine Learning 2nd edition" (2019). An exercise in Chapter 15 (RNNs) introduces you to Bach chorales, and shows how to generate chords from digital music. Google's Magenta project has datasets and models for such discrete note-level training, generation and inference.


While MIDI is a convenient format for discrete music, most music data is stored as waveforms rather than MIDI. These are either uncompressed or losslessly compressed (WAV, FLAC), or lossily encoded as MP3 or Ogg Vorbis. Extracting features from these files is considerably harder and requires a good understanding of audio analysis and the kinds of features that these waveforms represent. Depending on the audio stream, some understanding of music structure might come in handy.

Essentia is a freely-available library for handling audio information for music analysis, with bindings for C++ and Python. A Python tutorial on Essentia covers some of the basics.

A survey paper "An Evaluation of Audio Feature Extraction toolboxes", by Moffat D., Ronan D., and Reiss J.D. (2015) covers some toolkits in a variety of languages.


In order to use any toolkit for feature extraction, you need to know what features to look for, and which algorithms to select. This is a fast-moving area of research. The book "Fundamentals of Music Processing" by Meinard Müller (2016) covers all the background on audio encoding, music representation, and analysis algorithms. The first two chapters cover the core concepts, and chapters 3 onwards dive into individual topics and can then be read in parallel. This allows software engineers to understand the basics, and then immediately focus on the task at hand. This book is dense and requires a firm understanding of Linear Algebra. Once you know the terminology, you can read the relevant papers on feature types and either use a readily available library, or write the extraction code yourself.


Finally, pre-trained models in Essentia allow you to use existing models for classification tasks. An online demo exists to test out the functionality in a browser.



Friday, December 11, 2020

Tailscale: the best, secure, private VPN you need

tl;dr: Tailscale easily sets up remote machines as though they were on your local network (VPN)


Tell me if this situation sounds familiar: you have a variety of machines, at home and at remote locations. All combinations: behind a proxy/NAT,  connected directly, a mix of Linux and Mac/Windows systems, a mixture of physical hardware and cloud instances. You want these machines to behave like they're on a local network and to use them without jumping through hoops. You could access these machines with proxies like ngrok or other tunneling software like ssh, but that's complicated.

You could set up a VPN, but that is time-consuming and difficult to manage. Setting up a VPN requires skill and is difficult to get right across host platforms or architectures.


Tailscale is an excellent, simple-to-set-up and secure VPN, with clients available for all major systems and architectures. You use an existing login (like your Gmail address), download the client software for your platform, and authenticate by navigating to a web address. Setting it up is refreshingly simple. It even sets up DNS, so you can refer to your machines by their hostname: p31 can be used instead of the full VPN IP address like 100.107.137.29.

I've used a variety of VPN systems in the past; I've also set up my own tunnels using different providers; I've rolled my own tunnels from first principles. Compared to existing systems, Tailscale is easier to set up, efficient, and has great network performance. Network latencies are lower than in traditional hub-and-spoke systems, which relay through a central server. If the central server is located far from both VPN'd machines, network performance is usually poor.

Right now I'm pinging machines that are behind a NAT, accessing web pages on a different physical network, all by referring to simple hostnames. There's arm32, arm64, x64, different operating systems, physical and cloud instances that all appear as a local Class-A network. This is like magic!

Tailscale is also great for working on projects on your cloud or local instance without exposing it to the wild Internet.





Image courtesy: Wikipedia.


Saturday, November 28, 2020

Generating art with TensorFlow

tl;dr: Thought-provoking, computer-generated art

That is computer generated: the output of transfer learning using TensorFlow.

I was playing with some transfer learning, training a convolutional model on famous works of art and applying the style to my own photographs, when I realized the process could be done in reverse. So instead of taking the style of a famous artist and applying it to my photographs, I've taken the style of my photographs and applied it to famous art. Here are the two pictures: the content is a famous Kandinsky image, and the style applied was that of a tiger in San Francisco.


Here's another. A famous Monet that has been modified with the style of a giraffe from the same zoo. The original image is pretty serene, and I love its transformation into a drug-fueled nightmare.


And finally, my favorite, two mundane pictures that together make evocative art.

The content image is from the page linked below, and the lion is my own photograph. Either picture is average by itself.


A computer and a human together are better at chess than a computer alone, or a human alone. The same can be said about art. A human combined with well-written software can make something amazing.

This is based on the TensorFlow style transfer tutorial.