Saturday, November 28, 2020

Generating art with Tensorflow

tl:dr; Thought-provoking, computer-generated art

That is computer generated: the output of transfer learning using Tensorflow.

I was playing with some transfer learning, training a convolutional model on famous works of art, and applying the style to my own photographs when I realized the process could be done in reverse. So instead of taking the style of a famous artist, and applying it to my photographs, I've taken the style of my photographs and applied it to famous art. Here are the two pictures, the content is a famous Kandinsky image, and the style applied was that of a tiger in San Francisco.


Here's another. A famous Monet that has been modified with the style of a giraffe from the same zoo. The original image is pretty serene, and I love its transformation into a drug-fueled nightmare.


And finally, my favorite, two mundane pictures that together make evocative art.

The content image is from the page linked below, and the lion is my own photograph. Either picture is average by itself.


A computer and a human are more superior at chess than a computer alone, or a human alone. The same can be said about art. A human combined with a well-written software can make something amazing. 

This is based on the Tensorflow style transfer tutorial.

Thursday, November 26, 2020

Best Practices with Tensorflow: Test All Your Inputs Early

Tensorflow model training can take a long time. Depending on the machine, even the first epoch can take many hours to run. In this case, the remaining arguments like validation_data are only evaluated much later. Incorrectly formed input will be detected many hours later, deep in the stack, with incomprehensible errors. By this time, you might have forgotten what the shapes of the input or their expected data type is. You want to detect failures early, when you run model.fit() rather than a day later.

One easy way to avoid this is to wrap all model.fit() calls in a single method, which tests itself out with a smaller input. Here's the idea. If you have training data (X_train and y_train), and validation data (X_valid and y_valid), you take the first 10 observations, and call the training method itself.

This run will be fast, because you are training a model with a tiny set of observations, and the validation is done with a tiny set of values too.

It also ensures that any changes to the model.fit() line are correctly reflected both in the test_shapes run and in the real run. Any training done on the model is minor: a real run with the full data and full set of epochs will change the model weights, so this is harmless. In case you are using a pre-trained model, you can clone the model before trying the test run.

If you are using data generators, then the same mechanism can be used. Create a new tfdata object with a few observations both on the training data, and another with the validation data. Pass that to your model training function.

def fit_e9_model(model, X_train, y_train,
                 X_valid, y_valid, epochs,
                 batch_size=32, verbose=0,
                 test_shapes=True):

    # This fails after a day, when the validation data is incorrectly shaped.                              
    # This is a terrible idea. Failures should be early.                                                   

    # The best way to guard against it is to run a small fit run with                                      
    # a tiny data size, and a tiny validation data size to ensure that                                     
    # the data is correctly shaped.                                                                        

    if (test_shapes):
        print ("Testing with the first 10 elements of the input")
        X_small = X_train[:10,:,:,:]
        y_small = y_train[:10]
        X_valid_small = X_valid[:10,:,:,:]
        y_valid_small = y_valid[:10]
        # Call ourselves again with a smaller input. This confirms                                         
        # that the two methods are calling the same .fit() method, and                                     
        # that the input is correctly shaped in the original method too.                                   
        fit_e9_model(model, X_small, y_small,
                     X_valid_small, y_valid_small,
                     epochs=epochs, verbose=verbose,
                     batch_size=5,
                     test_shapes=False)

    # If that worked, then do a full run.                                                                  
    history_conv = model.fit(x=X_train, y=y_train, batch_size=batch_size,
                             validation_data=(X_valid, y_valid),
                             epochs=epochs, verbose=verbose)
    return history_conv



history = fit_e9_model(model, X_train, y_train, X_valid, y_valid, epochs=10)

Sunday, November 22, 2020

User-facing tool should have user-understandable errors

Here's some simple Tensorflow code to create a CNN, and train it. 
import tensorflow as tf
import tensorflow.keras as keras

def create_model(optimizer="sgd"):
    deep_model = keras.models.Sequential([
        keras.layers.Conv2D(64, 7, activation="relu", padding="same", 
                            input_shape=[1, 28, 28], name="input"),
        keras.layers.MaxPooling2D(1,name="firstPool"),
        keras.layers.Conv2D(128, 3, activation="relu", padding="same", 
                            name="first_conv_1"),
        keras.layers.Conv2D(128, 3, activation="relu", padding="same", 
                            name="first_conv_2"),

        keras.layers.MaxPooling2D(1, name="secondPool"),
        keras.layers.Conv2D(256, 3, activation="relu", padding="same", 
                            name="second_conv_1"),
        keras.layers.Conv2D(256, 3, activation="relu", padding="same", 
                            name="second_conv_2"),

        keras.layers.MaxPooling2D(1, name="thirdPool"),

        keras.layers.Flatten(name="flatten"),
        keras.layers.Dense(128, activation="relu", name="pre-bottneck"),

        keras.layers.Dropout(0.5, name="bottleneckDropout"),
        keras.layers.Dense(64, activation="relu", name="bottleneck"),

        keras.layers.Dropout(0.5, name="outputDropout"),
        keras.layers.Dense(10, activation="softmax", name="output"),
    ])
    
    deep_model.compile(loss="sparse_categorical_crossentropy",
                      optimizer=optimizer,
                      metrics=["accuracy"])

    return deep_model

def fit_model(model, X_train, y_train, X_valid, y_valid, epochs):
    history_conv = model.fit(X_train, y_train, validation_data=[X_valid, y_valid],
                             epochs=epochs, verbose=0)
    return history_conv

def plot_history(history, name):
    c10.plot_training(history, name, show=True)
    
model = create_model()
history = fit_model(model, X_train, y_train, X_valid, y_valid, epochs=10)
plot_history(history, "naive_deep_mnist")

When you run it, it causes this enormous error
  ---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-26-95a7830ebd3c> in <module>
     41 
     42 model = create_model()
---> 43 history = fit_model(model, X_train, y_train, X_valid, y_valid, epochs=10)
     44 plot_history(history, "naive_deep_mnist")

<ipython-input-26-95a7830ebd3c> in fit_model(model, X_train, y_train, X_valid, y_valid, epochs)
     34 
     35 def fit_model(model, X_train, y_train, X_valid, y_valid, epochs):
---> 36     history_conv = model.fit(X_train, y_train, validation_data=[X_valid, y_valid], epochs=epochs, verbose=0)
     37     return history_conv
     38 

/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/engine/training.py in _method_wrapper(self, *args, **kwargs)
    106   def _method_wrapper(self, *args, **kwargs):
    107     if not self._in_multi_worker_mode():  # pylint: disable=protected-access
--> 108       return method(self, *args, **kwargs)
    109 
    110     # Running inside `run_distribute_coordinator` already.

/usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/engine/training.py in fit(self, x, y, batch_size, epochs, verbose, callbacks, validation_split, validation_data, shuffle, class_weight, sample_weight, initial_epoch, steps_per_epoch, validation_steps, validation_batch_size, validation_freq, max_queue_size, workers, use_multiprocessing)
   1096                 batch_size=batch_size):
   1097               callbacks.on_train_batch_begin(step)
-> 1098               tmp_logs = train_function(iterator)
   1099               if data_handler.should_sync:
   1100                 context.async_wait()

/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/def_function.py in __call__(self, *args, **kwds)
    778       else:
    779         compiler = "nonXla"
--> 780         result = self._call(*args, **kwds)
    781 
    782       new_tracing_count = self._get_tracing_count()

/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/def_function.py in _call(self, *args, **kwds)
    821       # This is the first call of __call__, so we have to initialize.
    822       initializers = []
--> 823       self._initialize(args, kwds, add_initializers_to=initializers)
    824     finally:
    825       # At this point we know that the initialization is complete (or less

/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/def_function.py in _initialize(self, args, kwds, add_initializers_to)
    694     self._graph_deleter = FunctionDeleter(self._lifted_initializer_graph)
    695     self._concrete_stateful_fn = (
--> 696         self._stateful_fn._get_concrete_function_internal_garbage_collected(  # pylint: disable=protected-access
    697             *args, **kwds))
    698 

/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/function.py in _get_concrete_function_internal_garbage_collected(self, *args, **kwargs)
   2853       args, kwargs = None, None
   2854     with self._lock:
-> 2855       graph_function, _, _ = self._maybe_define_function(args, kwargs)
   2856     return graph_function
   2857 

/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/function.py in _maybe_define_function(self, args, kwargs)
   3211 
   3212       self._function_cache.missed.add(call_context_key)
-> 3213       graph_function = self._create_graph_function(args, kwargs)
   3214       self._function_cache.primary[cache_key] = graph_function
   3215       return graph_function, args, kwargs

/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/function.py in _create_graph_function(self, args, kwargs, override_flat_arg_shapes)
   3063     arg_names = base_arg_names + missing_arg_names
   3064     graph_function = ConcreteFunction(
-> 3065         func_graph_module.func_graph_from_py_func(
   3066             self._name,
   3067             self._python_function,

/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/func_graph.py in func_graph_from_py_func(name, python_func, args, kwargs, signature, func_graph, autograph, autograph_options, add_control_dependencies, arg_names, op_return_value, collections, capture_by_value, override_flat_arg_shapes)
    984         _, original_func = tf_decorator.unwrap(python_func)
    985 
--> 986       func_outputs = python_func(*func_args, **func_kwargs)
    987 
    988       # invariant: `func_outputs` contains only Tensors, CompositeTensors,

/usr/local/lib/python3.8/dist-packages/tensorflow/python/eager/def_function.py in wrapped_fn(*args, **kwds)
    598         # __wrapped__ allows AutoGraph to swap in a converted function. We give
    599         # the function a weak reference to itself to avoid a reference cycle.
--> 600         return weak_wrapped_fn().__wrapped__(*args, **kwds)
    601     weak_wrapped_fn = weakref.ref(wrapped_fn)
    602 

/usr/local/lib/python3.8/dist-packages/tensorflow/python/framework/func_graph.py in wrapper(*args, **kwargs)
    971           except Exception as e:  # pylint:disable=broad-except
    972             if hasattr(e, "ag_error_metadata"):
--> 973               raise e.ag_error_metadata.to_exception(e)
    974             else:
    975               raise

ValueError: in user code:

    /usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/engine/training.py:806 train_function  *
        return step_function(self, iterator)
    /usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/engine/training.py:796 step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    /usr/local/lib/python3.8/dist-packages/tensorflow/python/distribute/distribute_lib.py:1211 run
        return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
    /usr/local/lib/python3.8/dist-packages/tensorflow/python/distribute/distribute_lib.py:2585 call_for_each_replica
        return self._call_for_each_replica(fn, args, kwargs)
    /usr/local/lib/python3.8/dist-packages/tensorflow/python/distribute/distribute_lib.py:2945 _call_for_each_replica
        return fn(*args, **kwargs)
    /usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/engine/training.py:789 run_step  **
        outputs = model.train_step(data)
    /usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/engine/training.py:747 train_step
        y_pred = self(x, training=True)
    /usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/engine/base_layer.py:975 __call__
        input_spec.assert_input_compatibility(self.input_spec, inputs,
    /usr/local/lib/python3.8/dist-packages/tensorflow/python/keras/engine/input_spec.py:191 assert_input_compatibility
        raise ValueError('Input ' + str(input_index) + ' of layer ' +

    ValueError: Input 0 of layer sequential_17 is incompatible with the layer: : expected min_ndim=4, found ndim=3. Full shape received: [None, 28, 28]

Saturday, September 19, 2020

Compiling Tensorflow without AVX support, a Googler's perspective

tl:dr; Tensorflow compilation teaches you about the complexity of present-day software design.


Compiling Tensorflow is a curious experience. If I put on my external-user hat, the process is baffling.  Many Tensorflow choices are motived by development practices inside Google, rather than common open source development idioms. And so, as a Google engineer, I can explain what is going on and the motivations for the choices.

Why compile Tensorflow?

I want to run Tensorflow on an old Intel CPU that doesn't have AVX-instruction support. These are special "vector" instructions that speed the computation on large data streams, and are available on relatively new Intel and AMD processors. The solution is to compile Tensorflow from source as Tensorflow prepackaged binaries (after version 1.6) are compiled expecting AVX support on the processor. 

No problem: I'm an engineer and have done my share of large system compilations. I can do this.

Dependencies

Tensorflow compilation has only been tested on gcc 7.3.0, which was released in November, 2019. The latest version of gcc shipping with Ubuntu 20.04 is 9.3.0. If a user is compiling software from source, they are probably going to use a recent version of the compiler toolchain. I doubt most users will install an old version of gcc on their machine (or in a Docker image) just to compile Tensorflow. I didn't either, and went with gcc-9.3 with fingers crossed.

Perhaps it is the complexity of software development today. With the pace of development and releases: you cannot possibly support all versions of gcc, all versions of Ubuntu, Debian, Mac OS, Windows, all combinations of compute architecture: x86, x86_64, x86 with AVX, arm64, GPU with cuda. Add to this all the complexity of different target platforms: python, C++, ...

Unlike a few years ago, compilers like gcc and llvm are themselves being updated frequently. This is great, as bugs can be fixed, but it leads to a large burden on supporting different toolchains.

Lessons

Tensorflow downloads its own version of llvm. Instead of relying on the system version of llvm, which might have its own quirks, it just gets everything.

That's not all: Tensorflow downloads all its dependencies: boringssl, eigen3, aws libraries, protobuf libraries, llvm-project from github or their respective repositories. I suspect most of these go into //third-party. 

It is an interesting choice to download most of these rather than expecting them installed locally. On one level, it reduces the complexity in figuring why Tensorflow builds on Ubuntu versus Fedora, or on FreeBSD. But managing these packages also adds complexity. How do you know which version of protobuf or llvm to check-out, and what if those dependencies are no longer available.
The obvious complexity is that you have to compile your dependencies too. A user might already have a pre-compiled llvm and the protobuf library.

If anything, the Tensorflow style looks similar to Android Open Source (AOSP) or FreeBSD's Ports collection.  In both of these, the downloaded repository creates a parallel universe of source and objects. You compile everything from scratch. The notable difference between FreeBSD is that the output of FreeBSD's Ports are installed in /usr/local/ and are then available to be used on the system. After you compile protobufs for Tensorflow, you still don't have the protobuf library available to the wider system.

The reason for this is probably because Google engineers compile the whole world. Google production binaries shouldn't rely on the specific version of eigen3 you happen to have on your development machine. Instead, you get a specific version of eigen3 from the repository (the "mono" repo), and use that. Ditto for llvm. Most of this open-source dependency code does not diverge too far from upstream, as bugfixes are reported back to the authors. This provides some sanity of dependencies. I suspect the version of llvm or eigen3 chosen are the same versions that were in the mono repo at the time Tensorflow 2.4 was released. 

This differs from other large open source projects. You are expected to have all the dependencies locally if you are to compile Emacs. It needs libjpeg, so install that through apt or yum. Then you realize you need x11 libraries. Ok, go get those separately. Cumbersome, and it increases the risk of a failure at runtime as your version of libjpeg might not be what the authors tested against.

Bazel does help when compiling everything. On a subsequent run, it won't need to recompile boring_ssl. Inside Google, the build system reuses objects from prior runs of other engineers which vastly speeds up individual's compiliation. An open source developer does not benefit from this on their first compile of Tensorflow. They're starting out cold. Their subsequent compile runs are sped up, of course, but how often do you compile Tensorflow again? You generate the Python Wheel, rm -Rf the checked out repo, and carry on with your Python data analysis.


Another quirk: at the end, the bazel server is still running on that machine, and it shut down after many hours of disuse. This might be fine for Google engineers who will be compiling other software, soon. For them, the cost of keeping bazel up and running is small compared to the benefit from the pre-warmed caches and memory. I suspect independent open source developers are baffled why bazel is holding on to 400+Mb of RAM, hours after the compilation is done.

The choice of bazel itself is interesting. Most open source software uses the 'make' tool, despite its numerous flaws. Bazel is an open source implementation of an internal Google build tool, so Tensorflow uses that. Even Android AOSP uses make, since bazel wasn't available in open-source back in 2008 when Android AOSP was released.

Other systems

Let's see how other systems manage this sort of complexity.

PyTorch, by comparison offers a rich selection of choices. You select the build, the OS to run on, which package manager you want to use (Conda/Pip/...) whether you have CUDA installed, and if so, which version. It finally tells you exactly how to get PyTorch installed on your system.
 
This raises a question: why can't Tensorflow be available as a pre-compiled binary in a lot more configurations? The Intel Math Kernel Library (MKL), for example, is available in 50 variants packaged natively for Ubuntu: libmkl-avx, libmkl-avx2, libmkl-avx512, libmkl-avx512-mic, libmkl-vml-avx, ... These are all variants for specific Intel CPUs, to extract the maximum possible performance from each system. Tensorflow is similar: it is built to efficiently process compute-intensive workloads. Why isn't Tensorflow available in 50 different variants, targetting avx, no-avx, avx2, avx512, ...?
Here, I am guessing the choices are due to the Google-engineer/Open Source divide. At Google, most engineers run a specific kind of machine, and so the pre-compiled binaries target these workstations and similar CPUs on cloud compute farms. Most internal (and Google cloud) users don't deviate from these computing setups. So Tensorflow on a Core2Duo from 2006, or ARM32, or ARM64 isn't a high priority. This is a real lost opportunity, because compiling multiple targets can be automated. The real cost here is of maintenance. If you do provide Tensorflow on a Core2Duo or arm32, you are implicitly providing support for it.
The open source answer here would be to appoint a maintainer for that architecture. The Macintosh PowerPC port of Linux is still maintained by Benjamin Herrenschmidt, among others. He cares about that architecture, so he helps keep it up and running. The community will probably maintain a no-avx binary if you empower them. 


 The Linux kernel is also an incredibly complex system. You are building an operating system kernel, which by definition is hardware architecture and device specific. Even in 2020, you can build the Linux kernel for machines as varied as PowerPC, ARM, MIPS, Intel x64, and Intel 386. Of course, you can choose between AVX support and not.  The Linux kernel depends on very few external libraries, and is almost entirely self-contained. It compiles with make, and generates targets that work on many more architectures than Tensorflow. It has a huge configuration system, with many many choices. Most of the complexity is the skill and expertise in understanding the options and selecting them. You can always take an existing kernel configuration from a running system, and then run 'make menuconfig' to modify the specific options you want to change.



The comparison might not be entirely fair, though. The Linux kernel has been in active development for many decades. It was always developed in a decentralized way, and therefore has perfected open source development development and release. The open source process has also been shaped by the quirks of the Linux kernel to the point where it is difficult to tell whether Linux influences open source or open source influences Linux.

 

Outcome

It took a few hours of computation on the old machine, all four CPUs were busy for a whole day. But at the end, I have Tensorflow for Ubuntu 20.04 for x86_64 without AVX support. I tried this out on a Celeron and a Core2 machine, and it works great. Tensorflow is perfect for the old machines where you can run model training for a few hours, turn the screen off, and leave it alone.

Since I have the Wheel compiled for Python3, here it is if anyone needs Tensorflow 2.4 without AVX support for Ubuntu 20.04 and Python3. If you need another version, find my email address and mail me.

Just for fun, I'd love to compile PyTorch from source as well. It seems to follow the open source paradigm closely: you install specific dependencies using yum/apt, and it uses those directly.


Conclusion

Tensorflow compilation was an interesting process to see from the outside. The compilation process is far more complicated due to the wealth of dependencies. While most users of Tensorflow are aware of the complexity of Tensorflow, a lot more complexity is visible when compiling the system. The choices here are motivated by some development practices at Google, and also make an interesting case study for large system design.
 

Disclaimer: I'm a Google employee, but these are my own opinions from the public Tensorflow project. I did not examine any Google confidential systems to arrive at these observations.

Sunday, August 09, 2020

Unable to view filesystems other than / using Docker on Linux?

I was running a Docker container on my home machine today (Linux, x86_64). Usually Docker is trouble-free and reliable, but today I ended up navigating a deep labyrinth. Since I couldn't find any good documentation on this, and to avoid this problem in the future,  I wanted to document it.

The problem was that Docker refused to mount an external filesystem on the host as a directory on the image. I'll condense the problem to make it easy to understand. Here is the symptom:

Let's say /mnt has a filesystem that is mounted (an external disk, or a thumb drive). / is the root partition, and the only other physical partition. There are a few files on the thumb drive in /mnt/

host$ ls /mnt

fileA fileB directoryA/

The following command works file, and mounts /home/user/work-dir as /external

host$ docker run -it -v ~/work-dir:/external busybox

/external in the docker image correctly shows the contents of /home/user/work-dir, as expected.


But, if I point it to the external mount, the directory /external is empty in the docker container

host$ docker run -it -v /mnt:/external busybox

You'd expect /external in the busybox image to contain fileA, fileB and directoryA. But they don't. It gets weirder.
 If you export all of / as /external, then busybox can see everything in / except /mnt/

host$ docker run -it -v /:/external busybox

# ls /external/mnt

The output is empty, so /external/mnt/ exists, and is empty. /external/bin/ exists, and maps to /bin on the host machine, correctly, as expected.

I struggled with this for a long time. strace on docker clearly shows that it thinks /mnt doesn't exist, or is read only, even though it is definitely visible and writable in the host.

host$ date >> /mnt/out; cat /mnt/out

Sun 09 Aug 2020 04:26:16 PM PDT

So the directory structure is visible, writable, but never in a docker image. Doesn't help if you try all the commands as root. What gives?


The problem was that my docker installation was from snap. Snaps have limited visibility of the remaining system. This might be fine for a snap containing a calculator app, but docker needs a lot to function.

$ sudo snap remove docker; apt install docker-ce{,-cli}

When you do that, remember to log out because otherwise bash might have hashed the location of 'docker' to /snap/bin/docker, and it takes a logout to clear bash's state and realize that docker is now at /usr/bin/docker.


Snaps are a great idea, and I look forward to a time when they are functionally equivalent to packages installed through apt. For now, my experience with docker on snap makes me reluctant to use them. They're an additional layer of complexity to understand and debug, in a system that is already plenty complicated.

Wednesday, July 22, 2020

Playing flac files on Linux

It used to be that you just ran flac files through flac123, like so:
$ flac123 file.flac

That doesn't work any more, and the flac commandline tool only contains a single binary called 'flac' that decodes, encodes, etc.

Playing with 'mplayer' still works but it produces a lot of errors, and the audio level is low. The trick seems to be to call flac with the decode (-d) option, output to stdout (-c) and then pipe it to alsa player over stdin

$ flac -c -d /path/to/files/*flac | aplay


Saturday, July 18, 2020

Social trackers on banks: Etrade

Etrade has a single advertiser: Wall Street on Demand.

This is much better than other financial websites, though I'm not sure what Wall Street on Demand does.



Social tracker on banks: Schwab

Charles Schwab only has trackers from Tealium, Confirmit and SOASTA mPulse.

The site works well without it, and I feel a lot safer knowing that these companies are not snooping on my online financial activity.

It is never a good sign when you have to search online for these company names. They know a lot more about you than you know about them.


Social trackers on banks: Vanguard

Vanguard is much better on their web interface. You only have trackers from Adobe Audience Manager, and Doubleclick.

There are site analytics from two companies. It is best not to send this data to third-parties like Adobe or AppDynamics, since I don't know what these companies are tracking. They almost certainly have my IP address, time of day, browser and OS version, do they have my bank account number and other details as well?



Social trackers on banks: Ally bank

There are 12 trackers on ally.com that have nothing to do with my relation with ally.

These are advertisers, trackers by Snapchat, Adobe, Bing, Pinterest, Qualtrics and Facebook.

The only legitimate tracker that deserves to be there is LivePerson, for customer service. Even that should only start when I have an actual customer support interaction rather than being aware of my page interactions all the time.

What data are these trackers collecting, and why are they required on a bank website?






(Data obtained from Ghostery, the privacy extension for Google Chrome)

Wednesday, June 17, 2020

Game review: Human Resource Machine

Human Resource Machine is a game about writing programs that make a little person do stuff. Every level has a new task, and the initial tasks start out very easy, and then the difficulty builds up. The final level has you writing a sorting routine in a programming language.

The programming language is a bit like assembly, for a computer with a single register. The primitives are similar to assembly too, 'jump if zero', for example, along with indirect addressing in later levels. If you know assembly, you might have to adjust to the style: you cannot increment a register, only memory locations.

Leaving the technicalities aside, it has beautiful music, it has a compelling storyline, and the puzzles are engaging and fun. The levels in the end get difficult, and you might lose interest in solving them.
The levels are a beauty, it works great on Linux.

The most fascinating thing for me was to watch its adoption on Steam. The game was made five years ago, and about 12% of the players have finished the main plot. This is incredible. That means one in ten people who bought the game have solved the toughest challenge in the main story. That is not the toughest challenge, which is prime factorization. I still haven't solved that one.

It is made by the same folks who brought us World of Goo and Little Inferno. As is the case, their plot has a sub-text that continues on this game as well.

Buy the game, you won't regret it. Here's my solution of the last level: and you can see how the main game works. You get letters or numbers at the IN line on the left, your man processes them, and then puts them in the OUT line on the right. The floor is numbered, and that's the main memory. The program you wrote is along the right side of the screen. In the video,  the zero-terminated lists are being sorted using a naive insertion sort. And once they are fully sorted, they are output (smallest to largest) in the OUT line. Then the little man sets up the memory and inputs more lists.


Friday, May 22, 2020

Thread-local stack on Linux

Threads on POSIX systems like Linux share an address space, but have their own registers and execution state like their own stacks. We also know that the stack is mapped in the same virtual memory space.

The illusion of thread-local stacks is achieved by having the stacks of the different threads start at different locations in virtual memory.

threadStorage.c prints the location of the stack.

Here are numbers from a few architectures:
x86-64: (kernel v4.15)
$ ./threadStorage
one: Address of first variable 0x7fc7bbc90edc
two: Address of first variable 0x7fc7bb48fedc
main: Address of first variable 0x7ffe79c59e1c
three: Address of first variable 0x7fc7bac8eedc

The difference between one and two, and two and three is 0x8392704

The stack of the main thread starts at 0x7ffe79c59e1c, which is much higher than the stack of the first thread. Since most processes are single-threaded, they get much larger stack frames.

Numbers from other architectures I had lying around:

arm32: (kernel v4.19.66)
$ ./threadStorage
one: Address of first variable 0x76e3ee68
two: Address of first variable 0x7663de68
main: Address of first variable 0x7e9c6c30
three: Address of first variable 0x75e3ce68

arm64: (kernel v4.15)
$ ./threadStorage 
one: Address of first variable 0xffffab10a9dc
two: Address of first variable 0xffffaa9099dc
main: Address of first variable 0xffffe26742ec
three: Address of first variable 0xffffaa1089dc

MIPS: (kernel v3.18.140)
$ ./threadStorage
main: Address of first variable 0x7fe384a4
three: Address of first variable 0x7603ef3c
two: Address of first variable 0x7683ff3c
one: Address of first variable 0x77040f3c


In all these systems, the addresses are randomized, so successive runs of the same program produce slightly different results.  The offsets between the locations in successive runs is constant.

The repository is here: https://github.com/youngelf/linux-innards

Tuesday, March 03, 2020

Steam on Windows 10

tl:dr; The Steam client on Windows 10 does not work for me.

This computer has great network connectivity, I can login to Steam on the browser on that machine, but the Steam client doesn't think I'm authenticated. All downloads are stuck, perhaps waiting for authentication.

The forums don't mention anything, and my Steam client on Linux on a separate computer is just fine. I suspect their login servers are alright, this is a combination of their client, or Windows 10.

Steam support is non-existent. I have tried re-installing Steam, and while that was able to authenticate and download games, it got itself wedged after the next reboot. Steam's network troubleshooting steps are totally generic. In this case, they are completely unhelpful.



It is frustrating to 'purchase' content from a store, and struggle to use the content. By comparison, the  CDs of games that I purchased many years ago continue to work reliably.

This is a machine that is only used for gaming. This is past the point where it is productive.


A lot has been written about the benefits of online stores, but a failure like this makes me anxious. Needless to say, I doubt I will purchase games from Steam. The Windows-only games I have purchased till now are a sunk cost, partly because of Steam and partly because of Windows itself. The cost of keeping a Windows machine around, updated and in good order is not worth the games themselves.