tl;dr: Compiling Tensorflow teaches you about the complexity of present-day software design.
Compiling Tensorflow is a curious experience. If I put on my external-user hat, the process is baffling. Many Tensorflow choices are motivated by development practices inside Google rather than by common open source development idioms. As a Google engineer, I can explain what is going on and the motivation behind those choices.
Why compile Tensorflow?
I want to run Tensorflow on an old Intel CPU that doesn't support AVX instructions. These are special "vector" instructions that speed up computation on large data streams, and they are available on relatively new Intel and AMD processors. The prepackaged Tensorflow binaries (from version 1.6 onward) are compiled to expect AVX support on the processor, so the solution is to compile Tensorflow from source.
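Before committing to a source build, it is worth confirming that the CPU really lacks AVX. Here is a minimal sketch, assuming a Linux machine where the kernel exposes CPU feature flags in /proc/cpuinfo; the helper names `cpu_flags` and `has_avx` are made up for this example.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text: str) -> set:
    """Collect the feature flags listed in /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # x86 kernels label the line "flags"; arm64 kernels use "Features".
        if sep and key.strip() in ("flags", "Features"):
            flags.update(value.split())
    return flags


def has_avx(cpuinfo_text: str) -> bool:
    """Return True if the CPU advertises the AVX instruction set."""
    # Set membership keeps "avx2" or "avx512f" from matching "avx".
    return "avx" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")  # Linux only
    if path.exists():
        print("AVX supported:", has_avx(path.read_text()))
    else:
        print("No /proc/cpuinfo here; this check is Linux-specific.")
```

If this reports that AVX is not supported, the prepackaged binaries will not work on that machine and a source build is the way forward.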
No problem: I'm an engineer and have done my share of large system compilations. I can do this.
Tensorflow compilation has only been tested with gcc 7.3.0, which was released in January 2018. The default version of gcc shipping with Ubuntu 20.04 is 9.3.0. If a user is compiling software from source, they are probably going to use a recent version of the compiler toolchain. I doubt most users will install an old version of gcc on their machine (or in a Docker image) just to compile Tensorflow. I didn't either, and went with gcc 9.3.0 with my fingers crossed.
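To see how far the local compiler is from the tested one, something like the following can help. It is a rough sketch: `TESTED_GCC` simply encodes the 7.3.0 figure quoted above, the helper names are hypothetical, and it assumes a gcc new enough (version 7 or later) to understand the `-dumpfullversion` option.

```python
import shutil
import subprocess

TESTED_GCC = (7, 3, 0)  # the tested version quoted in this post


def parse_version(text: str) -> tuple:
    """Turn '9.3.0' into (9, 3, 0); missing parts are padded with zeros."""
    parts = [int(p) for p in text.strip().split(".")[:3]]
    return tuple(parts + [0] * (3 - len(parts)))


def describe(installed: tuple, tested: tuple = TESTED_GCC) -> str:
    """Summarize how the installed compiler relates to the tested one."""
    if installed == tested:
        return "matches the tested version"
    if installed[0] != tested[0]:
        return "different major version, untested territory"
    return "same major version, different minor or patch level"


if __name__ == "__main__":
    gcc = shutil.which("gcc")
    if gcc is None:
        print("gcc not found on PATH")
    else:
        result = subprocess.run(
            [gcc, "-dumpfullversion"], capture_output=True, text=True
        )
        try:
            installed = parse_version(result.stdout)
            print(result.stdout.strip(), "->", describe(installed))
        except ValueError:
            # Empty or unexpected output, e.g. from a very old compiler.
            print("could not parse compiler version:", result.stdout.strip())
```

A mismatch does not mean the build will fail; it only means you are outside the configuration that was tested.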
Perhaps this is a symptom of the complexity of software development today. With the current pace of development and releases, you cannot possibly support all versions of gcc; all versions of Ubuntu, Debian, macOS, and Windows; and all combinations of compute architecture: x86, x86_64, x86 with AVX, arm64, GPU with CUDA. Add to this the complexity of different target platforms: Python, C++, ...
Unlike a few years ago, compilers like gcc and LLVM are themselves updated frequently. This is great, because bugs get fixed, but it makes supporting different toolchains a large burden.
Tensorflow's build also picks its own specific versions of dependencies such as eigen3 and LLVM rather than using whatever is installed on the system. The reason is probably that Google engineers compile the whole world. Google production binaries shouldn't rely on the specific version of eigen3 you happen to have on your development machine. Instead, you get a specific version of eigen3 from the repository (the "monorepo") and use that. Ditto for LLVM. Most of this open source dependency code does not diverge too far from upstream, as bugfixes are reported back to the authors. This brings some sanity to dependency management. I suspect the versions of LLVM and eigen3 chosen are the same versions that were in the monorepo at the time Tensorflow 2.4 was released.
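The idea of pinning a dependency to one exact version can be illustrated in a few lines. This is only a sketch of the general technique, not how Tensorflow's build or Google's internal tooling is implemented: the `PinnedDependency` type, the revision string, and the archive bytes are all invented for the example.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PinnedDependency:
    name: str
    version: str  # one exact upstream revision, never a version range
    sha256: str   # checksum of the source archive for that revision


def verify(dep: PinnedDependency, archive: bytes) -> bool:
    """Accept the archive only if it is byte-for-byte the pinned one."""
    return hashlib.sha256(archive).hexdigest() == dep.sha256


# Stand-in bytes; a real build would fetch the archive for dep.version.
fake_archive = b"pretend this is the eigen3 source tarball"
eigen = PinnedDependency(
    name="eigen3",
    version="hypothetical-revision",
    sha256=hashlib.sha256(fake_archive).hexdigest(),
)

print(verify(eigen, fake_archive))         # the pinned archive is accepted
print(verify(eigen, b"a different copy"))  # anything else is rejected
```

The point of the design is that the build result no longer depends on whichever copy of the library happens to be installed on the machine doing the compiling.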
Just for fun, I'd love to compile PyTorch from source as well. It seems to follow the open source paradigm closely: you install specific dependencies using yum/apt, and it uses those directly.
Disclaimer: I'm a Google employee, but these are my own opinions, based on the public Tensorflow project. I did not examine any Google confidential systems to arrive at these observations.