I finished building my Talos II system, and I decided to post my thoughts on it here.
This is an excellent machine, the best workstation I’ve ever used. It’s very fast — it compiles Emacs and the kernel in 2 minutes, and just spins up the fans a little bit while doing so (under normal operation it’s quiet). And it’s in a completely different league in terms of openness than any other computer I’ve owned (and any other high performance computer on the market today).
The system came with a nice user’s manual, and full schematics! I’ve already referred to the schematics to set up the pinmux for a rarely-used serial port so that I could use a TTL serial cable I had lying around (I submitted the device tree patch to Raptor).
It’s really two computers in one: the “baseboard management controller” (BMC), a low power ARMv6, with a full distro on it, and the main POWER9 CPUs. The BMC boots up as soon as you plug in the power supply, before you even press the power button. (It would be nice if there were an ncurses top-like viewer for fan speeds and temperatures that I could leave running on the BMC serial console, but I haven’t found such a thing yet.)
It has serial ports everywhere! One right into the main CPU, and two into the BMC. This is great for low level development, e.g., breaking into bootcode at an early stage.
I left the machine running overnight after first booting it. My neighbourhood had a power glitch, and in the morning I discovered the main CPU was off, and power cycling it via the BMC wasn’t working. Before unplugging and plugging back in, I asked on #talos-workstation, and it turns out I had hit a bug in the first-release firmware. With special commands I was able to power cycle the main CPU just via BMC software (no unplug required). Wanting to know the details, I asked if there was a data sheet for the chip I was interacting with. The amazing thing, from an openness perspective, is that one of the Raptor engineers instead referred me directly to the Verilog source code of the FPGA handling the power sequencing. No searching for datasheets, no black box testing, just straight source code (which is recompilable using an open source FPGA toolchain, BTW.)
It’s so refreshing to not have to do reverse engineering and speculation on opaque things when something fails. On this machine, everything is there, you just look up the source code.
I’ve always disliked the inflexibility and opacity of BIOS/EFI in the x86 world. IBM has done an amazing job here with OpenPOWER. All the early boot code is open, no locked management engines or anything like that. And they’ve adopted Petitboot as the bootloader. It runs on a Linux kernel, so I was able to bootstrap via deboostrap over NFS by building everything within the bootloader environment. Running a compiler in a boot environment is surreal. Even with free options on x86 like libreboot or coreboot, and GRUB, the boot environment is extremely limited. With Petitboot at times I wondered if I even needed to boot into a “desktop” kernel (at least for serial-only activities.)
Now I’m setting up my development environment and I’m learning about the PPC64 ELFv2 ABI, with a view toward figuring out how to bootstrap SBCL. I feel lucky that I got a POWER9 machine early while there are still some rough edges to figure out.