Netbooting

I'm testing a network-booted cluster of Raspberry Pis for my home lab, and the first step is getting them to netboot properly.

Netbooting (or PXE booting) is the process of booting a computer over the network rather than from a local storage medium. Many recent computers come with network booting as standard, allowing you to boot a computer with no local storage at all, should you wish.

In my case, I've found that network booting has several advantages:

Centralised storage for all servers, allowing me to back them up and reimage machines without fuss
Reclaiming Micro-SD cards, which are susceptible to burning out (e.g., my Home Assistant Pi)
Treating servers like cattle, instead of pets, and hopefully automating lots of processes with cloud-init and Ansible

Warning

I found this quite difficult to figure out and it took several hours of debugging! On the plus side, I now vaguely understand how network booting works, but I've found the documentation is usually quite poor and out of date.

Why Raspberry Pis?

I have lots of old Raspberry Pis at home that mostly sit around doing nothing. I opted for Pis over other mini PCs, as they are a low-power, low-space alternative to things such as NUC PCs. My university also seemed keen to hand out Pi 3Bs as part of coursework and let us keep them.

Raspberry Pis also have very good hardware support, and should just work™ out of the box with netbooting, at least for the more modern variants. In total, I think I have the following, which can be dedicated to the cluster:

1 x Pi 4 Model B (4GB)
2 x Pi 3B+ (1GB each)
1 x Pi 3 Model B

All can network boot, but the 3B and 3B+ models need some initial configuration before they will do that.

Background

Boot Flow

All the guides I have found online claim to be updated for the Pi 4, but either the Raspberry Pi Foundation has since changed something in Raspberry Pi OS, or the guides aren't written quite right.

Pi 3

On a Pi 3, the boot sequence is well documented. The Pi 3 requests the bootcode.bin file from the PXE boot server over TFTP. That in turn loads the [serialnum]/start.elf file, which can then start the basics of the operating system. This is documented here.

Pi 4

The Pi 4 bootloader is set up differently. The main documentation page for network booting directs us to the Raspberry Pi 4 Bootloader Configuration page, which is actually just the page for all the Raspberry Pis.

Once we're on this page, there is a note that the Pi 4 and 5 do not use bootcode.bin, and we are directed to the following pages:

Special EEPROM Settings

Within the EEPROM, we can configure some settings specific to network booting. These are documented in the table below.

EEPROM Config Parameter	Default	Description
`TFTP_FILE_TIMEOUT`	`5000`	Timeout in milliseconds to wait for an individual TFTP file.
`TFTP_IP`	`""`	IP address of the TFTP server, to get the files needed.
`TFTP_PREFIX`	`0`	`0` for serial number, `1` for `TFTP_PREFIX_STR`, `2` for the MAC address in kebab format.
`TFTP_PREFIX_STR`	`""`	Custom string to use. Could be useful if you want to reduce reliance on the serial number, e.g. by using `running-lion` as the directory name in both the `pxe` and `tftp` directories.
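`PXE_OPTION43`	`Raspberry Pi Boot`	The string the bootloader expects to match in DHCP option 43; only needs changing if the DHCP server sends something different.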
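
As a rough sketch of how the parameters in the table fit together, the helper below prints a config fragment for either the default serial-number directory or a custom prefix. The function name `eeprom_netboot_config` is made up for illustration, and I have not verified whether the firmware expects a trailing slash on `TFTP_PREFIX_STR`, so treat the output as a starting point rather than a known-good config.

```shell
# Print an EEPROM config fragment for network booting.
# Usage: eeprom_netboot_config <tftp-server-ip> [custom-prefix]
# (hypothetical helper; only the parameter names come from the table above)
eeprom_netboot_config() {
    tftp_ip="$1"
    prefix_str="${2:-}"

    echo "TFTP_IP=${tftp_ip}"
    if [ -n "$prefix_str" ]; then
        # TFTP_PREFIX=1 selects TFTP_PREFIX_STR instead of the serial number
        echo "TFTP_PREFIX=1"
        # Assumption: no trailing slash; check the firmware docs before relying on this
        echo "TFTP_PREFIX_STR=${prefix_str}"
    else
        # TFTP_PREFIX=0 keeps the default serial-number directory
        echo "TFTP_PREFIX=0"
    fi
}
```

Running `eeprom_netboot_config 192.168.1.117 running-lion` prints three lines that could then be pasted into the editor opened by `sudo -E rpi-eeprom-config --edit` on the Pi 4.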

Main Important Bits

When network booting, we make use of two main protocols: TFTP and NFS. TFTP is a very simple file transfer protocol that we use to download the initial boot files from the server. The client queries the TFTP server specified by the DHCP next-server option or, failing that, the main DHCP server.

NFS is a more robust protocol that takes over once the initial bootloader has been bootstrapped. This initial bootloader is an ELF file that will typically load the cmdline.txt file from the same TFTP share. The cmdline.txt specifies the options for the root filesystem, since the boot filesystem is already available over TFTP.

The filesystem specified in cmdline.txt is typically the root filesystem that the NFS server exposes to the client. Therefore, we need to host two services on either a router or a separate server: the TFTP server, for initial bootstrapping, and the NFS server, for the main OS image.
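
To make the cmdline.txt part concrete, here is a minimal sketch that builds a single-line cmdline.txt pointing the kernel at an NFS root. The function name `nfs_cmdline` is hypothetical, and the console settings and `vers=3` are assumptions based on a typical Raspberry Pi OS setup; the kernel parameters themselves (`root=/dev/nfs`, `nfsroot=`, `ip=dhcp`, `rootwait`) are standard.

```shell
# Build a cmdline.txt line for an NFS root filesystem.
# Usage: nfs_cmdline <nfs-server-ip> <exported-root-path>
# (hypothetical helper; console and vers=3 options are assumptions)
nfs_cmdline() {
    server_ip="$1"
    root_path="$2"
    # cmdline.txt must be a single line, so everything goes in one printf
    printf 'console=serial0,115200 console=tty1 root=/dev/nfs nfsroot=%s:%s,vers=3 rw ip=dhcp rootwait\n' \
        "$server_ip" "$root_path"
}
```

For example, `nfs_cmdline 192.168.1.117 /mnt/usb/rpi1` produces a line that could be written to cmdline.txt in the Pi's directory on the TFTP share.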

Trying to Set Up OpenWRT

Setup doesn't work with OpenWRT

I tried to set this up on my router, which runs OpenWRT, to make life simple for myself: if I moved the router, I'd then have this up and running as part of the network straight away. Alas, all guides use different setups, and the OpenWRT page on network booting

In keeping with the theme of this project, I'm trying to use what I have available to me. I've already got my router set up and running OpenWRT, and this supposedly will quite happily serve up the required TFTP server and boot options. I added an old 64GB SSD in a USB3 caddy (not sure whether SSD performance matters over NFS? I'll have to check this at some point), then formatted it.

In terms of the SSD, when adding it to /etc/fstab, I made sure to identify it by the UUID blkid returned, so that if I add any other drives later down the line, it won't get confused as to which device /dev/sdX is. I formatted the drive as ext4 for testing, but later down the line, I'd like to consider making this a part of a RAID array for resiliency in case this drive dies.
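
A minimal sketch of the UUID-based mount described above is shown below. The helper name `fstab_entry` and the UUID in the usage note are placeholders, and the `defaults 0 2` fields are an assumption about what suits a data drive; the real UUID comes from the output of `blkid`.

```shell
# Print an /etc/fstab entry that identifies the filesystem by UUID,
# so the mount does not depend on which /dev/sdX name the drive gets.
# Usage: fstab_entry <uuid> <mount-point> [fs-type]
# (hypothetical helper; mount options are an assumption)
fstab_entry() {
    uuid="$1"
    mount_point="$2"
    fs_type="${3:-ext4}"
    printf 'UUID=%s %s %s defaults 0 2\n' "$uuid" "$mount_point" "$fs_type"
}
```

With a placeholder UUID, `fstab_entry 0000aaaa-1111-2222-3333-444455556666 /mnt/usb` prints a line that could be appended to /etc/fstab once the real UUID has been substituted.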

OpenWRT has a PXE/TFTP pane in LuCI, and I struggled for ages trying to get this to work. I installed NFS on the router too, and created both a pxe folder and a tftp folder. The logic behind this was that I could put the initial bootstrap in the tftp folder (this is typically mapped as /boot/firmware on the Raspberry Pi), then put the remainder of the OS in the pxe folder.

Ultimately, I moved away from this solution on the router, as I couldn't mess with it too much while it runs the network, and the support and extensibility I wanted weren't available from within OpenWRT. I'm sure that had I moved away from LuCI, I could have configured network booting from the router reliably.

Setup Following Raspberry Pi Cluster Guide

Following my failure to get this working on the router, I tried looking at some other guides on the internet. This guide from linuxhit.com initially seemed to be good, but the filesystem didn't account for the new /boot/firmware that Raspberry Pi OS now uses, and imaged the new OS live from the SD card being used at the time.

I also tried the tutorial from the Level1Techs forums, but again, this was outdated and incomplete. I ended up following this tutorial from Raspberry Pi themselves, and I would suggest you do too, if trying to get network booting to work, with a few caveats:

Ignore most of the networking configuration, as this is for a k8s cluster

When you get to copying the OS, the commands tell you to copy the firmware to the wrong location:

mkdir /tmp/image
cd /tmp/image
wget -O raspios_lite_latest.img.xz https://downloads.raspberrypi.com/raspios_lite_arm64_latest
xz -d raspios_lite_latest.img.xz
kpartx -a -v *.img
mkdir bootmnt
mkdir rootmnt
mount /dev/mapper/loop0p1 bootmnt/
mount /dev/mapper/loop0p2 rootmnt/
mkdir -p /mnt/usb/rpi1
mkdir -p /mnt/usb/tftpboot/<serialnum>
cp -a rootmnt/* /mnt/usb/rpi1

# Original Command
cp -a bootmnt/* /mnt/usb/rpi1/boot/firmware

# Should be
cp -a bootmnt/* /mnt/usb/tftpboot/<serialnum>

The share locations are also confusing. When editing /etc/fstab, you are instructed to create a mount for the tftpboot directory, but this directory isn't exported by NFS, so it has to be added to the NFS exports as well.
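
As a sketch of what the exports might look like with both directories shared, the helper below prints one /etc/exports line per path. The name `exports_entry`, the `192.168.1.0/24` subnet and the export options are assumptions on my part (the options are common choices for an NFS root, not something taken from the tutorial), so adjust them to your own network.

```shell
# Print an /etc/exports line sharing a directory with a subnet.
# Usage: exports_entry <path> <client-subnet>
# (hypothetical helper; the options are an assumption for NFS-root use)
exports_entry() {
    export_path="$1"
    client_subnet="$2"
    # no_root_squash lets the client's root user own files on the share
    printf '%s %s(rw,sync,no_subtree_check,no_root_squash)\n' \
        "$export_path" "$client_subnet"
}
```

Calling it once for `/mnt/usb/rpi1` and once for `/mnt/usb/tftpboot` gives the two lines to add to /etc/exports, after which `exportfs -ra` on the server reloads the export table.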

Modifying `dnsmasq.conf` on OpenWRT

Despite my migration away from OpenWRT for the PXE/TFTP server, I still had to set the DHCP options for the router to tell clients where the network boot server is. This can be done by setting options 43 and 66 in /etc/dnsmasq.conf (appending to the file):

Warning

This may not persist between reboots, I've not checked!

dhcp-option=43,"Raspberry Pi Boot"
dhcp-option=66,"192.168.1.117"

Serial Numbers and MAC Addresses

Depending on the configuration of the EEPROM, the bootloader uses the serial number, the MAC address, or a custom string to work out which folder on the TFTP server holds that Pi's /boot files. This is the serial number by default, and I see no reason to change it.

The serial number can be extracted from the Pi by running the following command:

vcgencmd otp_dump | grep 28: | sed 's/.*://g'

Similarly, the MAC address can be found by running ip a and looking for the 48-bit address after link/ether.

Alternatively, on the Pi 4, just plug it in without any storage or network, and it'll happily tell you the serial number and MAC address under the board part of the diagnostics screen in the second and third columns, respectively.
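
If the EEPROM is switched to the MAC-based prefix, the directory name is the MAC address in lowercase with dashes instead of colons. The sketch below does that conversion; the function name `mac_to_tftp_dir` is made up for illustration.

```shell
# Convert a MAC address as shown by `ip a` or the diagnostics screen
# into the kebab-case directory name used when TFTP_PREFIX=2.
# Usage: mac_to_tftp_dir <mac-address>
# (hypothetical helper)
mac_to_tftp_dir() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | tr ':' '-'
}
```

For example, `mac_to_tftp_dir DC:A6:32:79:26:DF` prints `dc-a6-32-79-26-df`, which could be passed to `mkdir -p` under the tftpboot directory.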

Personal Reference

This is only of use to me, you don't need to use any of these values!

Server	Serial Number	MAC Address
`running-lion`	`38c128c0`	`DC:A6:32:79:26:DF`

General Debugging

This process was quite hard to understand, as the whole interaction is very opaque and the logging isn't brilliant. I created a remote packet capture from the router's main LAN interface, so that I could see when transfers were taking place. The main things to look for are a valid DHCP discover, offer, and request going through, followed by working TFTP requests.
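
A capture like this gets noisy quickly, so a filter helps. The sketch below builds a pcap filter that keeps all DHCP traffic plus anything to or from one Pi's MAC address, which also catches the TFTP transfers regardless of which UDP ports they end up on. The function name `pi_capture_filter` and the `br-lan` interface in the usage note are assumptions, so substitute your own.

```shell
# Build a pcap filter matching all DHCP traffic and any frame
# sent to or from the given MAC address.
# Usage: pi_capture_filter <mac-address>
# (hypothetical helper)
pi_capture_filter() {
    # pcap filters expect the MAC in lowercase colon-separated form
    mac=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    printf 'ether host %s or udp port 67 or udp port 68\n' "$mac"
}
```

The result can be passed straight to tcpdump on the router, for instance `tcpdump -i br-lan -n -e "$(pi_capture_filter DC:A6:32:79:26:DF)"`.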

Pi Diagnostic Screen

On the Pis I was stuck for ages, as they'd sit on the boot diagnostics screen. They'll show their IP and the server address with YI_ADDR and SI_ADDR on the main screen. They can't be pinged when they show those addresses.

The net part of the diagnostics screen will be all zeroes until the Pi has received the proper PXE options, and then the log will go a bit mad as the TFTP transfers take place. Once this is done, in my experience it takes about 20-30s to reach either the start of the full OS boot (with the four raspberries) or an (initramfs) prompt.

Initramfs Emergency Mode

If you get to this, it suggests that your cmdline.txt is messed up. Check you have exported the shares properly and that the server IP is also set correctly. Once this is done, you should at least get Linux to boot.

NFS Issues

Once I got past the initramfs issues, I was having issues with mounting the NFS shares properly. It was booting into Linux but unable to mount the shares, either because they didn't exist or the wrong IP was given.

Security

TFTP has no authentication, and NFS has very limited authentication. The Pi 4 and later support 'secure' boot, which on a technicality you could argue is 'secure': the Pis will netboot off a signed image, although I've not tried this. Encryption could be done with a device key, but this would mean layering some form of encryption on top of NFS, or using something else entirely.

This should ideally only really be done on an isolated network, possibly a separate VLAN to the rest of the devices. In the real world for a large cluster you'd probably want to make use of some sort of TPM with a secure boot solution and not use Raspberry Pis, but for the home application it's not too bad.