
Network Booting

As part of my experiments with clustering Raspberry Pis for my home lab, the first step is getting them to netboot properly.

Netbooting (or PXE booting) is the process of booting a computer without using a local storage medium. Many computers support network booting, which lets you run a machine with no local storage at all, should you wish.

In my case, I've found that network booting has several advantages:

  • Centralised storage for all servers, allowing me to back them up and reimage machines without fuss
  • Reclaiming Micro-SD cards, which are susceptible to burning out (e.g., my Home Assistant Pi)
  • Treating servers like cattle, instead of pets, and hopefully automating lots of processes with cloud-init and Ansible

Background

A Very, Very High-Level Overview

Please Read Me!

This section gives a high-level overview, and is worth reading in full to understand how each part of the system interacts with the others. Then, if something goes wrong while following the rest of the guide, you are much more likely to understand why, and to know some good debugging steps to take.

The router will typically run a DHCP server to assign new clients on the network an IP address that doesn't conflict with other clients. DHCP also configures other network settings, such as telling the client where the router is, and which subnet the client is on, so that it knows whether to send a message via the router or directly to another client on the local network. The DHCP server also provides the DNS settings for clients, which really cements the name Dynamic Host Configuration Protocol.

The DHCP server can also include further options in its response to the client. Of note is DHCP option 43, defined as "Vendor Specific Information" in RFC 2132. In addition, it can specify a "next server", the address of a TFTP server from which PXE clients should request bootloader files. This is defined as option 66 ("TFTP server name"), again in RFC 2132 [1].

PXE clients are simply computers that implement a Preboot Execution Environment: the firmware, either on the motherboard or in an early-stage bootloader in the case of a single-board computer, contains code that can use options 43 and 66 to download the next-stage bootloader from a server. This first download uses the Trivial File Transfer Protocol (TFTP), a very simple protocol that is not resilient to failures and not very performant.

On the client, the firmware implements TFTP in an early-stage bootloader, which is then used to fetch the next-stage bootloader. This next stage can be much more complicated (relatively speaking), and will typically be used to boot a Linux kernel, and optionally an initramfs, that we also transfer with TFTP.

When starting the Linux kernel, we can point it at an NFS mount. NFS is a more robust and performant file-sharing protocol, allowing us to access files on a remote server. On a Raspberry Pi, the kernel command line is (typically) read from /boot/cmdline.txt, where /boot is the first partition of the SD card or hard drive. This partition is used to start the kernel, which can then mount the remainder of the drive, or indeed a myriad of other drives, into the operating system.
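To make that concrete, the kernel parameters for an NFS root look something like the line built below. This is a minimal sketch, assuming NFSv3 and a DHCP-assigned address on the Pi; make_cmdline, the server IP and the export path are hypothetical, and the console= entries are the usual Raspberry Pi OS defaults.

```shell
#!/bin/sh
# Hypothetical helper: print a kernel command line that mounts the root
# filesystem over NFS instead of from a local partition.
#   $1 = NFS server IP, $2 = exported root filesystem path on that server
make_cmdline() {
    server=$1
    rootpath=$2
    # root=/dev/nfs -> the root filesystem lives on NFS
    # nfsroot=...   -> which server/export to mount, plus NFS mount options
    # ip=dhcp       -> bring the network up via DHCP before mounting root
    # rw rootwait   -> mount read-write and wait for the root device
    printf '%s\n' "console=serial0,115200 console=tty1 root=/dev/nfs nfsroot=${server}:${rootpath},vers=3 rw ip=dhcp rootwait"
}
```

Something like make_cmdline 192.168.1.117 /mnt/usb/rpi1 > cmdline.txt would then go into the Pi's TFTP boot folder; the file needs to stay on a single line.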

Therefore, we need to ensure that our DHCP server is capable of setting options 43 and 66 at a minimum. We can then either provision a standalone server or use the capabilities of a more powerful router to host the TFTP and NFS servers.

For the remainder of this guide, I will be demonstrating the configuration required to get Raspberry Pi 4 Model Bs to boot from an OpenWRT-based router (which, coincidentally, is also a Pi 4 Model B). I'll also give pointers on how this would be done with a standalone server.
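For the standalone-server route, one common pattern is to run dnsmasq in proxy-DHCP mode next to the existing router, so that it only adds the PXE options and serves TFTP while the router keeps handing out addresses. The sketch below just prints such a config; the function name, broadcast address and TFTP root are assumptions to adapt to your own network.

```shell
#!/bin/sh
# Hypothetical helper: print a dnsmasq config for a standalone boot server
# that runs alongside the router's existing DHCP server.
#   $1 = broadcast address of the LAN (e.g. 192.168.1.255)
#   $2 = directory served over TFTP (holds the <serialnum>/ folders)
pxe_dnsmasq_conf() {
    cat <<EOF
# Disable dnsmasq's DNS server; the router still handles DNS
port=0
# Proxy DHCP: add PXE options without handing out addresses
dhcp-range=$1,proxy
log-dhcp
# Built-in TFTP server
enable-tftp
tftp-root=$2
# PXE service entry; the Pi bootloader matches this string in option 43
pxe-service=0,"Raspberry Pi Boot"
EOF
}
```

Usage would be along the lines of pxe_dnsmasq_conf 192.168.1.255 /tftpboot | sudo tee /etc/dnsmasq.d/pxe.conf (or wherever your distribution reads dnsmasq config from), followed by a dnsmasq restart.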

Raspberry Pi Boot Flow

All the guides I have found on the internet say that they are updated for the Pi 4, but they seem to be quite outdated. Raspberry Pi have changed the location of the boot partition within Raspberry Pi OS, and all the tutorials refer to the old location at /boot. The latest version of Pi OS mounts the firmware partition at /boot/firmware.

Pi 3

On a Pi 3, the boot sequence is well documented. The Pi 3 requests bootcode.bin from the PXE boot server over TFTP; bootcode.bin then fetches start.elf from the [serialnum]/ directory, which in turn starts the basics of the operating system. This is documented here.

Pi 4

The Pi 4 bootloader is set up differently. The main network booting documentation page directs us to the Raspberry Pi 4 Bootloader Configuration page, which is actually just the page for all Raspberry Pis.

Once we're on this page, there is a note that the Pi 4 and 5 do not use bootcode.bin, and we are directed to the following pages:

Special EEPROM Settings

Within the EEPROM, we can configure some settings specific to network booting. These are documented in the table below.

| EEPROM Config Parameter | Default | Description |
| --- | --- | --- |
| TFTP_FILE_TIMEOUT | 5000 | Timeout in milliseconds to wait for an individual TFTP file. |
| TFTP_IP | "" | IP address of the TFTP server to fetch the boot files from. |
| TFTP_PREFIX | 0 | Naming of the per-Pi TFTP directory: 0 for the serial number, 1 for TFTP_PREFIX_STR, 2 for the MAC address in kebab format. |
| TFTP_PREFIX_STR | "" | Custom string to use as the directory name. Useful if you want to avoid relying on the serial number, e.g., using running-lion as the name in both the pxe and tftp directories. |
| PXE_OPTION43 | "Raspberry Pi Boot" | TBA |
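These settings are edited on the Pi itself with sudo rpi-eeprom-config --edit. The sketch below only prints lines you might paste in; the helper name is hypothetical, and BOOT_ORDER=0xf21 (try the SD card, then the network, then start again) is one reasonable choice rather than the only one.

```shell
#!/bin/sh
# Hypothetical helper: print EEPROM settings for network booting.
#   $1 = TFTP_PREFIX mode (0 = serial, 1 = TFTP_PREFIX_STR, 2 = MAC address)
#   $2 = custom directory name, only used when $1 is 1
# Paste the output into `sudo rpi-eeprom-config --edit` on the Pi.
eeprom_netboot_conf() {
    mode=$1
    echo "[all]"
    # Boot modes are tried starting from the rightmost digit:
    # 1 = SD card, 2 = network, f = restart the sequence
    echo "BOOT_ORDER=0xf21"
    echo "TFTP_PREFIX=$mode"
    if [ "$mode" = 1 ]; then
        echo "TFTP_PREFIX_STR=$2"
    fi
}
```

For example, eeprom_netboot_conf 1 running-lion would make the Pi look for a running-lion/ folder on the TFTP server instead of its serial number.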

OpenWRT Setup

In keeping with the rest of this project, I'm trying to use what I already have available. My router is already set up running OpenWRT, forming part of my core network. If I can put the TFTP and NFS servers on it too, then the overheads of migrating the network to new infrastructure are minimised, and the only server with a physical boot drive is the router, whose fault tolerance I can improve by taking regular backups and, eventually, putting the SSD into a RAID-like configuration.

As for the SSD, when adding it to /etc/fstab I made sure to identify it by the UUID returned by blkid, so that if I add other drives later down the line, there's no confusion over which device is /dev/sdX. I formatted the drive as ext4 for testing, but eventually I'd like to make it part of a RAID array for resilience in case this drive dies.
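For reference, a UUID-based entry looks something like the line printed below. This is a sketch for a standard /etc/fstab, assuming an ext4 filesystem and a mount point of /mnt/usb; the helper name and the example UUIDs are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print an /etc/fstab line that identifies a drive by
# UUID rather than by its /dev/sdX name, which can change between boots.
#   $1 = filesystem UUID, $2 = mount point, $3 = filesystem type (default ext4)
fstab_entry() {
    # Fields: device, mount point, type, options, dump, fsck pass order
    printf 'UUID=%s %s %s defaults 0 2\n' "$1" "$2" "${3:-ext4}"
}
```

Usage would be along the lines of fstab_entry "$(blkid -s UUID -o value /dev/sda1)" /mnt/usb | sudo tee -a /etc/fstab, where /dev/sda1 is just an example device.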

OpenWRT has a PXE/TFTP pane in LuCI, and I struggled for ages trying to get it to work. I also installed an NFS server on the router, and created both a pxe folder and a tftp folder. The logic was that the initial bootstrap files would go in the tftp folder (this is typically mapped as /boot/firmware on the Raspberry Pi), with the remainder of the OS in the pxe folder.

Ultimately, I moved away from this solution on the router: I couldn't experiment with it too much, as it runs the network, and the support and extensibility I wanted weren't available from within OpenWRT. I'm sure that had I moved away from LuCI, I could have configured network booting on the router reliably.

Setup Following Raspberry Pi Cluster Guide

Following my failure to get this working on the router, I looked at some other guides on the internet. This guide from linuxhit.com initially seemed good, but its file layout didn't account for the /boot/firmware location that Raspberry Pi OS now uses, and it imaged the new OS live from the SD card in use at the time.

I also tried the tutorial from the Level1Techs forums, but again, it was outdated and incomplete. I ended up following this tutorial from Raspberry Pi themselves, and I would suggest you do too if you're trying to get network booting to work, with a few caveats:

  • Ignore most of the networking configuration; this is for a k8s cluster
  • When you get to copying the OS, the commands tell you to copy the firmware to the wrong location:

    # Don't use /tmp: there's not enough RAM
    mkdir ~/image
    cd ~/image
    wget -O raspios_lite_latest.img.xz https://downloads.raspberrypi.com/raspios_lite_arm64_latest
    xz -d raspios_lite_latest.img.xz
    kpartx -a -v *.img
    mkdir bootmnt
    mkdir rootmnt
    mount /dev/mapper/loop0p1 bootmnt/
    mount /dev/mapper/loop0p2 rootmnt/
    mkdir -p /mnt/usb/rpi1
    mkdir -p /mnt/usb/tftpboot/<serialnum>
    cp -a rootmnt/* /mnt/usb/rpi1
    
    # Original Command
    cp -a bootmnt/* /mnt/usb/rpi1/boot/firmware
    
    # Should be
    cp -a bootmnt/* /mnt/usb/tftpboot/<serialnum>
    

  • The share locations are also confusing: when adding entries to /etc/fstab, you are instructed to create a mount for tftpboot, but this directory isn't exported by NFS, so it has to be exported as well.
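As a sketch of what the server's exports might look like with both directories shared, assuming the /mnt/usb layout from the commands above and a 192.168.1.0/24 LAN (the helper name is hypothetical). no_root_squash matters here: with the default root_squash, requests from the client's root user are mapped to an anonymous user, so root-owned files on the shared root filesystem can't be created or managed properly.

```shell
#!/bin/sh
# Hypothetical helper: print /etc/exports lines sharing the root filesystem
# and the TFTP boot directory with the local subnet.
#   $1 = client subnet (e.g. 192.168.1.0/24), remaining args = directories
nfs_exports() {
    subnet=$1
    shift
    for dir in "$@"; do
        # rw: clients may write; sync: commit writes before replying;
        # no_subtree_check: skip subtree checking;
        # no_root_squash: keep uid 0 as root, needed for a root filesystem
        printf '%s %s(rw,sync,no_subtree_check,no_root_squash)\n' "$dir" "$subnet"
    done
}
```

Usage would be something like nfs_exports 192.168.1.0/24 /mnt/usb/rpi1 /mnt/usb/tftpboot | sudo tee -a /etc/exports, then sudo exportfs -ra to reload the export table.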

Aside: Doing this on OpenWRT

I want to be able to provision clean images directly from my router. To do this, I first resized the root filesystem, following this guide. The guide's scripts didn't work for me, so I had to run parted manually, as the parted binary distributed with my build did not support the -f option.

Then I downloaded the latest Raspberry Pi OS Lite image from the downloads page of the Raspberry Pi website, and decompressed it with xz -d to get the raw .img file.

I could then set up a loop device with losetup -P /dev/loopX ~/2024-03-15-raspios-bookworm-armhf-lite.img, which gave /dev/loopXp1 for the boot partition and /dev/loopXp2 for the root partition. From here, I could copy the files into the respective directories.
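The loop-device steps can be sketched like this, using losetup in place of kpartx. It assumes a losetup with -P (partition scanning) support and a free loop device you have picked yourself (losetup -f prints the first unused one); the function names and the DRY_RUN switch are hypothetical, added so the commands can be previewed before running them as root.

```shell
#!/bin/sh
# Sketch: attach a Raspberry Pi OS image to a loop device with partition
# scanning, then mount its boot and root partitions.
# Set DRY_RUN=1 to print the commands instead of executing them.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

#   $1 = loop device (e.g. /dev/loop0), $2 = path to the .img file
attach_and_mount() {
    loopdev=$1
    img=$2
    run mkdir -p bootmnt rootmnt
    # -P scans the image's partition table, creating ${loopdev}p1, p2, ...
    run losetup -P "$loopdev" "$img"
    run mount "${loopdev}p1" bootmnt   # boot (FAT) partition
    run mount "${loopdev}p2" rootmnt   # root (ext4) partition
}

# Reverse of the above, once the files have been copied out
#   $1 = the same loop device
unmount_and_detach() {
    run umount bootmnt rootmnt
    run losetup -d "$1"
}
```

DRY_RUN=1 attach_and_mount /dev/loop0 raspios_lite_latest.img shows exactly what would be run; drop DRY_RUN once the device names look right.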

OpenWRT has the following caveats: the router doesn't necessarily have the space, and there is no package for kpartx.

Adapt the above so that you have an /etc/exports that can be mounted from a client (e.g., an Arch machine), then go through these steps on the mounted share. For Arch not to throw a confusing error, make sure you have installed nfs-utils (pacman -S nfs-utils), as well as multipath-tools, which provides kpartx.

mkdir -p /mnt/router
mount -t nfs info-highway.home.chza.me:/ /mnt/router

# Don't use /tmp: there's not enough RAM
mkdir -p /mnt/router/image
cd /mnt/router/image
wget -O raspios_lite_latest.img.xz https://downloads.raspberrypi.com/raspios_lite_arm64_latest
xz -d raspios_lite_latest.img.xz
# -v prints the device mappings (e.g. loop0p1, loop0p2) that kpartx creates
kpartx -a -v raspios_lite_latest.img
mkdir -p bootmnt rootmnt

# NOTE: this may differ from loop0p1, so check the kpartx output and change it!
mount /dev/mapper/loopXp1 bootmnt/
mount /dev/mapper/loopXp2 rootmnt/

# For the main files
mkdir -p ../pxe/running-lion

# Bootloader only
mkdir -p ../tftp/<serialnum>

# Copy files across (bear in mind this will take ages)
cp -a rootmnt/* ../pxe/running-lion
cp -a bootmnt/* ../tftp/<serialnum>

# Unmount and remove the mappings once the copy has finished
umount bootmnt rootmnt
kpartx -d -v raspios_lite_latest.img

Modifying dnsmasq.conf on OpenWRT

Despite moving the PXE/TFTP server off OpenWRT, I still had to set the DHCP options on the router to tell clients where the network boot server is. This can be done by setting options 43 and 66 in /etc/dnsmasq.conf (appending to the file):

dhcp-option=43,"Raspberry Pi Boot"
# The Pi waits for both option 43 and option 66; this was the address of the separate server I was running at the time
dhcp-option=66,"192.168.1.117"

Notes for the Next Attempt

Set DHCP option 66 to 192.168.1.1 to use OpenWRT as the boot server.

When following the tutorial, make sure you are copying to the tftp directory and not the pxe/boot/firmware directory.

Also:

  • In /etc/fstab, all of the fstab mounts can be removed
  • resize2fs fails, so that needs to be disabled
  • I still need to figure out the boot/firmware mount and remote-fs.target
  • Something went wrong with the copy, as the permissions were all messed up; maybe rsync it across instead of using cp
  • When mounting the NFS share, I used sudo and a chown; I don't think that was the right thing to do

Serial Numbers and MAC Addresses

Depending on the EEPROM configuration, the bootloader uses the serial number, the MAC address, or a user-customised string to decide which TFTP folder to fetch the /boot files from. The serial number is the default, and I see no reason to change it.

The serial number can be extracted from the Pi by running the following command:

vcgencmd otp_dump | grep '^28:' | sed 's/.*://'
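The same extraction can be wrapped up as a small function that anchors the match to the start of the line, so only OTP row 28 is picked up. The function name is hypothetical; it reads vcgencmd otp_dump output on standard input, so it can also be tried against saved output.

```shell
#!/bin/sh
# Hypothetical helper: read `vcgencmd otp_dump` output on stdin and print the
# value of OTP row 28, which holds the board's serial number.
otp_serial() {
    # Only print lines that start with "28:", with that prefix removed
    sed -n 's/^28://p'
}
```

On the Pi itself, this would be used as vcgencmd otp_dump | otp_serial.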

Similarly, the MAC address can be shown with ip a: look for the 48-bit address after link/ether.
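If TFTP_PREFIX is set to 2, the folder name is the MAC address in kebab format. Assuming that means lower-case hex pairs separated by dashes (e.g. dc-a6-32-...), the conversion can be sketched as below; the function name is hypothetical, and on Linux the address can also be read from /sys/class/net/eth0/address (assuming the wired interface is eth0).

```shell
#!/bin/sh
# Hypothetical helper: turn a MAC address such as DC:A6:32:79:26:DF into the
# lower-case, dash-separated form used as a TFTP directory name.
mac_to_prefix() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/:/-/g'
}
```

For example, mac_to_prefix "$(cat /sys/class/net/eth0/address)" prints the folder name for the Pi it runs on.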

Alternatively, on the Pi 4, just power it on without any storage or network attached, and it'll happily show the serial number and MAC address in the board section of the diagnostics screen, in the second and third columns respectively.

Personal Reference

This is only of use to me; you don't need to use any of these values!

| Server | Serial Number | MAC Address |
| --- | --- | --- |
| running-lion | 38c128c0 | DC:A6:32:79:26:DF |

General Debugging

This process was quite hard to understand, as the whole interaction is very opaque without good logging. I set up a remote packet capture on the router's main LAN interface so that I could see when transfers were taking place. The main things to look for are a valid DHCP discover, offer, request and acknowledgement going through, then checking that the TFTP requests are working.
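A sketch of that kind of capture: stream tcpdump output from the router over SSH into Wireshark on a desktop. It assumes tcpdump is installed on the router, that br-lan is its LAN bridge (the usual OpenWRT name), and root SSH access; the helper name, hostname and PI_MAC variable are hypothetical. The filter keeps everything to or from the Pi's MAC address plus DHCP traffic, because TFTP data moves to ephemeral ports after the initial request on port 69, and DHCP offers may be broadcast.

```shell
#!/bin/sh
# Hypothetical helper: build a tcpdump filter for watching one Pi netboot.
#   $1 = the Pi's MAC address
# Matches all frames to/from the Pi (so TFTP data on ephemeral ports is kept)
# plus DHCP on ports 67/68, since offers may be sent as broadcasts.
pxe_capture_filter() {
    printf 'ether host %s or udp port 67 or udp port 68\n' "$1"
}

# Example (not run here): stream the capture from the router into Wireshark.
#   ssh root@router "tcpdump -i br-lan -U -s0 -w - '$(pxe_capture_filter "$PI_MAC")'" \
#     | wireshark -k -i -
```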

Pi Diagnostic Screen

On the Pis, I was stuck for ages, as they'd sit on the boot diagnostics screen. They show their own IP and the server's address as YI_ADDR and SI_ADDR on the main screen, but they can't be pinged even when those addresses are shown.

The net section of the diagnostics screen will be all zeroes until the Pi has received the proper PXE options, and then the log will go a bit mad with TFTP transfers. Once these finish, in my experience it takes about 20-30 s to get to either the full OS boot (with the four Raspberry Pi logos) or an (initramfs) prompt.

Initramfs Emergency Mode

If you end up here, it suggests that your cmdline.txt is wrong. Check that you have exported the shares properly and that the server IP is set correctly. Once this is done, you should at least get Linux to boot.
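One quick check is to pull the NFS server and path out of cmdline.txt and compare them against what the server actually exports (showmount -e <server>, from the NFS client tools). The parser below is a sketch that assumes the nfsroot=<server>:<path>[,options] form; the function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print "<server> <path>" from the nfsroot= parameter of
# a kernel command line read on stdin, e.g. a cmdline.txt.
nfsroot_target() {
    # Capture the server before ':' and the path up to the first ',' or space
    sed -n 's/.*nfsroot=\([^:]*\):\([^, ]*\).*/\1 \2/p'
}

# Example (not run here):
#   set -- $(nfsroot_target < cmdline.txt)
#   showmount -e "$1"    # is "$2" in the export list?
```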

NFS Issues

Once I got past the initramfs issues, I had trouble mounting the NFS shares properly. The Pi was booting into Linux but failing to mount the shares, either because they didn't exist or because the wrong IP was given.

Security

TFTP has no authentication, and NFS has very limited authentication. The Pi 4 and later support 'secure' boot, which on a technicality you could argue is 'secure': the Pis will netboot a signed image, although I've not tried this. Encryption could be done with a device key, but this would involve encrypting the NFS traffic or something similar.

This should ideally only be done on an isolated network, possibly a separate VLAN from the rest of the devices. In the real world, for a large cluster you'd probably want to use some sort of TPM with a secure boot solution rather than Raspberry Pis, but for home use it's not too bad.

Further Reading


  1. There are two ways of specifying a TFTP server: option 66 gives the address of the server, as does the next-server (siaddr) field in the DHCP header. Most PXE clients use the next-server field as they boot.
