Tricky Firmware Slowness
May 4, 2022
For a recent deployment we wanted to create an out-of-band circuit with a root bridge on a server that had the IPMI/BMC of several edge devices connected to it. There was a complexity in that one of the BMCs was remote and was made accessible via a transparent bridge that had an arbitrary VLAN tag set. Maybe it's embarrassing to admit, but for these edge devices we needed to reach both ssh (ipmitool) and http (the graphical KVM), so these were exposed via WireGuard.
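As a rough sketch of how that exposure can look in systemd-networkd (the interface name, key path, port, and addresses here are hypothetical, not the actual deployment), a WireGuard tunnel on the side that needs to reach the BMCs is just another netdev:
vim /etc/systemd/network/wg0.netdev
[NetDev]
Name=wg0
Kind=wireguard
[WireGuard]
PrivateKeyFile=/etc/systemd/network/wg0.key
[WireGuardPeer]
PublicKey=<peer public key>
Endpoint=<server address>:51820
# route the out-of-band/BMC subnet over the tunnel
AllowedIPs=100.64.64.0/24
PersistentKeepalive=25
vim /etc/systemd/network/wg0.network
[Match]
Name=wg0
[Network]
Address=10.100.0.2/24
With the BMC subnet in AllowedIPs, both ssh (ipmitool) and http (KVM) on the BMCs are reachable through the tunnel.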
Hardware/Firmware #
We were deploying two systems from Gigabyte, the R152-Z32 and E152-ZE0. The R152 uses the ASPEED AST2500 and the E152 uses the ASPEED AST2600. This became meaningful, as the BMC implementation on the E152 wasn't plagued with the same issue as the R152.
Slow, but accessible? #
Initially, without VLAN tagging, the BMC interfaces worked as expected. However, when we put the bridge in place and started tagging traffic from the BMC, we saw significant slowness. This is quite tricky as it:
- Doesn’t appear in a ping response (no difference in latency)
- Appears slightly over ipmitool, but is “faint”
- Appears significantly over http, taking 30+ seconds for the initial page load and making Remote KVM completely unusable (a rough way to quantify this is sketched below)
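A quick way to put numbers on the http symptom (the BMC address here is an example) is curl's timing output:
# time the initial page load against the BMC web UI (self-signed cert, hence -k)
curl -k -s -o /dev/null -w 'connect: %{time_connect}s total: %{time_total}s\n' https://100.64.64.2/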
I'd mentioned initially that one leg of the journey was being handled via a microwave link and an MRV. We suspected this to be somehow complicating things, so we focused in on the directly connected unit.
This is the type of networking issue that is extremely hard to root-cause without access to both sides of the communication. Notionally it looked as if fragmentation was occurring with specific payloads, but pinning that down from one side was proving difficult (one probe along those lines is sketched below). We also had the R152 and E152 to work with (we were moving from the R152 to the E152 for more network interfaces on the edge) and noticed immediately that the bug was not present on the E152.
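The probe: set the Don't Fragment bit and sweep ICMP payload sizes (address and sizes here are illustrative). A size that works untagged but stalls tagged points at the extra 4-byte 802.1Q header pushing frames past what some hop in the path will carry:
# 1472 bytes of ICMP payload + 28 bytes of IP/ICMP headers = a full 1500-byte packet
ping -M do -s 1472 100.64.64.2
# back off to see where (or if) a cliff appears
ping -M do -s 1400 100.64.64.2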
We removed the tagging from the bridge and the local BMC instance and saw nominal behavior. We took some videos and documented the issue for our Gigabyte representatives (in California). After a good deal of back and forth a reproducer was found. As a side note, during this time the industry had such insane shortages that Gigabyte didn't even have representative hardware to test on for a while; we almost shipped systems back to them.
A bug for everyone? #
These BMC systems are nearly ubiquitous in server hardware; however, there is remarkably little implementation diversity. A lot of manufacturers utilize American Megatrends' MegaRAC and brand it themselves. If you look at a Gigabyte mainboard and an ASRock Rack mainboard you will see essentially customizations of the same general interface.
Gigabyte pretty quickly said "we've gotta work with AMI for a fix." Which really makes you think: how many systems have been impacted by this fairly straightforward bug over the last half decade? We couldn't be the only people on the planet who wanted to tag traffic from our BMC(s), right?
Well… when looking more into this I found an awesome video from ServeTheHome where AMI has a typo on a sticker that goes on top of their BMC: "American Megatrands". So I guess it's not unlikely that bugs are slipping through on some of the most ubiquitous hardware shipping nowadays.
When getting new hardware to integrate into your stack you don't expect to be exposed to low-level firmware issues. ODM(s) should be catching these things in their internal qualifications and publishing qualified lists of hardware that has been tested. If a feature is commonly exposed, you'd expect it to receive some level of qualified testing before shipping. We were lucky that we had the E152 on hand and could quickly swap over, as we didn't have the ability to handle VLAN tagging without putting more devices in play, and we wanted to keep the out-of-band topology as simple as we could. Even still, we've had significant issues with the E152 BIOS/BMC and have had to roll through firmware updates quite quickly. For edge devices this is extremely painful, as you have to either design in high availability or plan downtime.
Maybe we’re in a world right now, right before disaggregation hits hard, where these “sidecar” type devices are going to die out in favor of DPU style CXL interconnected “networks” of devices.
Firmware bugs are icky.
BONUS: Set up a VLAN-aware Bridge in systemd-networkd #
This was implemented on a Protectli FW6D, where we will make a bridge from the opt{1,2,3,4} interfaces. We've renamed the interfaces using a .link unit; here is an example:
cat /etc/systemd/network/0-opt1.link
[Match]
Path=pci-0000:03:00.0
[Link]
Name=opt1
We use the Path=pci- approach because this is a "fixed" system that will not experience a PCI-level change. Alternatively, you can match on MAC via PermanentMACAddress=.
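If you need to discover the Path= value for a given interface (the interface name here is an example), udevadm reports it as ID_PATH:
udevadm info /sys/class/net/opt1 | grep ID_PATH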
From there, let's make our VLAN-aware bridge (named oobridge). First, the netdev:
vim /etc/systemd/network/oobridge.netdev
[NetDev]
Name=oobridge
Kind=bridge
[Bridge]
DefaultPVID=0
VLANFiltering=yes
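Once networkd has created the device, you can sanity-check that filtering is actually enabled at the kernel level:
# vlan_filtering should read 1
ip -d link show oobridge | grep -o 'vlan_filtering [01]'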
We are going to put an address on a vlan interface (named oob) on the same bridge; it will be referenced via the VLAN= setting in the [Network] section:
vim /etc/systemd/network/oobridge.network
[Match]
Name=oobridge
[Network]
VLAN=oob
[BridgeVLAN]
VLAN=100
Now the vlan interface:
vim /etc/systemd/network/oob.netdev
[NetDev]
Name=oob
Kind=vlan
[VLAN]
Id=100
Now the address on that interface, along with a DHCP server:
vim /etc/systemd/network/oob.network
[Match]
Name=oob
[Network]
LinkLocalAddressing=no
Address=100.64.64.1/24
DHCPServer=yes
[DHCPServer]
DNS=100.64.64.1
NTP=100.64.64.1
Timezone=America/Detroit
EmitDNS=yes
EmitNTP=yes
EmitRouter=yes
EmitTimezone=yes
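With the units written, applying and checking them looks roughly like this (networkctl reload needs a reasonably recent systemd):
# pick up the new .netdev/.network units
networkctl reload
# confirm the bridge and vlan interface are up with the expected address
networkctl status oobridge oob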
Now we associate the interfaces with the bridge; we expect VLAN ID 100 to be utilized on them:
vim /etc/systemd/network/optX.network
[Match]
Name=opt*
[Network]
Bridge=oobridge
[BridgeVLAN]
VLAN=100
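To confirm the filtering actually took on each port, the bridge utility from iproute2 prints the per-port VLAN table:
# each opt port (and the oobridge self entry) should list VID 100
bridge vlan show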
Everything works as intended. Now we connect the BMC interfaces, but set the IP information on them to utilize VLAN ID 100. This can be done via a variety of methods; most people would likely connect directly to the BMC interface and use the http GUI. However, if you have a functioning system you can ssh into, it's possible to use ipmitool to manipulate the BMC (there is an actual BMC device: /dev/ipmi0):
ipmitool lan set 1 ipsrc dhcp
ipmitool lan set 1 vlan id 100
If you want to set a manual address you can do something like this:
ipmitool lan set 1 ipsrc static
ipmitool lan set 1 ipaddr 100.64.64.2
ipmitool lan set 1 netmask 255.255.255.0
ipmitool lan set 1 vlan id 100
To examine the configuration you'll want to get familiar with print, e.g. ipmitool lan print.
We verify we have functioning connectivity (e.g. ping the BMC) and… start to experience the weird.