Half-watt PCIe power virus in HardenedBSD and FreeBSD: the issue, how I figured it out, impact assessment, immediate mitigations, next steps
Initially penned, starting Tue Mar 21 13:47:11 UTC 2023, by Amelia Bjornsdottir (they, she, he).
Recently I became the somewhat ashamed owner of a Lenovo ThinkPad T480 with an Intel Core i5-8250U. Frankly I do not know what the fuss is about, though I’m typing this on it, so there must be something to it? I promptly installed both Void Linux and HardenedBSD on it. While monkeying around with powerstat on the Linux side and sysutils/intel-pcm on the HardenedBSD side (I later installed intel-pcm on Linux as well), I noticed that HardenedBSD burned between 0.5 W and 1 W more than Linux.
After some frustrated groaning in #hardenedbsd and #freebsd on Libera Chat, and that one time I patched the kernel to use a special mwait method (which only reached the same C-states as ACPI, and thus the same power savings relative to spinning or HLTing), someone on Jabber dared me to compare the single-user states of both OSes. I did just that, and accidentally figured out the problem more or less on my own.
I had several bingo moments. I was able to reproduce the power virus on Linux. It cleared when I ran /etc/runit/1. Interposing intel-pcm between the steps of runit’s core services (including starting systemd-udevd) showed that roughly one step after systemd-udevd fired up, the CPU started entering C8. (I will make the logs available on request.) After more monkeying around in a single-user Linux setup, where I had kept udev from working its magic on power, I fired up powertop on a whim, not really expecting much. I had it tweak some sysfs settings related to runtime power management of PCI Express devices (including Thunderbolt)… and the power virus was cured again. When I tweaked them back the other way, the power virus came back.
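On the Linux side, the knobs in question are, as far as I can tell, the per-device runtime PM switches in sysfs: each PCI device has a power/control file that reads either "on" (runtime PM forbidden, device kept powered) or "auto" (the kernel may suspend the device when idle). Treat the claim that these are exactly what powertop flipped as an assumption. The minimal Python sketch below lists those states and sets every "on" to "auto"; the function names are hypothetical, and writing to sysfs needs root.

```python
from pathlib import Path

PCI_SYSFS = Path("/sys/bus/pci/devices")


def runtime_pm_states(root: Path = PCI_SYSFS) -> dict[str, str]:
    """Map each PCI address under `root` to the contents of its power/control file."""
    states = {}
    for dev in sorted(root.iterdir()):
        control = dev / "power" / "control"
        if control.is_file():
            states[dev.name] = control.read_text().strip()
    return states


def enable_runtime_pm(root: Path = PCI_SYSFS) -> list[str]:
    """Write "auto" to every power/control file currently set to "on".

    Returns the PCI addresses that were changed. Needs root on a real system."""
    changed = []
    for addr, state in runtime_pm_states(root).items():
        if state == "on":
            (root / addr / "power" / "control").write_text("auto")
            changed.append(addr)
    return changed
```

Calling runtime_pm_states() before and after applying powertop’s tunables should show which devices it switched.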
I decided to reboot into FreeBSD and monkey around with devctl, thinking this would probably help, since FreeBSD is able to power down unused PCIe devices (those without a driver attached). As a bit of a nuclear option, I detached the entire Thunderbolt bridge at its root. When I tabbed over to intel-pcm, the CPU had started entering C8 states, and its power consumption was within the margin of error of Linux’s (though still above it).
The problem & its impact
FreeBSD does not appear to put Thunderbolt bridges to sleep when they are not in use, and we know HardenedBSD does not either. Neither operating system documents anything about Thunderbolt directly; I assume most features of the interface are unsupported. Both OSes claim to support ASPM.
The result is that the CPU is unable to enter C8 states. I don’t pretend to be a PCIe expert (nor, indeed, an expert in Intel Kaby Lake CPUs), so I neither understand nor claim to understand how one leads to the other, but it may have something to do with interrupts? Whatever, I’m sleepy.
This problem may not be limited to Thunderbolt.
If you are suffering from this problem, work out the device addresses from pciconf -l, then try
devctl suspend <device you are not using> and watch your power consumption via, say, sysutils/intel-pcm, or
sysctl hw.acpi.battery.rate. If it drops, then great. If not, try disabling the device altogether (
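To make the steps above a little less manual, here is a minimal Python sketch, not something tested across FreeBSD releases. It parses pciconf -l output, flags PCI-to-PCI bridges and devices with no driver attached (pciconf lists those as noneN), builds devctl suspend command lines for the devices you name, and averages hw.acpi.battery.rate samples. Assumptions: the pciconf -l line layout matches the comment in the code, devctl accepts a pciD:B:S:F selector for devices that have no driver name, and hw.acpi.battery.rate is reported in mW. The helper names are hypothetical.

```python
import re
import subprocess
import time
from dataclasses import dataclass

# Assumed pciconf -l line shape (values are illustrative):
#   pcib1@pci0:0:28:0:	class=0x060400 rev=0x00 hdr=0x01 ...
PCICONF_LINE = re.compile(
    r"^(?P<name>\S+?)@(?P<selector>pci\d+:\d+:\d+:\d+):\s+class=(?P<cls>0x[0-9a-fA-F]{6})"
)


@dataclass
class PciDevice:
    name: str      # driver name plus unit, or "noneN" when no driver is attached
    selector: str  # pciD:B:S:F address as printed by pciconf
    cls: int       # 24-bit class code: base class, subclass, programming interface

    @property
    def has_driver(self) -> bool:
        return re.fullmatch(r"none\d+", self.name) is None

    @property
    def is_pci_bridge(self) -> bool:
        # Base class 0x06 (bridge), subclass 0x04 (PCI-to-PCI).
        return (self.cls >> 8) == 0x0604


def parse_pciconf(text: str) -> list[PciDevice]:
    """Turn pciconf -l output into PciDevice records, skipping unrecognised lines."""
    devices = []
    for line in text.splitlines():
        m = PCICONF_LINE.match(line)
        if m:
            devices.append(PciDevice(m["name"], m["selector"], int(m["cls"], 16)))
    return devices


def suspend_commands(devices: list[PciDevice], unused: set[str]) -> list[list[str]]:
    """Build devctl suspend argument lists for devices named or addressed in `unused`.

    Driverless devices have no usable name, so their selector is used instead
    (assumption: devctl accepts PCI selectors on your release; see devctl(8))."""
    commands = []
    for dev in devices:
        if dev.name in unused or dev.selector in unused:
            target = dev.name if dev.has_driver else dev.selector
            commands.append(["devctl", "suspend", target])
    return commands


def sample_battery_rate(count: int = 10, interval_s: float = 2.0) -> list[int]:
    """Read hw.acpi.battery.rate `count` times; only meaningful while on battery."""
    samples = []
    for _ in range(count):
        out = subprocess.run(
            ["sysctl", "-n", "hw.acpi.battery.rate"],
            capture_output=True, text=True, check=True,
        ).stdout
        samples.append(int(out.strip()))
        time.sleep(interval_s)
    return samples


def mean_watts(samples_mw: list[int]) -> float:
    """Average battery-rate samples (assumed to be in mW) and convert to watts."""
    return sum(samples_mw) / len(samples_mw) / 1000.0
```

Typical use: feed it saved pciconf -l output, pick the devices you do not use, run the printed commands as root, and compare mean_watts(sample_battery_rate()) before and after. Averaging several samples rather than trusting a single reading is a deliberate choice here, as the battery rate can jump around.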
I believe USB devices can also cause this, but the procedure to list them and turn them off is different:
usbconfig list, and
usbconfig -d ugenX.Y power_off, substituting the ugenX.Y device shown.
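A similar minimal Python sketch for the USB side, assuming usbconfig list prints one line per device in the shape shown in the code comment. It extracts the ugenX.Y names and power modes and builds usbconfig -d ugenX.Y power_off command lines for the devices you list as unused, skipping any already off. The helper names are hypothetical. It does nothing clever about hubs, so think twice before listing one, as devices behind it may lose power too.

```python
import re
from dataclasses import dataclass

# Assumed usbconfig list line shape (values are illustrative):
#   ugen0.2: <Some device> at usbus0, cfg=0 md=HOST spd=HIGH (480Mbps) pwr=ON (100mA)
USBCONFIG_LINE = re.compile(
    r"^(?P<ugen>ugen\d+\.\d+): <(?P<desc>.*)> at (?P<bus>usbus\d+),.*\bpwr=(?P<pwr>[A-Z]+)"
)


@dataclass
class UsbDevice:
    ugen: str  # ugenX.Y, as accepted by usbconfig -d
    desc: str  # description between the angle brackets
    bus: str   # usbusN the device sits on
    pwr: str   # power mode as printed, e.g. ON, SAVE or OFF


def parse_usbconfig_list(text: str) -> list[UsbDevice]:
    """Turn usbconfig list output into UsbDevice records, skipping unrecognised lines."""
    devices = []
    for line in text.splitlines():
        m = USBCONFIG_LINE.match(line)
        if m:
            devices.append(UsbDevice(m["ugen"], m["desc"], m["bus"], m["pwr"]))
    return devices


def power_off_commands(devices: list[UsbDevice], unused: set[str]) -> list[list[str]]:
    """Build usbconfig -d ugenX.Y power_off argument lists for unused devices not already off."""
    return [
        ["usbconfig", "-d", dev.ugen, "power_off"]
        for dev in devices
        if dev.ugen in unused and dev.pwr != "OFF"
    ]
```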
This problem has no impact on data security by itself, though its underlying cause may. Do not give untrusted people access to your computer’s PCIe, USB or Thunderbolt ports while power is applied in any way.
I plan on putting in a bug report to FreeBSD.