Silverblue VFIO on X570+W6600+6800XT
November 16, 2021
This post is essentially archiving the discussion here.
I’ve been reading through other threads and attempting to get this working. My objectives are:
- Fedora Silverblue
- Host: W6600 connected to four displays via DisplayPort
- Audio via USB DAC (no need for W6600 audio capabilities)
- Guest: 6800XT connected to single display via HDMI
- USB-C passed to guest, with USB-C based hub for keyboard+mouse+headset
- Mostly Linux guest, and mostly headless for Vulkan-accelerated containers
- Windows guest for some games
This might be the first Silverblue-focused post on here? I’m attempting this on Silverblue now, but I have similar hardware that I was planning to run headless with Fedora IoT for Moonlight. The approach with Silverblue/ostree is a bit different from other distros.
Hardware is:
- Gigabyte X570 Aorus Master
- AMD Reference W6600
- AMD Reference 6800 XT
-[0000:00]-+-00.0 Advanced Micro Devices, Inc. [AMD] Starship/Matisse Root Complex
+-00.2 Advanced Micro Devices, Inc. [AMD] Starship/Matisse IOMMU
+-01.0 Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
+-01.1-[01]----00.0 Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO
+-01.2-[02-0a]----00.0-[03-0a]--+-00.0-[04]----00.0 Toshiba Corporation XG6 NVMe SSD Controller
| +-01.0-[05]----00.0 Toshiba Corporation XG6 NVMe SSD Controller
| +-03.0-[06]----00.0 Intel Corporation Wi-Fi 6 AX200
| +-05.0-[07]----00.0 Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller
| +-08.0-[08]--+-00.0 Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP
| | +-00.1 Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller
| | \-00.3 Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller
| +-09.0-[09]----00.0 Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode]
| \-0a.0-[0a]----00.0 Advanced Micro Devices, Inc. [AMD] FCH SATA Controller [AHCI mode]
+-02.0 Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
+-03.0 Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
+-03.1-[0b-0d]----00.0-[0c-0d]----00.0-[0d]--+-00.0 Advanced Micro Devices, Inc. [AMD/ATI] Device 73e3
| \-00.1 Advanced Micro Devices, Inc. [AMD/ATI] Navi 21 HDMI Audio [Radeon RX 6800/6800 XT / 6900 XT]
+-03.2-[0e-10]----00.0-[0f-10]----00.0-[10]--+-00.0 Advanced Micro Devices, Inc. [AMD/ATI] Navi 21 [Radeon RX 6800/6800 XT / 6900 XT]
| +-00.1 Advanced Micro Devices, Inc. [AMD/ATI] Navi 21 HDMI Audio [Radeon RX 6800/6800 XT / 6900 XT]
| +-00.2 Advanced Micro Devices, Inc. [AMD/ATI] Device 73a6
| \-00.3 Advanced Micro Devices, Inc. [AMD/ATI] Navi 21 USB
+-04.0 Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
+-05.0 Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
+-07.0 Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
+-07.1-[11]----00.0 Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Function
+-08.0 Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge
+-08.1-[12]--+-00.0 Advanced Micro Devices, Inc. [AMD] Starship/Matisse Reserved SPP
| +-00.1 Advanced Micro Devices, Inc. [AMD] Starship/Matisse Cryptographic Coprocessor PSPCPP
| +-00.3 Advanced Micro Devices, Inc. [AMD] Matisse USB 3.0 Host Controller
| \-00.4 Advanced Micro Devices, Inc. [AMD] Starship/Matisse HD Audio Controller
+-14.0 Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller
+-14.3 Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge
+-18.0 Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 0
+-18.1 Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 1
+-18.2 Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 2
+-18.3 Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 3
+-18.4 Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 4
+-18.5 Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 5
+-18.6 Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 6
\-18.7 Advanced Micro Devices, Inc. [AMD] Matisse Device 24: Function 7
I have done the following and had minimal success:
vim /etc/dracut.conf.d/vfio.conf
add_drivers+=" vfio vfio_iommu_type1 vfio_pci vfio_virqfd "
Next, enable local initramfs generation. “In Silverblue, the initramfs is prebuilt and included in the system image”, so enabling this will apparently build the initramfs locally and read the dracut.conf.d configuration we created above:
rpm-ostree initramfs --enable
Install the necessary hypervisor packages, choosing to do this as a layered package instead of something like flatpak:
rpm-ostree install virt-manager qemu-kvm
- Update BIOS (f35b at time of this posting)
- Set some BIOS settings:
- Settings -> IO Ports -> Initial Display Output -> PCIe 1 Slot (W6600)
- Tweaker -> Advanced CPU Settings -> SVM Mode -> Enabled
- Settings -> Miscellaneous -> IOMMU -> Enabled
- Settings -> NBIO Common Options -> IOMMU -> Enabled
Locate the GPU that we want to pass through (6800XT and its USB-C controller):
#!/bin/bash
shopt -s nullglob
for g in `find /sys/kernel/iommu_groups/* -maxdepth 0 -type d | sort -V`; do
echo "IOMMU Group ${g##*/}:"
for d in $g/devices/*; do
echo -e "\t$(lspci -nns ${d##*/})"
done;
done;
IOMMU Group 34:
10:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21 [Radeon RX 6800/6800 XT / 6900 XT] [1002:73bf] (rev c1)
IOMMU Group 35:
10:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21 HDMI Audio [Radeon RX 6800/6800 XT / 6900 XT] [1002:ab28]
IOMMU Group 36:
10:00.2 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD/ATI] Device [1002:73a6]
IOMMU Group 37:
10:00.3 Serial bus controller [0c80]: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21 USB [1002:73a4]
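For passthrough, the question that matters is whether the target device shares its IOMMU group with anything else. A minimal sketch of a narrower lookup than the full listing above, assuming the usual sysfs layout where each PCI device has an `iommu_group` link; `iommu_group_peers` and the `SYSFS_ROOT` override are hypothetical names introduced here for illustration:

```shell
# Hypothetical helper: list every device in the same IOMMU group as one PCI address.
# SYSFS_ROOT is an assumption added so the function can be pointed at a test tree.
iommu_group_peers() (
    addr=$1                                   # full address, e.g. 0000:10:00.0
    root=${SYSFS_ROOT:-/sys}
    group_dir=$root/bus/pci/devices/$addr/iommu_group

    # Without an iommu_group link the IOMMU is off or the address is wrong.
    if [ ! -d "$group_dir" ]; then
        echo "no IOMMU group found for $addr" >&2
        exit 1
    fi

    # Print one PCI address per line for each member of the group.
    for d in "$group_dir"/devices/*; do
        if [ -e "$d" ]; then
            echo "${d##*/}"
        fi
    done
)
```

Running `iommu_group_peers 0000:10:00.0` on a layout like the one listed above should print only the GPU itself, since each function of the 6800XT sits in its own group.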
Adjust our kernel boot parameters:
rpm-ostree kargs --append="amd_iommu=pt rd.driver.pre=vfio-pci vfio-pci.ids=1002:73bf,1002:ab28,1002:73a6,1002:73a4 video=efifb:off"
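The ID list in `vfio-pci.ids=` can be derived from `lspci -nn` output instead of being copied by hand. A small sketch, assuming the one-line `lspci -nn` format shown in the IOMMU listing; `vfio_ids_from_lspci` is a hypothetical helper name, not part of any tool:

```shell
# Hypothetical helper: turn `lspci -nn` lines on stdin into a comma-separated
# vendor:device list suitable for vfio-pci.ids=
vfio_ids_from_lspci() {
    # Keep only bracketed [vvvv:dddd] pairs (class codes like [0300] have no colon),
    # strip the brackets, drop repeats while keeping order, then join with commas.
    grep -oE '\[[0-9a-f]{4}:[0-9a-f]{4}\]' \
        | tr -d '[]' \
        | awk '!seen[$0]++' \
        | paste -sd, -
}

# Example, to be run on the host:
#   lspci -nn -s 10:00 | vfio_ids_from_lspci
```

Matching by ID applies to every device that reports the same vendor:device pair, which is consistent with the W6600's audio function (also 1002:ab28) showing up under vfio-pci in the lspci output.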
Reboot and observe:
sudo lspci -vn
I’ve truncated the output to show only the two GPUs:
- W6600 is 0d:00.0,1
- 6800XT is 10:00.0,1,2,3
0d:00.0 0300: 1002:73e3 (prog-if 00 [VGA controller])
Subsystem: 1002:0e0c
Flags: bus master, fast devsel, latency 0, IRQ 213, IOMMU group 30
Memory at d0000000 (64-bit, prefetchable) [size=256M]
Memory at e0000000 (64-bit, prefetchable) [size=2M]
I/O ports at e000 [size=256]
Memory at fcc00000 (32-bit, non-prefetchable) [size=1M]
Expansion ROM at fcd00000 [disabled] [size=128K]
Capabilities: [48] Vendor Specific Information: Len=08 <?>
Capabilities: [50] Power Management version 3
Capabilities: [64] Express Legacy Endpoint, MSI 00
Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
Capabilities: [150] Advanced Error Reporting
Capabilities: [200] Physical Resizable BAR
Capabilities: [240] Power Budgeting <?>
Capabilities: [270] Secondary PCI Express
Capabilities: [2a0] Access Control Services
Capabilities: [2d0] Process Address Space ID (PASID)
Capabilities: [320] Latency Tolerance Reporting
Capabilities: [410] Physical Layer 16.0 GT/s <?>
Capabilities: [440] Lane Margining at the Receiver <?>
Kernel driver in use: amdgpu
Kernel modules: amdgpu
0d:00.1 0403: 1002:ab28
Subsystem: 1002:ab28
Flags: fast devsel, IRQ 255, IOMMU group 31
Memory at fcd20000 (32-bit, non-prefetchable) [disabled] [size=16K]
Capabilities: [48] Vendor Specific Information: Len=08 <?>
Capabilities: [50] Power Management version 3
Capabilities: [64] Express Legacy Endpoint, MSI 00
Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
Capabilities: [150] Advanced Error Reporting
Capabilities: [2a0] Access Control Services
Kernel driver in use: vfio-pci
Kernel modules: snd_hda_intel
10:00.0 0300: 1002:73bf (rev c1) (prog-if 00 [VGA controller])
Subsystem: 1002:0e3a
Flags: bus master, fast devsel, latency 0, IRQ 255, IOMMU group 34
Memory at b0000000 (64-bit, prefetchable) [size=256M]
Memory at c0000000 (64-bit, prefetchable) [size=2M]
I/O ports at d000 [disabled] [size=256]
Memory at fc600000 (32-bit, non-prefetchable) [size=1M]
Expansion ROM at fc700000 [disabled] [size=128K]
Capabilities: [48] Vendor Specific Information: Len=08 <?>
Capabilities: [50] Power Management version 3
Capabilities: [64] Express Legacy Endpoint, MSI 00
Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
Capabilities: [150] Advanced Error Reporting
Capabilities: [200] Physical Resizable BAR
Capabilities: [240] Power Budgeting <?>
Capabilities: [270] Secondary PCI Express
Capabilities: [2a0] Access Control Services
Capabilities: [2d0] Process Address Space ID (PASID)
Capabilities: [320] Latency Tolerance Reporting
Capabilities: [410] Physical Layer 16.0 GT/s <?>
Capabilities: [440] Lane Margining at the Receiver <?>
Kernel driver in use: vfio-pci
Kernel modules: amdgpu
10:00.1 0403: 1002:ab28
Subsystem: 1002:ab28
Flags: fast devsel, IRQ 255, IOMMU group 35
Memory at fc724000 (32-bit, non-prefetchable) [disabled] [size=16K]
Capabilities: [48] Vendor Specific Information: Len=08 <?>
Capabilities: [50] Power Management version 3
Capabilities: [64] Express Legacy Endpoint, MSI 00
Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
Capabilities: [150] Advanced Error Reporting
Capabilities: [2a0] Access Control Services
Kernel driver in use: vfio-pci
Kernel modules: snd_hda_intel
10:00.2 0c03: 1002:73a6 (prog-if 30 [XHCI])
Subsystem: 1002:73a6
Flags: bus master, fast devsel, latency 0, IRQ 90, IOMMU group 36
Memory at fc500000 (64-bit, non-prefetchable) [size=1M]
Capabilities: [48] Vendor Specific Information: Len=08 <?>
Capabilities: [50] Power Management version 3
Capabilities: [64] Express Endpoint, MSI 00
Capabilities: [a0] MSI: Enable- Count=1/8 Maskable- 64bit+
Capabilities: [c0] MSI-X: Enable+ Count=8 Masked-
Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
Capabilities: [150] Advanced Error Reporting
Capabilities: [2a0] Access Control Services
Kernel driver in use: xhci_hcd
10:00.3 0c80: 1002:73a4
Subsystem: 1002:0408
Flags: bus master, fast devsel, latency 0, IRQ 109, IOMMU group 37
Memory at fc720000 (64-bit, non-prefetchable) [size=16K]
Capabilities: [48] Vendor Specific Information: Len=08 <?>
Capabilities: [50] Power Management version 3
Capabilities: [64] Express Endpoint, MSI 00
Capabilities: [a0] MSI: Enable+ Count=1/2 Maskable- 64bit+
Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
Capabilities: [150] Advanced Error Reporting
Capabilities: [2a0] Access Control Services
Kernel driver in use: i2c-designware-pci
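The key detail in a listing this long is the driver bound to each function. A rough sketch that condenses `lspci -nv` output to one line per device, assuming device headers start at the beginning of a line in `bus:slot.function` form (no PCI domain prefix); `driver_summary` is a hypothetical name:

```shell
# Hypothetical helper: reduce `lspci -nv` text on stdin to "address driver" pairs.
driver_summary() {
    awk '
        # A new device block starts with bus:slot.function at column 0.
        /^[0-9a-f][0-9a-f]:[0-9a-f][0-9a-f]\.[0-7] / {
            if (addr != "") print addr, drv
            addr = $1
            drv = "(none)"
            next
        }
        # The driver name is the last field of this line.
        /Kernel driver in use:/ { drv = $NF }
        # Flush the final device block.
        END { if (addr != "") print addr, drv }
    '
}

# Example, to be run on the host:
#   sudo lspci -nv -s 10:00 | driver_summary
```

Applied to the output above, this would make it easy to spot that 10:00.2 and 10:00.3 are still on xhci_hcd and i2c-designware-pci rather than vfio-pci.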
From here I create a VM with virt-manager, will attempt Windows first:
- Q35 + OVMF
- CPU host-passthrough with manually set topology of socket 1 and cores 16
- virtio qcow2 disk (works fine with proper drivers)
- leaving all the default devices (for a windows 10 detected system)
- kvm hidden state on
- vendor_id state on with random value
- pass in the 6800XT, and the three other devices
<domain type="kvm">
<name>windblows</name>
<uuid>d17d1f9f-361b-40f3-87fe-da5f735bd5e9</uuid>
<metadata>
<libosinfo:libosinfo xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0">
<libosinfo:os id="http://microsoft.com/win/10"/>
</libosinfo:libosinfo>
</metadata>
<memory unit="KiB">33554432</memory>
<currentMemory unit="KiB">33554432</currentMemory>
<vcpu placement="static">16</vcpu>
<os>
<type arch="x86_64" machine="pc-q35-5.2">hvm</type>
<loader readonly="yes" type="pflash">/usr/share/edk2/ovmf/OVMF_CODE.fd</loader>
<nvram>/var/lib/libvirt/qemu/nvram/win10_VARS.fd</nvram>
<bootmenu enable="no"/>
</os>
<features>
<acpi/>
<apic/>
<hyperv>
<relaxed state="on"/>
<vapic state="on"/>
<spinlocks state="on" retries="8191"/>
<vendor_id state="on" value="randomid"/>
</hyperv>
<kvm>
<hidden state="on"/>
</kvm>
<vmport state="off"/>
</features>
<cpu mode="host-passthrough" check="none" migratable="on">
<topology sockets="1" dies="1" cores="16" threads="1"/>
</cpu>
<clock offset="localtime">
<timer name="rtc" tickpolicy="catchup"/>
<timer name="pit" tickpolicy="delay"/>
<timer name="hpet" present="no"/>
<timer name="hypervclock" present="yes"/>
</clock>
<on_poweroff>destroy</on_poweroff>
<on_reboot>restart</on_reboot>
<on_crash>destroy</on_crash>
<pm>
<suspend-to-mem enabled="no"/>
<suspend-to-disk enabled="no"/>
</pm>
<devices>
<emulator>/usr/bin/qemu-system-x86_64</emulator>
<disk type="file" device="disk">
<driver name="qemu" type="qcow2"/>
<source file="/var/home/agd/.kvm/win10.qcow2"/>
<target dev="vda" bus="virtio"/>
<boot order="1"/>
<address type="pci" domain="0x0000" bus="0x04" slot="0x00" function="0x0"/>
</disk>
<disk type="file" device="cdrom">
<driver name="qemu" type="raw"/>
<source file="/var/home/agd/.kvm/Win10_21H1_English_x64.iso"/>
<target dev="sdb" bus="sata"/>
<readonly/>
<address type="drive" controller="0" bus="0" target="0" unit="1"/>
</disk>
<disk type="file" device="cdrom">
<driver name="qemu" type="raw"/>
<source file="/var/home/agd/.kvm/virtio-win-0.1.208.iso"/>
<target dev="sdc" bus="sata"/>
<readonly/>
<address type="drive" controller="0" bus="0" target="0" unit="2"/>
</disk>
<controller type="usb" index="0" model="qemu-xhci" ports="15">
<address type="pci" domain="0x0000" bus="0x02" slot="0x00" function="0x0"/>
</controller>
<controller type="sata" index="0">
<address type="pci" domain="0x0000" bus="0x00" slot="0x1f" function="0x2"/>
</controller>
<controller type="pci" index="0" model="pcie-root"/>
<controller type="pci" index="1" model="pcie-root-port">
<model name="pcie-root-port"/>
<target chassis="1" port="0x10"/>
<address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x0" multifunction="on"/>
</controller>
<controller type="pci" index="2" model="pcie-root-port">
<model name="pcie-root-port"/>
<target chassis="2" port="0x11"/>
<address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x1"/>
</controller>
<controller type="pci" index="3" model="pcie-root-port">
<model name="pcie-root-port"/>
<target chassis="3" port="0x12"/>
<address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x2"/>
</controller>
<controller type="pci" index="4" model="pcie-root-port">
<model name="pcie-root-port"/>
<target chassis="4" port="0x13"/>
<address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x3"/>
</controller>
<controller type="pci" index="5" model="pcie-root-port">
<model name="pcie-root-port"/>
<target chassis="5" port="0x14"/>
<address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x4"/>
</controller>
<controller type="pci" index="6" model="pcie-root-port">
<model name="pcie-root-port"/>
<target chassis="6" port="0x15"/>
<address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x5"/>
</controller>
<controller type="pci" index="7" model="pcie-root-port">
<model name="pcie-root-port"/>
<target chassis="7" port="0x16"/>
<address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x6"/>
</controller>
<controller type="pci" index="8" model="pcie-root-port">
<model name="pcie-root-port"/>
<target chassis="8" port="0x17"/>
<address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x7"/>
</controller>
<controller type="pci" index="9" model="pcie-root-port">
<model name="pcie-root-port"/>
<target chassis="9" port="0x18"/>
<address type="pci" domain="0x0000" bus="0x00" slot="0x03" function="0x0"/>
</controller>
<controller type="virtio-serial" index="0">
<address type="pci" domain="0x0000" bus="0x03" slot="0x00" function="0x0"/>
</controller>
<interface type="network">
<mac address="52:54:00:3c:0e:32"/>
<source network="network"/>
<model type="e1000e"/>
<address type="pci" domain="0x0000" bus="0x01" slot="0x00" function="0x0"/>
</interface>
<serial type="pty">
<target type="isa-serial" port="0">
<model name="isa-serial"/>
</target>
</serial>
<console type="pty">
<target type="serial" port="0"/>
</console>
<channel type="spicevmc">
<target type="virtio" name="com.redhat.spice.0"/>
<address type="virtio-serial" controller="0" bus="0" port="1"/>
</channel>
<input type="tablet" bus="usb">
<address type="usb" bus="0" port="1"/>
</input>
<input type="mouse" bus="ps2"/>
<input type="keyboard" bus="ps2"/>
<graphics type="spice" autoport="yes">
<listen type="address"/>
<image compression="off"/>
</graphics>
<video>
<model type="qxl" ram="65536" vram="65536" vgamem="16384" heads="1" primary="yes"/>
<address type="pci" domain="0x0000" bus="0x00" slot="0x01" function="0x0"/>
</video>
<hostdev mode="subsystem" type="pci" managed="yes">
<source>
<address domain="0x0000" bus="0x10" slot="0x00" function="0x0"/>
</source>
<address type="pci" domain="0x0000" bus="0x06" slot="0x00" function="0x0"/>
</hostdev>
<hostdev mode="subsystem" type="pci" managed="yes">
<source>
<address domain="0x0000" bus="0x10" slot="0x00" function="0x1"/>
</source>
<address type="pci" domain="0x0000" bus="0x07" slot="0x00" function="0x0"/>
</hostdev>
<hostdev mode="subsystem" type="pci" managed="yes">
<source>
<address domain="0x0000" bus="0x10" slot="0x00" function="0x2"/>
</source>
<address type="pci" domain="0x0000" bus="0x08" slot="0x00" function="0x0"/>
</hostdev>
<hostdev mode="subsystem" type="pci" managed="yes">
<source>
<address domain="0x0000" bus="0x10" slot="0x00" function="0x3"/>
</source>
<address type="pci" domain="0x0000" bus="0x09" slot="0x00" function="0x0"/>
</hostdev>
<redirdev bus="usb" type="spicevmc">
<address type="usb" bus="0" port="2"/>
</redirdev>
<redirdev bus="usb" type="spicevmc">
<address type="usb" bus="0" port="3"/>
</redirdev>
<memballoon model="virtio">
<address type="pci" domain="0x0000" bus="0x05" slot="0x00" function="0x0"/>
</memballoon>
</devices>
</domain>
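The four hostdev entries in the XML above follow one pattern and differ only in the source function number. A minimal sketch that prints such an entry for a given host address; `hostdev_xml` is a hypothetical helper, and it leaves out the guest-side `<address>` element on the assumption that libvirt assigns one when it is omitted:

```shell
# Hypothetical helper: print a libvirt <hostdev> element for one host PCI function.
hostdev_xml() (
    addr=$1                 # bus:slot.function in hex, e.g. 10:00.2
    bus=${addr%%:*}         # text before the colon
    rest=${addr#*:}         # slot.function
    slot=${rest%%.*}
    func=${rest#*.}
    printf '<hostdev mode="subsystem" type="pci" managed="yes">\n'
    printf '  <source>\n'
    printf '    <address domain="0x0000" bus="0x%s" slot="0x%s" function="0x%s"/>\n' \
        "$bus" "$slot" "$func"
    printf '  </source>\n'
    printf '</hostdev>\n'
)

# Example: print entries for all four functions of the 6800XT.
#   for f in 0 1 2 3; do hostdev_xml "10:00.$f"; done
```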
Observations and Questions:
- Is it going to cause problems to pass 1002:ab28, since both the W6600 and 6800XT seem to have the same device?
- What power state does the 6800XT idle in? sudo lspci -vv -s 10:00.0 shows Status: D3 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-, which seems like it’s going to be a spicy boy while it’s sitting there doing nothing most of the time.
- VM doesn’t consistently work:
- Seems to only get video on HDMI when I cold boot the entire system. Rebooting the VM gives just a blank screen.
- USB-C on GPU doesn’t passthrough
- Hardware acceleration is good (seems to drive the display, amd drivers install and recognize the 6800XT)
Looking for assistance!
Reporting back: I was able to get everything working consistently. This wasn’t my doing so much as the patience of jonpas, who explained his approach to doing this. The main issue I was running into was not being familiar with how to manipulate the devices once my system was booted. I gained that familiarity by looking at jonpas’ repo and having him graciously look over my shoulder.
My intention is to put some documentation here that will help others now that I have it working, broken into logical sections.
Restate the problem, get it to work #
From above (I’ve turned back on the second onboard NIC so the 6800XT is at 11:00 now), I was booting with the following passed to the kernel:
amd_iommu=pt rd.driver.pre=vfio-pci vfio-pci.ids=1002:73bf,1002:ab28,1002:73a6,1002:73a4 video=efifb:off
Since the X570 allows for choosing the boot GPU PCIe slot, the video=efifb:off argument is actually not necessary. In my case the 6800XT is not “tainted” by the bootloader, so removal is fine.
The four devices that I want to get working are:
- 1002:73bf (GPU)
- 1002:ab28 (HDMI Audio)
- 1002:73a6 (USB)
- 1002:73a4 (USB)
When my system boots, the GPU and HDMI Audio are flagged properly for vfio-pci, but the USB devices are not. Please examine:
sudo lspci -nv -s 11:00
11:00.0 0300: 1002:73bf (rev c1) (prog-if 00 [VGA controller])
Subsystem: 1002:0e3a
Flags: bus master, fast devsel, latency 0, IRQ 255, IOMMU group 36
Memory at b0000000 (64-bit, prefetchable) [size=256M]
Memory at c0000000 (64-bit, prefetchable) [size=2M]
I/O ports at e000 [disabled] [size=256]
Memory at fc600000 (32-bit, non-prefetchable) [size=1M]
Expansion ROM at fc700000 [disabled] [size=128K]
Capabilities: [48] Vendor Specific Information: Len=08 <?>
Capabilities: [50] Power Management version 3
Capabilities: [64] Express Legacy Endpoint, MSI 00
Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
Capabilities: [150] Advanced Error Reporting
Capabilities: [200] Physical Resizable BAR
Capabilities: [240] Power Budgeting <?>
Capabilities: [270] Secondary PCI Express
Capabilities: [2a0] Access Control Services
Capabilities: [2d0] Process Address Space ID (PASID)
Capabilities: [320] Latency Tolerance Reporting
Capabilities: [410] Physical Layer 16.0 GT/s <?>
Capabilities: [440] Lane Margining at the Receiver <?>
Kernel driver in use: vfio-pci
Kernel modules: amdgpu
11:00.1 0403: 1002:ab28
Subsystem: 1002:ab28
Flags: fast devsel, IRQ 255, IOMMU group 37
Memory at fc724000 (32-bit, non-prefetchable) [disabled] [size=16K]
Capabilities: [48] Vendor Specific Information: Len=08 <?>
Capabilities: [50] Power Management version 3
Capabilities: [64] Express Legacy Endpoint, MSI 00
Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
Capabilities: [150] Advanced Error Reporting
Capabilities: [2a0] Access Control Services
Kernel driver in use: vfio-pci
Kernel modules: snd_hda_intel
11:00.2 0c03: 1002:73a6 (prog-if 30 [XHCI])
Subsystem: 1002:73a6
Flags: bus master, fast devsel, latency 0, IRQ 91, IOMMU group 38
Memory at fc500000 (64-bit, non-prefetchable) [size=1M]
Capabilities: [48] Vendor Specific Information: Len=08 <?>
Capabilities: [50] Power Management version 3
Capabilities: [64] Express Endpoint, MSI 00
Capabilities: [a0] MSI: Enable- Count=1/8 Maskable- 64bit+
Capabilities: [c0] MSI-X: Enable+ Count=8 Masked-
Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
Capabilities: [150] Advanced Error Reporting
Capabilities: [2a0] Access Control Services
Kernel driver in use: xhci_hcd
11:00.3 0c80: 1002:73a4
Subsystem: 1002:0408
Flags: bus master, fast devsel, latency 0, IRQ 110, IOMMU group 39
Memory at fc720000 (64-bit, non-prefetchable) [size=16K]
Capabilities: [48] Vendor Specific Information: Len=08 <?>
Capabilities: [50] Power Management version 3
Capabilities: [64] Express Endpoint, MSI 00
Capabilities: [a0] MSI: Enable+ Count=1/2 Maskable- 64bit+
Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
Capabilities: [150] Advanced Error Reporting
Capabilities: [2a0] Access Control Services
Kernel driver in use: i2c-designware-pci
So, when I attempt to use the GPU and USB in a VM I get a variety of failure cases. If I remove the USB from passthrough to the VM I get a working GPU but it doesn’t work consistently (black screen, requires cold boot to reset).
Jonpas had the necessary commands to troubleshoot this. I’d seen other scripts that people have posted on here but didn’t really grok the interaction with the devices, so it wasn’t until it was done step by step that it clicked.
So we will now “rebind” the USB devices to vfio-pci with the following commands:
Become root
sudo su
Unbind the first device
echo 0000:11:00.2 > /sys/bus/pci/devices/0000:11:00.2/driver/unbind
lspci -nv -s 11:00.2
11:00.2 0c03: 1002:73a6 (prog-if 30 [XHCI])
Subsystem: 1002:73a6
Flags: fast devsel, IRQ 91, IOMMU group 38
Memory at fc500000 (64-bit, non-prefetchable) [size=1M]
Capabilities: [48] Vendor Specific Information: Len=08 <?>
Capabilities: [50] Power Management version 3
Capabilities: [64] Express Endpoint, MSI 00
Capabilities: [a0] MSI: Enable- Count=1/8 Maskable- 64bit+
Capabilities: [c0] MSI-X: Enable- Count=8 Masked-
Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
Capabilities: [150] Advanced Error Reporting
Capabilities: [2a0] Access Control Services
We now see that there is no longer a Kernel driver in use: xhci_hcd line. We attempt to bind the first device:
echo 0x1002 0x73a6 > /sys/bus/pci/drivers/vfio-pci/new_id
bash: echo: write error: File exists
This is because at boot time the kernel command line arguments already registered this ID with vfio-pci (hence File exists), but the device itself somehow got claimed by something else at early boot.
So we need to remove_id and then new_id again:
echo 0x1002 0x73a6 > /sys/bus/pci/drivers/vfio-pci/remove_id
echo 0x1002 0x73a6 > /sys/bus/pci/drivers/vfio-pci/new_id
sudo lspci -nv -s 11:00.2
11:00.2 0c03: 1002:73a6 (prog-if 30 [XHCI])
Subsystem: 1002:73a6
Flags: fast devsel, IRQ 91, IOMMU group 38
Memory at fc500000 (64-bit, non-prefetchable) [size=1M]
Capabilities: [48] Vendor Specific Information: Len=08 <?>
Capabilities: [50] Power Management version 3
Capabilities: [64] Express Endpoint, MSI 00
Capabilities: [a0] MSI: Enable- Count=1/8 Maskable- 64bit+
Capabilities: [c0] MSI-X: Enable- Count=8 Masked-
Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
Capabilities: [150] Advanced Error Reporting
Capabilities: [2a0] Access Control Services
Kernel driver in use: vfio-pci
Success! We do this now for the other device:
echo 0000:11:00.3 > /sys/bus/pci/devices/0000:11:00.3/driver/unbind
echo 0x1002 0x73a4 > /sys/bus/pci/drivers/vfio-pci/remove_id
echo 0x1002 0x73a4 > /sys/bus/pci/drivers/vfio-pci/new_id
sudo lspci -nv -s 11:00.3
11:00.3 0c80: 1002:73a4
Subsystem: 1002:0408
Flags: fast devsel, IRQ 109, IOMMU group 39
Memory at fc720000 (64-bit, non-prefetchable) [size=16K]
Capabilities: [48] Vendor Specific Information: Len=08 <?>
Capabilities: [50] Power Management version 3
Capabilities: [64] Express Endpoint, MSI 00
Capabilities: [a0] MSI: Enable- Count=1/2 Maskable- 64bit+
Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
Capabilities: [150] Advanced Error Reporting
Capabilities: [2a0] Access Control Services
Kernel driver in use: vfio-pci
Now passing through 11:00.0, 11:00.1, 11:00.2 and 11:00.3 to the VM works, works consistently, and allows me to use the USB-C port on the 6800XT for keyboard, mouse and headset.
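The unbind, remove_id, new_id sequence can be wrapped in a function so it is repeatable for each device. This is only a sketch of the manual steps shown earlier, not jonpas’ script; `rebind_to_vfio` and the `SYSFS_PCI` override are hypothetical names, and on a real system it has to run as root:

```shell
# Hypothetical helper: move one PCI function from its current driver to vfio-pci.
# SYSFS_PCI is an assumption added so the function can be tried against a test tree.
rebind_to_vfio() (
    addr=$1      # full PCI address, e.g. 0000:11:00.2
    vendor=$2    # vendor ID without 0x, e.g. 1002
    device=$3    # device ID without 0x, e.g. 73a6
    root=${SYSFS_PCI:-/sys/bus/pci}

    # Detach the current driver, if the device has one bound.
    if [ -e "$root/devices/$addr/driver/unbind" ]; then
        echo "$addr" > "$root/devices/$addr/driver/unbind"
    fi

    # Forget the ID first so that new_id does not fail with "File exists".
    # A failure here is ignored: the ID may simply not be registered yet.
    echo "0x$vendor 0x$device" > "$root/drivers/vfio-pci/remove_id" 2>/dev/null || true

    # Register the ID again, which lets vfio-pci claim the now unbound device.
    echo "0x$vendor 0x$device" > "$root/drivers/vfio-pci/new_id"
)

# Example, as root on the host:
#   rebind_to_vfio 0000:11:00.2 1002 73a6
#   rebind_to_vfio 0000:11:00.3 1002 73a4
```

Checking `lspci -nv -s 11:00` afterwards, as done above, confirms whether the rebinding took effect.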
Now, with a working system I can flag other things as vfio-pci, like the other onboard NIC, and pass them through, as well as rip out anything else that is contributing cruft:
- removal of the vendor_id
- removal of the kvm hidden
- removal of all devices I can rip out
- addition of the onboard NIC
<domain type="kvm">
<name>windblows</name>
<uuid>d17d1f9f-361b-40f3-87fe-da5f735bd5e9</uuid>
<metadata>
<libosinfo:libosinfo xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0">
<libosinfo:os id="http://microsoft.com/win/10"/>
</libosinfo:libosinfo>
</metadata>
<memory unit="KiB">33554432</memory>
<currentMemory unit="KiB">33554432</currentMemory>
<vcpu placement="static">16</vcpu>
<os>
<type arch="x86_64" machine="pc-q35-5.2">hvm</type>
<loader readonly="yes" type="pflash">/usr/share/edk2/ovmf/OVMF_CODE.fd</loader>
<nvram>/var/lib/libvirt/qemu/nvram/windblows_VARS.fd</nvram>
<bootmenu enable="no"/>
</os>
<features>
<acpi/>
<apic/>
<hyperv>
<relaxed state="on"/>
<vapic state="on"/>
<spinlocks state="on" retries="8191"/>
</hyperv>
<vmport state="off"/>
</features>
<cpu mode="host-passthrough" check="none" migratable="on">
<topology sockets="1" dies="1" cores="16" threads="1"/>
</cpu>
<clock offset="localtime">
<timer name="rtc" tickpolicy="catchup"/>
<timer name="pit" tickpolicy="delay"/>
<timer name="hpet" present="no"/>
<timer name="hypervclock" present="yes"/>
</clock>
<on_poweroff>destroy</on_poweroff>
<on_reboot>restart</on_reboot>
<on_crash>destroy</on_crash>
<pm>
<suspend-to-mem enabled="no"/>
<suspend-to-disk enabled="no"/>
</pm>
<devices>
<emulator>/usr/bin/qemu-system-x86_64</emulator>
<disk type="file" device="disk">
<driver name="qemu" type="qcow2"/>
<source file="/var/home/agd/.kvm/win10.qcow2"/>
<target dev="vda" bus="virtio"/>
<boot order="1"/>
<address type="pci" domain="0x0000" bus="0x04" slot="0x00" function="0x0"/>
</disk>
<controller type="usb" index="0" model="qemu-xhci" ports="15">
<address type="pci" domain="0x0000" bus="0x02" slot="0x00" function="0x0"/>
</controller>
<controller type="pci" index="0" model="pcie-root"/>
<controller type="pci" index="1" model="pcie-root-port">
<model name="pcie-root-port"/>
<target chassis="1" port="0x10"/>
<address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x0" multifunction="on"/>
</controller>
<controller type="pci" index="2" model="pcie-root-port">
<model name="pcie-root-port"/>
<target chassis="2" port="0x11"/>
<address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x1"/>
</controller>
<controller type="pci" index="3" model="pcie-root-port">
<model name="pcie-root-port"/>
<target chassis="3" port="0x12"/>
<address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x2"/>
</controller>
<controller type="pci" index="4" model="pcie-root-port">
<model name="pcie-root-port"/>
<target chassis="4" port="0x13"/>
<address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x3"/>
</controller>
<controller type="pci" index="5" model="pcie-root-port">
<model name="pcie-root-port"/>
<target chassis="5" port="0x14"/>
<address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x4"/>
</controller>
<controller type="pci" index="6" model="pcie-root-port">
<model name="pcie-root-port"/>
<target chassis="6" port="0x15"/>
<address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x5"/>
</controller>
<controller type="pci" index="7" model="pcie-root-port">
<model name="pcie-root-port"/>
<target chassis="7" port="0x16"/>
<address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x6"/>
</controller>
<controller type="pci" index="8" model="pcie-root-port">
<model name="pcie-root-port"/>
<target chassis="8" port="0x17"/>
<address type="pci" domain="0x0000" bus="0x00" slot="0x02" function="0x7"/>
</controller>
<controller type="pci" index="9" model="pcie-root-port">
<model name="pcie-root-port"/>
<target chassis="9" port="0x18"/>
<address type="pci" domain="0x0000" bus="0x00" slot="0x03" function="0x0" multifunction="on"/>
</controller>
<controller type="pci" index="10" model="pcie-root-port">
<model name="pcie-root-port"/>
<target chassis="10" port="0x19"/>
<address type="pci" domain="0x0000" bus="0x00" slot="0x03" function="0x1"/>
</controller>
<controller type="virtio-serial" index="0">
<address type="pci" domain="0x0000" bus="0x03" slot="0x00" function="0x0"/>
</controller>
<controller type="sata" index="0">
<address type="pci" domain="0x0000" bus="0x00" slot="0x1f" function="0x2"/>
</controller>
<input type="mouse" bus="ps2"/>
<input type="keyboard" bus="ps2"/>
<hostdev mode="subsystem" type="pci" managed="yes">
<source>
<address domain="0x0000" bus="0x11" slot="0x00" function="0x0"/>
</source>
<address type="pci" domain="0x0000" bus="0x06" slot="0x00" function="0x0"/>
</hostdev>
<hostdev mode="subsystem" type="pci" managed="yes">
<source>
<address domain="0x0000" bus="0x11" slot="0x00" function="0x1"/>
</source>
<address type="pci" domain="0x0000" bus="0x07" slot="0x00" function="0x0"/>
</hostdev>
<hostdev mode="subsystem" type="pci" managed="yes">
<source>
<address domain="0x0000" bus="0x11" slot="0x00" function="0x2"/>
</source>
<address type="pci" domain="0x0000" bus="0x08" slot="0x00" function="0x0"/>
</hostdev>
<hostdev mode="subsystem" type="pci" managed="yes">
<source>
<address domain="0x0000" bus="0x11" slot="0x00" function="0x3"/>
</source>
<address type="pci" domain="0x0000" bus="0x09" slot="0x00" function="0x0"/>
</hostdev>
<hostdev mode="subsystem" type="pci" managed="yes">
<source>
<address domain="0x0000" bus="0x07" slot="0x00" function="0x0"/>
</source>
<address type="pci" domain="0x0000" bus="0x0a" slot="0x00" function="0x0"/>
</hostdev>
<memballoon model="virtio">
<address type="pci" domain="0x0000" bus="0x05" slot="0x00" function="0x0"/>
</memballoon>
</devices>
</domain>
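The hostdev entries in the XML above all follow the same pattern, so a small helper can generate one from a host PCI address. This is only a sketch: the function name hostdev_xml is made up for illustration, and it leaves out the guest-side address element on the assumption that libvirt assigns one when it is omitted.

```shell
#!/bin/sh
# Hypothetical helper: print a libvirt <hostdev> element for a host PCI
# address given as domain:bus:slot.function (e.g. 0000:11:00.2).
# The body runs in a subshell so the IFS change does not leak out.
hostdev_xml() (
    IFS=':.'
    # Unquoted on purpose: split the address into domain, bus, slot, function.
    set -- $1
    printf '<hostdev mode="subsystem" type="pci" managed="yes">\n'
    printf '  <source>\n'
    printf '    <address domain="0x%s" bus="0x%s" slot="0x%s" function="0x%s"/>\n' \
        "$1" "$2" "$3" "$4"
    printf '  </source>\n'
    printf '</hostdev>\n'
)
```

Calling hostdev_xml 0000:11:00.2 and redirecting the output to a file gives a fragment that could then be passed to virsh attach-device with --config, using whatever the guest is actually named.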
Why did this happen, can I do any better at boot time? #
There is a guide on VFIO in Fedora over here that has the lines:
pci-stub.ids=USB_ID
and:
The USB controller sometimes attaches to vfio, others to pci-stub for some reason. But it works so… No idea.
Did ekistece stumble upon this race condition and mitigate it by using the older pci-stub driver?
Let us test by changing our boot parameters, which involves a lot of rpm-ostree kargs commands. The resulting kernel arguments:
amd_iommu=pt rd.driver.pre=vfio-pci vfio-pci.ids=1002:73bf,1002:ab28,8086:1539 pci-stub.ids=1002:73a6,1002:73a4
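For reference, a rough sketch of how kernel arguments like those above could be turned into a single rpm-ostree kargs invocation. The helper name build_kargs_cmd is hypothetical, and it only prints the command rather than running it, so the result can be reviewed first. It relies on the --append flag of rpm-ostree kargs; arguments that are already set would more likely need --replace or --delete instead.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) an rpm-ostree kargs command that
# appends every kernel argument passed to this function.
# The body runs in a subshell so its variables stay local.
build_kargs_cmd() (
    cmd="rpm-ostree kargs"
    for arg in "$@"; do
        cmd="$cmd --append=$arg"
    done
    printf '%s\n' "$cmd"
)
```

For example, build_kargs_cmd rd.driver.pre=vfio-pci pci-stub.ids=1002:73a6,1002:73a4 prints one command with two --append flags. The new arguments only take effect after rebooting into the new deployment.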
lspci -nv -s 11:00
11:00.0 0300: 1002:73bf (rev c1) (prog-if 00 [VGA controller])
Subsystem: 1002:0e3a
Flags: bus master, fast devsel, latency 0, IRQ 255, IOMMU group 36
Memory at b0000000 (64-bit, prefetchable) [size=256M]
Memory at c0000000 (64-bit, prefetchable) [size=2M]
I/O ports at e000 [disabled] [size=256]
Memory at fc600000 (32-bit, non-prefetchable) [size=1M]
Expansion ROM at fc700000 [disabled] [size=128K]
Capabilities: [48] Vendor Specific Information: Len=08 <?>
Capabilities: [50] Power Management version 3
Capabilities: [64] Express Legacy Endpoint, MSI 00
Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
Capabilities: [150] Advanced Error Reporting
Capabilities: [200] Physical Resizable BAR
Capabilities: [240] Power Budgeting <?>
Capabilities: [270] Secondary PCI Express
Capabilities: [2a0] Access Control Services
Capabilities: [2d0] Process Address Space ID (PASID)
Capabilities: [320] Latency Tolerance Reporting
Capabilities: [410] Physical Layer 16.0 GT/s <?>
Capabilities: [440] Lane Margining at the Receiver <?>
Kernel driver in use: vfio-pci
Kernel modules: amdgpu
11:00.1 0403: 1002:ab28
Subsystem: 1002:ab28
Flags: fast devsel, IRQ 255, IOMMU group 37
Memory at fc724000 (32-bit, non-prefetchable) [disabled] [size=16K]
Capabilities: [48] Vendor Specific Information: Len=08 <?>
Capabilities: [50] Power Management version 3
Capabilities: [64] Express Legacy Endpoint, MSI 00
Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
Capabilities: [150] Advanced Error Reporting
Capabilities: [2a0] Access Control Services
Kernel driver in use: vfio-pci
Kernel modules: snd_hda_intel
11:00.2 0c03: 1002:73a6 (prog-if 30 [XHCI])
Subsystem: 1002:73a6
Flags: fast devsel, IRQ 26, IOMMU group 38
Memory at fc500000 (64-bit, non-prefetchable) [size=1M]
Capabilities: [48] Vendor Specific Information: Len=08 <?>
Capabilities: [50] Power Management version 3
Capabilities: [64] Express Endpoint, MSI 00
Capabilities: [a0] MSI: Enable- Count=1/8 Maskable- 64bit+
Capabilities: [c0] MSI-X: Enable- Count=8 Masked-
Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
Capabilities: [150] Advanced Error Reporting
Capabilities: [2a0] Access Control Services
Kernel driver in use: pci-stub
11:00.3 0c80: 1002:73a4
Subsystem: 1002:0408
Flags: fast devsel, IRQ 255, IOMMU group 39
Memory at fc720000 (64-bit, non-prefetchable) [disabled] [size=16K]
Capabilities: [48] Vendor Specific Information: Len=08 <?>
Capabilities: [50] Power Management version 3
Capabilities: [64] Express Endpoint, MSI 00
Capabilities: [a0] MSI: Enable- Count=1/2 Maskable- 64bit+
Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
Capabilities: [150] Advanced Error Reporting
Capabilities: [2a0] Access Control Services
Kernel driver in use: pci-stub
We see that those two devices are indeed now bound to pci-stub, which actually makes them work just fine.
So, ekistece was on to something with their use of pci-stub.ids=.
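When only the bound driver matters, the full lspci dump is more than is needed. A minimal sketch that reads the driver symlink for each function of a slot straight from sysfs follows. The function name show_bound_drivers and the SYSFS_ROOT override are assumptions made for illustration (the override exists only so the function can be pointed at a fake tree); on a real host it reads /sys.

```shell
#!/bin/sh
# Print "<pci address> <driver>" for every function of one PCI slot.
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.
show_bound_drivers() (
    slot="$1"                       # e.g. 0000:11:00
    root="${SYSFS_ROOT:-/sys}"
    for dev in "$root"/bus/pci/devices/"$slot".*; do
        [ -e "$dev" ] || continue   # the glob matched nothing
        if [ -L "$dev/driver" ]; then
            # The driver entry is a symlink whose last component is the name.
            drv="$(basename "$(readlink "$dev/driver")")"
        else
            drv="(none)"
        fi
        printf '%s %s\n' "$(basename "$dev")" "$drv"
    done
)
```

On the system above, show_bound_drivers 0000:11:00 should report vfio-pci for functions 0 and 1 and pci-stub for functions 2 and 3, matching the lspci output.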
What about Power? What about Flexibility? #
When I’d posted earlier I’d asked about power state. For my workloads I plan to not use the 6800XT very often, so having it in the lowest possible power state while this system is online is ideal. It appears that it is in D3 when the vfio-pci driver is being used, but maybe the amdgpu driver would be better?
I’ve not had a chance to measure the power draw at the wall yet.
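Short of measuring at the wall, the power state the kernel reports can be read from sysfs. The sketch below assumes a kernel recent enough to expose the power_state attribute for PCI devices, and falls back to a placeholder when an attribute is missing. The function name pci_power_info and the SYSFS_ROOT override are hypothetical.

```shell
#!/bin/sh
# Print the reported PCI power state (D0, D3hot, ...) and the runtime PM
# status of one device. SYSFS_ROOT is a hypothetical override for testing.
pci_power_info() (
    dev="$1"                        # e.g. 0000:11:00.0
    root="${SYSFS_ROOT:-/sys}"
    base="$root/bus/pci/devices/$dev"
    for attr in power_state power/runtime_status; do
        if [ -r "$base/$attr" ]; then
            printf '%s: %s\n' "$attr" "$(cat "$base/$attr")"
        else
            printf '%s: (not available)\n' "$attr"
        fi
    done
)
```

Running pci_power_info 0000:11:00.0 once with vfio-pci bound and once with amdgpu bound would give a first comparison, although the reported D-state is not a substitute for an actual power measurement.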
Originally I scoffed at people who were dynamically “rebind”-ing their devices, thinking that I’d be better off by forcing the devices into form at boot time… but it might actually be an overall better approach to write a script that rebinds your devices on demand:
- would allow the devices to be managed by the appropriate in-kernel driver, potentially allowing for better power state management
- would allow those devices to be used by the host (e.g. I could run a workload on the 6800XT from the host OS if the W6600 fell short of what I needed).
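A minimal sketch of what such an on-demand rebind could look like, using the sysfs driver_override, unbind and drivers_probe files. This is not something tested on this system: the function name rebind_pci and the SYSFS_ROOT override are hypothetical, it needs root, the target driver module has to be loaded already, and unbinding a GPU that is in use can hang the host.

```shell
#!/bin/sh
# Rebind one PCI function to a named driver via sysfs (run as root).
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.
rebind_pci() (
    dev="$1"                        # e.g. 0000:11:00.0
    drv="$2"                        # e.g. vfio-pci or amdgpu
    root="${SYSFS_ROOT:-/sys}"
    base="$root/bus/pci/devices/$dev"

    # Restrict which driver is allowed to claim this device.
    printf '%s\n' "$drv" > "$base/driver_override" || exit 1

    # Detach the device from its current driver, if it has one.
    if [ -e "$base/driver/unbind" ]; then
        printf '%s\n' "$dev" > "$base/driver/unbind" || exit 1
    fi

    # Ask the PCI core to probe the device again with the override in place.
    printf '%s\n' "$dev" > "$root/bus/pci/drivers_probe"
)
```

For example, rebind_pci 0000:11:00.0 amdgpu would hand the GPU to the host driver and rebind_pci 0000:11:00.0 vfio-pci would take it back. Since the hostdev entries use managed="yes", libvirt should already detach the device to vfio-pci when the guest starts, so a script like this would mainly matter for the direction back to the host driver.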
I appreciate this community; this was my first foray into registering and posting. Thanks a ton to those who looked at this, and hopefully my write-up can show others some of the troubleshooting steps I wasn’t able to take until jonpas showed me.