HP DL380 G6
Some notes on the HP DL380 G6 server that I have.
Accessing iLO with old Java
In order to use the virtual console, you need to use HP's old Java applets. I had to use a Windows XP VM running Firefox 52.9.0 ESR with Java 6u45. Alternatively, you can try the instructions at Dell Remote Access Controller#Accessing old iDRAC, which use an old version of Firefox and Java with the security settings disabled.
Booting
After the server POSTs, it takes about a minute for any text to appear. The first thing that loads is the SATA Option ROM for the CD-ROM. At this point, you can press F9 to enter setup.
Broadcom NetXtreme Ethernet Boot Agent
Press Ctrl-S to enter the configuration menu.
Integrated Lights-Out
You have 2 seconds to press F8 to configure the iLO.
HP Smart Array Controller
After initializing, you have 3 seconds to press one of:
- F8 - run option rom utility
- ESC - skip configuration and continue
Booting
After all that, you have 2-3 seconds to press one of the following keys before the boot process continues.
Key | Description
---|---
F9 | BIOS Setup
F10 | System maintenance
F11 | Default boot override options (brings up the boot menu)
F12 | Network boot
Linux and IOMMU
I bought a cheap Nvidia Quadro P400 graphics card to see if it can help transcode video streams faster in one of my VMs. Ideally, I'd like to pass the GPU through to a VM.
There are hurdles when trying to do this on old, unsupported HP server hardware. I'll go over my difficulties here.
Issues
Enabling the IOMMU breaks storage during boot
I tried enabling the IOMMU on Proxmox 7.3 running on this machine by booting with the intel_iommu=on kernel argument, but it caused the SD card that I'm booting off of to stop working. The system comes up to the point where systemd tries to start but can no longer read the disk. It also threw read errors to the Linux console, which made it appear as though the SD card was failing (when it wasn't).
What I got were these messages. I couldn't even reboot the machine with Ctrl-Alt-Delete because systemd couldn't read the shutdown binary.
[    0.271175] ACPI: SPCR: Unexpected SPCR Access Width. Defaulting to byte size
[    0.834416] [Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR 38d is 330)
[    1.143672] ERST: Failed to get Error Log Address Range.
  Found volume group "pve" using metadata type lvm2
  2 logical volume(s) in volume group "pve" now active
[    4.557637] sd 3:0:0:0: [sdb] No Caching mode page found
[    4.557690] sd 3:0:0:0: [sdb] Assuming drive cache: write through
/dev/mapper/pve-root: recovering journal
/dev/mapper/pve-root: Clearing orphaned inode 656721 (uid=0, gid=0, mode=0100600, size=0)
/dev/mapper/pve-root: clean, 952647/1622016 files, 4991886/6488064 blocks
[   11.058407] ipmi_si 0000:01:04.6: Could not setup I/O space
[   11.771726] DMAR: DRHD: handling fault status reg 2
[   11.771735] DMAR: [DMA Read NO_PASID] Request device [00:1e.0] fault addr 0x2000 [fault reason 0x06] PTE Read access is not set
[   11.771763] NMI: PCI system error (SERR) for reason b1 on CPU 0.
[   11.771768] Dazed and confused, but trying to continue
[   25.637034] DMAR: DRHD: handling fault status reg 102
[   25.637198] DMAR: [DMA Read NO_PASID] Request device [04:00.0] fault addr 0xff6b3000 [fault reason 0x06] PTE Read access is not set
...
[  147.946721] blk_update_request: I/O error, dev sda, sector 35205608 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[  147.947060] blk_update_request: I/O error, dev sda, sector 35205608 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[  147.947534] blk_update_request: I/O error, dev sda, sector 35205632 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[  147.947862] blk_update_request: I/O error, dev sda, sector 35205632 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
[  147.949046] blk_update_request: I/O error, dev sda, sector 34673080 op 0x0:(READ) flags 0x3000 phys_seg 1 prio class 0
[  147.949365] EXT4-fs error (device dm-1): __ext4_find_entry:1663: inode #524399: comm sshd: reading directory lblock 0
[  147.949725] blk_update_request: I/O error, dev sda, sector 17829888 op 0x1:(WRITE) flags 0x3800 phys_seg 1 prio class 0
[  147.950033] Buffer I/O error on dev dm-1, logical block 0, lost sync page write
...
[!!!!!!] Failed to execute shutdown binary.
...
The 'Dazed and confused' message reflected how I felt trying to figure out what was going on at 3 in the morning.
All this was 'fixed' by also appending iommu=pt to the kernel command line.
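For reference, on a GRUB-booted Proxmox install, the arguments go in /etc/default/grub (systemd-boot installs use /etc/kernel/cmdline instead). This is the standard procedure from the Proxmox PCI passthrough documentation, sketched here from memory:

```shell
# 1. Edit /etc/default/grub so the default command line includes both flags,
#    e.g.: GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"
# 2. Regenerate the boot config, then reboot:
update-grub
reboot
```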
The system came up and I could see the card. The card has the following PCI address and IDs, and it's in its own IOMMU group:
root@server:~# lspci -vv | grep -i nvid
10:00.0 VGA compatible controller: NVIDIA Corporation GP107GL [Quadro P400] (rev a1) (prog-if 00 [VGA controller])
Kernel modules: nvidiafb, nouveau
10:00.1 Audio device: NVIDIA Corporation GP107GL High Definition Audio Controller (rev a1)
root@server:~# lspci -n | grep -i 10:00
10:00.0 0300: 10de:1cb3 (rev a1)
10:00.1 0403: 10de:0fb9 (rev a1)
root@server:~# find /sys/kernel/iommu_groups/ -type l | grep '31/dev'
/sys/kernel/iommu_groups/31/devices/0000:10:00.1
/sys/kernel/iommu_groups/31/devices/0000:10:00.0
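To double-check which devices share a group, a loop over sysfs works on any IOMMU-enabled host. This is a generic snippet, not something specific to this server:

```shell
# Print every PCI device together with its IOMMU group number.
for dev in /sys/kernel/iommu_groups/*/devices/*; do
    group=${dev%/devices/*}   # strip the trailing /devices/<addr>
    group=${group##*/}        # keep just the group number
    printf 'group %s: %s\n' "$group" "$(lspci -nns "${dev##*/}")"
done
```

Devices that land in the same group can only be passed through together.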
Of course, it isn't quite this easy because I ran into the next issue.
PCIe passthrough fails due to platform RMRR requirement
When I try to pass through the Nvidia graphics card, starting the VM fails with this message: DMAR: Device is ineligible for IOMMU domain attach due to platform RMRR requirement. Contact your platform vendor.
Full dmesg logs regarding DMAR:
[ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-5.15.83-1-pve root=/dev/mapper/pve-root ro intel_iommu=on iommu=pt
[ 0.010527] ACPI: DMAR 0x00000000B761FE80 000172 (v01 HP ProLiant 00000001 \xd2? 0000162E)
[ 0.010585] ACPI: Reserving DMAR table memory at [mem 0xb761fe80-0xb761fff1]
[ 0.273100] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.15.83-1-pve root=/dev/mapper/pve-root ro intel_iommu=on iommu=pt
[ 0.273168] DMAR: IOMMU enabled
[ 0.683529] DMAR-IR: This system BIOS has enabled interrupt remapping
[ 1.109267] iommu: Default domain type: Passthrough (set via kernel command line)
[ 1.184180] DMAR: Host address width 39
[ 1.184233] DMAR: DRHD base: 0x000000bfffe000 flags: 0x1
[ 1.184241] DMAR: dmar0: reg_base_addr bfffe000 ver 1:0 cap c90780106f0462 ecap f0207e
[ 1.184417] DMAR: RMRR base: 0x000000b77fc000 end: 0x000000b77fdfff
[ 1.184474] DMAR: RMRR base: 0x000000b77f5000 end: 0x000000b77fafff
[ 1.184531] DMAR: RMRR base: 0x000000b763e000 end: 0x000000b763ffff
[ 1.184588] DMAR: ATSR flags: 0x0
[ 1.184665] DMAR: No SATC found
[ 1.184720] DMAR: dmar0: Using Queued invalidation
[ 1.184863] pci 0000:00:00.0: Adding to iommu group 0
...
[ 1.191883] pci 0000:3f:06.3: Adding to iommu group 44
[ 1.192057] DMAR: Intel(R) Virtualization Technology for Directed I/O
[ 56.506358] vfio-pci 0000:10:00.1: DMAR: Device is ineligible for IOMMU domain attach due to platform RMRR requirement. Contact your platform vendor.
[ 92.930663] vfio-pci 0000:10:00.1: DMAR: Device is ineligible for IOMMU domain attach due to platform RMRR requirement. Contact your platform vendor.
[ 302.300476] vfio-pci 0000:10:00.1: DMAR: Device is ineligible for IOMMU domain attach due to platform RMRR requirement. Contact your platform vendor.
[ 313.659322] vfio-pci 0000:10:00.1: DMAR: Device is ineligible for IOMMU domain attach due to platform RMRR requirement. Contact your platform vendor.
[ 327.929358] vfio-pci 0000:10:00.1: DMAR: Device is ineligible for IOMMU domain attach due to platform RMRR requirement. Contact your platform vendor.
[ 528.494808] vfio_iommu_type1: `' invalid for parameter `allow_unsafe_interrupts'
[ 594.778634] vfio-pci 0000:10:00.1: DMAR: Device is ineligible for IOMMU domain attach due to platform RMRR requirement. Contact your platform vendor.
[ 679.647684] vfio-pci 0000:10:00.1: DMAR: Device is ineligible for IOMMU domain attach due to platform RMRR requirement. Contact your platform vendor.
[ 773.746381] vfio-pci 0000:10:00.1: DMAR: Device is ineligible for IOMMU domain attach due to platform RMRR requirement. Contact your platform vendor.
The issue is apparently prevalent on HP ProLiants because the PCI devices' memory space is marked as RMRR (Reserved Memory Region Reporting). RMRR memory shouldn't be mapped into a VM by the IOMMU, and Linux enforces that restriction, which is exactly what we're seeing here. Perhaps newer servers won't have this mapping, but since this is a decade-old piece of hardware, there won't be any HP BIOS updates to address it.
One possible solution seems to be to just patch out that restriction in the kernel and be extra careful not to pass through any PCI devices that are used by the underlying server platform (such as network cards or RAID controllers). Here are some links and resources from searching around:
- Reddit user who originally patched out the restriction: https://old.reddit.com/r/homelab/comments/iw5cew/proxmox_i_created_a_script_to_fix_gpu_pass/
- A subsequent project that streamlines patching the kernel: https://github.com/Aterfax/relax-intel-rmrr
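With a patched kernel in place, the card still needs to be bound to vfio-pci early in boot. Using the vendor:device IDs from the lspci -n output above, a modprobe.d drop-in does it (the filename here is my own choice, not mandated):

```shell
# Bind both functions of the Quadro P400 (GPU + HDMI audio) to vfio-pci
# at boot. IDs taken from the `lspci -n` output earlier on this page.
echo "options vfio-pci ids=10de:1cb3,10de:0fb9" > /etc/modprobe.d/vfio.conf
update-initramfs -u    # rebuild the initramfs so the option applies early
```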
Allow unsafe interrupts
With the newly patched kernel, I get this when I try starting a VM:
vfio_iommu_type1_attach_group: No interrupt remapping support. Use the module param "allow_unsafe_interrupts" to enable VFIO IOMMU support on this platform
Reload the module with that option:
# rmmod vfio_iommu_type1
# modprobe vfio_iommu_type1 allow_unsafe_interrupts=1
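The rmmod/modprobe above only lasts until the next reboot; to make the option permanent, it can go in a modprobe.d file (the filename is arbitrary):

```shell
# Persist the unsafe-interrupts override across reboots.
echo "options vfio_iommu_type1 allow_unsafe_interrupts=1" \
    > /etc/modprobe.d/iommu_unsafe_interrupts.conf
```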
With all that done, I can _finally_ pass my graphics card through to a virtual machine.