Hyper-threading

From Leo's Notes
Last edited on 30 December 2021, at 21:25.

Hyper-threading (HT) allows multiple threads to run on the same core. A processor with HT appears to the operating system as two logical processors per core. Each logical processor has its own independent architectural state (registers, etc.) but shares the core's execution resources (ALU, FPU, etc.). This makes better use of the processor by keeping the execution units busy even when one logical processor's pipeline is stalled.

HT's benefits depend heavily on your workload. Generally, HT results in higher power usage, and sharing a core between threads may expose the system to Spectre-like side-channel attacks.

Disabling Hyper-threading

HT is best disabled in the BIOS, but it can also be disabled from the operating system while the system is running.
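Many recent Linux kernels also expose a single switch for this under /sys/devices/system/cpu/smt. The following is a minimal sketch of reading and changing it, assuming that directory exists on your kernel. The function names and the SMT_DIR override are made up for illustration; SMT_DIR is only there so the functions can be tried against a scratch directory. Where the directory is missing, the per-CPU approach described next still applies.

```shell
# Sketch only: smt_status, smt_off and SMT_DIR are hypothetical names.
# Assumes the kernel provides /sys/devices/system/cpu/smt/{control,active}.

# Print the current SMT control setting and whether SMT is active.
smt_status() {
	local dir="${SMT_DIR:-/sys/devices/system/cpu/smt}"
	if [ -r "$dir/control" ]; then
		echo "control=$(cat "$dir/control") active=$(cat "$dir/active")"
	else
		echo "smt interface not available" >&2
		return 1
	fi
}

# Ask the kernel to turn SMT off. Needs root on a real system, and the
# write is expected to fail when control reads "forceoff" or "notsupported".
smt_off() {
	local dir="${SMT_DIR:-/sys/devices/system/cpu/smt}"
	echo off > "$dir/control"
}
```

Run smt_status first; if it reports control=on, smt_off (as root) should take the sibling threads offline in one step.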

On Linux, you will need to disable the second logical CPU of each core. You can find a CPU's siblings by reading /sys/devices/system/cpu/cpu#/topology/thread_siblings_list, where # is the logical CPU number.

# cat /sys/devices/system/cpu/cpu*/topology/thread_siblings_list  | sort | uniq
0,12
...
9,21

This output shows that logical processor 12 is a sibling of logical processor 0, and so on. To disable hyper-threading, we need to take the second sibling of each pair offline. This can be done as root with the following bash function.

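function disable_hyperthreading (){
	# Collect every CPU listed after the first sibling of each core.
	# Assumes comma-separated lists such as "0,12"; range-style lists
	# such as "0-1" contain no comma and are skipped by cut -s.
	for cpunum in $(cat /sys/devices/system/cpu/cpu*/topology/thread_siblings_list | cut -s -d, -f2- | tr ',' '\n' | sort -un); do
		# Writing 0 to the online file takes the CPU offline (requires root).
		echo 0 > "/sys/devices/system/cpu/cpu$cpunum/online" || return 1
	done
	echo "Disabled"
}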
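The function above only understands comma-separated sibling lists. On some systems the file contains a range such as 0-1 instead, and those lines would be silently ignored. Below is a rough sketch of a more tolerant way to list the secondary siblings. The names expand_cpulist, secondary_siblings and CPU_DIR are hypothetical, and the sketch assumes the first CPU in each list is the one to keep online.

```shell
# Sketch only: all names here are made up for illustration.

# Expand a kernel CPU list ("0,12" or "0-1") into one CPU number per line.
expand_cpulist() {
	local part
	for part in $(echo "$1" | tr ',' ' '); do
		case "$part" in
			*-*) seq "${part%-*}" "${part#*-}" ;;  # range: first-last
			*)   echo "$part" ;;                   # single CPU number
		esac
	done
}

# Print every logical CPU that is not the first thread of its core.
# CPU_DIR is a hypothetical override for trying this on a scratch directory.
secondary_siblings() {
	local cpu_dir="${CPU_DIR:-/sys/devices/system/cpu}" f
	for f in "$cpu_dir"/cpu[0-9]*/topology/thread_siblings_list; do
		[ -r "$f" ] || continue
		# Drop the first entry of each list and keep the rest.
		expand_cpulist "$(cat "$f")" | tail -n +2
	done | sort -un
}
```

The output of secondary_siblings could then be fed to the same echo 0 > .../online loop used above.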

Kernel panics

On some machines, running the function above resulted in a kernel panic.

[  246.788441] smpboot: CPU 12 is now offline
[  246.856168] bad: scheduling from the idle thread!
[  246.925566] CPU: 13 PID: 90 Comm: migration/13 Tainted: G          I      --------- -  - 4.18.0-305.25.1.el8_4.x86_64 #1
[  247.079120] Hardware name: HP ProLiant SL390s G7/, BIOS P69 07/02/2013
[  247.174887] Call Trace:
[  247.214637]  dump_stack+0x5c/0x80
[  247.264712]  dequeue_task_idle+0x28/0x40
[  247.322505]  move_queued_task+0x7e/0x180
[  247.391743]  __balance_push_cpu_stop+0x12d/0x160
[  247.494764]  ? __migrate_task+0x80/0x80
[  247.565690]  cpu_stopper_thread+0x47/0x100
[  247.625381]  ? sort_range+0x20/0x20
[  247.688473]  smpboot_thread_fn+0xc5/0x160
[  247.749272]  kthread+0x116/0x130
[  247.804020]  ? kthread_flush_work_fn+0x10/0x10
[  247.870994]  ret_from_fork+0x35/0x40
[  247.929115] BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
[  248.061184] PGD 0 P4D 0
[  248.100897] Oops: 0010 [#1] SMP PTI
[  248.156790] CPU: 13 PID: 90 Comm: migration/13 Tainted: G          I      --------- -  - 4.18.0-305.25.1.el8_4.x86_64 #1
[  248.331365] Hardware name: HP ProLiant SL390s G7/, BIOS P69 07/02/2013
[  248.438339] RIP: 0010:0x0
[  248.476408] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
[  248.607016] RSP: 0018:ffffc0140352fe38 EFLAGS: 00010002
[  248.685718] RAX: 0000000000000000 RBX: ffff9ef2cbba9f40 RCX: 000000000000027f
[  248.798444] RDX: 0000000000000000 RSI: ffff9eefde544380 RDI: ffff9ef2cbba9f40
[  248.899755] RBP: ffff9eefde544380 R08: 0000000000003c00 R09: 0000000000004400
[  249.009310] R10: 000000d851c27df7 R11: 000000000000000c R12: 000000000000000d
[  249.112479] R13: ffffc0140352fe68 R14: ffff9ef2cbba9f40 R15: ffff9eefde544380
[  249.217315] FS:  0000000000000000(0000) GS:ffff9ef2cbb80000(0000) knlGS:0000000000000000
[  249.336892] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  249.419837] CR2: ffffffffffffffd6 CR3: 0000000318e10006 CR4: 00000000000206e0
[  249.528394] Call Trace:
[  249.584095]  move_queued_task+0xff/0x180
[  249.671126]  __balance_push_cpu_stop+0x12d/0x160
[  249.752750]  ? __migrate_task+0x80/0x80
[  249.808431]  cpu_stopper_thread+0x47/0x100
[  249.867571]  ? sort_range+0x20/0x20
[  249.919188]  smpboot_thread_fn+0xc5/0x160
[  249.990663]  kthread+0x116/0x130
[  250.036074]  ? kthread_flush_work_fn+0x10/0x10
[  250.100548]  ret_from_fork+0x35/0x40
[  250.152900] Modules linked in: sd_mod t10_pi sg ata_generic crct10dif_pclmul igb ata_piix crc32_pclmul crc32c_intel i2c_algo_bit libata ghash_clmulni_intel
 serio_raw dca sunrpc dm_mirror dm_region_hash dm_log dm_mod
[  250.439144] CR2: 0000000000000000
[  250.492585] ---[ end trace b63f8ec0963b8bec ]---
[  250.558212] RIP: 0010:0x0
[  250.596213] Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
[  250.701656] RSP: 0018:ffffc0140352fe38 EFLAGS: 00010002
[  250.798575] RAX: 0000000000000000 RBX: ffff9ef2cbba9f40 RCX: 000000000000027f
[  250.897111] RDX: 0000000000000000 RSI: ffff9eefde544380 RDI: ffff9ef2cbba9f40
[  251.003043] RBP: ffff9eefde544380 R08: 0000000000003c00 R09: 0000000000004400
[  251.110621] R10: 000000d851c27df7 R11: 000000000000000c R12: 000000000000000d
[  251.223430] R13: ffffc0140352fe68 R14: ffff9ef2cbba9f40 R15: ffff9eefde544380
[  251.327727] FS:  0000000000000000(0000) GS:ffff9ef2cbb80000(0000) knlGS:0000000000000000
[  251.446295] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  251.534243] CR2: ffffffffffffffd6 CR3: 0000000318e10006 CR4: 00000000000206e0
[  251.638376] Kernel panic - not syncing: Fatal exception
[  251.736583] Kernel Offset: 0x1b000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[  251.918678] ---[ end Kernel panic - not syncing: Fatal exception ]---

This might be a kernel bug. I was able to take the same CPUs offline one at a time by hand without a panic, so it may be a timing issue triggered by offlining them in quick succession.
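If timing really is the cause, pausing between each offline operation might avoid the panic. This is untested guesswork: the sketch below simply mimics doing it by hand. The function name offline_cpus_slowly, the CPU_DIR override and the choice of delay are all assumptions, not a confirmed fix.

```shell
# Sketch only: offline_cpus_slowly and CPU_DIR are hypothetical names.
# Usage: offline_cpus_slowly DELAY_SECONDS CPU [CPU...]
offline_cpus_slowly() {
	local delay="$1"
	shift
	local cpu_dir="${CPU_DIR:-/sys/devices/system/cpu}" cpunum
	for cpunum in "$@"; do
		# Take one CPU offline (requires root on a real system).
		echo 0 > "$cpu_dir/cpu$cpunum/online" || return 1
		# Give the kernel time to finish migrating tasks before the next one.
		sleep "$delay"
	done
}
```

For example, offline_cpus_slowly 1 12 13 14 would offline CPUs 12, 13 and 14 with a one-second pause after each.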