Use of compiler builtin ffs/ctz functions will result in optimized
instruction sequences when possible, and fall back to calling a function
provided by the compiler run-time library. We have slowly shifted our
platforms to take advantage of these builtins in 60645781d6 (arm64),
1c76d3a9fb (arm), 9e319462a0 (powerpc, partial).
Some platforms still rely on the libkern implementations of these
functions provided by libkern, namely riscv, powerpc (ffs*, flsll), and
i386 (ffsll and flsll). These routines are slow, as they perform a
linear search for the bit in question. Even on platforms lacking
dedicated bit-search instructions, such as riscv, the compiler library
will provide better-optimized routines, e.g. by using binary search.
Consolidate all definitions of these functions (whether currently using
builtins or not) to libkern.h. This should result in equivalent or
better performing routines in all cases.
One wart in all of this is the existing HAVE_INLINE_F*** macros, which
we use in a few places to conditionally avoid the slow libkern routines.
These aren't easily removed in one commit. For now, provide these
defines unconditionally, but marked for removal after subsequent
cleanup.
Removal of the now unused libkern routines will follow in the next
commit.
Reviewed by: dougm, imp (previous version)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D40698
Since pmap_ps_enabled() is true by default, check it inside of
pmap_promote_pde() instead of at every call site.
Modify pmap_promote_pde() to return true if the promotion succeeded and
false otherwise. Use this return value in a couple places.
Reviewed by: kib, markj
Differential Revision: https://reviews.freebsd.org/D40744
Stop requiring all of the PTEs to have the accessed bit set for superpage
promotion to occur. Given that change, add support for promotion to
pmap_enter_quick(), which does not set the accessed bit in the PTE that
it creates.
Since the final mapping within a superpage-aligned and sized region of a
memory-mapped file is typically created by a call to pmap_enter_quick(),
we now achieve promotions in circumstances where they did not occur
before, for example, the X server's read-only mapping of libLLVM-15.so.
See also https://www.usenix.org/system/files/atc20-zhu-weixi_0.pdf
Reviewed by: kib, markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D40478
Due to fxsave area is os independent reimplement fxsave handmade code
using copying of a whole area.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D40443
MFC after: 2 weeks
Now that <sys/tslog.h> is wrapped in #ifdef _KERNEL, it's safe to have
tslog annotations in files which might be built from userland (i.e. in
subr_boot.c, which is built as part of the boot loader).
This reverts commit 59588a546f.
The change to subr_boot.c broke the libsa build because the TSLOG
macros have their own definitions for the boot loader -- I didn't
realize that the loader code used subr_boot.c.
I'm currently testing a fix and I'll revert this revert once I'm
satisfied that everything works, but I don't want to leave the
tree broken for too long.
This reverts commit 469cfa3c30.
Booting an amd64 kernel on Firecracker with 1 CPU and 128 MB of RAM,
SYSINIT cpu takes roughly 2770 us:
* 2280 us in vm_ksubmap_init
* 535 us in kmem_malloc
* 450 us in pmap_zero_page
* 1720 us in pmap_growkernel
* 1620 us in pmap_zero_page
* 80 us in bufinit
* 480 us in cpu_setregs
* 430 us in cpu_setregs calling load_cr0
Much of this is hypervisor overhead: load_cr0 is slow because it traps
to the hypervisor, and 99% of the time in pmap_zero_page is spent when
we first touch the page, presumably due to the host Linux kernel
faulting in backing pages one by one.
Sponsored by: https://www.patreon.com/cperciva
Differential Revision: https://reviews.freebsd.org/D40327
Booting an amd64 kernel on Firecracker with 1 CPU and 128 MB of RAM,
pmap_zero_page is responsible for 4.6 ms of the 25.0 ms of boot time.
This is not in fact time spent zeroing pages though; almost all of
that time is spent in a first-touch penalty, presumably due to the
host Linux kernel faulting in backing pages one by one.
There's probably a way to improve that by teaching Firecracker to
fault in all the VM's pages from the start rather than having them
faulted in one at a time, but that's outside of FreeBSD's control.
This commit adds a TSLOG_PAGEZERO option which enables TSLOG on the
amd64 pmap_zero_page function; it's a separate option (turned off
by default even if TSLOG is enabled) since zeroing pages happens
enough that it can easily fill the TSLOG buffer and prevent other
timing information from being recorded.
Sponsored by: https://www.patreon.com/cperciva
Differential Revision: https://reviews.freebsd.org/D40326
Booting an amd64 kernel on Firecracker with 1 CPU and 128 MB of RAM,
hammer_time takes roughly 2740 us:
* 55 us in xen_pvh_parse_preload_data
* 20 us in boot_parse_cmdline_delim
* 20 us in boot_env_to_howto
* 15 us in identify_hypervisor
* 1320 us in link_elf_reloc
* 1310 us in relocate_file1 handling ef->rela
* 25 us in init_param1
* 30 us in dpcpu_init
* 355 us in initializecpu
* 255 us in initializecpu calling load_cr4
* 425 us in getmemsize
* 280 us in pmap_bootstrap
* 205 us in create_pagetables
* 10 us in init_param2
* 25 us in pci_early_quirks
* 60 us in cninit
* 90 us in kdb_init
* 105 us in msgbufinit
* 20 us in fpuinit
* 205 us elsewhere in hammer_time
Some of these are unavoidable (e.g. identify_hypervisor uses CPUID and
load_cr4 loads the CR4 register, both of which trap to the hypervisor)
but others may deserve attention.
Sponsored by: https://www.patreon.com/cperciva
Differential Revision: https://reviews.freebsd.org/D40325
The sysentvec sv_imgact_try was used by kern_exec() to allow
non-native ABI to fixup shell path according to ABI root directory.
Since the non-native ABI can now specify its root directory directly
to namei() via pwd_altroot() call this facility is not needed anymore.
Differential Revision: https://reviews.freebsd.org/D40092
MFC after: 2 month
The Barndinfo emul_path was used by the Elf image activator to fixup
interpreter file name according to ABI root directory. Since the
non-native ABI can now specify its root directory directly to namei()
via pwd_altroot() call this facility is not needed anymore.
Differential Revision: https://reviews.freebsd.org/D40091
MFC after: 2 month
Get rid of using register numbers which is undefined in libunwind
on x86_64.
Differential Revision: https://reviews.freebsd.org/D40156
MFC after: 1 month
Perhaps, this does not makes much sense as destroyng %rcx declared by
the x86_64 Linux syscall ABI. However,:
a) if we get a signal while we are in the kernel, we should restore
tf_rcx when preparing machine context for signal handlers.
b) the Linux world is strange, someone can depend on %rcx value
after syscall, something like go.
Differential Revision: https://reviews.freebsd.org/D40155
MFC after: 1 month
I agree, it would be great to avoid PCB_FULL_IRET, however we should
follow Linux system call ABI.
Reviewed by: emaste
Differential Revision: https://reviews.freebsd.org/D40152
MFC after: 1 month
This avoids bloating the kernel image when MAXCPU is large.
A follow-up patch for kgdb and other kernel debuggers is needed since
the stoppcbs symbol is now a pointer. Bump __FreeBSD_version so that
debuggers can use osreldate to figure out how to handle stoppcbs.
PR: 269572
MFC after: never
Reviewed by: mjg, emaste
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D39806
Commit 0bda8d3e9f ("vmm: permit some IPIs to be handled by userspace")
embedded cpuset_t into the vmm(4) ioctl ABI. This was a mistake since
we otherwise have some leeway to change the cpuset_t for the whole
system, but we want to keep the vmm ioctl ABI stable.
Rework IPI reporting to avoid this problem. Along the way, make VM_RUN
a bit more efficient:
- Split vmexit metadata out of the main VM_RUN structure. This data is
only written by the kernel.
- Have userspace pass a cpuset_t pointer and cpusetsize in the VM_RUN
structure, as is done for cpuset syscalls.
- Have the destination CPU mask for VM_EXITCODE_IPIs live outside the
vmexit info structure, and make VM_RUN copy it out separately. Zero
out any extra bytes in the CPU mask, like cpuset syscalls do.
- Modify the vmexit handler prototype to take a full VM_RUN structure.
PR: 271330
Reviewed by: corvink, jhb (previous versions)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D40113
The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.
Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix
pmap_get_pcid() calls zpcpu_get() which is defined in pcpu.h.
It is unclear why we do not include that header but like right
above the change add another guard around pmap_get_pcid().
This allows some LinuxKPI headers to compile again.
Suggested by: markj
MFC after: 10 days
All UFS options work for ufs.ko.
Reviewed by: emaste, imp
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D39990
Failing to preserve pir_desc can result in pending interrupts being lost
on resume leading to a hung VM.
Reviewed by: corvink, jhb
MFC after: 1 week
Sponsored by: vStack
Differential Revision: https://reviews.freebsd.org/D35447
This patch fixes virtual machine single stepping on VMX hosts.
Currently, when using bhyve's gdb stub, each attempt at single-stepping
a vCPU lands in a timer interrupt. The current single-stepping mechanism
uses the Monitor Trap Flag feature to cause VMEXIT after a single
instruction is executed. Unfortunately, the SDM states that MTF causes
VMEXITs for the next instruction that gets executed, which is often not
what the person using the debugger expects. [1]
This patch adds a new VM capability that masks interrupts on a vCPU by
blocking interrupt injection and modifies the gdb stub to use the newly
added capability while single-stepping a vCPU.
[1] Intel SDM 26.5.2 Vol. 3C
Reviewed by: corvink, jbh
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D39949
This existing helper function is preferable to the hand-rolled
calculation of the kstack bounds.
Make some small style improvements while here. Notably, rename every
instance of "r", the return address, to "ra". Tidy the includes in the
affected files.
Reviewed by: jkoshy
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D39909
This is the MINIMAL config with SMP/NUMA options turned off.
Useful to ensure that UP configuration still builds, until it is removed
finally.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
On amd64 ACPI is required to boot, it cannot work as a module, and we do
not build the ACPI module for long time.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
If vmx or svm is disabled in BIOS or the device isn't supported by vmm,
modinit won't allocate these state save areas. As kmem_free panics when
passing a NULL pointer to it, loading the vmm kernel module causes a
panic too.
PR: 271251
Reviewed by: markj
Fixes: 74ac712f72 ("vmm: Dynamically allocate a couple of per-CPU state save areas")
MFC after: 1 week
Sponsored by: Beckhoff Automation GmbH & Co. KG
Differential Revision: https://reviews.freebsd.org/D39974
Do not preallocate pcpu area backing pages on early startup, only
allocate enough of KVA for pcpu[MAXCPU] and the page for BSP. Other
pages are allocated after we know the number of cpus and their
assignments to the domains.
PCPUs are not accessed until they are initialized, which happens on AP
startup.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D39945