mirror of https://git.FreeBSD.org/src.git synced 2024-11-28 08:02:54 +00:00
Commit Graph

9310 Commits

Author SHA1 Message Date
Dmitry Chagin
510f5c88f0 linux(4): Modify write syscall to match Linux
A write syscall wrapper is needed because the Linux family of write
syscalls does not distinguish between in-kernel blocking operations
and always returns EAGAIN, while FreeBSD can return ENOBUFS.

MFC after:		1 month
2023-08-20 10:36:28 +03:00
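
Illustration for 510f5c88f0: a minimal sketch of the kind of error translation the message describes, assuming the wrapper only needs to fold ENOBUFS into EAGAIN. The function name linux_write_error_demo is hypothetical and is not taken from the commit.

```c
#include <errno.h>

/*
 * Hypothetical helper: translate the error from a native write so that
 * a Linux caller only ever sees EAGAIN for a would-block condition.
 */
int
linux_write_error_demo(int error)
{
	/* FreeBSD may report ENOBUFS where Linux reports EAGAIN. */
	if (error == ENOBUFS)
		return (EAGAIN);
	/* All other values, including 0 for success, pass through. */
	return (error);
}
```

A real wrapper would call the native write path first and apply this translation to its result.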
Dmitry Chagin
3460fab5fc linux(4): Remove sys/cdefs.h inclusion where it's not needed due to 685dc743 2023-08-18 13:12:02 +03:00
Mark Johnston
5635d5b61e vmm: Fix VM_GET_CPUS compatibility
bhyve in a 13.x jail fails to boot guests with more than one vCPU
because they pass too small a buffer to VM_GET_CPUS, causing the ioctl
handler to return ERANGE.  Handle this the same way as cpuset system
calls: make sure that the result can fit in the truncated space, and
relax the check on the cpuset buffer.

As a side effect, fix an insufficient bounds check on "size".  The
signed/unsigned comparison with sizeof(cpuset_t) fails to exclude
negative values, so we can end up allocating impossibly large amounts of
memory.

Reviewed by:	jhb
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D41496
2023-08-17 18:10:02 -04:00
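
Illustration for 5635d5b61e: a sketch of how a signed/unsigned comparison can let a negative size through a bounds check. The names and the limits (8 and 128) are hypothetical stand-ins; the actual check in vmm may be written differently.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MIN_BYTES	((size_t)8)	/* hypothetical lower bound, unsigned like sizeof */
#define DEMO_MAX_BYTES	128		/* hypothetical upper bound, plain int */

/*
 * Flawed check: in "size < DEMO_MIN_BYTES" a negative size is converted
 * to a huge unsigned value, so it is not rejected as too small, and in
 * the signed comparison it is not rejected as too large either.
 */
bool
size_ok_flawed(int size)
{
	return (!(size < DEMO_MIN_BYTES || size > DEMO_MAX_BYTES));
}

/* Fixed check: exclude negative values before any unsigned comparison. */
bool
size_ok_fixed(int size)
{
	return (size >= 0 && (size_t)size >= DEMO_MIN_BYTES &&
	    size <= DEMO_MAX_BYTES);
}
```

With the flawed check a size of -1 is accepted, and passing it on to an allocator would request an impossibly large amount of memory.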
Dmitry Chagin
270e01d468 linux(4): Fix leftovers after 2ff63af9 2023-08-17 23:54:00 +03:00
Dmitry Chagin
158b57295f linux(4): Regen for sendfile 2023-08-17 22:57:17 +03:00
Dmitry Chagin
5068387f42 linux(4): Use l_off_t type for offset argument in sendfile syscall
The off_t on Linux is a long, so this is a non-functional change, made
just to avoid confusing future readers.

MFC after:		1 month
2023-08-17 22:57:16 +03:00
Warner Losh
78d146160d sys: Remove $FreeBSD$: one-line bare tag
Remove /^\s*\$FreeBSD\$$\n/
2023-08-16 11:55:17 -06:00
Warner Losh
031beb4e23 sys: Remove $FreeBSD$: one-line sh pattern
Remove /^\s*#[#!]?\s*\$FreeBSD\$.*$\n/
2023-08-16 11:54:58 -06:00
Warner Losh
685dc743dc sys: Remove $FreeBSD$: one-line .c pattern
Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
2023-08-16 11:54:36 -06:00
Warner Losh
71625ec9ad sys: Remove $FreeBSD$: one-line .c comment pattern
Remove /^/[*/]\s*\$FreeBSD\$.*\n/
2023-08-16 11:54:24 -06:00
Warner Losh
2ff63af9b8 sys: Remove $FreeBSD$: one-line .h pattern
Remove /^\s*\*+\s*\$FreeBSD\$.*$\n/
2023-08-16 11:54:18 -06:00
Warner Losh
95ee2897e9 sys: Remove $FreeBSD$: two-line .h pattern
Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/
2023-08-16 11:54:11 -06:00
Ed Maste
a51f81c2e5 x86: move EARLY_AP_STARTUP into DEFAULTS
EARLY_AP_STARTUP was introduced in 2016 (commit fdce57a042) with note:

    As a transition aid, the new behavior is moved under a new
    kernel option (EARLY_AP_STARTUP). This will allow the option
    to be turned off if need be during initial testing. I hope to
    enable this on x86 by default in a followup commit ...

It was enabled by default, but became effectively mandatory (on x86)
some time later.  Move it to DEFAULTS to avoid an unbootable system if
the option is left out of a custom kernel configuration file.

Reported by:	wollman
Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D41352
2023-08-14 16:17:48 -04:00
Marius Strobl
37c8ee8847 ath(4): Remove MIPS AHB frontend and join PCI one w/ main support again
Following the removal of general MIPS support, there's no longer a need
to have the AHB bus frontend in place, which according to Linux sources
also isn't used with any non-MIPS SoCs. For simplicity, PCI bus support
is again made conditional only on the main driver, i.e. device ath_pci
is removed, and is built into the main module, i.e. if_ath_pci.ko is
obsoleted.
Effectively, this reverts the following commits and associated changes:
dba9c85977
e849bb3ecb

Approved by:	adrian
Relnotes:	yes
Differential Revision:	https://reviews.freebsd.org/D41354
2023-08-08 22:30:13 +02:00
Mark Johnston
78cc000cba amd64: Increase sanitizers' static shadow memory reservation
Because KASAN shadows the kernel image itself (KMSAN currently does
not), a shadow mapping of the boot stack must be created very early
during boot.  pmap_san_enter() reserves a fixed number of pages for the
purpose of creating and mapping this shadow region.

After commit 789df254cc ("amd64: Use a larger boot stack"), it could
happen that this reservation is insufficient; this happens when
bootstack crosses a PAGE_SHIFT + KASAN_SHADOW_SCALE_SHIFT boundary.
Update the calculation to take into account the new size of the boot
stack.

Fixes:		789df254cc ("amd64: Use a larger boot stack")
Sponsored by:	The FreeBSD Foundation
2023-08-04 12:38:24 -04:00
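
Illustration for 78cc000cba: a sketch of why a larger boot stack can need more shadow pages. It assumes 4 KiB pages (shift 12), a shadow scale shift of 3, and a page-aligned shadow offset; the function name is hypothetical and the calculation in pmap_san_enter() may differ.

```c
#include <stdint.h>

#define DEMO_PAGE_SHIFT		12	/* assumed: 4 KiB pages */
#define DEMO_SHADOW_SCALE_SHIFT	3	/* assumed: 1 shadow byte per 8 bytes */

/*
 * Number of shadow pages needed to cover [start, start + size), size > 0.
 * One shadow page covers 1 << (PAGE_SHIFT + SCALE_SHIFT) bytes, so a
 * range that crosses such a boundary needs an extra shadow page.
 */
uint64_t
shadow_pages_needed(uint64_t start, uint64_t size)
{
	const unsigned shift = DEMO_PAGE_SHIFT + DEMO_SHADOW_SCALE_SHIFT;
	uint64_t first = start >> shift;
	uint64_t last = (start + size - 1) >> shift;

	return (last - first + 1);
}
```

Under these assumptions a 16 KiB stack at 0x8000 fits in one shadow page, while the same stack at 0x7000 crosses a boundary and needs two.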
Dmitry Chagin
b5c0b9555d linux(4): Regen for ioprio syscalls
MFC after:		1 month
2023-08-04 16:03:57 +03:00
Dmitry Chagin
1c83154e49 linux(4): Modify ioprio syscalls to match Linux
MFC after:		1 month
2023-08-04 16:03:55 +03:00
Ed Maste
9051987e40 amd64: Bump MAXCPU to 1024 (from 256)
Hardware with more than 256 CPU cores is currently available and will
become increasingly common over FreeBSD 14's lifetime.  Increase MAXCPU
in the amd64 GENERIC kernel configuration to 1024.

Earlier commits increased some related limits.  These prerequisite
commits include at least:

- d7ed40243769 Increase MAX_APIC_ID safeguard to 0x800
- d1639e43c5 cpuset: increase userland maximum size to 1024

Global and allocated arrays sized by MAXCPU result in excessive bloat
on systems with lower core counts.  In addition, some code used u_char
(8 bits) to hold a CPU index, which is not valid if MAXCPU is greater
than 256.

A number of recent commits addressed these sorts of issues, including
at least:

- 133935d26f pf: atomically increment state ids
- 74ac712f72 vmm: Dynamically allocate a couple of per-CPU state save areas
- 78cfa762eb callout: Move per-CPU callout state into the dpcpu region
- 42f722e721 amd64: store pcids pmap data in pcpu zone
- 9801e7c275 smp_topo: dynamically allocate group array
- 9fb6718d1b smp: Dynamically allocate the stoppcbs array
- 2bb16c6352 x86: retire use of intr_bind

There are some additional allocations still to be converted and
more scalability work is required to make effective use of very high
core count systems, but this change allows us to boot on these systems
and provides a Kernel Binary Interface (KBI) for the FreeBSD 14 release
that supports these configurations.

Special thanks to AMD for providing hardware to test these changes.

PR:		269572
Reviewed by:	des
Relnotes:	Yes
Sponsored by:	The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D36838
2023-08-03 17:41:26 -04:00
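
Illustration for 9051987e40: a sketch of why an 8-bit CPU index stops working once MAXCPU exceeds 256. The function name is hypothetical; uint8_t stands in for u_char.

```c
#include <stdint.h>

/*
 * Store a CPU index in 8 bits and read it back.  Indices of 256 and
 * above lose their high bits, so they alias lower-numbered CPUs.
 */
unsigned
cpu_index_roundtrip_u8(unsigned cpu)
{
	uint8_t stored = (uint8_t)cpu;

	return (stored);
}
```

CPU 255 survives the round trip, but CPU 256 comes back as 0 and CPU 1023 comes back as 255.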
Gordon Bergling
29eab3e4e0 linux(4): Fix two typos in source code comments
- s/decriptors/descriptors/

MFC after:	3 days
2023-08-02 11:55:30 +02:00
Mark Johnston
5ad29bc8d4 amd64: Fix TLB invalidation routines in !SMP kernels
amd64 is special in that its implementation of zpcpu_offset_cpu() is not
the identity transformation, even in !SMP kernels.  Because the pm_pcidp
array of amd64's struct pmap is allocated from a pcpu UMA zone, this
means that accessing pm_pcidp directly, as is done in !SMP
implementations of pmap_invalidate_*, does not work.  Specifically, I
see occasional inexplicable crashes in userspace when PCIDs are enabled.

Apply a minimal patch to fix the problem.  While it would also make
sense to provide separate implementations of zpcpu_* for !SMP kernels,
fixing it this way makes the SMP and !SMP implementations of
pmap_invalidate_* more similar.

Reviewed by:	alc, kib
MFC after:	1 week
Sponsored by:	Klara, Inc.
Sponsored by:	Juniper Networks, Inc.
Differential Revision:	https://reviews.freebsd.org/D41230
2023-07-30 11:12:35 -04:00
Alan Cox
3d7c37425e amd64 pmap: Catch up with pctrie changes
Recent changes to the pctrie code make it necessary to initialize the
kernel pmap's rangeset for PKU.
2023-07-28 15:13:13 -05:00
Dmitry Chagin
4281dab8bc linux(4): Add elf_hwcap2 to x86
On x86, Linux exposes the user-controlled (by tunables) processor
capabilities via AT_HWCAP2.

Reviewed by:
Differential Revision:	https://reviews.freebsd.org/D41165
MFC after:		2 weeks
2023-07-28 11:56:59 +03:00
Mark Johnston
640e5cb304 kmsan: Add a comment explaining why KMSAN doesn't shadow above KERNBASE
Sponsored by:	The FreeBSD Foundation
2023-07-27 16:01:58 -04:00
Mark Johnston
789df254cc amd64: Use a larger boot stack
With sanitizers enabled, it becomes possible to overflow the stack when
only a single page is used.  Follow arm64's example and use the default
kernel stack size instead.  This is a bit wasteful, but without a guard
page, overflow merely corrupts adjacent .bss entries and is thus
difficult to debug.

Note, with a GENERIC kernel we already consume over half of the
available boot stack space, see the review for an example.

Reviewed by:	kib
Reported by:	Jenkins
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D41166
2023-07-24 18:49:36 -04:00
Dmitry Chagin
d9c2dc6bf1 linux(4): Regen for xattr syscalls
MFC after:		1 month
2023-07-22 14:03:32 +03:00
Dmitry Chagin
41f2c69ee3 linux(4): Modify xattr syscalls to match Linux
MFC after:		1 month
2023-07-22 14:03:31 +03:00
Kristof Provost
208fcb55e3 Fix MINIMAL build on amd64
amd64/include/counter.h uses KASSERT, but failed to include the
kassert.h header.
2023-07-14 09:18:43 +02:00
Doug Moore
3e04ae433f vm_radix_init: use initializer
Several vm_radix tries are not initialized with vm_radix_init. That
works, for now, since static initialization zeroes the root field
anyway, but if initialization changes, these tries will fail. Add
missing initializer calls.

Reviewed by:	alc, kib, markj
Differential Revision:	https://reviews.freebsd.org/D40971
2023-07-14 01:49:55 -05:00
Yufeng Zhou
294c52d969 amd64 pmap: Fix compilation when superpage reservations are disabled
The function pmap_pde_ept_executable() should not be conditionally
compiled based on VM_NRESERVLEVEL. It is required indirectly by
pmap_enter(..., psind=1) even when reservation-based allocation is
disabled at compile time.

Reviewed by:	alc
MFC after:	1 week
2023-07-12 12:07:42 -05:00
Gleb Smirnoff
0d1ff2b04d vmm: don't leak locks exiting vmmdev_ioctl()
At least an error from vcpu_lock_all() at line 553 would leak the
memseg lock.  There might be other cases as well.

Reviewed by:		corvink, markj
Differential Revision:	https://reviews.freebsd.org/D40981
2023-07-12 09:16:40 -07:00
Gleb Smirnoff
30f0328a32 vmm: don't return random error from vcpu_lock_all() if vcpu is empty
When the vcpu array is empty, the function would return a random value
from the stack.  What I observed was -1.

Reviewed by:		corvink, markj
Differential Revision:	https://reviews.freebsd.org/D40980
2023-07-12 09:16:40 -07:00
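
Illustration for 30f0328a32: a sketch of the pattern, assuming the function loops over the array and returns the last error seen. The name lock_all_demo and its arguments are hypothetical. Without the initializer on "error", a call with an empty array would return whatever happened to be on the stack.

```c
/*
 * Hypothetical stand-in for a "lock everything" loop.  results[i] plays
 * the role of the per-element lock result; the first nonzero one stops
 * the loop and is returned.
 */
int
lock_all_demo(const int *results, int n)
{
	int error = 0;	/* the fix: defined result when n == 0 */
	int i;

	for (i = 0; i < n; i++) {
		error = results[i];
		if (error != 0)
			break;
	}
	return (error);
}
```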
John Baldwin
2596008a0b amd64 pcpu.h: Add missing 'do' from do-while loop around __PCPU_SET.
Reported by:	mjg
Diagnosed by:	jrtc27
2023-07-08 12:59:26 -07:00
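
Illustration for 2596008a0b: a sketch of the do-while(0) macro idiom that the missing 'do' belongs to. DEMO_SET_BOTH and demo_set are hypothetical and unrelated to the real __PCPU_SET body. Without the 'do', the macro expands to a block followed by a separate empty "while (0);" statement, which still compiles in many places but breaks when used between "if" and "else".

```c
/* Multi-statement macro wrapped so that it behaves as one statement. */
#define DEMO_SET_BOTH(a, b, v) do {	\
	(a) = (v);			\
	(b) = (v);			\
} while (0)

int
demo_set(int flag)
{
	int x = 0, y = 0;

	/* The if/else only parses because the macro is a single statement. */
	if (flag)
		DEMO_SET_BOTH(x, y, 7);
	else
		x = -1;
	return (x + y);
}
```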
John Baldwin
2329393c61 amd64: Use __seg_gs to implement per-CPU data accesses.
This makes use of the alternate address space support in both GCC and
clang to access per-CPU data as accesses relative to GS:.  The
original motivation for this is that it quiets verbose warnings from
GCC 12.  However, this version is also much easier to read and
allows the compiler to generate better code (e.g. the compiler can
use a GS: memory operand directly in other instructions such as IMUL
and CMP rather than always MOVing to a temporary register).

The one caveat is that the current approach is very inefficient at -O0
since the compiler expects to load the 0 base offset from a global
variable instead of assuming it is 0 (even with the const).

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D40647
2023-07-07 13:06:55 -07:00
Mitchell Horne
a89262079e Consistently provide ffs/fls using builtins
Use of compiler builtin ffs/ctz functions will result in optimized
instruction sequences when possible, and fall back to calling a function
provided by the compiler run-time library. We have slowly shifted our
platforms to take advantage of these builtins in 60645781d6 (arm64),
1c76d3a9fb (arm), 9e319462a0 (powerpc, partial).

Some platforms still rely on the libkern implementations of these
functions, namely riscv, powerpc (ffs*, flsll), and
i386 (ffsll and flsll). These routines are slow, as they perform a
linear search for the bit in question. Even on platforms lacking
dedicated bit-search instructions, such as riscv, the compiler library
will provide better-optimized routines, e.g. by using binary search.

Consolidate all definitions of these functions (whether currently using
builtins or not) to libkern.h. This should result in equivalent or
better performing routines in all cases.

One wart in all of this is the existing HAVE_INLINE_F*** macros, which
we use in a few places to conditionally avoid the slow libkern routines.
These aren't easily removed in one commit. For now, provide these
defines unconditionally, but marked for removal after subsequent
cleanup.

Removal of the now unused libkern routines will follow in the next
commit.

Reviewed by:	dougm, imp (previous version)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D40698
2023-07-06 14:46:41 -03:00
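
Illustration for a89262079e: a sketch contrasting a linear bit search with the compiler builtins. The function names are hypothetical and the linear version is only written in the spirit of the old routines, not copied from libkern. __builtin_ffs and __builtin_clz are the GCC/clang builtins.

```c
/* Linear search: walk up from bit 1 until a set bit is found. */
int
ffs_linear_demo(int mask)
{
	unsigned m = (unsigned)mask;
	int bit;

	if (m == 0)
		return (0);
	for (bit = 1; (m & 1) == 0; bit++)
		m >>= 1;
	return (bit);
}

/* Builtin: the compiler picks an instruction or a library routine. */
int
ffs_builtin_demo(int mask)
{
	return (__builtin_ffs(mask));
}

/* fls via count-leading-zeros; __builtin_clz(0) is undefined, so guard it. */
int
fls_builtin_demo(int mask)
{
	if (mask == 0)
		return (0);
	return (8 * (int)sizeof(mask) - __builtin_clz((unsigned)mask));
}
```

Both ffs variants use 1-based bit numbering and return 0 for an empty mask.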
Mark O'Donovan
b0d3d44dfe qlnxe: add driver to amd64 NOTES
Signed-off-by: Mark O'Donovan <shiftee@posteo.net>
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/779
2023-07-01 11:06:59 -06:00
Alan Cox
0d2f98c2f0 amd64 pmap: Tidy up pmap_promote_pde() calls
Since pmap_ps_enabled() is true by default, check it inside of
pmap_promote_pde() instead of at every call site.

Modify pmap_promote_pde() to return true if the promotion succeeded and
false otherwise.  Use this return value in a couple places.

Reviewed by:	kib, markj
Differential Revision:	https://reviews.freebsd.org/D40744
2023-06-24 13:09:04 -05:00
Alan Cox
34eeabff5a amd64/arm64 pmap: Stop requiring the accessed bit for superpage promotion
Stop requiring all of the PTEs to have the accessed bit set for superpage
promotion to occur.  Given that change, add support for promotion to
pmap_enter_quick(), which does not set the accessed bit in the PTE that
it creates.

Since the final mapping within a superpage-aligned and sized region of a
memory-mapped file is typically created by a call to pmap_enter_quick(),
we now achieve promotions in circumstances where they did not occur
before, for example, the X server's read-only mapping of libLLVM-15.so.

See also https://www.usenix.org/system/files/atc20-zhu-weixi_0.pdf

Reviewed by:	kib, markj
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D40478
2023-06-12 13:40:57 -05:00
Warner Losh
9121945d70 Regenerate sysent stuff after $FreeBSD$ removal
Sponsored by:		Netflix
2023-06-09 07:28:27 -06:00
Dmitry Chagin
cbbac56091 linux(4): Preserve fpu xsave state across signal delivery on amd64
PR:			270247
Reviewed by:		kib
Differential Revision:	https://reviews.freebsd.org/D40444
MFC after:		2 weeks
2023-06-09 01:33:26 +03:00
Dmitry Chagin
920184ed6e linux(4): In preparation for xsave refactor fxsave code on amd64
Because the fxsave area is OS-independent, reimplement the hand-written
fxsave code by copying the whole area.

Reviewed by:		kib
Differential Revision:	https://reviews.freebsd.org/D40443
MFC after:		2 weeks
2023-06-09 01:32:46 +03:00
Dmitry Chagin
84617f6fcc linux(4) rt_sendsig: Remove the use of caddr_t
Replace caddr_t by more appropriate char *.

MFC after:		2 weeks
2023-06-06 23:01:39 +03:00
Colin Percival
9d6ae1e3c2 Revert "Revert "tslog: Annotate some early boot functions""
Now that <sys/tslog.h> is wrapped in #ifdef _KERNEL, it's safe to have
tslog annotations in files which might be built from userland (i.e. in
subr_boot.c, which is built as part of the boot loader).

This reverts commit 59588a546f.
2023-06-04 22:49:38 -07:00
Xin LI
4d779448ad gve: Fix build on i386 and enable LINT builds.
Reviewed-by:	imp
Differential Revision: https://reviews.freebsd.org/D40419
2023-06-04 16:35:00 -07:00
Colin Percival
59588a546f Revert "tslog: Annotate some early boot functions"
The change to subr_boot.c broke the libsa build because the TSLOG
macros have their own definitions for the boot loader -- I didn't
realize that the loader code used subr_boot.c.

I'm currently testing a fix and I'll revert this revert once I'm
satisfied that everything works, but I don't want to leave the
tree broken for too long.

This reverts commit 469cfa3c30.
2023-06-04 11:39:45 -07:00
Colin Percival
45cc8519f5 tslog: Annotate parts of SYSINIT cpu
Booting an amd64 kernel on Firecracker with 1 CPU and 128 MB of RAM,
SYSINIT cpu takes roughly 2770 us:
* 2280 us in vm_ksubmap_init
  * 535 us in kmem_malloc
    * 450 us in pmap_zero_page
  * 1720 us in pmap_growkernel
    * 1620 us in pmap_zero_page
* 80 us in bufinit
* 480 us in cpu_setregs
  * 430 us in cpu_setregs calling load_cr0

Much of this is hypervisor overhead: load_cr0 is slow because it traps
to the hypervisor, and 99% of the time in pmap_zero_page is spent when
we first touch the page, presumably due to the host Linux kernel
faulting in backing pages one by one.

Sponsored by:	https://www.patreon.com/cperciva
Differential Revision:	https://reviews.freebsd.org/D40327
2023-06-04 10:16:35 -07:00
Colin Percival
2404380aac tslog: Optionally instrument pmap_zero_page
Booting an amd64 kernel on Firecracker with 1 CPU and 128 MB of RAM,
pmap_zero_page is responsible for 4.6 ms of the 25.0 ms of boot time.
This is not in fact time spent zeroing pages though; almost all of
that time is spent in a first-touch penalty, presumably due to the
host Linux kernel faulting in backing pages one by one.

There's probably a way to improve that by teaching Firecracker to
fault in all the VM's pages from the start rather than having them
faulted in one at a time, but that's outside of FreeBSD's control.

This commit adds a TSLOG_PAGEZERO option which enables TSLOG on the
amd64 pmap_zero_page function; it's a separate option (turned off
by default even if TSLOG is enabled) since zeroing pages happens
enough that it can easily fill the TSLOG buffer and prevent other
timing information from being recorded.

Sponsored by:	https://www.patreon.com/cperciva
Differential Revision:	https://reviews.freebsd.org/D40326
2023-06-04 10:16:31 -07:00
Colin Percival
469cfa3c30 tslog: Annotate some early boot functions
Booting an amd64 kernel on Firecracker with 1 CPU and 128 MB of RAM,
hammer_time takes roughly 2740 us:
* 55 us in xen_pvh_parse_preload_data
  * 20 us in boot_parse_cmdline_delim
  * 20 us in boot_env_to_howto
* 15 us in identify_hypervisor
* 1320 us in link_elf_reloc
  * 1310 us in relocate_file1 handling ef->rela
* 25 us in init_param1
* 30 us in dpcpu_init
* 355 us in initializecpu
  * 255 us in initializecpu calling load_cr4
* 425 us in getmemsize
  * 280 us in pmap_bootstrap
    * 205 us in create_pagetables
* 10 us in init_param2
* 25 us in pci_early_quirks
* 60 us in cninit
* 90 us in kdb_init
* 105 us in msgbufinit
* 20 us in fpuinit
* 205 us elsewhere in hammer_time

Some of these are unavoidable (e.g. identify_hypervisor uses CPUID and
load_cr4 loads the CR4 register, both of which trap to the hypervisor)
but others may deserve attention.

Sponsored by:	https://www.patreon.com/cperciva
Differential Revision:	https://reviews.freebsd.org/D40325
2023-06-04 10:16:22 -07:00
Mark Johnston
18282c4772 sysarch: Add includes required for ktrcapfail() calls to be compiled
Reported by:	jfree
MFC after:	1 week
2023-06-01 17:18:23 -04:00
Mateusz Guzik
6217c2473d amd64: zero-pad register dumps on panic
de gustibus and so on

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2023-05-30 13:15:56 +00:00
Dmitry Chagin
eb98f77910 linux(4): Regen for linux_execve
MFC after:		2 month
2023-05-29 12:18:30 +03:00
Dmitry Chagin
8340b03425 linux(4): Add a dedicated linux_exec_copyin_args()
Because Linux allows executing binaries with an argc of 0.

Reviewed by:		brooks
Differential Revision:	https://reviews.freebsd.org/D40148
MFC after:		2 month
2023-05-29 12:18:16 +03:00
Dmitry Chagin
d706d02edb sysentvec: Retire sv_imgact_try as unneeded anymore
The sysentvec sv_imgact_try was used by kern_exec() to allow a
non-native ABI to fix up the shell path according to the ABI root
directory.  Since the non-native ABI can now specify its root directory
directly to namei() via the pwd_altroot() call, this facility is no
longer needed.

Differential Revision:	https://reviews.freebsd.org/D40092
MFC after:		2 month
2023-05-29 11:18:11 +03:00
Dmitry Chagin
57578deac7 Brandinfo: Retire emul_path as unneeded anymore
The Brandinfo emul_path was used by the ELF image activator to fix up
the interpreter file name according to the ABI root directory. Since the
non-native ABI can now specify its root directory directly to namei()
via the pwd_altroot() call, this facility is no longer needed.

Differential Revision:	https://reviews.freebsd.org/D40091
MFC after:		2 month
2023-05-29 11:17:28 +03:00
Dmitry Chagin
fd745e1db6 linux(4): Use pwd_altroot() to tell namei() about ABI root path
PR:			72920
Differential Revision:	https://reviews.freebsd.org/D40090
MFC after:		2 month
2023-05-29 11:16:46 +03:00
Dmitry Chagin
78c2e58fa5 linux(4): Fix stack unwinding across signal frame on x86_64
Stop using register numbers that are undefined in libunwind
on x86_64.

Differential Revision:	https://reviews.freebsd.org/D40156
MFC after:		1 month
2023-05-28 17:07:28 +03:00
Dmitry Chagin
037b60fb0f linux(4): Preserve %rcx (return address) like a Linux do
Perhaps this does not make much sense, as destroying %rcx is declared by
the x86_64 Linux syscall ABI. However:
a) if we get a signal while we are in the kernel, we should restore
   tf_rcx when preparing machine context for signal handlers.
b) the Linux world is strange; someone can depend on the %rcx value
   after a syscall, for example Go.

Differential Revision:	https://reviews.freebsd.org/D40155
MFC after:		1 month
2023-05-28 17:06:47 +03:00
Dmitry Chagin
185bd9fa30 linux(4): Simplify %r10 restoring on amd64
Restore %r10 at system call entry to avoid doing this multiple times.

Differential Revision:	https://reviews.freebsd.org/D40154
MFC after:		1 month
2023-05-28 17:06:23 +03:00
Dmitry Chagin
a463dd8108 linux(4): Add a comment explaining registers at syscall entry point on amd64
Differential Revision:	https://reviews.freebsd.org/D40153
MFC after:		1 month
2023-05-28 17:06:05 +03:00
Dmitry Chagin
a99b890ecd linux(4): Drop a weird comment from linux_set_syscall_retval on amd64
I agree, it would be great to avoid PCB_FULL_IRET, however we should
follow Linux system call ABI.

Reviewed by:		emaste
Differential Revision:	https://reviews.freebsd.org/D40152
MFC after:		1 month
2023-05-28 17:05:44 +03:00
Mark Johnston
9fb6718d1b smp: Dynamically allocate the stoppcbs array
This avoids bloating the kernel image when MAXCPU is large.

A follow-up patch for kgdb and other kernel debuggers is needed since
the stoppcbs symbol is now a pointer.  Bump __FreeBSD_version so that
debuggers can use osreldate to figure out how to handle stoppcbs.

PR:		269572
MFC after:	never
Reviewed by:	mjg, emaste
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D39806
2023-05-25 18:09:55 -04:00
Mark Johnston
e17eca3276 vmm: Avoid embedding cpuset_t ioctl ABIs
Commit 0bda8d3e9f ("vmm: permit some IPIs to be handled by userspace")
embedded cpuset_t into the vmm(4) ioctl ABI.  This was a mistake since
we otherwise have some leeway to change the cpuset_t for the whole
system, but we want to keep the vmm ioctl ABI stable.

Rework IPI reporting to avoid this problem.  Along the way, make VM_RUN
a bit more efficient:
- Split vmexit metadata out of the main VM_RUN structure.  This data is
  only written by the kernel.
- Have userspace pass a cpuset_t pointer and cpusetsize in the VM_RUN
  structure, as is done for cpuset syscalls.
- Have the destination CPU mask for VM_EXITCODE_IPIs live outside the
  vmexit info structure, and make VM_RUN copy it out separately.  Zero
  out any extra bytes in the CPU mask, like cpuset syscalls do.
- Modify the vmexit handler prototype to take a full VM_RUN structure.

PR:		271330
Reviewed by:	corvink, jhb (previous versions)
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D40113
2023-05-23 21:15:59 -04:00
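
Illustration for e17eca3276: a sketch of copying a CPU mask into a caller-supplied buffer whose size is passed alongside the pointer, zeroing any extra bytes as the list above describes. The function name, the byte-array representation, and the ERANGE check for set bits that do not fit are assumptions made for this example, not the vmm code.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy a kernel mask of klen bytes into a caller
 * buffer of ulen bytes.
 */
int
copy_mask_out_demo(const unsigned char *kmask, size_t klen,
    unsigned char *ubuf, size_t ulen)
{
	size_t i, n;

	/* Assumed policy: refuse if truncation would drop a set bit. */
	for (i = ulen; i < klen; i++) {
		if (kmask[i] != 0)
			return (ERANGE);
	}
	n = klen < ulen ? klen : ulen;
	memcpy(ubuf, kmask, n);
	/* Zero out any extra bytes in a larger caller buffer. */
	if (ulen > klen)
		memset(ubuf + klen, 0, ulen - klen);
	return (0);
}
```

Passing the size with the pointer lets the kernel-side mask width change without changing the layout of the ioctl structure itself.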
Dmitry Chagin
1d76741520 linux(4): Implement ptrace_pokeusr for x86_64
Differential Revision:	https://reviews.freebsd.org/D40097
MFC after:		1 week
2023-05-18 20:02:35 +03:00
Dmitry Chagin
3d0addcd35 linux(4): Make ptrace_pokeusr machine dependent
Differential Revision:	https://reviews.freebsd.org/D40096
MFC after:		1 week
2023-05-18 20:01:12 +03:00
Dmitry Chagin
dd2a6cd701 linux(4): Make ptrace_peekusr machine dependent
And partially implement it for x86_64.

Differential Revision:	https://reviews.freebsd.org/D40095
MFC after:		1 week
2023-05-18 20:00:12 +03:00
Piotr Pawel Stefaniak
411942a70e GENERIC: remove a stray space character 2023-05-13 21:31:49 +02:00
Warner Losh
4d846d260e spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD
The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with:		pfg
MFC After:		3 days
Sponsored by:		Netflix
2023-05-12 10:44:03 -06:00
Bjoern A. Zeeb
721b44ba5f amd64: pmap.h put a guard around a pcpu.h function
pmap_get_pcid() calls zpcpu_get(), which is defined in pcpu.h.
It is unclear why we do not include that header, but, as is done right
above the change, add another guard around pmap_get_pcid().
This allows some LinuxKPI headers to compile again.

Suggested by:	markj
MFC after:	10 days
2023-05-12 11:14:54 +00:00
Warner Losh
062a7b918f twe: Remove driver
Sponsored by:		Netflix
2023-05-10 22:24:12 -06:00
Konstantin Belousov
bf864c3ed5 amd64 MINIMAL: SysV IPC syscalls are loadable
Reviewed by:	emaste, imp
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39990
2023-05-09 18:30:07 +03:00
Konstantin Belousov
0c1c5e36eb amd64 MINIMAL: remove UFS from compiled-in list
Reviewed by:	emaste, imp
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39990
2023-05-09 18:30:07 +03:00
Konstantin Belousov
bba6249ae9 amd64 MINIMAL config: remove statements about UFS module
All UFS options work for ufs.ko.

Reviewed by:	emaste, imp
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39990
2023-05-09 18:30:07 +03:00
Vitaliy Gusev
c543e09f1f bhyve: save/restore pir_desc
Failing to preserve pir_desc can result in pending interrupts being lost
on resume leading to a hung VM.

Reviewed by:		corvink, jhb
MFC after:		1 week
Sponsored by:		vStack
Differential Revision:	https://reviews.freebsd.org/D35447
2023-05-09 10:31:27 +02:00
Bojan Novković
fefac54359 bhyve: fix vCPU single-stepping on VMX
This patch fixes virtual machine single stepping on VMX hosts.

Currently, when using bhyve's gdb stub, each attempt at single-stepping
a vCPU lands in a timer interrupt. The current single-stepping mechanism
uses the Monitor Trap Flag feature to cause VMEXIT after a single
instruction is executed. Unfortunately, the SDM states that MTF causes
VMEXITs for the next instruction that gets executed, which is often not
what the person using the debugger expects. [1]

This patch adds a new VM capability that masks interrupts on a vCPU by
blocking interrupt injection and modifies the gdb stub to use the newly
added capability while single-stepping a vCPU.

[1] Intel SDM 26.5.2 Vol. 3C

Reviewed by:		corvink, jbh
MFC after:		1 week
Differential Revision:	https://reviews.freebsd.org/D39949
2023-05-09 10:04:55 +02:00
Mitchell Horne
aba91805aa hwpmc: use kstack_contains()
This existing helper function is preferable to the hand-rolled
calculation of the kstack bounds.

Make some small style improvements while here. Notably, rename every
instance of "r", the return address, to "ra". Tidy the includes in the
affected files.

Reviewed by:	jkoshy
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D39909
2023-05-06 14:49:19 -03:00
Konstantin Belousov
38843fe0f2 amd64: add MINIMALUP config
This is the MINIMAL config with SMP/NUMA options turned off.
Useful to ensure that the UP configuration still builds, until it is
finally removed.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2023-05-06 14:24:07 +03:00
Konstantin Belousov
3a8c69c1ff amd64 MINIMAL config: remove sentence about acpi
On amd64 ACPI is required to boot, it cannot work as a module, and we
have not built the ACPI module for a long time.

Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2023-05-06 14:24:07 +03:00
Konstantin Belousov
7c8e66ed8d amd64: convert UP code to dynamically allocated pmap->pm_pcid
Reported by:	peterj
Sponsored by:	The FreeBSD Foundation
2023-05-06 14:24:07 +03:00
Corvin Köhne
b10e100d16 vmm: don't free unallocated memory
If vmx or svm is disabled in BIOS or the device isn't supported by vmm,
modinit won't allocate these state save areas. As kmem_free panics when
passing a NULL pointer to it, loading the vmm kernel module causes a
panic too.

PR:			271251
Reviewed by:		markj
Fixes:			74ac712f72 ("vmm: Dynamically allocate a couple of per-CPU state save areas")
MFC after:		1 week
Sponsored by:		Beckhoff Automation GmbH & Co. KG
Differential Revision:	https://reviews.freebsd.org/D39974
2023-05-05 15:34:00 +02:00
Igor Ostapenko
0167b5a793 sys/amd64/conf/FIRECRACKER: typo (compatiblity)
https://bugs.freebsd.org/269753

PR:                      269753
Reported by:             Igor Ostapenko
Approved by:             doc, src (delphij, imp, zlei)
Differential revision:   https://reviews.freebsd.org/D38741
2023-05-05 01:23:08 +01:00
John Baldwin
4961faaacc pmap_{un}map_io_transient: Use bool instead of boolean_t.
Reviewed by:	imp, kib
Differential Revision:	https://reviews.freebsd.org/D39920
2023-05-04 12:29:48 -07:00
John Baldwin
407f675718 imgact_elf: Change header_supported to return bool instead of boolean_t.
Reviewed by:	imp, kib, emaste
Differential Revision:	https://reviews.freebsd.org/D39919
2023-05-04 12:29:29 -07:00
Konstantin Belousov
3582acbad3 amd64 mp_machdep.c: remove useless comment
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D39945
2023-05-04 18:39:22 +03:00
Konstantin Belousov
af1c6d3f30 amd64: do not leak pcpu pages
Do not preallocate pcpu area backing pages on early startup, only
allocate enough of KVA for pcpu[MAXCPU] and the page for BSP.  Other
pages are allocated after we know the number of cpus and their
assignments to the domains.

PCPUs are not accessed until they are initialized, which happens on AP
startup.

Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D39945
2023-05-04 18:39:22 +03:00
Konstantin Belousov
e704f88f3d amd64: initialize APs kpmap_store in init_secondary()
The AP's pcpu area is zeroed in init_secondary() by pcpu_init(), so the
early initialization in pmap_bootstrap() is a nop.

Fixes:	42f722e721cd010ae5759a4b0d3b7b93c2b9cad2
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D39945
2023-05-04 18:39:22 +03:00
Konstantin Belousov
42f722e721 amd64: store pcids pmap data in pcpu zone
This change eliminates the struct pmap_pcid array embedded into struct
pmap and sized by MAXCPU, which would bloat with MAXCPU increase.  Also
it removes false sharing of cache lines, since the array elements are
mostly locally accessed by corresponding CPUs.

Suggested by:	mjg
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D39890
2023-05-02 14:32:47 +03:00
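
Illustration for 42f722e721: a sketch of why an array sized by MAXCPU inside a per-pmap structure bloats as MAXCPU grows, compared with holding only a pointer to separately allocated per-CPU data. All structure and field names here are hypothetical, and 1024 is used as the example MAXCPU.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAXCPU	1024	/* example value */

/* Hypothetical per-CPU record. */
struct pcid_demo {
	uint64_t gen;
	uint32_t pcid;
};

/* Before: every instance carries DEMO_MAXCPU records. */
struct pmap_embedded_demo {
	struct pcid_demo pcids[DEMO_MAXCPU];
};

/* After: every instance carries one pointer to per-CPU storage. */
struct pmap_pointer_demo {
	struct pcid_demo *pcidp;
};

size_t
pmap_embedded_size(void)
{
	return (sizeof(struct pmap_embedded_demo));
}

size_t
pmap_pointer_size(void)
{
	return (sizeof(struct pmap_pointer_demo));
}
```

The embedded layout grows linearly with DEMO_MAXCPU for every instance, while the pointer layout stays the size of one pointer.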
Konstantin Belousov
9c8cbf3819 amd64 pmap_pcid_alloc(): pass a pointer to struct pmap_pcid instead of cpuid
Cpuid is used to index the pmap->pm_pcids array only.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D39890
2023-05-02 14:32:40 +03:00
Konstantin Belousov
9e0143694a amd64: add pmap_get_pcid() helper
Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D39890
2023-05-02 14:32:35 +03:00
Konstantin Belousov
86b61ccb34 amd64 pmap: add pmap_pinit_pcids() helper
to initialize pm_pcids array for a new user pmap

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D39890
2023-05-02 14:32:29 +03:00
Konstantin Belousov
32bb28d8ad amd64: move definition of the struct pmap_pcids into _pmap.h
and rename the structure to pmap_pcid.

Reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
Differential revision:	https://reviews.freebsd.org/D39890
2023-05-02 14:32:20 +03:00
Dmitry Chagin
80d8a4a003 linux(4): Make struct stat64 to match Linux actual one 2023-04-28 11:55:04 +03:00
Dmitry Chagin
cd0fca82bb linux(4): Regen for mknod syscall changes 2023-04-28 11:55:04 +03:00
Dmitry Chagin
ca3333dd4a linux(4): Use Linux dev_t type for mknod syscalls dev argument
As of version 2.6.0 of the Linux kernel, dev_t is a 32-bit unsigned integer
on all platforms. Prior to the 2.6 kernel, the dev_t type was an unsigned short.
However, since the first commit of the Linuxulator, the mknod syscalls have
taken an int dev argument.
There is also some confusion here: while the kernel declares dev_t as a
32-bit type, the user-space dev_t type can be 64 bits wide, e.g.,
in the Glibc library.
To avoid confusion and to help porting of the Linuxulator to other platforms,
use an explicit l_dev_t for the dev argument of the mknod syscalls.
2023-04-28 11:55:02 +03:00
Dmitry Chagin
19973638be linux(4): Move dev_t type declaration under /compat/linux
As of version 2.6.0 of the Linux kernel, dev_t is a 32-bit unsigned integer
on all platforms. Move it into the MI linux.h under /compat/linux.
2023-04-28 11:55:02 +03:00
Dmitry Chagin
e0bfe0d62c linux(4): Make struct newstat to match actual Linux one
In the struct stat the st_dev, st_rdev are unsigned long.
2023-04-28 11:55:01 +03:00
Dmitry Chagin
023e688496 linux(4): Regen for struct l_old_stat changes 2023-04-28 11:55:01 +03:00
Dmitry Chagin
2370c7321f linux(4): Update syscalls.master to reflect struct l_old_stat 2023-04-28 11:54:59 +03:00
Dmitry Chagin
391fd1e1a1 linux(4): Mark old fstat syscall as unimplemented
It looks like the old fstat system call has never been implemented.
2023-04-28 11:54:59 +03:00
Dmitry Chagin
a408fc097f linux(4): Rename obsolete old struct l_stat to struct l_old_stat 2023-04-28 11:54:59 +03:00
Mark Johnston
74ac712f72 vmm: Dynamically allocate a couple of per-CPU state save areas
This avoids bloating the BSS when MAXCPU is large.

No functional change intended.

PR:		269572
Reviewed by:	corvink, rew
Tested by:	rew
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D39805
2023-04-26 10:08:42 -04:00
Vitaliy Gusev
0912408a28 vmm: fix HLT loop while vcpu has requested virtual interrupts
This fixes the detection of pending interrupts when pirval is 0 and the
pending bit is set.

More information on how this situation occurs can be found here:
c5b5f2d808/sys/amd64/vmm/intel/vmx.c (L4016-L4031)

Reviewed by:		corvink, markj
Fixes:			02cc877968bbcd57695035c67114a67427f54549 ("Recognize a pending virtual interrupt while emulating the halt instruction.")
MFC after:		1 week
Sponsored by:		vStack
Differential Revision:	https://reviews.freebsd.org/D39620
2023-04-26 10:38:46 +02:00
Mateusz Guzik
95e4f5ef7c x86: whack pmspcv from GENERIC
The driver is enormous and rarely used.

      text      data       bss        dec         hex   filename
  23076646   1870505   4415872   29363023   0x1c00b4f   kernel.before
  20017433   1870305   4416000   26303738   0x1915cfa   kernel.after

People using the driver will need to add pmspcv_load="YES" to
their loader.conf.
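
As a quick check of the size table above, the total saving can be recomputed from the per-section columns. This is only a minimal sketch: the figures are copied from the table, while the dictionary and function names are illustrative and not part of the commit.

```python
# Section sizes in bytes, copied from the size(1) output quoted above.
before = {"text": 23076646, "data": 1870505, "bss": 4415872}
after = {"text": 20017433, "data": 1870305, "bss": 4416000}


def section_deltas(old, new):
    """Return bytes saved per section (positive means the kernel shrank)."""
    return {name: old[name] - new[name] for name in old}


deltas = section_deltas(before, after)
total_saved = sum(deltas.values())

print(deltas)
print(f"total saved: {total_saved} bytes ({total_saved / 2**20:.2f} MiB)")
```

Nearly all of the reduction comes from the text section; data shrinks slightly and bss grows by a few bytes.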

Reviewed by:	jhb
Relnotes:	yes
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D39816
2023-04-25 18:09:44 +00:00
Mark Johnston
47cf1b37f4 vmm: Expose some more AVX512 CPUID bits to guests
This is required to announce support for some accelerated AES
operations.  AVX512BW indicates support for the AVX-512 byte and word
instructions and AVX512VL indicates support for the use of AVX512
instructions with vector lengths smaller than 512 bits.

VAES and VPCLMULQDQ extensions indicate that VEX-prefixed AES-NI and
pclmulqdq instructions are supported.

All of these bits are needed for OpenSSL to use VAES to accelerate
AES-GCM transforms.

Reviewed by:	corvink, kib, jhb
MFC after:	2 weeks
Sponsored by:	Stormshield
Sponsored by:	Klara, Inc.
Differential Revision:	https://reviews.freebsd.org/D39781
2023-04-25 13:35:14 -04:00
Dmitry Chagin
56c5230afd linux(4): Fix LINUX_AT_COUNT comments
Differential Revision:	https://reviews.freebsd.org/D39645
MFC after:		1 month
2023-04-22 22:16:43 +03:00
Dmitry Chagin
7d8c983983 linux(4): Deduplicate linux_copyout_auxargs()
Export the default MINSIGSTKSZ value for x86 for as long as we do not
preserve AVX registers in the signal context.

Differential Revision:	https://reviews.freebsd.org/D39644
MFC after:		1 month
2023-04-22 22:16:02 +03:00
Warner Losh
559b94a122 syscall.master: Fix comments
Have more accurate comments. While #if, #else, etc are copied to the
header files, lines that don't start with # are not.  And #include files
are only output to sysinc (which winds up at the front of init_sysent.c
which seems a bit odd). This is all radically undocumented, and likely
has drifted somewhat from 4.4BSD and what other systems do (they've
drifted too, fwiw).

Sponsored by:		Netflix
2023-04-20 16:18:02 -06:00
Dmitry Chagin
de4da6cd04 x86: Move i386 timerreg.h to x86
Reviewed by:		emaste, jhb
Differential Revision:	https://reviews.freebsd.org/D39656
MFC after:		1 month
2023-04-20 19:42:59 +03:00
Dmitry Chagin
d1f4c44aa8 x86: Move i386 ppireg.h to x86
Differential Revision:	https://reviews.freebsd.org/D39655
MFC after:		1 month
2023-04-20 19:42:59 +03:00
Konstantin Belousov
617a11eab6 x86: initialize use_xsave once
The explanation from https://reviews.freebsd.org/D39637 by stevek:
The "use_xsave" variable is a global that is only supposed to be
initialized early before scheduling gets started. However, with the way
the ifuncs for "fpusave" and "fpurestore" are implemented, the value
could be changed at runtime when scheduling is active if "use_xsave"
was set to 0 by the tunable. This leaves a window of opportunity where
"use_xsave" gets re-initialized to 1 and a context switch could occur
with a thread that was not set up to be able to use xsave functionality.
This can lead to a "privileged instruction fault".

The fix is to protect "use_xsave" from being initialized more than once.
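
The "initialize only once" idea can be illustrated with a toy model. This is a hedged sketch, not the kernel code: the class, the `_initialized` guard flag, and the `tunable_value` parameter are hypothetical names chosen for illustration, and the real change is in C.

```python
class FpuConfig:
    """Toy model of a flag that must be set once, before scheduling starts."""

    def __init__(self):
        self.use_xsave = 1         # default: XSAVE in use
        self._initialized = False  # hypothetical guard flag

    def init_use_xsave(self, tunable_value=None):
        # Calls after the first one are ignored, so a value of 0 chosen via
        # the tunable cannot be flipped back to 1 at runtime.
        if self._initialized:
            return self.use_xsave
        if tunable_value is not None:
            self.use_xsave = tunable_value
        self._initialized = True
        return self.use_xsave


cfg = FpuConfig()
cfg.init_use_xsave(tunable_value=0)  # early boot: tunable disables XSAVE
cfg.init_use_xsave(tunable_value=1)  # later call: has no effect
print(cfg.use_xsave)
```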

Reported and reviewed by:	stevek
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39660
2023-04-19 02:22:28 +03:00
Konstantin Belousov
1e0e335b0f amd64: fix PKRU and swapout interaction
When vm_map_remove() is called from vm_swapout_map_deactivate_pages()
due to swapout, PKRU attributes for the removed range must be kept
intact.  Provide a variant of pmap_remove(), pmap_map_delete(), to
allow pmap to distinguish between real removes of the UVA mappings
and any other internal removes, e.g. swapout.

For non-amd64, pmap_map_delete() is stubbed by define to pmap_remove().

Reported by:	andrew
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39556
2023-04-15 02:53:59 +03:00
Julien Grall
ab7ce14b1d xen/intr: introduce dev/xen/bus/intr-internal.h
Move the xenisrc structure which needs to be shared between the core Xen
interrupt code and architecture-dependent code into a separate header.  A
similar situation exists for the NR_EVENT_CHANNELS constant.

Turn xi_intsrc into a type definition named xi_arch to reflect the new
purpose of being an architectural variable for the interrupt source.

This was originally implemented by Julien Grall, but has been heavily
modified.  The core side was renamed "intr-internal.h" and is #include'd
by "arch-intr.h" instead of the other way around.  This allows the
architecture to add function definitions which use struct xenisrc.

The original version only moved xi_intsrc into xen_arch_isrc_t.  Moving
xi_vector was done by the submitter.

The submitter had also moved xi_activehi and xi_edgetrigger into
xen_arch_isrc_t.  Those disappeared with the removal of PVHv1 support.

Copyright note.  The current xenisrc structure was introduced at
76acc41fb7 by Justin T. Gibbs.  Traces remain, but the strength of
Copyright claims from before 2013 seems pretty weak.

Reviewed by: royger
Submitted by: Elliott Mitchell <ehem+freebsd@m5p.com>, 2021-03-17 19:09:01
Original implementation: Julien Grall <julien@xen.org>, 2015-10-20 09:14:56
Differential Revision: https://reviews.freebsd.org/D30648
[royger]
 - Adjust some line lengths
 - Fix comment about NR_EVENT_CHANNELS after movement.
 - Use #include instead of symlinks.
2023-04-14 15:58:53 +02:00
Elliott Mitchell
af610cabf1 xen/intr: adjust xen_intr_handle_upcall() to match driver filter
xen_intr_handle_upcall() has two interfaces.  It needs to be called by
the x86 assembly code invoked by the APIC.  Second, it needs to be called
as a driver_filter_t for the XenPCI code and for architectures besides
x86.

Unfortunately the driver_filter_t interface was implemented as a wrapper
around the x86-APIC interface.  Now create a simple wrapper for the
x86-APIC code, which calls an architecture-independent
xen_intr_handle_upcall().

When called via intr_event_handle(), driver_filter_t functions expect
preemption to be disabled.  This removes the need for
critical_enter()/critical_exit() when called this way.

The lapic_eoi() call is only needed on x86 in some cases when invoked
directly as an APIC vector handler.

Additionally driver_filter_t functions have no need to handle interrupt
counters.  The intrcnt_add() calling function was reworked to match the
current situation.  intrcnt_add() is now only called via one path.

The increment/decrement of curthread->td_intr_nesting_level had
previously been left out.  Appears this was mostly harmless, but this
was noticed during implementation and has been added.

CONFIG_X86 is a leftover from use with Linux.  While the barrier isn't
needed for FreeBSD on x86, it will be needed for FreeBSD on other
architectures.

Copyright note.  xen_intr_intrcnt_add() was introduced at 76acc41fb7
by Justin T. Gibbs.  xen_intrcnt_init() was introduced at fd036deac1
by John Baldwin.

sys/x86/xen/xen_arch_intr.c was originally created by Julien Grall in
2015 for the purpose of holding the x86 interrupt interface.  Later it
was found xen_intr_handle_upcall() was better earlier, and the x86
interrupt interface better later.  As such the filename and header list
belong to Julien Grall, but what those were created for is later.

Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D30006
2023-04-14 15:58:52 +02:00
Elliott Mitchell
ecdcad6516 xen: remove CONFIG_XEN_COMPAT, purge Xen 3.0 compatibility
This overlaps the purpose of __XEN_INTERFACE_VERSION__.  Remove Xen 3.0.2
compatibility.  __XEN_INTERFACE_VERSION__ has compatibility to Xen 3.2.8
enabled.  As Xen 3.3 was released almost 15 years ago, it seems unlikely
anyone hasn't updated.

Reviewed by: royger
2023-04-14 15:58:48 +02:00
Elliott Mitchell
b2c50bb934 xen/efi: make Xen PV EFI clock optional
The present implementation is only for x86.  Other architectures need
adjustments for querying presence of EFI.

Xen's EFI support is also quite troublesome on non-x86.  This is being
slowly remedied, but until in better shape the EFI clock functionality
should be disabled.

Reviewed by: royger
Differential Revision: https://reviews.freebsd.org/D31065
2023-04-14 15:58:47 +02:00
Henri Hennebert
71883128e5 rtsx: Add plug-and-play info
Add MODULE_PNP_INFO() to the driver to make it autoload if not linked
statically into the kernel. Remove the device from amd64/i386 GENERIC.

Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D35074
2023-04-13 11:12:50 -03:00
Dmitry Chagin
50111714f5 linux(4): Regen for close_range syscall
MFC after:		2 weeks
2023-04-04 23:23:37 +03:00
Dmitry Chagin
1c27dce1f8 linux(4): Modify close_range syscall to match Linux
MFC after:		2 weeks
2023-04-04 23:23:24 +03:00
Alexander V. Chernikov
3091d980f5 netlink: add NETLINK to the DEFAULTS for each architecture
NETLINK is going to replace rtsock and a number of other ioctl/sysctl interfaces.
In-base utilities such as route(8), netstat(8) and soon ifconfig(8)
 are being converted to use netlink sockets as a transport between
 kernel and userland.
In the current configuration, it is still possible to have the kernel
 without NETLINK (`nooptions NETLINK`) and use the aforementioned
 utilities by building the world with the `WITHOUT_NETLINK` src.conf knob.
However, this approach does not cover the cases when a person unintentionally
 builds a custom kernel without netlink and tries to use the standard userland.

This change adds `option NETLINK` to the default options for each
 architecture, fixing the custom kernel issue.
For arm, this change uses `std.armv6` and `std.armv7` (netlink already in)
 instead of DEFAULTS.

Reviewed By: imp
Differential Revision: https://reviews.freebsd.org/D39339
2023-04-02 15:27:21 +00:00
Konstantin Belousov
cd137909c3 amd64 wakeup: recalculate mitigations after APICs are woken
APICs are needed to broadcast IPIs for MSR writes.

PR:	270489
Reviewed by:	dchagin, emaste, jhb
Tested by:	dchagin, manu
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39302
2023-03-29 21:45:20 +03:00
Elliott Mitchell
9f3be3a6ec xen: switch to using core atomics for synchronization
Now that the atomic macros are always genuinely atomic on x86, they can
be used for synchronization with Xen.  A single core VM isn't too
unusual, but actual single core hardware is uncommon.

Replace an open-coding of evtchn_clear_port() with the inline.

Substantially inspired by work done by Julien Grall <julien@xen.org>,
2014-01-13 17:40:58.

Reviewed by: royger
MFC after: 1 week
2023-03-29 09:51:42 +02:00
John Baldwin
0f735657aa bhyve: Remove vmctx member from struct vm_snapshot_meta.
This is a userland-only pointer that isn't relevant to the kernel and
doesn't belong in the ioctl structure shared between userland and the
kernel.  For the kernel, the old structure for the ioctl is still
supported under COMPAT_FREEBSD13.

This changes vm_snapshot_req() in libvmmapi to accept an explicit
vmctx argument.

It also changes vm_snapshot_guest2host_addr to take an explicit vmctx
argument.  As part of this change, move the declaration for this
function and its wrapper macro from vmm_snapshot.h to snapshot.h as it
is a userland-only API.

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D38125
2023-03-24 11:49:06 -07:00
John Baldwin
7d9ef309bd libvmmapi: Add a struct vcpu and use it in most APIs.
This replaces the 'struct vm, int vcpuid' tuple passed to most API
calls and is similar to the changes recently made in vmm(4) in the
kernel.

struct vcpu is an opaque type managed by libvmmapi.  For now it stores
a pointer to the VM context and an integer id.

As an immediate effect this removes the divergence between the kernel
and userland for the instruction emulation code introduced by the
recent vmm(4) changes.

Since this is a major change to the vmmapi API, bump VMMAPI_VERSION to
0x200 (2.0) and the shared library major version.

While here (and since the major version is bumped), remove unused
vcpu argument from vm_setup_pptdev_msi*().

Add new functions vm_suspend_all_cpus() and vm_resume_all_cpus() for
use by the debug server.  The underlying ioctl (which uses a vcpuid of
-1) remains unchanged, but the userlevel API now uses separate
functions for global CPU suspend/resume.

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D38124
2023-03-24 11:49:06 -07:00
Konstantin Belousov
2b4b3789f8 acpi_wakeup.c: apply the reviewer's editorial corrections to the comment text.
Fixes:	02904a06c7
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39146
2023-03-18 17:47:19 +02:00
Konstantin Belousov
02904a06c7 amd64: properly recalculate mitigations knobs after resume
Revision r333125 AKA 986c4ca387 forced clear cpu_stdext_feature3
on suspend, since at that time microcode update was not reloaded
early on resume. Then, revision 050f5a8405 started re-reading
cpu_stdext_feature3 again. Since modern CPUs do not require mitigations
from the Skylake era, this went unnoticed for some time.

Keep zeroing cpu_stdext_feature3 on suspend, but re-read it in a more
controlled way on resume after microcode is reloaded, and recalculate
active workarounds based on actual microcode capabilities.

Reported and tested by:	romain
Reviewed by:	emaste, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D39146
2023-03-18 17:40:05 +02:00
Konstantin Belousov
ff6d60946a amd64 acpi_wakeup.c: fix typo
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2023-03-17 15:10:34 +02:00
Vitaliy Gusev
94a3876d7e vmm: fix missing ipi statistic
ipi counters are missing in bhyvectl's output because vm_maxcpu is 0
when initializing them. That's because vmm_stat_register is executed
before vmm_init.

Instead of directly fixing it, there's a better solution in illumos
which is cherry picked:
65a3bc8373

It replaces the matrix statistic by two counters per vcpu. One for
counting the ipis to the vcpu and one counting the ipis received by the
vcpu. This has several advantages:

- A matrix statistic becomes huge when using many vcpus.
- A matrix statistic easily reaches the MAX_VMM_STAT_ELEMS limit.
- Two counters are enough in most cases. DTrace can be used for more
  advanced debugging purposes.
- A matrix statistic wastes memory. The matrix size is determined by
  vm_maxcpu regardless of the number of vcpus assigned to the vm.
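
To make the memory argument in the list above concrete, the two layouts can be compared by element count. This is a rough sketch under stated assumptions: the matrix is assumed to be square with one slot per pair of vcpus up to vm_maxcpu, the function names are illustrative, and the values 256 and 4 are example inputs rather than figures from the commit.

```python
def matrix_stat_elems(vm_maxcpu):
    # Assumed layout: one slot per (source, destination) pair, sized by the
    # vm_maxcpu limit rather than by the vcpus the VM actually uses.
    return vm_maxcpu * vm_maxcpu


def per_vcpu_counter_elems(ncpus):
    # Replacement layout: two counters for each vcpu assigned to the VM.
    return 2 * ncpus


# Example inputs (assumptions): a limit of 256 vcpus, a guest using 4.
print(matrix_stat_elems(256), per_vcpu_counter_elems(4))
```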

Reviewed by:		corvink, markj
Fixes:			ee98f99d7a ("vmm: Convert VM_MAXCPU into a loader tunable hw.vmm.maxcpu.")
MFC after:		1 week
Sponsored by:		vStack
Differential Revision:	https://reviews.freebsd.org/D39038
2023-03-17 13:50:08 +01:00
Dmitry Chagin
9e7f03e9c6 linux(4): Drop unnecessary struct l_ifconf declaration from amd64/linux
It's needed only for the amd64/linux32 Linuxulator.

Differential Revision:	https://reviews.freebsd.org/D38793
2023-03-04 12:11:38 +03:00
Dmitry Chagin
cabbfb60d0 linux(4): Reduce code duplication between MD files
Move struct ifnet definitions under compat/linux.

Reviewed by:		emaste
Differential Revision:	https://reviews.freebsd.org/D38791
2023-03-04 12:11:38 +03:00
John-Mark Gurney
2fee875629 abstract out the vm detection via smbios
This makes the detection of VMs common between platforms that
have SMBIOS.

Reviewed by:		imp, kib
Differential Revision:	https://reviews.freebsd.org/D38800
2023-03-02 16:54:21 -08:00
Vitaliy Gusev
8104fc31a2 bhyve: fix restore of kernel structs
vmx_snapshot() and svm_snapshot() do not save any data, and an error occurs
at resume:

Restoring kernel structs...
vm_restore_kern_struct: Kernel struct size was 0 for: vmx
Failed to restore kernel structs.

Reviewed by:		corvink, markj
Fixes:			39ec056e6d ("vmm: Rework snapshotting of CPU-specific per-vCPU data.")
MFC after:		2 weeks
Sponsored by:		vStack
Differential Revision:	https://reviews.freebsd.org/D38476
2023-02-28 13:37:53 +01:00
Vitaliy Gusev
281b496f22 vmm: fix restore of TSC offset
After suspend/resume, the Ubuntu 20.04 and 22.04 installers can hang if
the tsc-early clocksource has a big skew.

Reviewed by:		corvink, jhb
Fixes:			a7db532e3a ("vmm: Simplify saving of absolute TSC values in snapshots.")
MFC after:		2 weeks
Sponsored by:		vStack
Differential Revision:	https://reviews.freebsd.org/D38474
2023-02-28 13:37:44 +01:00
Mike Karels
dd6f6030cc amd64 kernel config: clean up whitespace
Most options in kernel config files use "options<space><tab>OPTION".
This allows the option to be commented out without shifting columns.
A few options had two tabs, and some had spaces.  Make them consistent.
2023-02-24 08:36:28 -06:00
Mateusz Guzik
6b9acd1bfb Exclude MMCCAM kernels from make universe
They don't provide any value and are quite arbitrary.

Note arm64 GENERIC-MMCCAM was already excluded, just not the NODEBUG
variant.

The option is already build-tested with arm64 LINT kernel.

Reviewed by:	manu
Differential Revision:	https://reviews.freebsd.org/D38458
2023-02-16 07:29:53 +00:00
Dmitry Chagin
c8a79231a5 linux(4): Rename linux_timer.h to linux_time.h
To avoid confusing people, rename linux_timer.h to linux_time.h,
as linux_timer.c is the implementation of timer syscalls only,
while linux_time.c contains implementation of all stuff declared
in linux_time.h.

MFC after:		2 weeks
2023-02-14 17:46:33 +03:00
Dmitry Chagin
2456a45929 linux(4): Cleanup includes under amd64/linux
Cleanup unneeded includes, sort the rest according to style(9).
No functional changes.

MFC after:		2 weeks
2023-02-14 17:46:32 +03:00
Dmitry Chagin
31e938c531 linux(4): Cleanup vm includes from linux_util.h
Include vm headers directly where they are needed. linux_util.h is included
in most source files of the Linuxulator, so avoid collecting rarely used
includes there.

MFC after:		2 weeks
2023-02-14 17:46:30 +03:00
Dmitry Chagin
06c07e1203 Complete removal of opt_compat.h
Since the Linux emulation layer build options were removed, there is no
reason to keep opt_compat.h.

Reviewed by:		emaste
Differential Revision:	https://reviews.freebsd.org/D38548
MFC after:		2 weeks
2023-02-13 19:07:38 +03:00
Dmitry Chagin
10d16789a3 linux(4): Get rid of the opt_compat.h include.
Since e013e369 the COMPAT_LINUX and COMPAT_LINUX32 build options are removed,
so the include of opt_compat.h is no longer needed.

MFC after:		2 weeks
2023-02-12 20:24:32 +03:00
Mark Johnston
b265a2e0d7 vmm: Fix AP startup compatibility for old bhyve executables
These changes unbreak AP startup when using a 13.1-RELEASE bhyve
executable with a newer kernel:
- Correct the destination mask for the VM_EXITCODE_IPI message generated
  by an INIT or STARTUP IPI in vlapic_icrlo_write_handler().
- Only initialize vlapics on active vCPUs.  13.1-RELEASE bhyve activates
  AP vCPUs only after the BSP starts them with an IPI, and vmm now
  allocates vcpu structures lazily, so the STARTUP handling in
  vm_handle_ipi() could trigger a page fault.
- Fix an off-by-one setting the vcpuid in a VM_EXITCODE_SPINUP_AP
  message.

Fixes:	7c326ab5bb ("vmm: don't lock a mtx in the icr_low write handler")
Reviewed by:	jhb, corvink
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D38446
2023-02-09 16:14:33 -05:00
Mark Johnston
ba34de1b3b vmm: Remove an unneeded initialization of "retu"
vm_handle_ipi() unconditionally initializes "retu".  No functional
change intended.

Reviewed by:	jhb, corvink
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D38446
2023-02-09 16:14:33 -05:00
Mark Johnston
f3bbd0e818 vmm: Collapse identical case statements in vlapic_icrlo_write_handler()
No functional change intended.

Reviewed by:	jhb, corvink
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D38446
2023-02-09 16:14:33 -05:00
Dag-Erling Smørgrav
43d4680b39 MINIMAL: Update and clean up.
* Add GEOM_LABEL, required to boot a default UEFI install.

* Add enough of virtio to boot in bhyve.

* Reduce diff between amd64 and i386.

* Reduce diff to GENERIC.

MFC after:	1 week
Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D38468
2023-02-09 18:24:45 +01:00
Konstantin Belousov
ee84487120 amd64 ia32 vdso: always define some __vdso_ symbols
... regardless of the kernel config options.
It is reported that llvm16 ld.lld warns about undefined symbols
referenced by the VERSION script.

Reviewed by:	emaste, val_packett.cool
Discussed with:	jrtc27
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D38392
2023-02-09 04:36:40 +02:00
Mateusz Guzik
b2c68dc6d9 amd64: ansify
Reported by:    clang 15
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2023-02-07 22:52:06 +00:00
Mateusz Guzik
819ed47204 amd64 pmap: patch up a comment in pmap_init_pv_table
Requested by:	jhb
2023-02-06 22:33:28 +00:00
Yuri
e4d3f1e40a hv_hid: Hyper-V HID driver
Hyper-V HID driver using hidbus/hms.

Reviewed by:	wulf
MFC after:	1 week
PR:		221074
Differential revision:	https://reviews.freebsd.org/D38140
2023-02-05 18:32:08 +03:00
Elliott Mitchell
d27d543c78 vmm: purge EOL release compatibility
Remove FreeBSD 11 support

Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/603
Differential Revision: https://reviews.freebsd.org/D35560
2023-02-04 09:13:10 -07:00
Dmitry Chagin
ce20c00e85 linux(4): Remove stale comment that no longer applies.
MFC after:		1 week
2023-02-02 20:21:37 +03:00
Dmitry Chagin
6ad07a4b2b linux(4): Microoptimize rt_sendsig() on amd64.
Drop proc lock earlier, before copying user stuff.

Pointed out by:		kib
Reviewed by:		kib
Differential Revision:	https://reviews.freebsd.org/D38326
MFC after:		1 week
2023-02-02 20:21:37 +03:00
Dmitry Chagin
a95cb95e12 linux(4): Preserve fpu fxsave state across signal delivery on amd64.
PR:			240768
Reviewed by:		kib
Differential Revision:	https://reviews.freebsd.org/D38302
MFC after:		1 week
2023-02-02 20:21:37 +03:00
Dmitry Chagin
95b8603427 linux(4): Deduplicate linux_trans_osrel().
MFC after:		1 week
2023-02-02 17:58:07 +03:00
Dmitry Chagin
6039e966ff linux(4): Deduplicate linux_copyout_strings().
It is still present in the 32-bit Linuxulator on amd64.

MFC after:		1 week
2023-02-02 17:58:07 +03:00
Dmitry Chagin
9e550625f8 linux(4): Deduplicate linux_fixup_elf().
Use native routines to fix up the initial process stack. On Arm64
linux_elf_fixup() is a no-op, as the stack fixup (room for argc) is done in
linux_copyout_strings().

MFC after:		1 week
2023-02-02 17:58:07 +03:00
Dmitry Chagin
7446514533 linux(4): Microoptimize linux_elf.h for future use.
In order to reduce code duplication move coredump support definitions
into the appropriate header and hide private definitions.

MFC after:		1 week
2023-02-02 17:58:06 +03:00
Konstantin Belousov
2555f175b3 Move kstack_contains() and GET_STACK_USAGE() to MD machine/stack.h
Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D38320
2023-02-02 00:59:26 +02:00
Dmitry Chagin
575e48f1c4 linux(4): Deduplicate MI futex structures.
MFC after:	1 week
2023-02-01 21:57:04 +03:00
Dmitry Chagin
5c32146723 amd64: Eliminate write only cpu_fxsr.
Reviewed by:		kib
Differential Revision:	https://reviews.freebsd.org/D38289
MFC after:		1 week
2023-02-01 18:17:06 +03:00
Corvin Köhne
892feec221 vmm: avoid spurious rendezvous
A vcpu only checks if a rendezvous is in progress or not to decide if it
should handle a rendezvous. This could lead to spurious rendezvous where
a vcpu tries to handle a rendezvous it isn't part of. This situation is
properly handled by vm_handle_rendezvous but it could potentially
degrade the performance. Avoid that by an early check if the vcpu is
part of the rendezvous or not.

At the moment, rendezvous are only used to spin up application
processors and to send ioapic interrupts. Spinning up application
processors is done in the guest boot phase by sending INIT SIPI
sequences to single vcpus. This is known to cause spurious rendezvous
and only occurs in the boot phase. Sending ioapic interrupts is rare
because modern guests use MSI and the rendezvous is always sent to
all vcpus.
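
The early check described above amounts to testing set membership before entering the handler. This is a minimal sketch: the function and parameter names are hypothetical, and the real code works on a kernel CPU set in C rather than a Python set.

```python
def should_handle_rendezvous(vcpuid, rendezvous_active, rendezvous_cpus):
    """Return True only if a rendezvous is pending and targets this vcpu."""
    # Checking membership up front avoids entering the handler for a
    # rendezvous that this vcpu is not part of.
    return rendezvous_active and vcpuid in rendezvous_cpus


# A rendezvous aimed at vcpus 0 and 1 is skipped by vcpu 2.
print(should_handle_rendezvous(2, True, {0, 1}))
```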

Reviewed by:		jhb
MFC after:		1 week
Sponsored by:		Beckhoff Automation GmbH & Co. KG
Differential Revision:	https://reviews.freebsd.org/D37390
2023-02-01 12:36:36 +01:00
Eric Joyner
5354596764 vtd: Increase DRHD_MAX_UNITS
Observed on a couple Ice Lake-SP platforms (Intel Coyote Pass, Dell
R750), there are more than 8 DRHD sections enumerated in the DMAR ACPI
section.  Since the previous limit was 8, this resulted in some of these
not being parsed by vtd when the iommu is initialized; in this case when
PCI devices are being passthru'd to a bhyve VM.

This omission causes a kernel panic later in initialization when
devices could not be found in a valid DRHD scope because the DRHD
containing the device's scope was not added to vtd.

Signed-off-by: Eric Joyner <erj@FreeBSD.org>

PR:		268486
Sponsored by:	Intel Corporation
Reviewed by:	rew@, corvink@
MFC after:	1 day
Differential Revision:	https://reviews.freebsd.org/D38285
2023-01-31 13:57:42 -08:00
Konstantin Belousov
153643a5bc amd64: do not enable PKRU if user disabled saving PKRU register in xsave mask
This is done by reverting CR4_PKE bit, because we perform %CR4
initialization in initializecpu(), and the function is called before
xsave_mask is read.  To not redo the whole early initialization
sequence for the corner case, this should be good enough.

Reported by:	jhb
Reviewed by:	jhb, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D38219
2023-01-27 19:44:49 +02:00
Andrew Gallatin
9cb6ba29cb vm: centralize VM_BATCHQUEUE_SIZE definition
Remove the platform-specific definitions of VM_BATCHQUEUE_SIZE
for amd64 and powerpc64, and instead treat all 64-bit platforms
identically.  This has the effect of increasing the arm64
and riscv VM_BATCHQUEUE_SIZE to match that of other platforms.

Reviewed by: jhb, markj
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D37707
2023-01-21 14:30:00 -05:00
Robert Wing
27029bc08f vmm: fix use after free in ppt_detach()
The vmm module destroys the host_domain before unloading the ppt module
causing a use after free. This can happen when kldunload'ing vmm.

Reviewed by:	markj, jhb
Differential Revision:	https://reviews.freebsd.org/D38072
2023-01-20 11:25:27 +00:00
Robert Wing
c668e8173a vmm: take exclusive mem_segs_lock in vm_cleanup()
The consumers of vm_cleanup() are vm_reinit() and vm_destroy().

The vm_reinit() call path is, here vmmdev_ioctl() takes mem_segs_lock:
    vmmdev_ioctl()
    vm_reinit()
    vm_cleanup(destroy=false)

The call path for vm_destroy() is (mem_segs_lock not taken):
    sysctl_vmm_destroy()
    vmmdev_destroy()
    vm_destroy()
    vm_cleanup(destroy=true)

Fix this by taking mem_segs_lock in vm_cleanup() when destroy == true.
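
The two call paths can be modelled with a small stand-in lock. This is a hedged sketch only: `SxLock` is a hypothetical toy class, not the kernel sx(9) API, and the commit message does not say whether the real code drops the lock again, so the release here is for symmetry.

```python
class SxLock:
    """Minimal stand-in for an exclusive lock that tracks whether it is held."""

    def __init__(self):
        self.held = False

    def xlock(self):
        assert not self.held, "exclusive lock is already held"
        self.held = True

    def xunlock(self):
        assert self.held, "unlocking a lock that is not held"
        self.held = False


def vm_cleanup(mem_segs_lock, destroy):
    # Reinit path: the ioctl handler already holds the lock.
    # Destroy path: nobody holds it yet, so take it here.
    if destroy:
        mem_segs_lock.xlock()
    try:
        # The memory map is only touched with the lock held.
        assert mem_segs_lock.held
    finally:
        if destroy:
            mem_segs_lock.xunlock()


lock = SxLock()
vm_cleanup(lock, destroy=True)   # destroy path takes the lock itself
lock.xlock()
vm_cleanup(lock, destroy=False)  # reinit path relies on the caller's lock
lock.xunlock()
```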

Reviewed by:	corvink, markj, jhb
Fixes:  67b69e76e8 ("vmm: Use an sx lock to protect the memory map.")
Differential Revision:	https://reviews.freebsd.org/D38071
2023-01-20 11:10:53 +00:00
Robert Wing
ccf32a68f8 vmm: take exclusive mem_segs_lock when (un)assigning ppt dev
PR:             268744
Reported by:    mmatalka@gmail.com
Reviewed by:	corvink, markj, jhb
Fixes:  67b69e76e8 ("vmm: Use an sx lock to protect the memory map.")
Differential Revision:	https://reviews.freebsd.org/D37962
2023-01-20 10:03:59 +00:00
Gordon Bergling
05187f2ffc amd64: Fix a common typo in source code comments
- s/comparision/comparison/

MFC after:	3 days
2023-01-19 14:27:18 +01:00
Alexander V. Chernikov
692e19cf51 netlink: add netlink to GENERIC@amd64
Netlink is a communication protocol defined in RFC 3549. It is an async,
TLV-based protocol, providing 1-1 and 1-many communications between kernel
and userland. Netlink is currently used in Linux kernel to modify, read and
subscribe for nearly all networking states. Interface state, addresses, routes,
firewall, rules, fibs, etc, are controlled via Netlink.

Netlink support was added in D36002. It has gained a number of improvements and
its first consumers since then:
* net/bird2 got netlink support, enabling route multipath in FreeBSD
* netlink-based devd notifications are being worked on ( D37574 ).
* linux(4) fully supports and depends on Netlink

Enabling Netlink in GENERIC targets two goals.
The first one is to provide stability for the third-party userland applications,
so they can rely on the fact that netlink always exists since 14.0 and potentially 13.2.
A loadable module makes life harder for app developers. For example, `net/bird2` can be
built with either netlink or rtsock support, but not both.

The second goal is to enable gradual conversion of the base userland tools
to use netlink(4) interfaces. Converting tools like netstat (D36529), route,
ifconfig one-by-one simplifies testing and addressing the feedback.
Otherwise, switching all base to use netlink at once may be too big of a leap.

This change targets amd64; the other architectures will follow soon.

Differential Revision: https://reviews.freebsd.org/D37783
2023-01-13 10:22:40 +00:00
Konstantin Belousov
ad97b9bbfc amd64 pmap.h: make it easier to use the header for other consumers
Guard pmap_invlpg() definition with checks that only provide it when
both sys/pcpu.h and machine/cpufunc.h were already included.

Requested by:	Elliott Mitchell
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2023-01-06 01:30:29 +02:00
Konstantin Belousov
a2c08eba43 amd64: be more precise when enabling the AlderLake small core PCID workaround
In particular, do not enable the workaround if INVPCID is not supported
by the core.

Reported by:	"Chen, Alvin W" <Weike.Chen@Dell.com>
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D37940
2023-01-06 01:30:29 +02:00
Konstantin Belousov
231d75568f Move INVLPG to pmap_quick_enter_page() from pmap_quick_remove_page().
If processor prefetches neighboring TLB entries to the one being accessed
(as some have been reported to do), then the spin lock does not prevent
the situation described in the "AMD64 Architecture Programmer's Manual
Volume 2: System Programming" rev. 3.23, "7.3.1 Special Coherency
Considerations".

Reported and reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D37770
2023-01-01 00:09:46 +02:00
Konstantin Belousov
cde70e312c amd64: for small cores, use (big hammer) INVPCID_CTXGLOB instead of INVLPG
A hypothetical CPU bug makes invalidation of global PTEs using INVLPG
in pcid mode unreliable, it seems.  The workaround is applied for all
CPUs with small cores, since we do not know the scope of the issue, and
the right fix.

Reviewed by:	alc (previous version)
Discussed with:	emaste, markj
Tested by:	karels
PR:	261169, 266145
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D37770
2023-01-01 00:09:45 +02:00
Konstantin Belousov
45ac7755a7 amd64: identify small cores
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D37770
2023-01-01 00:09:45 +02:00
Andrew Gallatin
1cac76c93f vm: reduce lock contention when processing vm batchqueues
Rather than waiting until the batchqueue is full to acquire the lock &
process the queue, we now start trying to acquire the lock using trylocks
when the batchqueue is 1/2 full. This removes almost all contention on the
vm pagequeue mutex for our busy sendfile() based web workload.
It also greatly reduces the amount of time a network driver ithread
remains blocked on a mutex, and eliminates some packet drops under
heavy load.

So that the system does not lose the benefit of processing large
batchqueues, I've doubled the size of the batchqueues. This way, when
there is no contention, we process the same batch size as before.

This has been run for several months on a busy Netflix server, as well
as on my personal desktop.

Reviewed by: markj
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D37305
2022-12-14 14:34:07 -05:00
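The half-full trylock policy from 1cac76c93f above can be sketched as follows. This is an illustration under assumptions, not the vm_page code: BATCH_SIZE, struct pagequeue, struct batchq and the pq_*/bq_* helpers are hypothetical, and a C11 atomic_flag spin lock stands in for the page queue mutex.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define BATCH_SIZE 8	/* hypothetical capacity, not the kernel's value */

struct pagequeue {
	atomic_flag	lock;		/* stand-in for the pagequeue mutex */
	size_t		processed;	/* items drained so far */
};

struct batchq {
	int	items[BATCH_SIZE];
	size_t	count;
};

static bool
pq_trylock(struct pagequeue *pq)
{
	return (!atomic_flag_test_and_set_explicit(&pq->lock,
	    memory_order_acquire));
}

static void
pq_lock(struct pagequeue *pq)
{
	while (!pq_trylock(pq))
		;	/* spin; a real mutex would block instead */
}

static void
pq_unlock(struct pagequeue *pq)
{
	atomic_flag_clear_explicit(&pq->lock, memory_order_release);
}

/* Queue one item; returns true if this call drained the batch. */
static bool
bq_enqueue(struct pagequeue *pq, struct batchq *bq, int item)
{
	bq->items[bq->count++] = item;
	if (bq->count < BATCH_SIZE / 2)
		return (false);		/* below half: just batch */
	if (bq->count < BATCH_SIZE) {
		if (!pq_trylock(pq))
			return (false);	/* contended: keep batching */
	} else {
		pq_lock(pq);		/* full: must wait for the lock */
	}
	pq->processed += bq->count;	/* "process" the batch under the lock */
	bq->count = 0;
	pq_unlock(pq);
	return (true);
}
```

With no contention the batch is drained at BATCH_SIZE / 2 items, which is why doubling the queue size keeps the uncontended batch size the same as before the change.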
Alan Cox
f0878da03b pmap: standardize promotion conditions between amd64 and arm64
On amd64, don't abort promotion due to a missing accessed bit in a
mapping before possibly write protecting that mapping.  Previously,
in some cases, we might not repromote after madvise(MADV_FREE) because
there was no write fault to trigger the repromotion.  Conversely, on
arm64, don't pointlessly, yet harmlessly, write protect physical pages
that aren't part of the physical superpage.

Don't count aborted promotions due to explicit promotion prohibition
(arm64) or hardware errata (amd64) as ordinary promotion failures.

Reviewed by:	kib, markj
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D36916
2022-12-12 11:32:50 -06:00
John Baldwin
af3b48e101 vmm: Free vCPUs when destroying them.
Reported by:	andrew
Reviewed by:	corvink, andrew, markj
Differential Revision:	https://reviews.freebsd.org/D37649
2022-12-09 10:27:05 -08:00
John Baldwin
d212d6ebb4 vmm: Avoid infinite loop in vcpu_lock_all error case.
Reported by:	Coverity (CIDs 1501060,1501071)
Reviewed by:	corvink, markj, emaste
Differential Revision:	https://reviews.freebsd.org/D37648
2022-12-09 10:26:49 -08:00
John Baldwin
91980db1be vmm: Don't lock a vCPU for VM_PPTDEV_MSI[X].
These are manipulating state in a ppt(4) device none of which is
vCPU-specific.  Mark the vcpu fields in the relevant ioctl structures
as unused, but don't remove them for now.

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37639
2022-12-09 10:26:23 -08:00
John Baldwin
62be9ffd82 vmm: VM_GET/SET_KERNEMU_DEV should run with the vCPU locked.
Reviewed by:	corvink, kib, markj
Differential Revision:	https://reviews.freebsd.org/D37638
2022-12-09 10:25:30 -08:00
John Baldwin
1f6db5d6b5 vmm: Remove stale comment for vm_rendezvous.
Support for rendezvous outside of a vcpu context (vcpuid of -1) was
removed in commit 949f0f47a4, and the vm, vcpuid argument pair was
replaced by a single struct vcpu pointer in commit d8be3d523d.

Reported by:	andrew
2022-11-30 13:06:46 -08:00
Bjoern A. Zeeb
4a8e4d1546 net80211: fix IEEE80211_DEBUG_REFCNT builds
Remove the KPI/KBI changes from ieee80211_node.h and always use the
macros to pass in __func__ and __LINE__ to the functions.
The actual implementations are prefixed by "_" rather than suffixed
by "_debug" as they no longer are "debug"-specific.

Some of the select functions were not actually using the passed in
func, line options; however they are calling other functions which
use them.  Directly call the internal implementation in those cases
passing the arguments on.

Use a file-local __debrefcnt_used define to mark the arguments __unused
in cases when we compile without IEEE80211_DEBUG_REFCNT and hope the
toolchain is intelligent enough to not pass them at all in those cases.

Also _ieee80211_free_node() now has a conflict so make the previous
_ieee80211_free_node() the new __ieee80211_free_node().

Add IEEE80211_DEBUG_REFCNT to the NOTES file on amd64 to keep exercising
the option.

Sponsored by:	The FreeBSD Foundation
X-MFC:		never
Discussed on:	freebsd-wireless
Reviewed by:	adrian
Differential Revision: https://reviews.freebsd.org/D37529
2022-11-29 21:20:37 +00:00
Corvin Köhne
7c326ab5bb vmm: don't lock a mtx in the icr_low write handler
x2apic accesses are handled by a wrmsr exit. This handler is called in a
critical section. So, we can't lock a mtx in the icr_low handler.

Reported by:		kp, pho
Tested by:		kp, pho
Approved by:		manu (mentor)
Fixes:			c0f35dbf19 vmm: Use a cpuset_t for vCPUs waiting for STARTUP IPIs.
MFC after:		1 week
MFC with:		c0f35dbf19
Sponsored by:		Beckhoff Automation GmbH & Co. KG
Differential Revision:	https://reviews.freebsd.org/D37452
2022-11-23 09:00:04 +01:00
Corvin Köhne
fde8ce8892 vmm: remove unnecessary rendezvous assertion
When a vcpu sees that a rendezvous is in progress, it exits and tries to
handle the rendezvous. The vcpu doesn't check if it's part of the
rendezvous or not. If the vcpu isn't part of the rendezvous, the
rendezvous could be done before it reaches the assertion. This will
cause a panic.

The assertion isn't needed at all because vm_handle_rendezvous properly
handles a spurious rendezvous. So, we can just remove it.

PR:			267779
Reviewed by:		jhb, markj
Tested by:		bz
Approved by:		manu (mentor)
MFC after:		1 week
Sponsored by:		Beckhoff Automation GmbH & Co. KG
Differential Revision:	https://reviews.freebsd.org/D37417
2022-11-21 08:19:36 +01:00
Dmitry Chagin
2ee1a18d51 vmm: Fix build w/o KDTRACE_HOOKS.
Reviewed by:		imp
Differential revision:	https://reviews.freebsd.org/D37446
2022-11-20 18:00:55 +03:00
Cy Schubert
d487cba33d vmm: Fix non-INVARIANTS build
Reported by:	O. Hartmann <freebsd@walstatt-de.de>
Reviewed by:	jhb
Fixes:		58eefc67a1
Differential Revision:	https://reviews.freebsd.org/D37444
2022-11-18 13:20:13 -08:00
Mark Johnston
ca6b48f080 vmm: Restore the correct vm_inject_*() prototypes
Fixes:	80cb5d845b ("vmm: Pass vcpu instead of vm and vcpuid...")
Reviewed by:	jhb
Differential Revision:	https://reviews.freebsd.org/D37443
2022-11-18 14:11:48 -05:00
John Baldwin
49fd5115a9 vmm: Trim some pointless #ifdef KTR.
Reported by:	markj
Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37272
2022-11-18 10:25:39 -08:00
John Baldwin
ee98f99d7a vmm: Convert VM_MAXCPU into a loader tunable hw.vmm.maxcpu.
The default is now the number of physical CPUs in the system rather
than 16.

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37175
2022-11-18 10:25:39 -08:00
John Baldwin
98568a005a vmm: Allocate vCPUs on first use of a vCPU.
Convert the vcpu[] array in struct vm to an array of pointers and
allocate vCPUs on first use.  This avoids always allocating VM_MAXCPU
vCPUs for each VM, but instead only allocates the vCPUs in use.  A new
per-VM sx lock is added to serialize attempts to allocate vCPUs on
first use.  However, a given vCPU is never freed while the VM is
active, so the pointer is read via an unlocked read first to avoid the
need for the lock in the common case once the vCPU has been created.

Some ioctls need to lock all vCPUs.  To prevent races with ioctls that
want to allocate a new vCPU, these ioctls also lock the sx lock that
protects vCPU creation.

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37174
2022-11-18 10:25:38 -08:00
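The allocate-on-first-use scheme from 98568a005a above (unlocked read first, then a locked re-check before creating the vCPU) might look roughly like this in portable C11. All names here (MAXCPU, struct vcpu_sketch, struct vm_lazy, vm_alloc_vcpu_sketch()) are hypothetical, and an atomic_flag spin lock stands in for the per-VM sx lock.

```c
#include <stdatomic.h>
#include <stdlib.h>

#define MAXCPU 4	/* hypothetical limit for the sketch */

struct vcpu_sketch {
	int	vcpuid;
};

struct vm_lazy {
	atomic_flag			lock;	/* stand-in for the sx lock */
	_Atomic(struct vcpu_sketch *)	vcpu[MAXCPU];
};

static struct vcpu_sketch *
vm_alloc_vcpu_sketch(struct vm_lazy *vm, int vcpuid)
{
	struct vcpu_sketch *vcpu;

	if (vcpuid < 0 || vcpuid >= MAXCPU)
		return (NULL);

	/*
	 * Fast path: unlocked read.  This is only safe because a vCPU,
	 * once created, is never freed while the VM is active.
	 */
	vcpu = atomic_load_explicit(&vm->vcpu[vcpuid], memory_order_acquire);
	if (vcpu != NULL)
		return (vcpu);

	/* Slow path: serialize creation and re-check under the lock. */
	while (atomic_flag_test_and_set_explicit(&vm->lock,
	    memory_order_acquire))
		;
	vcpu = atomic_load_explicit(&vm->vcpu[vcpuid], memory_order_relaxed);
	if (vcpu == NULL) {
		vcpu = calloc(1, sizeof(*vcpu));
		if (vcpu != NULL) {
			vcpu->vcpuid = vcpuid;
			atomic_store_explicit(&vm->vcpu[vcpuid], vcpu,
			    memory_order_release);
		}
	}
	atomic_flag_clear_explicit(&vm->lock, memory_order_release);
	return (vcpu);
}
```

The release store paired with the acquire load on the fast path ensures a reader that sees the pointer also sees the initialized structure.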
John Baldwin
c0f35dbf19 vmm: Use a cpuset_t for vCPUs waiting for STARTUP IPIs.
Retire the boot_state member of struct vlapic and instead use a cpuset
in the VM to track vCPUs waiting for STARTUP IPIs.  INIT IPIs add
vCPUs to this set, and STARTUP IPIs remove vCPUs from the set.
STARTUP IPIs are only reported to userland for vCPUs that were removed
from the set.

In particular, this permits a subsequent change to allocate vCPUs on
demand when the vCPU may not be allocated until after a STARTUP IPI is
reported to userland.

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37173
2022-11-18 10:25:38 -08:00
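The INIT/STARTUP bookkeeping from c0f35dbf19 above can be modelled with a plain bit mask. This is a sketch under assumptions: a uint64_t replaces cpuset_t (so vcpuid must be below 64), and struct vm_ipi_sketch, handle_init_ipi() and handle_startup_ipi() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

struct vm_ipi_sketch {
	uint64_t	startup_cpus;	/* bit N set: vCPU N awaits STARTUP */
};

/* An INIT IPI adds the target vCPU to the waiting set. */
static void
handle_init_ipi(struct vm_ipi_sketch *vm, unsigned int vcpuid)
{
	vm->startup_cpus |= UINT64_C(1) << vcpuid;
}

/*
 * A STARTUP IPI removes the vCPU from the set.  Returns true only if the
 * vCPU was actually removed, i.e. the IPI should be reported to userland.
 */
static bool
handle_startup_ipi(struct vm_ipi_sketch *vm, unsigned int vcpuid)
{
	uint64_t bit = UINT64_C(1) << vcpuid;

	if ((vm->startup_cpus & bit) == 0)
		return (false);	/* no INIT seen, or already started */
	vm->startup_cpus &= ~bit;
	return (true);
}
```

Because the state lives in the VM rather than in a per-vCPU structure, it can be updated before the target vCPU has been allocated.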
John Baldwin
223de44c93 vmm devmem_mmap_single: Bump object reference under memsegs lock.
Reported by:	markj
Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37273
2022-11-18 10:25:38 -08:00
John Baldwin
67b69e76e8 vmm: Use an sx lock to protect the memory map.
Previously bhyve obtained a "read lock" on the memory map for ioctls
needing to read the map by locking the last vCPU.  This is now
replaced by a new per-VM sx lock.  Modifying the map requires
exclusively locking the sx lock as well as locking all existing vCPUs.
Reading the map requires either locking one vCPU or the sx lock.

This permits safely modifying or querying the memory map while some
vCPUs do not exist which will be true in a future commit.

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37172
2022-11-18 10:25:38 -08:00
John Baldwin
08ebb36076 vmm: Destroy mutexes.
Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37171
2022-11-18 10:25:38 -08:00
John Baldwin
d5118d0fc4 vmm stat: Add a special nelems constant for arrays sized by vCPU count.
Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37170
2022-11-18 10:25:38 -08:00
John Baldwin
58eefc67a1 vmm vmx: Allocate vpids on demand as each vCPU is initialized.
Compared to the previous version this does mean that if the system as
a whole runs out of dedicated vPIDs you might end up with some vCPUs
within a single VM using dedicated vPIDs and others using shared
vPIDs, but this should not break anything.

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37169
2022-11-18 10:25:38 -08:00
John Baldwin
3f0f4b1598 vmm: Lookup vcpu pointers in vmmdev_ioctl.
Centralize mapping vCPU IDs to struct vcpu objects in vmmdev_ioctl and
pass vcpu pointers to the routines in vmm.c.  For operations that want
to perform an action on all vCPUs or on a single vCPU, pass pointers
to both the VM and the vCPU using a NULL vCPU pointer to request
global actions.

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37168
2022-11-18 10:25:38 -08:00
John Baldwin
0cbc39d53d vmm ppt: Remove unused vcpu arg from MSI setup handlers.
Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37167
2022-11-18 10:25:37 -08:00
John Baldwin
e42c24d56b vmm: Remove unused vcpuid argument from vioapic_process_eoi.
Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37166
2022-11-18 10:25:37 -08:00
John Baldwin
d8be3d523d vmm: Use struct vcpu in the rendezvous code.
Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37165
2022-11-18 10:25:37 -08:00
John Baldwin
949f0f47a4 vmm: Remove support for vm_rendezvous with a cpuid of -1.
This is not currently used.

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37164
2022-11-18 10:25:37 -08:00
John Baldwin
9388bc1e3a vmm: Remove vcpuid from I/O port handlers.
No I/O ports are vCPU-specific (unlike memory which does have
vCPU-specific ranges such as the local APIC).

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37163
2022-11-18 10:25:37 -08:00
John Baldwin
80cb5d845b vmm: Pass vcpu instead of vm and vcpuid to APIs used from CPU backends.
Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37162
2022-11-18 10:25:37 -08:00
John Baldwin
d3956e4673 vmm: Use struct vcpu in the instruction emulation code.
This passes struct vcpu down in place of struct vm and an integer
vcpu index through the in-kernel instruction emulation code.  To
minimize userland disruption, helper macros are used for the vCPU
arguments passed into and through the shared instruction emulation
code.

A few other APIs used by the instruction emulation code have also been
updated to accept struct vcpu in the kernel including
vm_get/set_register and vm_inject_fault.

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37161
2022-11-18 10:25:37 -08:00
John Baldwin
28b561ad9d vmm: Add vm_gpa_hold_global wrapper function.
This handles the case that guest pages are being held not on behalf of
a virtual CPU but globally.  Previously this was handled by passing a
vcpuid of -1 to vm_gpa_hold, but that will not work in the future when
vm_gpa_hold is changed to accept a struct vcpu pointer.

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37160
2022-11-18 10:25:36 -08:00
John Baldwin
0f435e6476 vmm: Add _KERNEL guards for io headers shared with userspace.
Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37159
2022-11-18 10:25:36 -08:00
John Baldwin
2b4fe856f4 bhyve: Remove unused vm and vcpu arguments from vm_copy routines.
The arguments identifying the VM and vCPU are only needed for
vm_copy_setup.

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37158
2022-11-18 10:25:36 -08:00
John Baldwin
3dc3d32ad6 vmm: Use struct vcpu with the vmm_stat API.
The function callbacks still use struct vm and a vCPU index.

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37157
2022-11-18 10:25:36 -08:00
John Baldwin
950af9ffc6 vmm: Expose struct vcpu as an opaque type.
Pass a pointer to the current struct vcpu to the vcpu_init callback
and save this pointer in the CPU-specific vcpu structures.

Add routines to fetch a struct vcpu by index from a VM and to query
the VM and vcpuid from a struct vcpu.

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37156
2022-11-18 10:25:36 -08:00
John Baldwin
d030f941e6 vmm: Use VLAPIC_CTR* in more places.
Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37155
2022-11-18 10:25:36 -08:00
John Baldwin
57e0119ef3 vmm vmx: Add VMX_CTR* wrapper macros.
These macros are similar to VCPU_CTR* but accept a single vmx_vcpu
pointer as the first argument instead of separate vm and vcpuid.

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37154
2022-11-18 10:25:36 -08:00
John Baldwin
fca494dad0 vmm svm: Add SVM_CTR* wrapper macros.
These macros are similar to VCPU_CTR* but accept a single svm_vcpu
pointer as the first argument instead of separate vm and vcpuid.

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37153
2022-11-18 10:25:36 -08:00
John Baldwin
869c8d1946 vmm: Remove the per-vm cookie argument from vmmops taking a vcpu.
This requires storing a reference to the per-vm cookie in the
CPU-specific vCPU structure.  Take advantage of this new field to
remove no-longer-needed function arguments in the CPU-specific
backends.  In particular, stop passing the per-vm cookie to functions
that either don't use it or only use it for KTR traces.

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37152
2022-11-18 10:25:35 -08:00
John Baldwin
1aa5150479 vmm: Refactor storage of CPU-dependent per-vCPU data.
Rather than storing static arrays of per-vCPU data in the CPU-specific
per-VM structure, adopt a more dynamic model similar to that used to
manage CPU-specific per-VM data.

That is, add new vmmops methods to init and cleanup a single vCPU.
The init method returns a pointer that is stored in 'struct vcpu' as a
cookie pointer.  This cookie pointer is now passed to other vmmops
callbacks in place of the integer index.  The index is now only used
in KTR traces and when calling back into the CPU-independent layer.

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37151
2022-11-18 10:25:35 -08:00
John Baldwin
73abae4493 vmm vmx: Add a global bool to indicate if the host has the TSC_AUX MSR.
A future commit will remove direct access to vCPU structures from
struct vmx, so add a dedicated boolean for this rather than checking
the capabilities for vCPU 0.

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37269
2022-11-18 10:25:35 -08:00
John Baldwin
39ec056e6d vmm: Rework snapshotting of CPU-specific per-vCPU data.
Previously some per-vCPU state was saved in vmmops_snapshot and other
state was saved in vmmops_vcmx_snapshot.  Consolidate all per-vCPU
state into the latter routine and rename the hook to the more generic
'vcpu_snapshot'.  Note that the CPU-independent per-vCPU data is still
stored in a separate blob as well as the per-vCPU local APIC data.

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37150
2022-11-18 10:25:35 -08:00
John Baldwin
19b9dd2e08 vmm svm: Mark all VMCB state caches dirty on vCPU restore.
Mark Johnston noticed that this was missing VMCB_CACHE_LBR.  Just set
all the bits as is done in svm_run() rather than trying to clear
individual bits.

Reported by:	markj
Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37259
2022-11-18 10:25:35 -08:00
John Baldwin
0f00260c67 vmm vmx: Refactor per-vCPU data.
Add a struct vmx_vcpu to hold per-vCPU data specific to VT-x and
move parallel arrays out of struct vmx into a single array of
this structure.

While here, dynamically allocate the VMCS, APIC page and PIR
descriptors for each vCPU rather than embedding them.

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37149
2022-11-18 10:25:35 -08:00
John Baldwin
215d2fd53f vmm svm: Refactor per-vCPU data.
- Allocate VMCBs separately to avoid excessive padding in struct
  svm_vcpu.

- Allocate APIC pages dynamically directly in struct vlapic.

- Move vm_mtrr into struct svm_vcpu rather than using a separate
  parallel array.

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37148
2022-11-18 10:25:35 -08:00
John Baldwin
35abc6c238 vmm: Use vm_get_maxcpus() instead of VM_MAXCPU in various places.
Mostly these are loops that iterate over all possible vCPU IDs for a
specific virtual machine.

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37147
2022-11-18 10:25:34 -08:00
John Baldwin
a7db532e3a vmm: Simplify saving of absolute TSC values in snapshots.
Read the current "now" TSC value and use it to compute the absolute
saved time value in vm_snapshot_vcpus rather than iterating over vCPUs
multiple times in vm_snapshot_vm.

Reviewed by:	corvink, markj
Differential Revision:	https://reviews.freebsd.org/D37146
2022-11-18 10:25:34 -08:00
Mateusz Guzik
c3f1a13902 Retire broken GPROF support from the kernel
The option is not even recognized and with that patched it does not
compile. Even if it did work, it would be prohibitively expensive to
use.

Interested parties can use pmcstat or dtrace instead.
2022-11-15 14:17:10 +00:00
Mark Johnston
8b1adff8bc bhyve: Drop volatile qualifiers from snapshot code
They accomplish nothing since the qualifier is cast away in calls to
memcpy() and copyin()/copyout().  No functional change intended.

MFC after:	2 weeks
Reviewed by:	corvink, jhb
Differential Revision:	https://reviews.freebsd.org/D37292
2022-11-11 10:02:26 -05:00
Elliott Mitchell
ccd9b49f20 sys: use .S for assembly language files that use the preprocessor
Reviewed by:	imp
Pull Request:	https://github.com/freebsd/freebsd-src/pull/609
Differential Revision: https://reviews.freebsd.org/D35908
2022-11-02 10:29:00 -04:00
Konstantin Belousov
4d447b30f7 vmm: do not leak halted_cpus bit after suspension
Reported by:	bz
PR:	267468
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D37227
2022-11-01 20:44:42 +02:00
Mitchell Horne
aba921bd9e ddb: print the actual syscall name
Some architectures will pretty-print a system call trap in the
backtrace. Rather than printing the symbol, use the syscallname()
function to pull the string from the sv_syscallnames array corresponding
to the process. This simplifies the function somewhat.

Mostly, this will result in dropping the "sys" prefix, e.g. "sys_exit"
will now be printed simply as "exit".

Make two minor tweaks to the function signature: use a u_int for the
syscall number since this is a more correct type (see the 'code' member
of struct syscall_args), and make the thread pointer the first argument.
The latter is more natural and conventional.

Suggested by:   jrtc27
Reviewed by:	jrtc27, markj, jhb
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D37200
2022-10-28 18:21:08 -03:00
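The table lookup that aba921bd9e above describes (index a per-ABI name array by syscall number) can be sketched like this. It is an assumption-laden illustration: struct sysent_sketch is a reduced, hypothetical stand-in for the sysentvec, the sv_size bound and the "unknown" fallback are assumptions of the sketch, and syscallname_sketch() is not the kernel's syscallname().

```c
#include <stddef.h>

/* Hypothetical, reduced stand-in for struct sysentvec. */
struct sysent_sketch {
	unsigned int	  sv_size;		/* number of table entries */
	const char	**sv_syscallnames;	/* may be NULL for an ABI */
};

/*
 * Map a syscall number to its name.  The code is unsigned, so a single
 * upper-bound comparison is enough to reject out-of-range values.
 */
static const char *
syscallname_sketch(const struct sysent_sketch *sv, unsigned int code)
{
	if (sv->sv_syscallnames == NULL || code >= sv->sv_size)
		return ("unknown");
	return (sv->sv_syscallnames[code]);
}
```

Looking names up in a table rather than resolving the handler's symbol is what makes the output read "exit" instead of "sys_exit".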
Mitchell Horne
1da65dcb1c linux: populate sv_syscallnames in each sysentvec
This allows the syscallname() function to give a usable result for Linux
ABIs.

Reported by:	jrtc27
Reviewed by:	jrtc27, markj, jhb
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D37199
2022-10-28 18:21:08 -03:00
Jung-uk Kim
19ee8335c5 acpica: Merge ACPICA 20221020 2022-10-27 22:04:32 -04:00
John Baldwin
769b884e2e vmm: Fix AP startup with old userspace binaries.
Older binaries that do not request IPI exits to userspace do not
start user threads for other vCPUs until a STARTUP IPI triggers a
VM_EXITCODE_SPINUP_AP exit to userland.  This means that those vcpus
are not yet active (in terms of vm_active_cpus) when the INIT and
STARTUP IPIs are delivered to the vCPUs.

The changes in commit 0bda8d3e9f changed the INIT and STARTUP IPIs
to reuse the existing vlapic_calcdest() function.  This function
silently ignores IPIs sent to inactive vCPUs.  As a result, when using
an old bhyve binary, the INIT and STARTUP IPIs sent to wakeup APs were
ignored.

To fix, restructure the compat code for the INIT and STARTUP IPIs to
ignore the results of vlapic_calcdest() and manually parse the APIC ID
and resulting vcpuid.  As part of this, make the compat code always
conditional on the ipi_exit capability being disabled.

Reviewed by:	c.koehne_beckhoff.com, markj
Differential Revision:	https://reviews.freebsd.org/D37093
2022-10-26 14:22:56 -07:00
Mark Johnston
ed72168431 bhyve: Address some signed/unsigned comparison warnings
MFC after:	1 week
2022-10-25 11:16:57 -04:00
Konstantin Belousov
934bfc128e Add vm_page_any_valid()
Use it and several other vm_page_*_valid() functions in more places.

Suggested and reviewed by:	markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D37024
2022-10-19 20:24:07 +03:00
Colin Percival
469ad86031 amd64: Add FIRECRACKER kernel configuration
This kernel configuration supports the Firecracker VMM environment.

Relnotes:	FreeBSD can now run inside the Firecracker VMM
		via the amd64 FIRECRACKER kernel configuration.
Sponsored by:	https://www.patreon.com/cperciva
Differential Revision:	https://reviews.freebsd.org/D36672
2022-10-17 23:02:22 -07:00
Corvin Köhne
2a2a64c4b9 vmm: validate icr value
Not all combinations of icr values are allowed. Neither Intel nor AMD
document what happens when an invalid value is written to the icr.
Ignore the IPI. So, the guest will note that the IPI wasn't delivered.

Reviewed by:		jhb
Differential Revision:  https://reviews.freebsd.org/D36946
Sponsored by:           Beckhoff Automation GmbH & Co. KG
2022-10-14 12:03:05 +02:00
Corvin Köhne
f56801d6d9 vmm: increase vlapic version
macOS panics on APIC versions lower than 0x14.

See https://opensource.apple.com/source/xnu/xnu-7195.81.3/osfmk/i386/lapic_native.c.auto.html

Additionally, an upcoming commit will validate the icr values written by
the guest. Older Intel processors allow some different combinations than
the newer ones. AMD documents that only the newer combinations are
allowed. So, bumping the version allows us to avoid a differentiation
between AMD and Intel.

Intel documents that newer processors than the P6 are using the new
combinations. Sadly, Intel does not document which apic version belongs
to those processors. Linux identifies newer APICs by a version greater than or
equal to 0x14. Intel and AMD allow APIC versions between 0x10 and 0x15.
So, using 0x14 seems to be fine.

See 3eba620e7b/arch/x86/kernel/apic/apic.c (L238)

Reviewed by:		jhb
Differential Revision:  https://reviews.freebsd.org/D36945
Sponsored by:           Beckhoff Automation GmbH & Co. KG
2022-10-14 12:03:05 +02:00
Corvin Köhne
0bda8d3e9f vmm: permit some IPIs to be handled by userspace
Add VM_EXITCODE_IPI to permit returning unhandled IPIs to userland.
INIT and STARTUP IPIs are now returned to userland. Due to backward
compatibility reasons, a new capability is added for enabling
VM_EXITCODE_IPI.

Reviewed by:		jhb
Differential Revision:  https://reviews.freebsd.org/D35623
Sponsored by:           Beckhoff Automation GmbH & Co. KG
2022-10-14 12:03:05 +02:00
Konstantin Belousov
e0612ed490 amd64 pmap: add comment explaining why INVLPG is functional for PCID config
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D36919
2022-10-11 00:33:17 +03:00
Konstantin Belousov
273d0715f6 amd64: remove useless addr2 variables in page range invalidation handlers
Reviewed by:	alc, markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D36919
2022-10-11 00:33:12 +03:00
Mark Johnston
98d920d9cf bhyve: Annotate unused function parameters
MFC after:	1 week
2022-10-08 11:33:21 -04:00
John Baldwin
4d90a5afc5 sys: Consolidate common implementation details of PV entries.
Add a <sys/_pv_entry.h> intended for use in <machine/pmap.h> to
define struct pv_entry, pv_chunk, and related macros and inline
functions.

Note that powerpc does not yet use this as while the mmu_radix pmap
in powerpc uses the new scheme (albeit with fewer PV entries in a
chunk than normal due to an unused pv_pmap field in struct pv_entry),
the Book-E pmaps for powerpc use the older style PV entries without
chunks (and thus require the pv_pmap field).

Suggested by:	kib
Reviewed by:	kib
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D36685
2022-10-07 10:14:03 -07:00
Mitchell Horne
b05b1ecbef amd64, arm64 pmap: fix a comment typo
There is no such error code.

Fixes:	1d5ebad06c ("pmap: optimize MADV_WILLNEED on existing superpages")
2022-10-06 19:04:54 -03:00
Konstantin Belousov
85b715baae amd64/db_trace.c: remove stray prototype
Sponsored by:	NVIDIA networking
MFC after:	1 week
2022-10-04 01:50:30 +03:00
Mitchell Horne
754cb545b6 ddb: de-duplicate decode_syscall()
Only i386 and amd64 print the decoded syscall name in the backtrace.
This de-duplication facilitates further changes and adoption by other
platforms.

Reviewed by:	jrtc27, markj, jhb
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D36565
2022-10-03 13:49:54 -03:00
Alan Cox
1d5ebad06c pmap: optimize MADV_WILLNEED on existing superpages
Specifically, avoid pointless calls to pmap_enter_quick_locked() when
madvise(MADV_WILLNEED) is applied to an existing superpage mapping.

Reported by:	mhorne
Reviewed by:	kib, markj
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D36801
2022-09-30 12:14:05 -05:00
John Baldwin
a35572b16e linux32: binutils as requires %eflags instead of %flags for CFI.
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D36781
2022-09-29 15:06:01 -07:00
Konstantin Belousov
648fa3558c amd64: Initialize IPI scoreboard earlier
The scoreboard is needed the moment smp_started == true.  If some kernel
daemon thread is started before the scoreboard is inited, and does some pmap
operation that requires TLB maintenance, which races with SMP startup,
we might dereference NULL invl_scoreboard.  This is particularly easy
to trigger when EARLY_AP_STARTUP is not defined.

Reported by:	glebius
Reviewed by:	markj
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D36766
2022-09-28 16:23:52 +03:00
Mark Johnston
4551cbbe99 amd64: Ignore 1GB mappings in pmap_advise()
This assertion can be triggered by usermode since vm_map_madvise()
doesn't force advice to be applied to an entire largepage mapping.  I
can't see any reason not to permit it, however, since MADV_DONTNEED and
_FREE are advisory and we can simply do nothing when a 1GB mapping is
encountered.

Reviewed by:	alc, kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D36675
2022-09-24 09:28:41 -04:00
Mark Johnston
6c2e9f4c32 amd64: Handle 1GB mappings in pmap_enter_quick_locked()
This code path can be triggered by applying MADV_WILLNEED to a 1GB
mapping.

Reviewed by:	alc, kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D36674
2022-09-24 09:28:41 -04:00
Mark Johnston
0b29f5efcc amd64: Make it possible to grow the KERNBASE region of KVA
pmap_growkernel() may be called when mapping a region above KERNBASE,
typically for a kernel module.  If we have enough PTPs left over from
bootstrap, pmap_growkernel() does nothing.  However, it's possible to
run out, and in this case pmap_growkernel() will try to grow the kernel
map all the way from kernel_vm_end to somewhere past KERNBASE, which can
easily run the system out of memory.  This happens with large kernel
modules such as the nvidia GPU driver.  There is also a WIP dtrace
provider which needs to map KVA in the region above KERNBASE (to provide
trampolines which allow a copy of a traced kernel instruction to be
executed), and its allocations could potentially trigger this scenario.

This change modifies pmap_growkernel() to manage the two regions
separately, allowing them to grow independently.  The end of the
KERNBASE region is tracked by modifying "nkpt".

PR:		265019
Reviewed by:	alc, imp, kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D36673
2022-09-24 09:27:50 -04:00
John Baldwin
f49fd63a6a kmem_malloc/free: Use void * instead of vm_offset_t for kernel pointers.
Reviewed by:	kib, markj
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D36549
2022-09-22 15:09:19 -07:00
John Baldwin
7ae99f80b6 pmap_unmapdev/bios: Accept a pointer instead of a vm_offset_t.
This matches the return type of pmap_mapdev/bios.

Reviewed by:	kib, markj
Sponsored by:	DARPA
Differential Revision:	https://reviews.freebsd.org/D36548
2022-09-22 15:08:52 -07:00
Richard Scheffenegger
bb1d472d79 tcp: make CUBIC the default congestion control mechanism.
This changes the default TCP Congestion Control (CC) to CUBIC.
For small, transactional exchanges (e.g. web objects <15kB), this
will not have a material effect. However, for long-duration data
transfers, CUBIC obtains a slightly higher fraction of the
available bandwidth when competing against NewReno CC.

Reviewed By: tuexen, mav, #transport, guest-ccui, emaste
Relnotes: Yes
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D36537
2022-09-13 12:09:21 +02:00
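
bb1d472d79 changes only the default.  As a hedged sketch, an application that wants a particular algorithm regardless of the default can query or select one per socket with the TCP_CONGESTION socket option.  The helper names get_cc_name() and set_cc_name() are hypothetical, and whether a given algorithm name is accepted depends on what the running kernel provides.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>

/*
 * Hypothetical helper: copy the name of the congestion control
 * algorithm in use on TCP socket `fd` into `buf` (at least 1 byte).
 * Returns 0 on success, -1 on failure.
 */
int
get_cc_name(int fd, char *buf, socklen_t buflen)
{
	socklen_t len = buflen;

	memset(buf, 0, buflen);
	if (getsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, buf, &len) != 0)
		return (-1);
	buf[buflen - 1] = '\0';		/* Guarantee termination. */
	return (0);
}

/*
 * Hypothetical helper: request algorithm `name` (e.g. "cubic" or
 * "newreno") for TCP socket `fd`.  Fails if the kernel does not have
 * that algorithm available.  Returns the setsockopt() result.
 */
int
set_cc_name(int fd, const char *name)
{
	return (setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, name,
	    (socklen_t)strlen(name)));
}
```

On a freshly created TCP socket, get_cc_name() reports whatever the system default is at that time.  The system-wide default itself is controlled by the net.inet.tcp.cc.algorithm sysctl.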
Alan Cox
8d7ee2047c pmap: don't recompute mpte during promotion
When attempting to promote 4KB user-space mappings to a 2MB user-space
mapping, the address of the struct vm_page representing the page table
page that contains the 4KB mappings is already known to the caller.
Pass that address to the promotion function rather than making the
promotion function recompute it, which on arm64 entails iteration over
the vm_phys_segs array by PHYS_TO_VM_PAGE().  And, while I'm here,
eliminate unnecessary arithmetic from the calculation of the first PTE's
address on arm64.

MFC after:	1 week
2022-09-11 01:19:22 -05:00
Emmanuel Vadot
3fc174845c Revert "vmm: permit some IPIs to be handled by userspace"
This reverts commit a5a918b7a9.

This causes problems with VMs using bhyveload.

Reported by:	pho, kp
2022-09-09 15:55:01 +02:00
Emmanuel Vadot
83b65d0ae1 Revert "vmm: Remove unneeded variable maxcpus"
This reverts commit 653c36179d.
2022-09-09 15:54:56 +02:00