mirror of https://git.FreeBSD.org/src.git synced 2024-11-28 08:02:54 +00:00
Commit Graph

9310 Commits

Author SHA1 Message Date
John Baldwin
a80b9ee15a atomic(9): Implement atomic_testand(clear|set)_ptr
For current architectures, these are just aliases for the existing
operation on the relevant scalar integer.

Reviewed by:	imp, kib
Obtained from:	CheriBSD
Sponsored by:	AFRL, DARPA
Differential Revision:	https://reviews.freebsd.org/D47631
2024-11-19 10:24:50 -05:00
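A rough userspace sketch of what "aliases for the existing operation on the relevant scalar integer" could look like, written with C11 atomics rather than the kernel's atomic(9) primitives. All `sketch_*` names are hypothetical and are not the identifiers added by a80b9ee15a; the sketch only assumes the usual test-and-set contract of returning whether the bit was previously set.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdint.h>

/* The alias only makes sense if a pointer fits in the scalar type. */
static_assert(sizeof(uintptr_t) >= sizeof(void *), "uintptr_t too small");

/* Build a single-bit mask, wrapping the bit index to the word width. */
static uintptr_t
sketch_bitmask(unsigned int bit)
{
	return ((uintptr_t)1 << (bit % (sizeof(uintptr_t) * CHAR_BIT)));
}

/* Atomically set a bit; return nonzero if it was already set. */
static int
sketch_testandset_uintptr(_Atomic uintptr_t *p, unsigned int bit)
{
	uintptr_t mask = sketch_bitmask(bit);

	return ((atomic_fetch_or(p, mask) & mask) != 0);
}

/* Atomically clear a bit; return nonzero if it was previously set. */
static int
sketch_testandclear_uintptr(_Atomic uintptr_t *p, unsigned int bit)
{
	uintptr_t mask = sketch_bitmask(bit);

	return ((atomic_fetch_and(p, ~mask) & mask) != 0);
}

/* Hypothetical "_ptr" variants: plain aliases for the scalar versions. */
#define	sketch_testandset_ptr	sketch_testandset_uintptr
#define	sketch_testandclear_ptr	sketch_testandclear_uintptr
```

On an architecture where pointers carry extra metadata, the `_ptr` variants would presumably need a distinct implementation, which is why a separate name is useful even when it is only an alias today.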
Mark Johnston
d70230783a vmm: Postpone vmm module initialization to after SI_SUB_DEVFS
vmmops_modinit() needs to create a device file, and this must happen
after SI_SUB_DEVFS.  On non-EARLY_AP_STARTUP platforms (i.e., !x86) this
happens already by accident, but we shouldn't rely on it.

On riscv, remove the current SI_SUB_SMP ordering since that was copied
from arm64 and isn't needed.  In particular, riscv's vmmops_modinit()
does not call smp_rendezvous().

Reported by:	Oleksandr Kryvulia <shuriku@shurik.kiev.ua>
Fixes:	a97f683fe3 ("vmm: Add a device file interface for creating and destroying VMs")
2024-11-07 20:38:38 +00:00
Mark Johnston
a97f683fe3 vmm: Add a device file interface for creating and destroying VMs
This supersedes the sysctl interface, which has the limitations of being
root-only and not supporting automatic resource destruction, i.e., we
cannot easily destroy VMs automatically when bhyve terminates.

For now, two ioctls are implemented: VMMCTL_VM_CREATE and
VMMCTL_VM_DESTROY.  Eventually I would like to support tying a VM's
lifetime to that of the descriptor, so that it is automatically
destroyed when the descriptor is closed.  However, this will require
some work in bhyve: when the guest wants to reboot, bhyve exits with a
status that indicates that it is to be restarted.  This is incompatible
with the idea of tying a VM's lifetime to that of a descriptor, since we
want to avoid creating and destroying a VM across each reboot (as this
involves freeing all of the guest memory, among other things).  One
possible design would be to decompose bhyve into two processes, a parent
which handles reboots, and a child which runs in capability mode and
handles guest execution.

In any case, this gets us closer to addressing the shortcomings
mentioned above.

Reviewed by:	jhb
Differential Revision:	https://reviews.freebsd.org/D47028
2024-11-05 01:40:41 +00:00
Mark Johnston
f95acbd89d vmm: Rename the amdiommu driver to amdviiommu
To avoid a conflict with the new amdiommu driver imported recently.

Fixes:		0f5116d7ef ("AMD IOMMU driver")
Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D47415
2024-11-04 14:45:57 +00:00
John Baldwin
df61573596 x86: Remove invalid DEVMETHOD methods for leaf devices
None of these drivers are for bus devices, so bus_generic_* is not
appropriate.  Most of these were nops except that detach would
actually "succeed" (but not do any cleanup).

Reviewed by:	imp
Differential Revision:	https://reviews.freebsd.org/D47374
2024-11-01 10:10:30 -04:00
Ruslan Bukin
72ae04c733 vmm: fix vcpu atomic load
Load vcpu with acquire semantics, as there is a critical code
section between creating the vcpu and using it.

Tested on risc-v only.

Pointed out by:	markj
Reviewed by: jhb, markj
Differential Revision: https://reviews.freebsd.org/D47306
2024-10-29 16:19:49 +00:00
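The acquire load described in 72ae04c733 pairs with a release store on the creating side. A minimal userspace sketch of that pairing with C11 atomics follows; the structure and function names are hypothetical and the real vmm code uses the kernel's atomic(9) interfaces instead.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a vcpu structure. */
struct sketch_vcpu {
	int	id;
	int	state;
};

/* Shared slot; static storage starts out as a null pointer. */
static _Atomic(struct sketch_vcpu *) sketch_slot;

/*
 * Creator: initialize every field first, then publish the pointer with
 * release semantics so the initialization cannot be reordered after it.
 */
static void
sketch_publish(struct sketch_vcpu *v, int id)
{
	assert(v != NULL);
	v->id = id;
	v->state = 1;
	atomic_store_explicit(&sketch_slot, v, memory_order_release);
}

/*
 * User: load the pointer with acquire semantics.  If the result is
 * non-NULL, the fields written before the release store are visible.
 */
static struct sketch_vcpu *
sketch_lookup(void)
{
	return (atomic_load_explicit(&sketch_slot, memory_order_acquire));
}
```

With a relaxed load in `sketch_lookup()`, a weakly ordered CPU could in principle observe the pointer before the fields it guards, which is the hazard the commit message describes.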
Brooks Davis
76ab72e828 sysent: regen for typo fix
2024-10-22 19:21:26 +01:00
Konstantin Belousov
6244b9dc4a la57: explain how the trampoline works
Reviewed by:	markj, imp (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
Differential revision:	https://reviews.freebsd.org/D47208
2024-10-20 20:00:23 +03:00
Pierre Pronchery
d19fa9c1b7 vmm: avoid potential KASSERT kernel panic in vm_handle_db
If the guest VM emits the exit code VM_EXITCODE_DB the kernel will
execute the function named vm_handle_db.

If the value of rsp is not page aligned and if rsp+sizeof(uint64_t)
spans across two pages, the function vm_copy_setup will need two structs
vm_copyinfo to prepare the copy operation.

For instance, if the rsp value is 0xFFC, two vm_copyinfo objects are needed:

* address=0xFFC, len=4
* address=0x1000, len=4

The vulnerability was addressed by commit 51fda658ba ("vmm: Properly
handle writes spanning across two pages in vm_handle_db").  Still,
replace the KASSERT with an error return as a more defensive approach.

Reported by:    Synacktiv
Reviewed by:	markj, emaste
Security:       HYP-09
Sponsored by:   The Alpha-Omega Project
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D46133
2024-10-02 12:58:45 -04:00
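The page-spanning case in d19fa9c1b7 can be illustrated with a small standalone sketch that splits a write into page-bounded segments and returns an error, rather than asserting, when the caller did not provide enough segment slots. The names (`sketch_split`, `struct sketch_seg`) and the fixed 4096-byte page size are assumptions for illustration; this is not the vm_copy_setup() implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	SKETCH_PAGE_SIZE	4096u	/* assumed page size */

/* Hypothetical analogue of one copy descriptor. */
struct sketch_seg {
	uint64_t	addr;
	size_t		len;
};

/*
 * Split [addr, addr + len) into segments that never cross a page
 * boundary.  Returns the number of segments used, or -1 if more than
 * 'max' segments would be needed (an error return, not an assertion).
 */
static int
sketch_split(uint64_t addr, size_t len, struct sketch_seg *segs, int max)
{
	int n = 0;

	assert(segs != NULL);
	while (len > 0) {
		/* Bytes left before the end of the current page. */
		size_t room = SKETCH_PAGE_SIZE -
		    (size_t)(addr & (SKETCH_PAGE_SIZE - 1));
		size_t chunk = len < room ? len : room;

		if (n == max)
			return (-1);
		segs[n].addr = addr;
		segs[n].len = chunk;
		n++;
		addr += chunk;
		len -= chunk;
	}
	return (n);
}
```

For an 8-byte write at 0xFFC this yields the two segments listed in the commit message, and a caller that passes room for only one segment gets -1 back instead of tripping an assertion.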
Bojan Novković
51fda658ba vmm: Properly handle writes spanning across two pages in vm_handle_db
The vm_handle_db function is responsible for writing correct status
register values into memory when a guest VM is being single-stepped
using the RFLAGS.TF mechanism. However, it currently does not properly
handle an edge case where the resulting write spans across two pages.
This commit fixes this by making vm_handle_db use two vm_copyinfo
structs.

Security:	HYP-09
Reviewed by:	markj
2024-10-02 18:43:36 +02:00
Brooks Davis
d9d2e3ab7c sysent: regen comments
2024-10-01 18:46:40 +01:00
Brooks Davis
13227efc5b sysent: regen removing comment alignment
2024-10-01 17:10:08 +01:00
Konstantin Belousov
c2fe7156e9 amd64/mp_machdep.c: style
Wrap long lines.
Remove redundant declaration.

Sponsored by:	The FreeBSD Foundation
MFC after:	3 days
2024-10-01 14:32:19 +03:00
Pierre Pronchery
94693ec7c8 bhyve: initialize register value
In case of an error in a code pattern like

```
uint64_t val;
error = memread(vcpu, gpa, &val, 1, arg);
error = vie_update_register(vcpu, reg, val, size);
```

uninitialized stack data would be used.

Reported by:    Synacktiv
Reviewed by:	markj
Security:       HYP-21
Sponsored by:   The Alpha-Omega Project
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D46107
2024-09-27 08:59:36 -04:00
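A self-contained sketch of the pattern from 94693ec7c8, with the value initialized so that a failed read can no longer leak stack contents into the register. The `sketch_*` helpers are hypothetical stubs, not the bhyve memread callback or vie_update_register(); the sketch deliberately keeps the two-call shape quoted in the commit message and does not claim that the commit also changed the error handling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical read stub: on failure it leaves *val untouched. */
static int
sketch_memread(int fail, uint64_t *val)
{
	if (fail)
		return (-1);
	*val = 0x1234;
	return (0);
}

/* Hypothetical register update stub. */
static int
sketch_update_register(uint64_t *reg, uint64_t val)
{
	*reg = val;
	return (0);
}

/*
 * Same shape as the pattern in the commit message, but with 'val'
 * initialized: if the read fails, the register receives 0 instead of
 * whatever happened to be on the stack.
 */
static int
sketch_emulate_read(int fail, uint64_t *reg)
{
	uint64_t val = 0;
	int error;

	assert(reg != NULL);
	error = sketch_memread(fail, &val);
	error = sketch_update_register(reg, val);
	return (error);
}
```

Returning early when the read fails would be a further hardening step; whether that is appropriate depends on the caller and is outside what the commit message states.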
Joshua Rogers
f3754afd59 Remove stray whitespaces from sys/amd64/
Signed-off-by: Joshua Rogers <Joshua@Joshua.Hu>
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/1418
2024-09-21 07:05:46 -06:00
Ahmad Khalifa
b538d49110 Add a new sysctl in order to differentiate UEFI architectures
With the new 32-bit UEFI loader, it's convenient to have a sysctl to
figure out how we booted. It can be accessed at machdep.efi_arch.

Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/1098
2024-09-20 08:45:09 -06:00
Konstantin Belousov
666303f598 sysarch: improve checks for max user address
making LA48 processes have the same limit as with the pre-LA57 kernels.

Sponsored by:	Advanced Micro Devices (AMD)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2024-09-17 02:02:14 +03:00
Konstantin Belousov
29a0a720c3 amd64 sysarch(2): style
Sponsored by:	Advanced Micro Devices (AMD)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2024-09-17 02:02:14 +03:00
Konstantin Belousov
e134cd9580 amd64: pml5 entries do not support PAT bits
Sponsored by:	Advanced Micro Devices (AMD)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2024-09-17 02:02:14 +03:00
Konstantin Belousov
4f82af24f1 amd64 pmap: do not set PG_G for usermode pmap pml5 kernel entry
Sponsored by:	Advanced Micro Devices (AMD)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2024-09-17 02:02:14 +03:00
Konstantin Belousov
bbb00b1719 pmap_bootstrap_la57(): reload IDT
after the trip through protected mode.  This is required by AMD64 ARM.

Sponsored by:	Advanced Micro Devices (AMD)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2024-09-17 02:02:14 +03:00
Konstantin Belousov
678bc2281c la57: do not set global bit for PML5 entry
The bit is reserved for PML5, causing #PF on KVA access on real
hardware, unlike QEMU.

Sponsored by:	Advanced Micro Devices (AMD)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2024-09-16 11:13:51 +03:00
Konstantin Belousov
280e50461a amd64 la57_trampoline: save registers in memory
AMD64 ARM states that 64bit part of the architectural state is undefined
after 32<->64 mode switching.

Sponsored by:	Advanced Micro Devices (AMD)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2024-09-16 11:12:25 +03:00
Konstantin Belousov
687b896f8e amd64 la57_trampoline: lgdt descriptor is always 10 bytes in long mode
Extend its storage to be compliant.
This is currently a nop due to padding and the null GDT descriptor right
after the lgdt descriptor.

Sponsored by:	Advanced Micro Devices (AMD)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2024-09-16 11:11:54 +03:00
Konstantin Belousov
1be58e67eb amd64 la57_trampoline: turn off global pages and PCID before turning off paging
SDM is explicit that having CR4.PCID=1 while toggling CR0.PG causes #GP.
To be safe and to avoid some more effects, also turn off CR4.PGE.

Sponsored by:	Advanced Micro Devices (AMD)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2024-09-16 11:10:16 +03:00
Konstantin Belousov
b7ea2b69ef amd64 la57_trampoline: disable EFER.LME around setting CR4.LA57
Changing paging mode while LME is set seems to be not allowed.

Sponsored by:	Advanced Micro Devices (AMD)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2024-09-16 11:09:38 +03:00
Konstantin Belousov
9a49c98baf amd64 la57_trampoline: stop using %rdx to remember original %cr0
Store %cr0 in %ebp.  %rdx is needed for MSR access.

Sponsored by:	Advanced Micro Devices (AMD)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2024-09-16 11:09:20 +03:00
Konstantin Belousov
180c8ab079 amd64 la57_trampoline: jump immediately after re-enabling paging
Literally follow requirements from SDM and execute jmp right after
%cr0 CR0_PG bit is toggled back.

Sponsored by:	Advanced Micro Devices (AMD)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2024-09-16 11:09:03 +03:00
Konstantin Belousov
787259bfe5 amd64 pmap: flush whole TLB after LA57 trampoline is installed
Sponsored by:	Advanced Micro Devices (AMD)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2024-09-16 11:08:53 +03:00
Konstantin Belousov
2912c2fbd4 amd64 pmap: be more verbose around entering and leaving LA57 trampoline
Sponsored by:	Advanced Micro Devices (AMD)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2024-09-16 11:08:53 +03:00
Doug Moore
8aa2cd9d13 rangeset: speed up range traversal
For rangeset-next search, use exact search rather than greater-than search.

Move a bit of the testing logic from the pmap code to the common rangeset code.

Reviewed by:	kib (previous version)
Tested by:	pho (previous version)
Differential Revision:	https://reviews.freebsd.org/D46314
2024-09-09 16:50:14 -05:00
Wuyang Chung
5d889e60c1 amd64: move the right parenthesis to the right place
Reviewed by: imp, emaste
Pull Request: https://github.com/freebsd/freebsd-src/pull/1356
2024-09-06 12:34:31 -06:00
Mark Johnston
133a513ddc vmm: Make vmm_dev.h more self-contained
vmm.h is required for VM_MAX_SUFFIXLEN.  vmm_snapshot.h is required for
struct vm_snapshot_meta.

This is a prerequisite for including vmm_dev.h in the headers parsed by
libsysdecode.

Reviewed by:	jhb
Differential Revision:	https://reviews.freebsd.org/D46485
2024-09-01 14:03:15 +00:00
Mark Johnston
a852dc580c vmm: Harmonize compat ioctl definitions
For compat ioctls and structures, we use a mix of suffixes: _old,
_fbsd<version>, _<version>.  Standardize on _<version> to make things
more consistent.  No functional change intended.

Reported by:	jhb
Reviewed by:	corvink, jhb
Differential Revision:	https://reviews.freebsd.org/D46449
2024-08-28 19:12:32 +00:00
Mark Johnston
e12b6aaf0d vmm: Move compat ioctl definitions to vmm_dev.c
There is no reason to keep them in vmm_dev.h.  No functional change
intended.

Reviewed by:	corvink, jhb
Differential Revision:	https://reviews.freebsd.org/D46432
2024-08-26 18:42:13 +00:00
Mark Johnston
b9ef152bec vmm: Merge vmm_dev.c
This file contains the vmm device file implementation.  Most of this
code is not machine-dependent and so shouldn't be duplicated this way.
Move most of it into a generic dev/vmm/vmm_dev.c.  This will make it
easier to introduce a cdev-based interface for VM creation, which in
turn makes it possible to implement support for running bhyve as an
unprivileged user.

Machine-dependent ioctls continue to be handled in machine-dependent
code.  To make the split a bit easier to handle, introduce a pair of
tables which define MI and MD ioctls.  Each table entry can set flags
which determine which locks need to be held in order to execute the
handler.  vmmdev_ioctl() now looks up the ioctl in one of the tables,
acquires locks and either handles the ioctl directly or calls
vmmdev_machdep_ioctl() to handle it.

No functional change intended.  There is a lot of churn in this change
but the underlying logic in the ioctl handlers is the same.  For now,
vmm_dev.h is still mostly separate, even though some parts could be
merged in principle.  This would involve changing include paths for
userspace, though.

Reviewed by:	corvink, jhb
Differential Revision:	https://reviews.freebsd.org/D46431
2024-08-26 18:41:39 +00:00
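The table-driven split described in b9ef152bec can be sketched as two lookup tables whose entries carry lock flags, with the machine-independent table consulted first. Everything here is hypothetical (command numbers, flag names, `sketch_dispatch()`); it illustrates the lookup shape only and does not reproduce vmmdev_ioctl().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lock requirements an entry can declare. */
enum {
	SKETCH_LOCK_NONE	= 0,
	SKETCH_LOCK_ONE_VCPU	= 1 << 0,
	SKETCH_LOCK_ALL_VCPUS	= 1 << 1
};

struct sketch_ioctl {
	unsigned long	cmd;
	int		flags;
};

/* Hypothetical command numbers. */
#define	SKETCH_CMD_STATS	1UL
#define	SKETCH_CMD_RUN		2UL
#define	SKETCH_CMD_SET_REG	100UL

/* Machine-independent (MI) and machine-dependent (MD) tables. */
static const struct sketch_ioctl sketch_mi_ioctls[] = {
	{ SKETCH_CMD_STATS, SKETCH_LOCK_NONE },
	{ SKETCH_CMD_RUN, SKETCH_LOCK_ONE_VCPU },
};
static const struct sketch_ioctl sketch_md_ioctls[] = {
	{ SKETCH_CMD_SET_REG, SKETCH_LOCK_ALL_VCPUS },
};

#define	SKETCH_NITEMS(a)	(sizeof(a) / sizeof((a)[0]))

/* Linear search of one table; returns NULL if cmd is not listed. */
static const struct sketch_ioctl *
sketch_find(const struct sketch_ioctl *tbl, size_t n, unsigned long cmd)
{
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].cmd == cmd)
			return (&tbl[i]);
	}
	return (NULL);
}

/*
 * Look up cmd in the MI table, then the MD table.  Returns the lock
 * flags the handler needs, or -1 for an unknown command.  *is_md is
 * set to 1 when the MD table matched.
 */
static int
sketch_dispatch(unsigned long cmd, int *is_md)
{
	const struct sketch_ioctl *e;

	assert(is_md != NULL);
	*is_md = 0;
	e = sketch_find(sketch_mi_ioctls, SKETCH_NITEMS(sketch_mi_ioctls), cmd);
	if (e == NULL) {
		e = sketch_find(sketch_md_ioctls,
		    SKETCH_NITEMS(sketch_md_ioctls), cmd);
		if (e == NULL)
			return (-1);
		*is_md = 1;
	}
	/* A real handler would acquire the locks named by e->flags here. */
	return (e->flags);
}
```

Keeping the lock requirements in the table means the common entry point can take and release locks uniformly, so individual handlers do not each repeat that logic.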
Mark Johnston
3df92c9728 vmm: Enable assertions in vmmdev_lookup()
The comment has been there since the initial import of the vmm code
and presumably reflected some kind of problem with standalone builds of
vmm.ko.  However, I don't see any problems with it, and mtx_assert() is
used elsewhere within the vmm code.

Reviewed by:	jhb
Differential Revision:	https://reviews.freebsd.org/D46438
2024-08-26 18:41:23 +00:00
Mark Johnston
93e81baa1c vmm: Move duplicated stats code into a generic file
There is a small difference between the arm64 and amd64 implementations:
the latter makes use of a "scope" to exclude AMD-specific stats on Intel
systems and vice-versa.  Replace this with a more generic predicate
callback which can be used for the same purpose.

No functional change intended.

Reviewed by:	corvink, jhb
Differential Revision:	https://reviews.freebsd.org/D46430
2024-08-26 18:41:14 +00:00
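The "predicate callback" idea from 93e81baa1c can be sketched as a per-statistic function pointer that decides whether the statistic applies on the running system. The names and the `sketch_cpu_is_intel` flag are hypothetical; the actual vmm stats structures differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Predicate deciding whether a statistic applies on this system. */
typedef bool (*sketch_stat_pred)(void);

struct sketch_stat {
	const char		*desc;
	sketch_stat_pred	 pred;	/* NULL means "always applies" */
};

/* Hypothetical stand-in for vendor detection. */
static bool sketch_cpu_is_intel = true;

static bool
sketch_pred_intel(void)
{
	return (sketch_cpu_is_intel);
}

static bool
sketch_pred_amd(void)
{
	return (!sketch_cpu_is_intel);
}

static const struct sketch_stat sketch_stats[] = {
	{ "total vm exits", NULL },
	{ "intel-only counter", sketch_pred_intel },
	{ "amd-only counter", sketch_pred_amd },
};

#define	SKETCH_NSTATS	(sizeof(sketch_stats) / sizeof(sketch_stats[0]))

/* A statistic is visible if it has no predicate or its predicate holds. */
static bool
sketch_stat_visible(size_t i)
{
	assert(i < SKETCH_NSTATS);
	return (sketch_stats[i].pred == NULL || sketch_stats[i].pred());
}
```

Compared with a fixed "scope" enumeration, a callback lets a platform without any vendor distinction simply leave the predicate NULL for every entry.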
Mark Johnston
3ccb02334b vmm: Move vmm_ktr.h to a common directory
No functional change intended.

Reviewed by:	corvink, jhb, emaste
Differential Revision:	https://reviews.freebsd.org/D46429
2024-08-26 18:41:05 +00:00
John Baldwin
776cd02b89 vmm ppt: Enable busmastering and BAR decoding while a device is assigned
Reviewed by:	corvink, markj
Fixes:		f44ff2aba2 bhyve: Treat the COMMAND register for PCI passthru devices as emulated
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D46245
2024-08-22 14:40:48 -04:00
Konstantin Belousov
47656cc1ef amd64: use INVLPGB for kernel pmap invalidations
avoiding broadcast IPIs.

Reviewed by:	alc, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D45191
2024-08-21 19:35:15 +03:00
Konstantin Belousov
bc4ffcadf2 amd64: add variables indicating INVLPGB works
Reviewed by:	alc, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D45191
2024-08-21 19:35:07 +03:00
Konstantin Belousov
111c7fc2fe amd64: add convenience wrappers for INVLPGB and TLBSYNC
Reviewed by:	alc, markj
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D45191
2024-08-21 19:34:59 +03:00
Warner Losh
ce7fac64ba Revert "nvme: Separate total failures from I/O failures"
All kinds of crazy stuff was mixed into this commit. Revert
it and do it again.

This reverts commit d5507f9e43.

Sponsored by:		Netflix
2024-08-15 21:29:53 -06:00
Warner Losh
d5507f9e43 nvme: Separate total failures from I/O failures
When it's an I/O failure, we can still send admin commands. Separate out
the admin failures and flag them as such so that we can still send admin
commands on half-failed drives.

Fixes: 9229b3105d (nvme: Fail passthrough commands right away in failed state)
Sponsored by: Netflix
2024-08-15 20:22:18 -06:00
Bojan Novković
ddc09a10ea pmap_growkernel: Use VM_ALLOC_NOFREE when allocating pagetable pages
This patch modifies pmap_growkernel in all pmaps to use VM_ALLOC_NOFREE
when allocating new pagetable pages. This should help reduce longterm
fragmentation as these pages are never released after
they are allocated.

Differential Revision:	https://reviews.freebsd.org/D45998
Reviewed by:	alc, markj, kib, mhorne
Tested by:	alc
2024-07-30 17:38:24 +02:00
Mark Johnston
ba682f8b9b vm: Remove kernel stack swapping support, part 5
- Remove cpu_thread_swapin() and cpu_thread_swapout().

Tested by:	pho
Reviewed by:	alc, imp, kib
Differential Revision:	https://reviews.freebsd.org/D46116
2024-07-29 01:40:39 +00:00
Bjoern A. Zeeb
d1bdc2821f Deprecate contigfree(9) in favour of free(9)
As of 9e6544dd6e contigfree(9) is no longer
needed and should not be used anymore.  We leave a wrapper for 3rd party
code in at least 15.x but remove (almost) all other cases from the tree.

This leaves one use of contigfree(9) untouched; that was the original
trigger for 9e6544dd6e and is handled in D45813 (to be committed
separately later).

Sponsored by:	The FreeBSD Foundation
Reviewed by:	markj, kib
Tested by:	pho (10h stress test run)
Differential Revision: https://reviews.freebsd.org/D46099
2024-07-26 10:45:01 +00:00
Alan Cox
5b8c01d13a amd64 pmap: Optimize PKU lookups when creating superpage mappings
Modify pmap_pkru_same() to update the prototype PTE at the same time as
checking the address range.  This eliminates the need for calling
pmap_pkru_get() in addition to pmap_pkru_same().  pmap_pkru_same() was
already doing most of the work of pmap_pkru_get().

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D46135
2024-07-26 00:38:46 -05:00
Jessica Clarke
8415a654d0 Retire non-NEW_PCIB code and remove config option
All architectures enable NEW_PCIB in DEFAULTS (arm being the most recent
to do so in 121be55599 ("arm: Set NEW_PCIB in DEFAULTS rather than a
subset of kernel configs")), so it's time we removed the legacy code
that no longer sees much testing and has a significant maintenance
burden.

Reviewed by:	jhb, andrew, emaste
Differential Revision:	https://reviews.freebsd.org/D32954
2024-07-18 18:55:12 +01:00