For current architectures, these are just aliases for the existing
operation on the relevant scalar integer.
Reviewed by: imp, kib
Obtained from: CheriBSD
Sponsored by: AFRL, DARPA
Differential Revision: https://reviews.freebsd.org/D47631
vmmops_modinit() needs to create a device file, and this must happen
after SI_SUB_DEVFS. On non-EARLY_AP_STARTUP platforms (i.e., !x86) this
happens already by accident, but we shouldn't rely on it.
On riscv, remove the current SI_SUB_SMP ordering since that was copied
from arm64 and isn't needed. In particular, riscv's vmmops_modinit()
does not call smp_rendezvous().
Reported by: Oleksandr Kryvulia <shuriku@shurik.kiev.ua>
Fixes: a97f683fe3 ("vmm: Add a device file interface for creating and destroying VMs")
This supersedes the sysctl interface, which has the limitations of being
root-only and not supporting automatic resource destruction, i.e., we
cannot easily destroy VMs automatically when bhyve terminates.
For now, two ioctls are implemented: VMMCTL_VM_CREATE and
VMMCTL_VM_DESTROY. Eventually I would like to support tying a VM's
lifetime to that of the descriptor, so that it is automatically
destroyed when the descriptor is closed. However, this will require
some work in bhyve: when the guest wants to reboot, bhyve exits with a
status that indicates that it is to be restarted. This is incompatible
with the idea of tying a VM's lifetime to that of a descriptor, since we
want to avoid creating and destroying a VM across each reboot (as this
involves freeing all of the guest memory, among other things). One
possible design would be to decompose bhyve into two processes, a parent
which handles reboots, and a child which runs in capability mode and
handles guest execution.
In any case, this gets us closer to addressing the shortcomings
mentioned above.
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D47028
To avoid a conflict with the new amdiommu driver imported recently.
Fixes: 0f5116d7ef ("AMD IOMMU driver")
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D47415
None of these drivers are for bus devices, so bus_generic_* is not
appropriate. Most of these were nops except that detach would
actually "succeed" (but not do any cleanup).
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D47374
Load vcpu with acquire semantics, since there is a critical window
between creating a vcpu and using it.
Tested on RISC-V only.
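The ordering requirement above can be illustrated with a minimal
userspace sketch. It uses C11 <stdatomic.h> rather than the kernel's
atomic(9) primitives, and struct vcpu_sketch, vcpu_slot, publish_vcpu()
and lookup_vcpu() are hypothetical names, not the vmm code itself.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's per-vCPU state. */
struct vcpu_sketch {
	int id;
	int state;
};

/* Slot that stays NULL until the vcpu has been fully constructed. */
static _Atomic(struct vcpu_sketch *) vcpu_slot;

/* Creator: initialize every field, then publish with release semantics. */
static void
publish_vcpu(struct vcpu_sketch *v, int id)
{
	v->id = id;
	v->state = 1;
	atomic_store_explicit(&vcpu_slot, v, memory_order_release);
}

/*
 * Consumer: the acquire load pairs with the release store above, so a
 * non-NULL result guarantees that the initialized fields are visible.
 */
static struct vcpu_sketch *
lookup_vcpu(void)
{
	return (atomic_load_explicit(&vcpu_slot, memory_order_acquire));
}
```

In the kernel, the corresponding atomic(9) primitives would be
atomic_store_rel_ptr() and atomic_load_acq_ptr().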
Pointed out by: markj
Reviewed by: jhb, markj
Differential Revision: https://reviews.freebsd.org/D47306
If the guest VM emits the exit code VM_EXITCODE_DB the kernel will
execute the function named vm_handle_db.
If the value of rsp is not page aligned and if rsp+sizeof(uint64_t)
spans across two pages, the function vm_copy_setup will need two structs
vm_copyinfo to prepare the copy operation.
For instance, if the rsp value is 0xFFC, two vm_copyinfo objects are needed:
* address=0xFFC, len=4
* address=0x1000, len=4
The vulnerability was addressed by commit 51fda658ba ("vmm: Properly
handle writes spanning across two pages in vm_handle_db"). Still,
replace the KASSERT with an error return as a more defensive approach.
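The page split described above can be sketched as follows. This is a
standalone illustration that assumes 4 KiB pages; struct copy_chunk and
split_copy() are hypothetical, simplified stand-ins and not the actual
vm_copy_setup() implementation. Running out of chunk slots yields an
error return rather than an assertion.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size */

/* Hypothetical, simplified stand-in for struct vm_copyinfo. */
struct copy_chunk {
	uint64_t addr;
	size_t len;
};

/*
 * Split [addr, addr + len) at page boundaries. Returns the number of
 * chunks written, or -1 if more than max chunks would be needed.
 */
static int
split_copy(uint64_t addr, size_t len, struct copy_chunk *out, int max)
{
	int n = 0;

	while (len > 0) {
		/* Bytes remaining in the page that contains addr. */
		size_t room = SKETCH_PAGE_SIZE - (addr % SKETCH_PAGE_SIZE);
		size_t step = len < room ? len : room;

		if (n == max)
			return (-1);	/* error return instead of an assert */
		out[n].addr = addr;
		out[n].len = step;
		n++;
		addr += step;
		len -= step;
	}
	return (n);
}
```

With addr=0xFFC and len=8, the sketch produces the two chunks listed
above; with room for only one chunk it returns -1.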
Reported by: Synacktiv
Reviewed by: markj, emaste
Security: HYP-09
Sponsored by: The Alpha-Omega Project
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D46133
The vm_handle_db function is responsible for writing correct status
register values into memory when a guest VM is being single-stepped
using the RFLAGS.TF mechanism. However, it currently does not properly
handle an edge case where the resulting write spans across two pages.
This commit fixes this by making vm_handle_db use two vm_copyinfo
structs.
Security: HYP-09
Reviewed by: markj
In case of an error in a code pattern like
```
uint64_t val;
error = memread(vcpu, gpa, &val, 1, arg);
error = vie_update_register(vcpu, reg, val, size);
```
uninitialized stack data would be used.
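A minimal sketch of the hardened pattern, assuming the fix amounts to
checking the first error before the value is consumed. fake_memread(),
fake_update_register(), fake_reg and emulate_load() are hypothetical
stand-ins for illustration, not the vmm functions.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical reader that fails without touching *val when asked to. */
static int
fake_memread(int should_fail, uint64_t *val)
{
	if (should_fail)
		return (EFAULT);
	*val = 0x1234;
	return (0);
}

/* Hypothetical register file with a single register. */
static uint64_t fake_reg;

static int
fake_update_register(uint64_t val)
{
	fake_reg = val;
	return (0);
}

/* Read, and only consume the value if the read succeeded. */
static int
emulate_load(int should_fail)
{
	uint64_t val = 0;	/* defensive initialization */
	int error;

	error = fake_memread(should_fail, &val);
	if (error != 0)
		return (error);	/* never pass stale stack data along */
	return (fake_update_register(val));
}
```

On a failed read the register is left unchanged and the error is
returned to the caller.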
Reported by: Synacktiv
Reviewed by: markj
Security: HYP-21
Sponsored by: The Alpha-Omega Project
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D46107
With the new 32-bit UEFI loader, it's convenient to have a sysctl to
figure out how we booted. It can be accessed at machdep.efi_arch.
Reviewed by: imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/1098
making LA48 processes have the same limit as with the pre-LA57 kernels.
Sponsored by: Advanced Micro Devices (AMD)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
after the trip through protected mode. This is required by AMD64 ARM.
Sponsored by: Advanced Micro Devices (AMD)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
The bit is reserved for PML5, causing #PF on KVA access on real
hardware, unlike QEMU.
Sponsored by: Advanced Micro Devices (AMD)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
AMD64 ARM states that the 64-bit part of the architectural state is undefined
after 32<->64 mode switching.
Sponsored by: Advanced Micro Devices (AMD)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Extend its storage to be compliant.
This is currently a no-op due to padding and the null GDT descriptor
right after the lgdt descriptor.
Sponsored by: Advanced Micro Devices (AMD)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
The SDM is explicit that having CR4.PCIDE=1 while toggling CR0.PG causes #GP.
To be safe and to avoid some more effects, also turn off CR4.PGE.
Sponsored by: Advanced Micro Devices (AMD)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Changing the paging mode while LME is set does not seem to be allowed.
Sponsored by: Advanced Micro Devices (AMD)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Literally follow requirements from SDM and execute jmp right after
%cr0 CR0_PG bit is toggled back.
Sponsored by: Advanced Micro Devices (AMD)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
For rangeset-next search, use exact search rather than greater-than search.
Move a bit of the testing logic from the pmap code to the common rangeset code.
Reviewed by: kib (previous version)
Tested by: pho (previous version)
Differential Revision: https://reviews.freebsd.org/D46314
vmm.h is required for VM_MAX_SUFFIXLEN. vmm_snapshot.h is required for
struct vm_snapshot_meta.
This is a prerequisite for including vmm_dev.h in the headers parsed by
libsysdecode.
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D46485
For compat ioctls and structures, we use a mix of suffixes: _old,
_fbsd<version>, _<version>. Standardize on _<version> to make things
more consistent. No functional change intended.
Reported by: jhb
Reviewed by: corvink, jhb
Differential Revision: https://reviews.freebsd.org/D46449
There is no reason to keep them in vmm_dev.h. No functional change
intended.
Reviewed by: corvink, jhb
Differential Revision: https://reviews.freebsd.org/D46432
This file contains the vmm device file implementation. Most of this
code is not machine-dependent and so shouldn't be duplicated this way.
Move most of it into a generic dev/vmm/vmm_dev.c. This will make it
easier to introduce a cdev-based interface for VM creation, which in
turn makes it possible to implement support for running bhyve as an
unprivileged user.
Machine-dependent ioctls continue to be handled in machine-dependent
code. To make the split a bit easier to handle, introduce a pair of
tables which define MI and MD ioctls. Each table entry can set flags
which determine which locks need to be held in order to execute the
handler. vmmdev_ioctl() now looks up the ioctl in one of the tables,
acquires locks and either handles the ioctl directly or calls
vmmdev_machdep_ioctl() to handle it.
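The table-driven dispatch described above might look roughly like the
following standalone sketch. The flag names, command numbers, table
contents and sk_* helpers are all hypothetical; the real tables and
locking live in dev/vmm/vmm_dev.c and the machine-dependent code.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical lock-requirement flags for a table entry. */
#define SK_LOCK_ONE_VCPU	0x1
#define SK_LOCK_ALL_VCPUS	0x2

#define SK_NITEMS(a)	(sizeof(a) / sizeof((a)[0]))

struct sk_ioctl_entry {
	unsigned long cmd;
	int flags;
};

/* Hypothetical MI and MD tables; the command numbers are made up. */
static const struct sk_ioctl_entry sk_mi_ioctls[] = {
	{ 1, SK_LOCK_ONE_VCPU },
	{ 2, SK_LOCK_ALL_VCPUS },
};
static const struct sk_ioctl_entry sk_md_ioctls[] = {
	{ 100, SK_LOCK_ONE_VCPU },
};

static const struct sk_ioctl_entry *
sk_find(const struct sk_ioctl_entry *tbl, size_t n, unsigned long cmd)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].cmd == cmd)
			return (&tbl[i]);
	return (NULL);
}

/*
 * Look the command up in the MI table first, then in the MD table.
 * On success, report which locks the entry asks for and whether the
 * machine-dependent handler would be called.
 */
static int
sk_dispatch(unsigned long cmd, int *flags, int *is_md)
{
	const struct sk_ioctl_entry *e;

	*is_md = 0;
	e = sk_find(sk_mi_ioctls, SK_NITEMS(sk_mi_ioctls), cmd);
	if (e == NULL) {
		e = sk_find(sk_md_ioctls, SK_NITEMS(sk_md_ioctls), cmd);
		*is_md = (e != NULL);
	}
	if (e == NULL)
		return (ENOTTY);
	*flags = e->flags;
	return (0);
}
```

Keeping the lock requirements in the table lets the common code acquire
locks in one place before any handler runs.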
No functional change intended. There is a lot of churn in this change
but the underlying logic in the ioctl handlers is the same. For now,
vmm_dev.h is still mostly separate, even though some parts could be
merged in principle. This would involve changing include paths for
userspace, though.
Reviewed by: corvink, jhb
Differential Revision: https://reviews.freebsd.org/D46431
The comment has been there since the initial import of the vmm code
and presumably reflected some kind of problem with standalone builds of
vmm.ko. However, I don't see any problems with it, and mtx_assert() is
used elsewhere within the vmm code.
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D46438
There is a small difference between the arm64 and amd64 implementations:
the latter makes use of a "scope" to exclude AMD-specific stats on Intel
systems and vice-versa. Replace this with a more generic predicate
callback which can be used for the same purpose.
No functional change intended.
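As a rough illustration of the predicate approach, each stat can carry
an optional callback that decides whether it applies on the running
system. The struct layout, the sk_* names and the vendor flag below are
hypothetical and only sketch the idea.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stat descriptor with an optional predicate. */
struct sk_stat {
	const char *desc;
	bool (*pred)(void);	/* NULL means "always applicable" */
};

static bool sk_is_intel;	/* hypothetical CPU-vendor flag */

static bool
sk_pred_intel(void)
{
	return (sk_is_intel);
}

static bool
sk_pred_amd(void)
{
	return (!sk_is_intel);
}

static const struct sk_stat sk_stats[] = {
	{ "generic stat", NULL },
	{ "intel-only stat", sk_pred_intel },
	{ "amd-only stat", sk_pred_amd },
};

/* A stat is visible if it has no predicate or its predicate holds. */
static bool
sk_stat_visible(size_t i)
{
	return (sk_stats[i].pred == NULL || sk_stats[i].pred());
}
```

A platform with no vendor-specific stats simply leaves every predicate
NULL.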
Reviewed by: corvink, jhb
Differential Revision: https://reviews.freebsd.org/D46430
When it's an I/O failure, we can still send admin commands. Separate out
the admin failures and flag them as such so that we can still send admin
commands on half-failed drives.
Fixes: 9229b3105d ("nvme: Fail passthrough commands right away in failed state")
Sponsored by: Netflix
This patch modifies pmap_growkernel in all pmaps to use VM_ALLOC_NOFREE
when allocating new pagetable pages. This should help reduce long-term
fragmentation, as these pages are never released after they are
allocated.
Differential Revision: https://reviews.freebsd.org/D45998
Reviewed by: alc, markj, kib, mhorne
Tested by: alc
As of 9e6544dd6e contigfree(9) is no longer
needed and should not be used anymore. We leave a wrapper for 3rd party
code in at least 15.x but remove (almost) all other cases from the tree.
This leaves one use of contigfree(9) untouched; that was the original
trigger for 9e6544dd6e and is handled in D45813 (to be committed
separately later).
Sponsored by: The FreeBSD Foundation
Reviewed by: markj, kib
Tested by: pho (10h stress test run)
Differential Revision: https://reviews.freebsd.org/D46099
Modify pmap_pkru_same() to update the prototype PTE at the same time as
checking the address range. This eliminates the need for calling
pmap_pkru_get() in addition to pmap_pkru_same(). pmap_pkru_same() was
already doing most of the work of pmap_pkru_get().
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D46135
All architectures enable NEW_PCIB in DEFAULTS (arm being the most recent
to do so in 121be55599 ("arm: Set NEW_PCIB in DEFAULTS rather than a
subset of kernel configs")), so it's time we removed the legacy code
that no longer sees much testing and has a significant maintenance
burden.
Reviewed by: jhb, andrew, emaste
Differential Revision: https://reviews.freebsd.org/D32954