1
0
mirror of https://git.FreeBSD.org/src.git synced 2024-12-24 11:29:10 +00:00
Commit Graph

201 Commits

Author SHA1 Message Date
Neel Natu
a7424861fb Check for alignment check violation when processing in/out string instructions. 2014-05-23 19:59:14 +00:00
Neel Natu
d17b5104a9 Add emulation of the "outsb" instruction. NetBSD guests use this to write to
the UART FIFO.

The emulation is constrained in a number of ways: 64-bit only, doesn't check
for all exception conditions, limited to i/o ports emulated in userspace.

Some of these constraints will be relaxed in followup commits.

Requested by:	grehan
Reviewed by:	tychon (partially and a much earlier version)
2014-05-23 05:15:17 +00:00
Neel Natu
c5e423dd2e A Centos 6.4 guest will write 0xff to the 8259 mask register before beginning
the proper ICWx initialization sequence. It assumes, probably correctly, that
the boot firmware has done the 8259 initialization.

Since grub-bhyve does not initialize the 8259 this write to the mask register
takes a code path in which 'error' remains uninitialized (ready=0,icw_num=0).

Fix this by initializing 'error' at the start of the function.
2014-05-23 05:04:50 +00:00
Neel Natu
ba6f5e23cc Allow vmx_getdesc() and vmx_setdesc() to be called for a vcpu that is in the
VCPU_RUNNING state. This will let the VMX exit handler inspect the vcpu's
segment descriptors without having to exit the critical section.
2014-05-22 17:22:37 +00:00
Neel Natu
fd949af642 Inject page fault into the guest if the page table walker detects an invalid
translation for the guest linear address.
2014-05-22 03:14:54 +00:00
Neel Natu
f888763dd8 Add PG_RW check when translating a guest linear to guest physical address.
Set the accessed and dirty bits in the page table entry. If it fails then
restart the page table walk from the beginning. This might happen if another
vcpu modifies the page tables simultaneously.

Reviewed by:	alc, kib
2014-05-20 20:30:28 +00:00
Neel Natu
e4c8a13d61 Add PG_U (user/supervisor) checks when translating a guest linear address
to a guest physical address.

PG_PS (page size) field is valid only in a PDE or a PDPTE so it is now
checked only in non-terminal paging entries.

Ignore the upper 32-bits of the CR3 for PAE paging.
2014-05-19 03:50:07 +00:00
Peter Grehan
897bb47e7b Make the vmx asm code dtrace-fbt-friendly by
- inserting frame enter/leave sequences
 - restructuring the vmx_enter_guest routine so that it subsumes
   the vm_exit_guest block, which was the #vmexit RIP and not a
   callable routine.

Reviewed by:	neel
MFC after:	3 weeks
2014-05-18 03:50:17 +00:00
John Baldwin
b3e9732a76 Implement a PCI interrupt router to route PCI legacy INTx interrupts to
the legacy 8259A PICs.
- Implement an ICH-comptabile PCI interrupt router on the lpc device with
  8 steerable pins configured via config space access to byte-wide
  registers at 0x60-63 and 0x68-6b.
- For each configured PCI INTx interrupt, route it to both an I/O APIC
  pin and a PCI interrupt router pin.  When a PCI INTx interrupt is
  asserted, ensure that both pins are asserted.
- Provide an initial routing of PCI interrupt router (PIRQ) pins to
  8259A pins (ISA IRQs) and initialize the interrupt line config register
  for the corresponding PCI function with the ISA IRQ as this matches
  existing hardware.
- Add a global _PIC method for OSPM to select the desired interrupt routing
  configuration.
- Update the _PRT methods for PCI bridges to provide both APIC and legacy
  PRT tables and return the appropriate table based on the configured
  routing configuration.  Note that if the lpc device is not configured, no
  routing information is provided.
- When the lpc device is enabled, provide ACPI PCI link devices corresponding
  to each PIRQ pin.
- Add a VMM ioctl to adjust the trigger mode (edge vs level) for 8259A
  pins via the ELCR.
- Mark the power management SCI as level triggered.
- Don't hardcode the number of elements in Packages in the source for
  the DSDT.  iasl(8) will fill in the actual number of elements, and
  this makes it simpler to generate a Package with a variable number of
  elements.

Reviewed by:	tycho
2014-05-15 14:16:55 +00:00
Neel Natu
055fc2cb5e Virtual machine halt detection is turned on by default. Allow it to be
disabled via the tunable 'hw.vmm.halt_detection'.
2014-05-05 16:19:24 +00:00
Neel Natu
e50ce2aa06 Add logic in the HLT exit handler to detect if the guest has put all vcpus
to sleep permanently by executing a HLT with interrupts disabled.

When this condition is detected the guest with be suspended with a reason of
VM_SUSPEND_HALT and the bhyve(8) process will exit.

Tested by executing "halt" inside a RHEL7-beta guest.

Discussed with:	grehan@
Reviewed by:	jhb@, tychon@
2014-05-02 00:33:56 +00:00
Neel Natu
2cb97c9dd6 Ignore writes to microcode update MSR. This MSR is accessed by RHEL7 guest.
Add KTR tracepoints to annotate wrmsr and rdmsr VM exits.
2014-04-30 02:08:27 +00:00
Neel Natu
c6a0cc2e21 Some Linux guests will implement a 'halt' by disabling the APIC and executing
the 'HLT' instruction. This condition was detected by 'vm_handle_hlt()' and
converted into the SPINDOWN_CPU exitcode . The bhyve(8) process would exit
the vcpu thread in response to a SPINDOWN_CPU and when the last vcpu was
spun down it would reset the virtual machine via vm_suspend(VM_SUSPEND_RESET).

This functionality was broken in r263780 in a way that made it impossible
to kill the bhyve(8) process because it would loop forever in
vm_handle_suspend().

Unbreak this by removing the code to spindown vcpus. Thus a 'halt' from
a Linux guest will appear to be hung but this is consistent with the
behavior on bare metal. The guest can be rebooted by using the bhyvectl
options '--force-reset' or '--force-poweroff'.

Reviewed by:	grehan@
2014-04-29 18:42:56 +00:00
Neel Natu
f0fdcfe247 Allow a virtual machine to be forcibly reset or powered off. This is done
by adding an argument to the VM_SUSPEND ioctl that specifies how the virtual
machine should be suspended, viz. VM_SUSPEND_RESET or VM_SUSPEND_POWEROFF.

The disposition of VM_SUSPEND is also made available to the exit handler
via the 'u.suspended' member of 'struct vm_exit'.

This capability is exposed via the '--force-reset' and '--force-poweroff'
arguments to /usr/sbin/bhyvectl.

Discussed with:	grehan@
2014-04-28 22:06:40 +00:00
Neel Natu
63c9389af6 A VMCS is always inactive when it exits the vmx_run() loop.
Remove redundant code and the misleading comment that suggest otherwise.

Reviewed by:	grehan@
2014-04-26 22:37:56 +00:00
Peter Grehan
8d1d7a9e5a Allow the guest to read the TSC via MSR 0x10.
NetBSD/amd64 does this, as does Linux on AMD CPUs.

Reviewed by:	neel
MFC after:	3 weeks
2014-04-24 00:27:34 +00:00
Neel Natu
c5d216b786 Change the vlapic timer frequency to be in the ballpark of contemporary
hardware. This also decouples the vlapic emulation from the host's TSC
frequency.

Requested by:	grehan@
2014-04-23 16:50:40 +00:00
Tycho Nightingale
82c2c89084 Factor out common ioport handler code for better hygiene -- pointed
out by neel@.

Approved by:	neel (co-mentor)
2014-04-22 16:13:56 +00:00
Tycho Nightingale
c46ff7fa0b Add support for the PIT 'readback' command -- based on a patch by grehan@.
Approved by:	grehan (co-mentor)
2014-04-18 16:05:12 +00:00
Tycho Nightingale
d6aa08c3ef Respect the destination operand size of the 'Input from Port' instruction.
Approved by:	grehan (co-mentor)
2014-04-18 15:22:56 +00:00
Tycho Nightingale
79d6ca331e Add support for reading the PIT Counter 2 output signal via the NMI
Status and Control register at port 0x61.

Be more conservative about "catching up" callouts that were supposed
to fire in the past by skipping an interrupt if it was
scheduled too far in the past.

Restore the PIT ACPI DSDT entries and add an entry for NMISC too.

Approved by:	neel (co-mentor)
2014-04-18 00:02:06 +00:00
John Baldwin
380174cbe4 Don't spindown the BSP if it executes hlt with the APIC disabled. A
guest that doesn't use the APIC at all can trigger this, plus the BSP
always needs to execute as it should trigger a reset, etc.

Reviewed by:	tychon
2014-04-15 20:53:53 +00:00
Tycho Nightingale
54f6330515 Local APIC access via 32-bit naturally-aligned loads is merely
suggested in the SDM.  Since some OSes have implemented otherwise
don't be too rigorous in enforcing it.

Approved by:	grehan (co-mentor)
2014-04-15 17:06:26 +00:00
Tycho Nightingale
1354571279 Add support for emulating the byte move and sign extend instructions:
"movsx r/m8, r32" and "movsx r/m8, r64".

Approved by:	grehan (co-mentor)
2014-04-15 15:11:10 +00:00
Tycho Nightingale
b96be57a2d Add support for emulating the slave PIC.
Reviewed by:	grehan, jhb
Approved by:	grehan (co-mentor)
2014-04-14 19:00:20 +00:00
Neel Natu
81d597b736 There is no need to save and restore the host's return address in the
'struct vmxctx'. It is preserved on the host stack across a guest entry
and exit and just restoring the host's '%rsp' is sufficient.

Pointed out by:	grehan@
2014-04-11 20:15:53 +00:00
Tycho Nightingale
e0f210e6ef Account for the "plus 1" encoding of the CPUID Function 4 reported
core per package and cache sharing values.

Approved by:	grehan (co-mentor)
2014-04-11 18:19:21 +00:00
Peter Grehan
201b1ccc22 Rework r264179.
- remove redundant code
- remove erroneous setting of the error return
  in vmmdev_ioctl()
- use style(9) initialization
- in vmx_inject_pir(), document the race condition
  that the final conditional statement was detecting,

Tested with both gcc and clang builds.

Reviewed by:	neel
2014-04-10 19:15:58 +00:00
Warner Losh
0e30c5c0b4 Make the vmm code compile with gcc too. Not entirely sure things are
correct for the pirbase test (since I'd have thought we'd need to do
something even when the offset is 0 and that test looks like a
misguided attempt to not use an uninitialized variable), but it is at
least the same as today.
2014-04-05 22:43:23 +00:00
Ryan Stone
a86672509c Re-write bhyve's I/O MMU handling in terms of PCI RID.
Reviewed by:	neel
MFC after:	2 months
Sponsored by:	Sandvine Inc.
2014-04-01 15:54:03 +00:00
Ryan Stone
7036ae46bf Revert PCI RID changes.
My PCI RID changes somehow got intermixed with my PCI ARI patch when I
committed it.  I may have accidentally applied a patch to a non-clean
working tree.  Revert everything while I figure out what went wrong.

Pointy hat to: rstone
2014-04-01 15:06:03 +00:00
Ryan Stone
956ed3830c Re-write bhyve's I/O MMU handling in terms of PCI RIDs
Reviewed by:	neel
Sponsored by:	Sandvine Inc
2014-04-01 14:54:43 +00:00
Neel Natu
b15a09c05e Add an ioctl to suspend a virtual machine (VM_SUSPEND). The ioctl can be called
from any context i.e., it is not required to be called from a vcpu thread. The
ioctl simply sets a state variable 'vm->suspend' to '1' and returns.

The vcpus inspect 'vm->suspend' in the run loop and if it is set to '1' the
vcpu breaks out of the loop with a reason of 'VM_EXITCODE_SUSPENDED'. The
suspend handler waits until all 'vm->active_cpus' have transitioned to
'vm->suspended_cpus' before returning to userspace.

Discussed with:	grehan
2014-03-26 23:34:27 +00:00
Tycho Nightingale
e883c9bb40 Move the atpit device model from userspace into vmm.ko for better
precision and lower latency.

Approved by:	grehan (co-mentor)
2014-03-25 19:20:34 +00:00
Neel Natu
22d822c6b0 When a vcpu is deactivated it must also unblock any rendezvous that may be
blocked on it.

This is done by issuing a wakeup after clearing the 'vcpuid' from 'active_cpus'.
Also, use CPU_CLR_ATOMIC() to guarantee visibility of the updated 'active_cpus'
across all host cpus.
2014-03-18 02:49:28 +00:00
Neel Natu
970955e479 Notify vcpus participating in the rendezvous of the pending event to ensure
that they execute the rendezvous function as soon as possible.
2014-03-17 23:30:38 +00:00
Tycho Nightingale
0775fbb475 Fix a race wherein the source of an interrupt vector is wrongly
attributed if an ExtINT arrives during interrupt injection.

Also, fix a spurious interrupt if the PIC tries to raise an interrupt
before the outstanding one is accepted.

Finally, improve the PIC interrupt latency when another interrupt is
raised immediately after the outstanding one is accepted by creating a
vmexit rather than waiting for one to occur by happenstance.

Approved by:	neel (co-mentor)
2014-03-15 23:09:34 +00:00
Tycho Nightingale
1ed19b835a Don't try to return a vector to a caller that only cares if a vector
is pending or not.

Approved by:	neel (co-mentor)
2014-03-11 22:12:12 +00:00
Tycho Nightingale
762fd20804 Replace the userspace atpic stub with a more functional vmm.ko model.
New ioctls VM_ISA_ASSERT_IRQ, VM_ISA_DEASSERT_IRQ and VM_ISA_PULSE_IRQ
can be used to manipulate the pic, and optionally the ioapic, pin state.

Reviewed by:	jhb, neel
Approved by:	neel (co-mentor)
2014-03-11 16:56:00 +00:00
Neel Natu
ef39d7e910 Fix a race between VMRUN() and vcpu_notify_event() due to 'vcpu->hostcpu'
being updated outside of the vcpu_lock(). The race is benign and could
potentially result in a missed notification about a pending interrupt to
a vcpu. The interrupt would not be lost but rather delayed until the next
VM exit.

The vcpu's hostcpu is now updated concurrently with the vcpu state change.
When the vcpu transitions to the RUNNING state the hostcpu is set to 'curcpu'.
It is set to 'NOCPU' in all other cases.

Reviewed by:	grehan
2014-03-01 03:17:58 +00:00
John Baldwin
ad3e368726 Correct VMware capitalization.
Submitted by:	joeld
2014-02-28 21:33:40 +00:00
John Baldwin
722b6744a7 Workaround an apparent bug in VMWare Fusion's nested VT support where it
triggers a VM exit with the exit reason of an external interrupt but
without a valid interrupt set in the exit interrupt information.

Tested by:	Michael Dexter
Reviewed by:	neel
MFC after:	1 week
2014-02-28 19:07:55 +00:00
Neel Natu
dc50650607 Queue pending exceptions in the 'struct vcpu' instead of directly updating the
processor-specific VMCS or VMCB. The pending exception will be delivered right
before entering the guest.

The order of event injection into the guest is:
- hardware exception
- NMI
- maskable interrupt

In the Intel VT-x case, a pending NMI or interrupt will enable the interrupt
window-exiting and inject it as soon as possible after the hardware exception
is injected. Also since interrupts are inherently asynchronous, injecting
them after the hardware exception should not affect correctness from the
guest perspective.

Rename the unused ioctl VM_INJECT_EVENT to VM_INJECT_EXCEPTION and restrict
it to only deliver x86 hardware exceptions. This new ioctl is now used to
inject a protection fault when the guest accesses an unimplemented MSR.

Discussed with:	grehan, jhb
Reviewed by:	jhb
2014-02-26 00:52:05 +00:00
Neel Natu
159dd56f94 Add support for x2APIC virtualization assist in Intel VT-x.
The vlapic.ops handler 'enable_x2apic_mode' is called when the vlapic mode
is switched to x2APIC. The VT-x implementation of this handler turns off the
APIC-access virtualization and enables the x2APIC virtualization in the VMCS.

The x2APIC virtualization is done by allowing guest read access to a subset
of MSRs in the x2APIC range. In non-root operation the processor will satisfy
an 'rdmsr' access to these MSRs by reading from the virtual APIC page instead.

The guest is also given write access to TPR, EOI and SELF_IPI MSRs which
get special treatment in non-root operation. This is documented in the
Intel SDM section titled "Virtualizing MSR-Based APIC Accesses".

Enforce that APIC-write and APIC-access VM-exits are handled only if
APIC-access virtualization is enabled. The one exception to this is
SELF_IPI virtualization which may result in an APIC-write VM-exit.
2014-02-21 06:03:54 +00:00
Neel Natu
52e5c8a2ec Simplify APIC mode switching from MMIO to x2APIC. In part this is done to
simplify the implementation of the x2APIC virtualization assist in VT-x.

Prior to this change the vlapic allowed the guest to change its mode from
xAPIC to x2APIC. We don't allow that any more and the vlapic mode is locked
when the virtual machine is created. This is not very constraining because
operating systems already have to deal with BIOS setting up the APIC in
x2APIC mode at boot.

Fix a bug in the CPUID emulation where the x2APIC capability was leaking
from the host to the guest.

Ignore MMIO reads and writes to the vlapic in x2APIC mode. Similarly, ignore
MSR accesses to the vlapic when it is in xAPIC mode.

The default configuration of the vlapic is xAPIC. The "-x" option to bhyve(8)
can be used to change the mode to x2APIC instead.

Discussed with:	grehan@
2014-02-20 01:48:25 +00:00
John Baldwin
a0efd3fb34 A first pass at adding support for injecting hardware exceptions for
emulated instructions.
- Add helper routines to inject interrupt information for a hardware
  exception from the VM exit callback routines.
- Use the new routines to inject GP and UD exceptions for invalid
  operations when emulating the xsetbv instruction.
- Don't directly manipulate the entry interrupt info when a user event
  is injected.  Instead, store the event info in the vmx state and
  only apply it during a VM entry if a hardware exception or NMI is
  not already pending.
- While here, use HANDLED/UNHANDLED instead of 1/0 in a couple of
  routines.

Reviewed by:	neel
2014-02-18 03:07:36 +00:00
Neel Natu
294d0d88fc Handle writes to the SELF_IPI MSR by the guest when the vlapic is configured
in x2apic mode. Reads to this MSR are currently ignored but should cause a
general proctection exception to be injected into the vcpu.

All accesses to the corresponding offset in xAPIC mode are ignored.

Also, do not panic the host if there is mismatch between the trigger mode
programmed in the TMR and the actual interrupt being delivered. Instead the
anomaly is logged to aid debugging and to prevent a misbehaving guest from
panicking the host.
2014-02-17 23:07:16 +00:00
Neel Natu
9c43cd07ec Use spinlocks to lock accesses to the vioapic.
This is necessary because if the vlapic is configured in x2apic mode the
vioapic_process_eoi() function is called inside the critical section
established by vm_run().
2014-02-17 22:57:51 +00:00
John Baldwin
abb023fb95 Add virtualized XSAVE support to bhyve which permits guests to use XSAVE and
XSAVE-enabled features like AVX.
- Store a per-cpu guest xcr0 register.  When switching to the guest FPU
  state, switch to the guest xcr0 value.  Note that the guest FPU state is
  saved and restored using the host's xcr0 value and xcr0 is saved/restored
  "inside" of saving/restoring the guest FPU state.
- Handle VM exits for the xsetbv instruction by updating the guest xcr0.
- Expose the XSAVE feature to the guest only if the host has enabled XSAVE,
  and only advertise XSAVE features enabled by the host to the guest.
  This ensures that the guest will only adjust FPU state that is a subset
  of the guest FPU state saved and restored by the host.

Reviewed by:	grehan
2014-02-08 16:37:54 +00:00
Neel Natu
bf73979dd9 Add a counter to differentiate between VM-exits due to nested paging faults
and instruction emulation faults.
2014-02-08 06:22:09 +00:00