Xen PVHVM guest.
Submitted by: Roger Pau Monné
Sponsored by: Citrix Systems R&D
Reviewed by: gibbs
Approved by: re (blanket Xen)
MFC after: 2 weeks
sys/amd64/amd64/mp_machdep.c:
sys/i386/i386/mp_machdep.c:
- Make sure that are no MMU related IPIs pending on migration.
- Reset pending IPI_BITMAP on resume.
- Init vcpu_info on resume.
sys/amd64/include/intr_machdep.h:
sys/i386/include/intr_machdep.h:
sys/x86/acpica/acpi_wakeup.c:
sys/x86/x86/intr_machdep.c:
sys/x86/isa/atpic.c:
sys/x86/x86/io_apic.c:
sys/x86/x86/local_apic.c:
- Add a "suspend_cancelled" parameter to pic_resume(). For the
Xen PIC, restoration of interrupt services differs between
the aborted suspend and normal resume cases, so we must provide
this information.
sys/dev/acpica/acpi_timer.c:
sys/dev/xen/timer/timer.c:
sys/timetc.h:
- Don't swap out "suspend safe" timers across a suspend/resume
cycle. This includes the Xen PV and ACPI timers.
sys/dev/xen/control/control.c:
- Perform proper suspend/resume process for PVHVM:
- Suspend all APs before going into suspension, this allows us
to reset the vcpu_info on resume for each AP.
- Reset shared info page and callback on resume.
sys/dev/xen/timer/timer.c:
- Implement suspend/resume support for the PV timer. Since FreeBSD
doesn't perform a per-cpu resume of the timer, we need to call
smp_rendezvous in order to correctly resume the timer on each CPU.
sys/dev/xen/xenpci/xenpci.c:
- Don't reset the PCI interrupt on each suspend/resume.
sys/kern/subr_smp.c:
- When suspending a PVHVM domain make sure there are no MMU IPIs
in-flight, or we will get a lockup on resume due to the fact that
pending event channels are not carried over on migration.
- Implement a generic version of restart_cpus that can be used by
suspended and stopped cpus.
sys/x86/xen/hvm.c:
- Implement resume support for the hypercall page and shared info.
- Clear vcpu_info so it can be reset by APs when resuming from
suspension.
sys/dev/xen/xenpci/xenpci.c:
sys/x86/xen/hvm.c:
sys/x86/xen/xen_intr.c:
- Support UP kernel configurations.
sys/x86/xen/xen_intr.c:
- Properly rebind per-cpus VIRQs and IPIs on resume.
A warning is emitted again if the temperature became briefly valid
meanwhile. This avoids spamming the user when the sensor is broken.
Other values (ie. not _TMP) always raise a warning.
settings for ACPI-enumerated serial ports by forcing any IRQs that use
an ISA IRQ value with these settings to active-high instead of active-low.
This is known to occur with the BIOS on an Intel D2500CCE motherboard.
Tested by: Robert Ames <robertames@hotmail.com>, lev
Submitted by: Juergen Weiss weiss at uni-mainz.de (original patch)
we are probing a PCI-PCI bridge it is because we found one by enumerating
the devices on a PCI bus, so the bridge is definitely present. A few
BIOSes report incorrect status (_STA) for some bridges that claimed they
were not present when in fact they were.
While here, move this check earlier for Host-PCI bridges so attach fails
before doing any work that needs to be torn down.
PR: kern/91594
Tested by: Jack Vogel @ Intel
MFC after: 1 week
that uses non-ISA IRQs but use a plain IRQ resource in _CRS. However,
a non-ISA IRQ can't fit into a plain IRQ resource. If we encounter a
link like this, build the resource buffer from _PRS instead of _CRS.
- Set the correct size of the end tag in a resource buffer.
Tested by: Benjamin Lee <ben@b1c1l1.com>
MFC after: 2 weeks
Switch eventtimers(9) from using struct bintime to sbintime_t.
Even before this not a single driver really supported full dynamic range of
struct bintime even in theory, not speaking about practical inexpediency.
This change legitimates the status quo and cleans up the code.
When CPU becomes idle, cpu_idleclock() calculates time to the next timer
event in order to reprogram hw timer. Return that time in sbintime_t to
the caller and pass it to acpi_cpu_idle(), where it can be used as one
more factor (quite precise) to extimate furter sleep time and choose
optimal sleep state. This is a preparatory change for further callout
improvements will be committed in the next days.
The commmit is not targeted for MFC.
This hack is picked up from Linux, which claims that it follows
Windows behavior.
PR: amd64/174409
Tested by: Sergey V. Dyatko <sergey.dyatko@gmail.com>,
KAHO Toshikazu <kaho@elam.kais.kyoto-u.ac.jp>,
Slawa Olhovchenkov <slw@zxy.spb.ru>
MFC after: 13 days
ASUS P8Z77-V board reports _AC2, _AC3 and _AC4 setpoints as 0C. With active
cooling already automatically set to _AC2, that still caused driver to print
two useless lines about temperature above _AC3 and _AC4 every ten seconds.
Three setponts of 0C is probably a board bug, but the same spam could happen
also in correct case if system is runnign not with the lowest cooling level.
... to avoid any races or inconsistencies.
This should fix a regression introduced in r243404.
Also, remove a stale comment that has not been true for quite a while
now.
Pointyhat to: avg
Teested by: trociny, emaste, dumbbell (earlier version)
MFC after: 1 week
... instead of the ever increasing ones.
Also, do free old resources when allocating new ones when cx states
change.
Tested by: Tom Lislegaard <Tom.Lislegaard@proact.no>
Obtained from: jkim
MFC after: 1 week
This alleviates issues on newer Sandy/Ivy Bridge gear that seems to require
boatloads more ACPI resources than before.
Reviewed by: avg@
Obtained from: Yahoo! Inc.
MFC after: 2 weeks
parent adapter's _DOD list, only check the low 16 bits of both _ADR and
_DOD. The language in the ACPI spec seems to indicate that the _ADR values
should exactly match the entries in _DOD. However, I assume that the
masking added to _DOD values was added to work around some known busted
machines (the commit history doesn't indicate either way), and the ACPI
spec does require that the low 16 bits are unique for all video outputs,
so only check the low 16 bits should be fine.
This fixes recognition of video outputs that use the new standardized
device ID scheme in ACPI 3.0 that set bit 31 such as certain Dell laptops.
Tested by: Juergen Lock nox jelal kn-bremen de
MFC after: 3 days
(1022) in HPET. But according to report they still haven't fixed problem
with level-triggered interrupts.
Make workaround used for earlier chipsets apply to this new ID also.
PR: amd64/171355
MFC after: 3 days
For C1 and C2 states use cpu_ticks() to measure sleep time instead of much
slower ACPI timer. We can't do it for C3, as TSC may stop there. But it is
less important there as wake up latency is high any way.
For C1 and C2 states do not check/clear bus mastering activity status, as
it is important only for C3. As side effect it can make CPU enter C2 instead
of C3 if last BM activity was two sleeps back (unlike one before), but
that may be even good because of collecting more statistics. Premature BM
wakeup from C3, entered because of overestimation, can easily be worse then
entering C2 from both performance and power consumption points of view.
Together on dual Xeon E5645 system on sequential 512 bytes read test this
change makes cpu_idle_acpi() as fast as simplest cpu_idle_hlt() and only
few percents slower then cpu_idle_mwait(), while deeper states are still
actively used during idle periods.
To help with diagnostics, add C-state type into dev.cpu.X.cx_supported.
Sponsored by: iXsystems, Inc.
... from a user-set persistent limit on the said level.
Allow to set the user-imposed limit below current deepest available level
as the available levels may be dynamically changed by ACPI platform
in both directions.
Allow "Cmax" as an input value for cx_lowest sysctls to mean that there
is not limit and OS can use all available C-states.
Retire global cpu_cx_count as it no longer serves any meaningful
purpose.
Reviewed by: jhb, gianni, sbruno
Tested by: sbruno, Vitaly Magerya <vmagerya@gmail.com>
MFC after: 2 weeks
although by default only C1 is enabled (cx_lowest=0) and enabling deeper
states goes through acpi_cpu_set_cx_lowest which re-evaluates cpu_non_c3
MFC after: 2 weeks
cpu_non_c3 is already evaluated in acpi_cpu_cx_cst and in
acpi_cpu_set_cx_lowest.
Besides acpi_cpu_cx_list is not protected by any locking.
As a result also move setting of cpu_can_deep_sleep to more appropriate
places.
MFC after: 2 weeks
Adjust power_profile script to handle the new world order as well.
Some vendors are opting out of a C2 state and only defining C1 & C3. This
leads the acpi_cpu display to indicate that the machine supports C1 & C2
which is caused by the (mis)use of the index of the cx_state array as the
ACPI_STATE_CX value.
e.g. the code was pretending that cx_state[i] would
always convert to i by subtracting 1.
cx_state[2] == ACPI_STATE_C3
cx_state[1] == ACPI_STATE_C2
cx_state[0] == ACPI_STATE_C1
however, on certain machines this would lead to
cx_state[1] == ACPI_STATE_C3
cx_state[0] == ACPI_STATE_C1
This didn't break anything but led to a display of:
* dev.cpu.0.cx_supported: C1/1 C2/96
Instead of
* dev.cpu.0.cx_supported: C1/1 C3/96
MFC after: 2 weeks
suspend/resume procedures are minimized among them.
common:
- Add global cpuset suspended_cpus to indicate APs are suspended/resumed.
- Remove acpi_waketag and acpi_wakemap from acpivar.h (no longer used).
- Add some variables in acpi_wakecode.S in order to minimize the difference
among amd64 and i386.
- Disable load_cr3() because now CR3 is restored in resumectx().
amd64:
- Add suspend/resume related members (such as MSR) in PCB.
- Modify savectx() for above new PCB members.
- Merge acpi_switch.S into cpu_switch.S as resumectx().
i386:
- Merge(and remove) suspendctx() into savectx() in order to match with
amd64 code.
Reviewed by: attilio@, acpi@