freebsd

mirror of https://git.FreeBSD.org/src.git synced 2024-12-24 11:29:10 +00:00

Author	SHA1	Message	Date
Neel Natu	07044a96d8	Increase the number of passthru devices supported by bhyve. The maximum length of an environment variable puts a limitation on the number of passthru devices that can be specified via a single variable. The workaround is to allow user to specify passthru devices via multiple environment variables instead of a single one. Obtained from: NetApp	2013-02-01 01:16:26 +00:00
Neel Natu	8faceb3292	Add emulation support for instruction "88/r: mov r/m8, r8". This instruction moves a byte from a register to a memory location. Tested by: tycho nightingale at pluribusnetworks com	2013-01-30 04:09:09 +00:00
John Baldwin	d825ce0a5d	Reduce duplication between i386/linux/linux.h and amd64/linux32/linux.h by moving bits that are MI out into headers in compat/linux. Reviewed by: Chagin Dmitry dmitry | gmail MFC after: 2 weeks	2013-01-29 18:41:30 +00:00
Peter Grehan	1fb0ea3f1a	Always allow access to the sysenter cs/esp/eip MSRs since they are automatically saved and restored in the VMCS. Reviewed by: neel Obtained from: NetApp	2013-01-25 21:38:31 +00:00
John Baldwin	fb709557a3	Don't assume that all Linux TCP-level socket options are identical to FreeBSD TCP-level socket options (only the first two are). Instead, using a mapping function and fail unsupported options as we do for other socket option levels. MFC after: 2 weeks	2013-01-23 21:44:48 +00:00
Neel Natu	e3f0800bd1	Postpone vmm module initialization until after SMP is initialized - particularly that 'smp_started != 0'. This is required because the VT-x initialization calls smp_rendezvous() to set the CR4_VMXE bit on all the cpus. With this change we can preload vmm.ko from the loader. Reported by: alfred@, sbruno@ Obtained from: NetApp	2013-01-21 01:33:10 +00:00
Neel Natu	912a3e678a	Add svn properties to the recently merged bhyve source files. The pre-commit hook will not allow any commits without the svn:keywords property in head.	2013-01-20 03:42:49 +00:00
Neel Natu	c458fc1ed4	Merge projects/bhyve to head. 'bhyve' was developed by grehan@ and myself at NetApp (thanks!). Special thanks to Peter Snyder, Joe Caradonna and Michael Dexter for their support and encouragement. Obtained from: NetApp	2013-01-19 04:18:52 +00:00
John Baldwin	b5821c6f0e	Fix build with SMP disabled. Reported by: bf	2013-01-19 01:18:22 +00:00
John Baldwin	f876ffeae3	Don't attempt to use clflush on the local APIC register window. Various CPUs exhibit bad behavior if this is done (Intel Errata AAJ3, hangs on Pentium-M, and trashing of the local APIC registers on a VIA C7). The local APIC is implicitly mapped UC already via MTRRs, so the clflush isn't necessary anyway. MFC after: 2 weeks	2013-01-17 21:32:25 +00:00
Neel Natu	c2217b9848	IFC @ r245509	2013-01-17 07:04:37 +00:00
Bryan Venteicher	ae366ffcbd	Add VirtIO to the i386 and amd64 GENERIC kernels This also removes the kludge from r239009 that covered only the network driver. Reviewed by: grehan Approved by: grehan (mentor) MFC after: 1 week	2013-01-13 07:14:16 +00:00
Neel Natu	8a60b77db8	IFC @ r245205	2013-01-09 03:32:23 +00:00
Neel Natu	1b54fbe69d	IFC @ r245178	2013-01-09 02:26:50 +00:00
Neel Natu	95102a8bcb	Add a "pause" to busy wait loops in the cpu reset path. This should not matter much when running on bare metal but it makes the guest more friendly when running inside a virtual machine. Discussed with: jhb Obtained from: NetApp	2013-01-09 02:11:16 +00:00
Neel Natu	03429e45a7	Revert changes for x2apic support from projects/bhyve. During the early days of bhyve it did not support instruction emulation which necessitated the use of x2apic to access the local apic. This is no longer the case and the dependency on x2apic has gone away. The x2apic patches can be considered independently of bhyve and will be merged into head via projects/x2apic. Discussed with: grehan	2013-01-06 05:37:26 +00:00
Neel Natu	2d28bff346	bhyve does not require a custom configuration file anymore so make the GENERIC identical to the one in HEAD. Obtained from: NetApp	2013-01-05 03:35:30 +00:00
Neel Natu	46b1c55d9e	IFC @ r244983.	2013-01-04 19:28:32 +00:00
Neel Natu	23ce7fedb4	There is no need for a special 'BHYVE' kernel configuration file anymore - 'GENERIC' works fine. Obtained from: NetApp	2013-01-04 03:02:43 +00:00
Neel Natu	014a52f3a6	There is no need for 'start_emulating()' and 'stop_emulating()' to be defined in <machine/cpufunc.h> so remove them from there. Obtained from: NetApp	2013-01-04 02:49:12 +00:00
Neel Natu	5f0677d392	The "unrestricted guest" capability is a feature of Intel VT-x that allows the guest to execute real or unpaged protected mode code - bhyve relies on this feature to execute the AP bootstrap code. Get rid of the hack that allowed bhyve to support SMP guests on processors that do not have the "unrestricted guest" capability. This hack was entirely FreeBSD-specific and would not work with any other guest OS. Instead, limit the number of vcpus to 1 when executing on processors without "unrestricted guest" capability. Suggested by: grehan Obtained from: NetApp	2013-01-04 02:04:41 +00:00
Konstantin Belousov	0dcbedfa61	Enable the UFS quotas for big-iron GENERIC kernels. Discussed with: mckusick MFC after: 2 weeks	2013-01-03 19:03:41 +00:00
Dag-Erling Smørgrav	36fca20f10	As discussed on -current last October, remove the firewire drivers from GENERIC.	2013-01-03 14:30:24 +00:00
Neel Natu	485f986ac9	Modify the default behavior of bhyve such that it no longer forces the use of x2apic mode on the guest. The guest can decide whether or not it wants to use legacy mmio or x2apic access to the APIC by writing to the MSR_APICBASE register. Obtained from: NetApp	2012-12-16 01:20:08 +00:00
Neel Natu	682b847ede	Prefer x2apic mode when running inside a virtual machine. Provide a tunable 'machdep.x2apic_desired' to let the administrator override the default behavior. Provide a read-only sysctl 'machdep.x2apic' to let the administrator know whether the kernel is using x2apic or legacy mmio to access local apic. Tested with Parallels Desktop 8 and bhyve hypervisors. Also tested running on bare metal Intel Xeon E5-2658. Obtained from: NetApp Discussed with: jhb, attilio, avg, grehan	2012-12-16 00:57:14 +00:00
Jim Harris	f2fcc434ee	Revert r243960 based on feedback regarding keeping x86 headers unified (mdf@, tijl@) and use of KASSERT/systm.h in bus.h (zeising@, bde@). Alternate implementation will be made in a separate commit.	2012-12-13 21:27:20 +00:00
Peter Grehan	2741efeca0	Implement an API to allow a hypervisor to save/restore guest floating point state without having to know the size of floating-point state. Unstaticize fpurestore to allow the hypervisor to save/restore guest state using fpusave/fpurestore on the allocated FPU state area. Reviewed by: kib Obtained from: NetApp/bhyve MFC after: 1 week	2012-12-12 08:35:32 +00:00
Konstantin Belousov	737d12b397	Add amd64-specific ddb command "show pte". The command displays the hierarchy of the page table entries which map the specified address. Reviewed by: alc (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2012-12-10 05:14:34 +00:00
Jim Harris	71a30c4436	Add amd64 implementations for 8-byte bus_space routines. Submitted by: Carl Delsey <carl.r.delsey@intel.com> Discussed with: jhb, rwatson Reviewed by: jimharris MFC after: 1 week	2012-12-06 22:33:31 +00:00
Neel Natu	32531ccb84	IFC @r243836	2012-12-04 04:37:42 +00:00
Konstantin Belousov	349438a243	Print the frame addresses for the backtraces on i386 and amd64. It allows both to inspect the frame sizes and to manually peek into the frames from ddb, if needed. Reviewed by: dim MFC after: 2 weeks	2012-12-03 22:16:51 +00:00
Jung-uk Kim	7609e73ca0	Remove duplicate code. Reduce diff between amd64 and i386.	2012-12-01 00:56:19 +00:00
Jung-uk Kim	8c2b353ead	Use volatile keywords properly.	2012-11-30 20:15:01 +00:00
Peter Grehan	e6f1f347a1	Properly screen for the AND 0x81 instruction from the set of group1 0x81 instructions that use the reg bits as an extended opcode. Still todo: properly update rflags. Pointed out by: jilles@	2012-11-30 05:40:24 +00:00
Jung-uk Kim	231ac244f8	Tidy up inline assembly. No functional change.	2012-11-30 00:59:37 +00:00
Peter Grehan	b1f95796f0	Remove debug printf. Pointed out by: emaste	2012-11-29 15:08:13 +00:00
Peter Grehan	3b2b001107	Add support for the 0x81 AND instruction, now generated by clang in the local APIC code. 0x81 is a read-modify-write instruction - the EPT check that only allowed read or write and not both has been relaxed to allow read and write. Reviewed by: neel Obtained from: NetApp	2012-11-29 06:26:42 +00:00
Neel Natu	48a29f4e07	Cleanup the user-space paging exit handler now that the unified instruction emulation is in place. Obtained from: NetApp	2012-11-28 13:34:44 +00:00
Neel Natu	b42206f300	Change emulate_rdmsr() and emulate_wrmsr() to return 0 on success and errno on failure. The conversion from the return value to HANDLED or UNHANDLED can be done locally in vmx_exit_process(). Obtained from: NetApp	2012-11-28 13:10:18 +00:00
Neel Natu	ba9b7bf73a	Revamp the x86 instruction emulation in bhyve. On a nested page table fault the hypervisor will: - fetch the instruction using the guest %rip and %cr3 - decode the instruction in 'struct vie' - emulate the instruction in host kernel context for local apic accesses - any other type of mmio access is punted up to user-space (e.g. ioapic) The decoded instruction is passed as collateral to the user-space process that is handling the PAGING exit. The emulation code is fleshed out to include more addressing modes (e.g. SIB) and more types of operands (e.g. imm8). The source code is unified into a single file (vmm_instruction_emul.c) that is compiled into vmm.ko as well as /usr/sbin/bhyve. Reviewed by: grehan Obtained from: NetApp	2012-11-28 00:02:17 +00:00
Neel Natu	920bc34090	Fix a bug in the MSI-X resource allocation for PCI passthrough devices. In the case where the underlying host had disabled MSI-X via the "hw.pci.enable_msix" tunable, the ppt_setup_msix() function would fail and return an error without properly cleaning up. This in turn would cause a page fault on the next boot of the guest. Fix this by calling ppt_teardown_msix() in all the error return paths. Obtained from: NetApp	2012-11-22 04:07:18 +00:00
Neel Natu	288aeb8561	Get rid of redundant comparison which is guaranteed to be "true" for unsigned integers. Obtained from: NetApp	2012-11-22 00:08:20 +00:00
Peter Grehan	a0cad47092	Handle CPUID leaf 0x7 now that FreeBSD is using it. Return 0's for now. Reviewed by: neel Obtained from: NetApp	2012-11-20 06:01:03 +00:00
Neel Natu	3248464555	IFC @ r243164	2012-11-17 02:55:47 +00:00
Konstantin Belousov	43f48b65c0	Move the declaration of vm_phys_paddr_to_vm_page() from vm/vm_page.h to vm/vm_phys.h, where it belongs. Requested and reviewed by: alc MFC after: 2 weeks	2012-11-16 05:55:56 +00:00
Konstantin Belousov	b32ecf44bc	Flip the semantic of M_NOWAIT to only require the allocation to not sleep, and perform the page allocations with VM_ALLOC_SYSTEM class. Previously, the allocation was also allowed to completely drain the reserve of the free pages, being translated to VM_ALLOC_INTERRUPT request class for vm_page_alloc() and similar functions. Allow the caller of malloc* to request the 'deep drain' semantic by providing M_USE_RESERVE flag, now translated to VM_ALLOC_INTERRUPT class. Previously, it resulted in less aggressive VM_ALLOC_SYSTEM allocation class. Centralize the translation of the M_* malloc(9) flags in the single inline function malloc2vm_flags(). Discussion started by: "Sears, Steven" <Steven.Sears@netapp.com> Reviewed by: alc, mdf (previous version) Tested by: pho (previous version) MFC after: 2 weeks	2012-11-14 20:01:40 +00:00
Neel Natu	7d3d462b09	IFC @ r242940	2012-11-13 07:39:05 +00:00
Neel Natu	a10c6f5544	IFC @ r242684	2012-11-11 03:26:14 +00:00
Konstantin Belousov	5a17538e22	Do not try to enable new features in the %cr4 if running under hypervisor. Apparently, hypervisors failed to filter out 'Standard Extended Features' report from CPUID, but deliver #gp when corresponding bit in %cr4 is toggled. This shall be reconsidered later, after hypervisors correct the bug. Reported and tested by: joel Reviewed by: avg MFC after: 2 weeks	2012-11-09 16:00:30 +00:00
Peter Grehan	0a5e9bfb72	Fix issue found with clang build. Avoid code insertion by the compiler between inline asm statements that would in turn modify the flags value set by the first asm, and used by the second. Solve by making the common error block a string that can be pulled into the first inline asm, and using symbolic labels for asm variables. bhyve can now build/run fine when compiled with clang. Reviewed by: neel Obtained from: NetApp	2012-11-06 02:43:41 +00:00
Attilio Rao	cfedf924d3	Rework the known rwlock to benefit about staying on their own cache line in order to avoid manual frobbing but using struct rwlock_padalign. Reviewed by: alc, jimharris	2012-11-03 23:03:14 +00:00
Konstantin Belousov	cd9e9d1bc2	Enable the new instructions for reading and writing bases for %fs, %gs, when supported. Note that WRFSBASE and WRGSBASE are not very useful on FreeBSD right now, because a return from the kernel mode to userspace reloads the bases specified by the sysarch(2) syscall, most likely. Enable the Supervisor Mode Execution Prevention (SMEP) when supported. Since the loader(8) performs hand-off to the kernel with the page tables which contradict the SMEP, postpone enabling the SMEP on BSP until pmap switched for the proper kernel tables. Debugged with the help from: avg Tested by: avg, Michael Moll <kvedulv@kvedulv.de> MFC after: 1 month	2012-11-01 15:17:43 +00:00
Konstantin Belousov	2773649d2f	Provide the reading and display of the Standard Extended Features, introduced with the IvyBridge CPUs. Provide the definitions for new bits in CR3 and CR4 registers. Tested by: avg, Michael Moll <kvedulv@kvedulv.de> MFC after: 2 weeks	2012-11-01 15:14:37 +00:00
Neel Natu	514393f565	Convert VMCS_ENTRY_INTR_INFO field into a vmcs identifier before passing it to vmcs_getreg(). Without this conversion vmcs_getreg() will return EINVAL. In particular this prevented injection of the breakpoint exception into the guest via the "-B" option to /usr/sbin/bhyve which is hugely useful when debugging guest hangs. This was broken in r241921. Pointy hat: me Obtained from: NetApp	2012-10-29 23:58:15 +00:00
Neel Natu	b01c203325	Corral all the host state associated with the virtual machine into its own file. This state is independent of the type of hardware assist used so there is really no need for it to be in Intel-specific code. Obtained from: NetApp	2012-10-29 01:51:24 +00:00
Peter Grehan	cda5bd7f19	Set the valid field of the newly allocated field as all other vm page allocators do. This fixes a panic when a virtio block device is mounted as root, with the host system dying in vm_page_dirty with invalid bits. Reviewed by: neel Obtained from: NetApp	2012-10-26 22:32:26 +00:00
Neel Natu	bd8572e0be	Unconditionally enable fpu emulation by setting CR0.TS in the host after the guest does a vm exit. This allows us to trap any fpu access in the host context while the fpu still has "dirty" state belonging to the guest. Reported by: "s vas" on freebsd-virtualization@ Obtained from: NetApp	2012-10-26 03:12:40 +00:00
Neel Natu	f76fc5d414	If the guest vcpu wants to idle then use that opportunity to relinquish the host cpu to the scheduler until the guest is ready to run again. This implies that the host cpu utilization will now closely mirror the actual load imposed by the guest vcpu. Also, the vcpu mutex now needs to be of type MTX_SPIN since we need to acquire it inside a critical section. Obtained from: NetApp	2012-10-25 04:29:21 +00:00
Neel Natu	ff6ec151e0	Hide the monitor/mwait instruction capability from the guest until we know how to properly intercept it. Obtained from: NetApp	2012-10-25 04:08:26 +00:00
Neel Natu	f352ff0ca8	Maintain state regarding NMI delivery to guest vcpu in VT-x independent manner. Also add a stats counter to count the number of NMIs delivered per vcpu. Obtained from: NetApp	2012-10-24 02:54:21 +00:00
Neel Natu	eeefa4e4be	Test for AST pending with interrupts disabled right before entering the guest. If an IPI was delivered to this cpu before interrupts were disabled then return right away via vmx_setjmp() with a return value of VMX_RETURN_AST. Obtained from: NetApp	2012-10-23 02:20:42 +00:00
Eitan Adler	1611e2c0d1	The 'testing memory' patch gets printed too many times Approved by: cperciva (implicit)	2012-10-22 11:57:26 +00:00
Eitan Adler	267cc84937	Explain the upcoming delay by printing a message when the kernel is about to begin testing memory. Reviewed by: dteske, adri Approved by: cperciva MFC after: 1 week	2012-10-22 03:16:39 +00:00
Neel Natu	2e25737a49	Calculate the number of host ticks until the next guest timer interrupt. This information will be used in conjunction with guest "HLT exiting" to yield the thread hosting the virtual cpu. Obtained from: NetApp	2012-10-20 08:23:05 +00:00
Konstantin Belousov	802431fa2e	Print the %rip value for uprintf_signal. MFC after: 1 week	2012-10-14 17:08:46 +00:00
Andriy Gapon	851dbc07af	pciereg_cfg*: use assembly to access the mem-mapped cfg space AMD BKDG for CPU families 10h and later requires that the memory mapped config is always read into or written from al/ax/eax register. Discussed with: kib, alc Reviewed by: kib (earlier version) MFC after: 25 days	2012-10-14 10:13:50 +00:00
Peter Grehan	13ec93719a	Add the guest physical address and r/w/x bits to the paging exit in preparation for a rework of bhyve MMIO handling. Reviewed by: neel Obtained from: NetApp	2012-10-12 23:12:19 +00:00
Neel Natu	75dd336603	Provide per-vcpu locks instead of relying on a single big lock. This also gets rid of all the witness.watch warnings related to calling malloc(M_WAITOK) while holding a mutex. Reviewed by: grehan	2012-10-12 18:32:44 +00:00
Neel Natu	cdc5b9e7b1	Fix warnings generated by 'debug.witness.watch' during VM creation and destruction for calling malloc() with M_WAITOK while holding a mutex. Do not allow vmm.ko to be unloaded until all virtual machines are destroyed.	2012-10-11 19:39:54 +00:00
Neel Natu	f9d4f89e4d	Deliver the MSI to the correct guest virtual cpu. Prior to this change the MSI was being delivered unconditionally to vcpu 0 regardless of how the guest programmed the MSI delivery.	2012-10-11 19:28:07 +00:00
Kevin Lo	9823d52705	Revert previous commit... Pointyhat to: kevlo (myself)	2012-10-10 08:36:38 +00:00
Attilio Rao	3a4730256a	Add an unified macro to deny ability from the compiler to reorder instruction loads/stores at its will. The macro __compiler_membar() is currently supported for both gcc and clang, but kernel compilation will fail otherwise. Reviewed by: bde, kib Discussed with: dim, theraven MFC after: 2 weeks	2012-10-09 14:32:30 +00:00
Attilio Rao	af2bdacafb	Reverts r234074,234105,234564,234723,234989,235231-235232 and part of r234247. Use, instead, the static initializer introduced in r239923 for x86 and sparc64 intr_cpus, unwinding the code to the initial version. Reviewed by: marius	2012-10-09 12:22:43 +00:00
Kevin Lo	a10cee30c9	Prefer NULL over 0 for pointers	2012-10-09 08:27:40 +00:00
Neel Natu	7ce04d0ad9	Allocate memory pages for the guest from the host's free page queue. It is no longer necessary to hard-partition the memory between the host and guests at boot time.	2012-10-08 23:41:26 +00:00
Neel Natu	f7d51510f1	Change vm_malloc() to map pages in the guest physical address space in 4KB chunks. This breaks the assumption that the entire memory segment is contiguously allocated in the host physical address space. This also paves the way to satisfy the 4KB page allocations by requesting free pages from the VM subsystem as opposed to hard-partitioning host memory at boot time.	2012-10-04 02:27:14 +00:00
Neel Natu	4db4fb2c25	Get rid of assumptions in the hypervisor that the host physical memory associated with guest physical memory is contiguous. Add check to vm_gpa2hpa() that the range indicated by [gpa,gpa+len) is all contained within a single 4KB page.	2012-10-03 01:18:51 +00:00
Neel Natu	bda273f21e	Get rid of assumptions in the hypervisor that the host physical memory associated with guest physical memory is contiguous. Rewrite vm_gpa2hpa() to get the GPA to HPA mapping by querying the nested page tables.	2012-10-03 00:46:30 +00:00
Neel Natu	341f19c949	Get rid of assumptions in the hypervisor that the host physical memory associated with guest physical memory is contiguous. In this case vm_malloc() was using vm_gpa2hpa() to indirectly infer whether or not the address range had already been allocated. Replace this instead with an explicit API 'vm_gpa_available()' that returns TRUE if a page is available for allocation in guest physical address space.	2012-09-29 01:15:45 +00:00
John Baldwin	960b5a7080	- Re-shuffle the <machine/pc/bios.h> headers to move all kernel-specific bits under #ifdef _KERNEL but leave definitions for various structures defined by standards ($PIR table, SMAP entries, etc.) available to userland. - Consolidate duplicate SMBIOS table structure definitions in ipmi(4) and smbios(4) in <machine/pc/bios.h> and make them available to userland. MFC after: 2 weeks	2012-09-28 11:59:32 +00:00
Alan Cox	e4b8a2fc5a	Eliminate a stale comment. It describes another use case for the pmap in Mach that doesn't exist in FreeBSD.	2012-09-28 05:30:59 +00:00
Neel Natu	70593114cd	Intel VT-x provides the length of the instruction at the time of the nested page table fault. Use this when fetching the instruction bytes from the guest memory. Also modify the lapic_mmio() API so that a decoded instruction is fed into it instead of having it fetch the instruction bytes from the guest. This is useful for hardware assists like SVM that provide the faulting instruction as part of the vmexit.	2012-09-27 00:27:58 +00:00
Neel Natu	73820fb0a4	Add an option "-a" to present the local apic in the XAPIC mode instead of the default X2APIC mode to the guest.	2012-09-26 00:06:17 +00:00
Neel Natu	a2da7af6bc	Add support for trapping MMIO writes to local apic registers and emulating them. The default behavior is still to present the local apic to the guest in the x2apic mode.	2012-09-25 22:31:35 +00:00
Neel Natu	e90273829b	Add ioctls to control the X2APIC capability exposed by the virtual machine to the guest. At the moment this simply sets the state in the 'vcpu' instance but there is no code that acts upon these settings.	2012-09-25 19:08:51 +00:00
Neel Natu	edf89256dd	Add an explicit exit code 'SPINUP_AP' to tell the controlling process that an AP needs to be activated by spinning up an execution context for it. The local apic emulation is now completely done in the hypervisor and it will detect writes to the ICR_LO register that try to bring up the AP. In response to such writes it will return to userspace with an exit code of SPINUP_AP. Reviewed by: grehan	2012-09-25 02:33:25 +00:00
Neel Natu	98ed632c63	Stash the 'vm_exit' information in each 'struct vcpu'. There is no functional change at this time but this paves the way for vm exit handler functions to easily modify the exit reason going forward.	2012-09-24 19:32:24 +00:00
Dimitry Andric	f99157cced	After r205013, amd64 and i386 CPU family and model IDs were printed out in hexadecimal, but without any 0x prefix, which can be very misleading. MFC after: 3 days	2012-09-21 10:31:19 +00:00
Neel Natu	2d3a73ed6d	Restructure the x2apic access code in preparation for supporting memory mapped access to the local apic. The vlapic code is now aware of the mode that the guest is using to access the local apic. Reviewed by: grehan@	2012-09-21 03:09:23 +00:00
Jim Harris	eb85d44f06	Integrate nvme(4) and nvd(4) into the amd64 and i386 builds. Sponsored by: Intel	2012-09-17 19:26:33 +00:00
Konstantin Belousov	c5e3d0ab11	Rename the IVY_RNG option to RDRAND_RNG. Based on submission by: Arthur Mesh <arthurmesh@gmail.com> MFC after: 2 weeks	2012-09-13 10:12:16 +00:00
Alan Cox	7336315b0a	Simplify pmap_unmapdev(). Since kmem_free() eventually calls pmap_remove(), pmap_unmapdev()'s own direct efforts to destroy the page table entries are redundant, so eliminate them. Don't set PTE_W on the page table entry in pmap_kenter{,_attr}() on MIPS. Setting PTE_W on MIPS is inconsistent with the implementation of this function on other architectures. Moreover, PTE_W should not be set, unless the pmap's wired mapping count is incremented, which pmap_kenter{,_attr}() doesn't do. MFC after: 10 days	2012-09-10 16:11:29 +00:00
Attilio Rao	324e57150d	userret() already checks for td_locks when INVARIANTS is enabled, so there is no need to check if Giant is acquired after it. Reviewed by: kib MFC after: 1 week	2012-09-08 18:27:11 +00:00
Konstantin Belousov	ef9461ba0e	Add support for new Intel on-CPU Bull Mountain random number generator, found on IvyBridge and supposedly later CPUs, accessible with RDRAND instruction. From the Intel whitepapers and articles about Bull Mountain, it seems that we do not need to perform post-processing of RDRAND results, like AES-encryption of the data with random IV and keys, which was done for Padlock. Intel claims that sanitization is performed in hardware. Make both Padlock and Bull Mountain random generators support code covered by kernel config options, for the benefit of people who prefer minimal kernels. Also add the tunables to disable hardware generator even if detected. Reviewed by: markm, secteam (simon) Tested by: bapt, Michael Moll <kvedulv@kvedulv.de> MFC after: 3 weeks	2012-09-05 13:18:51 +00:00
Alan Cox	d8f9ed32c5	Rename {_,}pmap_unwire_pte_hold() to {_,}pmap_unwire_ptp() and update the comment describing them. Both the function names and the comment had grown stale. Quite some time has passed since these pmap implementations last used the page's hold count to track the number of valid mapping within a page table page. Also, returning TRUE from pmap_unwire_ptp() rather than _pmap_unwire_ptp() eliminates a few instructions from callers like pmap_enter_quick_locked() where pmap_unwire_ptp()'s return value is used directly by a conditional statement.	2012-09-05 06:02:54 +00:00
Xin LI	0807ad7422	Add hpt27xx to GENERIC kernel for amd64 and i386 systems. MFC after: 2 weeks	2012-09-04 21:02:57 +00:00
John Baldwin	778eefa40d	Fix duplicate entries for mwl(4): - Move mwlfw from {amd64,i386}/conf/NOTES to sys/conf/NOTES (mwl(4) is already present in sys/conf/NOTES). - Remove duplicate mwl(4) entries from {amd64,i386}/conf/NOTES. - While here, add a description to the sfxge line in amd64/conf/NOTES.	2012-09-04 19:19:36 +00:00
John Baldwin	4c1044491b	Fix misspelled "Infiniband". Submitted by: gcooper MFC after: 3 days	2012-08-28 11:34:09 +00:00
Peter Grehan	177fd53318	Add sysctls to display the total and free amount of hard-wired mem for VMs # sysctl hw.vmm hw.vmm.mem_free: 2145386496 hw.vmm.mem_total: 2145386496 Submitted by: Takeshi HASEGAWA hasegaw at gmail com	2012-08-26 01:41:41 +00:00
Glen Barber	67944c4572	Grammar fix: s/NIC's/NICs/ MFC after: 3 days	2012-08-26 01:21:02 +00:00
Dag-Erling Smørgrav	e2082935f0	As discussed on -current, remove the hardcoded default maxswzone. MFC after: 3 weeks	2012-08-14 17:01:21 +00:00
Konstantin Belousov	b6d3609050	Add a hackish debugging facility to provide a bit of information about reason for generated trap. The dump of basic signal information and 8 bytes of the faulting instruction are printed on the controlling terminal of the process, if the machdep.uprintf_signal sysctl is enabled. The print is the only practical way to debug traps from a.out processes I am aware of. Because I have to reimplement it each time I debug an issue with a.out support on amd64, commit the hack to main tree. MFC after: 1 week	2012-08-14 12:15:01 +00:00
Konstantin Belousov	95fd15898b	Real hardware, as opposed to QEMU, does not allow to have a call gate in long mode which transfers control to 32bit code segment. Unbreak the lcall $7,$0 implementation on amd64 by putting the 64bit user code segment' selector into call gate, and execute the 64bit trampoline which converts the return frame into 32bit format and switches back to 32bit mode for executing int $0x80 trampoline. Note that all jumps over the hoops are performed in the user mode. MFC after: 1 week	2012-08-14 12:13:27 +00:00
John Baldwin	6e4ac34b07	Remove the deassert INIT IPI from the IPI startup sequence for APs. It is not listed in the boot sequence in the MP specification (1.4), and it is explicitly ignored on modern CPUs. It was only ever required when bootstrapping systems with external APICs (that is, SMP machines with 486s), which FreeBSD has never supported (and never will). While here, tidy some comments and remove some banal ones.	2012-08-13 18:52:51 +00:00
John Baldwin	a4284ef768	Add a 10 millisecond delay after sending the initial INIT IPI. This matches the algorithm in the MP specification (1.4). Previously we were sending out the deassert INIT IPI immediately after the initial INIT IPI was sent.	2012-08-13 16:33:22 +00:00
Colin Percival	347c7fd7bf	Build modules along with the XENHVM kernels. No objections from: freebsd-xen mailing list MFC after: 1 week	2012-08-13 07:36:57 +00:00
Alan Cox	663f8700d4	The assertion that I added in r238889 could legitimately fail when a debugger creates a breakpoint. Replace that assertion with a narrower one that still achieves my objective. Reported and tested by: kib	2012-08-08 05:28:30 +00:00
Konstantin Belousov	65211d02c4	Do not apply errata 721 workaround when under hypervisor, since typical hypervisor does not implement access to the required MSR, causing #GP on boot. Reported and tested by: olgeni PR: amd64/170388 MFC after: 3 days	2012-08-07 08:36:10 +00:00
Sergey Kandaurov	16ec457aeb	Remove duplicate header inclusion of <sys/sysent.h> Discussed with: bz	2012-08-07 05:46:36 +00:00
Alan Cox	59fa03faa3	Shave off a few more cycles from the average execution time of pmap_enter() by simplifying the control flow and reducing the live range of "om".	2012-08-05 16:59:02 +00:00
Neel Natu	8124debe13	Include 'device uart' in the guest kernel.	2012-08-04 04:30:26 +00:00
Neel Natu	39c21c2db2	Force certain bits in %cr4 to be hard-wired to '1' or '0' from a guest's perspective. If we don't do this some guest OSes (e.g. Linux) will reset the CR4_VMXE bit in %cr4 with disastrous consequences. Reported by: grehan	2012-08-04 02:06:55 +00:00
Konstantin Belousov	0220d04fe3	Add lfence(). MFC after: 1 week	2012-08-01 17:24:53 +00:00
Alan Cox	879eedbc7b	Revise pmap_enter()'s handling of mapping updates that change the PTE's PG_M and PG_RW bits but not the physical page frame. First, only perform vm_page_dirty() on a managed vm_page when the PG_M bit is being cleared. If the updated PTE continues to have PG_M set, then there is no requirement to perform vm_page_dirty(). Second, flush the mapping from the TLB when PG_M alone is cleared, not just when PG_M and PG_RW are cleared. Otherwise, a stale TLB entry may stop PG_M from being set again on the next store to the virtual page. However, since the vm_page's dirty field already shows the physical page as being dirty, no actual harm comes from the PG_M bit not being set. Nonetheless, it is potentially confusing to someone expecting to see the PTE change after a store to the virtual page.	2012-08-01 16:04:13 +00:00
Konstantin Belousov	a42fa0af44	Change (unused) prototype for stmxcsr() to match reality. Noted by: jhb MFC after: 1 week	2012-07-30 19:26:02 +00:00
Alan Cox	bc27d6c608	Shave off a few more cycles from pmap_enter()'s critical section. In particular, do a little less work with the PV list lock held.	2012-07-29 18:20:49 +00:00
Neel Natu	4bff7fad95	Verify that VMX operation has been enabled by BIOS before executing the VMXON instruction. Reported by "s vas" on freebsd-virtualization@	2012-07-25 00:21:16 +00:00
Konstantin Belousov	59c1a8315c	Forcibly shut up clang warning about NULL pointer dereference. MFC after: 3 weeks	2012-07-23 19:16:31 +00:00
Konstantin Belousov	7c80fcfdba	Consistently use 2-space sentence breaks. Submitted by: bde MFC after: 1 week	2012-07-21 13:53:00 +00:00
Konstantin Belousov	1965c139f1	Stop caching curpcb in the local variable. Requested by: bde MFC after: 1 week	2012-07-21 13:47:37 +00:00
Konstantin Belousov	700de5109a	The PT_I386_{GET,SET}XMMREGS and PT_{GET,SET}XSTATE operate on the stopped threads. Implementation assumes that the thread's FPU context is spilled into the PCB due to stop. This is mostly true, except when FPU state for the thread is not initialized. Then the requests operate on the garbage state which is currently left in the PCB, causing confusion. The situation is indeed observed after a signal delivery and before #NM fault on execution of any FPU instruction in the signal handler, since sendsig(9) drops FPU state for current thread, clearing PCB_FPUINITDONE. When inspecting context state for the signal handler, debugger sees the FPU state of the main program context instead of the clear state supposed to be provided to handler. Fix this by forcing clean FPU state in PCB user FPU save area by performing getfpuregs(9) before accessing user FPU save area in ptrace_machdep.c. Note: this change will be merged to i386 kernel as well, where it is much more important, since e.g. gdb on i386 uses PT_I386_GETXMMREGS to inspect FPU context on CPUs that support SSE. Amd64 version of gdb uses PT_GETFPREGS to inspect both 64 and 32 bit processes, which does not exhibit the bug. Reported by: bde MFC after: 1 week	2012-07-21 13:06:37 +00:00
Konstantin Belousov	dfa8a51288	Stop clearing x87 exceptions in the #MF handler on amd64. If user code understands FPU hardware enough to catch SIGFPE and unmask exceptions in control word, then it may as well properly handle return from SIGFPE without causing an infinite loop of #MF exceptions due to faulting instruction restart, when needed. Clearing exceptions causes information loss for handlers which do understand FPU hardware, and struct siginfo si_code member cannot be considered adequate replacement for en_sw content due to translation. Supposed reason for clearing the exceptions, which is IRQ13 handling oddities, were never applicable to amd64. Note: this change will be merged to i386 kernel as well, since we do not support IRQ13 delivery of #MF notifications for some time. Requested by: bde MFC after: 1 week	2012-07-21 13:05:34 +00:00
Konstantin Belousov	83b22b05e6	Introduce curpcb magic variable, similar to curthread, which is MD amd64. It is implemented as __pure2 inline with non-volatile asm read from pcpu, which allows a compiler to cache its results. Convert most PCPU_GET(pcb) and curthread->td_pcb accesses into curpcb. Note that __curthread() uses magic value 0 as an offsetof(struct pcpu, pc_curthread). It seems to be done this way due to machine/pcpu.h needs to be processed before sys/pcpu.h, because machine/pcpu.h contributes machine-depended fields to the struct pcpu definition. As result, machine/pcpu.h cannot use struct pcpu yet. The __curpcb() also uses a magic constant instead of offsetof(struct pcpu, pc_curpcb) for the same reason. The constants are now defined as symbols and CTASSERTs are added to ensure that future KBI changes do not break the code. Requested and reviewed by: bde MFC after: 3 weeks	2012-07-19 19:09:12 +00:00
Alan Cox	3088e08c4b	Don't unnecessarily set PGA_REFERENCED in pmap_enter().	2012-07-19 05:34:19 +00:00
Konstantin Belousov	bc84db6267	On AMD64, provide siginfo.si_code for floating point errors when error occurs using the SSE math processor. Update comments describing the handling of the exception status bits in coprocessors control words. Remove GET_FPU_CW and GET_FPU_SW macros which were used only once. Prefer to use curpcb to access pcb_save over the longer path of referencing pcb through the thread structure. Based on the submission by: Ed Alley <wea llnl gov> PR: amd64/169927 Reviewed by: bde MFC after: 3 weeks	2012-07-18 15:43:47 +00:00
Konstantin Belousov	a81f9fed5d	Add stmxcsr. Submitted by: Ed Alley <wea llnl gov> PR: amd64/169927 MFC after: 3 weeks	2012-07-18 15:36:03 +00:00
Konstantin Belousov	333d0c6060	Add support for the XSAVEOPT instruction use. Our XSAVE/XRSTOR usage mostly meets the guidelines set by the Intel SDM: 1. We use XRSTOR and XSAVE from the same CPL using the same linear address for the store area 2. Contrary to the recommendations, we cannot zero the FPU save area for a new thread, since fork semantic requires the copy of the previous state. This advice seemingly contradicts to the advice from the item 6. 3. We do use XSAVEOPT in the context switch code only, and the area for XSAVEOPT already always contains the data saved by XSAVE. 4. We do not modify the save area between XRSTOR, when the area is loaded into FPU context, and XSAVE. We always spit the fpu context into save area and start emulation when directly writing into FPU context. 5. We do not use segmented addressing to access save area, or rather, always address it using %ds basing. 6. XSAVEOPT can be only executed in the area which was previously loaded with XRSTOR, since context switch code checks for FPU use by outgoing thread before saving, and thread which stopped emulation forcibly get context loaded with XRSTOR. 7. The PCB cannot be paged out while FPU emulation is turned off, since stack of the executing thread is never swapped out. The context switch code is patched to issue XSAVEOPT instead of XSAVE if supported. This approach eliminates one conditional in the context switch code, which would be needed otherwise. For user-visible machine context to have proper data, fpugetregs() checks for unsaved extension blocks and manually copies pristine FPU state into them, according to the description provided by CPUID leaf 0xd. MFC after: 1 month	2012-07-14 15:48:30 +00:00
Alan Cox	b9592bdab3	Wring a few cycles out of pmap_enter(). In particular, on a user-space pmap, avoid walking the page table twice.	2012-07-13 04:10:41 +00:00
Peter Grehan	b652778e42	IFC @ r238370	2012-07-11 19:54:21 +00:00
John Baldwin	d706ec297a	Add a clts() wrapper around the 'clts' instruction to <machine/cpufunc.h> on x86 and use that to implement stop_emulating() in the fpu/npx code. Reimplement start_emulating() in the non-XEN case by using load_cr0() and rcr0() instead of the 'lmsw' and 'smsw' instructions. Intel explicitly discourages the use of 'lmsw' and 'smsw' on 80386 and later processors in the description of these instructions in Volume 2 of the ADM. Reviewed by: kib MFC after: 1 month	2012-07-09 20:55:39 +00:00
John Baldwin	5355f65974	Partially revert r217515 so that the mem_range_softc variable is always present on x86 kernels. This fixes the build of kernels that include 'device acpi' but do not include 'device mem'. MFC after: 1 month	2012-07-09 20:42:08 +00:00
Konstantin Belousov	f18d5bf44b	Use assembler mnemonic instead of manually assembling, continuation for r238142. Reviewed by: jhb MFC after: 1 month	2012-07-06 20:11:58 +00:00
John Baldwin	6632f45773	Several fixes to the amd64 disassembler: - Add generic support for opcodes that are escape bytes used for multi-byte opcodes (such as the 0x0f prefix). Use this to replace the hard-coded 0x0f special case and add support for three-byte opcodes that use the 0x0f38 prefix. - Decode all Intel VMX instructions. invept and invvpid in particular are three-byte opcodes that use the 0x0f38 escape prefix. - Rework how the special 'SDEP' size flag works such that the default instruction name (i_name) is the instruction when the data size prefix (0x66) is not specified, and the alternate name in i_extra is used when the prefix is included. - Add a new 'ADEP' size flag similar to 'SDEP' except that it chooses between i_name and i_extra based on the address size prefix (0x67). Use this to fix the decoding for jrcxz vs jecxz which is determined by the address size prefix, not the operand size prefix. Also, jcxz is not possible in 64-bit mode, but jrcxz is the default instruction for that opcode. - Add support for handling instructions that have a mandatory 'rep' prefix (this means not outputting the 'repe ' prefix until determining if it is used as part of an opcode). Make 'pause' less of a special case this way. - Decode 'cmpxchg16b' and 'cdqe' which are variants of other instructions but with a REX.W prefix. MFC after: 1 month	2012-07-06 14:25:59 +00:00
Alan Cox	cc861283f4	Make pmap_enter()'s management of PV entries consistent with the other pmap functions that manage PV entries. Specifically, remove the PV entry from the containing PV list only after the corresponding PTE is destroyed. Update the pmap's wired mapping count in pmap_enter() before the PV list lock is acquired.	2012-07-06 06:42:25 +00:00
John Baldwin	7574a595f2	Now that our assembler supports the xsave family of instructions, use them natively rather than hand-assembled versions. For xgetbv/xsetbv, add a wrapper API to deal with xcr* registers: rxcr() and load_xcr(). Reviewed by: kib MFC after: 1 month	2012-07-05 18:19:35 +00:00
Alan Cox	8f2994ce67	Calculate the new PTE value in pmap_enter() before acquiring any locks. Move an assertion to the beginning of pmap_enter().	2012-07-05 07:20:16 +00:00
Alan Cox	1bc8531c1e	Correct an error in r237513. The call to reserve_pv_entries() must come before pmap_demote_pde() updates the PDE. Otherwise, pmap_pv_demote_pde() can crash. Crash reported by: kib Patch tested by: kib	2012-07-05 00:08:47 +00:00
John Baldwin	66f9aec075	Decode the 'xsave', 'xrstor', 'xsaveopt', 'xgetbv', 'xsetbv', and 'rdtscp' instructions. MFC after: 1 month	2012-07-04 16:47:39 +00:00
Xin LI	309dca0171	tws(4) is interfaced with CAM so move it to the same section. Reported by: joel MFC after: 3 days	2012-07-01 08:10:49 +00:00
Alan Cox	2bde6e3518	Optimize reserve_pv_entries() using the popcnt instruction.	2012-06-30 20:25:12 +00:00
Alan Cox	92e2574577	In r237592, I forgot that pmap_enter() might already hold a PV list lock at the point that it calls get_pv_entry(). Thus, pmap_enter()'s PV list lock pointer must be passed to get_pv_entry() for those rare occasions when get_pv_entry() calls reclaim_pv_chunk(). Update some related comments.	2012-06-29 18:15:56 +00:00
Alan Cox	6c67613030	Avoid some unnecessary PV list locking in pmap_enter().	2012-06-28 22:03:59 +00:00
Alan Cox	23e59dfa8d	Optimize pmap_pv_demote_pde().	2012-06-28 05:42:04 +00:00
Alan Cox	e30df26e7b	Add new pmap layer locks to the predefined lock order. Change the names of a few existing VM locks to follow a consistent naming scheme.	2012-06-27 03:45:25 +00:00
Alan Cox	5b5b0ef34d	Introduce RELEASE_PV_LIST_LOCK().	2012-06-26 16:45:18 +00:00
Alan Cox	0d646df757	Add PV list locking to pmap_enter(). Its execution is no longer serialized by the pvh global lock. Add a needed atomic operation to pmap_object_init_pt().	2012-06-26 06:02:43 +00:00
Alan Cox	aaf3bc56fd	Add PV chunk and list locking to pmap_change_wiring(), pmap_protect(), and pmap_remove(). The execution of these functions is no longer serialized by the pvh global lock. Make some stylistic changes to the affected code for the sake of consistency with related code elsewhere in the pmap.	2012-06-25 07:13:25 +00:00
Alan Cox	f745b16359	Introduce reserve_pv_entry() and use it in pmap_pv_demote_pde(). In order to add PV list locking to pmap_pv_demote_pde(), it is necessary to change the way that pmap_pv_demote_pde() allocates PV entries. Specifically, once pmap_pv_demote_pde() begins modifying the PV lists, it can't allocate any new PV chunks, because that could require the PV list lock to be dropped. So, all necessary PV chunks must be allocated in advance. To my surprise, this new approach is a few percent faster than the old one.	2012-06-23 22:54:25 +00:00
Konstantin Belousov	aea810386d	Implement mechanism to export some kernel timekeeping data to usermode, using shared page. The structures and functions have vdso prefix, to indicate the intended location of the code in some future. The versioned per-algorithm data is exported in the format of struct vdso_timehands, which mostly repeats the content of in-kernel struct timehands. Usermode reading of the structure can be lockless. Compatibility export for 32bit processes on 64bit host is also provided. Kernel also provides usermode with indication about currently used timecounter, so that libc can fall back to syscall if configured timecounter is unknown to usermode code. The shared data updates are initiated both from the tc_windup(), where a fast task is queued to do the update, and from sysctl handlers which change timecounter. A manual override switch kern.timecounter.fast_gettime allows to turn off the mechanism. Only x86 architectures export the real algorithm data, and there, only for tsc timecounter. HPET counters page could be exported as well, but I prefer to not further glue the kernel and libc ABI there until proper vdso-based solution is developed. Minimal stubs necessary for non-x86 architectures to still compile are provided. Discussed with: bde Reviewed by: jhb Tested by: flo MFC after: 1 month	2012-06-22 07:06:40 +00:00
Konstantin Belousov	232aa31fb9	Reserve AT_TIMEKEEP auxv entry for providing usermode the pointer to timekeeping information. MFC after: 1 week	2012-06-22 06:38:31 +00:00

6430 Commits