freebsd

mirror of https://git.FreeBSD.org/src.git synced 2024-12-24 11:29:10 +00:00

Author	SHA1	Message	Date
Neel Natu	0acb0d84c5	Support array-type of stats in bhyve. An array-type stat in vmm.ko is defined as follows: VMM_STAT_ARRAY(IPIS_SENT, VM_MAXCPU, "ipis sent to vcpu"); It is incremented as follows: vmm_stat_array_incr(vm, vcpuid, IPIS_SENT, array_index, 1); And output of 'bhyvectl --get-stats' looks like: ipis sent to vcpu[0] 3114 ipis sent to vcpu[1] 0 Reviewed by: grehan Obtained from: NetApp	2013-05-10 02:59:49 +00:00
Dmitry Chagin	d127f15308	Retire write-only PCB_GS32BIT pcb flag on amd64.	2013-05-09 21:42:43 +00:00
Konstantin Belousov	241b67bb47	Correct the type for the literal used on the left side of the shift up to 63 bit positions. Do not fill the save area and do not set the saved bit in the xstate bit vector for the state which is not marked as enabled in xsave_mask. Reported and tested by: Jim Ohlstein <jim@ohlste.in> MFC after: 3 days	2013-05-09 17:25:29 +00:00
Attilio Rao	941646f5ec	Rename VM_NDOMAIN into MAXMEMDOM and move it into machine/param.h in order to match the MAXCPU concept. The change should also be useful for consolidation and consistency. Sponsored by: EMC / Isilon storage division Obtained from: jeff Reviewed by: alc	2013-05-07 22:46:24 +00:00
Ed Maste	f5efbffe52	Switch to standard copyright license text The initial version of this came from Sandvine but had "PROVIDED BY NETAPP, INC" in the copyright text, presumably because the license block was copied from another file. Replace it with standard "AUTHOR AND CONTRIBUTORS" form. Approved by: grehan@	2013-05-02 12:35:15 +00:00
Konstantin Belousov	14f525595c	Partially saved extended state must be handled always, i.e. for both fpu-owned context, and for pcb-saved one. More, the XSAVE could do partial save, same as XSAVEOPT, so qualifier for the handler should be use_xsave and not use_xsaveopt. Since xsave_area_desc is now needed regardless of the XSAVEOPT use, remove the write-only use_xsaveopt variable. In collaboration with: jhb MFC after: 1 week	2013-05-01 20:08:33 +00:00
Konstantin Belousov	6f2d9906a9	The check to ensure that xstate_bv always has XFEATURE_ENABLED_X87 and XFEATURE_ENABLED_SSE bits set is not needed. CPU correctly handles any bitmask which is subset of the enabled bits in %XCR0. More, CPU instructions XSAVE and XSAVEOPT could write the mask without e.g. XFEATURE_ENABLED_SSE, after the VZEROALL. The check prevents the restoration of the otherwise valid FPU save area. In collaboration with: jhb MFC after: 1 week	2013-05-01 20:03:50 +00:00
Carl Delsey	e47937d1b7	Add a new driver to support the Intel Non-Transparent Bridge(NTB). The NTB allows you to connect two systems with this device using a PCI-e link. The driver is made of two modules: - ntb_hw which is a basic hardware abstraction layer for the device. - if_ntb which implements the ntb network device and the communication protocol. The driver is limited at the moment to CPU memcpy instead of using DMA, and only Back-to-Back mode is supported. Also the network device isn't full featured yet. These changes will be coming soon. The DMA change will also bring in the ioat driver from the project branch it is on now. This is an initial port of the GPL/BSD Linux driver contributed by Jon Mason from Intel. Any bugs are my contributions. Sponsored by: Intel Reviewed by: jimharris, joel (man page only) Approved by: jimharris (mentor)	2013-04-29 22:48:53 +00:00
Peter Grehan	d3c11f40a5	Add RIP-relative addressing to the instruction decoder. Rework the guest register fetch code to allow the RIP to be extracted from the VMCS while the kernel decoder is functioning. Hit by the OpenBSD local-apic code. Submitted by: neel Reviewed by: grehan Obtained from: NetApp	2013-04-25 04:56:43 +00:00
Rui Paulo	068e8f74e4	Print RDSEED, ADX, and SMAP. Pointed out by: kib	2013-04-18 01:21:44 +00:00
Gabor Kovesdan	a8b5c2a0aa	- Correct spelling in comments Submitted by: Christoph Mallon <christoph.mallon@gmx.de> (via private mail)	2013-04-17 11:56:11 +00:00
Rui Paulo	d1dcd93145	Print more bits from the standard extended features CPUID which will be available in the Haswell architecture (c.f. Intel Document #319433-012A).	2013-04-17 06:51:17 +00:00
Neel Natu	3565b59ec0	Create sysctl node 'hw.vmm.vmx' and populate it with oids that expose the VMX hardware capabilities. Obtained from: NetApp	2013-04-13 21:41:51 +00:00
Konstantin Belousov	fcb29b9210	Fix the name of the pcb member in the comments. Submitted by: Oliver Pinter <oliver.pntr@gmail.com> MFC after: 3 days	2013-04-13 15:20:33 +00:00
Neel Natu	26d66b9d58	Use the MAKEDEV_CHECKNAME flag to check for an invalid device name and return an error instead of panicking. Obtained from: NetApp	2013-04-13 05:11:21 +00:00
Edward Tomasz Napierala	8ed9860914	Remove ctl(4) from GENERIC. Also remove 'options CTL_DISABLE' and kern.cam.ctl.disable tunable; those were introduced as a workaround to make it possible to boot GENERIC on low memory machines. With ctl(4) being built as a module and automatically loaded by ctladm(8), this makes CTL work out of the box. Reviewed by: ken Sponsored by: FreeBSD Foundation	2013-04-12 16:25:03 +00:00
Neel Natu	d5408b1d26	If vmm.ko could not be initialized correctly then prevent the creation of virtual machines subsequently. Submitted by: Chris Torek	2013-04-12 01:16:52 +00:00
Neel Natu	150369ab7c	Make the code to check if VMX is enabled more readable by using macros instead of magic numbers. Discussed with: Chris Torek	2013-04-11 04:29:45 +00:00
Neel Natu	1472b87f2f	Unsynchronized TSCs on the host require special handling in bhyve: - use clock_gettime(2) as the time base for the emulated ACPI timer instead of directly using rdtsc(). - don't advertise the invariant TSC capability to the guest to discourage it from using the TSC as its time base. Discussed with: jhb@ (about making 'smp_tsc' a global) Reported by: Dan Mack on freebsd-virtualization@ Obtained from: NetApp	2013-04-10 05:59:07 +00:00
Gleb Smirnoff	4e76af6a41	Merge from projects/counters: counter(9). Introduce counter(9) API, that implements fast and raceless counters, provided (but not limited to) for gathering of statistical data. See http://lists.freebsd.org/pipermail/freebsd-arch/2013-April/014204.html for more details. In collaboration with: kib Reviewed by: luigi Tested by: ae, ray Sponsored by: Nginx, Inc.	2013-04-08 19:40:53 +00:00
Gleb Smirnoff	17dece86fe	Merge from projects/counters: Pad struct pcpu so that its size is denominator of PAGE_SIZE. This is done to reduce memory waste in UMA_PCPU_ZONE zones. Sponsored by: Nginx, Inc.	2013-04-08 19:19:10 +00:00
Peter Grehan	117e8f378e	Don't panic when a valid divisor of 1 has been requested. Obtained from: NetApp	2013-04-05 22:16:31 +00:00
Alexander Motin	45f6d66569	Remove all legacy ATA code parts, not used since options ATA_CAM enabled in most kernels before FreeBSD 9.0. Remove such modules and respective kernel options: atadisk, ataraid, atapicd, atapifd, atapist, atapicam. Remove the atacontrol utility and some man pages. Remove useless now options ATA_CAM. No objections: current@, stable@ MFC after: never	2013-04-04 07:12:24 +00:00
Neel Natu	77d8fd9bb3	Add counter to keep track of the number of timer interrupts generated by the local apic for each virtual cpu.	2013-03-31 03:56:48 +00:00
Neel Natu	b5aaf7b22b	Add some more stats to keep track of all the reasons that a vcpu is exiting.	2013-03-30 17:46:03 +00:00
Neel Natu	66f71b7d24	Allow caller to skip 'guest linear address' validation when doing instruction decode. This is to accommodate hardware assist implementations that do not provide the 'guest linear address' as part of nested page fault collateral. Submitted by: Anish Gupta (akgupt3 at gmail dot com)	2013-03-28 21:26:19 +00:00
Konstantin Belousov	ee75e7de7b	Implement the concept of the unmapped VMIO buffers, i.e. buffers which do not map the b_pages pages into buffer_map KVA. The use of the unmapped buffers eliminates the need to perform TLB shootdown for mapping on the buffer creation and reuse, greatly reducing the amount of IPIs for shootdown on big-SMP machines and eliminating up to 25-30% of the system time on i/o intensive workloads. The unmapped buffer should be explicitly requested by the GB_UNMAPPED flag by the consumer. For unmapped buffer, no KVA reservation is performed at all. The consumer might request unmapped buffer which does have a KVA reserve, to manually map it without recursing into buffer cache and blocking, with the GB_KVAALLOC flag. When the mapped buffer is requested and unmapped buffer already exists, the cache performs an upgrade, possibly reusing the KVA reservation. Unmapped buffer is translated into unmapped bio in g_vfs_strategy(). Unmapped bio carry a pointer to the vm_page_t array, offset and length instead of the data pointer. The provider which processes the bio should explicitly specify a readiness to accept unmapped bio, otherwise g_down geom thread performs the transient upgrade of the bio request by mapping the pages into the new bio_transient_map KVA submap. The bio_transient_map submap claims up to 10% of the buffer map, and the total buffer_map + bio_transient_map KVA usage stays the same. Still, it could be manually tuned by kern.bio_transient_maxcnt tunable, in the units of the transient mappings. Eventually, the bio_transient_map could be removed after all geom classes and drivers can accept unmapped i/o requests. Unmapped support can be turned off by the vfs.unmapped_buf_allowed tunable, disabling which makes the buffer (or cluster) creation requests to ignore GB_UNMAPPED and GB_KVAALLOC flags. Unmapped buffers are only enabled by default on the architectures where pmap_copy_page() was implemented and tested.
In the rework, filesystem metadata is no longer subject to the maxbufspace limit. Since the metadata buffers are always mapped, the buffers still have to fit into the buffer map, which provides a reasonable (but practically unreachable) upper bound on it. The non-metadata buffer allocations, both mapped and unmapped, are accounted against maxbufspace, as before. Effectively, this means that the maxbufspace is forced on mapped and unmapped buffers separately. The pre-patch bufspace limiting code did not work, because buffer_map fragmentation does not allow the limit to be reached. At Jeff Roberson's request, the getnewbuf() function was split into smaller single-purpose functions. Sponsored by: The FreeBSD Foundation Discussed with: jeff (previous version) Tested by: pho, scottl (previous version), jhb, bf MFC after: 2 weeks	2013-03-19 14:13:12 +00:00
Attilio Rao	774d251d99	Sync back vmcontention branch into HEAD: Replace the per-object resident and cached pages splay tree with a path-compressed multi-digit radix trie. Along with this, switch also the x86-specific handling of idle page tables to using the radix trie. This change is supposed to do the following: - Allowing the acquisition of read locking for lookup operations of the resident/cached pages collections as the per-vm_page_t splay iterators are now removed. - Increase the scalability of the operations on the page collections. The radix trie does rely on the consumers locking to ensure atomicity of its operations. In order to avoid deadlocks the bisection nodes are pre-allocated in the UMA zone. This can be done safely because the algorithm needs at maximum one new node per insert which means the maximum number of the desired nodes is the number of available physical frames themselves. However, not all the times a new bisection node is really needed. The radix trie implements path-compression because UFS indirect blocks can lead to several objects with a very sparse trie, increasing the number of levels to usually scan. It also helps in the nodes pre-fetching by introducing the single node per-insert property. This code is not generalized (yet) because of the possible loss of performance by having much of the sizes in play configurable. However, efforts to make this code more general and then reusable in further different consumers might be really done. The only KPI change is the removal of the function vm_page_splay() which is now reaped. The only KBI change, instead, is the removal of the left/right iterators from struct vm_page, which are now reaped. 
Further technical notes, broken up piecemeal, can be retrieved from the svn branch: http://svn.freebsd.org/base/user/attilio/vmcontention/ Sponsored by: EMC / Isilon storage division In collaboration with: alc, jeff Tested by: flo, pho, jhb, davide Tested by: ian (arm) Tested by: andreast (powerpc)	2013-03-18 00:25:02 +00:00
Neel Natu	3f23d3ca9f	Fix the '-Wtautological-compare' warning emitted by clang for comparing the unsigned enum type with a negative value. Obtained from: NetApp	2013-03-16 22:53:05 +00:00
Neel Natu	61592433eb	Allow vmm stats to be specific to the underlying hardware assist technology. This can be done by using the new macros VMM_STAT_INTEL() and VMM_STAT_AMD(). Statistic counters that are common across the two are defined using VMM_STAT(). Suggested by: Anish Gupta Discussed with: grehan Obtained from: NetApp	2013-03-16 22:40:20 +00:00
Konstantin Belousov	e8a4a618cf	Add pmap function pmap_copy_pages(), which copies the content of the pages around, taking array of vm_page_t both for source and destination. Starting offsets and total transfer size are specified. The function implements optimal algorithm for copying using the platform-specific optimizations. For instance, on the architectures where the direct map is available, no transient mappings are created, for i386 the per-cpu ephemeral page frame is used. The code was typically borrowed from the pmap_copy_page() for the same architecture. Only i386/amd64, powerpc aim and arm/arm-v6 implementations were tested at the time of commit. High-level code, not committed yet to the tree, ensures that the use of the function is only allowed after explicit enablement. For sparc64, the existing code has known issues and a stub is added instead, to allow the kernel linking. Sponsored by: The FreeBSD Foundation Tested by: pho (i386, amd64), scottl (amd64), ian (arm and arm-v6) MFC after: 2 weeks	2013-03-14 20:18:12 +00:00
Alan Cox	9f585991ba	The kernel pmap is statically allocated, so there is really no need to explicitly initialize its pm_root field to zero. Sponsored by: EMC / Isilon Storage Division	2013-03-10 21:07:44 +00:00
Attilio Rao	89f6b8632c	Switch the vm_object mutex to be a rwlock. This will enable in the future further optimizations where the vm_object lock will be held in read mode most of the time the page cache resident pool of pages are accessed for reading purposes. The change is mostly mechanical but few notes are reported: * The KPI changes as follow: - VM_OBJECT_LOCK() -> VM_OBJECT_WLOCK() - VM_OBJECT_TRYLOCK() -> VM_OBJECT_TRYWLOCK() - VM_OBJECT_UNLOCK() -> VM_OBJECT_WUNLOCK() - VM_OBJECT_LOCK_ASSERT(MA_OWNED) -> VM_OBJECT_ASSERT_WLOCKED() (in order to avoid visibility of implementation details) - The read-mode operations are added: VM_OBJECT_RLOCK(), VM_OBJECT_TRYRLOCK(), VM_OBJECT_RUNLOCK(), VM_OBJECT_ASSERT_RLOCKED(), VM_OBJECT_ASSERT_LOCKED() * The vm/vm_pager.h namespace pollution avoidance (forcing requiring sys/mutex.h in consumers directly to cater its inlining functions using VM_OBJECT_LOCK()) imposes that all the vm/vm_pager.h consumers now must include also sys/rwlock.h. * zfs requires a quite convoluted fix to include FreeBSD rwlocks into the compat layer because the name clash between FreeBSD and solaris versions must be avoided. For this purpose zfs redefines the vm_object locking functions directly, isolating the FreeBSD components in specific compat stubs. The KPI is heavily broken by this commit. Third-party ports must be updated accordingly (I can think off-hand of VirtualBox, for example). Sponsored by: EMC / Isilon storage division Reviewed by: jeff Reviewed by: pjd (ZFS specific review) Discussed with: alc Tested by: pho	2013-03-09 02:32:23 +00:00
Bryan Venteicher	0cfbcf8c7b	Remove the virtio dependency entry for the VirtIO device drivers. This will prevent the kernel from linking if the device drivers are included without the virtio module. Remove pci and scbus for the same reason. Also explain the relationship and necessity of the virtio and virtio_pci modules. Currently in FreeBSD, we only support VirtIO PCI, but it could be replaced with a different interface (like MMIO) and the device (network, block, etc) will still function. Requested by: luigi Approved by: grehan (mentor) MFC after: 3 days	2013-03-06 07:17:53 +00:00
Kenneth D. Merry	3a45b4781a	Re-enable CTL in GENERIC on i386 and amd64, but turn on the CTL disable tunable by default. This will allow GENERIC configurations to boot on small memory boxes, but not require end users who want to use CTL to recompile their kernel. They can simply set kern.cam.ctl.disable=0 in loader.conf. The eventual solution to the memory usage problem is to change the way CTL allocates memory to be more configurable, but this should fix things for small memory situations in the mean time. UPDATING: Explain the change in the CTL configuration, and how users can enable CTL if they would like to use it. sys/conf/options: Add a new option, CTL_DISABLE, that prevents CTL from initializing. ctl.c: If CTL_DISABLE is turned on, don't initialize. i386/conf/GENERIC, amd64/conf/GENERIC: Re-enable device ctl, and add the CTL_DISABLE option.	2013-03-04 21:18:45 +00:00
Attilio Rao	b38d37f7b5	Merge from vmc-playground branch: Rename the pv_entry_t iterator from pv_list to pv_next. Besides being more correct technically (as the name seems to suggest this is a list while it is an iterator), it will also be needed by vm_radix work to avoid a nameclash on macro expansions. Sponsored by: EMC / Isilon storage division Reviewed by: alc, jeff Tested by: flo, pho, jhb, davide	2013-03-02 14:19:08 +00:00
Adrian Chadd	fe138cc2af	Disable the ctl driver in GENERIC. It unfortunately steals a fair chunk of RAM at startup even if it's not actively used, which prevents FreeBSD VMs of 128MB from successfully booting and running.	2013-03-02 08:12:41 +00:00
Davide Italiano	acccf7d8b4	MFcalloutng: When CPU becomes idle, cpu_idleclock() calculates time to the next timer event in order to reprogram hw timer. Return that time in sbintime_t to the caller and pass it to acpi_cpu_idle(), where it can be used as one more factor (quite precise) to estimate further sleep time and choose optimal sleep state. This is a preparatory change for further callout improvements that will be committed in the next days. The commit is not targeted for MFC.	2013-02-28 10:46:54 +00:00
Attilio Rao	dc1558d1cd	Merge from vmobj-rwlock: VM_OBJECT_LOCKED() macro is only used to implement a custom version of lock assertions right now (which likely spread out thanks to copy and paste). Remove it and implement actual assertions. Sponsored by: EMC / Isilon storage division Reviewed by: alc Tested by: pho	2013-02-27 18:12:13 +00:00
Konstantin Belousov	31a53cd036	Convert machine/elf.h, machine/frame.h, machine/sigframe.h, machine/signal.h and machine/ucontext.h into common x86 includes, copying from amd64 and merging with i386. Kernel-only compat definitions are kept in the i386/include/sigframe.h and i386/include/signal.h, to reduce amd64 kernel namespace pollution. The amd64 compat uses its own definitions so far. The _MACHINE_ELF_WANT_32BIT definition is to allow the sys/boot/userboot/userboot/elf32_freebsd.c to use i386 ELF definitions on the amd64 compile host. The same hack could be usefully abused by other code too.	2013-02-20 17:39:52 +00:00
Jung-uk Kim	00a54dfb1c	Consistently use round_page(x) rather than roundup(x, PAGE_SIZE). There is no functional change.	2013-02-15 22:43:08 +00:00
Konstantin Belousov	bf94adb3e1	Print slightly more useful information on the 'bad pte' panic. No objections from: alc MFC after: 1 week	2013-02-14 19:22:15 +00:00
Konstantin Belousov	252b1f6e22	Assert that user address is never qremoved. No objections from: alc MFC after: 1 week	2013-02-14 19:21:20 +00:00
Neel Natu	25448de222	Requests for invalid CPUID leaves should map to the highest known leaf instead. Reviewed by: grehan Obtained from: NetApp	2013-02-13 23:22:17 +00:00
Neel Natu	485b3300cc	Implement guest vcpu pinning using 'pthread_setaffinity_np(3)'. Prior to this change pinning was implemented via an ioctl (VM_SET_PINNING) that called 'sched_bind()' on behalf of the user thread. The ULE implementation of 'sched_bind()' bumps up 'td_pinned' which in turn runs afoul of the assertion '(td_pinned == 0)' in userret(). Using the cpuset affinity to implement pinning of the vcpu threads works with both 4BSD and ULE schedulers and has the happy side-effect of getting rid of a bunch of code in vmm.ko. Discussed with: grehan	2013-02-11 20:36:07 +00:00
Neel Natu	6d62a48f47	Compute the number of initial kernel page table pages (NKPT) dynamically. This eliminates the need to recompile the kernel when the default value of NKPT is not big enough - for e.g. when loading large kernel modules or memory disk images from the loader. If NKPT is defined in the kernel configuration file then it overrides the dynamic calculation. Reviewed by: alc, kib	2013-02-06 04:53:00 +00:00
Andriy Gapon	1a89ca4cf5	cpususpend_handler: mark AP as resumed only after fully setting up lapic Reviewed by: jhb Tested by: Sergey V. Dyatko <sergey.dyatko@gmail.com>, KAHO Toshikazu <kaho@elam.kais.kyoto-u.ac.jp> MFC after: 12 days	2013-02-02 12:04:32 +00:00
Andriy Gapon	548b201607	x86 suspend/resume: suspend pics and pseudo-pics in reverse order - change 'pics' from STAILQ to TAILQ - ensure that Local APIC is always first in 'pics' Reviewed by: jhb Tested by: Sergey V. Dyatko <sergey.dyatko@gmail.com>, KAHO Toshikazu <kaho@elam.kais.kyoto-u.ac.jp> MFC after: 12 days	2013-02-02 12:02:42 +00:00
Eitan Adler	4752ed3d7f	Remove support for plip from the GENERIC kernel as no systems in the last 10 years require this support. Discussed with: db Discussed with: kib Reviewed by: imp Reviewed by: jhb Reviewed by: -hackers Approved by: cperciva (mentor)	2013-02-01 20:17:11 +00:00
Neel Natu	2b89a04496	Fix a broken assumption in the passthru implementation that the MSI-X table can only be located at the beginning or the end of the BAR. If the MSI-X table is located in the middle of a BAR then we will split the BAR into two and create two mappings - one before the table and one after the table - leaving a hole in place of the table so accesses to it can be trapped and emulated. Obtained from: NetApp	2013-02-01 03:49:09 +00:00
Neel Natu	07044a96d8	Increase the number of passthru devices supported by bhyve. The maximum length of an environment variable puts a limitation on the number of passthru devices that can be specified via a single variable. The workaround is to allow user to specify passthru devices via multiple environment variables instead of a single one. Obtained from: NetApp	2013-02-01 01:16:26 +00:00
Neel Natu	8faceb3292	Add emulation support for instruction "88/r: mov r/m8, r8". This instruction moves a byte from a register to a memory location. Tested by: tycho nightingale at pluribusnetworks com	2013-01-30 04:09:09 +00:00
John Baldwin	d825ce0a5d	Reduce duplication between i386/linux/linux.h and amd64/linux32/linux.h by moving bits that are MI out into headers in compat/linux. Reviewed by: Chagin Dmitry dmitry | gmail MFC after: 2 weeks	2013-01-29 18:41:30 +00:00
Peter Grehan	1fb0ea3f1a	Always allow access to the sysenter cs/esp/eip MSRs since they are automatically saved and restored in the VMCS. Reviewed by: neel Obtained from: NetApp	2013-01-25 21:38:31 +00:00
John Baldwin	fb709557a3	Don't assume that all Linux TCP-level socket options are identical to FreeBSD TCP-level socket options (only the first two are). Instead, use a mapping function and fail unsupported options as we do for other socket option levels. MFC after: 2 weeks	2013-01-23 21:44:48 +00:00
Neel Natu	e3f0800bd1	Postpone vmm module initialization until after SMP is initialized - particularly that 'smp_started != 0'. This is required because the VT-x initialization calls smp_rendezvous() to set the CR4_VMXE bit on all the cpus. With this change we can preload vmm.ko from the loader. Reported by: alfred@, sbruno@ Obtained from: NetApp	2013-01-21 01:33:10 +00:00
Neel Natu	912a3e678a	Add svn properties to the recently merged bhyve source files. The pre-commit hook will not allow any commits without the svn:keywords property in head.	2013-01-20 03:42:49 +00:00
Neel Natu	c458fc1ed4	Merge projects/bhyve to head. 'bhyve' was developed by grehan@ and myself at NetApp (thanks!). Special thanks to Peter Snyder, Joe Caradonna and Michael Dexter for their support and encouragement. Obtained from: NetApp	2013-01-19 04:18:52 +00:00
John Baldwin	b5821c6f0e	Fix build with SMP disabled. Reported by: bf	2013-01-19 01:18:22 +00:00
John Baldwin	f876ffeae3	Don't attempt to use clflush on the local APIC register window. Various CPUs exhibit bad behavior if this is done (Intel Errata AAJ3, hangs on Pentium-M, and trashing of the local APIC registers on a VIA C7). The local APIC is implicitly mapped UC already via MTRRs, so the clflush isn't necessary anyway. MFC after: 2 weeks	2013-01-17 21:32:25 +00:00
Neel Natu	c2217b9848	IFC @ r245509	2013-01-17 07:04:37 +00:00
Bryan Venteicher	ae366ffcbd	Add VirtIO to the i386 and amd64 GENERIC kernels This also removes the kludge from r239009 that covered only the network driver. Reviewed by: grehan Approved by: grehan (mentor) MFC after: 1 week	2013-01-13 07:14:16 +00:00
Neel Natu	8a60b77db8	IFC @ r245205	2013-01-09 03:32:23 +00:00
Neel Natu	1b54fbe69d	IFC @ r245178	2013-01-09 02:26:50 +00:00
Neel Natu	95102a8bcb	Add a "pause" to busy wait loops in the cpu reset path. This should not matter much when running on bare metal but it makes the guest more friendly when running inside a virtual machine. Discussed with: jhb Obtained from: NetApp	2013-01-09 02:11:16 +00:00
Neel Natu	03429e45a7	Revert changes for x2apic support from projects/bhyve. During the early days of bhyve it did not support instruction emulation which necessitated the use of x2apic to access the local apic. This is no longer the case and the dependency on x2apic has gone away. The x2apic patches can be considered independently of bhyve and will be merged into head via projects/x2apic. Discussed with: grehan	2013-01-06 05:37:26 +00:00
Neel Natu	2d28bff346	bhyve does not require a custom configuration file anymore so make the GENERIC identical to the one in HEAD. Obtained from: NetApp	2013-01-05 03:35:30 +00:00
Neel Natu	46b1c55d9e	IFC @ r244983.	2013-01-04 19:28:32 +00:00
Neel Natu	23ce7fedb4	There is no need for a special 'BHYVE' kernel configuration file anymore - 'GENERIC' works fine. Obtained from: NetApp	2013-01-04 03:02:43 +00:00
Neel Natu	014a52f3a6	There is no need for 'start_emulating()' and 'stop_emulating()' to be defined in <machine/cpufunc.h> so remove them from there. Obtained from: NetApp	2013-01-04 02:49:12 +00:00
Neel Natu	5f0677d392	The "unrestricted guest" capability is a feature of Intel VT-x that allows the guest to execute real or unpaged protected mode code - bhyve relies on this feature to execute the AP bootstrap code. Get rid of the hack that allowed bhyve to support SMP guests on processors that do not have the "unrestricted guest" capability. This hack was entirely FreeBSD-specific and would not work with any other guest OS. Instead, limit the number of vcpus to 1 when executing on processors without "unrestricted guest" capability. Suggested by: grehan Obtained from: NetApp	2013-01-04 02:04:41 +00:00
Konstantin Belousov	0dcbedfa61	Enable the UFS quotas for big-iron GENERIC kernels. Discussed with: mckusick MFC after: 2 weeks	2013-01-03 19:03:41 +00:00
Dag-Erling Smørgrav	36fca20f10	As discussed on -current last October, remove the firewire drivers from GENERIC.	2013-01-03 14:30:24 +00:00
Neel Natu	485f986ac9	Modify the default behavior of bhyve such that it no longer forces the use of x2apic mode on the guest. The guest can decide whether or not it wants to use legacy mmio or x2apic access to the APIC by writing to the MSR_APICBASE register. Obtained from: NetApp	2012-12-16 01:20:08 +00:00
Neel Natu	682b847ede	Prefer x2apic mode when running inside a virtual machine. Provide a tunable 'machdep.x2apic_desired' to let the administrator override the default behavior. Provide a read-only sysctl 'machdep.x2apic' to let the administrator know whether the kernel is using x2apic or legacy mmio to access local apic. Tested with Parallels Desktop 8 and bhyve hypervisors. Also tested running on bare metal Intel Xeon E5-2658. Obtained from: NetApp Discussed with: jhb, attilio, avg, grehan	2012-12-16 00:57:14 +00:00
Jim Harris	f2fcc434ee	Revert r243960 based on feedback regarding keeping x86 headers unified (mdf@, tijl@) and use of KASSERT/systm.h in bus.h (zeising@, bde@). Alternate implementation will be made in a separate commit.	2012-12-13 21:27:20 +00:00
Peter Grehan	2741efeca0	Implement an API to allow a hypervisor to save/restore guest floating point state without having to know the size of floating-point state. Unstaticize fpurestore to allow the hypervisor to save/restore guest state using fpusave/fpurestore on the allocated FPU state area. Reviewed by: kib Obtained from: NetApp/bhyve MFC after: 1 week	2012-12-12 08:35:32 +00:00
Konstantin Belousov	737d12b397	Add amd64-specific ddb command "show pte". The command displays the hierarchy of the page table entries which map the specified address. Reviewed by: alc (previous version) Sponsored by: The FreeBSD Foundation MFC after: 1 week	2012-12-10 05:14:34 +00:00
Jim Harris	71a30c4436	Add amd64 implementations for 8-byte bus_space routines. Submitted by: Carl Delsey <carl.r.delsey@intel.com> Discussed with: jhb, rwatson Reviewed by: jimharris MFC after: 1 week	2012-12-06 22:33:31 +00:00
Neel Natu	32531ccb84	IFC @r243836	2012-12-04 04:37:42 +00:00
Konstantin Belousov	349438a243	Print the frame addresses for the backtraces on i386 and amd64. It allows both to inspect the frame sizes and to manually peek into the frames from ddb, if needed. Reviewed by: dim MFC after: 2 weeks	2012-12-03 22:16:51 +00:00
Jung-uk Kim	7609e73ca0	Remove duplicate code. Reduce diff between amd64 and i386.	2012-12-01 00:56:19 +00:00
Jung-uk Kim	8c2b353ead	Use volatile keywords properly.	2012-11-30 20:15:01 +00:00
Peter Grehan	e6f1f347a1	Properly screen for the AND 0x81 instruction from the set of group1 0x81 instructions that use the reg bits as an extended opcode. Still todo: properly update rflags. Pointed out by: jilles@	2012-11-30 05:40:24 +00:00
Jung-uk Kim	231ac244f8	Tidy up inline assembly. No functional change.	2012-11-30 00:59:37 +00:00
Peter Grehan	b1f95796f0	Remove debug printf. Pointed out by: emaste	2012-11-29 15:08:13 +00:00
Peter Grehan	3b2b001107	Add support for the 0x81 AND instruction, now generated by clang in the local APIC code. 0x81 is a read-modify-write instruction - the EPT check that only allowed read or write and not both has been relaxed to allow read and write. Reviewed by: neel Obtained from: NetApp	2012-11-29 06:26:42 +00:00
Neel Natu	48a29f4e07	Cleanup the user-space paging exit handler now that the unified instruction emulation is in place. Obtained from: NetApp	2012-11-28 13:34:44 +00:00
Neel Natu	b42206f300	Change emulate_rdmsr() and emulate_wrmsr() to return 0 on success and errno on failure. The conversion from the return value to HANDLED or UNHANDLED can be done locally in vmx_exit_process(). Obtained from: NetApp	2012-11-28 13:10:18 +00:00
Neel Natu	ba9b7bf73a	Revamp the x86 instruction emulation in bhyve. On a nested page table fault the hypervisor will: - fetch the instruction using the guest %rip and %cr3 - decode the instruction in 'struct vie' - emulate the instruction in host kernel context for local apic accesses - any other type of mmio access is punted up to user-space (e.g. ioapic) The decoded instruction is passed as collateral to the user-space process that is handling the PAGING exit. The emulation code is fleshed out to include more addressing modes (e.g. SIB) and more types of operands (e.g. imm8). The source code is unified into a single file (vmm_instruction_emul.c) that is compiled into vmm.ko as well as /usr/sbin/bhyve. Reviewed by: grehan Obtained from: NetApp	2012-11-28 00:02:17 +00:00
Neel Natu	920bc34090	Fix a bug in the MSI-X resource allocation for PCI passthrough devices. In the case where the underlying host had disabled MSI-X via the "hw.pci.enable_msix" tunable, the ppt_setup_msix() function would fail and return an error without properly cleaning up. This in turn would cause a page fault on the next boot of the guest. Fix this by calling ppt_teardown_msix() in all the error return paths. Obtained from: NetApp	2012-11-22 04:07:18 +00:00
Neel Natu	288aeb8561	Get rid of redundant comparison which is guaranteed to be "true" for unsigned integers. Obtained from: NetApp	2012-11-22 00:08:20 +00:00
Peter Grehan	a0cad47092	Handle CPUID leaf 0x7 now that FreeBSD is using it. Return 0's for now. Reviewed by: neel Obtained from: NetApp	2012-11-20 06:01:03 +00:00
Neel Natu	3248464555	IFC @ r243164	2012-11-17 02:55:47 +00:00
Konstantin Belousov	43f48b65c0	Move the declaration of vm_phys_paddr_to_vm_page() from vm/vm_page.h to vm/vm_phys.h, where it belongs. Requested and reviewed by: alc MFC after: 2 weeks	2012-11-16 05:55:56 +00:00
Konstantin Belousov	b32ecf44bc	Flip the semantic of M_NOWAIT to only require the allocation to not sleep, and perform the page allocations with VM_ALLOC_SYSTEM class. Previously, the allocation was also allowed to completely drain the reserve of the free pages, being translated to VM_ALLOC_INTERRUPT request class for vm_page_alloc() and similar functions. Allow the caller of malloc* to request the 'deep drain' semantic by providing M_USE_RESERVE flag, now translated to VM_ALLOC_INTERRUPT class. Previously, it resulted in less aggressive VM_ALLOC_SYSTEM allocation class. Centralize the translation of the M_* malloc(9) flags in the single inline function malloc2vm_flags(). Discussion started by: "Sears, Steven" <Steven.Sears@netapp.com> Reviewed by: alc, mdf (previous version) Tested by: pho (previous version) MFC after: 2 weeks	2012-11-14 20:01:40 +00:00
Neel Natu	7d3d462b09	IFC @ r242940	2012-11-13 07:39:05 +00:00
Neel Natu	a10c6f5544	IFC @ r242684	2012-11-11 03:26:14 +00:00
Konstantin Belousov	5a17538e22	Do not try to enable new features in the %cr4 if running under hypervisor. Apparently, hypervisors failed to filter out 'Standard Extended Features' report from CPUID, but deliver #gp when corresponding bit in %cr4 is toggled. This shall be reconsidered later, after hypervisors correct the bug. Reported and tested by: joel Reviewed by: avg MFC after: 2 weeks	2012-11-09 16:00:30 +00:00
Peter Grehan	0a5e9bfb72	Fix issue found with clang build. Avoid code insertion by the compiler between inline asm statements that would in turn modify the flags value set by the first asm, and used by the second. Solve by making the common error block a string that can be pulled into the first inline asm, and using symbolic labels for asm variables. bhyve can now build/run fine when compiled with clang. Reviewed by: neel Obtained from: NetApp	2012-11-06 02:43:41 +00:00

6430 Commits