freebsd

mirror of https://git.FreeBSD.org/src.git synced 2024-12-21 11:13:30 +00:00

Author	SHA1	Message	Date
Edward Tomasz Napierala	a2f510e8ec	Fix comment intentation.	2010-12-04 17:41:58 +00:00
Warner Losh	6f1a8765be	To make minidumps work properly on mips for memory that's direct mapped and entered via vm_page_setup, keep track of it like we do for amd64. # A separate commit will be made to move this to a capability-based ifdef # rather than arch-based ifdef. Submitted by: alc@ MFC after: 1 week	2010-12-03 04:39:48 +00:00
Edward Tomasz Napierala	ef694c1ac4	Replace pointer to "struct uidinfo" with pointer to "struct ucred" in "struct vm_object". This is required to make it possible to account for per-jail swap usage. Reviewed by: kib@ Tested by: pho@ Sponsored by: FreeBSD Foundation	2010-12-02 17:37:16 +00:00
Alan Cox	05cb58f669	Correct an error in the allocation of the vm_page_dump array in vm_page_startup(). Specifically, the dump_avail array should be used instead of the phys_avail array to calculate the size of vm_page_dump. For example, the pages for the message buffer are allocated prior to vm_page_startup() by subtracting them from the last entry in the phys_avail array, but the first thing that vm_page_startup() does after creating the vm_page_dump array is to set the bits corresponding to the message buffer pages in that array. However, these bits might not actually exist in the array, because the size of the array is determined by the current value in the last entry of the phys_avail array. In general, the only reason why this doesn't always result in an out-of-bounds array access is that the size of the vm_page_dump array is rounded up to the next page boundary. This change eliminates that dependence on rounding (and luck). MFC after: 6 weeks	2010-12-01 03:35:19 +00:00
Jayachandran C.	aa54636620	Fix issue noted by alc while reviewing r215938: The current implementation of vm_page_alloc_freelist() does not handle order > 0 correctly. Remove order parameter to the function and use it only for order 0 pages. Submitted by: alc	2010-11-28 05:51:31 +00:00
Konstantin Belousov	780636b72a	After the sleep caused by encountering a busy page, relookup the page. Submitted and reviewed by: alc Reprted and tested by: pho MFC after: 5 days	2010-11-24 12:25:17 +00:00
Konstantin Belousov	3157c50313	Eliminate the mab, maf arrays and related variables. The change also fixes off-by-one error in the calculation of mreq. Suggested and reviewed by: alc Tested by: pho MFC after: 5 days	2010-11-21 10:18:28 +00:00
Alan Cox	17ea6f00d5	Optimize vm_object_terminate(). Reviewed by: kib MFC after: 1 week	2010-11-20 22:30:09 +00:00
Konstantin Belousov	4c7b9a2063	The runlen returned from vm_pageout_flush() might be zero legitimately, when mreq page has status VM_PAGER_AGAIN. MFC after: 5 days	2010-11-20 17:27:38 +00:00
Alan Cox	00f8bffc22	Reduce the amount of detail printed by vm_page_free_toq() when it panics. Reviewed by: kib	2010-11-19 17:49:08 +00:00
Max Laier	85f2a0c91e	Off by one page in vm_reserv_reclaim_contig(): Also reclaim reservations with only a single free page if that satisfies the requested size. MFC after: 3 days Reviewed by: alc	2010-11-19 04:30:33 +00:00
Konstantin Belousov	1e8a675c73	vm_pageout_flush() might cache the pages that finished write to the backing storage. Such pages might be then reused, racing with the assert in vm_object_page_collect_flush() that verified that dirty pages from the run (most likely, pages with VM_PAGER_AGAIN status) are write-protected still. In fact, the page indexes for the pages that were removed from the object page list should be ignored by vm_object_page_clean(). Return the length of successfully written run from vm_pageout_flush(), that is, the count of pages between requested page and first page after requested with status VM_PAGER_AGAIN. Supply the requested page index in the array to vm_pageout_flush(). Use the returned run length to forward the index of next page to clean in vm_object_page_clean(). Reported by: avg Reviewed by: alc MFC after: 1 week	2010-11-18 21:09:02 +00:00
Konstantin Belousov	4166faaee0	Only increment object generation count when inserting the page into object page list. The only use of object generation count now is a restart of the scan in vm_object_page_clean(), which makes sense to do on the page addition. Page removals do not affect the dirtiness of the object, as well as manipulations with the shadow chain. Suggested and reviewed by: alc MFC after: 1 week	2010-11-18 20:46:28 +00:00
Konstantin Belousov	7022f954c3	Do not use __FreeBSD_version prefix for the special osrel version. The ports/Mk/bsd.port.mk uses sys/param.h to fetch osrel, and cannot grok several constants with the prefix. Reported and tested by: swell.k gmail com MFC after: 1 week	2010-11-14 21:59:11 +00:00
Konstantin Belousov	94bce4535d	Use symbolic names instead of hardcoding values for magic p_osrel constants. MFC after: 1 week	2010-11-14 18:24:12 +00:00
Konstantin Belousov	9a6d144ff8	Implement a (soft) stack guard page for auto-growing stack mappings. The unmapped page separates the tip of the stack and possible adjanced segment, making some uses of stack overflow harder. The stack growing code refuses to expand the segment to the last page of the reseved region when sysctl security.bsd.stack_guard_page is set to 1. The default value for sysctl and accompanying tunable is 0. Please note that mmap(MAP_FIXED) still can place a mapping right up to the stack, making continuous region. Reviewed by: alc MFC after: 1 week	2010-11-14 17:53:52 +00:00
Alan Cox	2cf36c8f67	Enable reservation-based physical memory allocation. Even without the creation of large page mappings in the pmap, it can provide modest performance benefits. In particular, for a "buildworld" on a 2x 1GHz Ultrasparc IIIi it reduced the wall clock time by 2.2% and the system time by 12.6%. Tested by: marius@	2010-11-10 17:57:34 +00:00
Alan Cox	e48262487a	In case the stack size reaches its limit and its growth must be restricted, ensure that grow_amount is a multiple of the page size. Otherwise, the kernel may crash in swap_reserve_by_uid() on HEAD and FreeBSD 8.x, and produce a core file with a missing stack on FreeBSD 7.x. Diagnosed and reported by: jilles Reviewed by: kib MFC after: 1 week	2010-11-07 21:40:34 +00:00
Oleksandr Tymoshenko	903ba3da86	- Add minidump support for FreeBSD/mips	2010-11-07 03:09:02 +00:00
John Baldwin	e9a069d8af	Update startup_alloc() to support multi-page allocations and allow internal zones whose objects are larger than a page to use startup_alloc(). This allows allocation of zone objects during early boot on machines with a large number of CPUs since the resulting zone objects are larger than a page. Submitted by: trema Reviewed by: attilio MFC after: 1 week	2010-11-04 15:33:50 +00:00
Alan Cox	d689bc0082	Correct some format strings used by sysctls. MFC after: 1 week	2010-10-30 18:00:53 +00:00
John Baldwin	1a587ef2a5	- Make 'vm_refcnt' volatile so that compilers won't be tempted to treat its value as a loop invariant. Currently this is a no-op because 'atomic_cmpset_int()' clobbers all memory on current architectures. - Use atomic_fetchadd_int() instead of an atomic_cmpset_int() loop to drop a reference in vmspace_free(). Reviewed by: alc MFC after: 1 month	2010-10-21 17:29:32 +00:00
Andriy Gapon	55144670c2	PG_BUSY -> VPO_BUSY, PG_WANTED -> VPO_WANTED in manual pages and comments Reviewed by: alc MFC after: 4 days	2010-10-20 05:17:23 +00:00
Matthew D Fleming	20ed0cb0c6	uma_zfree(zone, NULL) should do nothing, to match free(9). Noticed by: Ron Steinke <rsteinke at isilon dot com> MFC after: 3 days	2010-10-19 16:06:00 +00:00
Lawrence Stewart	1c6cae9711	Change uma_zone_set_max to return the effective value of "nitems" after rounding. The same value can also be obtained with uma_zone_get_max, but this change avoids a caller having to make two back-to-back calls. Sponsored by: FreeBSD Foundation Reviewed by: gnn, jhb	2010-10-16 04:41:45 +00:00
Lawrence Stewart	c4ae7908a7	- Simplify implementation of uma_zone_get_max. - Add uma_zone_get_cur which returns the current approximate occupancy of a zone. This is useful for providing stats via sysctl amongst other things. Sponsored by: FreeBSD Foundation Reviewed by: gnn, jhb MFC after: 2 weeks	2010-10-16 04:14:45 +00:00
Alan Cox	f8616ebfae	If vm_map_find() is asked to allocate a superpage-aligned region of virtual addresses that is greater than a superpage in size but not a multiple of the superpage size, then vm_map_find() is not always expanding the kernel pmap to support the last few small pages being allocated. These failures are not commonplace, so this was first noticed by someone porting FreeBSD to a new architecture. Previously, we grew the kernel page table in vm_map_findspace() when we found the first available virtual address. This works most of the time because we always grow the kernel pmap or page table by an amount that is a multiple of the superpage size. Now, instead, we defer the call to pmap_growkernel() until we are committed to a range of virtual addresses in vm_map_insert(). In general, there is another reason to prefer calling pmap_growkernel() in vm_map_insert(). It makes it possible for someone to do the equivalent of an mmap(MAP_FIXED) on the kernel map. Reported by: Svatopluk Kraus Reviewed by: kib@ MFC after: 3 weeks	2010-10-04 16:49:40 +00:00
Matthew D Fleming	d69b01efc2	Replace an XXX comment with the appropriate code. Submitted by: alc	2010-09-20 20:41:59 +00:00
Alan Cox	da0483096d	Allow a POSIX shared memory object that is opened for read but not for write to nonetheless be mapped PROT_WRITE and MAP_PRIVATE, i.e., copy-on-write. (This is a regression in the new implementation of POSIX shared memory objects that is used by HEAD and RELENG_8. This bug does not exist in RELENG_7's user-level, file-based implementation.) PR: 150260 MFC after: 3 weeks	2010-09-19 19:42:04 +00:00
Alan Cox	8304adaac6	Make refinements to r212824. In particular, don't make vm_map_unlock_nodefer() part of the synchronization interface for maps. Add comments to vm_map_unlock_and_wait() and vm_map_wakeup() describing how they should be used. In particular, describe the deferred deallocations issue with vm_map_unlock_and_wait(). Redo the implementation of vm_map_unlock_and_wait() so that it passes along the caller's file and line information, just like the other map locking primitives. Reviewed by: kib X-MFC after: r212824	2010-09-19 17:43:22 +00:00
Konstantin Belousov	0b367bd8c0	Adopt the deferring of object deallocation for the deleted map entries on map unlock to the lock downgrade and later read unlock operation. System map entries cannot be backed by OBJT_VNODE objects, no need to defer deallocation for them. Map entries from user maps do not require the owner map for deallocation, and can be accumulated in the thread-local list for freeing when a user map is unlocked. Move the collection of entries for deferred reclamation into vm_map_delete(). Create helper vm_map_process_deferred(), that is called from locations where processing is feasible. Do not process deferred entries in vm_map_unlock_and_wait() since map_sleep_mtx is held. Reviewed by: alc, rstone (previous versions) Tested by: pho MFC after: 2 weeks	2010-09-18 15:03:31 +00:00
Matthew D Fleming	4e6571599b	Re-add r212370 now that the LOR in powerpc64 has been resolved: Add a drain function for struct sysctl_req, and use it for a variety of handlers, some of which had to do awkward things to get a large enough SBUF_FIXEDLEN buffer. Note that some sysctl handlers were explicitly outputting a trailing NUL byte. This behaviour was preserved, though it should not be necessary. Reviewed by: phk (original patch)	2010-09-16 16:13:12 +00:00
Matthew D Fleming	404a593e28	Revert r212370, as it causes a LOR on powerpc. powerpc does a few unexpected things in copyout(9) and so wiring the user buffer is not sufficient to perform a copyout(9) while holding a random mutex. Requested by: nwhitehorn	2010-09-13 18:48:23 +00:00
Matthew D Fleming	dd67e2103c	Add a drain function for struct sysctl_req, and use it for a variety of handlers, some of which had to do awkward things to get a large enough FIXEDLEN buffer. Note that some sysctl handlers were explicitly outputting a trailing NUL byte. This behaviour was preserved, though it should not be necessary. Reviewed by: phk	2010-09-09 18:33:46 +00:00
Nathan Whitehorn	42768fec0f	On architectures with non-tree-based page tables like PowerPC, every page in a range must be checked when calling pmap_remove(). Calling pmap_remove() from vm_pageout_map_deactivate_pages() with the entire range of the map could result in attempting to demap an extraordinary number of pages (> 10^15), so iterate through each map entry and unmap each of them individually. MFC after: 6 weeks	2010-09-09 13:32:58 +00:00
Ryan Stone	d473d3a164	Fix a typo in r212281. uintptr -> uintptr_t Pointy hat to: rstone Approved by: emaste (mentor) MFC after: 2 weeks	2010-09-07 02:51:11 +00:00
Ryan Stone	0d41964095	In munmap() downgrade the vm_map_lock to a read lock before taking a read lock on the pmc-sx lock. This prevents a deadlock with pmc_log_process_mappings, which has an exclusive lock on pmc-sx and tries to get a read lock on a vm_map. Downgrading the vm_map_lock in munmap allows pmc_log_process_mappings to continue, preventing the deadlock. Without this change I could cause a deadlock on a multicore 8.1-RELEASE system by having one thread constantly mmap'ing and then munmap'ing a PROT_EXEC mapping in a loop while I repeatedly invoked and stopped pmcstat in system-wide sampling mode. Reviewed by: fabient Approved by: emaste (mentor) MFC after: 2 weeks	2010-09-07 00:23:45 +00:00
Andriy Gapon	a9b89cf1c1	vm_page.c: include opt_msgbuf.h for MSGBUF_SIZE use in vm_page_startup vm_page_startup uses MSGBUF_SIZE value for adding msgbuf pages to minidump. If opt_msgbuf.h is not included and MSGBUF_SIZE is overriden in kernel config, then not all msgbuf pages will be dumped. And most importantly, struct msgbuf itself will not be included. Thus the dump would look corrupted/incomplete to tools like kgdb, dmesg, etc that try to access struct msgbuf as one of the first things they do when working on a crash dump. MFC after: 5 days	2010-09-03 10:40:53 +00:00
Matthew D Fleming	a2a200a24d	Have memguard(9) crash with an easier-to-debug message on double-free. Reviewed by: zml MFC after: 3 weeks	2010-08-31 17:43:47 +00:00
Matthew D Fleming	6d3ed393d6	The realloc case for memguard(9) will copy too many bytes when reallocating to a smaller-sized allocation. Fix this issue. Noticed by: alc Reviewed by: alc Approved by: zml (mentor) MFC after: 3 weeks	2010-08-31 16:57:58 +00:00
Alan Cox	74ffb9af15	Add the MAP_PREFAULT_READ option to mmap(2). Reviewed by: jhb, kib	2010-08-28 16:57:07 +00:00
Andre Oppermann	e49471b04b	Add uma_zone_get_max() to obtain the effective limit after a call to uma_zone_set_max(). The UMA zone limit is not exactly set to the value supplied but rounded up to completely fill the backing store increment (a page normally). This can lead to surprising situations where the number of elements allocated from UMA is higher than the supplied limit value. The new get function reads back the effective value so that the supplied limit value can be adjusted to the real limit. Reviewed by: jeffr MFC after: 1 week	2010-08-16 14:24:00 +00:00
Matthew D Fleming	f02d86e269	Fix compile. It seemed better to have memguard.c include opt_vm.h in case future compile-time knobs were added that it wants to use. Also add include guards and forward declarations to vm/memguard.h. Approved by: zml (mentor) MFC after: 1 month	2010-08-12 16:54:43 +00:00
Matthew D Fleming	e3813573bd	Rework memguard(9) to reserve significantly more KVA to detect use-after-free over a longer time. Also release the backing pages of a guarded allocation at free(9) time to reduce the overhead of using memguard(9). Allow setting and varying the malloc type at run-time. Add knobs to allow: - randomly guarding memory - adding un-backed KVA guard pages to detect underflow and overflow - a lower limit on the size of allocations that are guarded Reviewed by: alc Reviewed by: brueffer, Ulrich Spörlein <uqs spoerlein net> (man page) Silence from: -arch Approved by: zml (mentor) MFC after: 1 month	2010-08-11 22:10:37 +00:00
Konstantin Belousov	3979450b4c	Add new make_dev_p(9) flag MAKEDEV_ETERNAL to inform devfs that created cdev will never be destroyed. Propagate the flag to devfs vnodes as VV_ETERNVALDEV. Use the flags to avoid acquiring devmtx and taking a thread reference on such nodes. In collaboration with: pho MFC after: 1 month	2010-08-06 09:42:15 +00:00
John Baldwin	a3870a1826	Very rough first cut at NUMA support for the physical page allocator. For now it uses a very dumb first-touch allocation policy. This will change in the future. - Each architecture indicates the maximum number of supported memory domains via a new VM_NDOMAIN parameter in <machine/vmparam.h>. - Each cpu now has a PCPU_GET(domain) member to indicate the memory domain a CPU belongs to. Domain values are dense and numbered from 0. - When a platform supports multiple domains, the default freelist (VM_FREELIST_DEFAULT) is split up into N freelists, one for each domain. The MD code is required to populate an array of mem_affinity structures. Each entry in the array defines a range of memory (start and end) and a domain for the range. Multiple entries may be present for a single domain. The list is terminated by an entry where all fields are zero. This array of structures is used to split up phys_avail[] regions that fall in VM_FREELIST_DEFAULT into per-domain freelists. - Each memory domain has a separate lookup-array of freelists that is used when fulfulling a physical memory allocation. Right now the per-domain freelists are listed in a round-robin order for each domain. In the future a table such as the ACPI SLIT table may be used to order the per-domain lookup lists based on the penalty for each memory domain relative to a specific domain. The lookup lists may be examined via a new vm.phys.lookup_lists sysctl. - The first-touch policy is implemented by using PCPU_GET(domain) to pick a lookup list when allocating memory. Reviewed by: alc	2010-07-27 20:33:50 +00:00
Edward Tomasz Napierala	fd6f4ffb27	Fix commented out resource limit check in mlockall(2). It's still racy, but at least less misleading.	2010-07-27 19:26:18 +00:00
Alan Cox	2af6e14d39	Introduce exec_alloc_args(). The objective being to encapsulate the details of the string buffer allocation in one place. Eliminate the portion of the string buffer that was dedicated to storing the interpreter name. The pointer to the interpreter name can simply be made to point to the appropriate argument string. Reviewed by: kib	2010-07-27 17:31:03 +00:00
Alan Cox	9e4e511499	Change the order in which the file name, arguments, environment, and shell command are stored in exec*()'s demand-paged string buffer. For a "buildworld" on an 8GB amd64 multiprocessor, the new order reduces the number of global TLB shootdowns by 31%. It also eliminates about 330k page faults on the kernel address space. Change exec_shell_imgact() to use "args->begin_argv" consistently as the start of the argument and environment strings. Previously, it would sometimes use "args->buf", which is the start of the overall buffer, but no longer the start of the argument and environment strings. While I'm here, eliminate unnecessary passing of "&length" to copystr(), where we don't actually care about the length of the copied string. Clean up the initialization of the exec map. In particular, use the correct size for an entry, and express that size in the same way that is used when an entry is allocated. The old size was one page too large. (This discrepancy originated in 2004 when I rewrote exec_map_first_page() to use sf_buf_alloc() instead of the exec map for mapping the first page of the executable.) Reviewed by: kib	2010-07-25 17:43:38 +00:00
Jayachandran C.	49ca10d40c	Redo the page table page allocation on MIPS, as suggested by alc@. The UMA zone based allocation is replaced by a scheme that creates a new free page list for the KSEG0 region, and a new function in sys/vm that allocates pages from a specific free page list. This also fixes a race condition introduced by the UMA based page table page allocation code. Dropping the page queue and pmap locks before the call to uma_zfree, and re-acquiring them afterwards will introduce a race condtion(noted by alc@). The changes are : - Revert the earlier changes in MIPS pmap.c that added UMA zone for page table pages. - Add a new freelist VM_FREELIST_HIGHMEM to MIPS vmparam.h for memory that is not directly mapped (in 32bit kernel). Normal page allocations will first try the HIGHMEM freelist and then the default(direct mapped) freelist. - Add a new function 'vm_page_t vm_page_alloc_freelist(int flind, int order, int req)' to vm/vm_page.c to allocate a page from a specified freelist. The MIPS page table pages will be allocated using this function from the freelist containing direct mapped pages. - Move the page initialization code from vm_phys_alloc_contig() to a new function vm_page_alloc_init(), and use this function to initialize pages in vm_page_alloc_freelist() too. - Split the function vm_phys_alloc_pages(int pool, int order) to create vm_phys_alloc_freelist_pages(int flind, int pool, int order), and use this function from both vm_page_alloc_freelist() and vm_phys_alloc_pages(). Reviewed by: alc	2010-07-21 09:27:00 +00:00

1 2 3 4 5 ...

2752 Commits