pmap_clear_modify() on a page is pointless if that page is not mapped or
it is only mapped for read access. Instead, assert that the page is not
mapped or not mapped for write access as appropriate.
Eliminate unnecessary clearing of a page's dirty mask. Instead, assert
that the page's dirty mask is clear.
into the separate function vm_pageout_oom(). Supply a parameter for
vm_pageout_oom() describing a reason for the call.
Call vm_pageout_oom() from the swp_pager_meta_build() when swap zone
is exhausted.
Reviewed by: alc
Tested by: pho, jhb
MFC after: 2 weeks
for the bio for swapout write. It allows the page allocator to drain
free page list deeper. As result, a deadlock where pageout deamon sleeps
waiting for bio to be allocated for swapout is no more reproducable in
practice.
Alan said that M_USE_RESERVE shall be ressurrected and used there, but
until this is implemented, M_NOWAIT does exactly what is needed.
Tested by: pho, kris
Reviewed by: alc
No objections from: phk
MFC after: 2 weeks (RELENG_7 only)
conjuction with 'thread' argument passing which is always curthread.
Remove the unuseful extra-argument and pass explicitly curthread to lower
layer functions, when necessary.
KPI results broken by this change, which should affect several ports, so
version bumping and manpage update will be further committed.
Tested by: kris, pho, Diego Sardina <siarodx at gmail dot com>
Remove this argument and pass curthread directly to underlying
VOP_LOCK1() VFS method. This modify makes the code cleaner and in
particular remove an annoying dependence helping next lockmgr() cleanup.
KPI results, obviously, changed.
Manpage and FreeBSD_version will be updated through further commits.
As a side note, would be valuable to say that next commits will address
a similar cleanup about VFS methods, in particular vop_lock1 and
vop_unlock.
Tested by: Diego Sardina <siarodx at gmail dot com>,
Andrea Di Pasquale <whyx dot it at gmail dot com>
assertion hit in swapoff_one() when we un-mount a swap partition. We
should be using curthread where we used thread0 before. This change
also replaces the thread argument with a credential argument, as the
MAC framework only requires the cred.
It should be noted that this allows the machine to be rebooted without
panicing with "cannot differ from curthread or NULL" when MAC is enabled.
Submitted by: rwatson
Reviewed by: attilio
MFC after: 2 weeks
from Mac OS X Leopard--rationalize naming for entry points to
the following general forms:
mac_<object>_<method/action>
mac_<object>_check_<method/action>
The previous naming scheme was inconsistent and mostly
reversed from the new scheme. Also, make object types more
consistent and remove spaces from object types that contain
multiple parts ("posix_sem" -> "posixsem") to make mechanical
parsing easier. Introduce a new "netinet" object type for
certain IPv4/IPv6-related methods. Also simplify, slightly,
some entry point names.
All MAC policy modules will need to be recompiled, and modules
not updates as part of this commit will need to be modified to
conform to the new KPI.
Sponsored by: SPARTA (original patches against Mac OS X)
Obtained from: TrustedBSD Project, Apple Computer
vm_object_terminate() on a device-backed object at the same time that
another processor, call it Pa, is performing dev_pager_alloc() on the
same device. The problem is that vm_pager_object_lookup() should not be
allowed to return a doomed object, i.e., an object with OBJ_DEAD set,
but it does. In detail, the unfortunate sequence of events is: Pt in
vm_object_terminate() holds the doomed object's lock and sets OBJ_DEAD
on the object. Pa in dev_pager_alloc() holds dev_pager_sx and calls
vm_pager_object_lookup(), which returns the doomed object. Next, Pa
calls vm_object_reference(), which requires the doomed object's lock, so
Pa waits for Pt to release the doomed object's lock. Pt proceeds to the
point in vm_object_terminate() where it releases the doomed object's
lock. Pa is now able to complete vm_object_reference() because it can
now complete the acquisition of the doomed object's lock. So, now the
doomed object has a reference count of one! Pa releases dev_pager_sx
and returns the doomed object from dev_pager_alloc(). Pt now acquires
dev_pager_mtx, removes the doomed object from dev_pager_object_list,
releases dev_pager_mtx, and finally calls uma_zfree with the doomed
object. However, the doomed object is still in use by Pa.
Repeating my key point, vm_pager_object_lookup() must not return a
doomed object. Moreover, the test for the object's state, i.e.,
doomed or not, and the increment of the object's reference count
should be carried out atomically.
Reviewed by: kib
Approved by: re (kensmith)
MFC after: 3 weeks
Now, we assume no more sched_lock protection for some of them and use the
distribuited loads method for vmmeter (distribuited through CPUs).
Reviewed by: alc, bde
Approved by: jeff (mentor)
Probabilly, a general approach is not the better solution here, so we should
solve the sched_lock protection problems separately.
Requested by: alc
Approved by: jeff (mentor)
Change the VOP_OPEN(), vn_open() vnode operation and d_fdopen() cdev operation
argument from being file descriptor index into the pointer to struct file.
Proposed and reviewed by: jhb
Reviewed by: daichi (unionfs)
Approved by: re (kensmith)
vmcnts. This can be used to abstract away pcpu details but also changes
to use atomics for all counters now. This means sched lock is no longer
responsible for protecting counts in the switch routines.
Contributed by: Attilio Rao <attilio@FreeBSD.org>
- Restore support for fetching swap information from crash dumps via
kvm_get_swapinfo(3) to fix pstat -T/-s on crash dumps.
Reviewed by: arch@, phk
MFC after: 1 week
- Add a printf in swp_pager_meta_build() to warn if the swapzone becomes
exhausted so that there's at least a warning before a box that runs out
of swapzone space before running out of swap space deadlocks.
MFC after: 1 week
Reviwed by: alc
specific privilege names to a broad range of privileges. These may
require some future tweaking.
Sponsored by: nCircle Network Security, Inc.
Obtained from: TrustedBSD Project
Discussed on: arch@
Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri,
Alex Lyashkov <umka at sevcity dot net>,
Skip Ford <skip dot ford at verizon dot net>,
Antoine Brodin <antoine dot brodin at laposte dot net>
begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now
contains the userspace and user<->kernel API and definitions, with all
in-kernel interfaces moved to mac_framework.h, which is now included
across most of the kernel instead.
This change is the first step in a larger cleanup and sweep of MAC
Framework interfaces in the kernel, and will not be MFC'd.
Obtained from: TrustedBSD Project
Sponsored by: SPARTA
synchronized by the lock on the object containing the page.
Transition PG_WANTED and PG_SWAPINPROG to use the new field,
eliminating the need for holding the page queues lock when setting
or clearing these flags. Rename PG_WANTED and PG_SWAPINPROG to
VPO_WANTED and VPO_SWAPINPROG, respectively.
Eliminate the assertion that the page queues lock is held in
vm_page_io_finish().
Eliminate the acquisition and release of the page queues lock
around calls to vm_page_io_finish() in kern_sendfile() and
vfs_unbusy_pages().
they are. They should be NULL at this point, except if we're coming from
swapdev_strategy().
It should only affect the case where we're swapping directly on a file over
NFS.
I'm not sure this is the right thing to do, but at least I don't panic
anymore when swapping on a NFS file without using md(4).
X-MFC after: proper review
make the b_iodone callback responsible for setting it if it is needed.
Previously, it was set unconditionally by bufdone() without holding
whichever lock is shared by the b_iodone callback and the corresponding
top-half function. Consequently, in a race, the top-half function could
conclude that operation was done before the b_iodone callback finished.
See, for example, aio_physwakeup() and aio_fphysio().
Note: I don't believe that the other, more widely-used b_iodone callbacks
are affected.
Discussed with: jeff
Reviewed by: phk
MFC after: 2 weeks
swapoff: failed to locate %d swap blocks
The race occurred because putpages() can block between the time it
allocates swap space and the time it updates the swap metadata to
associate that space with a vm_object, so swapoff() would complain
about the temporary inconsistency. I hoped to fix this by making
swp_pager_getswapspace() and swp_pager_meta_build() a single atomic
operation, but that proved to be inconvenient. With this change,
swapoff() simply doesn't attempt to be so clever about detecting when
all the pageout activity to the target device should have drained.
In order to avoid livelock, swapoff() skips over objects with a
nonzero pip count and makes another pass if necessary. Since it is
impossible to know which objects we care about, it would choose an
arbitrary object with a nonzero pip count and wait for it before
making another pass, the theory being that this object would finish
paging about as quickly as the ones we care about. Unfortunately,
we may have slept since we acquired a reference to this object.
Hack around this problem by tsleep()ing on the pointer anyway, but
timeout after a fixed interval. More elegant solutions are possible,
but the ones I considered unnecessarily complicate this rare case.
Also, kill some nits that seem to have crept into the swapoff() code
in the last 75 revisions or so:
- Don't pass both sp and sp->sw_used to swap_pager_swapoff(), since
the latter can be derived from the former.
- Replace swp_pager_find_dev() with something simpler. There's no
need to iterate over the entire list of swap devices just to determine
if a given block is assigned to the one we're interested in.
- Expand the scope of the swhash_mtx in a couple of places so that it
isn't released and reacquired once for every hash bucket.
- Don't drop the swhash_mtx while holding a reference to an object.
We need to lock the object first. Unfortunately, doing so would
violate the established lock order, so use VM_OBJECT_TRYLOCK() and
try again on a subsequent pass if the object is already locked.
- Refactor swp_pager_force_pagein() and swap_pager_swapoff() a bit.
Extend it with a strategy method.
Add bufstrategy() which do the usual VOP_SPECSTRATEGY/VOP_STRATEGY
song and dance.
Rename ibwrite to bufwrite().
Move the two NFS buf_ops to more sensible places, add bufstrategy
to them.
Add inlines for bwrite() and bstrategy() which calls through
buf->b_bufobj->b_ops->b_{write,strategy}().
Replace almost all VOP_STRATEGY()/VOP_SPECSTRATEGY() calls with bstrategy().
Initialize b_bufobj for all buffers.
Make incore() and gbincore() take a bufobj instead of a vnode.
Make inmem() local to vfs_bio.c
Change a lot of VI_[UN]LOCK(bp->b_vp) to BO_[UN]LOCK(bp->b_bufobj)
also VI_MTX() to BO_MTX(),
Make buf_vlist_add() take a bufobj instead of a vnode.
Eliminate other uses of bp->b_vp where bp->b_bufobj will do.
Various minor polishing: remove "register", turn panic into KASSERT,
use new function declarations, TAILQ_FOREACH_SAFE() etc.
Add bufobj_wref(), bufobj_wdrop() and bufobj_wwait() to handle the write
count on a bufobj. Bufobj_wdrop() replaces vwakeup().
Use these functions all relevant places except in ffs_softdep.c where
the use if interlocked_sleep() makes this impossible.
Rename b_vnbufs to b_bobufs now that we touch all the relevant files anyway.
kmem_alloc_pageable(). The difference between these is that an errant
memory access to the zone will be detected sooner with
kmem_alloc_nofault().
The following changes serve to eliminate the following lock-order
reversal reported by witness:
1st 0xc1a3c084 vm object (vm object) @ vm/swap_pager.c:1311
2nd 0xc07acb00 swap_pager swhash (swap_pager swhash) @ vm/swap_pager.c:1797
3rd 0xc1804bdc vm object (vm object) @ vm/uma_core.c:931
There is no potential deadlock in this case. However, witness is unable
to recognize this because vm objects used by UMA have the same type as
ordinary vm objects. To remedy this, we make the following changes:
- Add a mutex type argument to VM_OBJECT_LOCK_INIT().
- Use the mutex type argument to assign distinct types to special
vm objects such as the kernel object, kmem object, and UMA objects.
- Define a static swap zone object for use by UMA. (Only static
objects are assigned a special mutex type.)
case of NFS mounted swap, so do not try to dereference it.
While we're here, brucify the printf() call which happens when we
time out on acquisition of vm_page_queue_mtx.
PR: kern/67898
Submitted by: bde (style)
The big lines are:
NODEV -> NULL
NOUDEV -> NODEV
udev_t -> dev_t
udev2dev() -> findcdev()
Various minor adjustments including handling of userland access to kernel
space struct cdev etc.
allocation and deallocation. This flag's principal use is shortly after
allocation. For such cases, clearing the flag is pointless. The only
unusual use of PG_ZERO is in vfs_bio_clrbuf(). However, allocbuf() never
requests a prezeroed page. So, vfs_bio_clrbuf() never sees a prezeroed
page.
Reviewed by: tegge@
swap_pager_putpages()'s buffer completion code. Note: the only
difference between swp_pager_sync_iodone() and bdone(), aside from
the locking in the latter, was the unnecessary clearing of B_ASYNC.
- Remove an unnecessary pmap_page_protect() from
swp_pager_async_iodone().
Reviewed by: tegge
shown that it is not useful.
Rename the relative count g_access_rel() function to g_access(), only
the name has changed.
Change all g_access_rel() calls in our CVS tree to call g_access() instead.
Add an #ifndef BURN_BRIDGES #define of g_access_rel() for source
code compatibility.
full state. (When swap is added their state will change appropriately.)
2. Set swap_pager_full and swap_pager_almost_full to the full state when
the last swap device is removed.
Combined these changes eliminate nonsense messages from the kernel on swap-
less machines.
Item 2 submitted by: Divacky Roman <xdivac02@stud.fit.vutbr.cz>
Prodding by: phk
destination objects are locked on entry and exit. Add comments to
the callers noting that the locks can be released by swap_pager_copy().
- Remove several instances of GIANT_REQUIRED.
vm_pageout_page_stats() from Giant.
- Modify vm_pager_put_pages() and vm_pager_page_unswapped() to expect the
vm object to be locked on entry. (All of the pager routines now expect
this.)
Remove the vnode and dev_t fields and replace them with a void *.
Introduce separate strategy functions for devices and regular (NFS)
vnodes.
For devices we don't need the vnode v_numoutput stuff.
Add a generic swaponsomething() function to add a swapdevice and
split the remainder of swaponvp() into swaponvp() and swapondev()
which calls this backend.
Eliminate a lot of checkes to make sure requests are not cross-device
which is unnecessary with the new layout. We know a sequential request
cannot possibly be cross-device because there is a reserved page between
the devices.
Remove a couple of comments which no longer are relevant.
to not get any cross-device I/O requests. (The unallocated first page
protecting BSD labels already gave us this, but that hack may go away
at some point in time).
Remove the check for cross-device I/O requests in swap_pager_strategy.
Move the repeated statistics updating into flushchainbuf().
Use ->bio_children to count child buffers, rather than abuse the
bio_caller1 pointer.
Expand the relevant bits of waitchainbuf() inline, this clarifies
the code a little bit.
striping to a per device round-robin algorithm.
Because of the policy of not attempting to retain previous swap
allocation on page-out, this means that a newly added swap device
almost instantly takes its 1/N share of the I/O load but it takes
somewhat longer for it to assume it's 1/N share of the pages if there
is plenty of space on the other devices.
Change the 8G total swapspace limitation to 8G per device instead
by using a per device blist rather than one global blist. This
reduces the memory footprint by 75% (typically a couple hundred
kilobytes) for the common case with one swapdevice but NSWAPDEV=4.
Remove the compile time constant limit of number of swap devices,
there is no limit now. Instead of a fixed size array, store the
per swapdev structure in a TAILQ.
Total swap space is still addressed by a 32 bit page number and
therefore the upper limit is now 2^42 bytes = 16TB (for i386).
We still do not allocate the first page of each device in order to
give some amount of protection to any bsdlabel at the start of the
device.
A new device is appended after the existing devices in the swap space,
no attempt is made to fill in holes left behind by swapoff (this can
trivially be changed should it ever become a problem).
The sysctl vm.nswapdev now reflects the number of currently configured
swap devices.
Rename vm_swap_size to swap_pager_avail for consistency with other
exported names.
Change argument type for vm_proc_swapin_all() and swap_pager_isswapped()
to be a struct swdevt pointer rather than an index.
Not changed: we are still using blists to manage the free space,
but since the swapspace is no longer fragmented by the striping
different resource managers might fare better.
contain the filedescriptor number on opens from userland.
The index is used rather than a "struct file *" since it conveys a bit
more information, which may be useful to in particular fdescfs and /dev/fd/*
For now pass -1 all over the place.