Submitted by: tegge
o Eliminate the "!mapentzone" check from vm_map_entry_create() and
vm_map_entry_dispose(). Reviewed by: tegge
o Fix white-space usage in vm_map_entry_create().
or user vm_maps. This implementation has two key benefits when compared
to vm_map_{user_,}pageable(): (1) it avoids a race condition through
the use of "in-transition" vm_map entries and (2) it eliminates lock
recursion on the vm_map.
Note: there is still an error case that requires clean up.
Reviewed by: tegge
obviously bogous return value of ad1816chan_setformat().
PR: 37932
Submitted by: Martin Kaeske <Martin.Kaeske@Stud.TU-Ilmenau.DE>
Reviewed by: hm
MFC after: 10 days
o Add a stub for vm_map_wire().
Note: the description of the previous commit had an error. The in-
transition flag actually blocks the deallocation of a vm_map_entry by
vm_map_delete() and vm_map_simplify_entry().
in their tlb which the prom doesn't clear out, so we have to do so manually
before mapping the kernel page table or the cpu can hang due various
conditions which cause undefined behaviour from the tlb.
or user vm_maps. In accordance with the standards for munlock(2),
and in contrast to vm_map_user_pageable(), this implementation does not
allow holes in the specified region. This implementation uses the
"in transition" flag described below.
o Introduce a new flag, "in transition," to the vm_map_entry.
Eventually, vm_map_delete() and vm_map_simplify_entry() will respect
this flag by deallocating in-transition vm_map_entrys, allowing
the vm_map lock to be safely released in vm_map_unwire() and (the
forthcoming) vm_map_wire().
o Modify vm_map_simplify_entry() to respect the in-transition flag.
In collaboration with: tegge
options do. Comments should be in NOTES and having the comments in two
places usually means that one place will just bitrot. Thus, remove the
comment for KTRACE_REQUEST_POOL from the previous revision.
Requested by: bde
- ktrace no longer requires Giant so do ktrace syscall events before and
after acquiring and releasing Giant, respectively.
- For i386, ia32 syscalls on ia64, powerpc, and sparc64, get rid of the
goto bad hack and instead use the model on ia64 and alpha were we
skip the actual syscall invocation if error != 0. This fixes a bug
where if we the copyin() of the arguments failed for a syscall that
was not marked MP safe, we would try to release Giant when we had
not acquired it.
operations to dump a ktrace event out to an output file are now handled
asychronously by a ktrace worker thread. This enables most ktrace events
to not need Giant once p_tracep and p_traceflag are suitably protected by
the new ktrace_lock.
There is a single todo list of pending ktrace requests. The various
ktrace tracepoints allocate a ktrace request object and tack it onto the
end of the queue. The ktrace kernel thread grabs requests off the head of
the queue and processes them using the trace vnode and credentials of the
thread triggering the event.
Since we cannot assume that the user memory referenced when doing a
ktrgenio() will be valid and since we can't access it from the ktrace
worker thread without a bit of hassle anyways, ktrgenio() requests are
still handled synchronously. However, in order to ensure that the requests
from a given thread still maintain relative order to one another, when a
synchronous ktrace event (such as a genio event) is triggered, we still put
the request object on the todo list to synchronize with the worker thread.
The original thread blocks atomically with putting the item on the queue.
When the worker thread comes across an asynchronous request, it wakes up
the original thread and then blocks to ensure it doesn't manage to write a
later event before the original thread has a chance to write out the
synchronous event. When the original thread wakes up, it writes out the
synchronous using its own context and then finally wakes the worker thread
back up. Yuck. The sychronous events aren't pretty but they do work.
Since ktrace events can be triggered in fairly low-level areas (msleep()
and cv_wait() for example) the ktrace code is designed to use very few
locks when posting an event (currently just the ktrace_mtx lock and the
vnode interlock to bump the refcoun on the trace vnode). This also means
that we can't allocate a ktrace request object when an event is triggered.
Instead, ktrace request objects are allocated from a pre-allocated pool
and returned to the pool after a request is serviced.
The size of this pool defaults to 100 objects, which is about 13k on an
i386 kernel. The size of the pool can be adjusted at compile time via the
KTRACE_REQUEST_POOL kernel option, at boot time via the
kern.ktrace_request_pool loader tunable, or at runtime via the
kern.ktrace_request_pool sysctl.
If the pool of request objects is exhausted, then a warning message is
printed to the console. The message is rate-limited in that it is only
printed once until the size of the pool is adjusted via the sysctl.
I have tested all kernel traces but have not tested user traces submitted
by utrace(2), though they should work fine in theory.
Since a ktrace request has several properties (content of event, trace
vnode, details of originating process, credentials for I/O, etc.), I chose
to drop the first argument to the various ktrfoo() functions. Currently
the functions just assume the event is posted from curthread. If there is
a great desire to do so, I suppose I could instead put back the first
argument but this time make it a thread pointer instead of a vnode pointer.
Also, KTRPOINT() now takes a thread as its first argument instead of a
process. This is because the check for a recursive ktrace event is now
per-thread instead of process-wide.
Tested on: i386
Compiles on: sparc64, alpha
when a thread is in the ktrace subsystem to avoid ktrace'ing internal
ktrace events.
- Update the locking notes for p_traceflag and p_tracep taking into account
the new ktrace_lock mutex.
lock_object by another pointer (though all of lock_object should be
conditional on LOCK_DEBUG anyways) in exchange for an O(1) TAILQ_REMOVE()
in witness_destroy() (called for every mtx_destroy() and sx_destroy())
instead of an O(n) STAILQ_REMOVE. Since WITNESS is so dog slow as it is,
the speed-up is worth the space cost.
Suggested by: iedowse
being created and destroyed without a single long-term one around to ensure
the witness associated with that group of locks stays alive. The pipe
mutexes are an example of this group. For a dead witness we no longer
clear the witness name. Instead, when looking up the witness for a lock,
if a dead witness' (a witness with a refcount of 0) w_name pointer is
identical to the witness name of the lock then we revive that witness
instead of using a new witness for the lock. This results in far fewer
dead witness objects and also better preserves locking orders over the long
term resulting in more correct lock order checking. Note that we can't
ever derefence w_name of a dead witness since we don't know if the string
it is pointing to has been free()'d or kldunload()'d out from under us.