while we are copying it to the kinfo_proc structure.
- Test against p_stat to see if we are blocked on a mutex.
- Terminate ki_mtxname with a null char rather than ki_wmesg.
m_reclaim() and re-acquire it when m_reclaim() returns. This means that
we now call the drain routines without holding the mutex lock and
recursing into it. This was done for mainly two reasons:
(i) Avoid the long recursion; long recursions are typically bad and this
is the case here because we block all other code from freeing mbufs
if they need to. Doing that is kind of counter-productive, since we're
really hoping that someone will free.
(ii) More importantly, avoid a potential lock order reversal. Right now,
not all the locks have been added to our networking code; but
without this change, we're introducing the possibility for deadlock.
Consider for example ip_drain(). We will likely eventually introduce
a lock for ipq there, and so ip_freef() will be called with ipq lock
held. But, ip_freef() calls m_freem() which in turn acquires the
mmbfree lock. Since we were previously calling ip_drain() with mmbfree
held, our lock order would be: mmbfree->ipq->mmbfree. Some other code
may very well lock ipq first and then call ip_freef(). This would
result in the regular lock order, ipq->mmbfree. Clearly, we have
deadlock if one thread acquires the ipq lock and sits waiting for
mmbfree while another thread calling m_reclaim() acquires mmbfree
and sits waiting for the ipq lock.
Also, make sure to add a comment above m_reclaim()'s definition briefly
explaining this. Also document this above the call to m_reclaim() in
m_mballoc_wait().
Suggested and reviewed by: alfred
the pipe, then we were corrupting the pipe_zone free list by calling
pipeclose on rpipe twice. NULL out rpipe to avoid this.
Reviewed by: dillon
Reviewed by: iedowse
- Provide TUNABLE_INT() hooks for ktr_cpumask, ktr_mask, and ktr_verbose
so that they can be set from the loader by their respective sysctl names.
For example, to turn on KTR_INTR and KTR_PROC in ktr_mask, one could
stick 'debug.ktr.mask="0x1200"' in /boot/loader.conf.
o Use 8 space hard tabs
o Eliminate trailing white space (while I'm here, just in a couple of places)
o wrap mostly at 80 columns (printf literal strings being the notable
exception)
o use return (foo) consistantly
o use 0 vs NULL more consistantly
o use queue(3) xxx_FOREACH macros where appropriate (some places used it
before, others didn't).
o use BSD line continuation parameters
Pendants will likely notice minor style(9) violations, but for the
most part the file now looks much much closer to style(9) and is
mostly self-consistant.
Approved in principle by: dfr
Reviewed by: md5 (no changes to the .o)
to the SYSCTL_ADD_FOO() macros is a constant that should be turned into
a string via the pre-processor. Instead, require it to be an explicit
string so that names can be generated on the fly.
- Make some of the char * arguments to sysctl_add_oid() const to quiet
warnings.
file.
While there fix the layout of function headers (noticable in long headers)
Fix up some style nits. It's Perl and should be written in that style.
out: label in psignal() did not grab sched_lock before trying to release
it. Also, the previous version had several cases where it grabbed
sched_lock before jumping to out: unneccessarily, so rework this a bit.
The runfast: and out: labels must be called with sched_lock released, and
the run: label must be called with it held. Appropriate mtx_assert()'s
have been added that should catch any bugs that may still be in this
code.
Noticed by: bde
attaching to running processes, it completely breaks normal debugging.
A better fix is in the works, but cannot be properly tested until
the problem with gdb hanging the system in -current is solved.
process. This fixes a problem when attaching to a process in gdb
and the process staying in the STOP'd state after quiting gdb.
This whole process seems a bit suspect, but this seems to work.
Reviewed by: peter
in 4.2-REL which I ripped out in -stable and -current when implementing the
low-memory handling solution. However, maxlaunder turns out to be the saving
grace in certain very heavily loaded systems (e.g. newsreader box). The new
algorithm limits the number of pages laundered in the first pageout daemon
pass. If that is not sufficient then suceessive will be run without any
limit.
Write I/O is now pipelined using two sysctls, vfs.lorunningspace and
vfs.hirunningspace. This prevents excessive buffered writes in the
disk queues which cause long (multi-second) delays for reads. It leads
to more stable (less jerky) and generally faster I/O streaming to disk
by allowing required read ops (e.g. for indirect blocks and such) to occur
without interrupting the write stream, amoung other things.
NOTE: eventually, filesystem write I/O pipelining needs to be done on a
per-device basis. At the moment it is globalized.
This is because calls with M_WAIT (now M_TRYWAIT) may not wait
forever when nothing is available for allocation, and may end up
returning NULL. Hopefully we now communicate more of the right thing
to developers and make it very clear that it's necessary to check whether
calls with M_(TRY)WAIT also resulted in a failed allocation.
M_TRYWAIT basically means "try harder, block if necessary, but don't
necessarily wait forever." The time spent blocking is tunable with
the kern.ipc.mbuf_wait sysctl.
M_WAIT is now deprecated but still defined for the next little while.
* Fix a typo in a comment in mbuf.h
* Fix some code that was actually passing the mbuf subsystem's M_WAIT to
malloc(). Made it pass M_WAITOK instead. If we were ever to redefine the
value of the M_WAIT flag, this could have became a big problem.
point in calling a function just to set a flag.
Keep better track of the syslog FAC/PRI code and try to DTRT if
they mingle.
Log all writes to /dev/console to syslog with <console.info>
priority. The formatting is not preserved, there is no robust,
way of doing it. (Ideas with patches welcome).
the kernel console. Instead, change logwakeup() to set a flag in the
softc. A callout then wakes up every so often and wakes up any processes
selecting on /dev/log (such as syslogd) if the flag is set. By default
this callout fires 5 times a second, but that can be adjusted by the
sysctl kern.log_wakeups_per_second.
Reviewed by: phk
commands have also been slightly updated as follows:
- Use ktr_idx to find the newest entry rather than walking the buffer
comparing timespecs. Timespecs are not always unique after the change
to use getnanotime(9).
- Add a new verbose setting. When the verbose setting is on, then the
timestamp is printed with each message. If KTR_EXTEND is on, then the
filename and line number are output as well. By default this option is
off. It can be turned on with the 'v' modifier passed to the 'tbuf'
and 'tall' commands. For the 'tnext' command, the 'v' modifier toggles
the verbose mode.
- Only display the cpu number for each message on SMP systems.
- Don't display anything for an empty entry that hasn't been used yet.
functions. If this flag is set, then no KTR log messages are issued.
This is useful for blocking excessive logging, such as with the internal
mutex used by the witness code.
- Use MTX_QUIET on all of the mtx_enter/exit operations on the internal
mutex used by the witness code.
- If we are in a panic, don't do witness checks in witness_enter(),
witness_exit(), and witness_try_enter(), just return.
vm86_trap() to return to the calling program directly. vm86_trap()
doesn't return, thus it was never returning to trap() to release
Giant. Thus, release Giant before calling vm86_trap().
of explicit calls to lockmgr. Also provides macros for the flags
pased to specify shared, exclusive or release which map to the
lockmgr flags. This is so that the use of lockmgr can be easily
replaced with optimized reader-writer locks.
- Add some locking that I missed the first time.
no longer contains kernel specific data structures, but rather
only scalar values and structures that are already part of the
kernel/user interface, specifically rusage and rtprio. It no
longer contains proc, session, pcred, ucred, procsig, vmspace,
pstats, mtx, sigiolst, klist, callout, pasleep, or mdproc. If
any of these changed in size, ps, w, fstat, gcore, systat, and
top would all stop working. The new structure has over 200 bytes
of unassigned space for future values to be added, yet is nearly
100 bytes smaller per entry than the structure that it replaced.
be safely held across an eventhandler function call.
- Fix an instance of the head of an eventhandler list being read without
the lock being held.
- Break down and use a SYSINIT at the new SI_SUB_EVENTHANDLER to initialize
the eventhandler global mutex and the eventhandler list of lists rather
than using a non-MP safe initialization during the first call to
eventhandler_register().
- Add in a KASSERT() to eventhandler_register() to ensure that we don't try
to register an eventhandler before things have been initialized.
held and panic if so (conditional on witness).
- Change witness_list to return the number of locks held so this is easier.
- Add kern/syscalls.c to the kernel build if witness is defined so that the
panic message can contain the name of the offending system call.
- Add assertions that Giant and sched_lock are not held when returning from
a system call, which were missing for alpha and ia64.
can lead to further panics.
- Call getnanotime() instead of nanotime() for the timestamp. nanotime()
is more precise, but it also calls into the timer code, which results
in mutex operations on the i386 arch. If KTR_LOCK is turned on, then
ktr_tracepoint() recurses on itself until it exhausts the kernel stack.
Eventually this should change to use get_cyclecount() instead, but that
can't happen if get_cyclecount() is calling nanotime() instead of
getnanotime().
Deal with excessive dirty buffers when msync() syncs non-contiguous
dirty buffers by checking for the case in UFS *before* checking for
clusterability.
__P() prototypes when an ansi-style static inline is a prototype already.
Since vnode_if.[ch] are generated on the fly, there are no CVS diffs to
mess up.
SMP problem. Compaq, in their infinite wisdom, forgot to put the IO apic
intpin #0 connection to the 8259 PIC into the mptable. This hack is to
look and see if intpin #0 has *no* table entry and adds a fake ExtInt
entry for the remap routines to use. isa/clock.c will still test the
interrupts. This entry is only ever used on an already broken system.
where fork1() could put the process on the run queue where it could be
snatched up by another CPU before kthread_create() had set the proper
fork handler. Instead, we put the new kthread on the runqueue after its
fork handler has been sent.
Noticed by: jake
Looked over by: peter
spending, which was unused now that all software interrupts have
their own thread. Make the legacy schednetisr use an atomic op
for setting bits in the netisr mask.
Reviewed by: jhb
Also, while here, run up to 32 interrupt sources on APIC systems.
Normalize INTREN/INTRDIS so they are the same on both UP and SMP systems
rather than sometimes a macro, and sometimes a function.
Reviewed by: jhb, jakeb
and numvnodes are longs in the kernel. They should remain longs in systat,
what really needs to change is that they should be using SYSCTL_LONG rather
than SYSCTL_INT. I also changed wantfreevnodes to SYSCTL_LONG because I
happened to notice it.
I wish there was a way to find all of these automatically..
Pointed out by: bde
from struct proc, which are now unused (p_nthread already was).
Remove process flag P_KTHREADP which was untested and only set
in vfs_aio.c (it should use kthread_create). Move the yield
system call to kern_synch.c as kern_threads.c has been removed
completely.
moral support from: alfred, jhb
lock. Otherwise, if we block on the backing mutex while releasing the
allproc lock, then when we resume, we will be at SRUN, and we will stay
that way all the way through cpu_exit. As a result, our parent will never
harvest us.
depend on MUTEX_DEBUG. The MUTEX_DEBUG option turns on extra assertions
and checks to verify that mutexes themselves are implemented properly.
The WITNESS option uses extra checks and diagnostics to verify that other
code is using mutexes properly.
passed vnode must be locked; this is the case because of calls
to VOP_GETATTR(), VOP_ACCESS(), and VOP_OPEN(). This becomes
more of an issue when VOP_ACCESS() gets a bit more complicated,
which it does when you introduce ACL, Capability, and MAC
support.
Obtained from: TrustedBSD Project
recently discussed at -hackers. The problem is a null-pointer
dereference that happens in kern/vfs_lookup.c when accessing ".."
with a v_mount entry for the current directory vnode of NULL. This
happens when a volume is forcibly unmounted, and the vnode for a
working directory in the mounted volume is cleared.
PR: 23191
Submitted by: Thomas Moestl <tmoestl@gmx.net>
locking the global hash on each uifree()
make struct uidinfo only visible to the kernel
make uihold() a function rather than a macro to reduce bloat
swap the order of a spl/mutex to maintain consistancy
process is on the alternate stack or not. For compatibility
with sigstack(2) state is being updated if such is needed.
We now determine whether the process is on the alternate
stack by looking at its stack pointer. This allows a process
to siglongjmp from a signal handler on the alternate stack
to the place of the sigsetjmp on the normal stack. When
maintaining state, this would have invalidated the state
information and causing a subsequent signal to be delivered
on the normal stack instead of the alternate stack.
PR: 22286