process. This fixes a problem when attaching to a process in gdb
and the process staying in the STOP'd state after quiting gdb.
This whole process seems a bit suspect, but this seems to work.
Reviewed by: peter
in 4.2-REL which I ripped out in -stable and -current when implementing the
low-memory handling solution. However, maxlaunder turns out to be the saving
grace in certain very heavily loaded systems (e.g. newsreader box). The new
algorithm limits the number of pages laundered in the first pageout daemon
pass. If that is not sufficient then suceessive will be run without any
limit.
Write I/O is now pipelined using two sysctls, vfs.lorunningspace and
vfs.hirunningspace. This prevents excessive buffered writes in the
disk queues which cause long (multi-second) delays for reads. It leads
to more stable (less jerky) and generally faster I/O streaming to disk
by allowing required read ops (e.g. for indirect blocks and such) to occur
without interrupting the write stream, amoung other things.
NOTE: eventually, filesystem write I/O pipelining needs to be done on a
per-device basis. At the moment it is globalized.
This is because calls with M_WAIT (now M_TRYWAIT) may not wait
forever when nothing is available for allocation, and may end up
returning NULL. Hopefully we now communicate more of the right thing
to developers and make it very clear that it's necessary to check whether
calls with M_(TRY)WAIT also resulted in a failed allocation.
M_TRYWAIT basically means "try harder, block if necessary, but don't
necessarily wait forever." The time spent blocking is tunable with
the kern.ipc.mbuf_wait sysctl.
M_WAIT is now deprecated but still defined for the next little while.
* Fix a typo in a comment in mbuf.h
* Fix some code that was actually passing the mbuf subsystem's M_WAIT to
malloc(). Made it pass M_WAITOK instead. If we were ever to redefine the
value of the M_WAIT flag, this could have became a big problem.
point in calling a function just to set a flag.
Keep better track of the syslog FAC/PRI code and try to DTRT if
they mingle.
Log all writes to /dev/console to syslog with <console.info>
priority. The formatting is not preserved, there is no robust,
way of doing it. (Ideas with patches welcome).
the kernel console. Instead, change logwakeup() to set a flag in the
softc. A callout then wakes up every so often and wakes up any processes
selecting on /dev/log (such as syslogd) if the flag is set. By default
this callout fires 5 times a second, but that can be adjusted by the
sysctl kern.log_wakeups_per_second.
Reviewed by: phk
commands have also been slightly updated as follows:
- Use ktr_idx to find the newest entry rather than walking the buffer
comparing timespecs. Timespecs are not always unique after the change
to use getnanotime(9).
- Add a new verbose setting. When the verbose setting is on, then the
timestamp is printed with each message. If KTR_EXTEND is on, then the
filename and line number are output as well. By default this option is
off. It can be turned on with the 'v' modifier passed to the 'tbuf'
and 'tall' commands. For the 'tnext' command, the 'v' modifier toggles
the verbose mode.
- Only display the cpu number for each message on SMP systems.
- Don't display anything for an empty entry that hasn't been used yet.
functions. If this flag is set, then no KTR log messages are issued.
This is useful for blocking excessive logging, such as with the internal
mutex used by the witness code.
- Use MTX_QUIET on all of the mtx_enter/exit operations on the internal
mutex used by the witness code.
- If we are in a panic, don't do witness checks in witness_enter(),
witness_exit(), and witness_try_enter(), just return.
vm86_trap() to return to the calling program directly. vm86_trap()
doesn't return, thus it was never returning to trap() to release
Giant. Thus, release Giant before calling vm86_trap().
of explicit calls to lockmgr. Also provides macros for the flags
pased to specify shared, exclusive or release which map to the
lockmgr flags. This is so that the use of lockmgr can be easily
replaced with optimized reader-writer locks.
- Add some locking that I missed the first time.
no longer contains kernel specific data structures, but rather
only scalar values and structures that are already part of the
kernel/user interface, specifically rusage and rtprio. It no
longer contains proc, session, pcred, ucred, procsig, vmspace,
pstats, mtx, sigiolst, klist, callout, pasleep, or mdproc. If
any of these changed in size, ps, w, fstat, gcore, systat, and
top would all stop working. The new structure has over 200 bytes
of unassigned space for future values to be added, yet is nearly
100 bytes smaller per entry than the structure that it replaced.
be safely held across an eventhandler function call.
- Fix an instance of the head of an eventhandler list being read without
the lock being held.
- Break down and use a SYSINIT at the new SI_SUB_EVENTHANDLER to initialize
the eventhandler global mutex and the eventhandler list of lists rather
than using a non-MP safe initialization during the first call to
eventhandler_register().
- Add in a KASSERT() to eventhandler_register() to ensure that we don't try
to register an eventhandler before things have been initialized.
held and panic if so (conditional on witness).
- Change witness_list to return the number of locks held so this is easier.
- Add kern/syscalls.c to the kernel build if witness is defined so that the
panic message can contain the name of the offending system call.
- Add assertions that Giant and sched_lock are not held when returning from
a system call, which were missing for alpha and ia64.
can lead to further panics.
- Call getnanotime() instead of nanotime() for the timestamp. nanotime()
is more precise, but it also calls into the timer code, which results
in mutex operations on the i386 arch. If KTR_LOCK is turned on, then
ktr_tracepoint() recurses on itself until it exhausts the kernel stack.
Eventually this should change to use get_cyclecount() instead, but that
can't happen if get_cyclecount() is calling nanotime() instead of
getnanotime().
Deal with excessive dirty buffers when msync() syncs non-contiguous
dirty buffers by checking for the case in UFS *before* checking for
clusterability.
__P() prototypes when an ansi-style static inline is a prototype already.
Since vnode_if.[ch] are generated on the fly, there are no CVS diffs to
mess up.
SMP problem. Compaq, in their infinite wisdom, forgot to put the IO apic
intpin #0 connection to the 8259 PIC into the mptable. This hack is to
look and see if intpin #0 has *no* table entry and adds a fake ExtInt
entry for the remap routines to use. isa/clock.c will still test the
interrupts. This entry is only ever used on an already broken system.
where fork1() could put the process on the run queue where it could be
snatched up by another CPU before kthread_create() had set the proper
fork handler. Instead, we put the new kthread on the runqueue after its
fork handler has been sent.
Noticed by: jake
Looked over by: peter
spending, which was unused now that all software interrupts have
their own thread. Make the legacy schednetisr use an atomic op
for setting bits in the netisr mask.
Reviewed by: jhb
Also, while here, run up to 32 interrupt sources on APIC systems.
Normalize INTREN/INTRDIS so they are the same on both UP and SMP systems
rather than sometimes a macro, and sometimes a function.
Reviewed by: jhb, jakeb
and numvnodes are longs in the kernel. They should remain longs in systat,
what really needs to change is that they should be using SYSCTL_LONG rather
than SYSCTL_INT. I also changed wantfreevnodes to SYSCTL_LONG because I
happened to notice it.
I wish there was a way to find all of these automatically..
Pointed out by: bde
from struct proc, which are now unused (p_nthread already was).
Remove process flag P_KTHREADP which was untested and only set
in vfs_aio.c (it should use kthread_create). Move the yield
system call to kern_synch.c as kern_threads.c has been removed
completely.
moral support from: alfred, jhb
lock. Otherwise, if we block on the backing mutex while releasing the
allproc lock, then when we resume, we will be at SRUN, and we will stay
that way all the way through cpu_exit. As a result, our parent will never
harvest us.
depend on MUTEX_DEBUG. The MUTEX_DEBUG option turns on extra assertions
and checks to verify that mutexes themselves are implemented properly.
The WITNESS option uses extra checks and diagnostics to verify that other
code is using mutexes properly.
passed vnode must be locked; this is the case because of calls
to VOP_GETATTR(), VOP_ACCESS(), and VOP_OPEN(). This becomes
more of an issue when VOP_ACCESS() gets a bit more complicated,
which it does when you introduce ACL, Capability, and MAC
support.
Obtained from: TrustedBSD Project
recently discussed at -hackers. The problem is a null-pointer
dereference that happens in kern/vfs_lookup.c when accessing ".."
with a v_mount entry for the current directory vnode of NULL. This
happens when a volume is forcibly unmounted, and the vnode for a
working directory in the mounted volume is cleared.
PR: 23191
Submitted by: Thomas Moestl <tmoestl@gmx.net>
locking the global hash on each uifree()
make struct uidinfo only visible to the kernel
make uihold() a function rather than a macro to reduce bloat
swap the order of a spl/mutex to maintain consistancy
process is on the alternate stack or not. For compatibility
with sigstack(2) state is being updated if such is needed.
We now determine whether the process is on the alternate
stack by looking at its stack pointer. This allows a process
to siglongjmp from a signal handler on the alternate stack
to the place of the sigsetjmp on the normal stack. When
maintaining state, this would have invalidated the state
information and causing a subsequent signal to be delivered
on the normal stack instead of the alternate stack.
PR: 22286
- Use a better test for determining when a process is running.
- Convert some checks to assertions.
- Remove unnecessary tests.
- Save the priority before acquiring a mutex rather than in msleep(9).
timeout. If DIAGNOSTIC is turned on, then display a message to the console
with a map of which CPUs failed to stop or restart. This gives an SMP box
at least a fighting chance of getting into DDB if one of the other CPUs has
interrupts disabled.
Alter consumers of this method to conform to the new convention.
Minor cosmetic adjustments to bus.h.
This isn't of concern as this interface isn't in use yet.
1) mpsafe (protect the refcount with a mutex).
2) reduce duplicated code by removing the inlined crdup() from crcopy()
and make crcopy() call crdup().
3) use M_ZERO flag when allocating initial structs instead of calling bzero
after allocation.
4) expand the size of the refcount from a u_short to an u_int, by using
shorts we might have an overflow.
Glanced at by: jake
use a mutex lock when looking up/deleting entries on the hashlist
use a mutex lock on each uidinfo when updating fields
make uifree() a void function rather than 'int' since no one cares
allocate uidinfo structs with the M_ZERO flag and don't explicitly initialize
them
Assisted by: eivind, jhb, jakeb
a kevent upon completion of the I/O. Specifically, introduce a new type
of sigevent notification, SIGEV_EVENT. If sigev_notify is SIGEV_EVENT,
then sigev_notify_kqueue names the kqueue that should receive the event
and sigev_value contains the "void *" is copied into the kevent's udata
field.
In contrast to the existing interface, this one: 1) works on
the Alpha 2) avoids the extra copyin() call for the kevent because all
of the information needed is in the sigevent and 3) could be
applied to request a single kevent upon completion of an entire lio_listio().
Reviewed by: jlemon