122 Commits

Author SHA1 Message Date
Matt Jacob 15516f16d2 Do not do the commenting out the way that saves bytes and looks cleaner
to you. Do it the way Vox Populi wants it.
2001-01-23 16:35:33 +00:00
Matt Jacob 462574faf5 Move (now) unused variable declaration inside the block (now commented out). 2001-01-22 22:22:38 +00:00
John Baldwin 049ebc15a1 Temporarily disable the printf() for microuptime() going backwards, the
SIGXCPU signal, and killing of processes that exceed their allowed run
time until they can play nice with sched_lock.  Right now they are just
potential panics waiting to happen.  The printf() has bitten several
people.
2001-01-20 02:57:59 +00:00
Jason Evans 238510fc46 Implement condition variables. 2001-01-16 01:00:43 +00:00
Jake Burkholder ef73ae4b0c Use PCPU_GET, PCPU_PTR and PCPU_SET to access all per-cpu variables
other than curproc.
2001-01-10 04:43:51 +00:00
Jake Burkholder c0c2557090 - Change the allproc_lock to use a macro, ALLPROC_LOCK(how), instead
  of explicit calls to lockmgr.  Also provides macros for the flags
  passed to specify shared, exclusive or release which map to the
  lockmgr flags.  This is so that the use of lockmgr can be easily
  replaced with optimized reader-writer locks.
- Add some locking that I missed the first time.
2000-12-13 00:17:05 +00:00
John Baldwin 7b29322c25 Add in #include of <sys/lock.h> since it was axed from <sys/proc.h>.
Noticed by:	Wesley Morgan <morganw@chemikals.org>
Pointy hat to:	me
2000-12-06 00:33:58 +00:00
Jake Burkholder 86360fee54 Remove thr_sleep and thr_wakeup. Remove fields p_nthread and p_wakeup
from struct proc, which are now unused (p_nthread already was).
Remove process flag P_KTHREADP which was untested and only set
in vfs_aio.c (it should use kthread_create).  Move the yield
system call to kern_synch.c as kern_threads.c has been removed
completely.

moral support from:	alfred, jhb
2000-12-02 05:41:30 +00:00
Jake Burkholder 1512b5d6ab Use an mp-safe callout for endtsleep. 2000-12-01 04:55:52 +00:00
John Baldwin 1bd0eefb4c Fix up priority propagation:
- Use a better test for determining when a process is running.
- Convert some checks to assertions.
- Remove unnecessary tests.
- Save the priority before acquiring a mutex rather than in msleep(9).
2000-11-30 00:51:16 +00:00
John Baldwin e2979dcc85 Don't drop Giant and the passed in mutex incorrectly in the
cold || panicstr case.  Do drop the passed in mutex in that case if
PDROP is specified.
2000-11-29 18:32:50 +00:00
Jake Burkholder 4f55983606 Use callout_reset instead of timeout(9). Most callouts are statically
allocated; 2 have been added to struct proc for setitimer and sleep.

Reviewed by:	jhb, jlemon
2000-11-27 22:52:31 +00:00
Jake Burkholder 553629ebc9 Protect the following with a lockmgr lock:
	allproc
	zombproc
	pidhashtbl
	proc.p_list
	proc.p_hash
	nextpid

Reviewed by:	jhb
Obtained from:	BSD/OS and netbsd
2000-11-22 07:42:04 +00:00
Jake Burkholder 7da6f97772 - Split the run queue and sleep queue linkage, so that a process
  may block on a mutex while on the sleep queue without corrupting
  it.
- Move dropping of Giant to after the acquire of sched_lock.

Tested by:	John Hay <jhay@icomtek.csir.co.za>
		jhb
2000-11-17 18:09:18 +00:00
John Baldwin 20cdcc5b73 Don't release and acquire Giant in mi_switch(). Instead, release and
acquire Giant as needed in functions that call mi_switch().  The releases
need to be done outside of the sched_lock to avoid potential deadlocks
from trying to acquire Giant while interrupts are disabled.

Submitted by:	witness
2000-11-16 02:16:44 +00:00
John Baldwin 92c79c7e3e Argh, add in a missing release of the sched_lock. 2000-11-16 01:16:54 +00:00
John Baldwin 95de685572 CURSIG() calls functions that acquire sleep mutexes, so it is not a good
idea to be holding the sched_lock while we are calling it.  As such,
release sched_lock before calling CURSIG() in msleep() and mawait() and
reacquire it after CURSIG() returns.

Submitted by:	witness
2000-11-16 01:07:19 +00:00
John Baldwin b84988521c - Rename await() to mawait(). mawait() is to await() as msleep() is to
  tsleep().  Namely, mawait() takes an extra argument which is a mutex
  to drop when going to sleep.  Just as with msleep(), if the priority
  argument includes the PDROP flag, then the mutex will be dropped and will
  not be reacquired when the process wakes up.
- Add in a backwards compatible macro await() that passes in NULL as the
  mutex argument to mawait().
2000-11-15 22:39:35 +00:00
John Baldwin 3ae4dd935b - Replace a KASSERT() that knew too much about mutex internals with a
  mtx_assert() that ensures the mutex we release during msleep() is both
  not recursed and owned by the current process.
2000-11-15 22:30:48 +00:00
John Baldwin f33a072eb9 - Convert references from tsleep() -> msleep()
- Fix a buglet in a comment above await()
2000-11-15 22:27:38 +00:00
John Baldwin 700bfa750f - GC some #if 0'd code regarding the nonexistent safepri variable.
- Don't dink with the witness state of Giant unless we actually own it
  during mi_switch().
2000-10-20 07:52:10 +00:00
John Baldwin 6c56727456 - Change fast interrupts on x86 to push a full interrupt frame and to
  return through doreti to handle ASTs.  This is necessary for the
  clock interrupts to work properly.
- Change the clock interrupts on the x86 to be fast instead of threaded.
  This is needed because both hardclock() and statclock() need to run in
  the context of the current process, not in a separate thread context.
- Kill the prevproc hack as it is no longer needed.
- We really need Giant when we call psignal(), but we don't want to block
  during the clock interrupt.  Instead, use two p_flag's in the proc struct
  to mark the current process as having a pending SIGVTALRM or a SIGPROF
  and let them be delivered during ast() when hardclock() has finished
  running.
- Remove CLKF_BASEPRI, which was #ifdef'd out on the x86 anyway.  It was
  broken on the x86 if it was turned on since cpl is gone.  Its only use
  was to bogusly run softclock() directly during hardclock() rather than
  scheduling an SWI.
- Remove the COM_LOCK simplelock and replace it with a clock_lock spin
  mutex.  Since the spin mutex already handles disabling/restoring
  interrupts appropriately, this also lets us axe all the *_intr() fu.
- Back out the hacks in the APIC_IO x86 cpu_initclocks() code to use
  temporary fast interrupts for the APIC trial.
- Add two new process flags P_ALRMPEND and P_PROFPEND to mark the pending
  signals in hardclock() that are to be delivered in ast().

Submitted by:	jakeb (making statclock safe in a fast interrupt)
Submitted by:	cp (concept of delaying signals until ast())
2000-10-06 02:20:21 +00:00
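The deferred-signal scheme in this commit (hardclock() only marks a signal as pending, and ast() posts it later where blocking on Giant is acceptable) can be modeled in a few lines of C. This is a userland sketch under stated assumptions, not the kernel code: struct proc_sketch, hardclock_sketch(), ast_sketch(), psignal_sketch() and all flag and signal values are hypothetical stand-ins for the real struct proc, hardclock(), ast(), psignal(), P_ALRMPEND/P_PROFPEND and SIGVTALRM/SIGPROF.

```c
/* Hypothetical flag and signal values; the real kernel constants differ. */
#define SKETCH_P_ALRMPEND 0x1
#define SKETCH_P_PROFPEND 0x2
#define SKETCH_SIGVTALRM  26
#define SKETCH_SIGPROF    27
#define SKETCH_MAXSIGS    8

struct proc_sketch {
	int p_flag;                    /* pending-signal flags */
	int delivered[SKETCH_MAXSIGS]; /* signals "posted" by psignal_sketch() */
	int ndelivered;
};

/* Stand-in for psignal(): records the signal instead of posting it. */
static void
psignal_sketch(struct proc_sketch *p, int sig)
{
	if (p->ndelivered < SKETCH_MAXSIGS)
		p->delivered[p->ndelivered++] = sig;
}

/*
 * Clock-interrupt side: must not block, so it only marks signals as
 * pending.  The *_expired arguments stand in for the itimer bookkeeping.
 */
static void
hardclock_sketch(struct proc_sketch *p, int virt_expired, int prof_expired)
{
	if (virt_expired)
		p->p_flag |= SKETCH_P_ALRMPEND;
	if (prof_expired)
		p->p_flag |= SKETCH_P_PROFPEND;
}

/* AST side: runs later, where blocking is allowed, and posts the signals. */
static void
ast_sketch(struct proc_sketch *p)
{
	if (p->p_flag & SKETCH_P_ALRMPEND) {
		p->p_flag &= ~SKETCH_P_ALRMPEND;
		psignal_sketch(p, SKETCH_SIGVTALRM);
	}
	if (p->p_flag & SKETCH_P_PROFPEND) {
		p->p_flag &= ~SKETCH_P_PROFPEND;
		psignal_sketch(p, SKETCH_SIGPROF);
	}
}
```

Calling hardclock_sketch() only changes p_flag; nothing is delivered until ast_sketch() runs, which mirrors the split between the non-blocking clock interrupt and the later context that may take Giant.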
John Baldwin fd2802cfe0 Add a KASSERT() to catch instances where the mutex that we pass in to
msleep() is recursed.

Suggested by:	cp
2000-09-24 00:33:51 +00:00
John Baldwin 606f8eb27a Remove the mtx_t, witness_t, and witness_blessed_t types. Instead, just
use struct mtx, struct witness, and struct witness_blessed.

Requested by:	bde
2000-09-14 20:15:16 +00:00
Jake Burkholder 817bf5d4a6 Rename tsleep to msleep and add a mutex argument, which is
released before sleeping and re-acquired before msleep
returns.  A compatibility cpp macro has been provided for
tsleep to avoid changing all occurrences of it in the kernel.

Remove an assertion that the Giant mutex be held before
calling tsleep or asleep.

This is intended to serve the same purpose as condition
variables, but does not preclude their addition in the
future.

Approved by:	jasone
Obtained from:	BSD/OS
2000-09-11 00:20:02 +00:00
Doug Rabson 4eb38057ea Fix printf warnings in CTRx calls. 2000-09-10 13:34:35 +00:00
Jason Evans 0384fff8c5 Major update to the way synchronization is done in the kernel. Highlights
include:

* Mutual exclusion is used instead of spl*().  See mutex(9).  (Note: The
  alpha port is still in transition and currently uses both.)

* Per-CPU idle processes.

* Interrupts are run in their own separate kernel threads and can be
  preempted (i386 only).

Partially contributed by:	BSDi (BSD/OS)
Submissions by (at least):	cp, dfr, dillon, grog, jake, jhb, sheldonh
2000-09-07 01:33:02 +00:00
Poul-Henning Kamp 77978ab8bc Previous commit changing SYSCTL_HANDLER_ARGS violated KNF.
Pointed out by:	bde
2000-07-04 11:25:35 +00:00
Poul-Henning Kamp 82d9ae4e32 Style police catches up with rev 1.26 of src/sys/sys/sysctl.h:
Sanitize SYSCTL_HANDLER_ARGS so that simplistic tools can grok our
sources:

        -sysctl_vm_zone SYSCTL_HANDLER_ARGS
        +sysctl_vm_zone (SYSCTL_HANDLER_ARGS)
2000-07-03 09:35:31 +00:00
Jake Burkholder e39756439c Back out the previous change to the queue(3) interface.
It was not discussed and should probably not happen.

Requested by:		msmith and others
2000-05-26 02:09:24 +00:00
Jake Burkholder 740a1973a6 Change the way that the queue(3) structures are declared; don't assume that
the type argument to *_HEAD and *_ENTRY is a struct.

Suggested by:	phk
Reviewed by:	phk
Approved by:	mdodd
2000-05-23 20:41:01 +00:00
Greg Lehey 72cc7e2dce Correct a couple of typos. 2000-05-07 05:09:45 +00:00
Brian Feldman 226f14bc83 Change the scheduler to actually respect the PUSER barrier. It's been
wrong for many years that negative niceness would lower the priority
of a process below PUSER, and once below PUSER, there were conditionals
in the code that are required to test for whether a process was in
the kernel which would break.

The breakage could (and did) cause lock-ups, basically nothing else
but the least nice program being able to run in some conditions.  The
algorithm which adjusts the priority now subtracts PRIO_MIN to do
things properly, and the ESTCPULIM() algorithm was updated to use
PRIO_TOTAL (PRIO_MAX - PRIO_MIN) to calculate the estcpu.

NICE_WEIGHT is now 1 to accommodate the full range of priorities better
(a -20 process with full CPU time has the priority of a +0 process with
no CPU time).  There are now 20 queues (exactly; 80 priorities) for
use in user processes' scheduling, and PUSER has been lowered to 48
to accomplish this.

This means, to the user, that things will be scheduled more correctly
(noticeable), there is no lock-up anymore WRT a niced -20 process
never releasing the CPU time for other processes.  In this fair system,
tsleep()ed < PUSER processes now will get the proper higher priority
than priority >= PUSER user processes.

The detective work of this was done by me, along with part of the
solution.  Luoqi Chen has provided most of the solution, and really
helped me understand what was happening better, to boot :)

Submitted by:   luoqi
Concept reviewed by:    bde
2000-04-30 18:33:43 +00:00
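The effect of subtracting PRIO_MIN in the priority calculation can be shown with a small C sketch. PUSER = 48, NICE_WEIGHT = 1 and PRIO_TOTAL = PRIO_MAX - PRIO_MIN come from the message above, and -20/20 are the usual BSD nice bounds. ESTCPU_DIVISOR is an assumption made only for illustration, and user_priority_old() is a simplified reconstruction of the problem described, not the previous kernel code; both functions use the same weights so the only difference is the nice term.

```c
/* From the commit message above. */
#define PUSER        48
#define NICE_WEIGHT  1
/* Conventional BSD nice bounds. */
#define PRIO_MIN     (-20)
#define PRIO_MAX     20
#define PRIO_TOTAL   (PRIO_MAX - PRIO_MIN)
/* Assumption for this sketch only: estcpu units per priority step. */
#define ESTCPU_DIVISOR 8

/* Old-style nice term: a negative nice value pulls the result below PUSER. */
static int
user_priority_old(int estcpu, int nice)
{
	return PUSER + estcpu / ESTCPU_DIVISOR + NICE_WEIGHT * nice;
}

/* New-style nice term: nice - PRIO_MIN lies in 0..PRIO_TOTAL, so the result stays >= PUSER. */
static int
user_priority_new(int estcpu, int nice)
{
	return PUSER + estcpu / ESTCPU_DIVISOR + NICE_WEIGHT * (nice - PRIO_MIN);
}
```

With estcpu at zero, a nice -20 process lands at 28 under the old-style term (inside the kernel range) but exactly at PUSER under the new one. Keeping estcpu itself bounded (the ESTCPULIM() change) is not modeled here.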
Matthew Dillon db6a426158 The SMP cleanup commit broke UP compiles. Make UP compiles work again. 2000-03-28 18:06:49 +00:00
Matthew Dillon 36e9f877df Commit major SMP cleanups and move the BGL (big giant lock) in the
    syscall path inward.  A system call may select whether it needs the MP
    lock or not (the default being that it does need it).

    A great deal of conditional SMP code for various deadended experiments
    has been removed.  'cil' and 'cml' have been removed entirely, and the
    locking around the cpl has been removed.  The conditional
    separately-locked fast-interrupt code has been removed, meaning that
    interrupts must hold the CPL now (but they pretty much had to anyway).
    Another reason for doing this is that the original separate-lock for
    interrupts just doesn't apply to the interrupt thread mechanism being
    contemplated.

    Modifications to the cpl may now ONLY occur while holding the MP
    lock.  For example, if an otherwise MP safe syscall needs to mess with
    the cpl, it must hold the MP lock for the duration and must (as usual)
    save/restore the cpl in a nested fashion.

    This is precursor work for the real meat coming later: avoiding having
    to hold the MP lock for common syscalls and I/O's and interrupt threads.
    It is expected that the spl mechanisms and new interrupt threading
    mechanisms will be able to run in tandem, allowing a slow piecemeal
    transition to occur.

    This patch should result in a moderate performance improvement due to
    the considerable amount of code that has been removed from the critical
    path, especially the simplification of the spl*() calls.  The real
    performance gains will come later.

Approved by: jkh
Reviewed by: current, bde (exception.s)
Some work taken from: luoqi's patch
2000-03-28 07:16:37 +00:00
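The requirement that code "save/restore the cpl in a nested fashion" can be illustrated with a tiny C model. cpl_sketch, spl_raise() and spl_restore() are hypothetical stand-ins for the real cpl and spl*() routines, the mask bits are arbitrary, and the accompanying rule that the MP lock must be held is not modeled.

```c
/*
 * Hypothetical model of the cpl: a bitmask of blocked interrupt classes.
 * Real spl*() routines also touch hardware state; this only tracks the mask.
 */
static unsigned int cpl_sketch;

/* Block additional interrupt classes and return the previous level. */
static unsigned int
spl_raise(unsigned int mask)
{
	unsigned int saved = cpl_sketch;

	cpl_sketch |= mask;
	return saved;
}

/* Restore exactly the level returned by the matching spl_raise(). */
static void
spl_restore(unsigned int saved)
{
	cpl_sketch = saved;
}
```

Because each restore puts back the value saved by its own raise, inner and outer critical sections can be nested without either one clobbering the other's level.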
Peter Dufault 6d9a8d3e8f I applied the wrong patch set. Back out anything associated
with the known bogus currtpriority.  This undoes the previous changes to
sys/i386/i386/trap.c, sys/alpha/alpha/trap.c, sys/sys/systm.h

Now we have the patch set approved by bde.

Approved by:	bde
2000-03-02 22:03:49 +00:00
Peter Dufault 383774c417 Patches that eliminate extra context switches in FIFO case.
Fixes p1003_1b regression test in the simple case of no RR and
FIFO processes competing.

Reviewed by:	jkh, bde
2000-03-02 16:20:07 +00:00
Peter Wemm ebc49c5654 Don't make the ktrace hook in tsleep() deref a null curproc after a panic.
PR:		15169
Submitted by:	David Gilbert <dgilbert@velocet.ca>
1999-11-30 09:01:46 +00:00
Poul-Henning Kamp 8f04f6c729 Add a bit of sanity checking and problem avoidance in case the
timecounter hardware is bogus.

This will produce a new warning "microuptime() went backwards"
and try to not screw up the process resource accounting.
1999-11-29 11:29:04 +00:00
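The sanity check described here amounts to refusing to charge a negative interval when the timecounter goes backwards. The C sketch below is a hypothetical model, not the kernel's mi_switch() code: struct acct_sketch and account_switch() are invented names, and counting bad readings stands in for printing the "microuptime() went backwards" warning.

```c
#include <sys/time.h>

/* Hypothetical accounting state for the sketch. */
struct acct_sketch {
	struct timeval switchtime; /* uptime at the last context switch */
	long long runtime_us;      /* run time charged so far, in microseconds */
	int went_backwards;        /* count of decreasing (bogus) readings */
};

/*
 * Charge the time since the last switch.  If the new reading is earlier
 * than the saved one, record the problem and charge nothing, keeping the
 * old switchtime so the bogus value never enters the accounting.
 */
static void
account_switch(struct acct_sketch *a, struct timeval now)
{
	long long delta_us =
	    (long long)(now.tv_sec - a->switchtime.tv_sec) * 1000000LL +
	    (long long)(now.tv_usec - a->switchtime.tv_usec);

	if (delta_us < 0) {
		a->went_backwards++; /* where the commit emits its warning */
		return;
	}
	a->runtime_us += delta_us;
	a->switchtime = now;
}
```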
Bruce Evans f0ebe4973f Scheduler fixes equivalent to the ones logged in the following NetBSD
commit to kern_synch.c:

  ----------------------------
  revision 1.55
  date: 1999/02/23 02:56:03;  author: ross;  state: Exp;  lines: +39 -10
  Scheduler bug fixes and reorganization
  * fix the ancient nice(1) bug, where nice +20 processes incorrectly
    steal 10 - 20% of the CPU, (or even more depending on load average)
  * provide a new schedclk() mechanism at a new clock at schedhz, so high
    platform hz values don't cause nice +0 processes to look like they are
    niced
  * change the algorithm slightly, and reorganize the code a lot
  * fix percent-CPU calculation bugs, and eliminate some no-op code

  === nice bug === Correctly divide the scheduler queues between niced and
  compute-bound processes. The current nice weight of two (sort of, see
  `algorithm change' below) neatly divides the USRPRI queues in half; this
  should have been used to clip p_estcpu, instead of UCHAR_MAX.  Besides
  being the wrong amount, clipping an unsigned char to UCHAR_MAX is a no-op,
  and it was done after decay_cpu() which can only _reduce_ the value.  It
  has to be kept <= NICE_WEIGHT * PRIO_MAX - PPQ or processes can
  scheduler-penalize themselves onto the same queue as nice +20 processes.
  (Or even a higher one.)

  === New schedclk() mechanism === Some platforms should be cutting down
  stathz before hitting the scheduler, since the scheduler algorithm only
  works right in the vicinity of 64 Hz. Rather than prescale hz, then scale
  back and forth by 4 every time p_estcpu is touched (each occurrence an
  abstraction violation), use p_estcpu without scaling and require schedhz
  to be generated directly at the right frequency. Use a default stathz (well,
  actually, profhz) / 4, so nothing changes unless a platform defines schedhz
  and a new clock.  Define these for alpha, where hz==1024, and nice was
  totally broke.

  === Algorithm change === The nice value used to be added to the
  exponentially-decayed scheduler history value p_estcpu, in _addition_ to
  being incorporated directly (with greater weight) into the priority calculation.
  At first glance, it appears to be a pointless increase of 1/8 the nice
  effect (pri = p_estcpu/4 + nice*2), but it's actually at least 3x that
  because it will ramp up linearly but be decayed only exponentially, thus
  converging to an additional .75 nice for a loadaverage of one. I killed
  this, it makes the behavior hard to control, almost impossible to analyze,
  and the effect (~~nothing at all for the first second, then somewhat increased
  niceness after three seconds or more, depending on load average) pointless.

  === Other bugs === hz -> profhz in the p_pctcpu = f(p_cpticks) calculation.
  Collect scheduler functionality. Try to put each abstraction in just one
  place.
  ----------------------------

The details are a little different in FreeBSD:

=== nice bug ===   Fixing this is the main point of this commit.  We use
essentially the same clipping rule as NetBSD (our limit on p_estcpu
differs by a scale factor).  However, clipping at all is fundamentally
bad.  It gives free CPU to the hoggiest hogs once they reach the limit, and
reaching the limit is normal for long-running hogs.  This will be fixed
later.

=== New schedclk() mechanism ===  We don't use the NetBSD schedclk()
(now schedclock()) mechanism.  We require (real)stathz to be about 128
and scale by an extra factor of 2 compared with NetBSD's statclock().
We scale p_estcpu instead of scaling the clock.  This is more accurate
and flexible.

=== Algorithm change ===  Same change.

=== Other bugs ===  The p_pctcpu bug was fixed long ago.  We don't try as
hard to abstract functionality yet.

Related changes: the new limit on p_estcpu must be exported to kern_exit.c
for clipping in wait1().

Agreed with by:		dufault
1999-11-28 12:12:13 +00:00
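The "additional .75 nice for a loadaverage of one" figure in the quoted NetBSD message follows from the old per-second update estcpu = decay * estcpu + nice, with the 4.4BSD decay factor 2L / (2L + 1) for load average L. At L = 1 the factor is 2/3, so estcpu converges to nice / (1 - 2/3) = 3 * nice, and with pri = p_estcpu/4 that contributes 0.75 * nice. The C sketch below iterates that recurrence; it leaves out the CPU-usage increments to estcpu and uses floating point instead of the kernel's fixed-point arithmetic, so it is a model of the argument rather than the scheduler code.

```c
/* 4.4BSD-style per-second decay factor for load average L: 2L / (2L + 1). */
static double
decay_factor(double loadav)
{
	double loadfac = 2.0 * loadav;

	return loadfac / (loadfac + 1.0);
}

/*
 * Old behaviour described in the quoted message: each second, estcpu is
 * decayed and then nice is added.  CPU-usage increments are left out so
 * that only the nice contribution is visible.
 */
static double
estcpu_nice_only(double loadav, double nice, int seconds)
{
	double estcpu = 0.0;
	double d = decay_factor(loadav);

	for (int i = 0; i < seconds; i++)
		estcpu = d * estcpu + nice;
	return estcpu;
}
```

For nice = 4 and L = 1, estcpu is 4 after one second and approaches 12 (priority contribution 3, that is 0.75 * 4) as the linear ramp-up and exponential decay balance out.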
Bruce Evans 9bc8d885ed Updated comments for the move in the previous commit. 1999-11-27 15:27:11 +00:00
Bruce Evans 8a9d4d98b1 Moved scheduling-related code to kern_synch.c so that it is easier to fix
and extend.  The new function containing the code is named schedclock()
as in NetBSD, but it has slightly different semantics (it already handles
incrementation of p->p_cpticks, and it should handle any calling frequency).

Agreed with in principle by:	dufault
1999-11-27 12:32:27 +00:00
Poul-Henning Kamp 2e3c8fcbd0 This is a partial commit of the patch from PR 14914:
A lot of the code in sys/kern directly accesses the *Q_HEAD and *Q_ENTRY
   structures for list operations.  This patch makes all list operations
   in sys/kern use the queue(3) macros, rather than directly accessing the
   *Q_{HEAD,ENTRY} structures.

This batch of changes compiles to the same object files.

Reviewed by:    phk
Submitted by:   Jake Burkholder <jake@checker.org>
PR:     14914
1999-11-16 10:56:05 +00:00
Marcel Moolenaar 2c42a14602 sigset_t change (part 2 of 5)
-----------------------------

The core of the signalling code has been rewritten to operate
on the new sigset_t. No methodological changes have been made.
Most references to a sigset_t object are through macros (see
signalvar.h) to create a level of abstraction and to provide
a basis for further improvements.

The NSIG constant has not been changed to reflect the maximum
number of signals possible. The reason is that it breaks
programs (especially shells) which assume that all signals
have a non-null name in sys_signame. See src/bin/sh/trap.c
for an example. Instead _SIG_MAXSIG has been introduced to
hold the maximum signal possible with the new sigset_t.

struct sigprop has been moved from signalvar.h to kern_sig.c
because a) it is only used there, and b) access must be done
through function sigprop(). The latter because the table doesn't
hold properties for all signals, but only for the first NSIG
signals.

signal.h has been reorganized to make reading easier and to
add the new and/or modified structures. The "old" structures
are moved to signalvar.h to prevent namespace pollution.

Especially the coda filesystem suffers from the change, because
it contained lines like (p->p_sigmask == SIGIO), which is easy
to do for integral types, but not for compound types.

NOTE: kdump (and port linux_kdump) must be recompiled.

Thanks to Garrett Wollman and Daniel Eischen for pressing the
importance of changing sigreturn as well.
1999-09-29 15:03:48 +00:00
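To see why a compound sigset_t has to be accessed through macros, here is a minimal C sketch of a multi-word signal set. The type and macro names (sketch_sigset_t, SKETCH_SIGADDSET and so on) are hypothetical and are not the actual signalvar.h definitions; the layout, an array of 32-bit words indexed from signal 1, is an assumption chosen for illustration. The equality helper shows why a test like (p->p_sigmask == SIGIO) cannot survive once the mask is no longer an integral type.

```c
#include <stdint.h>

/* Hypothetical multi-word signal set; SKETCH_SIG_MAXSIG plays the role of _SIG_MAXSIG. */
#define SKETCH_SIG_WORDS  4
#define SKETCH_SIG_MAXSIG (32 * SKETCH_SIG_WORDS)

typedef struct {
	uint32_t bits[SKETCH_SIG_WORDS];
} sketch_sigset_t;

/* Signal numbers start at 1, so signal s lives at bit index s - 1. */
#define SKETCH_SIG_WORD(s) (((s) - 1) >> 5)
#define SKETCH_SIG_BIT(s)  (UINT32_C(1) << (((s) - 1) & 31))

#define SKETCH_SIGEMPTYSET(set) do {                           \
	for (int i_ = 0; i_ < SKETCH_SIG_WORDS; i_++)           \
		(set)->bits[i_] = 0;                            \
} while (0)
#define SKETCH_SIGADDSET(set, s) \
	((set)->bits[SKETCH_SIG_WORD(s)] |= SKETCH_SIG_BIT(s))
#define SKETCH_SIGISMEMBER(set, s) \
	(((set)->bits[SKETCH_SIG_WORD(s)] & SKETCH_SIG_BIT(s)) != 0)

/* A struct cannot be compared with ==, so equality needs a helper. */
static int
sketch_sigseteq(const sketch_sigset_t *a, const sketch_sigset_t *b)
{
	for (int i = 0; i < SKETCH_SIG_WORDS; i++)
		if (a->bits[i] != b->bits[i])
			return 0;
	return 1;
}
```

Code that only goes through the macros keeps working if the number of words changes, which is the "level of abstraction" the commit message refers to.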
Peter Wemm c3aac50f28 $Id$ -> $FreeBSD$ 1999-08-28 01:08:13 +00:00
Peter Wemm 26d12af46c Don't initialize run queues here, do it all in one place. 1999-08-19 00:14:43 +00:00
Bruce Evans efc96764e0 The magic "no-cpu" cpu number is 0xff. Don't misrepresent cpu
numbers as chars or use bogus casts in an attempt to unmisrepresent
them.  In top, don't assume that 0xff is the only negative cpu
number when cpu numbers are (mis)represented.
1999-03-05 16:38:13 +00:00
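The pitfall this commit avoids can be shown in C: on platforms where plain char is signed (such as i386), storing the 0xff marker in a char turns it into -1, and comparing that against 0xff (the int 255) never succeeds. SKETCH_NOCPU and both helper functions are hypothetical names used only for this illustration.

```c
/* Hypothetical stand-in for the magic "no cpu" number described above. */
#define SKETCH_NOCPU 0xff

/* Held in an unsigned type, the marker keeps its value 255 and compares equal. */
static int
is_nocpu_unsigned(unsigned char cpu)
{
	return cpu == SKETCH_NOCPU;
}

/*
 * Held in a signed char, 0xff typically becomes -1 (two's complement);
 * it is promoted to int -1 and never equals 255.
 */
static int
is_nocpu_signed(signed char cpu)
{
	return cpu == SKETCH_NOCPU;
}
```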
Julian Elischer 90b4d77467 The tunable parameter for the scheduler quantum was inverted.
Higher numbers led to smaller quanta.
In discussion with BDE, change this parameter to be in uSecs
to make it machine independent,
and limit it to non zero multiples of 'tick' (rounding down).
Also make the variable globally available so that the present function that
returns its value (used for posix scheduling I believe) can go away.

Submitted by: Bruce Evans <bde@freebsd.org>
1999-03-03 18:15:29 +00:00
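The conversion this commit describes (a quantum given in microseconds, limited to non-zero multiples of tick and rounded down) can be sketched as a small C helper. quantum_us_to_ticks() is a hypothetical name, tick_us stands for the length of one clock tick in microseconds, and returning EINVAL for values shorter than one tick is an assumption about how "non zero" is enforced.

```c
#include <errno.h>

/*
 * Hypothetical helper: convert a quantum in microseconds to whole clock
 * ticks.  Integer division rounds down to a multiple of tick; anything
 * shorter than one tick is rejected so the result is never zero.
 */
static int
quantum_us_to_ticks(int quantum_us, int tick_us, int *ticks_out)
{
	if (tick_us <= 0 || quantum_us < tick_us)
		return EINVAL;
	*ticks_out = quantum_us / tick_us;
	return 0;
}
```

With a 10000 us tick (hz = 100), 100000 us and 105000 us both map to 10 ticks, and 5000 us is refused. Expressing the value in microseconds keeps its meaning independent of the machine's hz.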
Bruce Evans e7ba67f274 Removed all traces of `p_switchtime'. The relevant timestamp is per-cpu,
not per-process.  Keep it in `switchtime' consistently.

It is now clear that the timestamp is always valid in fork_trampoline()
except when the child is running on a previously idle cpu, which
can only happen if there are multiple cpus, so don't check or set
the timestamp in fork_trampoline except in the (i386) SMP case.
Just remove the alpha code for setting it unconditionally, since
there is no SMP case for alpha and the code had rotted.

Parts reviewed by:	dfr, phk
1999-02-28 10:53:29 +00:00
Bruce Evans 554dedb3c9 Improved scheduling in uiomove(), etc. resched_wanted() is true too
often for it to be a good criterion for switching kernel cpu hogs --
it is true after most wakeups.  Use the criterion "has been running
for >= 2 quanta" instead.
1999-02-22 16:57:48 +00:00
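The new criterion ("has been running for >= 2 quanta") replaces a flag that is set after most wakeups with a simple elapsed-time test. The C sketch below is hypothetical: should_yield() and its tick arguments are invented names, tick counters are assumed to be plain ints advanced by the clock interrupt, and counter wraparound is ignored.

```c
/*
 * Hypothetical hog test: a kernel cpu hog should yield only after it has
 * run for at least two quanta since it was last switched in.
 */
static int
should_yield(int now_ticks, int switch_ticks, int quantum_ticks)
{
	return now_ticks - switch_ticks >= 2 * quantum_ticks;
}
```

With a 10-tick quantum and a switch-in at tick 100, a copy loop in uiomove() would keep running through tick 119 and yield from tick 120 on.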