1. Clustered I/O is switched by the MNT_NOCLUSTERR and MNT_NOCLUSTERW
bits of the mnt_flag. The sysctl variables, vfs.foo.doclusterread
and vfs.foo.doclusterwrite are deleted. Only mount option can
control clustered I/O from userland.
2. When foofs_mount mounts block device, foofs_mount checks D_CLUSTERR
and D_CLUSTERW bits of the d_flags member in the block device switch
table. If D_NOCLUSTERR / D_NOCLUSTERW are set, MNT_NOCLUSTERR /
MNT_NOCLUSTERW bits will be set. In this case, MNT_NOCLUSTERR and
MNT_NOCLUSTERW cannot be cleared from userland.
3. Vnode driver disables both clustered read and write.
4. Union filesystem disables clutered write.
Reviewed by: bde
(usually a couple of thousand) to 25. The measured impact on cache-hits
doesn't justify spending memory this way:
Target number of free vnodes versus namecache hit rate in % during a
make world:
10 98.5316
200 98.5479
500 98.5546
1000 98.5709
3000 98.6006
4000 98.6126
hash chain traversal isn't needed. This also allows untimeout to recompute
the hash to find the bucket that the entry to remove is stored in so
that each callout entry no longer needs to store that information.
Reviewed by: Nate Williams <nate@mt.sri.com>
Add support for "interrupt driven configuration hooks".
A component of the kernel can register a hook, most likely
during auto-configuration, and receive a callback once
interrupt services are available. This callback will occur before
the root and dump devices are configured, so the configuration
task can affect the selection of those two devices or complete
any tasks that need to be performed prior to launching init.
System boot is posponed so long as a hook is registered. The
hook owner is responsible for removing the hook once their task
is complete or the system boot can continue.
kern_acct.c kern_clock.c kern_exit.c kern_synch.c kern_time.c:
Change the interface and implementation for the kernel callout
service. The new implemntaion is based on the work of
Adam M. Costello and George Varghese, published in a technical
report entitled "Redesigning the BSD Callout and Timer Facilities".
The interface used in FreeBSD is a little different than the one
outlined in the paper. The new function prototypes are:
struct callout_handle timeout(void (*func)(void *),
void *arg, int ticks);
void untimeout(void (*func)(void *), void *arg,
struct callout_handle handle);
If a client wishes to remove a timeout, it must store the
callout_handle returned by timeout and pass it to untimeout.
The new implementation gives 0(1) insert and removal of callouts
making this interface scale well even for applications that
keep 100s of callouts outstanding.
See the updated timeout.9 man page for more details.
Add cpu_rootconf and cpu_dumpconf so that configuring these
two devices can be better controlled by the MI configuration
code.
machdep.c:
MD initialization code for the new callout interface.
trap.c:
Add support for printing out whether cam interrupts are masked
during a panic.
plus the previous changes to use the zone allocator decrease the useage
of malloc by half. The Zone allocator will be upgradeable to be able
to use per CPU-pools, and has more intelligent usage of SPLs. Additionally,
it has reasonable stats gathering capabilities, while making most calls
inline.
extern in <sys/malloc.h> and it should not have been staticized for
the !(KMEMSTATS || DIAGNOSTIC) case.
Fixed the !(KMEMSTATS || DIAGNOSTIC) case. The MALLOC() and FREE()
macros are evil, but code generally doesn't allow for this and some code
involving else clauses did not compile.
Finished staticization.
or a partition is larger than the slice.
Now `disklabel -Brw sdX auto' should fail properly on sliced disks
without partition of type 165, e.g., on zip disks with the factory
default formatting. Previously it set a bogus in-core label for
the compatibility slice and used this to corrupt the MBR (the slice
has offset 0 and size 0, but setting the label in effect corrupted
its size to nonzero).
`disklabel -Brw sdX auto' already failed properly on normally (not
dangerously dedicated) sliced disks _with_ partition of type 165,
because the compatibility slice has a nonzero offset so the MBR
remained inaccessible when the size was corrupted.
This bug only affected in-core labels. On-disk labels are checked
carefully when they read and written.
A couple of stylistic nits from Bruce.
If your libc contains version 1.11 or 1.12 of getcwd.c, (ie: if
you recompiled libc one of the last couple of days):
>>> Recompile LIBC before you boot a new kernel <<<
A new libc will deal with both old and new kernels.
adapted from NetBSD.. However, there are some differences in the tty
system that are big enough to cause their code to not fit comfortably.
Obtained from: NetBSD (I think)
detail is passed back and forwards). This mostly came from NetBSD, except
that our interfaces have changed a lot and this funciton is in a different
part of the kernel.
Obtained from: NetBSD
The implementation is done (unlike what i've originally been
contemplating) by reparenting kids of processes that have the
appropriate bit set to PID 1, and let PID 1 handle the zombie. This
is far less problematical than what would seem to be ``doing it
right'', for a number of reasons.
Of our currently shipping PID-1-intended programs, 50 % fail the above
assumption. ;-) (Read this: sysinstall doesn't do it right. This is
no problem as long as no program called by sysinstall actually uses
SA_NOCLDWAIT.)
ToDo: . clarify the correct SA_* flag inheritance, compared
to other systems,
. decide whether the compat cruft (osigvec(9)) should
deal with new system additions or not,
. merge OpenBSD's SA_SIGINFO implementation. ;)
Reviewed by: bde
local filesystem metadata at the first brelse call when the
block device vnode has v_tag set to VT_NFS.
Reviewed by: phk
Submitted by: Tor Egge <tegge@idi.ntnu.no>
declaration macros so that a semicolon can be added when the macros
are invoked without giving a (pedantic) syntax error. Invocations
need to be followed by a semicolon so that programs like indent and
gtags don't get confused.
Fixed the one invocation that wasn't followed by a trailing semicolon.
since that might cause in_pcballoc to call MALLOC with M_WAITOK during
a software interrupt.
Reviewed by: Garrett Wollman <wollman@khavrinen.lcs.mit.edu>
Introduce VFREE which indicates that vnode is on freelist.
Rename vholdrele() to vdrop().
Create vfree() and vbusy() to add/delete vnode from freelist.
Add vfree()/vbusy() to keep (v_holdcnt != 0 || v_usecount != 0)
vnodes off the freelist.
Generalize vhold()/v_holdcnt to mean "do not recycle".
Fix reassignbuf()s lack of use of vhold().
Use vhold() instead of checking v_cache_src list.
Remove vtouch(), the vnodes are always vget'ed soon enough
after for it to have any measuable effect.
Add sysctl debug.freevnodes to keep track of things.
Move cache_purge() up in getnewvnodes to avoid race.
Decrement v_usecount after VOP_INACTIVE(), put a vhold() on
it during VOP_INACTIVE()
Unmacroize vhold()/vdrop()
Print out VDOOMED and VFREE flags (XXX: should use %b)
Reviewed by: dyson
holding CPU along with the lock. When a CPU fails to get the lock
it compares its own id to the holder id. If they are the same it
panic()s, as simple locks are binary, and this would cause a deadlock.
Controlled by smptests.h: SL_DEBUG, ON by default.
Some minor cleanup.
Add a simplelock to deal with disable_intr()/enable_intr() as used in UP kernel.
UP kernel expects that this is enough to guarantee exclusive access to
regions of code bracketed by these 2 functions.
Add a simplelock to bracket clock accesses in clock.c: clock_lock.
Help from: Bruce Evans <bde@zeta.org.au>
smp_active = 1 used to indicate that the system had frozen previously
started AP's, while smp_active = 0 was "AP's not yet started". I have split
this into smp_started (which is set when the AP's come online), and
smp_active is left for turning on/off AP scheduling.
- We now have enough per-cpu idle context, the real idle loop has been
revived (cpu's halt now with nothing to do).
- Some preliminary support for running some operations outside the
global lock (eg: zeroing "free but not yet zeroed pages") is present
but appears to cause problems. Off by default.
- the smp_active sysctl now behaves differently. It's merely a 'true/false'
option. Setting smp_active to zero causes the AP's to halt in the idle
loop and stop scheduling processes.
- bootstrap is a lot safer. Instead of sharing a statically compiled in
stack a number of times (which has caused lots of problems) and then
abandoning it, we use the idle context to boot the AP's directly. This
should help >2 cpu support since the bootlock stuff was in doubt.
- print physical apic id in traps.. helps identify private pages getting
out of sync. (You don't want to know how much hair I tore out with this!)
More cleanup to follow, this is more of a checkpoint than a
'finished' thing.
This unifies several times in theory indentical 50 lines of code.
The filesystems have a new method: vop_cachedlookup, which is the
meat of the lookup, and use vfs_cache_lookup() for their vop_lookup
method. vfs_cache_lookup() will check the namecache and pass on
to the vop_cachedlookup method in case of a miss.
It's still the task of the individual filesystems to populate the
namecache with cache_enter().
Filesystems that do not use the namecache will just provide the
vop_lookup method as usual.
Added a new variable, 'bsp_apic_ready', which is set as soon as the bootstrap
CPU has initialized its local APIC. Conditionalize the GENSPLR functions
to call ss_lock ONLY after bsp_apic_ready is TRUE; This should prevent
any problems with races between the time the 1st AP becomes ready and the
time smp_active is set.
free list problem. Also, the vnode age flag is no longer used by the
vnode pager. (It is actually incorrect to use then.) Constructive
feedback welcome -- just be kind.
Made NEW_STRATEGY default.
Removed misc. old cruft.
Centralized simple locks into mp_machdep.c
Centralized simple lock macros into param.h
More cleanup in the direction of making splxx()/cpl MP-safe.
Several new fine-grained locks.
New FAST_INTR() methods:
- separate simplelock for FAST_INTR, no more giant lock.
- FAST_INTR()s no longer checks ipending on way out of ISR.
sio made MP-safe (I hope).
same syscall number as NetBSD/OpenBSD. The getpgid() came from NetBSD
(I think) originally, but it's basically cut/paste/edit from the other
simple get*() syscalls.
VM systems usage of the kernel lock (lockmgr) code. This is a first
pass implementation, and is expected to evolve as needed. The API
for the lock manager code has not changed, but the underlying implementation
has changed significantly. This change should not materially affect
our current SMP or UP code without non-standard parameters being used.
This version.
1/ avoids garret's introduced potential page fault. (I got one)
2/ removes compiler warnings
Also fix the tunable scheduling quantum to return a better error code when
fed a bad argument.
socket addresses in mbufs. (Socket buffers are the one exception.) A number
of kernel APIs needed to get fixed in order to make this happen. Also,
fix three protocol families which kept PCBs in mbufs to not malloc them
instead. Delete some old compatibility cruft while we're at it, and add
some new routines in the in_cksum family.
We now tsleep() in kthread_init() between start_init()
and prepare_usermode() while waiting for ALL the idle_loop()
processes to come online.
Debugged & tested by: "Thomas D. Dean" <tomdean@ix.netcom.com>
Reviewed by: David Greenman <dg@root.com>
as chargeable CPU usage. This should mitigate the problem of processes
doing disk I/O hogging the CPU. Various users have reported the
problem, and test code shows that the problem should now be gone.
Work done by BSDI, Jonathan Lemon <jlemon@americantv.com>,
Mike Smith <msmith@gsoft.com.au>, Sean Eric Fagan <sef@kithrup.com>,
and probably alot of others.
Submitted by: Jnathan Lemon <jlemon@americantv.com>
enough and can cause some strange performance problems. Specifically, at
or near startup time is when the problem is worst. To reproduce
the problem, run "lat_syscall stat" from the alpha lmbench code right
after bootup. A positive side effect of this mod is that the name
cache can be set to grow again by sysctl. A noticable positive
performance impact is realized due to a larger namecache being available
as needed (or tuned.)
when execing a setuid/setgid binary. Code submitted by Sean Eric Fagan
(sef@freebsd.org).
Also consolidated the setuid/setgid checks into one place.
Reviewed by: dyson,sef
defined, your really getting 32. Also warn about how you can't have
more than 256 pty's when your using DEVFS (non DEVFS can use more, just
the makedev script doesn't know how to make >256). it also doesn't
allocate more memory than needed in this case.
Make sure that the signal passed in TIOCSIG isn't 0 as it might cause
a panic. I personally haven't seen this happen, but after a similar
bug in syscons crashed my machine, I'm acutely aware of this one. :)
- removed TEST_ALTTIMER.
- removed APIC_PIN0_TIMER.
- removed TIMER_ALL.
mplock.s:
- minor update of try_mplock for new algorithm where a CPU uses try_mplock
instead of get_mplock in the ISRs.
- s_lock_init()
- s_lock()
- s_lock_try()
- s_unlock()
Created lock for IO APIC and apic_imen (SMP version of imen)
- imen_lock
Code to use imen_lock for access from apic_ipl.s and apic_vector.s.
Moved this code *outside* of mp_lock.
It seems to work!!!
1) Make sure that the region mapped by a 4MB page is
properly aligned.
2) Don't turn on the PG_G flag in locore for SMP. I plan
to do that later in startup anyway.
3) Make sure the 2nd processor has PSE enabled, so that 4MB
pages don't hose it.
We don't use PG_G yet on SMP -- there is work to be done to make that
work correctly. It isn't that important anyway...
this code is controlled by smptests.h: TEST_CPUSTOP, OFF by default
new code for handling mixed-mode 8259/APIC programming without 'ExtInt'
this code is controlled by smptests.h: TEST_ALTTIMER, ON by default