probes). Apart from there being no reason to set SCSI_NOSLEEP on every
tape command, this prevents controller drivers from sleeping when resources
are fully utilized causing unecessary "Oops not queued" errors. This is
only noticed for controllers that can run out of resources like the
27/2842 adaptec controllers. Before this fix, it is almost impossible to
perform extended tape operations if more than one scsi disk is on the
bus with the tape drive with these controllers. This does not address a
similar problem that could occur if devices are probed while other targets
are active since SCSI_NOSLEEP will still be set in that case.
made a change to NFS that caused buffers at EOF to be variable size. This
had the undesired side-effect of breaking delayed writes on NFS. This
fixes it.
Submitted by: John Dyson
syslog connections unless they were rejected. This helps save wear and
tear on the syslog facility in large networks with many clienst systems.
yp_svc.c: Be a little smarter about using sigaction() -- set the SA_RESTART
flag.
svc_run: Be doubly paranoid about killing off child processes. Do a flag
chack and a pid check before letting child 'threads' self-destruct.
the problem "when a file is truncated on the server after being written on
a client under NFSv3, the client doesn't see the size drop to zero".
(As you noted, the problem is that NMODIFIED wasn't being cleared by nfs_close
when it flushed the buffers. After checking through the code, the only place
where NMODIFIED was used to test for the possibility of dirty blocks was in
nfs_setattr(). The two cases are safe to do when there aren't dirty blocks,
so I just took out the tests. Unfortunately, testing for
v_dirtyblkhd.lh_first being non-null is not sufficient, since there are
times when the code moves blocks to the clean list and then back to the
dirty list.)
Submitted by: rick@snowhite.cis.uoguelph.ca
Future Domain TMC-885 controllers. These beasts were just different enough in
a number of perverse ways to be recognised but not work with the seagate
stuff. I also whacked in blind transfers for DATAIN and DATAOUT phases - this
more than doubles my throughput. If you're dubious about that, comment out the
definition of SEA_BLINDTRANSFER. Anyway if you're running an ST01 or TMC-950
controller, please give this a go, I'd like to see if anything's broken for
those beasts.
Submitted by: Stephen Hocking <sysseh@devetir.qld.gov.au>
what CSRG had, plus make things like, TYPE, REVISION, and BRANCH
easy to set, and derive RELEASE and VERSION from them.
Kill the JUST_TELL_ME hack, it is no longer needed.
Kill DISTNAME, I could find no reveference to it any place in the
source tree.
Now I just need to rework a few bits in release/Makefile, but want
to wait and talk to jkh about that.
Oh, and your now all running:
TYPE="FreeBSD"
REVISION="2.2"
BRANCH="CURRENT"
and the -BUILD-yymmdd is dead and gone. The date was already in the
version[] string, no need for it to be there in 2 formats!
NOTE: libkvm, w, ps, 'top', and any other utility which depends on struct
proc or any VM system structure will have to be rebuilt!!!
Much needed overhaul of the VM system. Included in this first round of
changes:
1) Improved pager interfaces: init, alloc, dealloc, getpages, putpages,
haspage, and sync operations are supported. The haspage interface now
provides information about clusterability. All pager routines now take
struct vm_object's instead of "pagers".
2) Improved data structures. In the previous paradigm, there is constant
confusion caused by pagers being both a data structure ("allocate a
pager") and a collection of routines. The idea of a pager structure has
escentially been eliminated. Objects now have types, and this type is
used to index the appropriate pager. In most cases, items in the pager
structure were duplicated in the object data structure and thus were
unnecessary. In the few cases that remained, a un_pager structure union
was created in the object to contain these items.
3) Because of the cleanup of #1 & #2, a lot of unnecessary layering can now
be removed. For instance, vm_object_enter(), vm_object_lookup(),
vm_object_remove(), and the associated object hash list were some of the
things that were removed.
4) simple_lock's removed. Discussion with several people reveals that the
SMP locking primitives used in the VM system aren't likely the mechanism
that we'll be adopting. Even if it were, the locking that was in the code
was very inadequate and would have to be mostly re-done anyway. The
locking in a uni-processor kernel was a no-op but went a long way toward
making the code difficult to read and debug.
5) Places that attempted to kludge-up the fact that we don't have kernel
thread support have been fixed to reflect the reality that we are really
dealing with processes, not threads. The VM system didn't have complete
thread support, so the comments and mis-named routines were just wrong.
We now use tsleep and wakeup directly in the lock routines, for instance.
6) Where appropriate, the pagers have been improved, especially in the
pager_alloc routines. Most of the pager_allocs have been rewritten and
are now faster and easier to maintain.
7) The pagedaemon pageout clustering algorithm has been rewritten and
now tries harder to output an even number of pages before and after
the requested page. This is sort of the reverse of the ideal pagein
algorithm and should provide better overall performance.
8) Unnecessary (incorrect) casts to caddr_t in calls to tsleep & wakeup
have been removed. Some other unnecessary casts have also been removed.
9) Some almost useless debugging code removed.
10) Terminology of shadow objects vs. backing objects straightened out.
The fact that the vm_object data structure escentially had this
backwards really confused things. The use of "shadow" and "backing
object" throughout the code is now internally consistent and correct
in the Mach terminology.
11) Several minor bug fixes, including one in the vm daemon that caused
0 RSS objects to not get purged as intended.
12) A "default pager" has now been created which cleans up the transition
of objects to the "swap" type. The previous checks throughout the code
for swp->pg_data != NULL were really ugly. This change also provides
the rudiments for future backing of "anonymous" memory by something
other than the swap pager (via the vnode pager, for example), and it
allows the decision about which of these pagers to use to be made
dynamically (although will need some additional decision code to do
this, of course).
13) (dyson) MAP_COPY has been deprecated and the corresponding "copy
object" code has been removed. MAP_COPY was undocumented and non-
standard. It was furthermore broken in several ways which caused its
behavior to degrade to MAP_PRIVATE. Binaries that use MAP_COPY will
continue to work correctly, but via the slightly different semantics
of MAP_PRIVATE.
14) (dyson) Sharing maps have been removed. It's marginal usefulness in a
threads design can be worked around in other ways. Both #12 and #13
were done to simplify the code and improve readability and maintain-
ability. (As were most all of these changes)
TODO:
1) Rewrite most of the vnode pager to use VOP_GETPAGES/PUTPAGES. Doing
this will reduce the vnode pager to a mere fraction of its current size.
2) Rewrite vm_fault and the swap/vnode pagers to use the clustering
information provided by the new haspage pager interface. This will
substantially reduce the overhead by eliminating a large number of
VOP_BMAP() calls. The VOP_BMAP() filesystem interface should be
improved to provide both a "behind" and "ahead" indication of
contiguousness.
3) Implement the extended features of pager_haspage in swap_pager_haspage().
It currently just says 0 pages ahead/behind.
4) Re-implement the swap device (swstrategy) in a more elegant way, perhaps
via a much more general mechanism that could also be used for disk
striping of regular filesystems.
5) Do something to improve the architecture of vm_object_collapse(). The
fact that it makes calls into the swap pager and knows too much about
how the swap pager operates really bothers me. It also doesn't allow
for collapsing of non-swap pager objects ("unnamed" objects backed by
other pagers).
the wrong branch :-(]
Eliminate incorrect double negative logic Bruce has been gripping
about for a year now. Change = no_way to = true.
Submitted by: bde (sort of, patch by me :-))
proc or any VM system structure will have to be rebuilt!!!
Much needed overhaul of the VM system. Included in this first round of
changes:
1) Improved pager interfaces: init, alloc, dealloc, getpages, putpages,
haspage, and sync operations are supported. The haspage interface now
provides information about clusterability. All pager routines now take
struct vm_object's instead of "pagers".
2) Improved data structures. In the previous paradigm, there is constant
confusion caused by pagers being both a data structure ("allocate a
pager") and a collection of routines. The idea of a pager structure has
escentially been eliminated. Objects now have types, and this type is
used to index the appropriate pager. In most cases, items in the pager
structure were duplicated in the object data structure and thus were
unnecessary. In the few cases that remained, a un_pager structure union
was created in the object to contain these items.
3) Because of the cleanup of #1 & #2, a lot of unnecessary layering can now
be removed. For instance, vm_object_enter(), vm_object_lookup(),
vm_object_remove(), and the associated object hash list were some of the
things that were removed.
4) simple_lock's removed. Discussion with several people reveals that the
SMP locking primitives used in the VM system aren't likely the mechanism
that we'll be adopting. Even if it were, the locking that was in the code
was very inadequate and would have to be mostly re-done anyway. The
locking in a uni-processor kernel was a no-op but went a long way toward
making the code difficult to read and debug.
5) Places that attempted to kludge-up the fact that we don't have kernel
thread support have been fixed to reflect the reality that we are really
dealing with processes, not threads. The VM system didn't have complete
thread support, so the comments and mis-named routines were just wrong.
We now use tsleep and wakeup directly in the lock routines, for instance.
6) Where appropriate, the pagers have been improved, especially in the
pager_alloc routines. Most of the pager_allocs have been rewritten and
are now faster and easier to maintain.
7) The pagedaemon pageout clustering algorithm has been rewritten and
now tries harder to output an even number of pages before and after
the requested page. This is sort of the reverse of the ideal pagein
algorithm and should provide better overall performance.
8) Unnecessary (incorrect) casts to caddr_t in calls to tsleep & wakeup
have been removed. Some other unnecessary casts have also been removed.
9) Some almost useless debugging code removed.
10) Terminology of shadow objects vs. backing objects straightened out.
The fact that the vm_object data structure escentially had this
backwards really confused things. The use of "shadow" and "backing
object" throughout the code is now internally consistent and correct
in the Mach terminology.
11) Several minor bug fixes, including one in the vm daemon that caused
0 RSS objects to not get purged as intended.
12) A "default pager" has now been created which cleans up the transition
of objects to the "swap" type. The previous checks throughout the code
for swp->pg_data != NULL were really ugly. This change also provides
the rudiments for future backing of "anonymous" memory by something
other than the swap pager (via the vnode pager, for example), and it
allows the decision about which of these pagers to use to be made
dynamically (although will need some additional decision code to do
this, of course).
13) (dyson) MAP_COPY has been deprecated and the corresponding "copy
object" code has been removed. MAP_COPY was undocumented and non-
standard. It was furthermore broken in several ways which caused its
behavior to degrade to MAP_PRIVATE. Binaries that use MAP_COPY will
continue to work correctly, but via the slightly different semantics
of MAP_PRIVATE.
14) (dyson) Sharing maps have been removed. It's marginal usefulness in a
threads design can be worked around in other ways. Both #12 and #13
were done to simplify the code and improve readability and maintain-
ability. (As were most all of these changes)
TODO:
1) Rewrite most of the vnode pager to use VOP_GETPAGES/PUTPAGES. Doing
this will reduce the vnode pager to a mere fraction of its current size.
2) Rewrite vm_fault and the swap/vnode pagers to use the clustering
information provided by the new haspage pager interface. This will
substantially reduce the overhead by eliminating a large number of
VOP_BMAP() calls. The VOP_BMAP() filesystem interface should be
improved to provide both a "behind" and "ahead" indication of
contiguousness.
3) Implement the extended features of pager_haspage in swap_pager_haspage().
It currently just says 0 pages ahead/behind.
4) Re-implement the swap device (swstrategy) in a more elegant way, perhaps
via a much more general mechanism that could also be used for disk
striping of regular filesystems.
5) Do something to improve the architecture of vm_object_collapse(). The
fact that it makes calls into the swap pager and knows too much about
how the swap pager operates really bothers me. It also doesn't allow
for collapsing of non-swap pager objects ("unnamed" objects backed by
other pagers).
to su to root by authenticating as themselves (using a password or S/Key)
rather than by using the root password. This is useful in contexts like
ours, where a large group of people need root access to a set of machines.
(However, the security implications are such that this should not be
enabled by default.)
The code is conditionalized on WHEELSU.
isn't used in systat or in the kernel (it was replaced by a sysctl()
call involving VM_METER) and will go away when I clean up bogus
common variables in the kernel.
- There are two cases where the server can potentially block for a long
time while servicing a request: when handling a yp_all() request, which
could take a while to complete if the map being transfered is large
(e.g. 'ypcat passwd' where passwd.byname has 10,000 entries in it),
and while doing DNS lookups when in SunOS compat mode (with the -dns
flag), since some DNS lookups can take a long time to complete. While
ypserv is blocked, other clients making requests to the server will
also block. To fix this, we fork() ypall and DNS lookups into subprocesses
and let the parent ypserv process go on servicing other incoming
requests.
We place a cap on the number of simultaneous processes that ypserv can
fork (set at 20 for now) and go back to 'linear mode' if it hits the
limit (which just means it won't fork() anymore until the number of
simultaneous processes drops under 20 again). The cap does not apply
to fork()s done as a result of ypxfr calls, since we want to do our
best to insure that map transfers from master servers succeed.
To make this work, we need our own special copy of svc_run() so that
we can properly terminate child processes once the RPC dispatch
functions have run.
(I have no idea what SunOS does in this situation. The only other
possibility I can think of is async socket I/O, but that seems
like a headache and a half to implement.)
- Do the politically correct thing and use sigaction() instead of
signal() to install the SIGCHLD handler and to ignore SIGPIPEs.
- Doing a yp_all() is sometimes slow due to the way read_database() is
implemented. This is turn is due to a certain deficiency in the DB
hash method: the R_CURSOR flag doesn't work, which means that when
handed a key and asked to return the key/data pair for the _next_
key in the map, we have to reset the DB pointer to the start of the
database, step through until we find the requested key, step one
space ahead to the _next_ key, and then use that. (The original ypserv
code used GDBM has a function called gdbm_nextkey() that does
this for you.) This can get really slow for large maps. However,
when doing a ypall, it seems that all database access are sequential,
so we can forgo the first step (the 'search the database until we find
the key') since the database should remain open and the cursor
should be positioned at the right place until the yp_all() call
finishes. We can't make this assumption for arbitrary yp_first()s
and yp_next()s however (since we may have requests from several clients
for different maps all arriving at different times) so those we have
to handle the old way.
(This would be much easier if R_CURSOR really worked. Maybe I should
be using something other than the hash method.)
(on an i486, 10 cycles (+ cache misses) instead of 15). The
change should be a no-op if the compiler is any good. The best
possible i*86 code for the same algorithm is only 1 more cycle
faster on i486's so I don't want to bother implementing an
assembler version.
scanc() is a bottleneck for OPOST processing. It is naturally
about 4 times as slow as bcopy() on 32-bit systems.
were two races:
- q_to_b() might unexpectedly return 0 (e.g, after a keyboard signal
flushes the output queue and isn't echoed). ansi_put() interprets
0 bytes as 4GB...
- more output (e.g. for echoes) might arrive afer q_to_b() returns 0.
Then scstart() returns presumably and the new output might not be
handled for a long time.
Remove unused function scxint().
Fix prototypes (foo() isn't a prototype).
syscons' output is now only about 4-5 times slower than I want.
It loses a factor of 2 for scrolling output by unnecessarily copying
the screen buffer, a factor of 4/3 for dumb OPOST processing, and
a factor of 3/2 for clist processing.
fclosed twice and this didn't seem to cause any problems, but when
/var/crash was on an an unwritable nfs-mounted partition, fclose(NULL)
caused a core dump.