Supply the moduledata handle rather than the event dispatcher function.
This should explain the panic on boot problem that's been discussed in
-current at the moment. Both machines had GNU_MATH_EMULATE.
This is the bulk of the support for doing kld modules. Two linker_sets
were replaced by SYSINIT()'s. VFS's and exec handlers are self registered.
kld is now a superset of lkm. I have converted most of them, they will
follow as a seperate commit as samples.
This all still works as a static a.out kernel using LKM's.
but when i_effnlink was added to support soft updates, there was only
room for 4 spares. The number of spares was not reduced, so the inode
size became 260 (on i386's), or 512 after rounding up by malloc().
Use one spare field in `struct dinode' instead of the 5th spare field
in the inode and reduced to 4 spares in the inode so that the size is
256 again.
Changed the types of the spares in the inode from int to u_int32_t
so that the inode size has more chance of being <= 256 under other
arches, and downdated ext2fs to match (it was broken to use ints
before rev.1.1).
an ext2fs file system is mounted. The soft update changes added
a check for B_DELWRI buffers. This exposed the complete brokenness
of the previous quick fix for failing syncs (PR 3571, committed on
1997/08/04). Use a new buffer flag B_DIRTY and don't abuse B_DELWRI.
B_DIRTY buffers are still written too late, as broken in the previous
fix. This is fairly harmless, because B_DIRTY is only used for
bitmap buffers and fsck.ext2 can fix up the bitmaps perfectly.
Fixed a race in ULCK_BUF() (bremfree() was outside of the splbio()
section).
syncs weren't optimized properly (they probably still aren't, but are bug
for bug compatible with ffs). These fixes are mostly academic, since
ext2fs is too broken to mount read-write (it apparently doesn't clear
indirect blocks).
Obtained from: mostly from Lite2
Fixes for bugs not shared with ffs:
- don't mount unclean filesystems rw unless forced to.
- accept EXT2_ERROR_FS (treat it like !EXT2_VALID_FS). We still don't set
this or honour the maximal mount count.
- don't attempt to print the name of the mount point when mounting an
unclean file system, since the name of the previous mount point is
unknown and the name of the current mount point is still "".
Fixes for bugs shared with ffs until recently:
- don't set the clean flag on unmount of an initially-unclean filesystem
that was (forcibly) mounted rw.
- set the clean flag on rw -> ro update of a mounted initially-clean
filesystem.
- fixed some style bugs (mostly long lines).
The fixes are slightly simpler than for ffs, because the relevant on-disk
state is not a simple boolean variable, and the superblock has a core-only
extension.
Obtained from: parts from ffs_vfsops.c, parts from NetBSD
surprisingly few problems. Most fields were initialized to the
correct values by bzero(), but lk_prio was 0 instead of PINOD (=8),
the lk_wmsg was NULL instead of "ext2in", and lk_lockholder was 0
instead of -1.
Obtained from: Lite2 via the -current ffs_vfsops.c
references to them.
The change a couple of days ago to ignore these numbers in statically
configured vfsconf structs was slightly premature because the cd9660,
cfs, devfs, ext2fs, nfs vfs's still used MOUNT_* instead of the number
in their vfsconf struct.
for the Lite2 fix for always returning EIO in dead_read().
Cleaned up the cdevswitch initializers for all tty drivers.
Removed explicit calls to ttsetwater() from all (tty) drivers. ttsetwater()
is now called centrally for opens, not just for parameter changes.
clustering is obsolescent technology so hardly anyone noticed. On
a DORS 32160 SCSI drive with 4 tags, read clustering makes very
little difference even for huge sequential reads. However, on a
ZIP SCSI drive with 0 tags, the minimum overhead per block is about
40 msec, so very large clusters must be used to get anywhere near
the maximum transfer rate. Using clusters consisting of 1 8K block
reduces the transfer rate to about 250K/sec. Under msdosfs, missing
read clustering is normal and a cluster size of 1 512 byte block
reduces the transfer rate to about 25K/sec.
Broken in: rev.1.18
was broken), 1.30 (COMPAT_43 option header was missing), 1.31 (DEVFS
option header was missing), 1.33 (garbage pointers were followed
in debugging code). Cosmetic changes from 1.27, 1.32, 1.36, 1.37.
Of course, the DEVFS code didn't even compile. Fixed. Not tested.
Forgotten by: brian
This file should not exist. It is the same as dgb.c except for lots of
renamed variables, about 250 lines removed, and only about 100 lines of
real differences.
as the value in b_vp is often not really what you want.
(and needs to be frobbed). more cleanups will follow this.
Reviewed by: Bruce Evans <bde@freebsd.org>
as possible (when the inode is reclaimed). Temporarily only do
this if option UFS_LAZYMOD configured and softupdates aren't enabled.
UFS_LAZYMOD is intentionally left out of /sys/conf/options.
This is mainly to avoid almost useless disk i/o on battery powered
machines. It's silly to write to disk (on the next sync or when the
inode becomes inactive) just because someone hit a key or something
wrote to the screen or /dev/null.
PR: 5577
Previous version reviewed by: phk
in ufs_setattr() so that there is no need to pass timestamps to
UFS_UPDATE() (everything else just needs the current time). Ignore
the passed-in timestamps in UFS_UPDATE() and always call ufs_itimes()
(was: itimes()) to do the update. The timestamps are still passed
so that all the callers don't need to be changed yet.
by hacking on locked buffers without getblk()ing them, and we didn't
even use splbio() to prevent biodone() changing the buffer underneath
use when a write completes. I think there was no problem in practice
on i386's because the operations on b_flags and numdirtybufs happen to
be atomic. We still depend on biodone()'s operations on b_flags not
interfering with ours. I think there is only interference for B_ERROR,
and this is harmless because errors for async writes are ignored anyway.
Don't use mark_buffer_dirty() except for superblock-related metadata.
It was used in just one case where ordinary BSD buffering is more
natural.
to not using splbio(), and has rotted a little. The races were
probably harmless in practice because this function was only used
for superblock updates, and separate superblock updates are probably
prevented from running into each other by doing part of the update
synchronously.
FreeBSD/alpha. The most significant item is to change the command
argument to ioctl functions from int to u_long. This change brings us
inline with various other BSD versions. Driver writers may like to
use (__FreeBSD_version == 300003) to detect this change.
The prototype FreeBSD/alpha machdep will follow in a couple of days
time.
Don't forget to clear the inode hash lock before returning from ext2_vget()
after getnewvnode() fails. Obtained from: rev.1.24 of ffs_vfsops.c (the
original patch for the getnewvnode() race). Forgotten in: rev.1.4 here.
Removed a duplicate comment. Duplicated in: rev.1.4 here.
Fixed the MALLOC() vs getnewvnode() race in ext2_vget(). Obtained from:
rev.1.39 of ffs_vfsops.c.
---------
Make callers of namei() responsible for releasing references or locks
instead of having the underlying filesystems do it. This eliminates
redundancy in all terminal filesystems and makes it possible for stacked
transport layers such as umapfs or nullfs to operate correctly.
Quality testing was done with testvn, and lat_fs from the lmbench suite.
Some NFS client testing courtesy of Patrik Kudo.
vop_mknod and vop_symlink still release the returned vpp. vop_rename
still releases 4 vnode arguments before it returns. These remaining cases
will be corrected in the next set of patches.
---------
Submitted by: Michael Hancock <michaelh@cet.co.jp>
Reverse the VFS_VRELE patch. Reference counting of vnodes does not need
to be done per-fs. I noticed this while fixing vfs layering violations.
Doing reference counting in generic code is also the preference cited by
John Heidemann in recent discussions with him.
The implementation of alternative vnode management per-fs is still a valid
requirement for some filesystems but will be revisited sometime later,
most likely using a different framework.
Submitted by: Michael Hancock <michaelh@cet.co.jp>
- pessimized i/o port types.
- other pessimized types.
- Don't use DEBUG (causes LINT warnings). Use DGB_DEBUG instead.
- commented out code.
- cloned code that doesn't apply ("Smarts" is for the cy driver only).
Submitted by: bde
dereference uninitialized pointers.
o Fix DEVFS permissions
o Fix DEVFS minor numbers
o Add initial & lock devices for cua device.
o Fix permissions in line with sio.
"time" wasn't a atomic variable, so splfoo() protection were needed
around any access to it, unless you just wanted the seconds part.
Most uses of time.tv_sec now uses the new variable time_second instead.
gettime() changed to getmicrotime(0.
Remove a couple of unneeded splfoo() protections, the new getmicrotime()
is atomic, (until Bruce sets a breakpoint in it).
A couple of places needed random data, so use read_random() instead
of mucking about with time which isn't random.
Add a new nfs_curusec() function.
Mark a couple of bogosities involving the now disappeard time variable.
Update ffs_update() to avoid the weird "== &time" checks, by fixing the
one remaining call that passwd &time as args.
Change profiling in ncr.c to use ticks instead of time. Resolution is
the same.
Add new function "tvtohz()" to avoid the bogus "splfoo(), add time, call
hzto() which subtracts time" sequences.
Reviewed by: bde
They are atomic, but return in essence what is in the "time" variable.
gettime() is now a macro front for getmicrotime().
Various patches to use the two new functions instead of the various
hacks used in their absence.
Some puntuation and grammer patches from Bruce.
A couple of XXX comments.
a complement to all ops that return a vpp, VFS_VRELE. This is
initially only for file systems that implement the following ops
that do a WILLRELE:
vop_create, vop_whiteout, vop_mknod, vop_remove, vop_link,
vop_rename, vop_mkdir, vop_rmdir, vop_symlink
This is initial DNA that doesn't do anything yet. VFS_VRELE is
implemented but not called.
A default vfs_vrele was created for fs implementations that use the
standard vnode management routines.
VFS_VRELE implementations were made for the following file systems:
Standard (vfs_vrele)
ffs mfs nfs msdosfs devfs ext2fs
Custom
union umapfs
Just EOPNOTSUPP
fdesc procfs kernfs portal cd9660
These implementations may change as VOP changes are implemented.
In the next phase, in the vop implementations calls to vrele and the vrele
part of vput will be moved to the top layer vfs_vnops and made visible
to all layers. vput will be replaced by unlock in these cases. Unlocking
will still be done in the per fs layer but the refcount decrement will be
triggered at the top because it doesn't hurt to hold a vnode reference a
little longer. This will have minimal impact on the structure of the
existing code.
This will only be done for vnode arguments that are released by the various
fs vop implementations.
Wider use of VFS_VRELE will likely require restructuring of the code.
Reviewed by: phk, dyson, terry et. al.
Submitted by: Michael Hancock <michaelh@cet.co.jp>
If you want to play with it, you can find the final version of the
code in the repository the tag LFS_RETIREMENT.
If somebody makes LFS work again, adding it back is certainly
desireable, but as it is now nobody seems to care much about it,
and it has suffered considerable bitrot since its somewhat haphazard
integration.
R.I.P
This introduce an xxxFS_BOOT for each of the rootable filesystems.
(Presently not required, but encouraged to allow a smooth move of option *FS
to opt_dontuse.h later.)
LFS is temporarily disabled, and will be re-enabled tomorrow.
Sorted the functions into the same order as in ufs_vnops.c so that this
can be compared with the latter without getting 2627 lines of diffs.
Now we get only 1920 lines of diffs.
triple indirect blocks only worked for block sizes of 4K, since
MNINDIR(ump)**3 overflows for larger block sizes (e.g.,
(8192/4)**3 = 2**33 > INT_MAX). This fix is not the obvious one of
changing some types to 64 bits. It rearranges the code to avoid some
unnecessary 64-bit calculations.
Reviewed by: Kirk McKusick <mckusick@McKusick.COM>
Ever since I first say the way the mount flags were used I've hated the
fact that modes, and events, internal and exported, and short-term
and long term flags are all thrown together. Finally it's annoyed me enough..
This patch to the entire FreeBSD tree adds a second mount flag word
to the mount struct. it is not exported to userspace. I have moved
some of the non exported flags over to this word. this means that we now
have 8 free bits in the mount flags. There are another two that might
well move over, but which I'm not sure about.
The only user visible change would have been in pstat -v, except
that davidg has disabled it anyhow.
I'd still like to move the state flags and the 'command' flags
apart from each other.. e.g. MNT_FORCE really doesn't have the
same semantics as MNT_RDONLY, but that's left for another day.
1. Remove VOP_UPDATE, it is (also) an UFS/{FFS,LFS,EXT2FS,MFS}
intereface function, and now lives in the ufsmount structure.
2. Remove VOP_SEEK, it was unused.
3. Add mode default vops:
VOP_ADVLOCK vop_einval
VOP_CLOSE vop_null
VOP_FSYNC vop_null
VOP_IOCTL vop_enotty
VOP_MMAP vop_einval
VOP_OPEN vop_null
VOP_PATHCONF vop_einval
VOP_READLINK vop_einval
VOP_REALLOCBLKS vop_eopnotsupp
And remove identical functionality from filesystems
4. Add vop_stdpathconf, which returns the canonical stuff. Use
it in the filesystems. (XXX: It's probably wrong that specfs
and fifofs sets this vop, shouldn't it come from the "host"
filesystem, for instance ufs or cd9660 ?)
5. Try to make system wide VOP functions have vop_* names.
6. Initialize the um_* vectors in LFS.
(Recompile your LKMS!!!)
1. Add new file "sys/kern/vfs_default.c" where default actions for
VOPs go. Implement proper defaults for ABORTOP, BWRITE, LEASE,
POLL, REVOKE and STRATEGY. Various stuff spread over the entire
tree belongs here.
2. Change VOP_BLKATOFF to a normal function in cd9660.
3. Kill VOP_BLKATOFF, VOP_TRUNCATE, VOP_VFREE, VOP_VALLOC. These
are private interface functions between UFS and the underlying
storage manager layer (FFS/LFS/MFS/EXT2FS). The functions now
live in struct ufsmount instead.
4. Remove a kludge of VOP_ functions in all filesystems, that did
nothing but obscure the simplicity and break the expandability.
If a filesystem doesn't implement VOP_FOO, it shouldn't have an
entry for it in its vnops table. The system will try to DTRT
if it is not implemented. There are still some cruft left, but
the bulk of it is done.
5. Fix another VCALL in vfs_cache.c (thanks Bruce!)
1. Use the default function to access all the specfs operations.
2. Use the default function to access all the fifofs operations.
3. Use the default function to access all the ufs operations.
4. Fix VCALL usage in vfs_cache.c
5. Use VOCALL to access specfs functions in devfs_vnops.c
6. Staticize most of the spec and fifofs vnops functions.
7. Make UFS panic if it lacks bits of the underlying storage handling.
1. Remove comment stating the blatantly obvious.
2. Align in two columns.
3. Sort all but the default element alphabetically.
4. Remove XXX comments pointing out entries not needed.
ip->i_flags.
Rev.1.18 completely broke ufs. My root directory went away about 10
seconds after booting. I think file system damage was null, since
IN_HASHED = 0x80 is not used in the disk flags (it would probably
be UF_SOMETHING if it were used).
Distribute all but the most fundamental malloc types. This time I also
remembered the trick to making things static: Put "static" in front of
them.
A couple of finer points by: bde
1. Clustered I/O is switched by the MNT_NOCLUSTERR and MNT_NOCLUSTERW
bits of the mnt_flag. The sysctl variables, vfs.foo.doclusterread
and vfs.foo.doclusterwrite are deleted. Only mount option can
control clustered I/O from userland.
2. When foofs_mount mounts block device, foofs_mount checks D_CLUSTERR
and D_CLUSTERW bits of the d_flags member in the block device switch
table. If D_NOCLUSTERR / D_NOCLUSTERW are set, MNT_NOCLUSTERR /
MNT_NOCLUSTERW bits will be set. In this case, MNT_NOCLUSTERR and
MNT_NOCLUSTERW cannot be cleared from userland.
3. Vnode driver disables both clustered read and write.
4. Union filesystem disables clutered write.
Reviewed by: bde
This unifies several times in theory indentical 50 lines of code.
The filesystems have a new method: vop_cachedlookup, which is the
meat of the lookup, and use vfs_cache_lookup() for their vop_lookup
method. vfs_cache_lookup() will check the namecache and pass on
to the vop_cachedlookup method in case of a miss.
It's still the task of the individual filesystems to populate the
namecache with cache_enter().
Filesystems that do not use the namecache will just provide the
vop_lookup method as usual.
doclusterread/doclusterwrite into ext2_doclusterread and
ext2_doclusterwrite, which are unique names. Moved #include of
<sys/sysctl.h> to the top of the file.
Pointed out by: Bruce Evans <bde@zeta.org.au>
socket addresses in mbufs. (Socket buffers are the one exception.) A number
of kernel APIs needed to get fixed in order to make this happen. Also,
fix three protocol families which kept PCBs in mbufs to not malloc them
instead. Delete some old compatibility cruft while we're at it, and add
some new routines in the in_cksum family.
keep ahold of buffers, and therefore leave filesystems dirty. I haven't
been able to test, but the code compiles. Those who run -current, please
test and report back!!! (Sorry :-)).
PR: kern/3571
Submitted by: Dirk Keunecke <dk@panda.rhein-main.de>
cause a problem of spiraling death due to buffer resource limitations.
The vfs_bio code in general had little ability to handle buffer resource
management, and now it does. Also, there are a lot more knobs for tuning the
vfs_bio code now. The knobs came free because of the need that there
always be some immediately available buffers (non-delayed or locked) for
use. Note that the buffer cache code is much less likely to get bogged
down with lots of delayed writes, even more so than before.
have successfully built, booted, and run a number of different ELF
kernel configurations, including GENERIC. LINT also builds and
links cleanly, though I have not tried to boot it.
The impact on developers is virtually nil, except for two things.
All linker sets that might possibly be present in the kernel must be
listed in "sys/i386/i386/setdefs.h". And all C symbols that are
also referenced from assembly language code must be listed in
"sys/i386/include/asnames.h". It so happens that failure to do
these things will have no impact on the a.out kernel. But it will
break the build of the ELF kernel.
The ELF bootloader works, but it is not ready to commit quite yet.
defining doff_t both here and in <ufs/ufs/dir.h> so that this file
is independent of <ufs/ufs/dir.h>. It still has old prerequisites
<sys/param.h> and <ufs/ufs/quota.h>, and a new Lite2 prerequisite of
<sys/lock.h>, sigh.
This might fix lsof, which was broken by namespace pollution giving
conflicting definitions of DIRBLKSIZ.
form `tv = time'. Use a new function gettime(). The current version
just forces atomicicity without fixing precision or efficiency bugs.
Simplified some related valid accesses by using the central function.
changes, so don't expect to be able to run the kernel as-is (very well)
without the appropriate Lite/2 userland changes.
The system boots and can mount UFS filesystems.
Untested: ext2fs, msdosfs, NFS
Known problems: Incorrect Berkeley ID strings in some files.
Mount_std mounts will not work until the getfsent
library routine is changed.
Reviewed by: various people
Submitted by: Jeffery Hsu <hsu@freebsd.org>
This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.
Boy, I'm glad we're not using sup anymore. This update would have been
insane otherwise.
Firstly, now our read-ahead clustering is on a file descriptor basis and not
on a per-vnode basis. This will allow multiple processes reading the
same file to take advantage of read-ahead clustering. Secondly, there
previously was a problem with large reads still using the ramp-up
algorithm. Of course, that was bogus, and now we read the entire
"chunk" off of the disk in one operation. The read-ahead clustering
algorithm should use less CPU than the previous also (I hope :-)).
directories to a different parent directory always failed. This bug
was caused by 4.4Lite2 changing the directory format and ext2fs not
keeping up.
Should be in 2.2.
This fixes several bugs and one missing feature:
- cluster_read() was needlessly used for reading files of size exactly 1
block.
- EFAULT errors for read didn't terminate the loop. This was probably
harmless.
- IO_VMIO handling was missing near line 275. I don't know what this does.
- B_CLUSTEROK was only set if (doclusterwrite) nead line 293. This was
harmless, if only because another bug prevents doclusterwrite from being
0.
- MNT_NOATIME wasn't implemented.
This should be in 2.2, of course.
Reviewed by: davidg
- don't include <sys/ioctl.h> in any header. Include <sys/ioccom.h>
instead. This was already done in 4.4Lite for the most important
ioctl headers. Header spam currently increases kernel build
times by 10-20%. There are more than 30000 #includes (not counting
duplicates) for compiling LINT.
- include <sys/types.h> if and only it is necessary to make the header
almost self-sufficient (some ioctl headers still need structs from
elsewhere).
- uniformized idempotency ifdefs. Copied the style in the 4.4Lite
ioctl headers.
/*
* Structure defined by POSIX.4 to be like a timeval.
*/
struct timespec {
time_t ts_sec; /* seconds */
long ts_nsec; /* and nanoseconds */
};
The correct names of the fields are tv_sec and tv_nsec.
Reminded by: James Drobina <jdrobina@infinet.com>