<sys/bio.h>.
<sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall
not be made a nested include according to bdes teachings on the
subject of nested includes.
Diskdrivers and similar stuff below specfs::strategy() should no
longer need to include <sys/buf.> unless they need caching of data.
Still a few bogus uses of struct buf to track down.
Repocopy by: peter
provide locking over extended attribute operations, requiring that
individual operations be atomic. Allowing non-zero starting offsets
permits applications/etc to put themselves at risk for inconsistent
behavior. As VOP_SETEXTATTR already prohibited non-zero write offsets,
this makes sense.
Suggested by: Andreas Gruenbacher <a.gruenbacher@bestbits.at>
reset their grace timer as their ownership crossed the soft limit
threshhold. Thus if they had been over their limit in the past,
they were suddenly penalized as if they had been over their limit
ever since. The fix is to check when root gives away files, that
when the receiving user crosses their soft limit, their grace timer
is reset. See the PR report for a detailed method of reproducing
the bug.
PR: kern/17128
Submitted by: Andre Albsmeier <andre.albsmeier@mchp.siemens.de>
Reviewed by: Kirk McKusick <mckusick@mckusick.com>
chances of consistency with other file/directory meta-data in a
write. In the current set of extended attribute applications,
this does not hurt much. This should be discussed again later when
it comes time to optimize performance of attributes.
o Include an inode generation number in the per-attribute header
information. This allows consistency verification to catch when
a crash occurs, or an inode is recycled while attributes are not
properly configured. For now, an irritating error message is
displayed when an inconsistency occurs. At some point, may introduce
an ``extattrctl check ...'' which catches these before attributes
are enabled. Not today. If you get this message, it means you
somehow managed to get your attribute backing file out of synch
with the file system. When this occurs, attribute not found is
returned (== undefined). Writes will overwrite the value there
correcting the problem. Might want to think about introducing
a new errno or two to handle this kind of situation.
Discussed with: kris
o Put back in {} removed during over-zealous cleanup of gratuitous
debugging output during preparation for the commit. Due to the
missing {}, writes on extended attributes always silently failed.
Doh.
o Don't unlock the target vnode if it's the backing vnode, as we
don't lock the target vnode if it's the backing vnode.
Exceptions:
Vinum untouched. This means that it cannot be compiled.
Greg Lehey is on the case.
CCD not converted yet, casts to struct buf (still safe)
atapi-cd casts to struct buf to examine B_PHYS
(name, value) pairs to be associated with inodes. This support is
used for ACLs, MAC labels, and Capabilities in the TrustedBSD
security extensions, which are currently under development.
In this implementation, attributes are backed to data vnodes in the
style of the quota support in FFS. Support for FFS extended
attributes may be enabled using the FFS_EXTATTR kernel option
(disabled by default). Userland utilities and man pages will be
committed in the next batch. VFS interfaces and man pages have
been in the repo since 4.0-RELEASE and are unchanged.
o ufs/ufs/extattr.h: UFS-specific extattr defines
o ufs/ufs/ufs_extattr.c: bulk of support routines
o ufs/{ufs,ffs,mfs}/*.[ch]: hooks and extattr.h includes
o contrib/softupdates/ffs_softdep.c: extattr.h includes
o conf/options, conf/files, i386/conf/LINT: added FFS_EXTATTR
o coda/coda_vfsops.c: XXX required extattr.h due to ufsmount.h
(This should not be the case, and will be fixed in a future commit)
Currently attributes are not supported in MFS. This will be fixed.
Reviewed by: adrian, bp, freebsd-fs, other unthanked souls
Obtained from: TrustedBSD Project
devstat_end_transaction_bio()
bioq_* versions of bufq_* incl bioqdisksort()
the corresponding "buf" versions will disappear when no longer used.
Move b_offset, b_data and b_bcount to struct bio.
Add BIO_FORMAT as a hack for fd.c etc.
We are now largely ready to start converting drivers to use struct
bio instead of struct buf.
(Much of this done by script)
Move B_ORDERED flag to b_ioflags and call it BIO_ORDERED.
Move b_pblkno and b_iodone_chain to struct bio while we transition, they
will be obsoleted once bio structs chain/stack.
Add bio_queue field for struct bio aware disksort.
Address a lot of stylistic issues brought up by bde.
async I/O's. The sequential read heuristic has been extended to
cover writes as well. We continue to call cluster_write() normally,
thus blocks in the file will still be reallocated for large (but still
random) I/O's, but I/O will only be initiated for truely sequential
writes.
This solves a number of annoying situations, especially with DBM (hash
method) writes, and also has the side effect of fixing a number of
(stupid) benchmarks.
Reviewed-by: mckusick
substitute BUF_WRITE(foo) for VOP_BWRITE(foo->b_vp, foo)
substitute BUF_STRATEGY(foo) for VOP_STRATEGY(foo->b_vp, foo)
This patch is machine generated except for the ccd.c and buf.h parts.
field in struct buf: b_iocmd. The b_iocmd is enforced to have
exactly one bit set.
B_WRITE was bogusly defined as zero giving rise to obvious coding
mistakes.
Also eliminate the redundant struct buf flag B_CALL, it can just
as efficiently be done by comparing b_iodone to NULL.
Should you get a panic or drop into the debugger, complaining about
"b_iocmd", don't continue. It is likely to write on your disk
where it should have been reading.
This change is a step in the direction towards a stackable BIO capability.
A lot of this patch were machine generated (Thanks to style(9) compliance!)
Vinum users: Greg has not had time to test this yet, be careful.
the unwary if the code were called in slightly different ways.
1) In ufs_bmaparray() the code for calculating 'runb' will stop one block
short of the first entry in an indirect block. i.e. if an indirect block
contains N block numbers b[0]..b[N-1] then the code will never check if
b[0] and b[1] are sequential. For reference, compare with the equivalent
code that deals with direct blocks.
2) In ufs_lookup() there is an off-by-one error in the test that checks
if dp->i_diroff is outside the range of the the current directory size.
This is completely harmless, since the following while-loop condition
'dp->i_offset < endsearch' is never met, so the code immediately
does a second pass starting at dp->i_offset = 0.
3) Again in ufs_lookup(), the condition in a sanity check is wrong
for directories that are longer than one block. This bug means that
the sanity check is only effective for small directories.
Submitted by: Ian Dowse <iedowse@maths.tcd.ie>
being accessed after the bp had been releaed. A simple move of the
brelse() solves the problem.
Approved by: jkh
Submitted by: Ian Dowse <iedowse@maths.tcd.ie>
filesystem fills up. If the first indirect block exists and FFS is able
to allocate deeper indirect blocks, but is not able to allocate the
data block, FFS improperly unwinds the indirect blocks and leaves a
block pointer hanging to a freed block. This will cause a panic later
when the file is removed. The solution is to properly account for the
first block-pointer-to-an-indirect-block we had to create in a balloc
operation and then unwind it if a failure occurs.
Detective work by: Ian Dowse <iedowse@maths.tcd.ie>
Reviewed by: mckusick, Ian Dowse <iedowse@maths.tcd.ie>
Approved by: jkh
to the current jail/chflags interactions. This fix conditionalizes ``root
behavior'' in the chflags() case on not being in jail, so attempts to
perform a chflags in a jail are limited to what a normal user could do.
For example, this does allow setting of user flags as appropriate, but
prohibits changing of system flags.
Reviewed by: bde
a mess in securelevel environments. Results in one warning during
/etc/rc as it attempts to remove file flags, but this is harmless.
Approved by: High Lord Hubbard
have a write in progress. Otherwise one can get in an infinite loop
trying to get them all flushed.
Submitted by: Matthew Dillon <dillon@apollo.backplane.com>
set of restrictions for cancelling an inode dependency (inodedep)
is somewhat stronger than originally coded. Since this check appears
in two places, we codify it into the function check_inode_unwritten
which we then call from the two sites, one freeing blocks and the
other freeing directory entries.
Submitted by: Steinar Haug via Matthew Dillon
so that they never try to lock an inode corresponding to ".." as this
can lead to deadlock. We observe that any inode with an updated link count
is always pushed into its buffer at the time of the link count change, so
we do not need to do a VOP_UPDATE, but merely find its buffer and write it.
The only time we need to get the inode itself is from the result of a
mkdir whose name will never be ".." and hence locking such an inode will
never request a lock above us in the filesystem tree. Thanks to Brian
Fundakowski Feldman for providing the test program that tickled soft updates
into hanging in "inode" sleep.
Submitted by: Brian Fundakowski Feldman <green@FreeBSD.org>
to sleep). Locking 101, part 2: do not look at buffer contents after
you have been asleep. There is no telling what wonderous changes may
have occurred.
This seems to be responsible for a bunch of panics where the process
sleeps and something else finds softupdates "locked" when it shouldn't
be. This commit is unreviewed, but has been a big help here.
Previously my boxes would panic pretty much on the first fsync() that
wrote something to disk.
it is no longer sufficient to get a lock on a buffer to know
that its write has been completed. We have to first get the
lock on the buffer, then check to see if it is doing a
background write. If it is doing background write, we have
to wait for the background write to finish, then check to see
if that fullfilled our dependency, and if not to start another
write. Luckily the explanation is longer than the fix.
a vnode has not been written (which would clear certain of its
dependencies). The problems arises because fsync with MNT_NOWAIT
no longer pushes all the dirty blocks associated with a vnode. It
skips those that require rollbacks, since they will just get instantly
dirty again. Such skipped blocks are marked so that they will not be
skipped a second time (otherwise circular dependencies would never
clear). So, we fsync twice to ensure that everything will be written
at least once.
The problem occurs when an indirect block and a data block are
being allocated at the same time. For example when the 13th block
of the file is written, the filesystem needs to allocate the first
indirect block and a data block. If the indirect block allocation
succeeds, but the data block allocation fails, the error code
dellocates the indirect block as it has nothing at which to point.
Unfortunately, it does not deallocate the indirect block's associated
dependencies which then fail when they find the block unexpectedly
gone (ptr == 0 instead of its expected value). The fix is to fsync
the file before doing the block rollback, as the fsync will flush
out all of the dependencies. Once the rollback is done the file
must be fsync'ed again so that the soft updates code does not find
unexpected changes. This approach is much slower than writing the
code to back out the extraneous dependencies, but running out of
disk space is not expected to be a common occurence, so just getting
it right is the main criterion.
PR: kern/15063
Submitted by: Assar Westerlund <assar@stacken.kth.se>
have been cleaned up by deallocte_dependencies(). Once that is done, it
is safe to post the request to free the blocks. A similar change is also
needed for the freefile case.
1) Fastpath deletions. When a file is being deleted, check to see if it
was so recently created that its inode has not yet been written to
disk. If so, the delete can proceed to immediately free the inode.
2) Background writes: No file or block allocations can be done while the
bitmap is being written to disk. To avoid these stalls, the bitmap is
copied to another buffer which is written thus leaving the original
available for futher allocations.
3) Link count tracking. Constantly track the difference in i_effnlink and
i_nlink so that inodes that have had no change other than i_effnlink
need not be written.
4) Identify buffers with rollback dependencies so that the buffer flushing
daemon can choose to skip over them.
of dirrem structure rather than the collaterally created freeblks
and freefile structures. Limit the rate of buffer dirtying by the
syncer process during periods of intense file removal.
check before the inode is unlocked while grabbing its parent directory.
Once it is unlocked, other operations may slip in that could make
the inode-is-flushed check fail. Allowing other writes to the inode
before returning from fsync does not break the semantics of fsync
since we have flushed everything that was dirty at the time of the
fsync call.
is an application space macro and the applications are supposed to be free
to use it as they please (but cannot). This is consistant with the other
BSD's who made this change quite some time ago. More commits to come.
when I made the absence of the clean flag sticky in rev.1.88. This
was a problem main for "mount /". There is no way to mount "/" for
writing without using mount -u (normally implicitly), so after
"mount -f /" of an unclean filesystem, the absence of the clean flag
was sticky forever.