of sillyrenames (which were limited to 58 per pid per directory,
for no good reason). The new format of sillyrenames looks like
.nfs.0000b31a.00d24.4
^^^^^^^^ ^^^^^
ticks pid
Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com
Obtained from: Yahoo!
- NFS direct IO completely bypasses the buffer and page caches.
If a file is open for direct IO all caching is disabled.
- Direct IO for Directories will be addressed later.
- 2 new NFS directio related sysctls are added. One is a knob to
disable NFS direct IO completely (direct IO is enabled by default).
The other is to disallow mmaped IO on a file that has at least one
O_DIRECT open (see the comment in nfs_vnops.c for more details).
The default is to allow mmaps on a file that has O_DIRECT opens.
Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com
Obtained from: Yahoo!
ia64) was not the result of a change in the vector operations. It
was caused by the NFS locking code using a FIFO and those bypassing
the vnode. This indirectly caused the panic. The NFS locking code has
been changed.
Requested by: phk
either src or dst) fails. This closes a potential data loss case
(where the fsync failed with ENOSPC, for example).
Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com
Obtained from: Yahoo!
Kick off a readahead only when sequential access is detected. This
eliminates wasteful readaheads in random file access.
Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com
Obtained from: Yahoo!
upcalls which do RPC header parsing and match up the reply with the
request. NFS calls now sleep on the nfsreq structure. This enables
us to eliminate the NFS recvlock.
Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com
- Change the cached mtime to a 'struct timespec' from a
time_t. Improving the precision of the cached mtime tightens up
NFS' "close-to-open" consistency considerably.
- Always force an over-the-wire consistency check from nfs_open()
(unless the file is marked modified). This further improves
NFS' "close-to-open" consistency.
Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com
vnode EXCLUSIVE lock. This prevents threads from adding pages to
the vnode while an invalidation is in progress, closing potential
races. In the bioread() path, callers acquire the SHARED vnode lock
- so while an invalidate was in progress, it was possible to fault
in new pages onto the vnode causing the invalidation to take a while
or fail. We saw these races at Yahoo! with very large files+heavy
concurrent access. Forcing an upgrade to EXCLUSIVE lock before doing
the invalidation closes all these races.
Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com
is safe to turn off the nfsnode's NMODIFIED flag.
- Move the check for signals to the top of the loop where we loop
around the dirty buffers on the vnode, scheduling writes. This
ensures that we'll break ouf of the flush operation on reception of
a signal.
Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com
commit. In the new world order, the transitive closure on the vector
operations is not precomputed. As such, it's unsafe to actually use
any of the function pointers in an indirect function call. They can
be null, and we need to use the default vector in that case.
This is mostly a quick fix for the four function pointers that are
ed explicitly. A more generic or scalable solution is likely to see
the light of day.
No pathos on: current@
initializations but we did have lofty goals and big ideals.
Adjust to more contemporary circumstances and gain type checking.
Replace the entire vop_t frobbing thing with properly typed
structures. The only casualty is that we can not add a new
VOP_ method with a loadable module. History has not given
us reason to belive this would ever be feasible in the the
first place.
Eliminate in toto VOCALL(), vop_t, VNODEOP_SET() etc.
Give coda correct prototypes and function definitions for
all vop_()s.
Generate a bit more data from the vnode_if.src file: a
struct vop_vector and protype typedefs for all vop methods.
Add a new vop_bypass() and make vop_default be a pointer
to another struct vop_vector.
Remove a lot of vfs_init since vop_vector is ready to use
from the compiler.
Cast various vop_mumble() to void * with uppercase name,
for instance VOP_PANIC, VOP_NULL etc.
Implement VCALL() by making vdesc_offset the offsetof() the
relevant function pointer in vop_vector. This is disgusting
but since the code is generated by a script comparatively
safe. The alternative for nullfs etc. would be much worse.
Fix up all vnode method vectors to remove casts so they
become typesafe. (The bulk of this is generated by scripts)
a deadlock (with NFS exclusive vnode locks enabled). Lookup
grabs the parent's lock and wants to lock child. Readdirplus
locks the child and wants to lock parent (for loading the attrs
for ".."). The fix is to not load the attrs for ".." in
readdirplus.
Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com
Reviewed by: rwatson
This closes a major hole in close-to-open consistency support.
Added a new sysctl so that this can be disabled for single NFS
client applications with very large amounts of mmap'ed IO (for
performance).
Submitted by: Mohan Srinivasan mohans at yahoo-inc dot com
Reviewed by: rwatson
Extend it with a strategy method.
Add bufstrategy() which do the usual VOP_SPECSTRATEGY/VOP_STRATEGY
song and dance.
Rename ibwrite to bufwrite().
Move the two NFS buf_ops to more sensible places, add bufstrategy
to them.
Add inlines for bwrite() and bstrategy() which calls through
buf->b_bufobj->b_ops->b_{write,strategy}().
Replace almost all VOP_STRATEGY()/VOP_SPECSTRATEGY() calls with bstrategy().
Initialize b_bufobj for all buffers.
Make incore() and gbincore() take a bufobj instead of a vnode.
Make inmem() local to vfs_bio.c
Change a lot of VI_[UN]LOCK(bp->b_vp) to BO_[UN]LOCK(bp->b_bufobj)
also VI_MTX() to BO_MTX(),
Make buf_vlist_add() take a bufobj instead of a vnode.
Eliminate other uses of bp->b_vp where bp->b_bufobj will do.
Various minor polishing: remove "register", turn panic into KASSERT,
use new function declarations, TAILQ_FOREACH_SAFE() etc.
Add bufobj_wref(), bufobj_wdrop() and bufobj_wwait() to handle the write
count on a bufobj. Bufobj_wdrop() replaces vwakeup().
Use these functions all relevant places except in ffs_softdep.c where
the use if interlocked_sleep() makes this impossible.
Rename b_vnbufs to b_bobufs now that we touch all the relevant files anyway.
Rebind the client socket when we experience a timeout. This fixes
the case where our IP changes for some reason.
Signal a VFS event when NFS transitions from up to down and vice
versa.
Add a placeholder vfs_sysctl where we will put status reporting
shortly.
Also:
Make down NFS mounts return EIO instead of EINTR when there is a
soft timeout or force unmount in progress.
NFSv3. It's likely that modifying the attributes will affect the
file's accessibility. This version of the patch is one suggested
by Ian Dowse after reviewing my original attempt in the PR
Reviewed By: iedowse
PR: kern/44336
MFC after: 3 days
The reason this was done was to avoid a race to the root when an
NFS server went down. However a semi-recent change to the way that
the kernel's lookup() routine traverses mount points prevents this.
Rev 1.39 of vfs_lookup.c changed the ordering of locks such that we
aquire a shared lock on the mount point being accessed and then drop
the directory vnode lock before requesting the target lock.
With that in place we no longer need shared locks for NFS to prevent
race to the root lockups.
to set np->n_size back to the desired size again after calling
nfs_meta_setsize(), since it could end up in nfs_loadattrcache() getting
called, which would change n_size back to the value it had before the
truncate request was issued. The result of this bug is that the size info
cached in the nfsnode becomes incorrect, lseek(fd, ofs, SEEK_END) seeks
past the end of the file, stat() returns the wrong size, etc.
PR: 41792
MFC after: 2 weeks
has not been cleaned in the meantime, since this can happen during
a forced unmount. Also add a comment that nfs_removeit() should
really be locking the directory vnode before calling nfs_removerpc().
Reported by: mbr
Tested by: mbr
MFC after: 1 week
VOP_SETATTR() or VOP_GETATTR(); without these locks (a) VFS_DEBUG_LOCKS
will panic, and (b) it may be possible to corrupt entries in the cached
vnode attributes in the nfsnode, since nfsnode attribute cache data is
also protected by the vnode lock.
Approved by: re (jhb)
Pointed out by: VFS_DEBUG_LOCKS
Instead, use the generic vaccess() operation to determine whether
an operation is permitted. This avoids embedding knowledge on
vnode permission bits such as VAPPEND in the NFS client.
PR: kern/46515
vaccess() patch submitted by: "Peter Edwards" <pmedwards@eircom.net>
Approved by: tjr, roberto (mentor)
Remove extraneous uses of vop_null, instead defering to the default op.
Rename vnode type "vfs" to the more descriptive "syncer".
Fix formatting for various filesystems that use vop_print.
an if clause was true. Break the two clauses out into seperate statements
since they require different actions.
Reported/Tested by: jake
Spotted by: jhb
- Remove the buftimelock mutex and acquire the buf's interlock to protect
these fields instead.
- Hold the vnode interlock while locking bufs on the clean/dirty queues.
This reduces some cases from one BUF_LOCK with a LK_NOWAIT and another
BUF_LOCK with a LK_TIMEFAIL to a single lock.
Reviewed by: arch, mckusick