freebsd

mirror of https://git.FreeBSD.org/src.git synced 2024-12-25 11:37:56 +00:00

Author	SHA1	Message	Date
Kirk McKusick	e04a020067	Unconditionally reset vp->v_vnlock back to the default in the vclean() function (e.g., vp->v_vnlock = &vp->v_lock) rather than requiring filesystems that use alternate locks to do so in their vop_reclaim functions. This change is a further cleanup of the vop_stdlock interface. Submitted by: Poul-Henning Kamp <phk@critter.freebsd.dk> Sponsored by: DARPA & NAI Labs.	2002-10-14 19:44:51 +00:00
Kirk McKusick	a5b65058d5	Regularize the vop_stdlock'ing protocol across all the filesystems that use it. Specifically, vop_stdlock uses the lock pointed to by vp->v_vnlock. By default, getnewvnode sets up vp->v_vnlock to reference vp->v_lock. Filesystems that wish to use the default do not need to allocate a lock at the front of their node structure (as some still did) or do a lockinit. They can simply start using vn_lock/VOP_UNLOCK. Filesystems that wish to manage their own locks, but still use the vop_stdlock functions (such as nullfs) can simply replace vp->v_vnlock with a pointer to the lock that they wish to have used for the vnode. Such filesystems are responsible for setting the vp->v_vnlock back to the default in their vop_reclaim routine (e.g., vp->v_vnlock = &vp->v_lock). In theory, this set of changes cleans up the existing filesystem lock interface and should have no function change to the existing locking scheme. Sponsored by: DARPA & NAI Labs.	2002-10-14 03:20:36 +00:00
Kirk McKusick	192e439ed4	When considering a vnode for reuse in getnewvnode, we call vcanrecycle to check a free vnode's availability. If it is available, vcanrecycle returns an error code of zero and the vnode in question locked. The getnewvnode routine then used to call vn_start_write with the V_NOWAIT flag. If the filesystem was suspended while taking a snapshot, the vn_start_write would fail but getnewvnode would fail to unlock the vnode, instead leaving it locked on the freelist. The result would be that the vnode would be locked forever and would eventually hang the system with a race to the root when it was attempted to recycle it. This fix moves the vn_start_write check into vcanrecycle where it will properly unlock the vnode if it is unavailable for recycling due to filesystem suspension. Sponsored by: DARPA & NAI Labs.	2002-10-11 01:04:14 +00:00
Maxim Sobolev	790a8088d0	Fix problem introduced in rev.1.406, which can cause already unlocked mutex being unlocked again causing system panic.	2002-10-05 12:56:10 +00:00
Poul-Henning Kamp	8d3574c7a4	Fix some harmless mis-indents. Spotted by: FlexeLint	2002-10-01 15:48:31 +00:00
Robert Watson	0626774f08	Move vnode MAC label initialization to after the release of the vnode interlock in getnewvnode() to avoid possible sleeps while holding the mutex. Note that the warning from Witness is a slight false positive since we know there will be no contention on the interlock since we haven't made the vnode available for use yet, but the theory is not a bad one. Obtained from: TrustedBSD Project Sponsored by: DARPA, Network Associates Laboratories	2002-09-30 20:51:48 +00:00
Poul-Henning Kamp	37c841831f	Be consistent about "static" functions: if the function is marked static in its prototype, mark it static at the definition too. Inspired by: FlexeLint warning #512	2002-09-28 17:15:38 +00:00
Jeff Roberson	6423c9433c	- Move ASSERT_VOP_LOCK functionality into functions in vfs_subr.c - Make the VI asserts more orthogonal to the rest of the asserts by using a new, common vfs_badlock() function and adding a 'str' arg. - Adjust generated ASSERTS to match the new prototype. - Adjust explicit ASSERTS to match the new prototype.	2002-09-26 04:48:44 +00:00
Jeff Roberson	6cb8bf2027	- Lock down the syncer with sync_mtx. - Enable vfs_badlock_mutex by default. - Assert that the vp is locked in VOP_UNLOCK. - Use standard interlock macros in remaining code. - Correct a race in getnewvnode(). - Lock access to v_numoutput with interlock. - Lock access to buf lists and splay tree with interlock. - Add VOP and VI asserts. - Lock b_vnbufs with the vnode interlock. - Add vrefcnt() for callers who want to retrieve the vnode ref without holding a lock. Add a comment that describes when this is safe. - Add vholdl() and vdropl() so that callers who already own the interlock can avoid race conditions and unnecessary unlocking. - Move the VOP_GETATTR() in vflush() into the WRITECLOSE conditional case. - Hold the interlock before dropping the mntlist_mtx in vflush() to avoid a race. - Fix locking in vfs_msync().	2002-09-25 02:22:21 +00:00
Nate Lawson	86ed6d45ac	Remove any VOP_PRINT that redundantly prints the tag. Move lockmgr_printinfo() into vprint() for everyone's benefit. Suggested by: bde	2002-09-18 20:42:04 +00:00
Nate Lawson	06be2aaa83	Remove all use of vnode->v_tag, replacing with appropriate substitutes. v_tag is now const char * and should only be used for debugging. Additionally: 1. All users of VT_NTS now check vfsconf->vf_type VFCF_NETWORK 2. The user of VT_PROCFS now checks for the new flag VV_PROCDEP, which is propagated by pseudofs to all child vnodes if the fs sets PFS_PROCDEP. Suggested by: phk Reviewed by: bde, rwatson (earlier version)	2002-09-14 09:02:28 +00:00
Julian Elischer	85e40eaf26	Indentation does not make a block.. need curly braces too. Submitted by: Eagle-eyes evans <bde@freebsd.org>	2002-09-11 18:15:26 +00:00
Julian Elischer	71fad9fdee	Completely redo thread states. Reviewed by: davidxu@freebsd.org	2002-09-11 08:13:56 +00:00
Poul-Henning Kamp	f8b663614d	Fix an inherited style bug: compare with NOCRED instead of NULL. Sponsored by: DARPA & NAI Labs.	2002-09-05 20:46:19 +00:00
Poul-Henning Kamp	c1a925a637	Introduce new extattr_check_cred() function which implements the canonical credential washing for extended attributes. Sponsored by: DARPA & NAI Labs.	2002-09-05 20:38:57 +00:00
Philippe Charnier	93b0017f88	Replace various spelling with FALLTHROUGH which is lint()able	2002-08-25 13:23:09 +00:00
Jeff Roberson	ad32f726db	- Fix a mistake in my last few commits. The PDROP flag stops msleep from re-acquiring the mutex. Pointy hat to: me Noticed by: tegge	2002-08-23 00:32:03 +00:00
Jeff Roberson	9abf54f032	- Make vn_lock() vget() and VOP_LOCK() all behave the same way WRT LK_INTERLOCK. The interlock will never be held on return from these functions even when there is an error. Errors typically only occur when the XLOCK is held which means this isn't the vnode we want anyway. Almost all users of these interfaces expected this behavior even though it was not provided before.	2002-08-22 07:44:45 +00:00
Jeff Roberson	183158485a	- Fix interlock handling in vn_lock(). Previously, vn_lock() could return with interlock held in error conditions when the caller did not specify LK_INTERLOCK. - Add several comments to vn_lock() describing the rationale behind the code flow since it was not immediately obvious.	2002-08-22 06:51:06 +00:00
Jeff Roberson	0b600db425	- Document two cases, one in vget and the other in vn_lock, where the state of interlock on exit is not consistent. There are probably several bugs relating to this.	2002-08-21 08:34:48 +00:00
Jeff Roberson	88cf6b94bd	- If vn_lock fails with the LK_INTERLOCK flag set, interlock will not be released. vcanrecycle() failed to unlock interlock under this condition. - Remove an extra VOP_UNLOCK from a failure case in vcanrecycle(). Pointed out by: rwatson	2002-08-21 06:40:34 +00:00
Jeff Roberson	71ea4ba57c	- Add two new debugging macros: ASSERT_VI_LOCKED and ASSERT_VI_UNLOCKED - Use the new VI asserts in place of the old mtx_assert checks. - Add the VI asserts to the automated lock checking in the VOP calls. The interlock should not be held across vops with a few exceptions. - Add the vop_(un)lock_{pre,post} functions to assert that interlock is held when LK_INTERLOCK is set.	2002-08-21 06:19:29 +00:00
Jeff Roberson	055c012332	- Extend the vnode_free_list_mtx to cover numvnodes and freevnodes. This was done only some of the time before, and now it is uniformly applied.	2002-08-13 05:29:48 +00:00
Maxime Henrion	5965373e69	- Introduce a new struct xvfsconf, the userland version of struct vfsconf. - Make getvfsbyname() take a struct xvfsconf *. - Convert several consumers of getvfsbyname() to use struct xvfsconf. - Correct the getvfsbyname.3 manpage. - Create a new vfs.conflist sysctl to dump all the struct xvfsconf in the kernel, and rewrite getvfsbyname() to use this instead of the weird existing API. - Convert some {set,get,end}vfsent() consumers to use the new vfs.conflist sysctl. - Convert a vfsload() call in nfsiod.c to kldload() and remove the useless vfsisloadable() and endvfsent() calls. - Add a warning printf() in vfs_sysctl() to tell people they are using an old userland. After these changes, it's possible to modify struct vfsconf without breaking the binary compatibility. Please note that these changes don't break this compatibility either. When bp will have updated mount_smbfs(8) with the patch I sent him, there will be no more consumers of the {set,get,end}vfsent(), vfsisloadable() and vfsload() API, and I will promptly delete it.	2002-08-10 20:19:04 +00:00
Jeff Roberson	8947be9ba0	- Move some logic from getnewvnode() to a new function vcanrecycle() - Unlock the free list mutex around vcanrecycle to prevent a lock order reversal.	2002-08-05 10:15:56 +00:00
Jeff Roberson	e6e370a7fe	- Replace v_flag with v_iflag and v_vflag - v_vflag is protected by the vnode lock and is used when synchronization with VOP calls is needed. - v_iflag is protected by interlock and is used for dealing with vnode management issues. These flags include X/O LOCK, FREE, DOOMED, etc. - All accesses to v_iflag and v_vflag have either been locked or marked with mp_fixme's. - Many ASSERT_VOP_LOCKED calls have been added where the locking was not clear. - Many functions in vfs_subr.c were restructured to provide for stronger locking. Idea stolen from: BSD/OS	2002-08-04 10:29:36 +00:00
Robert Watson	f9d0d52459	Include file cleanup; mac.h and malloc.h at one point had ordering relationship requirements, and no longer do. Reminded by: bde	2002-08-01 17:47:56 +00:00
Dag-Erling Smørgrav	3072197229	Nit in previous commit: the correct sysctl type is "S,xvnode"	2002-07-31 12:25:28 +00:00
Dag-Erling Smørgrav	217b2a0b61	Initialize v_cachedid to -1 in getnewvnode(). Reintroduce the kern.vnode sysctl and make it export xvnodes rather than vnodes. Sponsored by: DARPA, NAI Labs	2002-07-31 12:24:35 +00:00
Robert Watson	07bdba7e2d	Note that the privilege indicating flag to vaccess() originally used by the process accounting system is now deprecated.	2002-07-31 02:05:12 +00:00
Robert Watson	a0ee6ed1c0	Introduce support for Mandatory Access Control and extensible kernel access control. Invoke the necessary MAC entry points to maintain labels on vnodes. In particular, initialize the label when the vnode is allocated or reused, and destroy the label when the vnode is going to be released, or reused. Wow, an object where there really is exactly one place where it's allocated, and one other where it's freed. Amazing. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-07-31 02:03:46 +00:00
Jeff Roberson	a562685f65	- Backout the patch made in revision 1.75 of vfs_mount.c. The vputs here were hiding the real problem of the missing unlock in sync_inactive. - Add the missing unlock in sync_inactive. Submitted by: iedowse	2002-07-29 06:26:55 +00:00
Don Lewis	5c38b6dbce	Wire the sysctl output buffer before grabbing any locks to prevent SYSCTL_OUT() from blocking while locks are held. This should only be done when it would be inconvenient to make a temporary copy of the data and defer calling SYSCTL_OUT() until after the locks are released.	2002-07-28 19:59:31 +00:00
Robert Watson	b02aac465d	Teach discretionary access control methods for files about VAPPEND and VALLPERM. Obtained from: TrustedBSD Project Sponsored by: DARPA, NAI Labs	2002-07-22 03:57:07 +00:00
Kirk McKusick	7aca6291e3	Add support to UFS2 to provide storage for extended attributes. As this code is not actually used by any of the existing interfaces, it seems unlikely to break anything (famous last words). The internal kernel interface to manipulate these attributes is invoked using two new IO_ flags: IO_NORMAL and IO_EXT. These flags may be specified in the ioflags word of VOP_READ, VOP_WRITE, and VOP_TRUNCATE. Specifying IO_NORMAL means that you want to do I/O to the normal data part of the file and IO_EXT means that you want to do I/O to the extended attributes part of the file. IO_NORMAL and IO_EXT are mutually exclusive for VOP_READ and VOP_WRITE, but may be specified individually or together in the case of VOP_TRUNCATE. For example, when removing a file, VOP_TRUNCATE is called with both IO_NORMAL and IO_EXT set. For backward compatibility, if neither IO_NORMAL nor IO_EXT is set, then IO_NORMAL is assumed. Note that the BA_ and IO_ flags have been `merged' so that they may both be used in the same flags word. This merger is possible by assigning the IO_ flags to the low sixteen bits and the BA_ flags the high sixteen bits. This works because the high sixteen bits of the IO_ word is reserved for read-ahead and help with write clustering so will never be used for flags. This merge lets us get away from code of the form: if (ioflags & IO_SYNC) flags |= BA_SYNC; For the future, I have considered adding a new field to the vattr structure, va_extsize. This addition could then be exported through the stat structure to allow applications to find out the size of the extended attribute storage and also would provide a more standard interface for truncating them (via VOP_SETATTR rather than VOP_TRUNCATE). I am also contemplating adding a pathconf parameter (for concreteness, let's call it _PC_MAX_EXTSIZE) which would let an application determine the maximum size of the extended attribute storage. Sponsored by: DARPA & NAI Labs.	2002-07-19 07:29:39 +00:00
Kirk McKusick	fb36a3d847	Change utimes to set the file creation time (for filesystems that support creation times such as UFS2) to the value of the modification time if the value of the modification time is older than the current creation time. See utimes(2) for further details. Sponsored by: DARPA & NAI Labs.	2002-07-17 02:03:19 +00:00
Matthew Dillon	d331c5d43f	Replace the global buffer hash table with per-vnode splay trees using a methodology similar to the vm_map_entry splay and the VM splay that Alan Cox is working on. Extensive testing has appeared to have shown no increase in overhead. Disadvantages Dirties more cache lines during lookups. Not as fast as a hash table lookup (but still N log N and optimal when there is locality of reference). Advantages vnode->v_dirtyblkhd is now perfectly sorted, making fsync/sync/filesystem syncer operate more efficiently. I get to rip out all the old hacks (some of which were mine) that tried to keep the v_dirtyblkhd tailq sorted. The per-vnode splay tree should be easier to lock / SMPng pushdown on vnodes will be easier. This commit along with another that Alan is working on for the VM page global hash table will allow me to implement ranged fsync(), optimize server-side nfs commit rpcs, and implement partial syncs by the filesystem syncer (aka filesystem syncer would detect that someone is trying to get the vnode lock, remembers its place, and skip to the next vnode). Note that the buffer cache splay is somewhat more complex than other splays due to special handling of background bitmap writes (multiple buffers with the same lblkno in the same vnode), and B_INVAL discontinuities between the old hash table and the existence of the buffer on the v_cleanblkhd list. Suggested by: alc	2002-07-10 17:02:32 +00:00
Jeff Roberson	25b286d6db	- Use standard locking functions in syncer's opv - vput instead of vrele syncer vnodes in vfs_mount - Add vop_lookup_{pre,post} to verify locking in VOP_LOOKUP	2002-07-09 19:54:20 +00:00
Jeff Roberson	18c48f437f	- Don't hold the vn lock while calling VOP_CLOSE in vclean().	2002-07-07 06:38:22 +00:00
Jeff Roberson	bed75d4627	- BUF_REFCNT() seems to be the preferred method for verifying a locked buf. Tell vop_strategy_pre() to use this instead. - Ignore B_CLUSTER bufs. Their components are locked but they don't really exist so they don't have to be. This isn't ideal but it is safe.	2002-07-07 05:29:45 +00:00
Jeff Roberson	c031d11bb4	Fix a mistake in my last commit. Don't grab an extra reference to the object in bp->b_object.	2002-07-06 21:27:20 +00:00
Jeff Roberson	9a236af3ad	Fixup uses of GETVOBJECT. - Cache a pointer to the vnode's object in the buf. - Hold a reference to that object in addition to the vnode's reference just to be consistent. - Cleanup code that got the object indirectly through the vp and VOP calls. This fixes at least one case where we were calling GETVOBJECT without a lock. It also avoids an expensive layered call at the cost of another pointer in struct buf.	2002-07-06 08:59:52 +00:00
Jeff Roberson	302c7aaab9	- Add vop_strategy_pre to validate VOP_STRATEGY locking. - Disable original vop_strategy lock specification. - Switch to the new vop_strategy_pre for lock validation. VOP_STRATEGY requires only that the buf is locked UNLESS the block numbers need to be translated. There may be other reasons, but as long as the underlying layer uses a VOP to perform the operations they will be caught later.	2002-07-06 05:21:12 +00:00
Jeff Roberson	cc8662b0f9	Add "vop_rename_pre" to do pre rename lock verification. This is enabled only with DEBUG_VFS_LOCKS.	2002-07-06 04:39:48 +00:00
Maxime Henrion	d7f9ecc86b	Move vfs_rootmountalloc() in vfs_mount.c and remove lite2_vfs_mountroot() which was #if 0'd and is not likely to be used now.	2002-07-03 09:27:24 +00:00
Maxime Henrion	2b4edb69f1	Move every code related to mount(2) in a new file, vfs_mount.c. The file vfs_conf.c which was dealing with root mounting has been repo-copied into vfs_mount.c to preserve history. This makes nmount related development easier, and help reducing the size of vfs_syscalls.c, which is still an enormous file. Reviewed by: rwatson Repo-copy by: peter	2002-07-02 17:09:22 +00:00
Ian Dowse	6bd521df93	Use indirect function pointer hooks instead of #ifdef SOFTUPDATES direct calls for the two places where the kernel calls into soft updates code. Set up the hooks in softdep_initialize() and NULL them out in softdep_uninitialize(). This change allows soft updates to function correctly when ufs is loaded as a module. Reviewed by: mckusick	2002-07-01 17:59:40 +00:00
David E. O'Brien	87e1503e2c	Rename the db command lockedvnodes to lockedvnods so that it fits on the help screen and one doesn't think we have a lockedvnodesmap command.	2002-06-29 04:45:09 +00:00
Alfred Perlstein	210a5a7169	nuke caddr_t.	2002-06-28 23:17:36 +00:00
Jeff Roberson	90769c9ed0	Improve the VOP locking asserts - Add vfs_badlock_print to control whether or not we print lock violations - Add vfs_badlock_panic to control whether we panic on lock violations Both default to on to mimic the original behavior if DEBUG_VFS_LOCKS is on.	2002-06-28 20:58:14 +00:00
Brian Feldman	aac12bcfbc	Fix a case where a vnode got explicitly unlocked after the pointer to it got set to NULL. Revision 1.355: in the box	2002-06-28 16:17:47 +00:00
Maxime Henrion	7d2d440991	Change the way we internally store the mount options to a linked list. This is to allow the merging of the mount options in the MNT_UPDATE case, as the current data structure is unsuitable for this. There are no functional differences in this commit. Reviewed by: phk	2002-06-20 20:03:42 +00:00
Maxime Henrion	fe93750656	Change vfs_copyopt() so that the length argument passed to it must be the exact same size as the mount option. This makes vfs_copyopt() much more useful.	2002-06-14 20:04:21 +00:00
Dag-Erling Smørgrav	edad3af28d	Move some sysctls from the debug tree to the vfs tree.	2002-06-06 15:50:22 +00:00
Dag-Erling Smørgrav	4a357a32e0	Gratuitous whitespace cleanup.	2002-06-06 15:46:38 +00:00
Tom Rhodes	d394511de3	More s/file system/filesystem/g	2002-05-16 21:28:32 +00:00
Maxime Henrion	34e53231d0	o Fix vfs_copyopt(), the first argument to bcopy() is the source, not the destination. o Remove some code from vfs_getopt() which was making the interface more complicated to use for a very slight gain.	2002-05-16 17:09:41 +00:00
Jeff Roberson	f0d73b3e5f	Switch from just holding the interlock to holding the standard lock throughout getnewvnode(). This is safer. In the future, we should investigate requiring only the interlock to get the vnode object.	2002-05-07 02:44:06 +00:00
Jeff Roberson	6953f5da1a	Hold the currently selected vnode's lock across the call to VOP_GETVOBJECT. Don't try to create a vm object before the file system has a chance to finish initializing it. This is incorrect for a number of reasons. Firstly, that VOP requires a lock which the file system may not have initialized yet. Also, open and others will create a vm object if it is necessary later.	2002-05-06 04:47:43 +00:00
Poul-Henning Kamp	81e017430a	Expand the one-line function pbreassignbuf() the only place it is or could be used.	2002-05-05 20:37:08 +00:00
Matthew Dillon	9f9435545b	Remove obsolete code (that was already #if 0'd out). Requested by: Hiten Pandya <hitmaster2k@yahoo.com>	2002-05-04 17:10:15 +00:00
John Baldwin	6008862bc2	Change callers of mtx_init() to pass in an appropriate lock type name. In most cases NULL is passed, but in some cases such as network driver locks (which use the MTX_NETWORK_LOCK macro) and UMA zone locks, a name is used. Tested on: i386, alpha, sparc64	2002-04-04 21:03:38 +00:00
John Baldwin	44731cab3b	Change the suser() API to take advantage of td_ucred as well as do a general cleanup of the API. The entire API now consists of two functions similar to the pre-KSE API. The suser() function takes a thread pointer as its only argument. The td_ucred member of this thread must be valid so the only valid thread pointers are curthread and a few kernel threads such as thread0. The suser_cred() function takes a pointer to a struct ucred as its first argument and an integer flag as its second argument. The flag is currently only used for the PRISON_ROOT flag. Discussed on: smp@	2002-04-01 21:31:13 +00:00
Maxime Henrion	17594b936b	As discussed in -arch, add the new nmount(2) system call and the new vfs_getopt()/vfs_copyopt() API. This is intended to be used later, when there will be filesystems implementing the VFS_NMOUNT operation. The mount(2) system call will disappear when all filesystems will be converted to the new API. Documentation will be committed in a while. Reviewed by: phk	2002-03-26 15:33:44 +00:00
Jeff Roberson	c897b81311	Remove references to vm_zone.h and switch over to the new uma API. Also, remove maxsockets. If you look carefully you'll notice that the old zone allocator never honored this anyway.	2002-03-20 04:09:59 +00:00
Alfred Perlstein	4d77a549fe	Remove __P.	2002-03-19 21:25:46 +00:00
Robert Watson	89e1164ee2	Three p_ucred -> td_ucred's missed in jhb's earlier pass; all appear to be safe.	2002-03-05 19:45:45 +00:00
John Baldwin	a854ed9893	Simple p_ucred -> td_ucred changes to start using the per-thread ucred reference.	2002-02-27 18:32:23 +00:00
Poul-Henning Kamp	68edc1b939	Make v_addpollinfo() visible and non-inline. Have callers only call it as needed. Add necessary call in ufs_kqfilter(). Test-case found by: Andrew Gallatin <gallatin@cs.duke.edu>	2002-02-18 16:18:02 +00:00
Poul-Henning Kamp	90737495aa	Remove yet a redundant VN_KNOTE() macro.	2002-02-18 08:24:48 +00:00
Poul-Henning Kamp	4b55dbe36b	Move the stuff related to select and poll out of struct vnode. The use of the zone allocator may or may not be overkill. There is an XXX: over in ufs/ufs/ufs_vnops.c that jlemon may need to revisit. This shaves about 60 bytes of struct vnode which on my laptop means 600k less RAM used for vnodes.	2002-02-17 21:15:36 +00:00
Peter Wemm	2b8a08af6b	Fix a couple of style bugs introduced (or touched by) previous commit.	2002-02-07 23:06:26 +00:00
Julian Elischer	079b7badea	Pre-KSE/M3 commit. This is a low-functionality change that changes the kernel to access the main thread of a process via the linked list of threads rather than assuming that it is embedded in the process. It IS still embedded there but remove all the code that assumes that in preparation for the next commit which will actually move it out. Reviewed by: peter@freebsd.org, gallatin@cs.duke.edu, benno rice,	2002-02-07 20:58:47 +00:00
Kirk McKusick	64011154e5	In the routines vrele() and vput(), we must lock the vnode and call VOP_INACTIVE before placing the vnode back on the free list. Otherwise there is a race condition on SMP machines between getnewvnode() locking the vnode to reclaim it and vrele() locking the vnode to inactivate it. This window of vulnerability becomes exaggerated in the presence of filesystems that have been suspended as the inactive routine may need to temporarily release the lock on the vnode to avoid deadlock with the syncer process.	2002-02-02 01:49:18 +00:00
Matthew Dillon	c73df808a0	Remove 'VXLOCK: interlock avoided' warnings. This can now occur in normal operation. The vgonel() code has always called vclean() but until we started proactively freeing vnodes it would never actually be called with a dirty vnode, so this situation did not occur prior to the vnlru() code. Now that we proactively free vnodes when kern.maxvnodes is hit, however, vclean() winds up with work to do and improperly generates the warnings. Reviewed by: peter Approved by: re (for MFC) MFC after: 1 day	2002-01-19 02:14:45 +00:00
Kirk McKusick	cd6005961f	When downgrading a filesystem from read-write to read-only, operations involving file removal or file update were not always being fully committed to disk. The result was lost files or corrupted file data. This change ensures that the filesystem is properly synced to disk before the filesystem is down-graded. This delta also fixes a long standing bug in which a file open for reading has been unlinked. When the last open reference to the file is closed, the inode is reclaimed by the filesystem. Previously, if the filesystem had been down-graded to read-only, the inode could not be reclaimed, and thus was lost and had to be later recovered by fsck. With this change, such files are found at the time of the down-grade. Normally they will result in the filesystem down-grade failing with `device busy'. If a forcible down-grade is done, then the affected files will be revoked causing the inode to be released and the open file descriptors to begin failing on attempts to read. Submitted by: "Sam Leffler" <sam@errno.com>	2002-01-15 07:17:12 +00:00
Matthew Dillon	e61ab5fce9	Add vlruvp() routine - implements LRU operation for vnode recycling. We calculate a trigger point that both guarantees we will find a sufficient number of vnodes to recycle and prevents us from recycling vnodes with lots of resident pages. This particular section of code is designed to recycle vnodes, not do unnecessary frees of cached VM pages.	2002-01-10 18:31:53 +00:00
Matthew Dillon	9dd4281db8	Fix typo in previous commit (tsleep was using wrong rendezvous point)	2001-12-25 01:23:25 +00:00
Matthew Dillon	23b590188f	Fix a BUF_TIMELOCK race against BUF_LOCK and fix a deadlock in vget() against VM_WAIT in the pageout code. Both fixes involve adjusting the lockmgr's timeout capability so locks obtained with timeouts do not interfere with locks obtained without a timeout. Hopefully MFC: before the 4.5 release	2001-12-20 22:42:27 +00:00
Peter Wemm	9f2f52d695	Do not initialize static/global variables to 0. Use bss instead of taking up space in the data section.	2001-12-19 01:35:18 +00:00
Peter Wemm	8f0d41d324	Use a different mechanism to get the vnlru process to wake up and notice the shutdown request at reboot/halt time. Disable the printf 'vnlru process getting nowhere, pausing...' and instead export the count to the debug.vnlru_nowhere sysctl.	2001-12-19 01:31:12 +00:00
Matthew Dillon	fdb33f08ef	This is a forward port of Peter's vlrureclaim() fix, with some minor mods by me to make it more efficient. The original code had serious balancing problems and could also deadlock easily. This code relegates the vnode reclamation to its own kproc and relaxes the vnode reclamation requirements to better maintain kern.maxvnodes. This code still doesn't balance as well as it could, but it does a much better job than the original code. Approved by: re@freebsd.org Obtained from: ps, peter, dillon MFS Assuming: Assuming no problems crop up in Yahoo testing MFC after: 7 days	2001-12-18 20:48:54 +00:00
Matthew Dillon	873a490449	A slightly different version of the vlrureclaim fix. Reported by: peter, ps	2001-12-14 07:18:31 +00:00
Peter Wemm	9446b36bab	If we were called to allocate a vnode that is not associated with a mount point, do not dereference the NULL mp argument.	2001-12-13 23:46:01 +00:00
Matthew Dillon	6b8bd2efc1	Add mnt_reservedvnlist so we can MFC to 4.x, in order to make all mount structure changes now rather than piecemeal later on. mnt_nvnodelist currently holds all the vnodes under the mount point. This will eventually be split into a 'dirty' and 'clean' list. This way we only break kld's once rather than twice. nvnodelist will eventually turn into the dirty list and should remain compatible with the klds.	2001-11-04 18:55:42 +00:00
Robert Watson	bcc0dc3dc7	Merge from POSIX.1e Capabilities development tree: o POSIX.1e capabilities authorize overriding of VEXEC for VDIR based on CAP_DAC_READ_SEARCH, but of !VDIR based on CAP_DAC_EXECUTE. Add appropriate conditionals to vaccess() to take that into account. o Synchronization cap_check_xxx() -> cap_check() change. Obtained from: TrustedBSD Project	2001-11-02 15:16:59 +00:00
Matthew Dillon	4ffa210b94	syncdelay, filedelay, dirdelay, metadelay are ints, not time_t's, and can also be made static.	2001-10-27 19:58:56 +00:00
Matthew Dillon	245df27cee	Implement kern.maxvnodes. adjusting kern.maxvnodes now actually has a real effect. Optimize vfs_msync(). Avoid having to continually drop and re-obtain mutexes when scanning the vnode list. Improves looping case by 500%. Optimize ffs_sync(). Avoid having to continually drop and re-obtain mutexes when scanning the vnode list. This makes a couple of assumptions, which I believe are ok, in regards to vnode stability when the mount list mutex is held. Improves looping case by 500%. (more optimization work is needed on top of these fixes) MFC after: 1 week	2001-10-26 00:08:05 +00:00
Matthew Dillon	f92dcd3e4a	Add missing TAILQ_INSERT_TAIL's which somehow didn't get committed with the recent vnode cleanup.	2001-10-25 23:13:56 +00:00
Matthew Dillon	c72ccd014d	Change the vnode list under the mount point from a LIST to a TAILQ in preparation for an implementation of limiting code for kern.maxvnodes. MFC after: 3 days	2001-10-23 01:21:29 +00:00
Matthew Dillon	2210e5d9fa	fix minor bug in kern.minvnodes sysctl. Use OID_AUTO.	2001-10-16 23:08:09 +00:00
Matthew Dillon	917efbaaba	WS Cleanup	2001-10-08 19:51:13 +00:00
Matthew Dillon	845bd795c9	vinvalbuf() was only waiting for write-I/O to complete. It really has to wait for both read AND write I/O to complete. Only NFS calls vinvalbuf() on an active vnode (when the server indicates that the file is stale), so this bug fix only affects NFS clients. MFC after: 3 days	2001-10-05 20:10:32 +00:00
Matthew Dillon	b5810bab2d	After extensive testing it has been determined that adding complexity to avoid removing higher level directory vnodes from the namecache has no perceivable effect and will be removed. This is especially true when vmiodirenable is turned on, which it is by default now. ( vmiodirenable makes a huge difference in directory caching ). The vfs.vmiodirenable and vfs.nameileafonly sysctls have been left in to allow further testing, but I expect to rip out vfs.nameileafonly soon too. I have also determined through testing that the real problem with numvnodes getting too large is due to the VM Page cache preventing the vnode from being reclaimed. The directory stuff made only a tiny dent relative to Poul's original code, enough so that some tests succeeded. But tests with several million small files show that the bigger problem is the VM Page cache. This will have to be addressed by a future commit. MFC after: 3 days	2001-10-01 04:33:35 +00:00
Julian Elischer	b40ce4165d	KSE Milestone 2 Note ALL MODULES MUST BE RECOMPILED make the kernel aware that there are smaller units of scheduling than the process. (but only allow one thread per process at this time). This is functionally equivalent to the previous -current except that there is a thread associated with each process. Sorry john! (your next MFC will be a doosie!) Reviewed by: peter@freebsd.org, dillon@freebsd.org X-MFC after: ha ha ha ha	2001-09-12 08:38:13 +00:00
Peter Wemm	0f7289022b	If a file has been completely unlinked, stop automatically syncing the file. ffs will discard any pending dirty pages when it is closed, so we may as well not waste time trying to clean them. This doesn't stop other things from writing it out, eg: pageout, fsync(2) etc.	2001-08-27 06:09:56 +00:00
Peter Wemm	b219758f94	Revert previous accidental commit. FWIW, it was part of enabling VM caching of disks through mmap() and stopping syncing of open files that had their last reference in the fs removed (ie: their unsync'ed pages get discarded on close already, so I made it stop syncing too).	2001-07-27 15:57:17 +00:00
Peter Wemm	24a590a074	Fix cut/paste blunder. Serves me right for doing a last minute tweak to what I had for some time. Submitted by: bde	2001-07-27 15:52:49 +00:00
Matthew Dillon	0cddd8f023	With Alfred's permission, remove vm_mtx in favor of a fine-grained approach (this commit is just the first stage). Also add various GIANT_ macros to formalize the removal of Giant, making it easy to test in a more piecemeal fashion. These macros will allow us to test fine-grained locks to a degree before removing Giant, and also after, and to remove Giant in a piecemeal fashion via sysctl's on those subsystems which the authors believe can operate without Giant.	2001-07-04 16:20:28 +00:00
John Baldwin	cd2f721557	- Fix a mntvnode and vnode interlock reversal. - Protect the mnt_vnode list with the mntvnode lock.	2001-06-28 04:05:54 +00:00
Alfred Perlstein	2395531439	Introduce a global lock for the vm subsystem (vm_mtx). vm_mtx does not recurse and is required for most low level vm operations. faults can not be taken without holding Giant. Memory subsystems can now call the base page allocators safely. Almost all atomic ops were removed as they are covered under the vm mutex. Alpha and ia64 now need to catch up to i386's trap handlers. FFS and NFS have been tested, other filesystems will need minor changes (grabbing the vm lock when twiddling page properties). Reviewed (partially) by: jake, jhb	2001-05-19 01:28:09 +00:00
Ian Dowse	0864ef1e8a	Change the second argument of vflush() to an integer that specifies the number of references on the filesystem root vnode to be both expected and released. Many filesystems hold an extra reference on the filesystem root vnode, which must be accounted for when determining if the filesystem is busy and then released if it isn't busy. The old `skipvp' approach required individual filesystem xxx_unmount functions to re-implement much of vflush()'s logic to deal with the root vnode. All 9 filesystems that hold an extra reference on the root vnode got the logic wrong in the case of forced unmounts, so `umount -f' would always fail if there were any extra root vnode references. Fix this issue centrally in vflush(), now that we can. This commit also fixes a vnode reference leak in devfs, which could result in idle devfs filesystems that refuse to unmount. Reviewed by: phk, bp	2001-05-16 18:04:37 +00:00
Ian Dowse	1feb7a6efa	In vrele() and vput(), avoid triggering the confusing "missed vn_close" KASSERT when vp->v_usecount is zero or negative. In this case, the "v*: negative ref cnt" panic that follows is much more appropriate. Reviewed by: mckusick	2001-05-11 20:42:41 +00:00
Poul-Henning Kamp	8ee8b21b48	vfs_subr.c is getting rather fat. The underlying repocopy and this commit moves the filesystem export handling code to vfs_export.c	2001-04-26 20:47:14 +00:00
Poul-Henning Kamp	a13234bb35	Move the netexport structure from the fs-specific mount structure to struct mount. This makes the "struct netexport *" parameter to the vfs_export and vfs_checkexport interface unneeded. Consequently, all non-stacking filesystems can use vfs_stdcheckexp(). At the same time, make it a pointer to a struct netexport in struct mount, so that we can remove the bogus AF_MAX and #include <net/radix.h> from <sys/mount.h>	2001-04-25 07:07:52 +00:00
Greg Lehey	d98dc34f52	Correct #includes to work with fixed sys/mount.h.	2001-04-23 09:05:15 +00:00
Seigo Tanimura	759cb26335	Reclaim directory vnodes held in namecache if few free vnodes are available. Only directory vnodes holding no child directory vnodes held in v_cache_src are recycled, so that directory vnodes near the root of the filesystem hierarchy remain in namecache and directory vnodes are not reclaimed in cascade. The period of vnode reclaiming attempt and the number of vnodes attempted to reclaim can be tuned via sysctl(2). Suggested by: tegge Approved by: phk	2001-04-18 11:19:50 +00:00
Poul-Henning Kamp	f84e29a06c	This patch removes the VOP_BWRITE() vector. VOP_BWRITE() was a hack which made it possible for NFS client side to use struct buf with non-bio backing. This patch takes a more general approach and adds a bp->b_op vector where more methods can be added. The success of this patch depends on bp->b_op being initialized in all relevant places for some value of "relevant" which is not easy to determine. For now the buffers have grown a b_magic element which will make such issues a tiny bit easier to debug.	2001-04-17 08:56:39 +00:00
Jonathan Lemon	7df2842dee	Add a NOTE_REVOKE flag for vnodes, which is triggered from within vclean(). Use this to tell a filter attached to a vnode that the underlying vnode is no longer valid, by returning EV_EOF. PR: kern/25309, kern/25206	2001-02-23 20:06:01 +00:00
Brian Feldman	c0511d3b58	Switch to using a struct xucred instead of a struct ucred when not actually in the kernel. This structure is a different size than what is currently in -CURRENT, but should hopefully be the last time any application breakage is caused there. As soon as any major inconveniences are removed, the definition of the in-kernel struct ucred should be conditionalized upon defined(_KERNEL). This also changes struct export_args to remove dependency on the constantly-changing struct ucred, as well as limiting the bounds of the size fields to the correct size. This means: a) mountd and friends won't break all the time, b) mountd and friends won't crash the kernel all the time if they don't know what they're doing wrt actual struct export_args layout. Reviewed by: bde	2001-02-18 13:30:20 +00:00
Bosko Milekic	9ed346bab0	Change and clean the mutex lock interface. mtx_enter(lock, type) becomes: mtx_lock(lock) for sleep locks (MTX_DEF-initialized locks) mtx_lock_spin(lock) for spin locks (MTX_SPIN-initialized) Similarly, for releasing a lock, we now have: mtx_unlock(lock) for MTX_DEF and mtx_unlock_spin(lock) for MTX_SPIN. We change the caller interface for the two different types of locks because the semantics are entirely different for each case, and this makes it explicitly clear and, at the same time, it rids us of the extra `type' argument. The enter->lock and exit->unlock change has been made with the idea that we're "locking data" and not "entering locked code" in mind. Further, remove all additional "flags" previously passed to the lock acquire/release routines with the exception of two: MTX_QUIET and MTX_NOSWITCH The functionality of these flags is preserved and they can be passed to the lock/unlock routines by calling the corresponding wrappers: mtx_{lock, unlock}_flags(lock, flag(s)) and mtx_{lock, unlock}_spin_flags(lock, flag(s)) for MTX_DEF and MTX_SPIN locks, respectively. Re-inline some lock acq/rel code; in the sleep lock case, we only inline the _obtain_lock()s in order to ensure that the inlined code fits into a cache line. In the spin lock case, we inline recursion and actually only perform a function call if we need to spin. This change has been made with the idea that we generally tend to avoid spin locks and that also the spin locks that we do have and are heavily used (i.e. sched_lock) do recurse, and therefore in an effort to reduce function call overhead for some architectures (such as alpha), we inline recursion for this case. Create a new malloc type for the witness code and retire from using the M_DEV type. The new type is called M_WITNESS and is only declared if WITNESS is enabled. 
Begin cleaning up some machdep/mutex.h code - specifically updated the "optimized" inlined code in alpha/mutex.h and wrote MTX_LOCK_SPIN and MTX_UNLOCK_SPIN asm macros for the i386/mutex.h as we presently need those. Finally, caught up to the interface changes in all sys code. Contributors: jake, jhb, jasone (in no particular order)	2001-02-09 06:11:45 +00:00
Poul-Henning Kamp	fc2ffbe604	Mechanical change to use <sys/queue.h> macro API instead of fondling implementation details. Created with: sed(1) Reviewed by: md5(1)	2001-02-04 13:13:25 +00:00
Boris Popov	f3f1af390d	Properly lock new vnode. Reminded by: tegge	2001-01-31 04:54:23 +00:00
Jason Evans	1b367556b5	Convert all simplelocks to mutexes and remove the simplelock implementations.	2001-01-24 12:35:55 +00:00
Robert Watson	02b65ffb64	o The move to using VADMIN under vaccess() resulted in some system calls returning EACCES instead of EPERM. This patch modifies vaccess() to return EPERM instead of EACCES if VADMIN is among the requested rights. This affects functions normally limited to the owners of a file, such as chmod(), as EPERM is the error indicating that privilege would allow the operation, rather than a change in mandatory or discretionary rights. Reported by: bde	2001-01-23 04:15:19 +00:00
John Baldwin	ffc831da27	Stick the kthread API in a kthread_* namespace, and the specialized kproc functions in a kproc_* namespace. Reviewed by: -arch	2000-12-15 20:08:20 +00:00
Kirk McKusick	0bf3b91d8a	Use proper mutex locking when calling setrunnable from speedup_syncer(). Submitted by: Tor.Egge@fast.no	2000-12-13 01:06:53 +00:00
David Malone	7cc0979fd6	Convert more malloc+bzero to malloc+M_ZERO. Submitted by: josh@zipperup.org Submitted by: Robert Drehmel <robd@gmx.net>	2000-12-08 21:51:06 +00:00
Peter Wemm	138e514cb5	Untangle vfsinit() a bit. Use separate sysinit functions rather than having a super-function calling bits all over the place.	2000-12-06 07:09:08 +00:00
Andrew Gallatin	19f085228f	Correct int/long type mismatch in the proper place this time. freevnodes and numvnodes are longs in the kernel. They should remain longs in systat, what really needs to change is that they should be using SYSCTL_LONG rather than SYSCTL_INT. I also changed wantfreevnodes to SYSCTL_LONG because I happened to notice it. I wish there was a way to find all of these automatically.. Pointed out by: bde	2000-12-02 20:08:33 +00:00
John Baldwin	2191340786	Use msleep() instead of mtx_exit()/tsleep() so that we release the lock and go to sleep as an "atomic" operation.	2000-12-01 03:43:33 +00:00
Kirk McKusick	6d984dfa6a	Get rid of a bogus mtx_exit (it was attempting to release an already released mutex). Submitted by: "Chris Knight" <chris@aims.com.au>	2000-11-30 19:09:29 +00:00
Matthew Dillon	936524aa02	Implement a low-memory deadlock solution. Removed most of the hacks that were trying to deal with low-memory situations prior to now. The new code is based on the concept that I/O must be able to function in a low memory situation. All major modules related to I/O (except networking) have been adjusted to allow allocation out of the system reserve memory pool. These modules now detect a low memory situation but rather than block they instead continue to operate, then return resources to the memory pool instead of caching them or leaving them wired. Code has been added to stall in a low-memory situation prior to a vnode being locked. Thus situations where a process blocks in a low-memory condition while holding a locked vnode have been reduced to near nothing. Not only will I/O continue to operate, but many prior deadlock conditions simply no longer exist. Implement a number of VFS/BIO fixes (found by Ian): in biodone(), bogus-page replacement code, the loop was not properly incrementing loop variables prior to a continue statement. We do not believe this code can be hit anyway but we aren't taking any chances. We'll turn the whole section into a panic (as it already is in brelse()) after the release is rolled. In biodone(), the foff calculation was incorrectly clamped to the iosize, causing the wrong foff to be calculated for pages in the case of an I/O error or biodone() called without initiating I/O. The problem always caused a panic before. Now it doesn't. The problem is mainly an issue with NFS. Fixed casts for ~PAGE_MASK. This code worked properly before only because the calculations use signed arithmetic. Better to properly extend PAGE_MASK first before inverting it for the 64 bit masking op. In brelse(), the bogus_page fixup code was improperly throwing away the original contents of 'm' when it did the j-loop to fix the bogus pages. The result was that it would potentially invalidate parts of the WRONG page(!), leading to corruption. 
There may still be cases where a background bitmap write is being duplicated, causing potential corruption. We have identified a potentially serious bug related to this but the fix is still TBD. So instead this patch contains a KASSERT to detect the problem and panic the machine rather than continue to corrupt the filesystem. The problem does not occur very often. It is very hard to reproduce, and it may or may not be the cause of the corruption people have reported. Review by: (VFS/BIO: mckusick, Ian Dowse <iedowse@maths.tcd.ie>) Testing by: (VM/Deadlock) Paul Saab <ps@yahoo-inc.com>	2000-11-18 23:06:26 +00:00
Tor Egge	a2d1480cf8	Clear the VFREE flag when the vnode is removed from the free list in getnewvnode(). Otherwise routines called from VOP_INACTIVE() might attempt to remove the vnode from a free list the vnode isn't on, causing corruption. PR: 18012	2000-11-02 21:42:54 +00:00
Poul-Henning Kamp	1d7e3e42e7	Take VBLK devices further out of their misery. This should fix the panic I introduced in my previous commit on this topic.	2000-11-02 21:14:13 +00:00
John Baldwin	35e0e5b311	Catch up to moving headers: - machine/ipl.h -> sys/ipl.h - machine/mutex.h -> sys/mutex.h	2000-10-20 07:58:15 +00:00
Robert Watson	47460a23a0	o Introduce new VOP_ACCESS() flag VADMIN, allowing file systems to perform "administrative" authorization checks. In most cases, the VADMIN test checks to make sure the credential effective uid is the same as the file owner. o Modify vaccess() to set VADMIN as an available right if the uid is appropriate. o Modify references to uid-based access control operations such that they now always invoke VOP_ACCESS() instead of using hard-coded policy checks. o This allows alternative UFS policies to be implemented by replacing only ufs_access() (such as mandatory system policies). o VOP_ACCESS() requires the caller to hold an exclusive vnode lock on the vnode: I believe that new invocations of VOP_ACCESS() are always called with the lock held. o Some direct checks of the uid remain, largely associated with the QUOTA and SUIDDIR code. Reviewed by: eivind Obtained from: TrustedBSD Project	2000-10-19 07:53:59 +00:00
Eivind Eklund	7eb9fca557	Blow away the v_specmountpoint define, replacing it with what it was defined as (rdev->si_mountpoint)	2000-10-09 17:31:39 +00:00
Jason Evans	39df86086f	Do not call lockdestroy() for v_vnlock, which may point to a lock in a deeper vfs stacking layer. Submitted by: bp	2000-10-06 08:04:48 +00:00
Eivind Eklund	a863c0fb2f	Style fixes based on comments by bde	2000-10-05 18:22:46 +00:00
Jason Evans	a18b1f1d4d	Convert lockmgr locks from using simple locks to using mutexes. Add lockdestroy() and appropriate invocations, which corresponds to lockinit() and must be called to clean up after a lockmgr lock is no longer needed.	2000-10-04 01:29:17 +00:00
Boris Popov	f8be809e0f	Move the KASSERTs which check the value of v_usecount to after vnode locking, so they will not produce false alarms.	2000-10-02 09:57:06 +00:00
Kirk McKusick	02a1e48f02	Do the right thing if bdevvp is called twice for the same device. Obtained from: Poul-Henning Kamp <phk@freebsd.org>	2000-09-27 18:03:17 +00:00
Boris Popov	67e871664b	Add a lock structure to the vnode structure. Previously it was either allocated separately (nfs, cd9660 etc) or kept as the first element of the structure referenced by the v_data pointer (ffs). Such organization leads to known problems with stacked filesystems. From this point vop_nolock() functions maintain only the interlock. vop_stdlock() functions maintain the built-in v_lock structure using lockmgr(). vop_sharedlock() is compatible with vop_stdunlock(), but maintains a shared lock on the vnode. If a filesystem wishes to export a lockmgr compatible lock, it can put the address of this lock into the v_vnlock field. This indicates that the upper filesystem can take advantage of it and use a single lock structure for the entire stack of vnodes (or part of it). This field shouldn't be examined or modified by VFS code except for initialization purposes. Reviewed in general by: mckusick	2000-09-25 15:24:04 +00:00
Eivind Eklund	453aaa0dff	Style fixes: * Add lots of comments * Convert a couple of assertions to KASSERT() * Minimal whitespace & misapplied {} fixes * Convert #if 0 to #if COMPILING_LINT for code we presently do not support, but want to keep available. Reviewed by: adrian, markm	2000-09-22 12:22:36 +00:00
Eivind Eklund	bba25953af	Staticize addalias()	2000-09-22 11:54:48 +00:00
Alfred Perlstein	21a9039725	comment vfs_export functions, requested by: eivind	2000-09-21 15:55:55 +00:00
Robert Watson	e084835893	o Add additional comment describing vaccess() behavior. Requested by: eivind Reviewed by: eivind, adrian	2000-09-20 17:18:12 +00:00
Poul-Henning Kamp	b0d17ba69e	Rename lminor() to dev2unit(). This function gives a linear unit number which hides the 'hole' in the minor bits. Introduce unit2minor() to do the reverse operation. Fix some make_dev() calls which didn't use UID_* or GID_* macros. Kill the v_hashchain alias macro, it hides the real relationship. Introduce experimental SI_CHEAPCLONE flag; set it on cloned bpfs.	2000-09-19 10:28:44 +00:00
Boris Popov	9ff5ce6baf	Add three new VOPs: VOP_CREATEVOBJECT, VOP_DESTROYVOBJECT and VOP_GETVOBJECT. They will be used by nullfs and other stacked filesystems to support full cache coherency. Reviewed in general by: mckusick, dillon	2000-09-12 09:49:08 +00:00
Jason Evans	0384fff8c5	Major update to the way synchronization is done in the kernel. Highlights include: * Mutual exclusion is used instead of spl(). See mutex(9). (Note: The alpha port is still in transition and currently uses both.) Per-CPU idle processes. * Interrupts are run in their own separate kernel threads and can be preempted (i386 only). Partially contributed by: BSDi (BSD/OS) Submissions by (at least): cp, dfr, dillon, grog, jake, jhb, sheldonh	2000-09-07 01:33:02 +00:00
Robert Watson	728783c27a	o Synchronize vaccess() capability access control checks with TrustedBSD tree. Obtained from: TrustedBSD Project	2000-09-06 12:18:24 +00:00
Poul-Henning Kamp	64dc16df4a	Move extern declaration of dead_vnodeop_p to a .h file. Remove race condition in vn_isdisk().	2000-09-05 21:09:56 +00:00
Robert Watson	012c643d3e	o Restructure vaccess() so as to check for DAC permission to modify the object before falling back on privilege. Make vaccess() accept an additional optional argument, privused, to determine whether privilege was required for vaccess() to return 0. Add commented out capability checks for reference. Rename some variables to make it more clear which modes/uids/etc are associated with the object, and which with the access mode. o Update file system use of vaccess() to pass NULL as the optional privused argument. Once additional patches are applied, suser() will no longer set ASU, so privused will permit passing of privilege information up the stack to the caller. Reviewed by: bde, green, phk, -security, others Obtained from: TrustedBSD Project	2000-08-29 14:45:49 +00:00
Poul-Henning Kamp	4fe6d43729	Fix typo in last commit.	2000-08-20 11:46:39 +00:00
Poul-Henning Kamp	e39c53eda5	Centralize the canonical vop_access user/group/other check in vaccess(). Discussed with: bde	2000-08-20 08:36:26 +00:00
Kirk McKusick	9b97113391	This patch corrects the first round of panics and hangs reported with the new snapshot code. Update addaliasu to correctly implement the semantics of the old checkalias function. When a device vnode first comes into existence, check to see if an anonymous vnode for the same device was created at boot time by bdevvp(). If so, adopt the bdevvp vnode rather than creating a new vnode for the device. This corrects a problem which caused the kernel to panic when taking a snapshot of the root filesystem. Change the calling convention of vn_write_suspend_wait() to be the same as vn_start_write(). Split out softdep_flushworklist() from softdep_flushfiles() so that it can be used to clear the work queue when suspending filesystem operations. Access to buffers becomes recursive so that snapshots can recursively traverse their indirect blocks using ffs_copyonwrite() when checking for the need for copy on write when flushing one of their own indirect blocks. This eliminates a deadlock between the syncer daemon and a process taking a snapshot. Ensure that softdep_process_worklist() can never block because of a snapshot being taken. This eliminates a problem with buffer starvation. Cleanup change in ffs_sync() which did not synchronously wait when MNT_WAIT was specified. The result was an unclean filesystem panic when doing forcible unmount with heavy filesystem I/O in progress. Return a zero'ed block when reading a block that was not in use at the time that a snapshot was taken. Normally, these blocks should never be read. However, the readahead code will occasionally read them which can cause unexpected behavior. Clean up the debugging code that ensures that no blocks be written on a filesystem while it is suspended. Snapshots must explicitly label the blocks that they are writing during the suspension so that they do not cause a `write on suspended filesystem' panic. 
Reorganize ffs_copyonwrite() to eliminate a deadlock and also to prevent a race condition that would permit the same block to be copied twice. This change eliminates an unexpected soft updates inconsistency in fsck caused by the double allocation. Use bqrelse rather than brelse for buffers that will be needed soon again by the snapshot code. This improves snapshot performance.	2000-07-24 05:28:33 +00:00
Kirk McKusick	f2a2857bb3	Add snapshots to the fast filesystem. Most of the changes support the gating of system calls that cause modifications to the underlying filesystem. The gating can be enabled by any filesystem that needs to consistently suspend operations by adding the vop_stdgetwritemount to their set of vnops. Once gating is enabled, the function vfs_write_suspend stops all new write operations to a filesystem, allows any filesystem modifying system calls already in progress to complete, then sync's the filesystem to disk and returns. The function vfs_write_resume allows the suspended write operations to begin again. Gating is not added by default for all filesystems as for SMP systems it adds two extra locks to such critical kernel paths as the write system call. Thus, gating should only be added as needed. Details on the use and current status of snapshots in FFS can be found in /sys/ufs/ffs/README.snapshot so for brevity and timeliness are not included here. Unless and until you create a snapshot file, these changes should have no effect on your system (famous last words).	2000-07-11 22:07:57 +00:00
Boris Popov	3660ebc2c0	Fix support for more than 256 simultaneous mounts. Theoretical limit is 2^16 mounts per fs type. Reported by: Troy Arie Cobb <tcobb@staff.circle.net> via phk Reviewed by: bde	2000-07-07 14:01:08 +00:00
Poul-Henning Kamp	77978ab8bc	Previous commit changing SYSCTL_HANDLER_ARGS violated KNF. Pointed out by: bde	2000-07-04 11:25:35 +00:00
Kirk McKusick	c904bbbdd8	Simplify and rationalise the management of the vnode free list (preparing the code to add snapshots).	2000-07-04 04:32:40 +00:00
Kirk McKusick	3764219663	If a buffer flush fails when trying to reclaim a vnode, it is too late to save the vnode, so just toss any remaining unwritten buffers rather than leaving them lying around to make trouble in the future.	2000-07-04 03:23:29 +00:00
Poul-Henning Kamp	3275cf7379	Make the two calls from kern/* into softupdates #ifdef SOFTUPDATES, that is way cleaner than using the softupdates_stub stunt, which should be killed when convenient. Discussed with: mckusick	2000-07-03 13:26:54 +00:00
Poul-Henning Kamp	82d9ae4e32	Style police catches up with rev 1.26 of src/sys/sys/sysctl.h: Sanitize SYSCTL_HANDLER_ARGS so that simplistic tools can grok our sources: -sysctl_vm_zone SYSCTL_HANDLER_ARGS +sysctl_vm_zone (SYSCTL_HANDLER_ARGS)	2000-07-03 09:35:31 +00:00
Poul-Henning Kamp	a8b1f9d2c9	Move prtactive to vfs from ufs. It is used all over the place.	2000-06-27 07:46:22 +00:00
Poul-Henning Kamp	a2e7a027a7	Virtualizes & untangles the bioops operations vector. Ref: Message-ID: <18317.961014572@critter.freebsd.dk> To: current@	2000-06-16 08:48:51 +00:00
Jake Burkholder	e39756439c	Back out the previous change to the queue(3) interface. It was not discussed and should probably not happen. Requested by: msmith and others	2000-05-26 02:09:24 +00:00
Jake Burkholder	740a1973a6	Change the way that the queue(3) structures are declared; don't assume that the type argument to _HEAD and _ENTRY is a struct. Suggested by: phk Reviewed by: phk Approved by: mdodd	2000-05-23 20:41:01 +00:00
Jeroen Ruigrok van der Werven	01f76720fb	Fix the rootmount code for now. This function will probably be rewritten/renamed to devpp. Submitted by: Assar Westerlund <assar@sics.se> on -current Confirmed to work: Steinar Haug <sthaug@nethelp.no>, Manfred Antar <mantar@pacbell.net> Reviewed by: phk	2000-05-14 07:43:12 +00:00
Poul-Henning Kamp	9626b608de	Separate the struct bio related stuff out of <sys/buf.h> into <sys/bio.h>. <sys/bio.h> is now a prerequisite for <sys/buf.h> but it shall not be made a nested include according to bde's teachings on the subject of nested includes. Diskdrivers and similar stuff below specfs::strategy() should no longer need to include <sys/buf.h> unless they need caching of data. Still a few bogus uses of struct buf to track down. Repocopy by: peter	2000-05-05 09:59:14 +00:00
Poul-Henning Kamp	b99c307a21	Rename the existing BUF_STRATEGY() to DEV_STRATEGY() substitute BUF_WRITE(foo) for VOP_BWRITE(foo->b_vp, foo) substitute BUF_STRATEGY(foo) for VOP_STRATEGY(foo->b_vp, foo) This patch is machine generated except for the ccd.c and buf.h parts.	2000-03-20 11:29:10 +00:00
Chris Costello	b081a64afb	In vn_isdisk(), check whether vp->v_rdev is NULL. If it is, then return ENXIO (Device not configured). Without this, vn_isdisk() could (and did in the case of lstat() under fdesc) pass a NULL pointer to devsw(), which caused a page fault. Reviewed by: alfred	2000-03-18 01:27:44 +00:00
Poul-Henning Kamp	db5f635acc	Eliminate the undocumented, experimental, non-delivering and highly dangerous MAX_PERF option.	2000-03-16 08:51:55 +00:00
Bruce Evans	05ecdd7037	Don't try so hard to make the lower 16 bits of fsids unique. It tended to recycle full fsids after only 16 mount/unmount's. This is probably too often for exported fsids. Now we recycle the full fsids only after 2^16 mount/ umount's and only ensure uniqueness in the lower 16 bits if there have been <= 256 calls to vfs_getnewfsid() since the system started.	2000-03-14 14:19:49 +00:00
Bruce Evans	61214975da	Try harder to make the lower 16 bits of fsids unique. The vfs type number was packed very wastefully, giving perfect non-uniqueness in the lower 16 bits of fsids for filesystems with the same vfs type. This made linux_stat() return perfectly non-unique (broken) 16-bit st_dev's for nfs mount points, and effectively reduced mntid_base to 8 bits so that the vfs_getnewfsid() looped endlessly when there are already 256 mounted filesystems with the required vfs type. Approved by: jkh	2000-03-12 14:23:21 +00:00
Søren Schmidt	e8359a57de	Do refcounting of open devices (more) correctly. count_dev function by phk.	2000-02-07 23:05:40 +00:00
Robert Watson	b7a5f3ca1b	Remove static qualifier from vgonel, as it is needed by the Arla folk outside of vfs_subr.c. Submitted by: Assar Westerlund <assar@sics.se> Reviewed by: rwatson Approved by: jkh	2000-02-02 07:07:17 +00:00
Robert Watson	9a2b8fca80	This patch fixes a locking bug that can result in deadlock if the codepath is followed. From the PR: vclean calls vrele leading to deadlock (if usecount > 0) vclean() calls vrele() if v_usecount of the node was higher than one. But before calling it, it sets the VXLOCK flag, which will make vn_lock called from vrele dead-lock. PR: kern/15117 Submitted by: Assar Westerlund <assar@stacken.kth.se> Reviewed by: rwatson Obtained from: NetBSD	2000-01-29 15:22:58 +00:00
Poul-Henning Kamp	ba4ad1fcea	Give vn_isdisk() a second argument where it can return a suitable errno. Suggested by: bde	2000-01-10 12:04:27 +00:00
Kirk McKusick	411e1480fd	Remove the P_BUFEXHAUST flag from the syncer process (leaving it only on the buf_daemon process). The problem is that when the syncer process starts running the worklist, it wants to delete lots of files. It does this by VFS_VGET'ing the vnodes, clearing the blocks in them and bdwrite'ing the buffer. It can process close to a thousand files per second which generates a large number of dirty buffers. So, giving it special privilege at the buffer trough leads to trouble as the buf_daemon does occasionally need a free buffer to proceed and if the syncer has used every last one up, we are toast.	2000-01-10 00:07:24 +00:00
Eivind Eklund	e12d97d239	Change NDFREE() from a macro to a function for the time being; the macro version caused intolerable bloat (30k). I'm likely to revisit this with an attempt at a smarter macro. Bloat noticed by: bde	2000-01-08 16:20:06 +00:00
Luoqi Chen	5e95083920	Introduce a mechanism to suspend/resume system processes. Suspend syncer and bufdaemon prior to disk sync during system shutdown.	2000-01-07 08:36:44 +00:00
Matthew Dillon	c37c9620cd	Enhance reassignbuf(). When a buffer cannot be time-optimally inserted into vnode dirtyblkhd we append it to the list instead of prepend it to the list in order to maintain a 'forward' locality of reference, which is arguably better than 'reverse'. The original algorithm did things this way too, but at a huge time cost. Enhance the append interlock for NFS writes to handle intr/soft mounts better. Fix the hysteresis for NFS async daemon I/O requests to reduce the number of unnecessary context switches. Modify handling of NFS mount options. Any given user option that is too high now defaults to the kernel maximum for that option rather than the kernel default for that option. Reviewed by: Alfred Perlstein <bright@wintelcom.net>	2000-01-05 05:11:37 +00:00
Kirk McKusick	02b0085406	Prettiness police: Identify flags in b_xflags with BX_ to distinguish them from flags in b_flags which are prefixed with B_	1999-12-22 03:11:04 +00:00
Matthew Dillon	4f79d873c1	Add MAP_NOSYNC feature to mmap(), and MADV_NOSYNC and MADV_AUTOSYNC to madvise(). This feature prevents the update daemon from gratuitously flushing dirty pages associated with a mapped file-backed region of memory. The system pager will still page the memory as necessary and the VM system will still be fully coherent with the filesystem. Modifications made by other means to the same area of memory, for example by write(), are unaffected. The feature works on a page-granularity basis. MAP_NOSYNC allows one to use mmap() to share memory between processes without incurring any significant filesystem overhead, putting it in the same performance category as SysV Shared memory and anonymous memory. Reviewed by: julian, alc, dg	1999-12-12 03:19:33 +00:00
Eivind Eklund	6bdfe06ad9	Lock reporting and assertion changes. * lockstatus() and VOP_ISLOCKED() get a new process argument and a new return value: LK_EXCLOTHER, when the lock is held exclusively by another process. * The ASSERT_VOP_(UN)LOCKED family is extended to use what this gives them * Extend the vnode_if.src format to allow more exact specification than locked/unlocked. This commit should not do any semantic changes unless you are using DEBUG_VFS_LOCKS. Discussed with: grog, mch, peter, phk Reviewed by: peter	1999-12-11 16:13:02 +00:00
Matthew Dillon	245efbba4d	Remove vfs_getrootfsid() function (a temporary hack added a few months ago to make BOOTP work again). It is no longer required by BOOTP and no longer used.	1999-11-29 22:25:36 +00:00
Poul-Henning Kamp	38224dcd59	Convert various pieces of code to use vn_isdisk() rather than checking for vp->v_type == VBLK. In ccd: we don't need to call VOP_GETATTR to find the type of a vnode. Reviewed by: sos	1999-11-22 10:33:55 +00:00
Poul-Henning Kamp	0429e37ade	struct mountlist and struct mount.mnt_list have no business being a CIRCLEQ. Change them to TAILQ_HEAD and TAILQ_ENTRY respectively. This removes ugly mp != (void*)&mountlist comparisons. Requested by: phk Submitted by: Jake Burkholder jake@checker.org PR: 14967	1999-11-20 10:00:46 +00:00
Poul-Henning Kamp	1b7277516b	Commit the remaining part of PR14914: A lot of the code in sys/kern directly accesses the Q_HEAD and Q_ENTRY structures for list operations. This patch makes all list operations in sys/kern use the queue(3) macros, rather than directly accessing the *Q_{HEAD,ENTRY} structures. Reviewed by: phk Submitted by: Jake Burkholder <jake@checker.org> PR: 14914	1999-11-16 16:28:58 +00:00
Poul-Henning Kamp	698f9cf828	Next step in the device cleanup process. Correctly lock vnodes when calling VOP_OPEN() from filesystem mount code. Unify spec_open() for bdev and cdev cases. Remove the disabled bdev specific read/write code.	1999-11-09 14:15:33 +00:00
Poul-Henning Kamp	923502ff91	useracc() the prequel: Merge the contents (less some trivial bordering the silly comments) of <vm/vm_prot.h> and <vm/vm_inherit.h> into <vm/vm.h>. This puts the #defines for the vm_inherit_t and vm_prot_t types next to their typedefs. This paves the road for the commit to follow shortly: change useracc() to use VM_PROT_{READ|WRITE} rather than B_{READ|WRITE} as argument.	1999-10-29 18:09:36 +00:00
Peter Wemm	d1f088dab5	Trim unused options (or #ifdef for undoc options). Submitted by: phk	1999-10-11 15:19:12 +00:00
Poul-Henning Kamp	aa4f4b695e	Move the buffered read/write code out of spec_{read|write} and into two new functions spec_buf{read|write}. Add sysctl vfs.bdev_buffered which defaults to 1 == true. This sysctl can be used to experimentally turn buffered behaviour for bdevs off. It should not be changed while any block devices are open. Remove the misplaced sysctl vfs.enable_userblk_io. No other changes in behaviour.	1999-10-04 11:23:10 +00:00
Poul-Henning Kamp	1b5464ef9d	Remove v_maxio from struct vnode. Replace it with mnt_iosize_max in struct mount. Nits from: bde	1999-09-29 20:05:33 +00:00
Matthew Dillon	40360b1bbb	Final commit to remove vnode->v_lastr. vm_fault now handles read clustering issues (replacing code that used to be in ufs/ufs/ufs_readwrite.c). vm_fault also now uses the new VM page counter inlines. This completes the changeover from vnode->v_lastr to vm_entry_t->v_lastr for VM, and fp->f_nextread and fp->f_seqcount (which have been in the tree for a while). Determination of the I/O strategy (sequential, random, and so forth) is now handled on a descriptor-by-descriptor basis for base I/O calls, and on a memory-region-by-memory-region and process-by-process basis for VM faults. Reviewed by: David Greenman <dg@root.com>, Alan Cox <alc@cs.rice.edu>	1999-09-21 00:36:16 +00:00
Poul-Henning Kamp	552f337f1f	Initialize vp->v_maxio to its default in getnewvnode() rather than four different places in vfs_cluster.c	1999-09-20 19:53:23 +00:00
Matthew Dillon	e6f7111170	Fix BOOTP root FS mounts. Also cleanup vfs_getnewfsid() and collapse addaliasu() into addalias() (no operational change) and clarify comments relating to a trick that vclean() uses. The fix to BOOTP is yet another hack. Actually, rootfsid handling is already a major hack. The whole thing needs to be cleaned up. Reviewed by: David Greenman <dg@root.com>, Alan Cox <alc@cs.rice.edu>	1999-09-19 06:24:21 +00:00
Matthew Dillon	bb01f28e97	Add vfs.enable_userblk_io sysctl to control whether user reads and writes to buffered block devices are allowed. The default is to be backwards compatible, i.e. reads and writes are allowed. The idea is for a larger crowd to start running with this disabled and see what problems, if any, crop up, and then to change the default to off and see if any problems crop up in the next 6 months prior to potentially removing support entirely. There are still a few people, Julian and myself included, who believe the buffered block device access from usermode to be useful. Remove use of vnode->v_lastr from buffered block device I/O in preparation for removal of vnode->v_lastr field, replacing it with the already existing seqcount metric to detect sequential operation. Reviewed by: Alan Cox <alc@cs.rice.edu>, David Greenman <dg@root.com>	1999-09-17 06:10:27 +00:00
Poul-Henning Kamp	d137accc89	Add dev_t freeing code. Controlled by sysctl debug.free_devt, default is off.	1999-08-29 09:09:12 +00:00
Poul-Henning Kamp	9626728875	remove unused variables.	1999-08-28 19:21:03 +00:00
Peter Wemm	c3aac50f28	$Id$ -> $FreeBSD$	1999-08-28 01:08:13 +00:00
Poul-Henning Kamp	dbafb3660f	Simplify the handling of VCHR and VBLK vnodes using the new dev_t: Make the alias list a SLIST. Drop the "fast recycling" optimization of vnodes (including the returning of a preexisting but stale vnode from checkalias). It doesn't buy us anything now that we don't hardlimit vnodes anymore. Rename checkalias2() and checkalias() to addalias() and addaliasu() - which takes dev_t and udev_t arg respectively. Make the revoke syscalls use vcount() instead of VALIASED. Remove VALIASED flag, we don't need it now and it is faster to traverse the much shorter lists than to maintain the flag. vfs_mountedon() can check the dev_t directly, all the vnodes point to the same one. Print the devicename in specfs/vprint(). Remove a couple of stale LFS vnode flags. Remove unimplemented/unused LK_DRAINED.	1999-08-26 14:53:31 +00:00
Poul-Henning Kamp	41d2e3e09e	Introduce vn_isdisk(struct vnode *vp) function, and use it to test for diskness.	1999-08-25 12:24:39 +00:00
Julian Elischer	0ff7b13acd	Make DEVFS use PHK's specinfo struct as the source of dev_t and devsw. In lookup() however it's the other way around as we need to supply the dev_t for the vnode, so devfs still has a copy of it stashed away. Sourcing it from the vnode in the vnops however is useful as it makes a lot of the code almost the same as that in specfs.	1999-08-25 04:55:20 +00:00
John Polstra	a2801b7731	Support full-precision file timestamps. Until now, only the seconds have been maintained, and that is still the default. A new sysctl variable "vfs.timestamp_precision" can be used to enable higher levels of precision: 0 = seconds only; nanoseconds zeroed (default). 1 = seconds and nanoseconds, accurate within 1/HZ. 2 = seconds and nanoseconds, truncated to microseconds. >=3 = seconds and nanoseconds, maximum precision. Level 1 uses getnanotime(), which is fast but can be wrong by up to 1/HZ. Level 2 uses microtime(). It might be desirable for consistency with utimes() and friends, which take timeval structures rather than timespecs. Level 3 uses nanotime() for the highest precision. I benchmarked levels 0, 1, and 3 by copying a 550 MB tree with "cpio -pdu". There was almost negligible difference in the system times -- much less than 1%, and less than the variation among multiple runs at the same level. Bruce Evans dreamed up a torture test involving 1-byte reads with intervening fstat() calls, but the cpio test seems more realistic to me. This feature is currently implemented only for the UFS (FFS and MFS) filesystems. But I think it should be easy to support it in the others as well. An earlier version of this was reviewed by Bruce. He's not to blame for any breakage I've introduced since then. Reviewed by: bde (an earlier version of the code)	1999-08-22 00:15:16 +00:00
Poul-Henning Kamp	7dc5cd047f	The bdevsw() and cdevsw() are now identical, so kill the former.	1999-08-13 10:29:38 +00:00
Poul-Henning Kamp	4d4f932326	s/v_specinfo/v_rdev/	1999-08-13 10:10:12 +00:00
Poul-Henning Kamp	0ef1c82630	Decommission miscfs/specfs/specdev.h. Most of it goes into <sys/conf.h>, a few lines into <sys/vnode.h>. Add a few fields to struct specinfo, paving the way for the fun part.	1999-08-08 18:43:05 +00:00
Alan Cox	6745299365	Add sysctl and support code to allow directories to be VMIO'd. The default setting for the sysctl is OFF, which is the historical operation. Submitted by: dillon	1999-07-26 06:25:53 +00:00

563 Commits