1
0
mirror of https://git.FreeBSD.org/src.git synced 2024-12-18 10:35:55 +00:00
Commit Graph

8432 Commits

Author SHA1 Message Date
Poul-Henning Kamp
ff7284eeb4 Remove the global cdev hash and use the cdevsw list instead.
Don't remove the now unused element from cdev yet, wait until
we have a better reason to bump the version.
2005-03-29 09:56:21 +00:00
Poul-Henning Kamp
fd5f6f4cf2 Privatize major(). 2005-03-29 08:13:17 +00:00
Poul-Henning Kamp
97eb8cfae0 Print name of device instead of useless major/minor numbers. 2005-03-29 08:13:01 +00:00
Jeff Roberson
ea9aa09dd1 - Remove an unused variable from relookup().
- Assert that REMOVE, CREATE, and RENAME callers have WANTPARENT
   or LOCKPARENT set.  You can't complete any of these operations without
   at least a reference to the parent.  Many filesystems check for this case
   even though it isn't possible in the current system.
2005-03-28 13:56:56 +00:00
Jeff Roberson
f7b404d88f - Remove an unused variable.
Sponsored by:	Isilon Systems, Inc.
2005-03-28 13:29:48 +00:00
Jeff Roberson
9d65cdf6ff - Rev 1.83 of kern_lock.c fixes the td_locks assert, reenable it here.
Sponsored by:	Isilon Systems, Inc.
2005-03-28 12:52:46 +00:00
Jeff Roberson
bf5c2a1940 - Don't bump the count twice in the LK_DRAIN case.
Sponsored by:	Isilon Systems, Inc.
2005-03-28 12:52:10 +00:00
Jeff Roberson
9dcc5da318 - Move code that should probably be an assert above the main body of
vrele so that we can decrease the indentation of the real work and
   make things slightly more clear.

Sponsored by:	Isilon Systems, Inc.
2005-03-28 11:18:47 +00:00
Jeff Roberson
ee5a0a2d7c - We no longer have to bother with PDIRUNLOCK, lookup() handles it for us.
Sponsored by:	Isilon Systems, Inc.
2005-03-28 09:26:17 +00:00
Jeff Roberson
d36f0a4ff8 - Adjust asserts in vop_lookup_post() to match the new post PDIRUNLOCK
vfs.

Sponsored by:	Isilon Systems, Inc.
2005-03-28 09:25:25 +00:00
Jeff Roberson
1e38e08e76 - Get rid of PDIRUNLOCK, instead, we fixup the lock state immediately after
calling VOP_LOOKUP().  Rather than having each filesystem check the
   LOCKPARENT flag, we simply check it once here and unlock as required.
   The only unusual case is ISDOTDOT, where we require an unlocked vnode
   on return.  Relocking this vnode with the child locked is allowed since
   the child is actually its parent.
 - Add a few asserts for some unusual conditions that I do not believe can
   happen.  These will later go away and turn into implementations for these
   conditions.

Sponsored by:	Isilon Systems, Inc.
2005-03-28 09:24:50 +00:00
Poul-Henning Kamp
3b73a3c079 Remove another ';' after if().
Also spotted by:	bz
2005-03-27 07:53:13 +00:00
Poul-Henning Kamp
2d8dfb2836 Remove extra ; at end of if().
Found by:	bz
2005-03-27 07:52:12 +00:00
Poul-Henning Kamp
4a650cc291 Make (some) serial ports implement the PPS-API again. This change
appearantly fell out during the tty code cleanup.
2005-03-26 20:12:39 +00:00
Poul-Henning Kamp
f83856243d s/ENOTTY/ENOIOCTL/ 2005-03-26 20:04:28 +00:00
Jeff Roberson
eb8d0e01c0 - The td_locks check is currently broken with snapshots and possibly
some case in unmount.  Disable the KASSERT until these problems can
   be diagnosed.

Sponsored by:	Isilon Systems, Inc.
2005-03-25 09:56:56 +00:00
Jeff Roberson
228ea9d212 - Don't recycle vnodes anymore. Free them once they are dead. getnewvnode
now always allocates a new vnode.
 - Define a new function, vnlru_free, which frees vnodes from the free list.
   It takes as a parameter the number of vnodes to free, which is
   wantfreevnodes - freevnodes when called from vnlru_proc or 1 when
   called from getnewvnode().  For now, getnewvnode() still tries to reclaim
   a free vnode before creating a new one when we are near the limit.
 - Define a function, vdestroy, which handles the actual release of memory
   and teardown of locks, etc.  This could become a uma_dtor() routine.
 - Get rid of minvnodes.  Now wantfreevnodes is 1/4th the max vnodes.  This
   keeps more unreferenced vnodes around so that files which have only
   been stat'd are less likely to be kicked out of the system before we
   have a chance to read them, etc.  These vnodes may still be freed via
   the normal vnlru_proc() routines which may some day become a real lru.
2005-03-25 05:34:39 +00:00
Marcel Moolenaar
379ba85322 Fix inittodr() invocation. Now that devfs is mounted before the
actual root file system is mounted, the first entry on the mountlist
is not the root file system and the timestamp for that entry is
typically 0. Passing that to inittodr() caused annoying errors on
alpha and ia64.
So, call inittodr() for all file systems on mountlist, but only when
the timestamp (mnt_time) is non-zero.
2005-03-25 01:56:12 +00:00
Jeff Roberson
6c759f3558 - Add information about the buf lock to db_show_buffer.
- Add a 'show lockedbufs' command that is similar to show lockedvnods.

Sponsored by:	Isilon Systems, Inc.
2005-03-25 00:20:37 +00:00
Jeff Roberson
f158df07ab - Restore COUNT() in all of its original glory. Don't make it dependent
on DEBUG as ufs will soon grow a dependency on this count.

Discussed with:	bde
Sponsored by:	Isilon Systems, Inc.
2005-03-25 00:00:44 +00:00
John Baldwin
85c36f3d25 Don't set ret_namelen and ret_resnamelen in res_find() unless both the
corresponding pointer to the buffer (ret_name and ret_resname) is non-NULL
to avoid possible NULL pointer derefs.

Reported by:	Coverity via sam
2005-03-24 21:20:25 +00:00
Poul-Henning Kamp
eb0d6cde00 Move implementation of hw.bus.rman sysctl to subr_rman.c so that
subr_bus.c doesn't need to peek inside struct resource.

OK from:	imp
2005-03-24 18:13:11 +00:00
Jeff Roberson
61ef09d118 - Fail an assert if we attempt to return with any lockmgr locks held in
userret().

Sponsored by:	Isilon Systems, Inc.
2005-03-24 09:35:38 +00:00
Jeff Roberson
92e251caf7 - Complete the implementation of td_locks. Track the number of outstanding
lockmgr locks that this thread owns.  This is complicated due to
   LK_KERNPROC and because lockmgr tolerates unlocking an unlocked lock.

Sponsored by:	Isilon Systes, Inc.
2005-03-24 09:35:06 +00:00
Jeff Roberson
d830f82824 - Pass LK_EXCLUSIVE to VFS_ROOT() to satisfy the new flags argument. For
now, all calls to VFS_ROOT() should still acquire exclusive locks.

Sponsored by:	Isilon Systems, Inc.
2005-03-24 07:31:38 +00:00
Jeff Roberson
aabb175391 - Fixup the default vfs_root function to match the new prototype.
Sponsored by:	Isilon Systems, Inc.
2005-03-24 07:30:00 +00:00
Jeff Roberson
5d14d29912 - Grab the lock type that the caller requests in vfs_hash_insert().
Sponsored by:	Isilon Systems, Inc.
2005-03-24 06:16:27 +00:00
Jeff Roberson
c167961e27 - If vput() is called with a shared lock it must upgrade to an exclusive
before it can call VOP_INACTIVE().  This must use the EXCLUPGRADE path
   because we may violate some lock order with another locked vnode if
   we drop and reacquire the lock.  If EXCLUPGRADE fails, we mark the
   vnode with VI_OWEINACT.  This case should be very rare.
 - Clear VI_OWEINACT in vinactive() and vbusy().
 - If VI_OWEINACT is set in vgone() do the VOP_INACTIVE call here as well.

Sponsored by:	Isilon Systems, Inc.
2005-03-24 06:08:58 +00:00
Jeff Roberson
3e6bcad375 - Remove some long dead LOOKUP_SHARED code that tracked the lock state.
- Always pass LOCKSHARED and rely on namei() to ignore it when
   LOOKUP_SHARED is not set.

Sponsored by:	Isilon Systems, Inc.
2005-03-24 06:04:35 +00:00
Jeff Roberson
ae88db8a72 - Remove the #ifdef LOOKUP_SHARED from some calls to NDINIT. The
LOCKSHARED flag is simply ignored in namei() if LOOKUP_SHARED is not
   enabled.

Sponsored by:	Isilon Systems, Inc.
2005-03-24 06:03:31 +00:00
Jeff Roberson
ad09e57f41 - Clear LOCKSHARED if LOOKUP_SHARED is not enabled. This is not strictly
necessary since we disable the shared locks in vfs_cache, but it is
   prefered that the option not leak out into filesystems when it is
   disabled.

Sponsored by:	Isilon Systems, Inc.
2005-03-24 06:02:37 +00:00
Jeff Roberson
fdd6a3ff3c - All of the bugs which lead to the complication of the LOOKUP_SHARED
config option have now been fixed.  All filesystems are properly locked
   and checked via DEBUG_VFS_LOCKS.  Remove the workaround code.

Sponsored by:	Isilon Systems, Inc.
2005-03-24 06:00:45 +00:00
Julian Elischer
b75b03116f Fix code freeing wrong cred pointer.
Submitted by:	das
Noticed by: Coverity tool
MFC after:	3 days

Note: usually the two pointers point to the same
thing but it was still a bug.
2005-03-21 22:55:38 +00:00
Robert Watson
6220dcba84 Add a read-only kern.sched.preemption sysctl so that user space can tell
if "options PREEMPTION" is compiled into the kernel.
2005-03-20 17:05:12 +00:00
Pawel Jakub Dawidek
c78941e69e Add ki_jid field to the kinfo_proc structure and store jail ID there.
Reviewed by:	gad
MFC after:	3 days
2005-03-20 10:35:23 +00:00
Poul-Henning Kamp
773eff9d97 Sleeping is not allowed in uma->fini 2005-03-19 08:22:13 +00:00
Sam Leffler
b53d6ac575 check copyin return value
Noticed by:	Coverity Prevent analysis tool
2005-03-19 04:34:23 +00:00
David Schultz
f7fdcd45f0 Add missing cases for PT_SYSCALL.
Found by:	Coverity Prevent analysis tool
2005-03-18 21:22:28 +00:00
Maxim Sobolev
2322a0a77d Impose the upper limit on signals that are allowed between kernel threads
in set[ug]id program for compatibility with Linux. Linuxthreads uses
4 signals from SIGRTMIN to SIGRTMIN+3.

Pointed out by:		rwatson
2005-03-18 13:33:18 +00:00
Poul-Henning Kamp
1ea7a6f806 Use subr_unit to allocate thread ID's with.
Tested by:	davidxu
2005-03-18 12:34:14 +00:00
Maxim Sobolev
f9cd63d436 Linuxthreads uses not only signal 32 but several signals >= 32.
PR:		kern/72922
Submitted by:	Andriy Gapon <avg@icyb.net.ua>
2005-03-18 11:08:55 +00:00
Poul-Henning Kamp
a1e1d551d8 Fix a bad copy&paste mistake I made.
Spotted by:	truckman
2005-03-18 06:01:21 +00:00
Warner Losh
36fed96550 Use STAILQ in preference to SLIST for the resources. Insert new resources
last in the list rather than first.

This makes the resouces print in the 4.x order rather than the 5.x order
(eg fdc0 at 0x3f0-0x3f5,0x3f7 is 4.x, but 0x3f7,0x3f0-0x3f5 is 5.x).  This
also means that the pci code will once again print the resources in BAR
ascending order.
2005-03-18 05:19:50 +00:00
John-Mark Gurney
c4c44d2935 fix aio+kq... I've been running ambrisko's test program for much longer
w/o problems than I was before...  This simply brings back the knote_delete
as knlist_delete which will also drop the knote's, instead of just clearing
the list and seeing _ONESHOT...

Fix a race where if a note was _INFLUX and _DETACHED, it could end up being
modified... whoopse..

MFC after:	1 week
Prodded by:	ambrisko and dwhite
2005-03-18 01:11:39 +00:00
John-Mark Gurney
7ac139a904 add m_copyup function.. This can be used to help make our ip stack less
alignment restrictive, and help performance on some ethernet cards which
currently copy the entire packet a couple bytes to get the packet aligned
properly...

Wordsmithing by:	dwhite
Obtained from:	NetBSD (code only)
I'll clean it up later:	rwatson
2005-03-17 19:34:57 +00:00
Robert Watson
bc60830675 A further step on the journey of meaking panics and debugging more reliable:
in the window between the beginning of panic() and entering the debugger,
it's possible to receive interrupts.  If we receive an interrupt, don't
preempt if panicstr != NULL, as the system is in the process of failing, and
the preempting thread is likely to stumble over the failure.  The typical
scenario is during the printf() in panic() prior to entering the debugger,
but when running with a slower console type such as serial console.

It could be that the panic string should be passed to the debugger to print,
so that it can run from the debugger's environment rather than a regular
kernel printf.

Glanced at by:	jhb
2005-03-17 15:18:01 +00:00
Poul-Henning Kamp
bde1a9c98b Kill MAJOR_AUTO 2005-03-17 13:37:28 +00:00
Poul-Henning Kamp
800b42bde0 Prepare for the final onslaught on devices:
Move uid/gid/mode from cdev to cdevsw.

Add kind field to use for devd(8) later.

Bump both D_VERSION and __FreeBSD_version
2005-03-17 12:07:00 +00:00
Poul-Henning Kamp
572b4402d1 In stange circumstances we may end up being the last reference to a
session in tprintf().   SESSRELE() needs to properly dispose of the
sessions mutex.

Add sessrele() which does the proper cleanup and have SESSRELE() call it.

Use SESSRELE also in pgdelete().

Found by:	Coverity (ID:526)
2005-03-17 08:44:41 +00:00
Poul-Henning Kamp
51f5ce0c8c Add two arguments to the vfs_hash() KPI so that filesystems which do
not have unique hashes (NFS) can also use it.
2005-03-16 11:20:51 +00:00
Poul-Henning Kamp
9068e77689 Fix a memoryleak in case of failed root filesystem mount.
Spotted by:     Coverity via sam
2005-03-16 11:06:49 +00:00
John-Mark Gurney
2a77000b75 MFp4: print a more useful error when we don't have a /dev to mount devfs
on..
2005-03-16 08:04:39 +00:00
Poul-Henning Kamp
78bb3c21ed Add mnt_hashseed to struct mount and initialize it witn PRNG bits, use
it to get better hashing in vfs_hash.

In case of an insert collision in vfs_hash_insert(), put the loosing vnode
on a special list so that vfs_hash_remove() can just assume that it is on
a list.

Drop the VI_HASHED flag.
2005-03-16 07:35:06 +00:00
Warner Losh
358fef538f Sometimes, when asked to return region A..C, we'd return A+N..C+N
instead of failing.

When looking for a region to allocate, we used to check to see if the
start address was < end.  In the case where A..B is allocated already,
and one wants to allocate A..C (B < C), then this test would
improperly fail (which means we'd examine that region as a possible
one), and we'd return the region B+1..C+(B-A+1) rather than NULL.
Since C+(B-A+1) is necessarily larger than C (end argument), this is
incorrect behavior for rman_reserve_resource_bound().

The fix is to exclude those regions where r->r_start + count - 1 > end
rather than r->r_start > end.  This bug has been in this code for a
very long time.  I believe that all other tests against end are
correctly done.

This is why sio0 generated a message about interrupts not being
enabled properly for the device.  When fdc had a bug that allocated
from 0x3f7 to 0x3fb, sio0 was then given 0x3fc-0x404 rather than the
0x3f8-0x3ff that it wanted.  Now when fdc has the same bug, sio0 fails
to allocate its ports, which is the proper behavior.  Since the probe
failed, we never saw the messed up resources reported.

I suspect that there are other places in the tree that have weird
looping or other odd work arounds to try to cope with the observed
weirdness this bug can introduce.  These workarounds should be located
and eliminated.

Minor debug write fix to match the above test done as well.

'nice' by: mdodd
Sponsored by: timing solutions (http://www.timing.com/)
2005-03-15 20:28:51 +00:00
Warner Losh
a33ab77447 Fix a debugging printf. The order of start/end was inconsistant with
all the other start/end debugs, causing momentary confusion when the
output was examined.
2005-03-15 20:15:15 +00:00
Poul-Henning Kamp
45c26fa2b6 Improve the vfs_hash() API: vput() the unneeded vnode centrally to
avoid replicating the vput in all the filesystems.
2005-03-15 20:00:03 +00:00
Jeff Roberson
b172f6c5f9 - Now that there are no external users of vfree() make it static.
- Move VSHOULDBUSY, VSHOULDFREE, and VTRYRECYCLE into vfs_subr.c so
   no one else attempts to grow a dependency on them.
 - Now that objects with pages hold the vnode we don't have to do unlocked
   checks for the page count in the vm object in VSHOULDFREE.  These three
   macros could simply check for holdcnt state transitions to determine
   whether the vnode is on the free list already, but the extra safety
   the flag affords us is probably worth the minimal cost.
 - The leafonly sysctl and code have been dead for several years now,
   remove the sysctl and the code that employed it from vtryrecycle().
 - vtryrecycle() also no longer has to check the object's page count as
   the object holds the vnode until it reaches 0.

Sponsored by:	Isilon Systems, Inc.
2005-03-15 14:38:16 +00:00
Poul-Henning Kamp
7933351a28 Fix a debug message to print a usable device name rather than useless
major+minor tupple.
2005-03-15 14:08:10 +00:00
Jeff Roberson
c178628d6e - Expose vholdl() so it may be used outside of vfs_subr.c 2005-03-15 13:43:10 +00:00
Poul-Henning Kamp
4ba679d6d0 Remove findcdev(). 2005-03-15 12:58:08 +00:00
Poul-Henning Kamp
0a2e49f1f8 Rename cdev->si_udev to cdev->si_drv0 to reflect the new nature of
the field.
2005-03-15 11:33:28 +00:00
Jeff Roberson
f5f0da0a0e - transferlockers() requires the interlock to be SMP safe.
Sponsored by:	Isilon Systems, Inc.
2005-03-15 09:27:45 +00:00
Poul-Henning Kamp
e82ef95c11 Simplify the vfs_hash calling convention. 2005-03-15 08:07:07 +00:00
Poul-Henning Kamp
ee148e2606 Cleanup accidentally include #if 0 section. 2005-03-14 10:25:09 +00:00
Poul-Henning Kamp
6c325a2a21 Currently (almost) all filesystems maintain a local inode hash table
to get from (mount + inode) to vnode.  These tables are mostly
copy&pasted from UFS, sized based on desiredvnodes and therefore
quite large (128K-512K).  Several filesystems are buggy enough that
they allocate the hash table even before they know if they will
ever be used or not.

Add "vfs_hash", a system wide hash table, which will replace all
the per-filesystem hash-tables.

The fields we add to struct vnode will more or less be saved in
the respective filesystems inodes.

Having one central implementation will save code and will allow us
to justify the complexity of code to dynamically (re)size the hash
at a later point.
2005-03-14 10:01:29 +00:00
Jeff Roberson
8045557f2b - Increment the holdcnt once for each usecount reference. This allows us
to use only the holdcnt to determine whether a vnode may be recycled,
   simplifying the V* macros as well as vtryrecycle(), etc.

Sponsored by:	Isilon Systems, Inc.
2005-03-14 09:25:19 +00:00
Jeff Roberson
159b454819 - We do not have to check the object's ref_count in VSHOULDFREE or
vtryrecycle().  All obj refs also ref the vnode.
 - Consistently use v_incr_usecount() to increment the usecount.  This will
   be more important later.

Sponsored by:	Isilon Systems, Inc.
2005-03-14 08:30:31 +00:00
Jeff Roberson
8f13a540ed - Slightly rearrange vrele() to move the common case in one indentation
level.

Sponsored by:	Isilon Systems, Inc.
2005-03-14 07:16:55 +00:00
Jeff Roberson
6fc16a838c - Rework vget() so we drop the usecount in two failure cases that were
missed by my last commit.

Sponsored by:	Isilon Systems, Inc.
2005-03-14 07:11:19 +00:00
Poul-Henning Kamp
93f6c81e25 Remove debugging printfs. 2005-03-14 06:51:29 +00:00
Jeff Roberson
0463dc9ef1 - Do a vn_start_write in vn_close, we may write if this is the last ref
on an unlinked file.  We can't know if this is the case until after we
   have the lock.
 - Lock the vnode in vn_close, many filesystems had code which was unsafe
   without the lock held, and holding it greatly simplifies vgone().
 - Adjust vn_lock() to check for the VI_DOOMED flag where appropriate.

Sponsored by:	Isilon Systems, Inc.
2005-03-13 11:56:28 +00:00
Jeff Roberson
6703c30bb5 - Remove vx_lock, vx_unlock, vx_wait, etc.
- Add a vn_start_write/vn_finished_write around vlrureclaim so we don't do
   writing ops without suspending.  This could suspend the vlruproc which
   should not be a problem under normal circumstances.
 - Manually implement VMIGHTFREE in vlrureclaim as this was the only instance
   where it was used.
 - Acquire a lock before calling vgone() as it now requires it.
 - Move the acquisition of the vnode interlock from vtryrecycle() to
   getnewvnode() so that if it fails we don't drop and reacquire the
   vnode_free_list_mtx.
 - Check for a usecount or holdcount at the end of vtryrecycle() in case
   someone grabbed a ref while we were recycling.  Abort the recycle, and
   on the final ref drop this vnode will be placed on the head of the free
   list.
 - Move the redundant VOP_INACTIVE protection code into the local
   vinactive() routine to avoid code bloat.
 - Keep the vnode lock held across calls to vgone() in several places.
 - vgonel() no longer uses XLOCK, instead callers must hold an exclusive
   vnode lock.  The VI_DOOMED flag is set to allow other threads to detect
   a vnode which is no longer valid.  This flag is set until the last
   reference is gone, and there are no chances for a new ref.  vgonel()
   holds this lock across the entire function, which greatly simplifies
   logic.
 _ Only vfree() in one place in vgone() not three.
 - Adjust vget() to check the VI_DOOMED flag prior to waiting on the lock
   in the LK_NOWAIT case.  In other cases, check after we have slept and
   acquired an exlusive lock.  This will simulate the old vx_wait()
   behavior.

Sponsored by:	Isilon Systems, Inc.
2005-03-13 11:54:28 +00:00
Jeff Roberson
2b3183a8b7 - A lock is required before calling VOP_REVOKE. Our reference protects us
from accessing another vnode so a naked VOP_LOCK is sufficient.

Sponsored by:	Isilon Systems, Inc.
2005-03-13 11:47:04 +00:00
Jeff Roberson
9331fd135b - Don't VOP_UNLOCK prior to VOP_REVOKE. The lock is required now.
Sponsored by:	Isilon Systems, Inc.
2005-03-13 11:45:51 +00:00
Jeff Roberson
23f2513a4e - Don't drop the lock in the default inactive handler anymore, VOP_NULL
will do for vop_stdinactive now.

Sponsored by:	Isilon Systems, Inc.
2005-03-13 11:45:01 +00:00
Jeff Roberson
4e6746965e - CLOSE, REVOKE, INACTIVE, and RECLAIM are not L L L, that's a locked vnode
on enter, exit, error.  This allows for the removal of the XLOCK.

Sponsored by:	Isilon Systems, Inc.
2005-03-13 11:42:16 +00:00
Pawel Jakub Dawidek
cefcecbefd Function jailed() looks into ucred strcture, so be sure ucred is not NULL.
Reviewed by:	rwatson
MFC after:	1 week
2005-03-12 14:31:04 +00:00
Pawel Jakub Dawidek
d079d0a0d2 Clean up a bit.
Reviewed by:	rwatson
MFC after:	1 week
2005-03-12 14:28:34 +00:00
Robert Watson
59f21d5ab1 Extend the coverage of the accept and socket mutexes in soisconnected()
so that the socket lock is held over the test-and-set removal of the
accept filter option during connect, and the two socket mutex regions
(transition to connected, perform accept filter) are combined.
2005-03-12 13:39:39 +00:00
Robert Watson
a59f81d263 Move the logic implementing retrieval of the SO_ACCEPTFILTER socket option
from uipc_socket.c to uipc_accf.c in do_getopt_accept_filter(), so that it
now matches do_setopt_accept_filter().  Slightly reformulate the logic to
match the optimistic allocation of storage for the argument in advance,
and slightly expand the coverage of the socket lock.
2005-03-12 12:57:18 +00:00
Robert Watson
92081a8344 Part two of post-SMPng cleanup of accept filter registration: perform all
allocation up front before grabbing the socket mutex and doing the
registration work.  The result is a lot cleaner.
2005-03-12 12:27:47 +00:00
Peter Wemm
f71692e9be Replace my previous change for 32 bit systems with hz > 169 with Bruce's
simpler one.
2005-03-12 00:13:45 +00:00
Peter Wemm
2afec87508 Make the tty vmin/vtime timeouts work for hz > 169 on 32 bit machines. 2005-03-12 00:10:23 +00:00
Robert Watson
64c238075f First step in simplifying accept filter socket option logic in the
post-SMPng world order.  Centralize handling of the socket option
clear case in do_setopt_accept_filter().
2005-03-11 21:37:45 +00:00
Robert Watson
56856fbfb4 Remove an additional commented out reference to a possible future sx
lock.
2005-03-11 19:16:02 +00:00
Robert Watson
2b37548a71 When setting up a socket in socreate(), there's no need to lock the
socket lock around knlist_init(), so don't.

Hard code the setting of the socket reference count to 1 rather than
using soref() to avoid asserting the socket lock, since we've not yet
exposed the socket to other threads.

This removes two mutex operations from each socket allocation.
2005-03-11 16:30:02 +00:00
Robert Watson
5fab68b19e Remove suggestive sx_init() comment in soalloc(). We will have something
like this at some point, but for now it clutters the source.
2005-03-11 16:26:33 +00:00
Robert Watson
35a196154f The SO_NOSIGPIPE socket option allows a user process to mark a socket
so that the socket does not generate SIGPIPE, only EPIPE, when a write
is attempted after socket shutdown.  When the option was introduced in
2002, this required the logic for determining whether SIGPIPE was
generated to be pushed down from dofilewrite() to the socket layer so
that the socket options could be considered.  However, the change in
2002 omitted modification to soo_write() required to add that logic,
resulting in SIGPIPE not being generated even without SO_NOSIGPIPE when
the socket was written to using write() or related generic system calls.

This change adds the EPIPE logic to soo_write(), generating a SIGPIPE
signal to the process associated with the passed uio in the event that
the SO_NOSIGPIPE option is not set.

Notes:

- The are upsides and downsides to placing this logic in the socket
  layer as opposed to the file descriptor layer.  This is really fd
  layer logic, but because we need so_options, we have a choice of
  layering violations and pick this one.

- SIGPIPE possibly should be delivered to the thread performing the
  write, not the process performing the write.

- uio->uio_td and the td argument to soo_write() might potentially
  differ; we use the thread in the uio argument.

- The "sigpipe" regression test in src/tools/regression/sockets/sigpipe
  tests for the bug.

Submitted by:		Mikko Tyolajarvi <mbsd at pacbell dot net>
Talked with:		glebius, alfred
PR:			78478
MFC after:		1 week
2005-03-11 15:06:16 +00:00
John-Mark Gurney
74e620476c fix spelling of match in comment...
MFC after:	3 days
2005-03-10 21:23:06 +00:00
Poul-Henning Kamp
b43ab0e378 Try to fix the mess I made of devname, with the minimal subset of the
larger minor/major patch which was posted for testing.
2005-03-10 18:21:34 +00:00
Robert Watson
53358cc907 Document, via WITNESS, that the NFS server mutex falls ahead of the socket
buffer mutexes.
2005-03-09 21:38:53 +00:00
Dag-Erling Smørgrav
628b83cd08 My addled brains didn't realize that since vtp points into value, we
can't freeenv(value) before we're done inspecting vtp[0].

Tested by:	Anish Mistry <mistry.7@osu.edu>
2005-03-09 12:16:45 +00:00
Stefan Farfeleder
b26244446b Fix typo in comment. 2005-03-09 11:50:55 +00:00
Sam Leffler
a4e714295a allow the destination of m_move_pkthdr to have external
storage (e.g. a cluster)

Glanced at by:	rwatson, silby
2005-03-08 17:52:01 +00:00
Giorgos Keramidas
0a11e99990 Remove redundant initialization that is repeated in the for() loop
right below it.

Approved by:	jhb
2005-03-08 16:57:20 +00:00
Maxim Sobolev
8d6e40c3f1 Add kernel-only flag MSG_NOSIGNAL to be used in emulation layers to surpress
SIGPIPE signal for the duration of the sento-family syscalls. Use it to
replace previously added hack in Linux layer based on temporarily setting
SO_NOSIGPIPE flag.

Suggested by:	alfred
2005-03-08 16:11:41 +00:00
Poul-Henning Kamp
d9a54d5c23 Reengineer subr_unit
Add support for passing in a mutex.  If NULL is passed a global
	subr_unit mutex is used.

	Add alloc_unrl() which expects the mutex to be held.

	Allocating a unit will never sleep as it does not need to allocate
	memory.

	Cut possible range in half so we can use -1 to mean "out of number".

	Collapse first and last runs into the head by means of counters.
	This saves memory in the common case(s).
2005-03-08 10:40:48 +00:00
Poul-Henning Kamp
3238ec33e1 Fix signedness of minor2unit(). 2005-03-08 10:40:03 +00:00
Jeff Roberson
ec346d1040 - Lock access to the buffer_map with the vm_map lock. In 4.x this was
done with splbio, in 5.x this was done with Giant.

Discussed with:		alc
Reported by:		julian, pho
2005-03-08 09:34:54 +00:00
Giorgos Keramidas
46da8bf8fb Typo & grammar fixes in comments. 2005-03-08 00:58:50 +00:00
Robert Watson
9bfb7389bc When upcalling from a socket in soisconnected() for an accept filter,
call with flag M_DONTWAIT rather than M_TRYWAIT, as we don't want to
do blocking memory allocation (etc) in the netisr.

MFC after:	3 days
2005-03-07 13:50:16 +00:00
Poul-Henning Kamp
3b3f38ed7d Add placeholder mutex argument to new_unrhdr(). 2005-03-07 11:05:47 +00:00
Bill Paul
58a6edd121 When you call MiniportInitialize() for an 802.11 driver, it will
at some point result in a status event being triggered (it should
be a link down event: the Microsoft driver design guide says you
should generate one when the NIC is initialized). Some drivers
generate the event during MiniportInitialize(), such that by the
time MiniportInitialize() completes, the NIC is ready to go. But
some drivers, in particular the ones for Atheros wireless NICs,
don't generate the event until after a device interrupt occurs
at some point after MiniportInitialize() has completed.

The gotcha is that you have to wait until the link status event
occurs one way or the other before you try to fiddle with any
settings (ssid, channel, etc...). For the drivers that set the
event sycnhronously this isn't a problem, but for the others
we have to pause after calling ndis_init_nic() and wait for the event
to arrive before continuing. Failing to wait can cause big trouble:
on my SMP system, calling ndis_setstate_80211() after ndis_init_nic()
completes, but _before_ the link event arrives, will lock up or
reset the system.

What we do now is check to see if a link event arrived while
ndis_init_nic() was running, and if it didn't we msleep() until
it does.

Along the way, I discovered a few other problems:

- Defered procedure calls run at PASSIVE_LEVEL, not DISPATCH_LEVEL.
  ntoskrnl_run_dpc() has been fixed accordingly. (I read the documentation
  wrong.)

- Similarly, the NDIS interrupt handler, which is essentially a
  DPC, also doesn't need to run at DISPATCH_LEVEL. ndis_intrtask()
  has been fixed accordingly.

- MiniportQueryInformation() and MiniportSetInformation() run at
  DISPATCH_LEVEL, and each request must complete before another
  can be submitted. ndis_get_info() and ndis_set_info() have been
  fixed accordingly.

- Turned the sleep lock that guards the NDIS thread job list into
  a spin lock. We never do anything with this lock held except manage
  the job list (no other locks are held), so it's safe to do this,
  and it's possible that ndis_sched() and ndis_unsched() can be
  called from DISPATCH_LEVEL, so using a sleep lock here is
  semantically incorrect. Also updated subr_witness.c to add the
  lock to the order list.
2005-03-07 03:05:31 +00:00
Alan Cox
2b2c7a6b40 The m_ext reference counts are potentially shared and modified
asynchronously by different threads.  Thus, declare as volatile the
reference count that is accessed through m_ext's pointer, ref_cnt.
Revert the previous change, revision 1.144, that casts as volatile a
single dereference of ref_cnt.

Reviewed by: bmilekic, dwhite
Problem reported by: kris
MFC after: 3 days
2005-03-06 20:09:00 +00:00
Dag-Erling Smørgrav
f3301d15f1 Teach getenv_quad() to recognize k/m/g/t suffixes in both lower- and
upper-case.  This means (almost) all tunables now support those suffixes.
2005-03-05 15:52:12 +00:00
David Xu
bc8e6d817d Allocate umtx_q from heap instead of stack, this avoids
page fault panic in kernel under heavy swapping.
2005-03-05 09:15:03 +00:00
David Xu
627451c1d9 The td_waitset is pointing to a stack address when thread is waiting
for a signal, because kernel stack is swappable, this causes page fault
in kernel under heavy swapping case. Fix this bug by eliminating unneeded
code.
2005-03-04 22:46:31 +00:00
Maxim Sobolev
4b1783363f In linux emulation layer try to detect attempt to use linux_clone() to
create kernel threads and call rfork(2) with RFTHREAD flag set in this case,
which puts parent and child into the same threading group. As a result
all threads that belong to the same program end up in the same threading
group.

This is similar to what linuxthreads port does, though in this case we don't
have a luxury of having access to the source code and there is no definite
way to differentiate linux_clone() called for threading purposes from other
uses, so that we have to resort to heuristics.

Allow SIGTHR to be delivered between all processes in the same threading
group previously it has been blocked for s[ug]id processes.

This also should improve locking of the same file descriptor from different
threads in programs running under linux compat layer.

PR:			kern/72922
Reported by:		Andriy Gapon <avg@icyb.net.ua>
Idea suggested by:	rwatson
2005-03-03 16:57:55 +00:00
Doug White
a1d0c3f203 Insert volatile cast to discourage gcc from optimizing the read outside
of the while loop.

Suggested by:	alc
MFC after:	1 day
2005-03-03 02:41:37 +00:00
Joerg Wunsch
a5f50ef9e4 netchild's mega-patch to isolate compiler dependencies into a central
place.

This moves the dependency on GCC's and other compiler's features into
the central sys/cdefs.h file, while the individual source files can
then refer to #ifdef __COMPILER_FEATURE_FOO where they by now used to
refer to #if __GNUC__ > 3.1415 && __BARC__ <= 42.

By now, GCC and ICC (the Intel compiler) have been actively tested on
IA32 platforms by netchild.  Extension to other compilers is supposed
to be possible, of course.

Submitted by:	netchild
Reviewed by:	various developers on arch@, some time ago
2005-03-02 21:33:29 +00:00
David Xu
6675b36ec5 In kern_sigtimedwait, remove waitset bits for td_sigmask before
sleeping, so in do_tdsignal, we no longer need to test td_waitset.
now td_waitset is only used to give a thread higher priority when
delivering signal to multithreads process.
This also fixes a bug:
when a thread in sigwait states was suspended and later resumed
by SIGCONT, it can no longer receive signals belong to waitset.
2005-03-02 13:43:51 +00:00
Paul Saab
b8a4edc17e Use kern_kevent instead of the stackgap for 32bit syscall wrapping.
Submitted by:	jhb
Tested on:	amd64
2005-03-01 17:45:55 +00:00
Paul Saab
c1aa81b6d9 regen 2005-03-01 17:44:34 +00:00
Paul Saab
96d31285fe Change the prototype of kevent to remove the const from the changelist.
Reviewed by:	jhb
2005-03-01 17:43:08 +00:00
Robert Watson
081322613b When mac_check_system_acct() fails, make sure to unlock as well as close
the vnode.

Pointed out by:	jeff
2005-03-01 08:56:13 +00:00
Wes Peters
a09150446d Add a sysctl that records the amount of physical memory in the machine.
Submitted by:	Nicko Dehaine <nicko@stbernard.com>
MFC after:	1 day
2005-02-28 21:42:56 +00:00
Poul-Henning Kamp
8045ce213d Also handle d_maj hints from cloning drivers correctly. 2005-02-27 22:57:32 +00:00
Poul-Henning Kamp
84f580a093 Whine about any drivers which hardcode the device major number. 2005-02-27 22:41:07 +00:00
Poul-Henning Kamp
acd102e64b Use dynamic major number allocation. 2005-02-27 22:02:03 +00:00
Poul-Henning Kamp
78e253c8d5 Use dynamic major number allocation. 2005-02-27 22:00:45 +00:00
Poul-Henning Kamp
89685e2269 Use dynamic major number allocation for /dev/console, there is no
longer any benefit from hard wiring it.

Remove special hack used to wire major to zero despite zero having a
different magic meaning as well.
2005-02-27 21:52:42 +00:00
Nate Lawson
789f03ceb4 Add locking to handle multiple threads getting/setting frequencies at the
same time.  We use an sx lock and serialize the cpufreq device's
get/set/levels methods.
2005-02-27 01:34:08 +00:00
Nate Lawson
b070969b48 Allow users to reject levels below a given frequency (in MHz) via the
debug.cpufreq.lowest tunable and sysctl.  Some systems seem to have problems
with the lowest frequencies so setting this prevents them from being
available or used.
2005-02-26 22:37:49 +00:00
Tom Rhodes
183a16a3ec Remove recently added note about DEVICE_POLLING not working with SMP.
Remove warning from kern_poll.c to allow DEVICE_POLLING to be built with SMP.

Discussed with:	ru, glebius
2005-02-25 22:07:51 +00:00
Robert Watson
fa6fc5b819 Insert missing increment of (i) when walking the temporary semaphore
vector during fork.

Fix assertion which contained an off-by-one error.

Submitted by:	Antoine Brodin < antoine dot brodin at laposte dot net >
2005-02-25 21:00:14 +00:00
Robert Watson
590f242cc0 Add an exit hook, sem_forkhook(), which walks the list of POSIX semaphores
owned by a process when it forks, and creates a matching set of references
for the child process, as prescribed by POSIX.

In order to avoid races with other threads in the parent process during
fork(), it is necessary to allocate a temporary reference list while
holding the sem_lock, then transfer those references to the new process
once the sem_lock is released.  The implementation is inefficient but
appears functional; in order to improve the efficiency, it will be
necessary to modify the existing structures and logic, which generally
rely on O(n) operations over the global set of semaphores.
2005-02-25 19:10:51 +00:00
Robert Watson
955ec4156c Assert sem_lock in id_to_sem() and sem_lookup_byname(), since these
functions iterate over the global POSIX semaphore lists.

MFC after:	3 days
2005-02-25 17:01:35 +00:00
Maxim Sobolev
90dc539be0 Welcome to the 21st century: increase MAXSHELLCMDLEN from 128 bytes to
PAGE_SIZE.

Unlike originator of the PR suggests retain MAXSHELLCMDLEN definition
(he has been proposing to replace it with PAGE_SIZE everywhere), not only
this reduced the diff significantly, but prevents code obfuscation and also
allows to increase/decrease this parameter easily if needed.

PR:		kern/64196
Submitted by:	Magnus Bäckström <b@etek.chalmers.se>
2005-02-25 11:49:42 +00:00
Maxim Sobolev
6916a1da50 o Replace two while {} do loops with more appropriate do {} while loops. This
doesn't change functionality, but makes code more logical.

Obtained from:	DrafonFlyBSD

o Use VOP_GETATTR() to obtain actual size of file and parse no more than that.
  Previously, we parsed MAXSHELLCMDLEN characters regardless of the actual file
  size. This makes the following working:

$ printf '#!/bin/echo' > /tmp/test.sh
$ chmod 755 /tmp/test.sh
$ /tmp/test.sh

Previously, attempts to execve() that shell script has been failing with bogus
ENAMETOOLONG.

PR:		kern/64196
Submitted by:	Magnus B.ckstr.m <b@etek.chalmers.se>
2005-02-25 10:17:53 +00:00
Maxim Sobolev
b4305f8d91 Try harder to not exceed MAXSHELLCMDLEN when parsing first line of shell
script. Otherwise it's possible to panic kernel by constructing a shell
script with first line not ending in '\n'.

Also, treat '\0' as line terminating character, which may me useful in
some situations.

Submitted by:	gad
2005-02-25 08:42:04 +00:00
Nate Lawson
d269386a24 Bump the maximum number of levels to 64 and add warning messages about
what to do to fix reduced functionality if the number of levels is too low.
2005-02-24 20:21:41 +00:00
Sam Leffler
59d8b31002 change m_adj to reclaim unused mbufs instead of zero'ing m_len
when trim'ing space off the back of a chain; this is indirect
solution to a potential null ptr deref

Noticed by:	Coverity Prevent analysis tool (null ptr deref)
Reviewed by:	dg, rwatson
2005-02-24 00:40:33 +00:00
Christian S.J. Peron
cd13819433 Add locking assertions into vn_extattr_set, vn_extattr_get and
vn_extattr_rm. This is meant to catch conditions where IO_NODELOCKED
has been specified without the vnode being locked.

Discussed with:	rwatson
MFC after:	1 week
2005-02-24 00:13:16 +00:00
Christian S.J. Peron
df579737e5 Drop bzero and shove the responsibility of zeroing the kse upcall
object on to the zone allocator. It should be noted that uma_zalloc(9)
uses bzero to zero out the object so there probably wont be any
real performance benefit. If UMA grows the ability to supply
zeroed zones more efficiently in the future, we will not have to
modify all the existing consumers.

Discussed with:	rwatson,julian
MFC after:	1 week
2005-02-24 00:05:50 +00:00
Sam Leffler
9d8993bbc5 remove dead code
Noticed by:	Coverity Prevent analysis tool
Reviewed by:	silby
2005-02-23 19:34:44 +00:00
Sam Leffler
15ecf3968d eliminate potential null deref
Noticed by:	Coverity Prevent analysis tool
Reviewed by:	jhb
2005-02-23 19:32:29 +00:00
Jeff Roberson
d9a9c2c22c - Enable SMP VFS by default on current. More users are needed to turn up
any remaining bugs.  Anyone inconvenienced by this can still disable it
   in the loader.

Sponsored by:	Isilon Systems, Inc.
2005-02-23 10:05:43 +00:00
Jeff Roberson
7a9507b60e - A test in sched_switch() is no longer necessary and it is incorrect
when td0 is preempted before it voluntarily switches.

Discovered by:	Arjan Van Leeuwen <avleeuwen@gmail.com>
2005-02-23 00:50:26 +00:00
Sam Leffler
3e55226c46 kill dead code
Noticed by:	Coverity Prevent analysis tool
2005-02-23 00:43:00 +00:00
Jeff Roberson
d8a7c99a1c - Only the xlock holder should be calling VOP_LOCK on a vp once VI_XLOCK
has been set.  Assert that this is the case so that we catch filesystems
   who are using naked VOP_LOCKs in illegal cases.

Sponsored by:	Isilon Systems, Inc.
2005-02-23 00:11:14 +00:00
Jeff Roberson
4c11620bb9 - Add a check for xlock in vop_lock_assert. Presently the xlock is
considered to be as good as an exclusive lock, although there is still a
   possibility of someone acquiring a VOP LOCK while xlock is held.

Sponsored by:	Isilon Systems, Inc.
2005-02-22 23:59:11 +00:00
Poul-Henning Kamp
767056c0e8 Zero the v_un container field to make sure everything is gone. 2005-02-22 18:56:18 +00:00
Poul-Henning Kamp
aa2f6ddc3f Reap more benefits from DEVFS:
List devfs_dirents rather than vnodes off their shared struct cdev, this
saves a pointer field in the vnode at the expense of a field in the
devfs_dirent.  There are often 100 times more vnodes so this is bargain.
In addition it makes it harder for people to try to do stypid things like
"finding the vnode from cdev".

Since DEVFS handles all VCHR nodes now, we can do the vnode related
cleanup in devfs_reclaim() instead of in dev_rel() and vgonel().
Similarly, we can do the struct cdev related cleanup in dev_rel()
instead of devfs_reclaim().

	rename idestroy_dev() to destroy_devl() for consistency.

	Add LIST_ENTRY de_alias to struct devfs_dirent.
	Remove v_specnext from struct vnode.
	Change si_hlist to si_alist in struct cdev.
	String new devfs vnodes' devfs_dirent on si_alist when
	we create them and take them off in devfs_reclaim().

	Fix devfs_revoke() accordingly.  Also don't clear fields
	devfs_reclaim() will clear when called from vgone();

	Let devfs_reclaim() call dev_rel() instead of vgonel().

	Move the usecount tracking from dev_rel() to devfs_reclaim(),
	and let dev_rel() take a struct cdev argument instead of vnode.

	Destroy SI_CHEAPCLONE devices in dev_rel() (instead of
	devfs_reclaim()) when they are no longer used.   (This
	should maybe happen in devfs_close() instead.)
2005-02-22 15:51:07 +00:00
Poul-Henning Kamp
1a1457d427 Make dev_ref() require the dev_lock() to be held and use it from
devfs instead of directly frobbing the si_refcount.
2005-02-22 14:41:04 +00:00
Poul-Henning Kamp
7fc940b266 Remove vfinddev(), it is generally bogus when faced with jails and
chroot and has no legitimate use(r)s in the tree.
2005-02-22 14:11:47 +00:00
Robert Watson
4f7fd28ee1 When invoking callout_init(), spell '1' as "CALLOUT_MPSAFE".
MFC after:	3 days
2005-02-22 13:11:33 +00:00
Robert Watson
0daccb9c94 In the current world order, solisten() implements the state transition of
a socket from a regular socket to a listening socket able to accept new
connections.  As part of this state transition, solisten() calls into the
protocol to update protocol-layer state.  There were several bugs in this
implementation that could result in a race wherein a TCP SYN received
in the interval between the protocol state transition and the shortly
following socket layer transition would result in a panic in the TCP code,
as the socket would be in the TCPS_LISTEN state, but the socket would not
have the SO_ACCEPTCONN flag set.

This change does the following:

- Pushes the socket state transition from the socket layer solisten() to
  to socket "library" routines called from the protocol.  This permits
  the socket routines to be called while holding the protocol mutexes,
  preventing a race exposing the incomplete socket state transition to TCP
  after the TCP state transition has completed.  The check for a socket
  layer state transition is performed by solisten_proto_check(), and the
  actual transition is performed by solisten_proto().

- Holds the socket lock for the duration of the socket state test and set,
  and over the protocol layer state transition, which is now possible as
  the socket lock is acquired by the protocol layer, rather than vice
  versa.  This prevents additional state related races in the socket
  layer.

This permits the dual transition of socket layer and protocol layer state
to occur while holding locks for both layers, making the two changes
atomic with respect to one another.  Similar changes are likely require
elsewhere in the socket/protocol code.

Reported by:		Peter Holm <peter@holm.cc>
Review and fixes from:	emax, Antoine Brodin <antoine.brodin@laposte.net>
Philosophical head nod:	gnn
2005-02-21 21:58:17 +00:00
Robert Watson
c364c823d0 When aborting a UNIX domain socket bind() because VOP_CREATE() failed,
make sure to call vn_finished_write(mp) before returning.

MFC after:	3 days
2005-02-21 14:21:50 +00:00
Robert Watson
892af6b930 style(9)-ize function headers, remove use of 'register'.
MFC after:	3 days
2005-02-20 23:22:13 +00:00
David Schultz
e8ed933099 Remove VFS_START(). Its original purpose involved the mfs filesystem,
which is long gone.

Discussed with:	mckusick
Reviewed by:	phk
2005-02-20 23:02:20 +00:00