1
0
mirror of https://git.FreeBSD.org/src.git synced 2024-12-20 11:11:24 +00:00
Commit Graph

7938 Commits

Author SHA1 Message Date
Marcel Moolenaar
02a20471ab Increase default HZ for ia64 to 1000. 2004-11-08 04:50:02 +00:00
Alan Cox
d3cb0d99e0 Introduce two new options, "CPU private" and "no wait", to sf_buf_alloc().
Change the spelling of the "catch" option to be consistent with the new
options.  Implement the "no wait" option.  An implementation of the "CPU
private" for i386 will be committed at a later date.
2004-11-08 00:43:46 +00:00
Robert Watson
f42a43fa2d Add basic critical section tracing to KTR using event type KTR_CRITICAL.
This generates a KTR event for each critical section entered and exited.

It would be desirable to also log the filename and line number of the
source entering or exiting the critical section, but this requires
hacking up the critical section API, so I've not done that yet.
2004-11-07 23:11:32 +00:00
Poul-Henning Kamp
ef11fbd7c4 Introduce fdclose() which will clean an entry in a filedesc.
Replace homerolled versions with call to fdclose().

Make fdunused() static to kern_descrip.c
2004-11-07 22:16:07 +00:00
Poul-Henning Kamp
b1fa752732 Use fget_locked() instead of homerolled 2004-11-07 16:09:56 +00:00
Poul-Henning Kamp
2f5a40aa3f Move fdinit() related stuff from .h to .c 2004-11-07 15:34:45 +00:00
Poul-Henning Kamp
8ec21e3a68 Allow fdinit() to be called with a NULL fdp argument so we can use
it when setting up init.

Make fdinit() lock the fdp argument as needed.
2004-11-07 12:39:28 +00:00
Nate Lawson
70ce93f4c5 Add comments to clarify why we need to run shutdown code on the BSP, update
an old comment about boot() being MI, and note that splhigh() no longer
disables interrupts.
2004-11-07 06:58:45 +00:00
Poul-Henning Kamp
3b19b5af3a When we open /dev/null for stdin/out/err for safety reasons, do it right:
we should preserve f_data and f_ops if they are already set.
2004-11-06 23:36:09 +00:00
Poul-Henning Kamp
5349c79d75 Properly implement a default version of VOP_GETWRITEMOUNT.
Remove improper access to vop_stdgetwritemount() which should and
will instead rely on the VOP default path.
2004-11-06 11:41:22 +00:00
Poul-Henning Kamp
0c7d0f9639 Increase default HZ for i386 to 1000 2004-11-06 11:33:43 +00:00
Alan Cox
19187819b7 Move a call to wakeup() from vm_object_terminate() to vnode_pager_dealloc()
because this call is only needed to wake threads that slept when they
discovered a dead object connected to a vnode.  To eliminate unnecessary
calls to wakeup() by vnode_pager_dealloc(), introduce a new flag,
OBJ_DISCONNECTWNT.

Reviewed by: tegge@
2004-11-06 05:33:02 +00:00
David Xu
8acf605790 Respect TDF_SINTR, don't suspend uninterruptible thread. 2004-11-05 22:40:33 +00:00
David Xu
64895117a0 Backout previous commit, the P_STOPPED_BOUNDARY flag was already
cleared at the begin of thread_single() when needed.
2004-11-05 22:31:20 +00:00
John Baldwin
bd1d11f5dc - Store threads on sleep queues in FIFO order rather than sorted by
priority.  The sleep queues don't get updated when the priority of
  threads changes, so sleepq_signal() might not always wakeup the
  highest priority thread.  Updating the queues when thread priorities
  change cannot be easily done due to lock orders, so instead we do an
  O(n) walk of the queue for a sleepq_signal() operation instead of O(1).
  On the other hand, adding a thread to a sleep queue now goes from O(n)
  to O(1) so it ends up as an even tradeoff.  The correctness here with
  regards to priorities is actually fairly important.  msleep() gives
  interactive threads their priority "boost" after they are placed on the
  queue, but before this fix that "boost" wasn't used to determine the
  highest priority thread that sleepq_signal() awoke.
- Fix up some comments.

Inspired by:	ups, bde
2004-11-05 20:19:58 +00:00
John Baldwin
0811d60abc - Make setting of IT_ENTROPY a bit simpler in ithread_update().
- Tweak the updating of the ithread name in ithread_update() so that the
  '+' and '*' characters for device names that were too short only get
  added at the end after as many device names as possible were fit into
  the allocated space.  Prior to this, some long devices would result
  in '+' chars showing up between two different devices rather than at the
  end.
2004-11-05 19:11:24 +00:00
Peter Wemm
0de3e7280f Restrict the sched_bind to cpu 0 to i386 and amd64 for now. I forgot that
alpha still doesn't use logical cpu id's.
2004-11-05 19:00:23 +00:00
Peter Wemm
20e25d7de5 Bind to cpu0 for boot() processing. (Note this is reboot, not startup)
This means we'll always call the event hooks, device_shutdown etc on the
BSP and theoretically means we can de-cruftify the cpu_reset_proxy stuff.
2004-11-05 18:29:10 +00:00
Alan Cox
1ac60dbce9 Two changes to vm_pgmoveco():
- Eliminate an initialized but unused variable.
 - Eliminate an unnecessary call to clear the page's PG_BUSY flag.  (The
   call to vm_page_rename() already clears the page's PG_BUSY flag through
   its call to vm_page_remove().)
2004-11-05 06:52:29 +00:00
David Xu
cefe021b6c Don't forget to turn off P_SINGLE_BOUNDARY for thread_single(SINGLE_EXIT),
otherwise a threaded process which calls execv() will hang in kernel and
may can not be killed!
2004-11-04 22:13:16 +00:00
Poul-Henning Kamp
6e67e2a710 Retire b_magic now, we have the bufobj containing the same hint. 2004-11-04 09:48:18 +00:00
Poul-Henning Kamp
9f7a3028d5 Change buf->b_object to buf->b_bufobj->bo_object
some whitespace fixes.
2004-11-04 09:06:54 +00:00
Poul-Henning Kamp
9bc4d9a495 whitespace 2004-11-04 08:25:52 +00:00
Poul-Henning Kamp
c569065139 Remove buf->b_dev field. 2004-11-04 07:59:57 +00:00
John Baldwin
c957c14d05 Revert most of 1.109. Although it improved the situation on one particular
motherboard, in practice the changes resulted in many false positives for
heavy network loads, etc. resulting in poor performance.  Also, the
motherboard referenced in the 1.109 log has other problems and simply does
not seem to work with the APIC enabled even with the changes in 1.109.  The
correct fix for that board seems to be to not use the APIC at all.  One
thing kept from 1.109 is that throttled interrupts are now effectively
polled on every clock tick rather than just 10 times per second.

MFC after:	1 month
Tested by:	Shunsuke SHINOMIYA shino at fornext dot org
2004-11-03 22:11:20 +00:00
Poul-Henning Kamp
e0b687d33b Always initialize bo_private along with bo_ops in getnewvnode().
Spotted by:	tegge
2004-11-03 21:09:23 +00:00
Alan Cox
d19ef81437 The synchronization provided by vm object locking has eliminated the
need for most calls to vm_page_busy().  Specifically, most calls to
vm_page_busy() occur immediately prior to a call to vm_page_remove().
In such cases, the containing vm object is locked across both calls.
Consequently, the setting of the vm page's PG_BUSY flag is not even
visible to other threads that are following the synchronization
protocol.

This change (1) eliminates the calls to vm_page_busy() that
immediately precede a call to vm_page_remove() or functions, such as
vm_page_free() and vm_page_rename(), that call it and (2) relaxes the
requirement in vm_page_remove() that the vm page's PG_BUSY flag is
set.  Now, the vm page's PG_BUSY flag is set only when the vm object
lock is released while the vm page is still in transition.  Typically,
this is when it is undergoing I/O.
2004-11-03 20:17:31 +00:00
Poul-Henning Kamp
51f83da622 Restore TTYDEF_LFLAG to set echo bits. 2004-11-03 19:16:55 +00:00
Poul-Henning Kamp
f71a47143f Don't print the singularly unhelpful message:
unknown: not probled (disabled)

During verbose boot.
2004-11-03 09:06:45 +00:00
Robert Watson
aae2782bff Acquire the accept mutex in soabort() before calling sotryfree(), as
that is now required.

RELENG_5_3 candidate.

Foot provided by:	Dikshie <dikshie at ppk dot itb dot ac dot id>
2004-11-02 17:15:13 +00:00
John Baldwin
d39d4a6e64 - Change the ddb paging "support" to use a variable (db_lines_per_page) to
control the number of lines per page rather than a constant.  The variable
  can be examined and changed in ddb as '$lines'.  Setting the variable to
  0 will effectively turn off paging.
- Change db_putchar() to force out pending whitespace before outputting
  newlines and carriage returns so that one can rub out content on the
  current line via '\r     \r' type strings.
- Change the simple pager to rub out the --More-- prompt explicitly when
  the routine exits.
- Add some aliases to the simple pager to make it more compatible with
  more(1): 'e' and 'j' do a single line.  'd' does half a page, and
  'f' does a full page.

MFC after:	1 month
Inspired by:	kris
2004-11-01 22:15:15 +00:00
Dag-Erling Smørgrav
b0e1e474f7 Add TUNABLE_LONG and TUNABLE_ULONG, and use the latter for the
hw.pci.host_mem_start tunable.  Add comments to TUNABLE_INT and
TUNABLE_QUAD recommending against their use.

MFC after:	3 weeks
2004-10-31 15:50:33 +00:00
Pawel Jakub Dawidek
7579614b6d Don't treat # as a comment in interpreter specification line.
This is magic and no other operating system do so (i.e. Solaris, Tru64,
Linux, AIX, HP-UX, Irix, MacOS X, NetBSD).

Discussed on:	current@
Reported by:	S³awek ¯ak <zaks@prioris.mini.pw.edu.pl>
2004-10-31 11:12:59 +00:00
Robert Watson
1e4cadcb14 Disable use of synchronization early in the boot by the MAC Framework;
for modules linked into the kernel or loaded very early, panics will
result otherwise, as the CV code it calls will panic due to its use
of a mutex before it is initialized.
2004-10-30 14:20:59 +00:00
Jeff Roberson
0516c8dd4a - When choosing a thread on the run queue, check to see if its nice is
outside of the nice threshold due to a recently awoken thread with a
   lower nice value.  This further reduces the amount of time a positively
   niced thread gets while running in conjunction with a workload that has
   many short sleeps (ie buildworld).
2004-10-30 12:19:15 +00:00
Jeff Roberson
6bd0c7fd53 - In sched_prio() check to see if the kse is assigned to a runq as the
check for TD_ON_RUNQ() no longer means the thread is really on a run-
   queue.  I suspect this state should be re-evaluated as it must mean
   something else now.  This fixes ULE+KSE+PREEMPTION on UP x86.
2004-10-30 07:35:53 +00:00
Alfred Perlstein
90d75f785a Allow kill -9 to kill processes stuck in procfs STOPEVENTs. 2004-10-30 02:56:22 +00:00
Poul-Henning Kamp
996b2c82ca Loose vfs_mountedon() 2004-10-29 11:15:08 +00:00
Poul-Henning Kamp
c108bb741c Remove VOP_SPECSTRATEGY() from the system. 2004-10-29 10:59:28 +00:00
Poul-Henning Kamp
0cbda9dfd5 Remove the last call in the system to VOP_SPECSTRATEGY(): We can no
longer come through the VNODE layer to the disks since all the filesystems
now go via geom_vfs to GEOM.
2004-10-29 10:52:31 +00:00
Poul-Henning Kamp
e1f355fe4e Give the bufobj a private __bo_vnode for now to keep the syncer floating [1]
At some point later the syncer will unlearn about vnodes and the filesystems
method called by the syncer will know enough about what's in bo_private to
do the right thing.

[1] Ok, I know, but I couldn't resist the pun.
2004-10-29 09:33:32 +00:00
Alfred Perlstein
cd71c41476 Backout 1.291.
re doesn't seem to think this fixes:
  Desired features for 5.3-RELEASE "More truss problems"
2004-10-29 08:24:41 +00:00
Poul-Henning Kamp
6afb3b1c37 Give dev_strategy() an explict cdev argument in preparation for removing
buf->b-dev.

Put a bio between the buf passed to dev_strategy() and the device driver
strategy routine in order to not clobber fields in the buf.

Assert copyright on vfs_bio.c and update copyright message to canonical
text.  There is no legal difference between John Dysons two-clause
abbreviated BSD license and the canonical text.
2004-10-29 07:16:37 +00:00
Poul-Henning Kamp
c5995e45eb Lock bp->b_bufobj->b_object instead of bp->b_object 2004-10-28 08:38:46 +00:00
Robert Watson
df970488b3 Move the 'debug' sysctl tree under options SYSCTL_DEBUG. It generates
an inordinate amount of synchronous console output that is fairly
undesirable on slower serial console.  It's easily hit by accident
when frobbing other sysctls late at night.
2004-10-27 19:26:01 +00:00
Poul-Henning Kamp
20eba72f53 Move the syncer linkage from vnode to bufobj.
This is not quite a perfect separation: the syncer still think it knows
that everything is a vnode.
2004-10-27 08:05:02 +00:00
Poul-Henning Kamp
5b285effb0 Eliminate unnecessary KASSERT.
Eliminate a printf which would never tell us anything anyway because the
KASSERT would have triggered.
2004-10-27 06:47:00 +00:00
Poul-Henning Kamp
f6b855f60b Avoid using bp->b_vp when we already have the vnode by other means. 2004-10-27 06:45:52 +00:00
Maxim Konovalov
2899faf96b Fix a typo in a comparison appeared in rev. 1.125.
Submitted by:	JINMEI Tatuya
2004-10-27 05:37:58 +00:00
Alan Cox
df9f17c3df Synchronize access to the vm page's PG_BUSY flag using the containing vm
object's lock.  In the same place, eliminate unnecessary checks for a NULL
vm object pointer.
2004-10-27 02:05:00 +00:00
Poul-Henning Kamp
6e77a04170 The island council met and voted buf_prewrite() home.
Give ffs it's own bufobj->bo_ops vector and create a private strategy
routine, (currently misnamed for forwards compatibility), which is
just a copy of the generic bufstrategy routine except we call
softdep_disk_prewrite() directly instead of through the buf_prewrite()
indirection.

Teach UFS about the need for softdep_disk_prewrite() and call the
function directly in FFS.

Remove buf_prewrite() from the default bufstrategy() and from the
global bio_ops method vector.
2004-10-26 10:44:10 +00:00
Poul-Henning Kamp
5d9d81e7ea Put the I/O block size in bufobj->bo_bsize.
We keep si_bsize_phys around for now as that is the simplest way to pull
the number out of disk device drivers in devfs_open().  The correct solution
would be to do an ioctl(DIOCGSECTORSIZE), but the point is probably mooth
when filesystems sit on GEOM, so don't bother for now.
2004-10-26 07:39:12 +00:00
Alan Cox
cd9c0da805 Hold the lock on the containing vm object when calling
vm_page_sleep_if_busy().
2004-10-26 06:58:26 +00:00
Poul-Henning Kamp
9b7cc97f6c Remove unused si_bsize_best field from struct cdev. 2004-10-26 06:53:00 +00:00
Poul-Henning Kamp
1a1b280063 Get rid of the magic "stash" of cdev structures, we no longer call
make_dev() before malloc works.
2004-10-25 13:12:06 +00:00
Poul-Henning Kamp
e4fea39e9e Add delete_unrhdr() function.
It will fail fatally if all allocated numbers have not been returned first.
2004-10-25 12:27:03 +00:00
Poul-Henning Kamp
156cb26583 Loose the v_dirty* and v_clean* alias macros.
Check the count field where we just want to know the full/empty state,
rather than using TAILQ_EMPTY() or TAILQ_FIRST().
2004-10-25 09:14:03 +00:00
Poul-Henning Kamp
ee1d0eb330 Remove vnode->v_bsize. This was a dead-end. 2004-10-25 07:50:59 +00:00
Alan Cox
a50b705403 Use VM_ALLOC_NOBUSY to eliminate vm_page_wakeup() calls and the acquisition
and release of the global page queues lock required to make the call.

Remove GIANT_REQUIRED from vm_hold_free_pages().  All of its VM operations
are properly synchronized.
2004-10-25 06:34:14 +00:00
Poul-Henning Kamp
4dcd0ac4cf Collapse vnode->v_object and buf->b_object into bufobj->bo_object. 2004-10-25 06:02:57 +00:00
Robert Watson
ae1e5c5d8d Move from using the socket reference count to the file reference
count to prevent sockets from being garbage collected during
socket-specific system calls.  This is the same approach used in
most VFS-specific system calls, as well as generic file descriptor
system calls such as read() and write().

To do this, add a utility function getsock(), which is logically
identical to getvnode() used for the same purpose in VFS.  Unlike
fgetsock(), it returns with the file reference count elevated, but
no bump of the socket reference count.  Replace matching calls to
fputsock() with fdrop().

This change is made to all socket system calls other than
sendfile() and accept(), but the approach should be applicable to
those system calls also.

This shaves about four mutex operations off of each of these
system calls, including send() and recv() variants, adding about
1% to pps on minimal UDP packets for UP using netblast, and 4% on
SMP.

Reviewed by:	pjd
2004-10-24 23:45:01 +00:00
Alan Cox
01ad40dac5 Use VM_ALLOC_NOBUSY instead of calling vm_page_wakeup(). 2004-10-24 20:09:59 +00:00
Poul-Henning Kamp
b792bebeea Move the buffer method vector (buf->b_op) to the bufobj.
Extend it with a strategy method.

Add bufstrategy() which do the usual VOP_SPECSTRATEGY/VOP_STRATEGY
song and dance.

Rename ibwrite to bufwrite().

Move the two NFS buf_ops to more sensible places, add bufstrategy
to them.

Add inlines for bwrite() and bstrategy() which calls through
buf->b_bufobj->b_ops->b_{write,strategy}().

Replace almost all VOP_STRATEGY()/VOP_SPECSTRATEGY() calls with bstrategy().
2004-10-24 20:03:41 +00:00
Poul-Henning Kamp
9197ce2ee5 Add a new per-thread private flag: TDP_GEOM.
This flag gets set whenever the thread posts an event on the GEOM
event queue, and if the flag is set when the thread is prepared
to return to userland from the kernel, g_waitidle() will be called
to make sure that the posted events have completed.

This can replace an insufficient number of g_waitidle() calls in
various other places, and has the advantage of being failsafe:  Any
system call which does a VOP_OPEN()/VOP_CLOSE will now correctly
wait for any geom events it posted as part of spoils or tastes.

Assert that topology and Giant is not held in g_waitidle().
2004-10-23 20:49:17 +00:00
Poul-Henning Kamp
186e51cb30 Drop Giant around the call to g_waitidle().
This is necessary to allow any geom events which need it to pick up Giant.
2004-10-23 20:21:05 +00:00
Robert Watson
299b4e7fa6 Rebuild from syscalls.master:1.178. 2004-10-23 20:01:32 +00:00
Robert Watson
3e8c244949 Add system call place-holders for the following system calls
implementing Sun's BSM Audit API on FreeBSD:

  audit()
  auditon()
  getauid()
  setauid()
  getaudit()
  setaudit()
  getaudit_addr()
  setaudit_addr()
  auditctl()

Submitted by:	Wayne Salamon <wsalamon at computer dot org>
Obtained from:	TrustedBSD Project
2004-10-23 20:00:43 +00:00
Andre Oppermann
3a82a5451c socreate() does an early abort if either the protocol cannot be found,
or pru_attach is NULL.  With loadable protocols the SPACER dummy protocols
have valid function pointers for all methods to functions returning just
EOPNOTSUPP.  Thus the early abort check would not detect immediately that
attach is not supported for this protocol.  Instead it would correctly
get the EOPNOTSUPP error later on when it calls the protocol specific
attach function.

Add testing against the pru_attach_notsupp() function pointer to the
early abort check as well.
2004-10-23 19:06:43 +00:00
Andre Oppermann
480fa3f985 Aquire GIANT in pf_proto_[un]register() before manipulating the protosw. 2004-10-23 18:52:06 +00:00
David Xu
c283653201 Remove P_STOPPED_TRACE bit if debugger dies without a chance to
detach debugged process.
2004-10-23 11:20:26 +00:00
Robert Watson
857a600580 Add an annotation to the comment for sysv_ipc.c to indicate that the
MAC Framework doesn't require checks in ipcperm() because checks
relating to System V IPC will be performed in individual IPC
implementations.
2004-10-22 12:12:40 +00:00
Robert Watson
397b3428eb In osethostname(), don't need to call suser() directly as
userland_sysctl() will perform all necessary privilege checks for
the caller.
2004-10-22 12:10:50 +00:00
Robert Watson
b0e86f6ac2 When MAC is enabled, warn if getnewvnode() is asked to produce a vnode
without a mountpoint.  In this scenario, there's no useful source for
a label on the vnode, since we can't query the mountpoint for the
labeling strategy or default label.
2004-10-22 11:04:58 +00:00
Poul-Henning Kamp
ff7c5a4880 Alas, poor SPECFS! -- I knew him, Horatio; A filesystem of infinite
jest, of most excellent fancy: he hath taught me lessons a thousand
times; and now, how abhorred in my imagination it is! my gorge rises
at it.  Here were those hacks that I have curs'd I know not how
oft.  Where be your kludges now? your workarounds? your layering
violations, that were wont to set the table on a roar?

Move the skeleton of specfs into devfs where it now belongs and
bury the rest.
2004-10-22 09:59:37 +00:00
Poul-Henning Kamp
494eb176e7 Add b_bufobj to struct buf which eventually will eliminate the need for b_vp.
Initialize b_bufobj for all buffers.

Make incore() and gbincore() take a bufobj instead of a vnode.

Make inmem() local to vfs_bio.c

Change a lot of VI_[UN]LOCK(bp->b_vp) to BO_[UN]LOCK(bp->b_bufobj)
also VI_MTX() to BO_MTX(),

Make buf_vlist_add() take a bufobj instead of a vnode.

Eliminate other uses of bp->b_vp where bp->b_bufobj will do.

Various minor polishing: remove "register", turn panic into KASSERT,
use new function declarations, TAILQ_FOREACH_SAFE() etc.
2004-10-22 08:47:20 +00:00
Poul-Henning Kamp
a76d8f4ec9 Move the VI_BWAIT flag into no bo_flag element of bufobj and call it BO_WWAIT
Add bufobj_wref(), bufobj_wdrop() and bufobj_wwait() to handle the write
count on a bufobj.  Bufobj_wdrop() replaces vwakeup().

Use these functions all relevant places except in ffs_softdep.c where
the use if interlocked_sleep() makes this impossible.

Rename b_vnbufs to b_bobufs now that we touch all the relevant files anyway.
2004-10-21 15:53:54 +00:00
Poul-Henning Kamp
1bca607b9f Add BO_* macros parallel to VI_* macros for manipulating the bo_mtx.
Initialize the bo_mtx when we allocate a vnode i getnewvnode() For
now we point to the vnodes interlock mutex, that retains the exact
same locking sematics.

Move v_numoutput from vnode to bufobj.  Add renaming macro to
postpone code sweep.
2004-10-21 14:42:31 +00:00
Poul-Henning Kamp
67647b2312 Polish vtruncbuf() to improve readability and style a bit. 2004-10-21 14:13:54 +00:00
Poul-Henning Kamp
e163395619 Simplify buf_vlist_remove().
Now that we have encapsulated the splaytree related information
into a structure we can eliminate the half of this function.
2004-10-21 13:48:50 +00:00
Stephan Uphoff
f742a1edcd Zero terminate empty sting in kdb_sysctl_available.
Approved by:    sam (mentor)
MFC after: 1 week
2004-10-21 01:11:25 +00:00
Alan Cox
0f777d7d9b Modify the vm object locking in do_sendfile() so that the containing object
is locked when vm_page_io_finish() is called on a page.  This is to satisfy
a new, post-RELENG_5 assertion in vm_page_io_finish().  (I am in the
process of transitioning the responsibility for synchronizing access to
various fields/flags on the page from the global page queues lock to the
per-object lock.)

Tripped over by: obrien@
2004-10-20 17:44:40 +00:00
Andre Oppermann
312c75c362 Support for dynamically loadable and unloadable protocols within existing protocol
families.

The protosw[] array of any particular protocol family ("domain") is of fixed size
defined at compile time.  This made it impossible to dynamically add or remove any
protocols to or from it.  We work around this by introducing so called SPACER's
which are embedded into the protosw[] array at compile time.  The SPACER's have
a special protocol number (32767) to indicate the fact that they are SPACER's but
are otherwise NULL.  Only as many protocols can be dynamically loaded as SPACER's
are provided in the protosw[] structure.

The pr_usrreqs structure is treated more special and contains pointers to dummy
functions only returning EOPNOTSUPP.  This is needed because the use of those
functions pointers is usually not checked within the kernel because until now it
was assumed to be a valid function pointer.  Instead of fixing all potential
callers we just return a proper error code.

Two new functions provide a clean API to register and unregister a protocol.  The
register function expects a pointer to a valid and complete struct protosw including
a pointer to struct pru_usrreqs provided by the caller.  Upon successful registration
the pr_init() function will be called to finish initialization of the protocol.  The
unregister function restores the SPACER in place of the protocol again.  It is the
responseability of the caller to ensure proper closing of all sockets and freeing
of memory allocation by the unloading protocol.

 sys/protosw.h

  o Define generic PROTO_SPACER to be 32767
  o Prototypes for all pru_*_notsupp() functions
  o Prototypes for pf_proto_[un]register() functions

 kern/uipc_domain.c

  o Global struct pr_usrreqs nousrreqs containing valid pointers to the
    pru_*_notsupp() functions
  o New functions pf_proto_[un]register()

 kern/uipc_socket2.c

  o New functions bodies for all pru_*_notsupp() functions
2004-10-19 15:13:30 +00:00
Robert Watson
81158452be Push acquisition of the accept mutex out of sofree() into the caller
(sorele()/sotryfree()):

- This permits the caller to acquire the accept mutex before the socket
  mutex, avoiding sofree() having to drop the socket mutex and re-order,
  which could lead to races permitting more than one thread to enter
  sofree() after a socket is ready to be free'd.

- This also covers clearing of the so_pcb weak socket reference from
  the protocol to the socket, preventing races in clearing and
  evaluation of the reference such that sofree() might be called more
  than once on the same socket.

This appears to close a race I was able to easily trigger by repeatedly
opening and resetting TCP connections to a host, in which the
tcp_close() code called as a result of the RST raced with the close()
of the accepted socket in the user process resulting in simultaneous
attempts to de-allocate the same socket.  The new locking increases
the overhead for operations that may potentially free the socket, so we
will want to revise the synchronization strategy here as we normalize
the reference counting model for sockets.  The use of the accept mutex
in freeing of sockets that are not listen sockets is primarily
motivated by the potential need to remove the socket from the
incomplete connection queue on its parent (listen) socket, so cleaning
up the reference model here may allow us to substantially weaken the
synchronization requirements.

RELENG_5_3 candidate.

MFC after:	3 days
Reviewed by:	dwhite
Discussed with:	gnn, dwhite, green
Reported by:	Marc UBM Bocklet <ubm at u-boot-man dot de>
Reported by:	Vlad <marchenko at gmail dot com>
2004-10-18 22:19:43 +00:00
Poul-Henning Kamp
95bc568977 Add new function ttyinitmode() which sets our systemwide default
modes on a tty structure.

Both the ".init" and the current settings are initialized allowing
the function to be used both at attach and open time.

The function takes an argument to decide if echoing should be enabled.
Echoing should not be enabled for regular physical serial ports
unless they are consoles, in which case they should be configured
by ttyconsolemode() instead.

Use the new function throughout.
2004-10-18 21:51:27 +00:00
Scott Long
b96741f410 If a process needs to be swapped in, wakeup the swapper from within
critical_exit as the process is getting scheduled to run.  This is subotimal
but for now avoid the LOR between the scheduler and the sleepq systems.
This is a 5.3 candidate.

Submitted by: davidxu
MFC After: 3 days
2004-10-16 06:38:22 +00:00
Poul-Henning Kamp
33da4e5bd8 Make pty's always come up in echo mode. 2004-10-15 09:03:07 +00:00
Poul-Henning Kamp
fffc55152b Add missing chunk of code to enforce the lock-bits of termios.
This solves the problem where serial consoles suddenly required
DCD to be asserted.

Reported by:	Randy Bush <randy@psg.com>
2004-10-14 18:30:24 +00:00
Nate Lawson
66ae9f6384 Update flags patch for the !ISA case.
* Get flags first, in case there is no devclass.
* Reset flags after each probe in case the next driver has no hints so it
  doesn't inherit the old ones.
* Set them again before the winning probe.

Tested ok both with and without ACPI for ISA device flags.

Reviewed by:	imp
MFC after:	1 day
2004-10-14 17:14:56 +00:00
John-Mark Gurney
583ef6b6d2 /me gets the wrong patch out of the pr :(
/me had the write patch w/o comments on his test system.

Pointed out by:	kuriyama and ache
Pointy hat to:	jmg
2004-10-14 03:26:50 +00:00
Stephan Uphoff
7c71b6453a Fix maybe_preempt_in_ksegrp for !SMP.
Tested   by: tegge
Reviewed by: julian
Approved by: sam (mentor)
MFC after: 3 days
2004-10-13 22:07:04 +00:00
John-Mark Gurney
d46316e8f9 fix a bug where signal events didn't set the flags for attach/detach..
PR:		72234
MFC after:	2 days
2004-10-13 20:55:19 +00:00
Nate Lawson
6f857c4b9f Set flags for devices before probing them. In the non-ISA case, flags set
via hints were not getting passed to the child.

PR:		kern/72489
MFC after:	1 day
2004-10-13 07:10:41 +00:00
Poul-Henning Kamp
43c72732aa Don't call driver close unless we have one. 2004-10-12 21:40:41 +00:00
Poul-Henning Kamp
13e7430fde Make !SMP kernels compile, and as far as I can tell, work again. 2004-10-12 20:57:37 +00:00
John Baldwin
ebcfea8764 Whitespace fix. 2004-10-12 19:36:00 +00:00
John Baldwin
2ff0e645d1 Refine the turnstile and sleep queue interfaces just a bit:
- Add a new _lock() call to each API that locks the associated chain lock
  for a lock_object pointer or wait channel.  The _lookup() functions now
  require that the chain lock be locked via _lock() when they are called.
- Change sleepq_add(), turnstile_wait() and turnstile_claim() to lookup
  the associated queue structure internally via _lookup() rather than
  accepting a pointer from the caller.  For turnstiles, this means that
  the actual lookup of the turnstile in the hash table is only done when
  the thread actually blocks rather than being done on each loop iteration
  in _mtx_lock_sleep().  For sleep queues, this means that sleepq_lookup()
  is no longer used outside of the sleep queue code except to implement an
  assertion in cv_destroy().
- Change sleepq_broadcast() and sleepq_signal() to require that the chain
  lock is already required.  For condition variables, this lets the
  cv_broadcast() and cv_signal() functions lock the sleep queue chain lock
  while testing the waiters count.  This means that the waiters count
  internal to condition variables is no longer protected by the interlock
  mutex and cv_broadcast() and cv_signal() now no longer require that the
  interlock be held when they are called.  This lets consumers of condition
  variables drop the lock before waking other threads which can result in
  fewer context switches.

MFC after:	1 month
2004-10-12 18:36:20 +00:00
John Baldwin
c7836018ea Add a WITNESS_WARN() to uiomove() to whine if locks are held when this
function is called.

MFC after:	1 month
2004-10-12 18:27:14 +00:00
Stephan Uphoff
c6a08cf2d7 Directly modifying the priority of a thread that may be on the runqueue
can break the sorting order of the ksegp run queue.

Tested   by: pho
Reviewed by: jhb, julian
Approved by: sam (mentor)
MFC: ASAP
2004-10-12 16:31:23 +00:00
Stephan Uphoff
84f9d4b137 Prevent preemption in slot_fill.
Implement preemption between threads in the same ksegp in out of slot
situations to prevent priority inversion.

Tested   by: pho
Reviewed by: jhb, julian
Approved by: sam (mentor)
MFC: ASAP
2004-10-12 16:30:20 +00:00
Stephan Uphoff
b9a80acadb Force MUTEX_WAKE_ALL.
A race condition in single thread wakeup may break priority inheritance.

Tested   by: pho
Reviewed by: jhb,julian
Approved by: sam (mentor)
MFC: ASAP
2004-10-12 16:28:18 +00:00
Poul-Henning Kamp
a1bd71b260 Add missing zero flag arguments to calls to userland_sysctl() 2004-10-12 07:49:15 +00:00
Peter Wemm
a7bc3102c4 Put on my peril sensitive sunglasses and add a flags field to the internal
sysctl routines and state.  Add some code to use it for signalling the need
to downconvert a data structure to 32 bits on a 64 bit OS when requested by
a 32 bit app.

I tried to do this in a generic abi wrapper that intercepted the sysctl
oid's, or looked up the format string etc, but it was a real can of worms
that turned into a fragile mess before I even got it partially working.

With this, we can now run 'sysctl -a' on a 32 bit sysctl binary and have
it not abort.  Things like netstat, ps, etc have a long way to go.

This also fixes a bug in the kern.ps_strings and kern.usrstack hacks.
These do matter very much because they are used by libc_r and other things.
2004-10-11 22:04:16 +00:00
Gleb Smirnoff
366538f251 Rename _m_tag_free() to m_tag_free_default() and make it non-static.
Approved by:	sam
2004-10-11 18:40:19 +00:00
Robert Watson
cc34aa2094 Add entropy harvest mutex to hard-coded spin lock witness lock order,
remove previous entropy harvesting mutex names as they are no longer
present.  Commit to this file was ommitted when randomdev_soft.c:1.5
was made.

Feet shot:	Robert Huff <roberthuff at rcn dot com>
2004-10-11 08:26:18 +00:00
Robert Watson
35b260cd69 Rework sofree() logic to take into account a possible race with accept().
Sockets in the listen queues have reference counts of 0, so if the
protocol decides to disconnect the pcb and try to free the socket, this
triggered a race with accept() wherein accept() would bump the reference
count before sofree() had removed the socket from the listen queues,
resulting in a panic in sofree() when it discovered it was freeing a
referenced socket.  This might happen if a RST came in prior to accept()
on a TCP connection.

The fix is two-fold: to expand the coverage of the accept mutex earlier
in sofree() to prevent accept() from grabbing the socket after the "is it
really safe to free" tests, and to expand the logic of the "is it really
safe to free" tests to check that the refcount is still 0 (i.e., we
didn't race).

RELENG_5 candidate.

Much discussion with and work by:	green
Reported by:	Marc UBM Bocklet <ubm at u-boot-man dot de>
Reported by:	Vlad <marchenko at gmail dot com>
2004-10-11 08:11:26 +00:00
Gleb Smirnoff
8c4a75be4a Revert last commit since it breaks API.
Requested by:	sam
2004-10-10 09:16:48 +00:00
Julian Elischer
042b7b1af0 Don't release the slot twice.. sched_rem() has already done it.
Submitted by:	stephan uphoff (ups at tree dot com)
MFC after:	3 days
2004-10-10 05:19:22 +00:00
Julian Elischer
9b036bdf5a Remove duplicate line. 2004-10-10 05:07:43 +00:00
Gleb Smirnoff
42c5607501 Remove inlined m_tag_free(). Rename _m_tag_free() to m_tag_free()
and make it visible (same way as in OpenBSD). Describe usage in manpage.

This change is useful for creating custom free methods, which
call default free method at their end.

While here, make malloc declaration for mbuf tags more informative.

Approved by:	julian (mentor), sam
MFC after:	1 month
2004-10-09 13:25:19 +00:00
Brian Feldman
41f57cbc8d Don't "implicitly order all sleep locks before spin locks" in witness
when the spin lock in question isn't -- it's the critical_enter() that
KDB set.  No more panic in DDB for console -> syscons -> tty -> knote
operations.
2004-10-09 08:16:37 +00:00
David Xu
84e0b075f6 Add an execve command for kse_thr_interrupt to allow libpthread to
restore signal mask correctly, this is required by POSIX.

Reviewed by: deischen
2004-10-07 13:50:10 +00:00
David Xu
ebfcca3d61 Regen to unbreak world.
Pointy hat to: mtm
2004-10-07 01:09:46 +00:00
David Schultz
cda5aba4b9 Back out rev 1.240; it is unnecessary. In particular,
p1 == curthread, so _PHOLD(p1) will not have to block
to swap in p1.

Noticed by:	jhb
2004-10-06 23:53:49 +00:00
Mike Makonnen
401901ac43 Close a race between a thread exiting and the freeing of it's stack.
After some discussion the best option seems to be to signal the thread's
death from within the kernel. This requires that thr_exit() take an
argument.

Discussed with: davidxu, deischen, marcel
MFC after: 3 days
2004-10-06 14:23:00 +00:00
David Xu
195f5806e4 Close a race between thr_create and sysctl -w, the thr_scope_sys could
be changed when thr_create is running, and we tested it for several times.
2004-10-06 02:29:19 +00:00
Greg Lehey
57259f2864 vtryrecycle: Don't rely on type VBAD alone to mean that we don't need
to clean the vnode.  If v_data is set, we still need to
	     clean it.  This code change should catch all incidents of
	     the previous commit (INVARIANTS only).
2004-10-06 02:09:59 +00:00
Greg Lehey
f2154b33d2 getnewvnode: Weaken the panic "cleaned vnode isn't" to a warning.
Discussion: this panic (or waning) only occurs when the kernel is
  compiled with INVARIANTS.  Otherwise the problem (which means that
  the vp->v_data field isn't NULL, and represents a coding error and
  possibly a memory leak) is silently ignored by setting it to NULL
  later on.

  Panicking here isn't very helpful: by this time, we can only find
  the symptoms.  The panic occurs long after the reason for "not
  cleaning" has been forgotten; in the case in point, it was the
  result of severe file system corruption which left the v_type field
  set to VBAD.  That issue will be addressed by a separate commit.
2004-10-06 02:06:11 +00:00
David Xu
e0cfeb44a8 Restore some code removed in revision 1.193 and 1.194, julian said
he'd like to keep these code.
2004-10-06 00:49:41 +00:00
David Xu
906ac69d08 In original kern_execve() code, at the start of the function, it forces
all other threads to suicide, problem is execve() could be failed, and
a failed execve() would change threaded process to unthreaded, this side
effect is unexpected.
The new code introduces a new single threading mode SINGLE_BOUNDARY, in
the mode, all threads should suspend themself at user boundary except
the singler. we can not use SINGLE_NO_EXIT because we want to start from
a clean state if execve() is successful, suspending other threads at unknown
point and later resuming them from there and forcing them to exit at user
boundary may cause the process to start from a dirty state. If execve() is
successful, current thread upgrades to SINGLE_EXIT mode and forces other
threads to suicide at user boundary, otherwise, other threads will be resumed
and their interrupted syscall will be restarted.

Reviewed by: julian
2004-10-06 00:40:41 +00:00
Julian Elischer
f8135176c9 Fix whitespace botch that only showed up in the commit message diff :-/
MFC after:	4 days
2004-10-05 22:14:02 +00:00
Julian Elischer
fcb7c67b7b Slight cleanup in the single threading code.
MFC after:	4 days
2004-10-05 22:05:25 +00:00
Julian Elischer
c20c691bed When preempting a thread, put it back on the HEAD of its run queue.
(Only really implemented in 4bsd)

MFC after:	4 days
2004-10-05 22:03:10 +00:00
Julian Elischer
c5c3fb335f Oops. left out part of the diff.
MFC after:	4 days
2004-10-05 21:26:27 +00:00
Julian Elischer
d39063f20d Use some macros to trach available scheduler slots to allow
easier debugging.

MFC after:	4 days
2004-10-05 21:10:44 +00:00
Julian Elischer
6f23adbc11 light rearrangement of some code to get some locking
more correct

MFC after:	4 days
2004-10-05 20:48:16 +00:00
Julian Elischer
e5bedcef92 Break out to a separate function, the code to revert a multithreaded
process back to officially being a non-threaded program.

MFC after:	4 days
2004-10-05 20:39:26 +00:00
John Baldwin
78c85e8dfc Rework how we store process times in the kernel such that we always store
the raw values including for child process statistics and only compute the
system and user timevals on demand.

- Fix the various kern_wait() syscall wrappers to only pass in a rusage
  pointer if they are going to use the result.
- Add a kern_getrusage() function for the ABI syscalls to use so that they
  don't have to play stackgap games to call getrusage().
- Fix the svr4_sys_times() syscall to just call calcru() to calculate the
  times it needs rather than calling getrusage() twice with associated
  stackgap, etc.
- Add a new rusage_ext structure to store raw time stats such as tick counts
  for user, system, and interrupt time as well as a bintime of the total
  runtime.  A new p_rux field in struct proc replaces the same inline fields
  from struct proc (i.e. p_[isu]ticks, p_[isu]u, and p_runtime).  A new p_crux
  field in struct proc contains the "raw" child time usage statistics.
  ruadd() has been changed to handle adding the associated rusage_ext
  structures as well as the values in rusage.  Effectively, the values in
  rusage_ext replace the ru_utime and ru_stime values in struct rusage.  These
  two fields in struct rusage are no longer used in the kernel.
- calcru() has been split into a static worker function calcru1() that
  calculates appropriate timevals for user and system time as well as updating
  the rux_[isu]u fields of a passed in rusage_ext structure.  calcru() uses a
  copy of the process' p_rux structure to compute the timevals after updating
  the runtime appropriately if any of the threads in that process are
  currently executing.  It also now only locks sched_lock internally while
  doing the rux_runtime fixup.  calcru() now only requires the caller to
  hold the proc lock and calcru1() only requires the proc lock internally.
  calcru() also no longer allows callers to ask for an interrupt timeval
  since none of them actually did.
- calcru() now correctly handles threads executing on other CPUs.
- A new calccru() function computes the child system and user timevals by
  calling calcru1() on p_crux.  Note that this means that any code that wants
  child times must now call this function rather than reading from p_cru
  directly.  This function also requires the proc lock.
- This finishes the locking for rusage and friends so some of the Giant locks
  in exit1() and kern_wait() are now gone.
- The locking in ttyinfo() has been tweaked so that a shared lock of the
  proctree lock is used to protect the process group rather than the process
  group lock.  By holding this lock until the end of the function we now
  ensure that the process/thread that we pick to dump info about will no
  longer vanish while we are trying to output its info to the console.

Submitted by:	bde (mostly)
MFC after:	1 month
2004-10-05 18:51:11 +00:00
John Baldwin
b85975277e Add a critical section in turnstile_unpend() from before dropping the
turnstile chain lock until after making all the awakened threads
runnable.  First, this fixes a priority inversion race.  Second, this
attempts to finish waking up all of the threads waiting on a turnstile
before doing a preemption.

Reviewed by:	Stephan Uphoff (who found the priority inversion race)
2004-10-05 18:00:30 +00:00
Pawel Jakub Dawidek
8d02a378aa Back out changes which were introduced to delay mounting root file system.
Those changes were made on gmirror needs, but now gmirror handles this
by itself.
2004-10-05 11:26:43 +00:00
David Xu
b3a4fb14b3 Use scheduler api to adjust thread priority. 2004-10-05 09:10:30 +00:00
Warner Losh
14889b4229 Add taskqueue_drain. This waits for the specified task to finish, if
running, or returns.  The calling program is responsible for making sure
that nothing new is enqueued.

# man page coming soon.
2004-10-05 04:16:01 +00:00
Poul-Henning Kamp
37abb77f25 Change the perfectly precise message
printf("No buffers busy after final sync");
to
       printf("All buffers synced.");
in order to not leave the users wondering if there should be.
2004-10-04 13:13:23 +00:00
Julian Elischer
c233d032d2 Another case where we need to guard against a partially
constructed process.

Submitted by: Stephan Uphoff ( ups at tree.com	)
MFC after:	3 days
2004-10-04 06:45:48 +00:00
Julian Elischer
a9b5dc7d6d Always strt out with an initilalised ksegrp structure.
MFC after:	3 days
2004-10-03 20:06:11 +00:00
David Xu
482d099c50 Don't bother to turn off other P_STOPPED bits for SIGKILL, doing
so would cause kernel to produce an unkillable process in some cases,
especially, P_STOPPED_SINGLE has a singling thread, turning off the
bit would mess the state.
2004-10-03 13:23:49 +00:00
Alan Cox
86dac448f2 Add a SOCKBUF_LOCK() to a rarely executed path in do_sendfile(). 2004-10-02 05:37:47 +00:00
Alfred Perlstein
50434413b0 Clear a process's procfs trace points upon delivery of SIGKILL.
MT5 candidate. (Desired features for 5.3-RELEASE "More truss problems")
2004-10-01 14:15:20 +00:00
Poul-Henning Kamp
ba2851254f Fix a LOR relating to freeing cdevs. 2004-10-01 06:33:39 +00:00
Alfred Perlstein
576c004fb9 cover soreadable and sowriteable with the corresponding socketbuffer locks. 2004-10-01 05:54:06 +00:00
David Schultz
299bc7367d Avoid calling _PHOLD(p1) with p2's lock held, since _PHOLD()
may block to swap in p1.  Instead, call _PHOLD earlier, at a
point where the only lock held happens to be p1's.
2004-10-01 05:01:29 +00:00
John Baldwin
4a2aa5d054 Fix a typo to fix the !DIAGNOSTIC build.
Submitted by:	many
2004-09-30 18:13:18 +00:00
Poul-Henning Kamp
0cd3cb9a15 Assign a global unit number for the tty slave devices (init/lock) using
the new subr_unit.c code.

For now assert Giant in ttycreate() and ttyfree().  It is not obvious that
it will ever pay off to lock these with anything else.
2004-09-30 10:38:48 +00:00
Poul-Henning Kamp
f6bde1fd05 Add a new API for allocating unit number (-like) resources.
Allocation is always lowest free unit number.

A mixed range/bitmap strategy for maximum memory efficiency.  In
the typical case where no unit numbers are freed total memory usage
is 56 bytes on i386.

malloc is called M_WAITOK but no locking is provided (yet).  A bit of
experience will be necessary to determine the best strategy.  Hopefully
a "caller provides locking" strategy can be maintained, but that may
require use of M_NOWAIT allocation and failure handling.

A userland test driver is included.
2004-09-30 07:04:03 +00:00
Brian Feldman
1abf2c3678 Account for alias devices when tearing them down in destroy_dev() so we
don't panic on a NULL cdev->si_devsw.
2004-09-29 16:38:38 +00:00
Dag-Erling Smørgrav
479439b4fe Turn VM_KMEM_SIZE_MAX and VM_KMEM_SIZE_SCALE into tunables.
MFC after:	3 days
2004-09-29 14:21:40 +00:00
Poul-Henning Kamp
cf287576e5 Add functions to create and free the "tty-ness" of a serial port in a
generic way.  This code will allow a similar amount of code to be
removed from most if not all serial port drivers.

	Add generic cdevsw for tty devices.

	Add generic slave cdevsw for init/lock devices.

	Add ttypurge function which wakes up all know generic sleep
	points in the tty code, and calls into the hw-driver if it
	provides a method.

	Add ttycreate function which creates tty device and optionally
	cua device.  In both cases .init/.lock devices are created
	as well.

	Change ttygone() slightly to also call the hw driver provided
	purge routine.

	Add ttyfree() which will purge and destroy the cdevs.

	Add ttyconsole mode for setting console friendly termios
	on a port.
2004-09-28 19:33:49 +00:00
John-Mark Gurney
7b12509082 improve the mbuf m_print function.. Only pull length from pkthdr if there
is one, detect mbuf loops and stop, add an extra arg so you can only print
the first x bytes of the data per mbuf (print all if arg is -1), print
flags using %b (bitmask)...

No code in the tree appears to use m_print, and it's just a maner of adding
-1 as an additional arg to m_print to restore original behavior..

MFC after:	4 days
2004-09-28 18:40:18 +00:00
Poul-Henning Kamp
961da2716b Give cluster_write() an explicit vnode argument.
In the future a struct buf will not automatically point out a vnode for us.
2004-09-27 19:14:10 +00:00
Poul-Henning Kamp
a5993c332a Used cached cdevsw pointer. 2004-09-27 06:34:30 +00:00
Poul-Henning Kamp
743cd76a73 Add cdevsw->d_purge() support.
This device method shall wake up any threads sleeping in the device driver
and make the depart the drivers code for good.
2004-09-27 06:18:25 +00:00