Commit Graph

1932 Commits

Author SHA1 Message Date
Sean Eric Fagan c5edb423c6 Add support for run-time configuration of core file names. In a nutshell,
you can specify the corefile name by using:

	sysctl -w kern.corefile="format"

where format is a pathname (relative or absolute -- default is "%N.core"),
with "%N" (process name), "%P" (process ID), and "%U" (user ID) formats.

Reviewed by:	Mike Smith, with strong requests by Julian :)
1998-07-08 06:38:39 +00:00
Julian Elischer 6deaf84b1f Catch a few corner cases where FreeBSD differs enough from BSD 4.4 to
confuse Soft updates..
Should solve several "dangling deps" panics.
1998-07-08 01:04:33 +00:00
Bruce Evans c4ebf24f6e Don't depend on gcc's feature of casting lvalues. 1998-07-07 04:36:23 +00:00
Bill Fenner dece5b6a43 Introduce (fairly hacky) workaround for odd TCP behavior with application
writes of size (100,208]+N*MCLBYTES.

The bug:
 sosend() hands each mbuf off to the protocol output routine as soon as it
 has copied it, in the hopes of increasing parallelism (see
  http://www.kohala.com/~rstevens/vanj.88jul20.txt ). This works well for
 TCP as long as the first mbuf handed off is at least the MSS.  However,
 when doing small writes (between MHLEN and MINCLSIZE), the transaction is
 split into 2 small MBUF's and each is individually handed off to TCP.
 TCP assumes that the first small mbuf is the whole transaction, so sends
 a small packet.  When the second small mbuf arrives, Nagle prevents TCP
 from sending it so it must wait for a (potentially delayed) ACK.  This
 sends throughput down the toilet.

The workaround:
 Set the "atomic" flag when we're doing small writes.  The "atomic" flag
 has two meanings:
 1. Copy all of the data into a chain of mbufs before handing off to the
    protocol.
 2. Leave room for a datagram header in said mbuf chain.
 TCP wants the first but doesn't want the second.  However, the second
 simply results in some memory wastage (but is why the workaround is a
 hack and not a fix).

The real fix:
 The real fix for this problem is to introduce something like a "requested
 transfer size" variable in the socket->protocol interface.  sosend()
 would then accumulate an mbuf chain until it exceeded the "requested
 transfer size".  TCP could set it to the TCP MSS (note that the
 current interface causes strange TCP behaviors when the MSS > MCLBYTES;
 nobody notices because MCLBYTES > ethernet's MTU).
1998-07-06 19:27:14 +00:00
Julian Elischer 596f8506ad fix braino from yesterdays' megacommit
Not sure of the result of it..
(may or may not effect anything) but it's fixed now.
(found by: comparing what cvsup sent back to me with what I tested..)
1998-07-05 20:33:18 +00:00
Julian Elischer f7ea2f55d1 There is no such thing any more as "struct bdevsw".
There is only cdevsw (which should be renamed in a later edit to deventry
or something). cdevsw contains the union of what were in both bdevsw an
cdevsw entries.  The bdevsw[] table stiff exists and is a second pointer
to the cdevsw entry of the device. it's major is in d_bmaj rather than
d_maj. some cleanup still to happen (e.g. dsopen now gets two pointers
to the same cdevsw struct instead of one to a bdevsw and one to a cdevsw).

rawread()/rawwrite() went away as part of this though it's not strictly
the same  patch, just that it involves all the same lines in the drivers.

cdroms no longer have write() entries (they did have rawwrite (?)).
tapes no longer have support for bdev operations.

Reviewed by: Eivind Eklund and Mike Smith
	Changes suggested by eivind.
1998-07-04 22:30:26 +00:00
Julian Elischer fd5d1124e2 VOP_STRATEGY grows an (struct vnode *) argument
as the value in b_vp is often not really what you want.
(and needs to be frobbed). more cleanups will follow this.
Reviewed by: Bruce Evans <bde@freebsd.org>
1998-07-04 20:45:42 +00:00
Poul-Henning Kamp 52f8e5d672 Hmm, braino in last commit. 1998-07-04 19:29:15 +00:00
Poul-Henning Kamp 0edd53d22a Change the sign on a race-condition, so that instead of ending up several
tens of milliseconds out in the future we end up the right place with
a subweeniesecond error.
1998-07-04 19:12:21 +00:00
Poul-Henning Kamp 3e5e083cb7 Update M_EXT support in m_copypacket().
PR:		7122
Reviewed by:	phk
Submitted by:	Castor Fu <castor@geocast.com>
Originally forgotten by:	julian
1998-07-03 08:36:48 +00:00
David Greenman e25169f239 Reset MNT_ASYNC flag if needed if unmount() should fail.
Submitted by:	Paul Saab <paul@mu.org>
1998-07-03 03:47:24 +00:00
Poul-Henning Kamp 6ca4ca2476 When we transfer time from one timecounter to the next, use nanouptime(),
not nanotime();  Otherwise we end up in 2026...

Fix the arg to dummy_get_timecount()
1998-07-02 21:35:02 +00:00
Poul-Henning Kamp 8cb5266728 Add 3 sysctl variables for future use by ps)1_ 1998-06-30 21:25:58 +00:00
Bruce Evans 673796a715 Nuked opt_defunct.h and kern_opt.c. config(8) now generates good enough
warnings about all unknown options.
1998-06-30 14:43:04 +00:00
Poul-Henning Kamp 67f4e2ed05 Add trailing newline to sys/syscall.mk so that diff doesn't choke on it. 1998-06-28 10:01:52 +00:00
David Greenman c87e2930e6 Added a sysctl variable kern.sugid_coredump for controlling coredump
behavior of setuid/setgid binaries that defaults to 0 (coredump disabled).
1998-06-28 08:37:45 +00:00
Poul-Henning Kamp c259b8dd2b Report the mode as the result of the VOP_GETATTR rather than the
vnodes type, they may not correspond.
1998-06-27 06:43:09 +00:00
Poul-Henning Kamp 7c281842e3 Remove isdisk() hacks. 1998-06-26 18:14:25 +00:00
Poul-Henning Kamp b62591052c Remove bdevsw_add(), change the only two users to use bdevsw_add_generic().
Extend cdevsw to be superset of bdevsw.
Remove non-functional bdev lkm support.
Teach wcd what the open() args mean.
1998-06-25 11:28:07 +00:00
Bruce Evans be160d60ab Removed unused includes. 1998-06-21 18:02:50 +00:00
Bruce Evans e5b19842ef Removed unused includes. 1998-06-21 14:53:44 +00:00
Bruce Evans df471779ea Round tickadj up. This prevents tickadj from being 0 when HZ > 500,
which makes adjtime(2) useless and confuses xntpd(8) into refusing
to start even when it would use the kernel PLL instead of adjtime().
The result is the same as recommended by tickadj(8), at least when
HZ divides 10^6.  Of course, you wouldn't want to actually use
adjtime() when HZ is large.  In the silly boundary case of HZ == 10^6,
tickadj == tick == 1 so the clock stops while adjtime() is active.
1998-06-21 12:22:35 +00:00
Bruce Evans 316bbd5c6f Converted add_interrupt_randomness() to take a `void *' arg. Rewrote
mmioctl() to fix hundreds of style bugs and a few error handling bugs
(don't check for superuser privilege for inappropriate ioctls, don't
check the input arg for the output-only MEM_RETURNIRQ ioctl, and don't
return EPERM for null changes).
1998-06-21 11:33:32 +00:00
Bruce Evans 9a2daf9190 Changed the type of an isa/general interrupt handler to take a
`void *' arg.  Fixed or hid most of the resulting type mismatches.
Handlers can now be updated locally (except for reworking their
global declarations in isa_device.h).
1998-06-18 15:32:09 +00:00
Bruce Evans f95ac73519 Use copyout() instead of bcopy() to copy the image to user space.
bcopy() caused panics under heavy paging (not quite as suspected -
the kernel stack seemed to get corrupted).

Fixed long lines.

Reviewed by:	phk
1998-06-16 14:36:40 +00:00
Doug Rabson b1bf661000 [Add missing files from previous commit]
Major changes to the generic device framework for FreeBSD/alpha:

* Eliminate bus_t and make it possible for all devices to have
  attached children.

* Support dynamically extendable interfaces for drivers to replace
  both the function pointers in driver_t and bus_ops_t (which has been
  removed entirely.  Two system defined interfaces have been defined,
  'device' which is mandatory for all devices and 'bus' which is
  recommended for all devices which support attached children.

* In addition, the alpha port defines two simple interfaces 'clock'
  for attaching various real time clocks to the system and 'mcclock'
  for the many different variations of mc146818 clocks which can be
  attached to different alpha platforms.  This eliminates two more
  function pointer tables in favour of the generic method dispatch
  system provided by the device framework.

Future device interfaces may include:

* cdev and bdev interfaces for devfs to use in replacement for specfs
  and the fixed interfaces bdevsw and cdevsw.

* scsi interface to replace struct scsi_adapter (not sure how this
  works in CAM but I imagine there is something similar there).

* various tailored interfaces for different bus types such as pci,
  isa, pccard etc.
1998-06-14 13:53:12 +00:00
Doug Rabson 99d11cde56 Major changes to the generic device framework for FreeBSD/alpha:
* Eliminate bus_t and make it possible for all devices to have
  attached children.

* Support dynamically extendable interfaces for drivers to replace
  both the function pointers in driver_t and bus_ops_t (which has been
  removed entirely.  Two system defined interfaces have been defined,
  'device' which is mandatory for all devices and 'bus' which is
  recommended for all devices which support attached children.

* In addition, the alpha port defines two simple interfaces 'clock'
  for attaching various real time clocks to the system and 'mcclock'
  for the many different variations of mc146818 clocks which can be
  attached to different alpha platforms.  This eliminates two more
  function pointer tables in favour of the generic method dispatch
  system provided by the device framework.

Future device interfaces may include:

* cdev and bdev interfaces for devfs to use in replacement for specfs
  and the fixed interfaces bdevsw and cdevsw.

* scsi interface to replace struct scsi_adapter (not sure how this
  works in CAM but I imagine there is something similar there).

* various tailored interfaces for different bus types such as pci,
  isa, pccard etc.
1998-06-14 13:46:10 +00:00
Poul-Henning Kamp 938ee3ce4d Introduce std_pps_ioctl() to automagically DTRT.
Add scaling capability to timex.offset, ntpd-4.0.73 will support this.
1998-06-13 09:30:26 +00:00
Doug Rabson 3900ddb2dc Only build this on i386 for now. I may use it for the alpha later but
currently it doesn't compile.
1998-06-11 07:23:59 +00:00
Julian Elischer 32f5d4d843 Replace 'sleep()' with 'tsleep()'
Accidentally imported from Kirk's codebase.

Pointed out by: various.
1998-06-10 22:02:14 +00:00
Julian Elischer 28913ebe4e Submitted by: Kirk McKusick <mckusick@McKusick.COM>
Fix for potential hang when trying to reboot the system or
to forcibly unmount a soft update enabled filesystem.
FreeBSD already handled the reboot case differently, this is however a better
fix.
1998-06-10 18:13:19 +00:00
Doug Rabson 897cd717a5 Add initial support for the FreeBSD/alpha kernel. This is very much a
work in progress and has never booted a real machine.  Initial
development and testing was done using SimOS (see
http://simos.stanford.edu for details).  On the SimOS simulator, this
port successfully reaches single-user mode and has been tested with
loads as high as one copy of /bin/ls :-).

Obtained from: partly from NetBSD/alpha
1998-06-10 10:57:29 +00:00
Doug Rabson 8c12612cf6 64bit fixes: don't cast pointers to int. 1998-06-10 10:31:08 +00:00
Doug Rabson 2b605d0804 64bit fixes: don't cast p->p_retval to an int*. 1998-06-10 10:30:23 +00:00
Doug Rabson 831b9ef2be 64bit fixes: use u_long not int for ioctl command. 1998-06-10 10:29:31 +00:00
Doug Rabson 10d4743f6f 64bit fixes: use size_t not u_int for sizes. 1998-06-10 10:28:29 +00:00
Doug Rabson 2ef49ddfcb 64bit fixes: p->p_retval is a register_t[] not an int[]. 1998-06-10 10:27:43 +00:00
Poul-Henning Kamp a58f0f8e66 Add a tc_ prefix to struct timecounter members.
Urged by:	bde
1998-06-09 13:10:54 +00:00
Bruce Evans 1afde994e9 Pass lists of possible root devices and their names up to the
machine-independent code and try mounting the devices in the
lists instead of guessing alternative root devices in a machine-
dependent way.

autoconf.c:
Reject preposterous slice numbers instead of silently converting
them to COMPATIBILITY_SLICE.

Don't forget to force slice = COMPATIBILITY_SLICE in the floppy
device name.

Eliminated most magic numbers and magic device names in setroot().

Fixed dozens of style bugs.

vfs_conf.c:
Put the actual root device name instead of "root_device" in the
mount struct if the actual name is available.  This is useful after
booting with -s.  If it were set in all cases then it could be used
to do mount(8)'s ROOTSLICE_HUNT and fsck(8)'s hotroot guess better.
1998-06-09 12:52:35 +00:00
Bruce Evans e7c1c309fa Don't generate COMPAT_43 cruft if there are no COMPAT_43 syscalls.
In particular, don't generate an include of "opt_compat.h" if it
wouldn't affect anything we create.  This will fix recent breakage
of the ibcs2 LKM.  The ibcs2 syscall files were not regenerated
properly, so the LKM didn't break immediately when we started
generating this extraneous include.
1998-06-09 03:32:05 +00:00
John Dyson 0d3dd8fbc5 Remove some junk left over from a previous commit.
Submitted by:	phk
1998-06-08 18:18:28 +00:00
Bruce Evans 414c93f3aa Updated generated files. 1998-06-08 11:08:35 +00:00
Bruce Evans bf0955a99d Fixed some style bugs in output (missing tabs and unparenthesized macros).
Fixed some style bugs in source (mostly, superfluous backslashes).
1998-06-08 11:02:00 +00:00
Doug Rabson 2e91d07af9 Fix a typo which prevented i386 elf from working at all (including Linux
emulated elf binaries).
1998-06-08 09:19:35 +00:00
Poul-Henning Kamp 48115288df Add a member function more to the timecounters, this one is for use
with latch based PPS implementations.  The client that uses it will
be committed after more testing.
1998-06-07 20:36:55 +00:00
Doug Rabson ecbb00a262 This commit fixes various 64bit portability problems required for
FreeBSD/alpha.  The most significant item is to change the command
argument to ioctl functions from int to u_long.  This change brings us
inline with various other BSD versions.  Driver writers may like to
use (__FreeBSD_version == 300003) to detect this change.

The prototype FreeBSD/alpha machdep will follow in a couple of days
time.
1998-06-07 17:13:14 +00:00
Poul-Henning Kamp dbb3475507 Add a "this" style argument and a "void *private" so timecounters can
figure out which instance to wount with.
1998-06-07 08:40:53 +00:00
Bruce Evans e3a03f0cfb Don't attempt to copy the whole slices "struct" for DIOCGSLICEINFO.
The slices "struct" isn't really a struct; we allocate only part of
it in the fully dangerously dedicated case.  Since the "struct" is
malloced, the page beyond it may not be mapped, so attempts to copy
it would crash.  This problem became larger when the full struct was
bloated from < 1K to > 3K by the addition of (mostly unused) DEVFS
tokens some time before 2.2.0 was released.
1998-06-06 03:06:55 +00:00
David Greenman b5afad7198 Moved limit frobbing (and the resulting limcopy()) that occurs for
accounting to the accounting function so that this isn't needlessly
done for some process exits.
Reviewed by:	bde,phk
1998-06-05 21:44:20 +00:00
David Greenman 9523f5c199 If we are out of mb_map space and we failed to m_reclaim() anything and
the alloc is not M_DONTWAIT, then panic with "Out of mbuf clusters".
Callers that specify M_WAIT can't deal with getting a NULL buffer, so this
is a more graceful failure than randomly page faulting in the socket code
or elsewhere.
1998-06-05 21:41:48 +00:00
John Dyson e8f367853b Correct sleep priority. 1998-06-02 05:39:13 +00:00
Peter Dufault ce47711dee Set PAGE_SIZE for _SC_PAGESIZE sysconf(). 1998-06-01 21:54:43 +00:00
Peter Wemm 4dc75870b2 Have the wakeup routine do the upcall if needed.
Obtained from: NetBSD
1998-05-31 18:38:43 +00:00
Poul-Henning Kamp e796e00de3 Some cleanups related to timecounters and weird ifdefs in <sys/time.h>.
Clean up (or if antipodic: down) some of the msgbuf stuff.

Use an inline function rather than a macro for timecounter delta.

Maintain process "on-cpu" time as 64 bits of microseconds to avoid
needless second rollover overhead.

Avoid calling microuptime the second time in mi_switch() if we do
not pass through _idle in cpu_switch()

This should reduce our context-switch overhead a bit, in particular
on pre-P5 and SMP systems.

WARNING:  Programs which muck about with struct proc in userland
will have to be fixed.

Reviewed, but found imperfect by:       bde
1998-05-28 09:30:28 +00:00
John Dyson cf2819ccb8 Make flushing dirty pages work correctly on filesystems that
unexpectedly do not complete writes even with sync I/O requests.
This should help the behavior of mmaped files when using
softupdates (and perhaps in other circumstances also.)
1998-05-21 07:47:58 +00:00
Peter Dufault aebde78243 1. Add new defs for mins and maxs for the POSIX flavor priorities. They
end up being the same, but it doesn't look like you're comparing
apples and oranges.

2. Use need_resched instead of reset_priority.  This isn't right
either, since for example you'll round-robin against equal priority FIFO
processes when lowering the priority of another process,
but this works better and a real fix needs to be in kern_synch and
not out here.

3. This is not a device driver: copyin/copyout the structure.
1998-05-19 21:11:53 +00:00
Poul-Henning Kamp 579f4456b9 Change a data type internal to the timecounters, and remove the "delta"
function.

Reviewed, but not entirely approved by: bde
1998-05-19 18:55:02 +00:00
Poul-Henning Kamp 58067a9909 Make the size of the msgbuf (dmesg) a "normal" option. 1998-05-19 08:58:53 +00:00
Tor Egge afc6ea238f Disallow reading the current kernel stack. Only the user structure and
the current registers should be accessible.
Reviewed by:	David Greenman <dg@root.com>
1998-05-19 00:00:14 +00:00
Peter Dufault 2a61a11038 1. Don't use "nosys" and generate coredumps for unconfigured
system calls - return ENOSYS per the spec.

2. Fix interface stub to set priority properly.
1998-05-18 12:53:45 +00:00
Tor Egge 2f1e70693d Add forwarding of roundrobin to other cpus. This gives a more regular
update of cpu usage as shown by top when one process is cpu bound
(no system calls) while the system is otherwise idle (except for top).

Don't attempt to switch to the BSP in boot().  If the system was idle when
an interrupt caused a panic, this won't work.  Instead, switch to the BSP
in cpu_reset.

Remove some spurious forward_statclock/forward_hardclock warnings.
1998-05-17 22:12:14 +00:00
Bruce Evans ee002b68d1 Fixed interval calculation in realitimexpire() again. Obtained from:
rev.1.9.  Broken in: rev.1.50.

Fixed a spelling error.  Obtained from: Lite2.
1998-05-17 20:13:01 +00:00
Bruce Evans c8b4782815 Fixed stale references to hzto() in comments. 1998-05-17 20:08:05 +00:00
Tor Egge cb87a87c16 Supply the correct process argument to dounmount when possible. 1998-05-17 19:38:55 +00:00
Tor Egge 5931a9c24e For SMP, use prv_PPAGE1/prv_PMAP1 instead of PADDR1/PMAP1.
get_ptbase and pmap_pte_quick no longer generates IPIs.
This should reduce the number of IPIs during heavy paging.
1998-05-17 18:53:19 +00:00
Poul-Henning Kamp c21410e119 s/nanoruntime/nanouptime/g
s/microruntime/microuptime/g

Reviewed by:	bde
1998-05-17 11:53:46 +00:00
Garrett Wollman 98271db4d5 Convert socket structures to be type-stable and add a version number.
Define a parameter which indicates the maximum number of sockets in a
system, and use this to size the zone allocators used for sockets and
for certain PCBs.

Convert PF_LOCAL PCB structures to be type-stable and add a version number.

Define an external format for infomation about socket structures and use
it in several places.

Define a mechanism to get all PF_LOCAL and PF_INET PCB lists through
sysctl(3) without blocking network interrupts for an unreasonable
length of time.  This probably still has some bugs and/or race
conditions, but it seems to work well enough on my machines.

It is now possible for `netstat' to get almost all of its information
via the sysctl(3) interface rather than reading kmem (changes to follow).
1998-05-15 20:11:40 +00:00
Peter Wemm 9c4aed2ed7 Nuke signanosleep(). (I've left nanosleep1() seperate to nanosleep()
as I don't want to mess with the multiple returns)
1998-05-14 11:31:08 +00:00
Peter Wemm 06b6493558 regen after signanosleep nuke 1998-05-14 11:29:06 +00:00
Peter Wemm 786cf38a29 deep-six signanosleep(). It sounded like a good idea at the time. 1998-05-14 11:28:11 +00:00
Peter Wemm 1973d51bfb Commit an old change that has been sitting around for a long while.
signanosleep() did not deal with signal masks properly.  This change was
based on a discussion with bde some time ago (at least 6 months or more).

signanosleep() should probably go away since it was never really used for
more than a few weeks and doesn't appear in released code.  It should
probably be killed before somebody uses it and it becomes a gratuitous
nonstandard feature.
1998-05-14 10:38:52 +00:00
Bruce Evans b322fb5d76 Backed out previous commit. It is invalid to call d_ioctl() on
possibly non-open devices, and we don't want to restrict dumping
to swap devices anwyay.  It is especially invalid to call d_ioctl()
in non-process context for panics.  d_psize() can be called on
non-open devices, at least on non-SLICED ones that support d_dump(),
and setdumpdev() has depended on this for a long time although it
is probably wrong, but even d_psize() can't be called in non-process
context - that's why dumpsys() depends on previously computed values
although these values may be stale.  The historical restriction to
devices with dkpart(dev) == SWAP_PART should go away.
1998-05-12 17:34:02 +00:00
John Dyson 1f56217280 Fix the futimes/undelete/utrace conflict with other BSD's. Note that
the only common  usage of utrace (the possible problem with this
commit) is with malloc, so this should be a real problem.  Add
the various NetBSD syscalls that allow full emulation of their
development environment.
1998-05-11 03:55:28 +00:00
John Dyson f0175db1ee Attempt to set write combining mode for graphics devices. 1998-05-11 01:06:08 +00:00
Mike Smith 7be2d30077 In the words of the submitter:
---------
Make callers of namei() responsible for releasing references or locks
instead of having the underlying filesystems do it.  This eliminates
redundancy in all terminal filesystems and makes it possible for stacked
transport layers such as umapfs or nullfs to operate correctly.

Quality testing was done with testvn, and lat_fs from the lmbench suite.

Some NFS client testing courtesy of Patrik Kudo.

vop_mknod and vop_symlink still release the returned vpp.  vop_rename
still releases 4 vnode arguments before it returns.  These remaining cases
will be corrected in the next set of patches.
---------

Submitted by:	Michael Hancock <michaelh@cet.co.jp>
1998-05-07 04:58:58 +00:00
Julian Elischer 7f2f1b784e Add dump support to the DEVFS/slice code.
now we can actually catch our crashes :-)

Submitted by: Luoqi Chen <luoqi@chen.ml.org> (the man who's everywhere)
1998-05-06 22:14:48 +00:00
Mike Smith 79cc756d8b As described by the submitter:
Reverse the VFS_VRELE patch.  Reference counting of vnodes does not need
to be done per-fs.  I noticed this while fixing vfs layering violations.
Doing reference counting in generic code is also the preference cited by
John Heidemann in recent discussions with him.

The implementation of alternative vnode management per-fs is still a valid
requirement for some filesystems but will be revisited sometime later,
most likely using a different framework.

Submitted by:	Michael Hancock <michaelh@cet.co.jp>
1998-05-06 05:29:41 +00:00
John Dyson 96fb8cf258 Fix the shm panic. I mistakenly used the shadow_count to keep the object
from being split, and instead added an OBJ_NOSPLIT.
1998-05-04 17:12:53 +00:00
John Dyson cbd8ec0902 Work around some VM bugs, the worst being an overly aggressive
swap space free calculation.  More complete fixes will be forthcoming,
in a week.
1998-05-04 03:01:44 +00:00
Bruce Evans 77849078bf Oops, the previous commit should have changed `i386' to `__i386__',
not `__i386'.
1998-05-01 16:40:21 +00:00
Bruce Evans 809e3a8464 Partially fixed write clustering for cases where cluster_wbuild() is
called from vfs_bio_awrite() without going through cluster_write()
or ufs_bmaparray(), in particular for all writes to block disk devices.
Only ufs_bmaparray() sets vp->v_maxio in a correct way, and it doesn't
seem to be called early enough even for regular files.
1998-05-01 16:29:27 +00:00
Peter Wemm b1951f4028 vm_page_is_valid() wasn't expecting a large offset argument, it's
expecting a sub-page offset.  We were passing the file position,
and vm_page_bits() could do some interesting things when base was
larger PAGE_SIZE.
if (size > PAGE_SIZE - base)
	size = PAGE_SIZE - base;
is interesting when (PAGE_SIZE - base) is negative.  I could imagine that
this could have interesting consequences for memory page -> device block
bit validation.
1998-05-01 15:10:59 +00:00
Peter Wemm f806d5a257 Fix one problem with NFSv3 > 2GB file support.
Submitted by: bde
1998-05-01 15:04:35 +00:00
Eivind Eklund 288078be0f Translate T_PROTFLT to SIGSEGV instead of SIGBUS when running under
Linux emulation.  This make Allegro Common Lisp 4.3 work under
FreeBSD!

Submitted by: Fred Gilham <gilham@csl.sri.com>
Commented on by: bde, dg, msmith, tg
Hoping he got everything right:  eivind
1998-04-28 18:15:08 +00:00
David E. O'Brien cbcfa1ba6a Discussed with: bde 1998-04-24 11:50:30 +00:00
David E. O'Brien 8f89f24fc3 Create virgin disklabels with 8 (MAXPARTITIONS) partitions rather than
three (RAW_PART + 1);
This makes ``disklabel -Brw sdN auto'' do the Right Thing.
1998-04-24 11:49:57 +00:00
David Greenman 9351a2295a Added kern.ipc.nmbclusters 1998-04-24 04:15:52 +00:00
Julian Elischer c0bab11dfe Make the devfs SLICE option a standard type option.
(hopefully it will go away eventually anyhow)
1998-04-20 03:57:41 +00:00
Julian Elischer 3e425b968d Add changes and code to implement a functional DEVFS.
This code will be turned on with the TWO options
DEVFS and SLICE. (see LINT)
Two labels PRE_DEVFS_SLICE and POST_DEVFS_SLICE will deliniate these changes.

/dev will be automatically mounted by init (thanks phk)
on bootup. See /sys/dev/slice/slice.4 for more info.
All code should act the same without these options enabled.

Mike Smith, Poul Henning Kamp, Soeren, and a few dozen others

This code does not support the following:
bad144 handling.
Persistance. (My head is still hurting from the last time we discussed this)
ATAPI flopies are not handled by the SLICE code yet.

When this code is running, all major numbers are arbitrary and COULD
be dynamically assigned. (this is not done, for POLA only)
Minor numbers for disk slices ARE arbitray and dynamically assigned.
1998-04-19 23:32:49 +00:00
Dag-Erling Smørgrav 59bad7c53b Backed out lseek changes. 1998-04-19 22:20:32 +00:00
Dag-Erling Smørgrav 25096724e8 Return EINVAL and do not change file pointer if resulting offset is negative.
PR:		kern/6184
1998-04-18 19:24:44 +00:00
Peter Wemm 37b8ccd37a In vfs_msync(), test to see if the vnode being examined is "interesting"
(ie: it has a vm_object attached and is marked as OBJ_MIGHTBEDIRTY) before
attempting to lock it.  This should reduce the cpu hit that is incurred
when doing a sync(2) and when the syncer process is doing the 30-second
writeback of dirty mmap() data to disk.  Skip this speedup if we are
doing an unmount() to be sure to get everything - we can afford to
occasionally miss a msync while the system is running, but not at unmount.

I'm not sure about the VXLOCK and MNT_WAIT case, it seems a bit odd to skip
doing a page_clean at unmount time just because a vnode is VXLOCKed, but
that's what was being done before...
1998-04-18 06:26:16 +00:00
Dag-Erling Smørgrav dc73342347 Seventy-odd "its" / "it's" typos in comments fixed as per kern/6108. 1998-04-17 22:37:19 +00:00
Bruce Evans ab36c3d3e7 Really finish supporting compiling with `gcc -ansi'. 1998-04-17 04:53:44 +00:00
Peter Wemm efdc5523c0 When the softdep conversion took place, the periodic vfs_msync() from
update got lost.  This is responsible for ensuring that dirty mmap() pages
get periodically written to disk.  Without it, long time mmap's might not
have their dirty pages written out at all of the system crashes or isn't
cleanly shut down.  This could be nasty if you've got a long-running
writing via mmap(), dirty pages used to get written to disk within 30
seconds or so.
1998-04-16 03:31:26 +00:00
Tor Egge 71033a8c50 Unlock mountlist_slock if the mount point was busy (unmount in progress)
during the attempt at lazy fsync.
1998-04-15 18:37:49 +00:00
Bruce Evans c1087c1324 Support compiling with `gcc -ansi'. 1998-04-15 17:47:40 +00:00
Poul-Henning Kamp 115facb29d Fix a minor mbuf leak created by the previous change.
Reviewed by:	phk
Submitted by:	pb@fasterix.freenix.org (Pierre Beyssac)
1998-04-14 06:24:43 +00:00
Poul-Henning Kamp aba558930b setsockopt() transports user option data in an mbuf. if the user
data is greater than MLEN, setsockopt is unable to pass it onto
the protocol handler.  Allocate a cluster in such case.

PR:		2575
Reviewed by:	 phk
Submitted by:	Julian Assange proff@iq.org
1998-04-11 20:31:46 +00:00
Poul-Henning Kamp a2481bbe8e When pmap_pinit0() allocates a page for proc0's page directory,
kernal page table may need to be extended.  But while growing the
kernel page table (pmap_growkernel()), newly allocated kernel page
table pages are entered into every process' page directory. For
proc0, the page directory is not allocated yet, and results in a
page fault.  Eventually, the machine panics with "lockmgr: not
holding exclusive lock".

PR:		5458
Reviewed by:	phk
Submitted by:	Luoqi Chen <luoqi@luoqi.watermarkgroup.com>
1998-04-11 17:24:06 +00:00
Alexander Langer 7c2e3d329a Grammar police. 1998-04-10 00:09:04 +00:00
Wolfram Schneider 5ddc8ded1d New mount option nosymfollow. If enabled, the kernel lookup()
function will not follow symbolic links on the mounted
file system and return EACCES (Permission denied).
1998-04-08 18:31:59 +00:00
Poul-Henning Kamp 5f88ec3625 Minor adjustments to the timecounting and proc0.
Mostly Submitted by:	bde
1998-04-08 09:01:53 +00:00
Peter Wemm 100ceca222 Today is not my lucky day. Fix missing brace and I got a request
to use EMLINK instead.
1998-04-06 19:32:37 +00:00
Peter Wemm 193afe0189 Use a different errno (ELOOP (as sef mentioned) since the text that goes
with the error sounds ok for the condition) if O_NOFOLLOW gets a link.
1998-04-06 18:43:28 +00:00
Peter Wemm 0fdc628b41 Rather than let users get fd's to symlink files, make O_NOFOLLOW cause
an error if it gets a link (like it does if it gets a socket).  The
implications of letting users try and do file operations on symlinks
themselves were too worrying.
1998-04-06 18:25:21 +00:00
Peter Wemm 7e3426aa1f Implement a new open(2) flag: O_NOFOLLOW. This will instruct open
to not follow symlinks, but to open a handle on the link itself(!).
As strange as this might sound, it has several useful applications
safe race-free ways of opening files in hostile areas (eg: /tmp, a mode
1777 /var/mail, etc).  It also would allow things like fchown() to work
on the link rather than having to implement a new syscall specifically for
that task.

Reviewed by: phk
1998-04-06 17:38:43 +00:00
Peter Wemm aacdc613e5 curproc is initialized in locore at the same time for both SMP and UP now. 1998-04-06 15:51:22 +00:00
Peter Wemm cf34ef61ee Use real types for the SMP pages being allocated rather than arrays of
ints.  Remove some no longer needed casts.  Initialize the per-cpu
global data area using the structs rather than knowing too much about
layout, alignment, etc.
1998-04-06 15:48:30 +00:00
Poul-Henning Kamp 2eeb0e2ea0 Make read_random() take a (void *) argument instead of (char *) 1998-04-06 09:30:42 +00:00
Poul-Henning Kamp 4cf41af3d4 Make a kernel version of the timer* functions called timerval* to be
more consistent.

OK'ed by:	bde
1998-04-06 08:26:08 +00:00
Poul-Henning Kamp 5704ba6a06 More fixes for the iterative case of nanosleep1 from bruce.
I hate the 2-arg time{spec|val}{add|sub} functions!
1998-04-05 12:10:41 +00:00
Poul-Henning Kamp bfe6c9fabf Make the dummy timecounter run at 1 MHz rather than 100kHz (noticed by bde)
fix the itimer(REAL) handling.
1998-04-05 11:49:36 +00:00
Peter Wemm d59fbbf6c8 If there is no error code, don't copyout the remaining time. (As
documented in the man page and the standards).  (and besides, nanosleep1
isn't setting it in this case at present anyway, so we'd be copying junk).
1998-04-05 11:17:19 +00:00
Poul-Henning Kamp 338418263d Fix nanosleep1 based on Bruces suggestion. 1998-04-05 10:28:01 +00:00
Andrey A. Chernov 80a39463c9 Remove unused atv.tv_usec = 0; from select/poll code 1998-04-05 10:03:52 +00:00
Peter Wemm 2257b488b9 tsleep() returns EWOULDBLOCK if the timeout expired. Don't return this
to usermode, otherwise sleep(3) fails, cron doesn't work, etc etc etc.
1998-04-05 07:31:44 +00:00
Peter Wemm b90dcc0c5d Fix previous commit. Don't people read compiler messages or something?? 1998-04-05 02:59:10 +00:00
Poul-Henning Kamp 91ad39c6b3 Handle double fraction overflow in nano & microtime functions (spotted by Bruce)
Use tvtohz() a place where it fits.
1998-04-04 18:46:13 +00:00
Poul-Henning Kamp 00af9731c9 Time changes mark 2:
* Figure out UTC relative to boottime.  Four new functions provide
      time relative to boottime.

    * move "runtime" into struct proc.  This helps fix the calcru()
      problem in SMP.

    * kill mono_time.

    * add timespec{add|sub|cmp} macros to time.h.  (XXX: These may change!)

    * nanosleep, select & poll takes long sleeps one day at a time

Reviewed by:    bde
Tested by:      ache and others
1998-04-04 13:26:20 +00:00
John Dyson aec0bcdf5b Perhaps fix a problem that some drivers have that they don't properly
initialize the b_kvasize element.  This might fix some of the split
I/O requests that some people have.
1998-04-04 05:55:05 +00:00
Poul-Henning Kamp 4ff16568be Try to fix poll & select after I broke them. 1998-04-02 07:22:17 +00:00
Tor Egge 5758c2de94 Add two workarounds for broken MP tables:
- Attempt to handle PCI devices where the interrupt is
	  an ISA/EISA interrupt according to the mp table.

	- Attempt to handle multiple IO APIC pins connected to
	  the same PCI or ISA/EISA interrupt source.  Print a
	  warning if this happens, since performance is suboptimal.
	  This workaround is only used for PCI devices.

With these two workarounds, the -SMP kernel is capable of running on
my Asus P/I-P65UP5 motherboard when version 1.4 of the MP table is disabled.
1998-04-01 21:07:37 +00:00
Poul-Henning Kamp 460608e768 Fix an off by 1<<32 error. 1998-03-31 10:47:01 +00:00
Poul-Henning Kamp 75da0aa298 Add a dummy timecounter until we find the real thing(s). 1998-03-31 10:44:56 +00:00
Poul-Henning Kamp 227ee8a188 Eradicate the variable "time" from the kernel, using various measures.
"time" wasn't a atomic variable, so splfoo() protection were needed
around any access to it, unless you just wanted the seconds part.

Most uses of time.tv_sec now uses the new variable time_second instead.

gettime() changed to getmicrotime(0.

Remove a couple of unneeded splfoo() protections, the new getmicrotime()
is atomic, (until Bruce sets a breakpoint in it).

A couple of places needed random data, so use read_random() instead
of mucking about with time which isn't random.

Add a new nfs_curusec() function.

Mark a couple of bogosities involving the now disappeard time variable.

Update ffs_update() to avoid the weird "== &time" checks, by fixing the
one remaining call that passwd &time as args.

Change profiling in ncr.c to use ticks instead of time.  Resolution is
the same.

Add new function "tvtohz()" to avoid the bogus "splfoo(), add time, call
hzto() which subtracts time" sequences.

Reviewed by:	bde
1998-03-30 09:56:58 +00:00
John Dyson 006b9b7df9 Correct a significant problem with the softupdates port. Allow fsync
to work properly within the softupdates framework, and thereby eliminate
some unfortunate panics.
1998-03-29 18:23:44 +00:00
Poul-Henning Kamp 934f5f3306 Export MD5Transform in md5.c and remove a private version in random_machdep.c
md5 is standard as a consequence of this.
1998-03-29 11:55:06 +00:00
Peter Dufault 7c9f6f8f8b Remove duplicate comment 1998-03-28 18:16:29 +00:00
Peter Dufault 38c76440b8 Include sys/resource.h to get PRIO_MAX. 1998-03-28 14:49:47 +00:00
Bruce Evans 3c1300a6b3 Removed unused #includes. 1998-03-28 13:25:01 +00:00
Bruce Evans 771b51ef7b Don't depend on <sys/mount.h> including <sys/socket.h>. 1998-03-28 12:04:40 +00:00
Peter Dufault 8a6472b723 Finish _POSIX_PRIORITY_SCHEDULING. Needs P1003_1B and
_KPOSIX_PRIORITY_SCHEDULING options to work.  Changes:

Change all "posix4" to "p1003_1b".  Misnamed files are left
as "posix4" until I'm told if I can simply delete them and add
new ones;

Add _POSIX_PRIORITY_SCHEDULING system calls for FreeBSD and Linux;

Add man pages for _POSIX_PRIORITY_SCHEDULING system calls;

Add options to LINT;

Minor fixes to P1003_1B code during testing.
1998-03-28 11:51:01 +00:00
Bruce Evans 08637435f2 Moved some #includes from <sys/param.h> nearer to where they are actually
used.
1998-03-28 10:33:27 +00:00
Poul-Henning Kamp c6bcf724da Split the padding out into a separate function.
Synchronize the kernel and libmd versions of md5c.c

PR:		misc/6127
Reviewed by:	phk
Submitted by:	Ari Suutari <ari@suutari.iki.fi>
1998-03-27 10:23:00 +00:00
John Dyson f9be84912c Correct a problem where buffers might not be zeroed when needed. The
B_MALLOC buffers might not have been properly zeroed.
1998-03-27 06:48:24 +00:00
Poul-Henning Kamp a0502b19d4 Add two new functions, get{micro|nano}time.
They are atomic, but return in essence what is in the "time" variable.
gettime() is now a macro front for getmicrotime().

Various patches to use the two new functions instead of the various
hacks used in their absence.

Some puntuation and grammer patches from Bruce.

A couple of XXX comments.
1998-03-26 20:54:05 +00:00
Jonathan Lemon 640c4313af Add the ability to make real-mode BIOS calls from the kernel. Currently,
everything is contained inside #ifdef VM86, so this option must be
present in the config file to use this functionality.

Thanks to Tor Egge, these changes should work on SMP machines.  However,
it may not be throughly SMP-safe.

Currently, the only BIOS calls made are memory-sizing routines at bootup,
these replace reading the RTC values.
1998-03-23 19:52:59 +00:00
John Dyson 52c64c95c5 In kern_physio.c fix tsleep priority messup.
In vfs_bio.c, remove b_generation count usage,
	remove redundant reassignbuf,
	remove redundant spl(s),
	manage page PG_ZERO flags more correctly,
	utilize in invalid value for b_offset until it
		is properly initialized.  Add asserts
		for #ifdef DIAGNOSTIC, when b_offset is
		improperly used.
	when a process is not performing I/O, and just waiting
		on a buffer generally, make the sleep priority
		low.
	only check page validity in getblk for B_VMIO buffers.

In vfs_cluster, add b_offset asserts, correct pointer calculation
	for clustered reads.  Improve readability of certain parts of
	the code.  Remove redundant spl(s).

In vfs_subr, correct usage of vfs_bio_awrite (From Andrew Gallatin
	<gallatin@cs.duke.edu>).  More vtruncbuf problems fixed.
1998-03-19 22:48:16 +00:00
John Dyson 1c77c6b7b0 Fix an embarassing problem in vtruncbuf. 1998-03-19 18:46:58 +00:00
John Dyson 4641c8ac1d Correct a problem where data OR metadata could be thrown away if a
buffer is grown.
1998-03-17 17:36:05 +00:00
KATO Takenori f1aca9c33f Deleted PC-98 code because (1) machine dependent code should not be in
here, and (2) the flag used in PC-98 code has been assigned to another
purpose.
1998-03-17 08:41:28 +00:00
John Dyson 2deb5d0417 Correct a severely evil bug in the vtruncbuf code. It didn't cause
me any problems until after the previous commit.  This problem then
caused a severe case of creeping crud on my diskdrive, and hosed
my system so bad, that I needed to do a complete reinstall.  Sorry!!!

I assume that others have manifest this bug.
1998-03-17 06:30:52 +00:00
Julian Elischer c2a94b7a3c Remove a soft-update hook that was accidentally added to the READ path.
also add some comments, and a couple of very minor cosmetic changes.
1998-03-16 18:39:41 +00:00
Poul-Henning Kamp b05dcf3c2f A bunch of BNN (Bruce Normal Nits) from bde:
Bring back the softclock inlining
	save a couple of <<32's
	many white-space shuffles.
1998-03-16 10:19:12 +00:00
John Dyson e85c1afb7c Allow vfs_ioopt to be enabled with a (temporary) config option. 1998-03-16 02:13:03 +00:00
John Dyson bef608bd7e Some VM improvements, including elimination of alot of Sig-11
problems.  Tor Egge and others have helped with various VM bugs
lately, but don't blame him -- blame me!!!

pmap.c:
1)	Create an object for kernel page table allocations.  This
	fixes a bogus allocation method previously used for such, by
	grabbing pages from the kernel object, using bogus pindexes.
	(This was a code cleanup, and perhaps a minor system stability
	 issue.)

pmap.c:
2)	Pre-set the modify and accessed bits when prudent.  This will
	decrease bus traffic under certain circumstances.

vfs_bio.c, vfs_cluster.c:
3)	Rather than calculating the beginning virtual byte offset
	multiple times, stick the offset into the buffer header, so
	that the calculated offset can be reused.  (Long long multiplies
	are often expensive, and this is a probably unmeasurable performance
	improvement, and code cleanup.)

vfs_bio.c:
4)	Handle write recursion more intelligently (but not perfectly) so
	that it is less likely to cause a system panic, and is also
	much more robust.

vfs_bio.c:
5)	getblk incorrectly wrote out blocks that are incorrectly sized.
	The problem is fixed, and writes blocks out ONLY when B_DELWRI
	is true.

vfs_bio.c:
6)	Check that already constituted buffers have fully valid pages.  If
	not, then make sure that the B_CACHE bit is not set. (This was
	a major source of Sig-11 type problems.)

vfs_bio.c:
7)	Fix a potential system deadlock due to an incorrectly specified
	sleep priority while waiting for a buffer write operation.  The
	change that I made opens the system up to serious problems, and
	we need to examine the issue of process sleep priorities.

vfs_cluster.c, vfs_bio.c:
8)	Make clustered reads work more correctly (and more completely)
	when buffers are already constituted, but not fully valid.
	(This was another system reliability issue.)

vfs_subr.c, ffs_inode.c:
9)	Create a vtruncbuf function, which is used by filesystems that
	can truncate files.  The vinvalbuf forced a file sync type operation,
	while vtruncbuf only invalidates the buffers past the new end of file,
	and also invalidates the appropriate pages.  (This was a system reliabiliy
	and performance issue.)

10)	Modify FFS to use vtruncbuf.

vm_object.c:
11)	Make the object rundown mechanism for OBJT_VNODE type objects work
	more correctly.  Included in that fix, create pager entries for
	the OBJT_DEAD pager type, so that paging requests that might slip
	in during race conditions are properly handled.  (This was a system
	reliability issue.)

vm_page.c:
12)	Make some of the page validation routines be a little less picky
	about arguments passed to them.  Also, support page invalidation
	change the object generation count so that we handle generation
	counts a little more robustly.

vm_pageout.c:
13)	Further reduce pageout daemon activity when the system doesn't
	need help from it.  There should be no additional performance
	decrease even when the pageout daemon is running.  (This was
	a significant performance issue.)

vnode_pager.c:
14)	Teach the vnode pager to handle race conditions during vnode
	deallocations.
1998-03-16 01:56:03 +00:00
John Dyson 26300b34f1 Disable the vfs.ioopt option for now, so that we don't get gratuitious
bugreports.  I might not be able to fix the problems before 3.0, due
to other, more important things.
1998-03-14 19:50:36 +00:00
Tor Egge 8293f20aee Don't misuse vnode interlocks in routines that can be called from interrupts.
PR:		5893
1998-03-14 02:55:01 +00:00
Peter Dufault 917827427e idprio processes must be preempted as soon as anything is runnable. 1998-03-11 20:50:42 +00:00