1
0
mirror of https://git.FreeBSD.org/src.git synced 2024-12-24 11:29:10 +00:00
Commit Graph

829 Commits

Author SHA1 Message Date
Poul-Henning Kamp
b62591052c Remove bdevsw_add(), change the only two users to use bdevsw_add_generic().
Extend cdevsw to be superset of bdevsw.
Remove non-functional bdev lkm support.
Teach wcd what the open() args mean.
1998-06-25 11:28:07 +00:00
Bruce Evans
be160d60ab Removed unused includes. 1998-06-21 18:02:50 +00:00
Bruce Evans
e5b19842ef Removed unused includes. 1998-06-21 14:53:44 +00:00
Doug Rabson
ecbb00a262 This commit fixes various 64bit portability problems required for
FreeBSD/alpha.  The most significant item is to change the command
argument to ioctl functions from int to u_long.  This change brings us
inline with various other BSD versions.  Driver writers may like to
use (__FreeBSD_version == 300003) to detect this change.

The prototype FreeBSD/alpha machdep will follow in a couple of days
time.
1998-06-07 17:13:14 +00:00
David Greenman
5994d8937d Changed the log() of "Out of mbuf clusters - increase maxusers" to a
printf() of "Out of mbuf clusters - adjust NMBCLUSTERS or increase
maxusers" so that the message is more informative and so that it will
appear in the kernel message buffer.
1998-06-05 21:48:45 +00:00
John Dyson
976f208be3 Cleanup and remove some dead code from the initialization. 1998-06-02 05:50:08 +00:00
John Dyson
e8f367853b Correct sleep priority. 1998-06-02 05:39:13 +00:00
John Dyson
b9cefc08e2 Support a 16K first level cache for 512K 2nd level. Also, add support
for 1MB 2nd level cache.
1998-05-24 04:25:27 +00:00
John Dyson
cf2819ccb8 Make flushing dirty pages work correctly on filesystems that
unexpectedly do not complete writes even with sync I/O requests.
This should help the behavior of mmaped files when using
softupdates (and perhaps in other circumstances also.)
1998-05-21 07:47:58 +00:00
Peter Wemm
4183b6b622 Make the previous commit compile.. 1998-05-19 07:13:21 +00:00
Guido van Rooij
05feb99ff1 Plug hole reported on Bugtraq: do not allow mmap with WRITE privs for
append-only and immutable files.

Obtained from: OpenBSD (partly)
1998-05-18 18:26:27 +00:00
John Dyson
bd6be9150d An important fix for proper inheritance of backing objects for
object splits.  Another excellent detective job by Tor.
Submitted by:	Tor Egge <Tor.Egge@idi.ntnu.no>
1998-05-16 23:03:20 +00:00
John Dyson
96fb8cf258 Fix the shm panic. I mistakenly used the shadow_count to keep the object
from being split, and instead added an OBJ_NOSPLIT.
1998-05-04 17:12:53 +00:00
John Dyson
cbd8ec0902 Work around some VM bugs, the worst being an overly aggressive
swap space free calculation.  More complete fixes will be forthcoming,
in a week.
1998-05-04 03:01:44 +00:00
John Dyson
86524867d1 Another minor cleanup of the split code. Make sure that pages are
busied during the entire time, so that the waits for pages being
unbusy don't make the objects inconsistant.
1998-05-02 06:36:16 +00:00
Peter Wemm
3c33646725 Seatbelts for vm_page_bits() in case a file offset is passed in rather than
the page offset.  If a large file offset was passed in, a large negative
array index could be generated which could cause page faults etc at worst
and file corruption at the least.  (Pages are allocated within file
space on page alignment boundaries, so a file offset being passed in here
is harmless to DTRT.  The case where this was happening has already been
fixed though, this is in case it happens again).

Reviewed by: dyson
1998-05-02 03:02:13 +00:00
John Dyson
e493d28abc Fix minor bug with new over used swap fix. 1998-05-01 02:25:29 +00:00
John Dyson
dda6b17151 Add a needed prototype, and fix a panic problem with the new
memory code.
1998-04-29 06:59:08 +00:00
John Dyson
c0877f103f Tighten up management of memory and swap space during map allocation,
deallocation cycles.  This should provide a measurable improvement
on swap and memory allocation on loaded systems.  It is unlikely a
complete solution.  Also, provide more map info with procfs.
Chuck Cranor spurred on this improvement.
1998-04-29 04:28:22 +00:00
John Dyson
2dbea5d2e3 Fix a pseudo-swap leak problem. This mitigates "leaks" due to
freeing partial objects, not freeing entire objects didn't
free any of it.  Simple fix to the map code.
Reviewed by:	dg
1998-04-28 05:54:47 +00:00
John Dyson
adc78b8c71 Correct copyright. 1998-04-25 04:50:03 +00:00
Bruce Evans
c1087c1324 Support compiling with `gcc -ansi'. 1998-04-15 17:47:40 +00:00
Poul-Henning Kamp
227ee8a188 Eradicate the variable "time" from the kernel, using various measures.
"time" wasn't a atomic variable, so splfoo() protection were needed
around any access to it, unless you just wanted the seconds part.

Most uses of time.tv_sec now uses the new variable time_second instead.

gettime() changed to getmicrotime(0.

Remove a couple of unneeded splfoo() protections, the new getmicrotime()
is atomic, (until Bruce sets a breakpoint in it).

A couple of places needed random data, so use read_random() instead
of mucking about with time which isn't random.

Add a new nfs_curusec() function.

Mark a couple of bogosities involving the now disappeard time variable.

Update ffs_update() to avoid the weird "== &time" checks, by fixing the
one remaining call that passwd &time as args.

Change profiling in ncr.c to use ticks instead of time.  Resolution is
the same.

Add new function "tvtohz()" to avoid the bogus "splfoo(), add time, call
hzto() which subtracts time" sequences.

Reviewed by:	bde
1998-03-30 09:56:58 +00:00
Bruce Evans
08637435f2 Moved some #includes from <sys/param.h> nearer to where they are actually
used.
1998-03-28 10:33:27 +00:00
John Dyson
bef608bd7e Some VM improvements, including elimination of alot of Sig-11
problems.  Tor Egge and others have helped with various VM bugs
lately, but don't blame him -- blame me!!!

pmap.c:
1)	Create an object for kernel page table allocations.  This
	fixes a bogus allocation method previously used for such, by
	grabbing pages from the kernel object, using bogus pindexes.
	(This was a code cleanup, and perhaps a minor system stability
	 issue.)

pmap.c:
2)	Pre-set the modify and accessed bits when prudent.  This will
	decrease bus traffic under certain circumstances.

vfs_bio.c, vfs_cluster.c:
3)	Rather than calculating the beginning virtual byte offset
	multiple times, stick the offset into the buffer header, so
	that the calculated offset can be reused.  (Long long multiplies
	are often expensive, and this is a probably unmeasurable performance
	improvement, and code cleanup.)

vfs_bio.c:
4)	Handle write recursion more intelligently (but not perfectly) so
	that it is less likely to cause a system panic, and is also
	much more robust.

vfs_bio.c:
5)	getblk incorrectly wrote out blocks that are incorrectly sized.
	The problem is fixed, and writes blocks out ONLY when B_DELWRI
	is true.

vfs_bio.c:
6)	Check that already constituted buffers have fully valid pages.  If
	not, then make sure that the B_CACHE bit is not set. (This was
	a major source of Sig-11 type problems.)

vfs_bio.c:
7)	Fix a potential system deadlock due to an incorrectly specified
	sleep priority while waiting for a buffer write operation.  The
	change that I made opens the system up to serious problems, and
	we need to examine the issue of process sleep priorities.

vfs_cluster.c, vfs_bio.c:
8)	Make clustered reads work more correctly (and more completely)
	when buffers are already constituted, but not fully valid.
	(This was another system reliability issue.)

vfs_subr.c, ffs_inode.c:
9)	Create a vtruncbuf function, which is used by filesystems that
	can truncate files.  The vinvalbuf forced a file sync type operation,
	while vtruncbuf only invalidates the buffers past the new end of file,
	and also invalidates the appropriate pages.  (This was a system reliabiliy
	and performance issue.)

10)	Modify FFS to use vtruncbuf.

vm_object.c:
11)	Make the object rundown mechanism for OBJT_VNODE type objects work
	more correctly.  Included in that fix, create pager entries for
	the OBJT_DEAD pager type, so that paging requests that might slip
	in during race conditions are properly handled.  (This was a system
	reliability issue.)

vm_page.c:
12)	Make some of the page validation routines be a little less picky
	about arguments passed to them.  Also, support page invalidation
	change the object generation count so that we handle generation
	counts a little more robustly.

vm_pageout.c:
13)	Further reduce pageout daemon activity when the system doesn't
	need help from it.  There should be no additional performance
	decrease even when the pageout daemon is running.  (This was
	a significant performance issue.)

vnode_pager.c:
14)	Teach the vnode pager to handle race conditions during vnode
	deallocations.
1998-03-16 01:56:03 +00:00
Guido van Rooij
c8bdd56b3a Fix for mmap of char devices bug as described in OpenBSD advisory of
1998/02/20
Reviewed by:	John Dyson
Submitted by:	"Cy Schubert" <cschuber@uumail.gov.bc.ca>
1998-03-12 19:36:18 +00:00
Mike Smith
86ffbd76d0 Complement diagnostic messages about missing per-FS VOP page operations,
but don't make their absence fatal.
Submitted by:	terry
1998-03-09 08:58:53 +00:00
John Dyson
be01eafd5f Quell unneeded pageout daemon activity. 1998-03-08 18:19:17 +00:00
John Dyson
6215e86272 Remove a very ill advised vm_page_protect. This was being called
for a non-managed page.  That is a big no-no.
1998-03-08 18:05:59 +00:00
John Dyson
e163e201ef Some cruft left over from my megacommit. A page rotation optimization
was a good idea, but can cause instability.  That optimization is
now removed.
1998-03-08 06:27:30 +00:00
John Dyson
edd97f3a37 Several minor fixes:
1) When freeing pages, it is a good idea to protect them off.
	   (This is probably gratuitious, but good form.)
	2) Allow collapsing pages in the backing object that are
	   PQ_CACHE.  This will improve memory utilization.
	3) Correct the collapse code so that pages that were on the
	   cache queue are moved to the inactive queue.  This is
	   done when pages are marked dirty (so that those pages
	   will be properly paged out instead of freed), so that
	   cached pages will not be paradoxically marked dirty.
1998-03-08 06:25:59 +00:00
John Dyson
8f9110f6a1 This mega-commit is meant to fix numerous interrelated problems. There
has been some bitrot and incorrect assumptions in the vfs_bio code.  These
problems have manifest themselves worse on NFS type filesystems, but can
still affect local filesystems under certain circumstances.  Most of
the problems have involved mmap consistancy, and as a side-effect broke
the vfs.ioopt code.  This code might have been committed seperately, but
almost everything is interrelated.

1)	Allow (pmap_object_init_pt) prefaulting of buffer-busy pages that
	are fully valid.
2)	Rather than deactivating erroneously read initial (header) pages in
	kern_exec, we now free them.
3)	Fix the rundown of non-VMIO buffers that are in an inconsistent
	(missing vp) state.
4)	Fix the disassociation of pages from buffers in brelse.  The previous
	code had rotted and was faulty in a couple of important circumstances.
5)	Remove a gratuitious buffer wakeup in vfs_vmio_release.
6)	Remove a crufty and currently unused cluster mechanism for VBLK
	files in vfs_bio_awrite.  When the code is functional, I'll add back
	a cleaner version.
7)	The page busy count wakeups assocated with the buffer cache usage were
	incorrectly cleaned up in a previous commit by me.  Revert to the
	original, correct version, but with a cleaner implementation.
8)	The cluster read code now tries to keep data associated with buffers
	more aggressively (without breaking the heuristics) when it is presumed
	that the read data (buffers) will be soon needed.
9)	Change to filesystem lockmgr locks so that they use LK_NOPAUSE.  The
	delay loop waiting is not useful for filesystem locks, due to the
	length of the time intervals.
10)	Correct and clean-up spec_getpages.
11)	Implement a fully functional nfs_getpages, nfs_putpages.
12)	Fix nfs_write so that modifications are coherent with the NFS data on
	the server disk (at least as well as NFS seems to allow.)
13)	Properly support MS_INVALIDATE on NFS.
14)	Properly pass down MS_INVALIDATE to lower levels of the VM code from
	vm_map_clean.
15)	Better support the notion of pages being busy but valid, so that
	fewer in-transit waits occur.  (use p->busy more for pageouts instead
	of PG_BUSY.)  Since the page is fully valid, it is still usable for
	reads.
16)	It is possible (in error) for cached pages to be busy.  Make the
	page allocation code handle that case correctly.  (It should probably
	be a printf or panic, but I want the system to handle coding errors
	robustly.  I'll probably add a printf.)
17)	Correct the design and usage of vm_page_sleep.  It didn't handle
	consistancy problems very well, so make the design a little less
	lofty.  After vm_page_sleep, if it ever blocked, it is still important
	to relookup the page (if the object generation count changed), and
	verify it's status (always.)
18)	In vm_pageout.c, vm_pageout_clean had rotted, so clean that up.
19)	Push the page busy for writes and VM_PROT_READ into vm_pageout_flush.
20)	Fix vm_pager_put_pages and it's descendents to support an int flag
	instead of a boolean, so that we can pass down the invalidate bit.
1998-03-07 21:37:31 +00:00
John Dyson
4866e0856c Make vm_fault much cleaner by removing the evil macro inlines, and
put alot of it's context into a data structure.  This allows
significant shortening of its codepath, and will significantly
decrease it's cache footprint.

Also, add some stats to vmmeter.  Note that you'll have to
rebuild/recompile vmstat, systat, etc... Otherwise, you'll
get "very interesting" paging stats.
1998-03-07 20:45:47 +00:00
Peter Dufault
917e476dad Reviewed by: msmith, bde long ago
POSIX.4 headers and sysctl variables.  Nothing should change
unless POSIX4 is defined or _POSIX_VERSION is set to 199309.
1998-03-04 10:27:00 +00:00
John Dyson
ffc82b0a70 1) Use a more consistent page wait methodology.
2)	Do not unnecessarily force page blocking when paging
	pages out.
3)	Further improve swap pager performance and correctness,
	including fixing the paging in progress deadlock (except
	in severe I/O error conditions.)
4)	Enable vfs_ioopt=1 as a default.
5)	Fix and enable the page prezeroing in SMP mode.

All in all, SMP systems especially should show a significant
improvement in "snappyness."
1998-03-01 04:18:54 +00:00
Mike Smith
ce75f2c365 In the author's words:
These diffs implement the first stage of a VOP_{GET|PUT}PAGES pushdown
for local media FS's.

See ffs_putpages in /sys/ufs/ufs/ufs_readwrite.c for implementation
details for generic *_{get|put}pages for local media FS's.  Support
is trivial to add for any FS that formerly relied on the default
behaviour of the vnode_pager in in EOPNOTSUPP cases (just copy the
ffs_getpages() code for the FS in question's *_{get|put}pages).

Obviously, it would be better if each local media FS implemented a
more optimal method, instead of calling an exported interface from
the /sys/vm/vnode_pager.c, but this is a necessary first step in
getting the FS's to a point where they can be supplied with better
implementations on a case-by-case basis.

Obviously, the cd9660_putpages() can be rather trivial (since it
is a read-only FS type 8-)).

A slight (temporary) modification is made to print a diagnostic message
in the case where the underlying filesystem attempts to engage in the
previous behaviour.  Failure is likely to be ungraceful.

Submitted by:	terry@freebsd.org (Terry Lambert)
1998-02-26 06:39:59 +00:00
John Dyson
660957521c Fix page prezeroing for SMP, and fix some potential paging-in-progress
hangs.  The paging-in-progress diagnosis was a result of Tor Egge's
excellent detective work.
Submitted by:	Partially from Tor Egge.
1998-02-25 03:56:15 +00:00
John Dyson
a15403de7b Correct some severe VM tuning problems for small systems (<=16MB), and
improve tuning on larger systems.  (A couple of the VM tuning params for
small systems were so badly chosen that the system could hang under load.)

The broken tuning was originaly my fault.
1998-02-24 10:16:23 +00:00
John Dyson
e47ed70b0f Significantly improve the efficiency of the swap pager, which appears to
have declined due to code-rot over time.  The swap pager rundown code
has been clean-up, and unneeded wakeups removed.  Lots of splbio's
are changed to splvm's.  Also, set the dynamic tunables for the
pageout daemon to be more sane for larger systems (thereby decreasing
the daemon overheadla.)
1998-02-23 08:22:48 +00:00
John Dyson
d9bed5bee1 Try to dynamically size the VM_KMEM_SIZE (but is still able to be overridden
in a way identically as before.)  I had problems with the system properly
handling the number of vnodes when there is alot of system memory, and the
default VM_KMEM_SIZE.  Two new options "VM_KMEM_SIZE_SCALE" and
"VM_KMEM_SIZE_MAX" have been added to support better auto-sizing for systems
with greater than 128MB.

Add some accouting for vm_zone memory allocations, and provide properly
for vm_zone allocations out of the kmem_map.  Also move the vm_zone
allocation stats to the VM OID tree from the KERN OID tree.
1998-02-23 07:42:43 +00:00
Bruce Evans
39e4376ba7 Removed unused #includes. 1998-02-20 13:11:54 +00:00
Mike Smith
eed5086e1a Move the 'sw' device off block major #1, which is now occupied by 'wfd'. 1998-02-19 12:15:06 +00:00
Eivind Eklund
303b270b0a Staticize. 1998-02-09 06:11:36 +00:00
John Dyson
157ac55f97 Fix an argument to vn_lock. It appears that alot of the vn_lock usage
is a bit undisciplined, and should be checked carefully.
1998-02-08 14:55:13 +00:00
Eivind Eklund
0b08f5f737 Back out DIAGNOSTIC changes. 1998-02-06 12:14:30 +00:00
John Dyson
95461b450d 1) Start using a cleaner and more consistant page allocator instead
of the various ad-hoc schemes.
2)	When bringing in UPAGES, the pmap code needs to do another vm_page_lookup.
3)	When appropriate, set the PG_A or PG_M bits a-priori to both avoid some
	processor errata, and to minimize redundant processor updating of page
	tables.
4)	Modify pmap_protect so that it can only remove permissions (as it
	originally supported.)  The additional capability is not needed.
5)	Streamline read-only to read-write page mappings.
6)	For pmap_copy_page, don't enable write mapping for source page.
7)	Correct and clean-up pmap_incore.
8)	Cluster initial kern_exec pagin.
9)	Removal of some minor lint from kern_malloc.
10)	Correct some ioopt code.
11)	Remove some dead code from the MI swapout routine.
12)	Correct vm_object_deallocate (to remove backing_object ref.)
13)	Fix dead object handling, that had problems under heavy memory load.
14)	Add minor vm_page_lookup improvements.
15)	Some pages are not in objects, and make sure that the vm_page.c can
	properly support such pages.
16)	Add some more page deficit handling.
17)	Some minor code readability improvements.
1998-02-05 03:32:49 +00:00
Eivind Eklund
47cfdb166d Turn DIAGNOSTIC into a new-style option. 1998-02-04 22:34:03 +00:00
Bruce Evans
e7a5897899 Added #include of <sys/queue.h> so that this file is more "self"-sufficent. 1998-02-03 22:19:35 +00:00
John Dyson
e736cd05cb This fix should help the panic problems in -current. There
were some errors in "interval" management.  Due to the
clustering mechanism, the code is necessarily complex and
error prone.
1998-02-03 00:50:36 +00:00
Bruce Evans
8bcc577e92 Forward declare more structs that are used in prototypes here - don't
depend on <sys/types.h> forward declaring common ones.
1998-02-01 20:08:39 +00:00
John Dyson
1f13bdaa97 Fix a performance problem caused by an earlier commit. 1998-02-01 02:00:20 +00:00
John Dyson
c15541e7a7 contigalloc doesn't place the allocated page(s) into an object, and
now this breaks vm_page_wire (due to wired page accounting per object.)

This should fix a problem as described by Donald Maddox.
1998-01-31 20:30:18 +00:00
John Dyson
eaf13dd73a Change the busy page mgmt, so that when pages are freed, they
MUST be PG_BUSY.  It is bogus to free a page that isn't busy,
because it is in a state of being "unavailable" when being
freed.  The additional advantage is that the page_remove code
has a better cross-check that the page should be busy and
unavailable for other use.  There were some minor problems
with the collapse code, and this plugs those subtile "holes."

Also, the vfs_bio code wasn't checking correctly for PG_BUSY
pages.  I am going to develop a more consistant scheme for
grabbing pages, busy or otherwise.  For now, we are stuck
with the current morass.
1998-01-31 11:56:53 +00:00
Eivind Eklund
f71bb3a160 Turn NSWAPDEV into a new-style option. 1998-01-25 04:13:25 +00:00
Eivind Eklund
7b778b5e61 Make all file-system (MFS, FFS, NFS, LFS, DEVFS) related option new-style.
This introduce an xxxFS_BOOT for each of the rootable filesystems.
(Presently not required, but encouraged to allow a smooth move of option *FS
to opt_dontuse.h later.)

LFS is temporarily disabled, and will be re-enabled tomorrow.
1998-01-24 02:54:56 +00:00
John Dyson
50ce7ff499 Add better support for larger I/O clusters, including larger physical
I/O.  The support is not mature yet, and some of the underlying implementation
needs help.  However, support does exist for IDE devices now.
1998-01-24 02:01:46 +00:00
John Dyson
2d8acc0f4a VM level code cleanups.
1)	Start using TSM.
	Struct procs continue to point to upages structure, after being freed.
	Struct vmspace continues to point to pte object and kva space for kstack.
	u_map is now superfluous.
2)	vm_map's don't need to be reference counted.  They always exist either
	in the kernel or in a vmspace.  The vmspaces are managed by reference
	counts.
3)	Remove the "wired" vm_map nonsense.
4)	No need to keep a cache of kernel stack kva's.
5)	Get rid of strange looking ++var, and change to var++.
6)	Change more data structures to use our "zone" allocator.  Added
	struct proc, struct vmspace and struct vnode.  This saves a significant
	amount of kva space and physical memory.  Additionally, this enables
	TSM for the zone managed memory.
7)	Keep ioopt disabled for now.
8)	Remove the now bogus "single use" map concept.
9)	Use generation counts or id's for data structures residing in TSM, where
	it allows us to avoid unneeded restart overhead during traversals, where
	blocking might occur.
10)	Account better for memory deficits, so the pageout daemon will be able
	to make enough memory available (experimental.)
11)	Fix some vnode locking problems. (From Tor, I think.)
12)	Add a check in ufs_lookup, to avoid lots of unneeded calls to bcmp.
	(experimental.)
13)	Significantly shrink, cleanup, and make slightly faster the vm_fault.c
	code.  Use generation counts, get rid of unneded collpase operations,
	and clean up the cluster code.
14)	Make vm_zone more suitable for TSM.

This commit is partially as a result of discussions and contributions from
other people, including DG, Tor Egge, PHK, and probably others that I
have forgotten to attribute (so let me know, if I forgot.)

This is not the infamous, final cleanup of the vnode stuff, but a necessary
step.  Vnode mgmt should be correct, but things might still change, and
there is still some missing stuff (like ioopt, and physical backing of
non-merged cache files, debugging of layering concepts.)
1998-01-22 17:30:44 +00:00
John Dyson
480ba2f552 Allow gdb to work again. 1998-01-21 12:18:00 +00:00
John Dyson
4722175765 Tie up some loose ends in vnode/object management. Remove an unneeded
config option in pmap.  Fix a problem with faulting in pages.  Clean-up
some loose ends in swap pager memory management.

The system should be much more stable, but all subtile bugs aren't fixed yet.
1998-01-17 09:17:02 +00:00
John Dyson
925a3a419a Fix some vnode management problems, and better mgmt of vnode free list.
Fix the UIO optimization code.
Fix an assumption in vm_map_insert regarding allocation of swap pagers.
Fix an spl problem in the collapse handling in vm_object_deallocate.
When pages are freed from vnode objects, and the criteria for putting
the associated vnode onto the free list is reached, either put the
vnode onto the list, or put it onto an interrupt safe version of the
list, for further transfer onto the actual free list.
Some minor syntax changes changing pre-decs, pre-incs to post versions.
Remove a bogus timeout (that I added for debugging) from vn_lock.

PHK will likely still have problems with the vnode list management, and
so do I, but it is better than it was.
1998-01-12 01:46:33 +00:00
John Dyson
bf27292b35 Turn off the VTEXT flag when an object is no longer referenced, so
that an executable that is no longer running can be written to.  Also,
clear the OBJ_OPT flag more often, when appropriate.
1998-01-07 03:12:19 +00:00
John Dyson
95e5e988e0 Make our v_usecount vnode reference count work identically to the
original BSD code.  The association between the vnode and the vm_object
no longer includes reference counts.  The major difference is that
vm_object's are no longer freed gratuitiously from the vnode, and so
once an object is created for the vnode, it will last as long as the
vnode does.

When a vnode object reference count is incremented, then the underlying
vnode reference count is incremented also.  The two "objects" are now
more intimately related, and so the interactions are now much less
complex.

When vnodes are now normally placed onto the free queue with an object still
attached.  The rundown of the object happens at vnode rundown time, and
happens with exactly the same filesystem semantics of the original VFS
code.  There is absolutely no need for vnode_pager_uncache and other
travesties like that anymore.

A side-effect of these changes is that SMP locking should be much simpler,
the I/O copyin/copyout optimizations work, NFS should be more ponderable,
and further work on layered filesystems should be less frustrating, because
of the totally coherent management of the vnode objects and vnodes.

Please be careful with your system while running this code, but I would
greatly appreciate feedback as soon a reasonably possible.
1998-01-06 05:26:17 +00:00
Alexander Langer
651bb81717 caddr_t --> void * 1997-12-31 02:35:29 +00:00
John Dyson
60f8d46448 Fix the decl of vfs_ioopt, allow LFS to compile again, fix a minor problem
with the object cache removal.
1997-12-29 01:03:55 +00:00
John Dyson
2be70f79f6 Lots of improvements, including restructring the caching and management
of vnodes and objects.  There are some metadata performance improvements
that come along with this.  There are also a few prototypes added when
the need is noticed.  Changes include:

1) Cleaning up vref, vget.
2) Removal of the object cache.
3) Nuke vnode_pager_uncache and friends, because they aren't needed anymore.
4) Correct some missing LK_RETRY's in vn_lock.
5) Correct the page range in the code for msync.

Be gentle, and please give me feedback asap.
1997-12-29 00:25:11 +00:00
John Dyson
6d1756a948 The ioopt code is still buggy, but wasn't fully disabled. 1997-12-25 20:55:15 +00:00
John Dyson
b44e4b7a2b Support running with inadequate swap space. Additionally, the code
will complain with a suggestion of increasing it.
1997-12-24 15:05:25 +00:00
John Dyson
998d8cd662 Improve my copyright. 1997-12-22 11:48:13 +00:00
John Dyson
c2e11a039d Change bogus usage of btoc to atop. The incorrect usage of btoc was
pointed out by bde.
1997-12-19 15:31:13 +00:00
John Dyson
1efb74fbcc Some performance improvements, and code cleanups (including changing our
expensive OFF_TO_IDX to btoc whenever possible.)
1997-12-19 09:03:37 +00:00
Eivind Eklund
5591b823d1 Make COMPAT_43 and COMPAT_SUNOS new-style options. 1997-12-16 17:40:42 +00:00
John Dyson
bd28588799 Fix a recursive kernel_map lock problem in vm_zone allocator.
PR:	5298
1997-12-15 05:16:09 +00:00
John Dyson
b0d8408e21 Slight improvement to the vm_zone stats output. Also, some other superficial
cleanups.
1997-12-14 05:17:44 +00:00
John Dyson
8256655132 After one of my analysis passes to evaluate methods for SMP TLB mgmt, I
noticed some major enhancements available for UP situations.  The number
of UP TLB flushes is decreased much more than significantly with these
changes.  Since a TLB flush appears to cost minimally approx 80 cycles,
this is a "nice" enhancement, equiv to eliminating between 40 and 160
instructions per TLB flush.

Changes include making sure that kernel threads all use the same PTD,
and eliminate unneeded PTD switches at context switch time.
1997-12-14 02:11:23 +00:00
John Dyson
3a2dc656bc Fix the prototype for swapout_procs();
Submitted by:	dima@best.net
1997-12-11 02:10:55 +00:00
John Dyson
ceb0cf87e8 Support an optional, sysctl enabled feature of idle process swapout. This
is apparently useful for large shell systems, or systems  with long running
idle processes.  To enable the feature:

	sysctl -w vm.swap_idle_enabled=1

Please note that some of the other vm sysctl variables have been renamed
to be more accurate.
Submitted by:	Much of it from Matt Dillon <dillon@best.net>
1997-12-06 02:23:36 +00:00
Bruce Evans
1cd52ec333 Don't include <sys/lock.h> in headers when only `struct simplelock' is
required.  Fixed everything that depended on the pollution.
1997-12-05 19:55:52 +00:00
John Dyson
70111b9016 Add new (very useful) tunable for pageout daemon. The flag changes
the maximum pageout rate:

sysctl -w vm.vm_maxlaunder=n

 1 < n < inf.

If paging heavily on large systems, it is likely that a performance
improvement can be achieved by increasing the parameter.  On a large
system, the parm is 32, but numbers as large as 128 can make a big
difference.  If paging is expensive, you might try decreasing the
number to 1-8.
1997-12-05 05:41:06 +00:00
John Dyson
12ac6a1dbb Support applications that need to resist or deny use of swap space.
sysctl -w vm.defer_swap_pageouts=1
	Causes the system to resist the use of swap space.  In low memory
	conditions, performance will decrease.
sysctl -w vm.disable_swap_pageouts=1
	Causes the system to mostly disable the use of swap space.  In
	low memory conditions, the system will likely start killing
	processes.
1997-12-04 19:00:56 +00:00
Poul-Henning Kamp
ab3f746966 In all such uses of struct buf: 's/b_un.b_addr/b_data/g' 1997-12-02 21:07:20 +00:00
Bruce Evans
b672aa4ba6 Removed all traces of P_IDLEPROC. It was tested but never set. 1997-11-24 15:15:33 +00:00
Bruce Evans
5270ecea67 Don't #define max() to get a version that works with vm_ooffset's.
Just use qmax().

This should be fixed more generally using overloaded functions.
1997-11-24 15:03:13 +00:00
Bruce Evans
fe0dd4acd3 Removed unused #include of <sys/malloc.h>. This file now uses only
zalloc().  Many more cases like this are probably obscured by not
including <vm/zone.h> explicitly (it is spammed into <sys/malloc.h>).
1997-11-18 11:02:19 +00:00
Tor Egge
b44959ce49 Simplify map entries during user page wire and user page unwire operations in
vm_map_user_pageable().

Check return value of vm_map_lock_upgrade() during a user page wire operation.
1997-11-14 23:42:10 +00:00
Poul-Henning Kamp
0abc78a697 Rename some local variables to avoid shadowing other local variables.
Found by: -Wshadow
1997-11-07 09:21:01 +00:00
Poul-Henning Kamp
4a11ca4e29 Remove a bunch of variables which were unused both in GENERIC and LINT.
Found by:	-Wunused
1997-11-07 08:53:44 +00:00
Poul-Henning Kamp
cb226aaa62 Move the "retval" (3rd) parameter from all syscall functions and put
it in struct proc instead.

This fixes a boatload of compiler warning, and removes a lot of cruft
from the sources.

I have not removed the /*ARGSUSED*/, they will require some looking at.

libkvm, ps and other userland struct proc frobbing programs will need
recompiled.
1997-11-06 19:29:57 +00:00
John Dyson
0aa8918597 Fix the "missing page" problem. Also, improve the performance of page
allocation in common cases.
1997-11-06 08:35:50 +00:00
Bruce Evans
55b211e3af Removed unused #includes. 1997-10-28 15:59:26 +00:00
John Dyson
5985940e79 Support garbage collecting the pmap pv entries. The management doesn't
happen until the system would have nearly failed anyway, so no signficant
overhead is added.  This helps large systems with lots of processes.
1997-10-25 02:41:56 +00:00
John Dyson
0a80f406b3 Decrease the initial allocation for the zone allocations. 1997-10-24 23:41:04 +00:00
Poul-Henning Kamp
a1c995b626 Last major round (Unless Bruce thinks of somthing :-) of malloc changes.
Distribute all but the most fundamental malloc types.  This time I also
remembered the trick to making things static:  Put "static" in front of
them.

A couple of finer points by:	bde
1997-10-12 20:26:33 +00:00
Poul-Henning Kamp
55166637cd Distribute and statizice a lot of the malloc M_* types.
Substantial input from:	bde
1997-10-11 18:31:40 +00:00
Peter Wemm
3820ec1d4d Attempt to fix the previous fix to the contigmalloc1 prototype.
struct malloc_type isn't defined in all cases (eg: from ddb), and the line
wrapping was very badly mangled.
1997-10-11 10:39:19 +00:00
Poul-Henning Kamp
f0d45e6aae Fix contigmalloc() and contigmalloc1() arguments. 1997-10-10 18:18:47 +00:00
John Dyson
7e00649986 Improve management of pages moving from the inactive to active queue. Additionally,
add some much needed comments.
1997-10-06 02:48:16 +00:00
John Dyson
e7b0208f61 Relax the vnode locking for read only operations. 1997-10-06 02:38:30 +00:00
Peter Wemm
af866d9a23 Fix some style(9) and formatting problems. tabsize 4 formatting doesn't
look too great with 'more' etc.

Approved by: dyson (with a minor grumble :-)
1997-09-21 11:41:12 +00:00
John Dyson
99448ed11d Change the M_NAMEI allocations to use the zone allocator. This change
plus the previous changes to use the zone allocator decrease the useage
of malloc by half.  The Zone allocator will be upgradeable to be able
to use per CPU-pools, and has more intelligent usage of SPLs.  Additionally,
it has reasonable stats gathering capabilities, while making most calls
inline.
1997-09-21 04:24:27 +00:00
Peter Wemm
35b8b2ddab Update select -> poll in drivers. 1997-09-14 03:19:42 +00:00
Peter Wemm
f8ddc1e209 Print correct function name in panics 1997-09-13 15:04:52 +00:00
Jonathan Lemon
987b847efc Do not consider VM_PROT_OVERRIDE_WRITE to be part of the protection
entry when handling a fault.  This is set by procfs whenever it wants
to write to a page, as a means of overriding `r-x COW' entries, but
causes failures in the `rwx' case.

Submitted by:	 bde
1997-09-12 15:58:47 +00:00
Bruce Evans
41fadeeb28 Removed yet more vestiges of config-time swap configuration and/or
cleaned up nearby cruft.
1997-09-07 16:21:11 +00:00
Bruce Evans
79624e2147 Removed unused #includes. 1997-09-01 03:17:34 +00:00
Bruce Evans
4de628dec4 Some staticized variables were still declared to be extern. 1997-09-01 02:55:50 +00:00
Bruce Evans
dfeca1b8ae Print a device number in hex instead of decimal. 1997-09-01 02:28:32 +00:00
Poul-Henning Kamp
a051452ae2 Change the 0xdeadb hack to a flag called VDOOMED.
Introduce VFREE which indicates that vnode is on freelist.
Rename vholdrele() to vdrop().
Create vfree() and vbusy() to add/delete vnode from freelist.
Add vfree()/vbusy() to keep (v_holdcnt != 0 || v_usecount != 0)
  vnodes off the freelist.
Generalize vhold()/v_holdcnt to mean "do not recycle".
Fix reassignbuf()s lack of use of vhold().
Use vhold() instead of checking v_cache_src list.
Remove vtouch(), the vnodes are always vget'ed soon enough
  after for it to have any measuable effect.
Add sysctl debug.freevnodes to keep track of things.
Move cache_purge() up in getnewvnodes to avoid race.
Decrement v_usecount after VOP_INACTIVE(), put a vhold() on
  it during VOP_INACTIVE()
Unmacroize vhold()/vdrop()
Print out VDOOMED and VFREE flags (XXX: should use %b)

Reviewed by:		dyson
1997-08-31 07:32:39 +00:00
Peter Wemm
54f42e4ba0 Allow non-page aligned file offset mmap's, providing that the system is
allowed to choose the address, or that the MAP_FIXED address has the same
remainder when modulo PAGE_SIZE as the file offset.  Apparently this is
posix1003.1b specified behavior.  SVR4 and the other *BSD's allow it too.
It costs us nothing to support and means we don't get EINVAL on some mmap
code that works perfectly elsewhere.

Obtained from: NetBSD
1997-08-30 18:50:06 +00:00
Bruce Evans
b9dcd593ff Fixed type mismatches for functions with args of type vm_prot_t and/or
vm_inherit_t.  These types are smaller than ints, so the prototypes
should have used the promoted type (int) to match the old-style function
definitions.  They use just vm_prot_t and/or vm_inherit_t.  This depends
on gcc features to work.  I fixed the definitions since this is easiest.
The correct fix may be to change the small types to u_int, to optimize
for time instead of space.
1997-08-25 22:15:31 +00:00
John Dyson
89721f6f1a This is a trial improvement for the vnode reference count while on the vnode
free list problem.  Also, the vnode age flag is no longer used by the
vnode pager.  (It is actually incorrect to use then.)  Constructive
feedback welcome -- just be kind.
1997-08-22 03:56:37 +00:00
Bruce Evans
b1037dcd53 #include <machine/limits.h> explicitly in the few places that it is required. 1997-08-21 20:33:42 +00:00
Steve Passe
7cbfd031b6 Added includes of smp.h for SMP.
This eliminates a bazillion warnings about implicit s_lock & friends.
1997-08-18 03:29:21 +00:00
John Dyson
03e9c6c101 Fix kern_lock so that it will work. Additionally, clean-up some of the
VM systems usage of the kernel lock (lockmgr) code.  This is a first
pass implementation, and is expected to evolve as needed.  The API
for the lock manager code has not changed, but the underlying implementation
has changed significantly.  This change should not materially affect
our current SMP or UP code without non-standard parameters being used.
1997-08-18 02:06:35 +00:00
John Dyson
1c5ff0a712 The "cutsie" register parameter passing that I had mistakenly used breaks
profiling.  Since it doesn't really improve perf much, I have backed it
out.
1997-08-10 00:12:13 +00:00
John Dyson
f1c1c5b5a4 More vm_zone cleanup. The sysctl now accounts for items better, and
counts the number of allocations.
1997-08-07 03:52:55 +00:00
John Dyson
507b10b48c Add exposure of some vm_zone allocation stats by sysctl. Also, change
the initialization parameters of some zones in VM map.  This contains
only optimizations and not bugfixes.
1997-08-06 04:58:05 +00:00
John Dyson
ba9be04c72 Fixed the commit botch that was causing crashes soon after system
startup.  Due to the error, the initialization of the zone for
pv_entries was missing.  The system should be usable again.
1997-08-05 23:03:24 +00:00
John Dyson
0d65e566b9 Another attempt at cleaning up the new memory allocator. 1997-08-05 22:24:31 +00:00
John Dyson
b79933ebfa Fix some bugs, document vm_zone better. Add copyright to vm_zone.h. Use
the new zone code in pmap.c so that we can get rid of the ugly ad-hoc
allocations in pmap.c.
1997-08-05 22:07:27 +00:00
John Dyson
f2adc8bb27 Modify pmap to use our new memory allocator. Also, change the vm_map_entry
allocations to be interrupt safe.
1997-08-05 01:32:52 +00:00
John Dyson
565bca977d A very simple zone allocator. 1997-08-05 00:07:31 +00:00
John Dyson
3075778b63 Get rid of the ad-hoc memory allocator for vm_map_entries, in lieu of
a simple, clean zone type allocator.  This new allocator will also be
used for machine dependent pmap PV entries.
1997-08-05 00:02:08 +00:00
Bruce Evans
1fd0b0588f Removed unused #includes. 1997-08-02 14:33:27 +00:00
John Dyson
dc2efb2766 Add the ability for the pageout daemon to measure stats on memory usage before
the system is out of memory.  The daemon does a minimal amount of work that
increases as the system becomes more likely to run out of memory and page in/out.

The default tuning is fairly low in background CPU usage, and sysctl variables
have been added to enable flexable operation.  This is an experimental feature
that will likely be changed and improved over time.
1997-07-27 04:49:19 +00:00
John Dyson
11cccda1de Fix a very subtile problem that causes unnessary numbers of objects backing
a single logical object.
Submitted by:	Alan Cox <alc@cs.rice.edu>
1997-07-27 04:44:12 +00:00
John Dyson
0a0a85b3e0 Add support for 4MB pages. This includes the .text, .data, .data parts
of the kernel, and also most of the dynamic parts of the kernel.  Additionally,
4MB pages will be allocated for display buffers as appropriate (only.)

The 4MB support for SMP isn't complete, but doesn't interfere with operation
either.
1997-07-17 04:34:03 +00:00
Tor Egge
208d433777 Don't try upgrading an existing exclusive lock in vm_map_user_pageable.
This should close PR kern/3180.
Also remove a bogus unconditional call to vm_map_unlock_read in
vm_map_lookup.
1997-06-23 21:51:03 +00:00
Peter Wemm
3b18caba29 Kill some stale leftovers from the earlier attempts at SMP per-cpu pages 1997-06-22 15:47:16 +00:00
John Dyson
3c631446d3 Remove a window during running down a file vnode. Also, the OBJ_DEAD
flag wasn't being respected during vref(), et. al.  Note that this
isn't the eventual fix for the locking problem.  Fine grained SMP
in the VM and VFS code will require (lots) more work.
1997-06-22 03:00:24 +00:00
John Dyson
4a40e3d42f Correct the return code for the mlock system call. Also add the stubs
for mlockall and munlockall.
1997-06-15 23:35:32 +00:00
John Dyson
dbc806e731 Fix a reference problem with maps. Only appears to manifest itself when
sharing address spaces.
1997-06-15 23:33:52 +00:00
Peter Wemm
0228905ae4 Update the #include "opt_smpxxx.h" includes - opt_smp.h isn't needed
very much in the generic parts of the kernel now.
1997-05-29 02:57:22 +00:00
Doug Rabson
32ad9cb531 Fix a few bugs with NFS and mmap caused by NFS' use of b_validoff
and b_validend.  The changes to vfs_bio.c are a bit ugly but hopefully
can be tidied up later by a slight redesign.

PR:		kern/2573, kern/2754, kern/3046 (possibly)
Reviewed by:	dyson
1997-05-19 14:36:56 +00:00
John Dyson
6160099735 Check the correct queue for waking up the pageout daemon. Specifically,
the pageout daemon wasn't always being waken up appropriately when the
(cache + free) queues were depleted.
Submitted by:	David S. Miller <davem@jenolan.rutgers.edu>
1997-05-01 14:36:01 +00:00
Peter Wemm
477a642cee Man the liferafts! Here comes the long awaited SMP -> -current merge!
There are various options documented in i386/conf/LINT, there is more to
come over the next few days.

The kernel should run pretty much "as before" without the options to
activate SMP mode.

There are a handful of known "loose ends" that need to be fixed, but
have been put off since the SMP kernel is in a moderately good condition
at the moment.

This commit is the result of the tinkering and testing over the last 14
months by many people.  A special thanks to Steve Passe for implementing
the APIC code!
1997-04-26 11:46:25 +00:00
Peter Wemm
92da7e012d Send this to the Attic so there's no mixups over which kern_lock.c is in
use in -current.
1997-04-21 13:39:56 +00:00
Peter Wemm
e108835bbc Unused variable (upobj is now purely handled within pmap) 1997-04-14 03:40:42 +00:00
John Dyson
5856e12e69 Fully implement vfork. Vfork is now much much faster than even our
fork. (On my machine, fork is about 240usecs, vfork is 78usecs.)

Implement rfork(!RFPROC !RFMEM), which allows a thread to divorce its memory
	from the other threads of a group.

Implement rfork(!RFPROC RFCFDG), which closes all file descriptors, eliminating
	possible existing shares with other threads/processes.

Implement rfork(!RFPROC RFFDG), which divorces the file descriptors for a
	thread from the rest of the group.

Fix the case where a thread does an exec.  It is almost nonsense for a thread
	to modify the other threads address space by an exec, so we
	now automatically divorce the address space before modifying it.
1997-04-13 01:48:35 +00:00
Peter Wemm
a2a1c95c10 The biggie: Get rid of the UPAGES from the top of the per-process address
space. (!)

Have each process use the kernel stack and pcb in the kvm space.  Since
the stacks are at a different address, we cannot copy the stack at fork()
and allow the child to return up through the function call tree to return
to user mode - create a new execution context and have the new process
begin executing from cpu_switch() and go to user mode directly.
In theory this should speed up fork a bit.

Context switch the tss_esp0 pointer in the common tss.  This is a lot
simpler since than swithching the gdt[GPROC0_SEL].sd.sd_base pointer
to each process's tss since the esp0 pointer is a 32 bit pointer, and the
sd_base setting is split into three different bit sections at non-aligned
boundaries and requires a lot of twiddling to reset.

The 8K of memory at the top of the process space is now empty, and unmapped
(and unmappable, it's higher than VM_MAXUSER_ADDRESS).

Simplity the pmap code to manage process contexts, we no longer have to
double map the UPAGES, this simplifies and should measuably speed up fork().

The following parts came from John Dyson:

Set PG_G on the UPAGES that are now in kernel context, and invalidate
them when swapping them out.

Move the upages object (upobj) from the vmspace to the proc structure.

Now that the UPAGES (pcb and kernel stack) are out of user space, make
rfork(..RFMEM..) do what was intended by sharing the vmspace
entirely via reference counting rather than simply inheriting the mappings.
1997-04-07 07:16:06 +00:00
Peter Wemm
2100d64585 Commit a typo fix that's been sitting in my tree for ages, quite forgotten.
The typo was detected once apon a time with the -Wunused compile option.
The result was that a block of code for implementing
madvise(.. MADV_SEQUENTIAL..) behavior was "dead" and unused, probably
negating the effect of activating the option.

Reviewed by: dyson
1997-04-06 16:16:11 +00:00
John Dyson
7d78abc9d9 Make vm_map_protect be more complete about map simplification. This
is useful when a process changes it's page range protections very
much.
Submitted by:	Alan Cox <alc@cs.rice.edu>
1997-04-06 03:04:31 +00:00
John Dyson
15cb98465f Correction to the prototype for vm_fault. 1997-04-06 02:30:56 +00:00
John Dyson
a04c970a7a Fix the gdb executable modify problem. Thanks to the detective work
by Alan Cox <alc@cs.rice.edu>, and his description of the problem.

The bug was primarily in procfs_mem, but the mistake likely happened
due to the lack of vm system support for the operation.  I added
better support for selective marking of page dirty flags so that
vm_map_pageable(wiring) will not cause this problem again.

The code in procfs_mem is now less bogus (but maybe still a little
so.)
1997-04-06 02:29:45 +00:00
Bruce Evans
3f39dbc52d Removed potentially harmful garbage <vm/lock.h> and fixed bogus
use of it.  It was actually harmless because the use was null due
to fortuitous include orders and identical (wrong) idempotency
macros.
1997-04-01 08:39:07 +00:00
David Greenman
9caaadb63a Changed the way that the exec image header is read to be filesystem-
centric rather than VM-centric to fix a problem with errors not being
detectable when the header is read.
Killed exech_map as a result of these changes.
There appears to be no performance difference with this change.
1997-03-31 11:11:26 +00:00
Bruce Evans
3ac4d1ef0c Don't #include <sys/fcntl.h> in <sys/file.h> if KERNEL is defined.
Fixed everything that depended on getting fcntl.h stuff from the wrong
place.  Most things don't depend on file.h stuff at all.
1997-03-23 03:37:54 +00:00
John Dyson
c5d593ae63 Fix a significant error in the accounting for pre-zeroed pages. This
is a candidate for RELENG_2_2...
1997-03-23 02:44:54 +00:00
John Dyson
eb2c768ebb When removing IN_RECURSE support during the Lite/2 merge, read/write
to/from mmaped regions was broken.  This commit fixes the breakage, and
uses the new Lite/2 locking mechanisms.
1997-03-08 04:33:47 +00:00
Bruce Evans
2f558c3e59 Removed a wrong LK_INTERLOCK flag. 1997-02-27 15:38:41 +00:00
Peter Wemm
6875d25465 Back out part 1 of the MCFH that changed $Id$ to $FreeBSD$. We are not
ready for it yet.
1997-02-22 09:48:43 +00:00
Bruce Evans
697030ed3d Removed vestiges of Mach lock types.
vm_map.h:
Removed #include of <sys/proc.h>.  curproc is only used in some macros
and users of the macros already include <sys/proc.h>.
1997-02-18 14:07:03 +00:00
Garrett Wollman
10825343b0 Provide an alternative interface to contigmalloc() which allows a specific
map to be used when allocating the kernel va (e.g., mb_map).  The VM
gurus may want to look this over.
1997-02-13 19:37:40 +00:00
John Dyson
996c772f58 This is the kernel Lite/2 commit. There are some requisite userland
changes, so don't expect to be able to run the kernel as-is (very well)
without the appropriate Lite/2 userland changes.

The system boots and can mount UFS filesystems.

Untested: ext2fs, msdosfs, NFS
Known problems: Incorrect Berkeley ID strings in some files.
		Mount_std mounts will not work until the getfsent
		library routine is changed.

Reviewed by:	various people
Submitted by:	Jeffery Hsu <hsu@freebsd.org>
1997-02-10 02:22:35 +00:00
John Dyson
5069bf5747 Another fix to inheriting shared segments. Do the copy on write
thing if needed.
Submitted by:	Alan Cox <alc@cs.rice.edu>
1997-01-31 04:10:41 +00:00
David Greenman
098415100b Added a check/panic for v_usecount being 0 (no vnode reference) in
vnode_pager_alloc().
1997-01-24 22:20:23 +00:00
John Dyson
fed9a9032e Fix two problems where a NULL object is dereferenced. One problem
was in the VM_INHERIT_SHARE case of vmspace_fork, and also in vm_map_madvise.
Submitted by:	Alan Cox <alc@cs.rice.edu>
1997-01-22 01:34:48 +00:00
John Dyson
6e20a16589 Make MADV_FREE work better. Specifically, it did not wait for
the page to be unbusy, and it caused some algorithmic problems
as a result.  There were some other problems with it also, so
this is a general cleanup of the code.
Submitted by:	Douglas Crosher <dtc@scrooge.ee.swin.oz.au> and myself.
1997-01-20 02:25:14 +00:00
John Dyson
afa07f7e83 Change the map entry flags from bitfields to bitmasks. Allows
for some code simplification.
1997-01-16 04:16:22 +00:00
David Greenman
649c409d03 Fix bug related to map entry allocations where a sleep might be attempted
when allocating memory for network buffers at interrupt time. This is due
to inadequate checking for the new mcl_map. Fixed by merging mb_map and
mcl_map into a single mb_map.

Reviewed by:	wollman
1997-01-15 20:46:02 +00:00
Bruce Evans
16a02c1105 Removed redundant spl0()'s from kernel processes. They were work-arounds
for a bug in fork().
1997-01-15 19:05:08 +00:00
Jordan K. Hubbard
1130b656e5 Make the long-awaited change from $Id$ to $FreeBSD$
This will make a number of things easier in the future, as well as (finally!)
avoiding the Id-smashing problem which has plagued developers for so long.

Boy, I'm glad we're not using sup anymore.  This update would have been
insane otherwise.
1997-01-14 07:20:47 +00:00
John Dyson
d4a272db61 Slightly correct the code that moves pages from the active to the
inactive queue.  This is only a minor performance improvement, but will
not affect perf on machines that don't have ref bits.
1997-01-11 07:22:24 +00:00
John Dyson
9b5a5d81be Prepare better for multi-platform by eliminating another required
pmap routine (pmap_is_referenced.)  Upper level recoded to use
pmap_ts_referenced.
1997-01-11 07:19:02 +00:00
John Dyson
106031ef73 Undo the collapse breakage (swap space usage problem.) 1997-01-03 17:02:28 +00:00
John Dyson
3c018e7214 Guess what? We left alot of the old collapse code that is not needed
anymore with the "full" collapse fix that we added about 1yr ago!!!  The
code has been removed by optioning it out for now, so we can put it back
in ASAP if any problems are found.
1997-01-01 04:45:05 +00:00
John Dyson
8cc7e047a3 A very significant improvement in the management of process maps
and objects.  Previously, "fancy" memory management techniques
such as that used by the M3 RTS would have the tendancy of chopping
up processes allocated memory into lots of little objects.  Alan
has come up with some improvements to migtigate the sitution to
the point where even the M3 RTS only has one object for bss and
it's managed memory (when running CVSUP.)  (There are still cases where the
situation isn't improved when the system pages -- but this is much much
better for the vast majority of cases.)  The system will now be able
to much more effectively merge map entries.

Submitted by:	Alan Cox <alc@cs.rice.edu>
1996-12-31 16:23:38 +00:00
John Dyson
d0aea04fe0 Let the VM system know that on certain arch's that VM_PROT_READ
also implies VM_PROT_EXEC.  We support it that way for now,
since the break system call by default gives VM_PROT_ALL.  Now
we have a better chance of coalesing map entries when mixing
mmap/break type operations.  This was contributing to excessive
numbers of map entries on the modula-3 runtime system.  The
problem is still not "solved", but the situation makes more
sense.

Eventually, when we work on architectures where VM_PROT_READ
is orthogonal to VM_PROT_EXEC, we will have to visit this
issue carefully (esp. regarding security issues.)
1996-12-30 05:31:21 +00:00
John Dyson
bc0d333478 EEEK!!! useracc and kernacc didn't lock their respective
maps.  Additionally, eliminate the map->hint distortion
associated with useracc.  That may/may-not be the "right"
thing to do -- but time will tell.
Submitted by:	Partially by Alan Cox <alc@cs.rice.edu>
1996-12-30 03:56:11 +00:00
John Dyson
595236df9b Superficial cleanup of comment. 1996-12-29 02:33:12 +00:00
John Dyson
b7b2aac2b6 Eliminate the redundancy due to the similarity between the routines
vm_map_simplify and vm_map_simplify_entry.  Make vm_map_simplify_entry
handle wired maps so that we can get rid of vm_map_simplify.  Modify
the callers of vm_map_simplify to properly use vm_map_simplify_entry.
Submitted by:	Alan Cox <alc@cs.rice.edu>
1996-12-28 23:07:49 +00:00
John Dyson
94328e9057 The code unnecessarily created an object with no handle up-front, which
has the negative effect of disabling some map optimizations.  This
patch defers the creation of the object until it needs to be at fault time.
Submitted by:	Alan Cox <alc@cs.rice.edu>
1996-12-28 22:40:44 +00:00
Joerg Wunsch
e9822d926c Make DFLDSIZ and MAXDSIZ fully-supported options.
"Don't forget to do a ``make depend''" :-)
1996-12-22 23:17:09 +00:00
John Dyson
7aaaa4fd5d Implement closer-to POSIX mlock semantics. The major difference is
that we do allow mlock to span unallocated regions (of course, not
mlocking them.)  We also allow mlocking of RO regions (which the old
code couldn't.)  The restriction there is that once a RO region is
wired (mlocked), it cannot be debugged (or EVER written to.)

Under normal usage, the new mlock code will be a significant improvement
over our old stuff.
1996-12-14 17:54:17 +00:00
John Dyson
0362d7d737 Expunge inlines... 1996-12-07 07:44:05 +00:00
John Dyson
62487bb4db Fix a map entry leak problem found by DG. Also, de-inline a function
vm_map_entry_dispose, because it won't help being inlined.
1996-12-07 06:19:37 +00:00
John Dyson
cdc2c29161 Make vm_map_insert much more intelligent in the MAP_NOFAULT case so
that map entries are coalesced when appropriate.  Also, conditionalize
some code that is currently not used in vm_map_insert.  This mod
has been added to eliminate unnecessary map entries in buffer map.

Additionally, there were some cases where map coalescing could be done
when it shouldn't.  That problem has been resolved.
1996-12-07 00:03:43 +00:00
John Dyson
09e0c6ccdd Implement a new totally dynamic (up to MAXPHYS) buffer kva allocation
scheme.  Additionally, add the capability for checking for unexpected
kernel page faults.  The maximum amount of kva space for buffers hasn't
been decreased from where it is, but it will now be possible to do so.

This scheme manages the kva space similar to the buffers themselves.  If
there isn't enough kva space because of usage or fragementation, buffers
will be reclaimed until a buffer allocation is successful.  This scheme
should be very resistant to fragmentation problems until/if the LFS code
is fixed and uses the bogus buffer locking scheme -- but a 'fixed' LFS
is not likely to use such a scheme.

Now there should be NO problem allocating buffers up to MAXPHYS.
1996-11-30 22:41:49 +00:00
John Dyson
e0c5a895f1 Make the kernel smaller with at worst a neutral effect on perf by
de-inlining some VM calls.  (Actually, I measured a small improvement.)
1996-11-28 23:15:07 +00:00
John Dyson
5b0a74089d Improve the locality of reference for variables in vm_page and
vm_kern by moving them from .bss to .data.  With this change,
there is a measurable perf improvement in fork/exec.
1996-11-17 02:38:31 +00:00
John Dyson
db2c0faa4c Vastly improved contigmalloc routine. It does not solve the
problem of allocating contiguous buffer memory in general, but
make it much more likely to work at boot-up time.  The best
chance for an LKM-type load of a sound driver is immediately
after the mount of the root filesystem.

This appears to work for a 64K allocation on an 8MB system.
1996-11-05 04:19:08 +00:00
John Dyson
851c12ff1d Change mmap to use OBJT_DEFAULT instead of OBJT_SWAP by default
for anonymous objects.  The system will automatically change the
type to SWAP if needed (for size or pageout reasons.)
1996-10-29 22:07:11 +00:00
Poul-Henning Kamp
281cd9b020 The way we get a vnode for swapdev is not quite kosher. In particular
it breaks in the DEVFS_ROOT case.  replicate a bit too much of bdevvp()
in here to circumvent the problem.  The real problem is the magic that
lives in bdevsw[1].
1996-10-27 22:31:00 +00:00
John Dyson
fcae040bc0 Remove a bogus optimization in the mmap code. It is superfluous,
and at best is the same speed as the unoptimized code.  At worst, it
slows down trivial programs.
1996-10-24 02:56:23 +00:00
John Dyson
a669a6e9a9 Make processes waken up eligible for immediate swap-in. 1996-10-17 02:58:20 +00:00
John Dyson
ad98052216 Clean up the rundown of the object backing a vnode. This should fix
NFS problems associated with forcible dismounts.
1996-10-17 02:49:35 +00:00
Bruce Evans
aad9af2ba3 Removed nested include of <sys/proc.h> from <vm/vm_object.h> and fixed
the one place that depended on it.  wakeup() is now prototyped in
<sys/systm.h> so that it is normally visible.

Added nested include of <sys/queue.h> in <vm/vm_object.h>.  The queue
macros are a more fundamental prerequisite for <vm/vm_object.h> than
the wakeup prototype and previously happened to be included by
namespace pollution from <sys/proc.h> or elsewhere.
1996-10-15 18:24:34 +00:00
John Dyson
675878e732 Move much of the machine dependent code from vm_glue.c into
pmap.c.  Along with the improved organization, small proc fork
performance is now about 5%-10% faster.
1996-10-15 03:16:45 +00:00
Poul-Henning Kamp
1111860c29 Remove a stale comment. 1996-10-13 07:16:50 +00:00
Bruce Evans
8ba0c490a9 Removed __pure's and __pure2's. __pure is a no-op for recent versions
of gcc by definition, and __pure2 is a no-op in effect (presumably the
compiler can see when an inline function has no side effects).
1996-10-12 20:09:48 +00:00
John Dyson
853a7bc893 Make the default cache size optim to be 256K, the old default was
64K.  The change has essentially neutral effect on those machines with
little or no cache, and has a positive effect on "normal" machines
with 256K or more cache.
1996-10-06 22:26:13 +00:00
John Dyson
f7d6dab2fd Fix a problem with the page coloring code that the system will not always
be able to use all of the free pages.  This can manifest as a panic
using DIAGNOSTIC, or as a panic on an indirect memory reference.
1996-10-06 18:27:39 +00:00
Bruce Evans
322dfc2bd5 Fixed undeclared variables for the !(PQ_L2_SIZE > 1) case.
Removed redundant #include.
1996-09-28 17:53:18 +00:00
John Dyson
a2f4a84696 Reviewed by:
Submitted by:
Obtained from:
1996-09-28 03:33:40 +00:00
David Greenman
cd6eea255f Fixed bug with reversed trunc/round_page() in madvise...start must be
trunced, end must be rounded.
1996-09-19 10:12:41 +00:00
Bruce Evans
dd106ca742 Removed iprintf(). It was copied to db_iprintf() in ddb. 1996-09-15 11:24:21 +00:00
Bruce Evans
c7c34a24a3 Attached vm ddb commands show map', show vmochk', `show object',
`show vmopag', `show page' and `show pageq'.  Moved all vm ddb stuff
to the ends of the vm source files.

Changed printf() to db_printf(), `indent' to db_indent, and iprintf()
to db_iprintf() in ddb commands.  Moved db_indent and db_iprintf()
from vm to ddb.

vm_page.c:
Don't use __pure.  Staticized.

db_output.c:
Reduced page width from 80 to 79 to inhibit double spacing for long
lines (there are still some problems if words are printed across
column 79).
1996-09-14 11:54:59 +00:00
John Dyson
9fea9a6f5b The whole issue of not support VOP_LOCK for VBLK devices should be
rethought.  This fixes YET another problem with unmounting filesystems.
The root cause is not fixed here, but at least the problem has gone
away.
1996-09-10 05:28:23 +00:00
John Dyson
4334b0d815 Fixed the use of the wrong variable in vm_map_madvise. 1996-09-08 23:49:47 +00:00
John Dyson
5070c7f8c5 Addition of page coloring support. Various levels of coloring are afforded.
The default level works with minimal overhead, but one can also enable
full, efficient use of a 512K cache.  (Parameters can be generated
to support arbitrary cache sizes also.)
1996-09-08 20:44:49 +00:00
John Dyson
b8e251a56d Improve the scalability of certain pmap operations. 1996-09-08 16:57:53 +00:00
John Dyson
6476c0d204 Even though this looks like it, this is not a complex code change.
The interface into the "VMIO" system has changed to be more consistant
and robust.  Essentially, it is now no longer necessary to call vn_open
to get merged VM/Buffer cache operation, and exceptional conditions
such as merged operation of VBLK devices is simpler and more correct.

This code corrects a potentially large set of problems including the
problems with ktrace output and loaded systems, file create/deletes,
etc.

Most of the changes to NFS are cosmetic and name changes, eliminating
a layer of subroutine calls.  The direct calls to vput/vrele have
been re-instituted for better cross platform compatibility.

Reviewed by: davidg
1996-08-21 21:56:23 +00:00
John Dyson
67bf686897 Backed out the recent changes/enhancements to the VM code. The
problem with the 'shell scripts' was found, but there was a 'strange'
problem found with a 486 laptop that we could not find.  This commit
backs the code back to 25-jul, and will be re-entered after the snapshot
in smaller (more easily tested) chunks.
1996-07-30 03:08:57 +00:00
David Greenman
0f281c28fa Slight performance tweak for previous commit. 1996-07-28 02:54:09 +00:00
John Dyson
f230c45cbe Undo part of the scalability commit. Many of the changes
in vm_fault had some performance enhancements not ready
for prime time.  This commit backs out some of the changes.
1996-07-28 01:14:01 +00:00
John Dyson
bf6dfc7b35 Allow sequentially created mmap'ed anonymous regions to coalesce. There
is little or no reason to create a swap pager for small mmap's.  The
vm_map_insert code will automatically create a swap pager if the object
becomes too large.  This fix, per a request from phk.
1996-07-27 17:21:41 +00:00
John Dyson
3b297e93b8 Clean up some lint. 1996-07-27 04:22:12 +00:00
John Dyson
feb32a8fa9 Remove experimental header file. My test-build must have picked it
up in an unexpected place.
Submitted by:	jkh
1996-07-27 04:06:11 +00:00
John Dyson
819c1c6f43 Missing (prototype) change from the previous commit. 1996-07-27 03:47:35 +00:00
John Dyson
4f4d35edf0 This commit is meant to solve a couple of VM system problems or
performance issues.

	1) The pmap module has had too many inlines, and so the
	   object file is simply bigger than it needs to be.
	   Some common code is also merged into subroutines.
	2) Removal of some *evil* PHYS_TO_VM_PAGE macro calls.
	   Unfortunately, a few have needed to be added also.
	   The removal caused the need for more vm_page_lookups.
	   I added lookup hints to minimize the need for the
	   page table lookup operations.
	3) Removal of some bogus performance improvements, that
	   mostly made the code more complex (tracking individual
	   page table page updates unnecessarily).  Those improvements
	   actually hurt 386 processors perf (not that people who
	   worry about perf use 386 processors anymore :-)).
	4) Changed pv queue manipulations/structures to be TAILQ's.
	5) The pv queue code has had some performance problems since
	   day one.  Some significant scalability issues are resolved
	   by threading the pv entries from the pmap AND the physical
	   address instead of just the physical address.  This makes
	   certain pmap operations run much faster.  This does
	   not affect most micro-benchmarks, but should help loaded system
	   performance *significantly*.  DG helped and came up with most
	   of the solution for this one.
	6) Most if not all pmap bit operations follow the pattern:
		pmap_test_bit();
		pmap_clear_bit();
	   That made for twice the necessary pv list traversal.   The
	   pmap interface now supports only pmap_tc_bit type operations:
	   pmap_[test/clear]_modified, pmap_[test/clear]_referenced.
	   Additionally, the modified routine now takes a vm_page_t arg
	   instead of a phys address.  This eliminates a PHYS_TO_VM_PAGE
	   operation.
	7) Several rewrites of routines that contain redundant code to
	   use common routines, so that there is a greater likelihood of
	   keeping the cache footprint smaller.
1996-07-27 03:24:10 +00:00
Bruce Evans
6ab46d52a5 Don't use NULL in non-pointer contexts. 1996-07-12 04:12:25 +00:00
John Dyson
502ba6e4a8 Back-off on the previous commit, specifically remove the look-ahead
optimization on the active queue scan.  I will do this correctly later.
1996-07-08 03:22:55 +00:00
John Dyson
c8c4b40cca Fix a problem with the pageout daemon RSS limiting, where it degrades
performance to LRU or worse when RSS limiting takes effect.  Also,
make an end condition in the active queue scan more efficient in the
case where pages are removed from the active queue as a side effect
of a pmap operation.
1996-07-08 02:25:53 +00:00
David Greenman
9579ee641a In all special cases for spl or page_alloc where kmem_map is check for,
mb_map (a submap of kmem_map) must also be checked.
Thanks to wcarchive (err...sort of) for demonstrating this bug.
1996-07-07 03:27:41 +00:00
John Dyson
a6e6bcc5f4 Properly set the PG_MAPPED and PG_WRITEABLE flags. This fixes some potential
problems with vm_map_remove/vm_map_delete.
1996-07-02 02:08:02 +00:00
John Dyson
877329e059 Make -current consistant with -stable regarding time that a process
sleeps before being swapped out.  The time is increased from 4 secs to
10 secs.  Originally I had decreased it from 20 to 4, but that is a bit
severe.  20 is too long though.
1996-06-30 21:16:18 +00:00
David Greenman
01155bd720 Make sure we have an object in the map entry before trying to trim pages
from it.
1996-06-29 09:17:17 +00:00
John Dyson
38efa82b23 This commit does a couple of things:
Re-enables the RSS limiting, and the routine is now tail-recursive,
	making it much more safe (eliminates the possiblity of kernel stack
	overflow.) Also, the RSS limiting is a little more intelligent about
	finding the likely objects that are pushing the process over the limit.

	Added some sysctls that help with VM system tuning.

New sysctl features:
	1)	Enable/disable lru pageout algorithm.
		vm.pageout_algorithm = 0, default algorithm that works
			well, especially using X windows and heavy
			memory loading.  Can have adverse effects,
			sometimes slowing down program loading.

		vm.pageout_algorithm = 1, close to true LRU.  Works much
			better than clock, etc.  Does not work as well as
			the default algorithm in general.  Certain memory
			"malloc" type benchmarks work a little better with
			this setting.

		Please give me feedback on the performance results
		associated with these.

	2)	Enable/disable swapping.
		vm.swapping_enabled = 1, default.

		vm.swapping_enabled = 0, useful for cases where swapping
			degrades performance.

		The config option "NO_SWAPPING" is still operative, and
		takes precedence over the sysctl.  If "NO_SWAPPING" is
		specified, the sysctl still exists, but "vm.swapping_enabled"
		is hard-wired to "0".

Each of these can be changed "on the fly."
1996-06-26 05:39:27 +00:00
John Dyson
f0e2953e5e Fix some serious problems with limits checking in the sbrk(2)/brk(2)
code.
Reviewed by:	bde
1996-06-25 00:36:46 +00:00
John Dyson
a001376dc3 Remove RSS limiting until I rewrite the code to be non-recursive. The
code can overrun the kernel stack under very stressful conditions.
1996-06-24 04:30:24 +00:00
John Dyson
2a4eb04bfd Improve algorithm for page hash queue. It was previously about
as bad as it could be.  This algorithm appears to improve fork
performance (barely) measurably.
1996-06-21 05:39:22 +00:00
John Dyson
ef743ce6ed Several bugfixes/improvements:
1) Make it much less likely to miss a wakeup in vm_page_free_wakeup
	2) Create a new entry point into pmap: pmap_ts_referenced, eliminates
	   the need to scan the pv lists twice in many cases.  Perhaps there
	   is alot more to do here to work on minimizing pv list manipulation
	3) Minor improvements to vm_pageout including the use of pmap_ts_ref.
	4) Major changes and code improvement to pmap.  This code has had
	   several serious bugs in page table page manipulation.  In order
	   to simplify the problem, and hopefully solve it for once and all,
	   page table pages are no longer "managed" with the pv list stuff.
	   Page table pages are only (mapped and held/wired) or
	   (free and unused) now.  Page table pages are never inactive,
	   active or cached.  These changes have probably fixed the
	   hold count problems, but if they haven't, then the code is
	   simpler anyway for future bugfixing.
	5) The pmap code has been sorely in need of re-organization, and I
	   have taken a first (of probably many) steps.  Please tell me
	   if you have any ideas.
1996-06-17 03:35:40 +00:00
John Dyson
b5b40fa62b Various bugfixes/cleanups from me and others:
1) Remove potential race conditions on waking up in vm_page_free_wakeup
   by making sure that it is at splvm().
2) Fix another bug in vm_map_simplify_entry.
3) Be more complete about converting from default to swap pager
   when an object grows to be large enough that there can be
   a problem with data structure allocation under low memory
   conditions.
4) Make some madvise code more efficient.
5) Added some comments.
1996-06-16 20:37:31 +00:00
David Greenman
664275648a Move a case of PG_MAPPED being set before a pmap_enter(). This will likely
make no difference, but it will make it consistent with other uses of
PG_MAPPED.
1996-06-14 23:26:40 +00:00
John Dyson
419702a468 Fix a very significant cnt.v_wire_count leak in vm_page.c, and some
minor leaks in pmap.c.  Bruce Evans made me aware of this problem.
1996-06-12 06:52:12 +00:00
John Dyson
5fcf66debe Fix some serious errors in vm_map_simplify_entries. 1996-06-12 04:03:21 +00:00
John Dyson
3091ee0955 Mostly superficial code improvements, add a diagnostic. The
code improvements include significant simplification of the reservation
of the swap pager control blocks for reads.  Add a panic for an inconsistent
swap pager control block count.
1996-06-10 04:58:48 +00:00
John Dyson
c82b01813e Keep the vm_fault/vm_pageout from getting into an "infinite paging loop", by
reserving "cached" pages before waking up the pageout daemon.  This will reserve
the faulted page, and keep the system from thrashing itself to death given
this condition.
1996-06-10 00:25:40 +00:00
John Dyson
886d3e1150 Adjust the threshold for blocking on movement of pages from the cache
queue in vm_fault.

Move the PG_BUSY in vm_fault to the correct place.

Remove redundant/unnecessary code in pmap.c.

Properly block on rundown of page table pages, if they are busy.

I think that the VM system is in pretty good shape now, and the following
individuals (among others, in no particular order) have helped with this
recent bunch of bugs, thanks!  If I left anyone out, I apologize!

Stephen McKay, Stephen Hocking, Eric J. Chet, Dan O'Brien, James Raynard,
Marc Fournier.
1996-06-08 06:48:35 +00:00
John Dyson
6b6f000870 Keep page-table pages from ever being sensed as dirty. This should fix
some problems with the page-table page management code, since it can't
deal with the notion of page-table pages being paged out or in transit.
Also, clean up some stylistic issues per some suggestions from
Stephen McKay.
1996-06-05 03:31:49 +00:00
John Dyson
ff97964a2e Disable madvise optimizations for device pager objects (some of the
operations don't work with FICTITIOUS pages.)  Also, close a window
between PG_MANAGED and pmap_enter that can mess up the accounting of
the managed flag.  This problem could likely cause a hold_count error
for page table pages.
1996-06-01 20:50:57 +00:00
John Dyson
f35329ac0f This commit is dual-purpose, to fix more of the pageout daemon
queue corruption problems, and to apply Gary Palmer's code cleanups.
David Greenman helped with these problems also.  There is still
a hang problem using X in small memory machines.
1996-05-31 00:38:04 +00:00
John Dyson
545901f794 Correct some unfortunately chosen constants, otherwise, not enough
pages are calculated for deferred allocation of swap pager data structures.
This is a follow-on to the previous commit to this file.
1996-05-29 06:33:30 +00:00
John Dyson
b182ec9eb4 After careful review by David Greenman and myself, David had found a
case where blocking can occur, thereby giving other process's a chance
to modify the queue where a page resides.  This could cause numerous
process and system failures.
1996-05-29 05:15:33 +00:00
John Dyson
a5b6fd29a3 Make sure that pageout deadlocks cannot occur. There is a problem
that the datastructures needed to support the swap pager can take
enough space to fully deplete system memory, and cause a deadlock.
This change keeps large objects from being filled with dirty pages
without the appropriate swap pager datastructures.  Right now,
default objects greater than 1/4 the size of available system memory
are converted to swap objects, thereby eliminating the risk of deadlock.
1996-05-29 05:12:23 +00:00
John Dyson
85a376eb93 Fix a couple of problems in the pageout_scan routine. First, there is
a condition when blocking can occur, and the daemon did not check properly
for a page remaining on the expected queue.  Additionally, the inactive
target was being set much too large for small memory machines.  It is now
being calculated based upon the amount of user memory available on every
pageout daemon run.  Another problem was that if memory was very low, the
pageout daemon could fail repeatedly to traverse the inactive queue.
1996-05-26 07:52:09 +00:00
John Dyson
0ed4376231 I think this covers (fixes) the last batch of freeing active/held/busy page
problem.  BY MISTAKE, the vm_page_unqueue (or equiv) was removed from the
vm_fault code.  Really bad things appear to happen if a page is on a queue
while it is being faulted.
1996-05-26 05:30:33 +00:00
John Dyson
f777ab7b8b Add an assert to vm_page_cache. We should never cache a dirty page. 1996-05-24 05:20:15 +00:00
John Dyson
1eeaa1e31f Add apparently needed splvm protection to the active queue, and eliminate
an unnecessary test for dirty pages if it is already known to be dirty.
1996-05-24 05:19:15 +00:00
John Dyson
3077a9c2f4 Eliminate inefficient check for dirty pages for pages in the PQ_CACHE
queue.  Also, modify the MADV_FREE policy (it probably still isn't the final
version.)
1996-05-24 05:17:21 +00:00
John Dyson
a9d4727439 Make the conversion from the default pager to swap pager more robust
in the face of low memory conditions.
1996-05-24 05:14:44 +00:00
John Dyson
99ea1af0a6 Eliminate a vm_page_free, busy panic, in kern_malloc. 1996-05-23 02:24:55 +00:00
John Dyson
0a47b48b9f Initial support for MADV_FREE, support for pages that we don't care
about the contents anymore.  This gives us alot of the advantage of
freeing individual pages through munmap, but with almost none of the
overhead.
1996-05-23 00:45:58 +00:00
John Dyson
4a62209c07 After reviewing the previous commit to vm_object, the page protection
is never necessary, not just for PG_FICTICIOUS.
1996-05-21 17:13:31 +00:00
John Dyson
07c647c528 Don't protect non-managed pages off during object rundown. This fixes
a hang that occurs under certain circumstances when exiting X.
1996-05-21 05:26:27 +00:00
John Dyson
867a482d66 Initial support for mincore and madvise. Both are almost fully
supported, except madvise does not page in with MADV_WILLNEED, and
MADV_DONTNEED doesn't force dirty pages out.
1996-05-19 07:36:50 +00:00
John Dyson
7f5fe93fc7 One more file missing from the mega-commit. This inlines some very
simple routines in vm_page.c, so that an unnecessary subroutine call
is removed.
1996-05-18 04:00:18 +00:00
John Dyson
1b4435b8ce File mistakenly left out of the previous mega-commit. This provides
a global defn for 'exech_map.'
1996-05-18 03:52:13 +00:00
John Dyson
b18bfc3da7 This set of commits to the VM system does the following, and contain
contributions or ideas from Stephen McKay <syssgm@devetir.qld.gov.au>,
Alan Cox <alc@cs.rice.edu>, David Greenman <davidg@freebsd.org> and me:

	More usage of the TAILQ macros.  Additional minor fix to queue.h.
	Performance enhancements to the pageout daemon.
		Addition of a wait in the case that the pageout daemon
		has to run immediately.
		Slightly modify the pageout algorithm.
	Significant revamp of the pmap/fork code:
		1) PTE's and UPAGES's are NO LONGER in the process's map.
		2) PTE's and UPAGES's reside in their own objects.
		3) TOTAL elimination of recursive page table pagefaults.
		4) The page directory now resides in the PTE object.
		5) Implemented pmap_copy, thereby speeding up fork time.
		6) Changed the pv entries so that the head is a pointer
		   and not an entire entry.
		7) Significant cleanup of pmap_protect, and pmap_remove.
		8) Removed significant amounts of machine dependent
		   fork code from vm_glue.  Pushed much of that code into
		   the machine dependent pmap module.
		9) Support more completely the reuse of already zeroed
		   pages (Page table pages and page directories) as being
		   already zeroed.
	Performance and code cleanups in vm_map:
		1) Improved and simplified allocation of map entries.
		2) Improved vm_map_copy code.
		3) Corrected some minor problems in the simplify code.
	Implemented splvm (combo of splbio and splimp.)  The VM code now
		seldom uses splhigh.
	Improved the speed of and simplified kmem_malloc.
	Minor mod to vm_fault to avoid using pre-zeroed pages in the case
		of objects with backing objects along with the already
		existant condition of having a vnode.  (If there is a backing
		object, there will likely be a COW...  With a COW, it isn't
		necessary to start with a pre-zeroed page.)
	Minor reorg of source to perhaps improve locality of ref.
1996-05-18 03:38:05 +00:00
Garrett Wollman
cb7545a995 Allocate mbufs from a separate submap so that NMBCLUSTERS works as
expected.
1996-05-10 19:28:55 +00:00
Poul-Henning Kamp
aa8de40ae5 Another sweep over the pmap/vm macros, this time with more focus on
the usage.  I'm not satisfied with the naming, but now at least there is
less bogus stuff around.
1996-05-03 21:01:54 +00:00
Poul-Henning Kamp
e911eafcba removed:
CLBYTES PD_SHIFT PGSHIFT NBPG PGOFSET CLSIZELOG2 CLSIZE pdei()
        ptei() kvtopte() ptetov() ispt() ptetoav() &c &c
new:
        NPDEPG

Major macro cleanup.
1996-05-02 14:21:14 +00:00
Poul-Henning Kamp
a8c5fef5e6 KGDB is dead. It may come back one day if somebody does it. 1996-05-02 09:34:51 +00:00
John Dyson
3ea2f344e0 Move the map entry allocations from the kmem_map to the kernel_map. As
a side effect, correct the associated object offset.
1996-04-29 22:04:57 +00:00
John Dyson
0891ef4c9a This fixes kmem_malloc/kmem_free (and malloc/free of objects of > 8K).
A page index was calculated incorrectly in vm_kern, and vm_object_page_remove
removed pages that should not have been.
1996-04-24 04:16:45 +00:00
Bruce Evans
bd105bb750 Fixed a spl hog. The vmdaemon process ran entirely at splhigh. It
sometimes disabled clock interrupts for 60 msec or more on a P133.
Clock interrupts were lost ...

Reviewed by:	dyson
1996-04-11 21:05:25 +00:00
John Dyson
d3a3498598 Reinstitute the map lock for processes being swapped out. This
is needed because of the vm_fault used to bring the page table page
for the kernel stack (UPAGES) back in.  The consequence of the
previous incorrect change was a system hang.
1996-04-09 04:36:58 +00:00
John Dyson
b5cfb15fad Map lock checks not needed anymore for swapping out. We don't use
map operations for it anymore.  Certain deadlocks should never happen
anymore.
1996-04-08 03:42:01 +00:00
Bruce Evans
6ffde942bf Removed never-used #includes of <machine/cpu.h>. Many were apparently
copied from bad examples.
1996-04-07 17:39:28 +00:00
John Dyson
030ad08012 Fixed a problem that the UPAGES of a process were being run down
in a suboptimal manner.  I had also noticed some panics that appeared
to be at least superficially caused by this problem.  Also, included
are some minor mods to support more general handling of page table page
faulting.  More details in a future commit.
1996-04-03 05:23:44 +00:00
David Greenman
46268a606f Revert to previous calculation of vm_object_cache_max: it simply works
better in most real-world cases.
1996-03-29 06:28:48 +00:00
Bruce Evans
8375baabed Undid last revision. It duplicated part of second last revision. 1996-03-28 15:40:17 +00:00
Marc G. Fournier
7f49be143c devfs_add_devsw() -> devfs_add_devswf modifications
Reviewed by:	julian@freebsd.org
1996-03-28 14:36:48 +00:00
John Dyson
bb35ebd6cc Add a function prototype for pmap_prefault. 1996-03-28 04:54:50 +00:00
John Dyson
30dcfc09f2 VM performance improvements, and reorder some operations in VM fault
in anticipation of a fix in pmap that will allow the mlock system call to work
without panicing the system.
1996-03-28 04:53:28 +00:00
John Dyson
f32dbbeeed More map_simplify fixes from Alan Cox. This very significanly improves the
performance when the map has been chopped up.  The map simplify operations
really work now.
Reviewed by: dyson
Submitted by:	Alan Cox <alc@cs.rice.edu>
1996-03-28 04:22:17 +00:00
Bruce Evans
5ea390eff5 Added drum device.
Submitted by:	partly by "Marc G. Fournier" <scrappy@ki.net>
1996-03-27 20:09:26 +00:00
John Dyson
ad5dd2341c Fix the problem that unmounting filesystems that are backed by a VMIO
device have reference count problems.  We mark the underlying object
ono-persistent, and account for the reference count that the VM system
maintainsfor the special device close.  This should fix the removable
device problem.
1996-03-19 05:13:22 +00:00
David Greenman
8f2ec877b8 Force device mappings to always be shared. It doesn't make sense for them
to ever be COW and we need the mappings to be shared for backward
compatibilty.

Reviewed by:	dyson
1996-03-16 15:00:05 +00:00
John Dyson
308c24ba5e This commit is as a result of a comment by Alan Cox (alc@cs.rice.edu)
regarding the "real" problem with maps that we have been having
over the last few weeks.  He noted that the first_free pointer was
left dangling in certain circumstances -- and he was right!!!   This
should fix the map problems that we were having, and also give us the
advantage of being able to simplify maps more aggressively.
1996-03-13 01:18:14 +00:00
John Dyson
2fc2c638d5 Fix the map corruption problem that appears as a u_map allocation
error.
1996-03-12 13:46:13 +00:00
John Dyson
5850152d95 Allow mmap'ed devices to work correctly across forks. The sanest
solution appeared to be to allow the child to maintain the same mapping as
the parent.
1996-03-12 02:27:20 +00:00
Jeffrey Hsu
1b67ec6de9 For Lite2: proc LIST changes.
Reviewed by:	 davidg & bde
1996-03-11 06:11:43 +00:00
John Dyson
9ea857084d Delay forking a process until there are more pages available. It was
possible to deadlock with the low threshold that we had used.
1996-03-09 06:57:53 +00:00
John Dyson
9ee58740bc Modify a threshold for waking up the pageout daemon. Also, add a consistancy
check for making sure that held pages aren't freed (DG).
1996-03-09 06:56:39 +00:00
John Dyson
c68f9c929b Add a missing initialization of the hold_count for device pager ficticiouse
pages.
1996-03-09 06:54:41 +00:00
John Dyson
6ac5bfdb3a Fix a calculation for a paging parameter. 1996-03-09 06:53:27 +00:00
John Dyson
67cc64f4c7 Fix two problems:
The pmap_remove in vm_map_clean incorrectly unmapped the entire
	map entry.
	The new vm_map_simplify_entry code had an error (the offset
	of the combined map entry was not set correctly.)
Submitted by:	Alan Cox <alc@cs.rice.edu>
1996-03-09 06:52:05 +00:00
John Dyson
65bc79b85f Set the page valid bits in fewer places, as opposed to being scattered
in various places.
1996-03-09 06:48:26 +00:00
John Dyson
45952afcc7 Fix a problem in the swap pager that caused some of the pages that
were paged in under low swap space conditions to both loose their
backing store and their dirty bits.  This would cause pages to
be demand zeroed under certain conditions in low VM space conditions
and consequential sig-11's or sig-10's.  This situation was made
worse lately when the level for swap space reclaim threshold was
increased.
1996-03-06 04:31:46 +00:00
John Dyson
8a02c104f9 Fix a problem that pages in a mapped region were not always
properly invalidated.  Now we traverse the object shadow chain
properly.
1996-03-04 02:04:24 +00:00
John Dyson
836e5d1360 In order to fix some concurrency problems with the swap pager early
on in the FreeBSD development, I had made a global lock around the
rlist code.  This was bogus, and now the lock is maintained on a
per resource list basis.  This now allows the rlist code to be used for
almost any non-interrupt level application.
1996-03-03 21:11:08 +00:00
Peter Wemm
5e004bea6f Remove the #ifdef notyet from the prototype of vm_map_simplify. John
re-enabled the function but missed the prototype, causing a warning.
1996-03-03 18:53:10 +00:00
Peter Wemm
9154ee6aec Oops.. I nearly forgot the actual core of the length/rounding/etc fixes
that Bruce asked for.

These still are not quite perfect, and in particular, it can get
upset on extreme boundary cases (addr = 0xfff, len = 0xffffffff,
which would end up mapping a single page rather than failing), but
this is better code that I committed before.

(note, the VM system does not (apparently) support single mmap segment
sizes above 0x80000000 anyway)
1996-03-02 17:14:09 +00:00
John Dyson
de5f6a7765 1) Eliminate unnecessary bzero of UPAGES.
2) Eliminate unnecessary copying of pages during/after forks.
3) Add user map simplification.
1996-03-02 02:54:24 +00:00
Peter Wemm
dabee6fecc kern_descrip.c: add fdshare()/fdcopy()
kern_fork.c: add the tiny bit of code for rfork operation.
kern/sysv_*: shmfork() takes one less arg, it was never used.
sys/shm.h: drop "isvfork" arg from shmfork() prototype
sys/param.h: declare rfork args.. (this is where OpenBSD put it..)
sys/filedesc.h: protos for fdshare/fdcopy.
vm/vm_mmap.c: add minherit code, add rounding to mmap() type args where
it makes sense.
vm/*: drop unused isvfork arg.

Note: this rfork() implementation copies the address space mappings,
it does not connect the mappings together.  ie: once the two processes
have split, the pages may be shared, but the address space is not. If one
does a mmap() etc, it does not appear in the other.  This makes it not
useful for pthreads, but it is useful in it's own right for having
light-weight threads in a static shared address space.

Obtained from: Original by Ron Minnich, extended by OpenBSD
1996-02-23 18:49:25 +00:00
David Greenman
5afce28270 Add a "NO_SWAPPING" option to disable swapping. This was originally done
to help diagnose a problem on wcarchive (where the kernel stack was
sometimes not present), but is useful in its own right since swapping
actually reduces performance on some systems (such as wcarchive).
Note: swapping in this context means making the U pages pageable and has
nothing to do with generic VM paging, which is unaffected by this option.

Reviewed by:	 <dyson>
1996-02-22 10:57:37 +00:00
John Dyson
a02051c37a Fixed a really bogus problem with msync ripping pages away from
objects before they were written.  Also, don't allow processes
without write access to remove pages from vm_objects.
1996-02-11 22:03:49 +00:00
John Dyson
dca5129987 Changed vm_fault_quick in vm_machdep.c to be global. Needed for
new pipe code.
1996-02-04 22:09:12 +00:00
David Greenman
1af87c9263 "out of space" -> "out of swap space". 1996-01-31 13:14:21 +00:00
David Greenman
729b1e5149 Improved killproc() log message and made it and the other similar message
tolerant of p_ucred being invalid. Starting using killproc() where
appropriate.
1996-01-31 12:44:33 +00:00
David Greenman
8c73da1e15 Print a more descriptive message when the mb_map is filled (out of mbuf
clusters), and tell the operator what to do about it (increase maxusers).
1996-01-31 12:05:52 +00:00
Mike Pritchard
6c5e9bbdf5 Fix a bunch of spelling errors in the comment fields of
a bunch of system include files.
1996-01-30 23:02:38 +00:00
David Greenman
2c68345ab4 Added a check/panic for vm_map_find failing to find space for the page
tables/u-pages when forking. This is a "can't happen" case. :-)
1996-01-29 12:10:30 +00:00
Bruce Evans
324e9ed2a4 Added a `boundary' arg to vm_alloc_page_contig(). Previously the only
way to avoid crossing a 64K DMA boundary was to specify an alignment
greater than the size even when the alignment didn't matter, and for
sizes larger than a page, this reduced the chance of finding enough
contiguous pages.  E.g., allocations of 8K not crossing a 64K boundary
previously had to be allocated on 8K boundaries; now they can be
allocated on any 4K boundary except (64 * n + 60)K.

Fixed bugs in vm_alloc_page_contig():
- the last page wasn't allocated for sizes smaller than a page.
- failures of kmem_alloc_pageable() weren't handled.

Mutated vm_page_alloc_contig() to create a more convenient interface
named contigmalloc().  This is the same as the one in 1.1.5 except
it has `low' and `high' args, and the `alignment' and `boundary'
args are multipliers instead of masks.
1996-01-27 00:13:33 +00:00
Poul-Henning Kamp
f782b11a04 Don't use %r, we havn't got it anymore.
Submitted by: bde
1996-01-25 07:15:40 +00:00
John Dyson
bd7e5f992e Eliminated many redundant vm_map_lookup operations for vm_mmap.
Speed up for vfs_bio -- addition of a routine bqrelse to greatly diminish
	overhead for merged cache.
Efficiency improvement for vfs_cluster.  It used to do alot of redundant
	calls to cluster_rbuild.
Correct the ordering for vrele of .text and release of credentials.
Use the selective tlb update for 486/586/P6.
Numerous fixes to the size of objects allocated for files.  Additionally,
	fixes in the various pagers.
Fixes for proper positioning of vnode_pager_setsize in msdosfs and ext2fs.
Fixes in the swap pager for exhausted resources.  The pageout code
	will not as readily thrash.
Change the page queue flags (PG_ACTIVE, PG_INACTIVE, PG_FREE, PG_CACHE) into
	page queue indices (PQ_ACTIVE, PQ_INACTIVE, PQ_FREE, PQ_CACHE),
	thereby improving efficiency of several routines.
Eliminate even more unnecessary vm_page_protect operations.
Significantly speed up process forks.
Make vm_object_page_clean more efficient, thereby eliminating the pause
	that happens every 30seconds.
Make sequential clustered writes B_ASYNC instead of B_DELWRI even in the
	case of filesystems mounted async.
Fix a panic with busy pages when write clustering is done for non-VMIO
	buffers.
1996-01-19 04:00:31 +00:00
Garrett Wollman
0e41ee3037 Convert DDB to new-style option. 1996-01-04 21:13:23 +00:00
Garrett Wollman
50c73f3620 Convert SYSV IPC to new-style options. (I hope I got everything...)
The LKMs will need an extra file, to come later.
1996-01-04 20:29:06 +00:00
David Greenman
a2d5b14236 Increased vm_object_cache_max by about 50% to yield better utilization of
memory when lots of small files are cached.

Reviewed by:	dyson
1996-01-04 18:32:31 +00:00
Peter Wemm
a5b996a7ec recording cvs-1.6 file death 1995-12-30 19:02:48 +00:00