Commit Graph

554 Commits

Author SHA1 Message Date
Guido van Rooij c8bdd56b3a Fix for mmap of char devices bug as described in OpenBSD advisory of
1998/02/20
Reviewed by:	John Dyson
Submitted by:	"Cy Schubert" <cschuber@uumail.gov.bc.ca>
1998-03-12 19:36:18 +00:00
Mike Smith 86ffbd76d0 Complement diagnostic messages about missing per-FS VOP page operations,
but don't make their absence fatal.
Submitted by:	terry
1998-03-09 08:58:53 +00:00
John Dyson be01eafd5f Quell unneeded pageout daemon activity. 1998-03-08 18:19:17 +00:00
John Dyson 6215e86272 Remove a very ill advised vm_page_protect. This was being called
for a non-managed page.  That is a big no-no.
1998-03-08 18:05:59 +00:00
John Dyson e163e201ef Some cruft left over from my megacommit. A page rotation optimization
was a good idea, but can cause instability.  That optimization is
now removed.
1998-03-08 06:27:30 +00:00
John Dyson edd97f3a37 Several minor fixes:
1) When freeing pages, it is a good idea to protect them off.
	   (This is probably gratuitious, but good form.)
	2) Allow collapsing pages in the backing object that are
	   PQ_CACHE.  This will improve memory utilization.
	3) Correct the collapse code so that pages that were on the
	   cache queue are moved to the inactive queue.  This is
	   done when pages are marked dirty (so that those pages
	   will be properly paged out instead of freed), so that
	   cached pages will not be paradoxically marked dirty.
1998-03-08 06:25:59 +00:00
John Dyson 8f9110f6a1 This mega-commit is meant to fix numerous interrelated problems. There
has been some bitrot and incorrect assumptions in the vfs_bio code.  These
problems have manifest themselves worse on NFS type filesystems, but can
still affect local filesystems under certain circumstances.  Most of
the problems have involved mmap consistancy, and as a side-effect broke
the vfs.ioopt code.  This code might have been committed seperately, but
almost everything is interrelated.

1)	Allow (pmap_object_init_pt) prefaulting of buffer-busy pages that
	are fully valid.
2)	Rather than deactivating erroneously read initial (header) pages in
	kern_exec, we now free them.
3)	Fix the rundown of non-VMIO buffers that are in an inconsistent
	(missing vp) state.
4)	Fix the disassociation of pages from buffers in brelse.  The previous
	code had rotted and was faulty in a couple of important circumstances.
5)	Remove a gratuitious buffer wakeup in vfs_vmio_release.
6)	Remove a crufty and currently unused cluster mechanism for VBLK
	files in vfs_bio_awrite.  When the code is functional, I'll add back
	a cleaner version.
7)	The page busy count wakeups assocated with the buffer cache usage were
	incorrectly cleaned up in a previous commit by me.  Revert to the
	original, correct version, but with a cleaner implementation.
8)	The cluster read code now tries to keep data associated with buffers
	more aggressively (without breaking the heuristics) when it is presumed
	that the read data (buffers) will be soon needed.
9)	Change to filesystem lockmgr locks so that they use LK_NOPAUSE.  The
	delay loop waiting is not useful for filesystem locks, due to the
	length of the time intervals.
10)	Correct and clean-up spec_getpages.
11)	Implement a fully functional nfs_getpages, nfs_putpages.
12)	Fix nfs_write so that modifications are coherent with the NFS data on
	the server disk (at least as well as NFS seems to allow.)
13)	Properly support MS_INVALIDATE on NFS.
14)	Properly pass down MS_INVALIDATE to lower levels of the VM code from
	vm_map_clean.
15)	Better support the notion of pages being busy but valid, so that
	fewer in-transit waits occur.  (use p->busy more for pageouts instead
	of PG_BUSY.)  Since the page is fully valid, it is still usable for
	reads.
16)	It is possible (in error) for cached pages to be busy.  Make the
	page allocation code handle that case correctly.  (It should probably
	be a printf or panic, but I want the system to handle coding errors
	robustly.  I'll probably add a printf.)
17)	Correct the design and usage of vm_page_sleep.  It didn't handle
	consistancy problems very well, so make the design a little less
	lofty.  After vm_page_sleep, if it ever blocked, it is still important
	to relookup the page (if the object generation count changed), and
	verify it's status (always.)
18)	In vm_pageout.c, vm_pageout_clean had rotted, so clean that up.
19)	Push the page busy for writes and VM_PROT_READ into vm_pageout_flush.
20)	Fix vm_pager_put_pages and it's descendents to support an int flag
	instead of a boolean, so that we can pass down the invalidate bit.
1998-03-07 21:37:31 +00:00
John Dyson 4866e0856c Make vm_fault much cleaner by removing the evil macro inlines, and
put alot of it's context into a data structure.  This allows
significant shortening of its codepath, and will significantly
decrease it's cache footprint.

Also, add some stats to vmmeter.  Note that you'll have to
rebuild/recompile vmstat, systat, etc... Otherwise, you'll
get "very interesting" paging stats.
1998-03-07 20:45:47 +00:00
Peter Dufault 917e476dad Reviewed by: msmith, bde long ago
POSIX.4 headers and sysctl variables.  Nothing should change
unless POSIX4 is defined or _POSIX_VERSION is set to 199309.
1998-03-04 10:27:00 +00:00
John Dyson ffc82b0a70 1) Use a more consistent page wait methodology.
2)	Do not unnecessarily force page blocking when paging
	pages out.
3)	Further improve swap pager performance and correctness,
	including fixing the paging in progress deadlock (except
	in severe I/O error conditions.)
4)	Enable vfs_ioopt=1 as a default.
5)	Fix and enable the page prezeroing in SMP mode.

All in all, SMP systems especially should show a significant
improvement in "snappyness."
1998-03-01 04:18:54 +00:00
Mike Smith ce75f2c365 In the author's words:
These diffs implement the first stage of a VOP_{GET|PUT}PAGES pushdown
for local media FS's.

See ffs_putpages in /sys/ufs/ufs/ufs_readwrite.c for implementation
details for generic *_{get|put}pages for local media FS's.  Support
is trivial to add for any FS that formerly relied on the default
behaviour of the vnode_pager in in EOPNOTSUPP cases (just copy the
ffs_getpages() code for the FS in question's *_{get|put}pages).

Obviously, it would be better if each local media FS implemented a
more optimal method, instead of calling an exported interface from
the /sys/vm/vnode_pager.c, but this is a necessary first step in
getting the FS's to a point where they can be supplied with better
implementations on a case-by-case basis.

Obviously, the cd9660_putpages() can be rather trivial (since it
is a read-only FS type 8-)).

A slight (temporary) modification is made to print a diagnostic message
in the case where the underlying filesystem attempts to engage in the
previous behaviour.  Failure is likely to be ungraceful.

Submitted by:	terry@freebsd.org (Terry Lambert)
1998-02-26 06:39:59 +00:00
John Dyson 660957521c Fix page prezeroing for SMP, and fix some potential paging-in-progress
hangs.  The paging-in-progress diagnosis was a result of Tor Egge's
excellent detective work.
Submitted by:	Partially from Tor Egge.
1998-02-25 03:56:15 +00:00
John Dyson a15403de7b Correct some severe VM tuning problems for small systems (<=16MB), and
improve tuning on larger systems.  (A couple of the VM tuning params for
small systems were so badly chosen that the system could hang under load.)

The broken tuning was originaly my fault.
1998-02-24 10:16:23 +00:00
John Dyson e47ed70b0f Significantly improve the efficiency of the swap pager, which appears to
have declined due to code-rot over time.  The swap pager rundown code
has been clean-up, and unneeded wakeups removed.  Lots of splbio's
are changed to splvm's.  Also, set the dynamic tunables for the
pageout daemon to be more sane for larger systems (thereby decreasing
the daemon overheadla.)
1998-02-23 08:22:48 +00:00
John Dyson d9bed5bee1 Try to dynamically size the VM_KMEM_SIZE (but is still able to be overridden
in a way identically as before.)  I had problems with the system properly
handling the number of vnodes when there is alot of system memory, and the
default VM_KMEM_SIZE.  Two new options "VM_KMEM_SIZE_SCALE" and
"VM_KMEM_SIZE_MAX" have been added to support better auto-sizing for systems
with greater than 128MB.

Add some accouting for vm_zone memory allocations, and provide properly
for vm_zone allocations out of the kmem_map.  Also move the vm_zone
allocation stats to the VM OID tree from the KERN OID tree.
1998-02-23 07:42:43 +00:00
Bruce Evans 39e4376ba7 Removed unused #includes. 1998-02-20 13:11:54 +00:00
Mike Smith eed5086e1a Move the 'sw' device off block major #1, which is now occupied by 'wfd'. 1998-02-19 12:15:06 +00:00
Eivind Eklund 303b270b0a Staticize. 1998-02-09 06:11:36 +00:00
John Dyson 157ac55f97 Fix an argument to vn_lock. It appears that alot of the vn_lock usage
is a bit undisciplined, and should be checked carefully.
1998-02-08 14:55:13 +00:00
Eivind Eklund 0b08f5f737 Back out DIAGNOSTIC changes. 1998-02-06 12:14:30 +00:00
John Dyson 95461b450d 1) Start using a cleaner and more consistant page allocator instead
of the various ad-hoc schemes.
2)	When bringing in UPAGES, the pmap code needs to do another vm_page_lookup.
3)	When appropriate, set the PG_A or PG_M bits a-priori to both avoid some
	processor errata, and to minimize redundant processor updating of page
	tables.
4)	Modify pmap_protect so that it can only remove permissions (as it
	originally supported.)  The additional capability is not needed.
5)	Streamline read-only to read-write page mappings.
6)	For pmap_copy_page, don't enable write mapping for source page.
7)	Correct and clean-up pmap_incore.
8)	Cluster initial kern_exec pagin.
9)	Removal of some minor lint from kern_malloc.
10)	Correct some ioopt code.
11)	Remove some dead code from the MI swapout routine.
12)	Correct vm_object_deallocate (to remove backing_object ref.)
13)	Fix dead object handling, that had problems under heavy memory load.
14)	Add minor vm_page_lookup improvements.
15)	Some pages are not in objects, and make sure that the vm_page.c can
	properly support such pages.
16)	Add some more page deficit handling.
17)	Some minor code readability improvements.
1998-02-05 03:32:49 +00:00
Eivind Eklund 47cfdb166d Turn DIAGNOSTIC into a new-style option. 1998-02-04 22:34:03 +00:00
Bruce Evans e7a5897899 Added #include of <sys/queue.h> so that this file is more "self"-sufficent. 1998-02-03 22:19:35 +00:00
John Dyson e736cd05cb This fix should help the panic problems in -current. There
were some errors in "interval" management.  Due to the
clustering mechanism, the code is necessarily complex and
error prone.
1998-02-03 00:50:36 +00:00
Bruce Evans 8bcc577e92 Forward declare more structs that are used in prototypes here - don't
depend on <sys/types.h> forward declaring common ones.
1998-02-01 20:08:39 +00:00
John Dyson 1f13bdaa97 Fix a performance problem caused by an earlier commit. 1998-02-01 02:00:20 +00:00
John Dyson c15541e7a7 contigalloc doesn't place the allocated page(s) into an object, and
now this breaks vm_page_wire (due to wired page accounting per object.)

This should fix a problem as described by Donald Maddox.
1998-01-31 20:30:18 +00:00
John Dyson eaf13dd73a Change the busy page mgmt, so that when pages are freed, they
MUST be PG_BUSY.  It is bogus to free a page that isn't busy,
because it is in a state of being "unavailable" when being
freed.  The additional advantage is that the page_remove code
has a better cross-check that the page should be busy and
unavailable for other use.  There were some minor problems
with the collapse code, and this plugs those subtile "holes."

Also, the vfs_bio code wasn't checking correctly for PG_BUSY
pages.  I am going to develop a more consistant scheme for
grabbing pages, busy or otherwise.  For now, we are stuck
with the current morass.
1998-01-31 11:56:53 +00:00
Eivind Eklund f71bb3a160 Turn NSWAPDEV into a new-style option. 1998-01-25 04:13:25 +00:00
Eivind Eklund 7b778b5e61 Make all file-system (MFS, FFS, NFS, LFS, DEVFS) related option new-style.
This introduce an xxxFS_BOOT for each of the rootable filesystems.
(Presently not required, but encouraged to allow a smooth move of option *FS
to opt_dontuse.h later.)

LFS is temporarily disabled, and will be re-enabled tomorrow.
1998-01-24 02:54:56 +00:00
John Dyson 50ce7ff499 Add better support for larger I/O clusters, including larger physical
I/O.  The support is not mature yet, and some of the underlying implementation
needs help.  However, support does exist for IDE devices now.
1998-01-24 02:01:46 +00:00
John Dyson 2d8acc0f4a VM level code cleanups.
1)	Start using TSM.
	Struct procs continue to point to upages structure, after being freed.
	Struct vmspace continues to point to pte object and kva space for kstack.
	u_map is now superfluous.
2)	vm_map's don't need to be reference counted.  They always exist either
	in the kernel or in a vmspace.  The vmspaces are managed by reference
	counts.
3)	Remove the "wired" vm_map nonsense.
4)	No need to keep a cache of kernel stack kva's.
5)	Get rid of strange looking ++var, and change to var++.
6)	Change more data structures to use our "zone" allocator.  Added
	struct proc, struct vmspace and struct vnode.  This saves a significant
	amount of kva space and physical memory.  Additionally, this enables
	TSM for the zone managed memory.
7)	Keep ioopt disabled for now.
8)	Remove the now bogus "single use" map concept.
9)	Use generation counts or id's for data structures residing in TSM, where
	it allows us to avoid unneeded restart overhead during traversals, where
	blocking might occur.
10)	Account better for memory deficits, so the pageout daemon will be able
	to make enough memory available (experimental.)
11)	Fix some vnode locking problems. (From Tor, I think.)
12)	Add a check in ufs_lookup, to avoid lots of unneeded calls to bcmp.
	(experimental.)
13)	Significantly shrink, cleanup, and make slightly faster the vm_fault.c
	code.  Use generation counts, get rid of unneded collpase operations,
	and clean up the cluster code.
14)	Make vm_zone more suitable for TSM.

This commit is partially as a result of discussions and contributions from
other people, including DG, Tor Egge, PHK, and probably others that I
have forgotten to attribute (so let me know, if I forgot.)

This is not the infamous, final cleanup of the vnode stuff, but a necessary
step.  Vnode mgmt should be correct, but things might still change, and
there is still some missing stuff (like ioopt, and physical backing of
non-merged cache files, debugging of layering concepts.)
1998-01-22 17:30:44 +00:00
John Dyson 480ba2f552 Allow gdb to work again. 1998-01-21 12:18:00 +00:00
John Dyson 4722175765 Tie up some loose ends in vnode/object management. Remove an unneeded
config option in pmap.  Fix a problem with faulting in pages.  Clean-up
some loose ends in swap pager memory management.

The system should be much more stable, but all subtile bugs aren't fixed yet.
1998-01-17 09:17:02 +00:00
John Dyson 925a3a419a Fix some vnode management problems, and better mgmt of vnode free list.
Fix the UIO optimization code.
Fix an assumption in vm_map_insert regarding allocation of swap pagers.
Fix an spl problem in the collapse handling in vm_object_deallocate.
When pages are freed from vnode objects, and the criteria for putting
the associated vnode onto the free list is reached, either put the
vnode onto the list, or put it onto an interrupt safe version of the
list, for further transfer onto the actual free list.
Some minor syntax changes changing pre-decs, pre-incs to post versions.
Remove a bogus timeout (that I added for debugging) from vn_lock.

PHK will likely still have problems with the vnode list management, and
so do I, but it is better than it was.
1998-01-12 01:46:33 +00:00
John Dyson bf27292b35 Turn off the VTEXT flag when an object is no longer referenced, so
that an executable that is no longer running can be written to.  Also,
clear the OBJ_OPT flag more often, when appropriate.
1998-01-07 03:12:19 +00:00
John Dyson 95e5e988e0 Make our v_usecount vnode reference count work identically to the
original BSD code.  The association between the vnode and the vm_object
no longer includes reference counts.  The major difference is that
vm_object's are no longer freed gratuitiously from the vnode, and so
once an object is created for the vnode, it will last as long as the
vnode does.

When a vnode object reference count is incremented, then the underlying
vnode reference count is incremented also.  The two "objects" are now
more intimately related, and so the interactions are now much less
complex.

When vnodes are now normally placed onto the free queue with an object still
attached.  The rundown of the object happens at vnode rundown time, and
happens with exactly the same filesystem semantics of the original VFS
code.  There is absolutely no need for vnode_pager_uncache and other
travesties like that anymore.

A side-effect of these changes is that SMP locking should be much simpler,
the I/O copyin/copyout optimizations work, NFS should be more ponderable,
and further work on layered filesystems should be less frustrating, because
of the totally coherent management of the vnode objects and vnodes.

Please be careful with your system while running this code, but I would
greatly appreciate feedback as soon a reasonably possible.
1998-01-06 05:26:17 +00:00
Alexander Langer 651bb81717 caddr_t --> void * 1997-12-31 02:35:29 +00:00
John Dyson 60f8d46448 Fix the decl of vfs_ioopt, allow LFS to compile again, fix a minor problem
with the object cache removal.
1997-12-29 01:03:55 +00:00
John Dyson 2be70f79f6 Lots of improvements, including restructring the caching and management
of vnodes and objects.  There are some metadata performance improvements
that come along with this.  There are also a few prototypes added when
the need is noticed.  Changes include:

1) Cleaning up vref, vget.
2) Removal of the object cache.
3) Nuke vnode_pager_uncache and friends, because they aren't needed anymore.
4) Correct some missing LK_RETRY's in vn_lock.
5) Correct the page range in the code for msync.

Be gentle, and please give me feedback asap.
1997-12-29 00:25:11 +00:00
John Dyson 6d1756a948 The ioopt code is still buggy, but wasn't fully disabled. 1997-12-25 20:55:15 +00:00
John Dyson b44e4b7a2b Support running with inadequate swap space. Additionally, the code
will complain with a suggestion of increasing it.
1997-12-24 15:05:25 +00:00
John Dyson 998d8cd662 Improve my copyright. 1997-12-22 11:48:13 +00:00
John Dyson c2e11a039d Change bogus usage of btoc to atop. The incorrect usage of btoc was
pointed out by bde.
1997-12-19 15:31:13 +00:00
John Dyson 1efb74fbcc Some performance improvements, and code cleanups (including changing our
expensive OFF_TO_IDX to btoc whenever possible.)
1997-12-19 09:03:37 +00:00
Eivind Eklund 5591b823d1 Make COMPAT_43 and COMPAT_SUNOS new-style options. 1997-12-16 17:40:42 +00:00
John Dyson bd28588799 Fix a recursive kernel_map lock problem in vm_zone allocator.
PR:	5298
1997-12-15 05:16:09 +00:00
John Dyson b0d8408e21 Slight improvement to the vm_zone stats output. Also, some other superficial
cleanups.
1997-12-14 05:17:44 +00:00
John Dyson 8256655132 After one of my analysis passes to evaluate methods for SMP TLB mgmt, I
noticed some major enhancements available for UP situations.  The number
of UP TLB flushes is decreased much more than significantly with these
changes.  Since a TLB flush appears to cost minimally approx 80 cycles,
this is a "nice" enhancement, equiv to eliminating between 40 and 160
instructions per TLB flush.

Changes include making sure that kernel threads all use the same PTD,
and eliminate unneeded PTD switches at context switch time.
1997-12-14 02:11:23 +00:00
John Dyson 3a2dc656bc Fix the prototype for swapout_procs();
Submitted by:	dima@best.net
1997-12-11 02:10:55 +00:00