freebsd

mirror of https://git.FreeBSD.org/src.git synced 2024-12-20 11:11:24 +00:00

Author	SHA1	Message	Date
Peter Wemm	0d2a298904	Initial landing of SMP support for FreeBSD/amd64. - This is heavily derived from John Baldwin's apic/pci cleanup on i386. - I have completely rewritten or drastically cleaned up some other parts. (in particular, bootstrap) - This is still a WIP. It seems that there are some highly bogus bioses on nVidia nForce3-150 boards. I can't stress how broken these boards are. I have a workaround in mind, but right now the Asus SK8N is broken. The Gigabyte K8NPro (nVidia based) is also mind-numbingly hosed. - Most of my testing has been with SCHED_ULE. SCHED_4BSD works. - the apic and acpi components are 'standard'. - If you have an nVidia nForce3-150 board, you are stuck with 'device atpic' in addition, because they somehow managed to forget to connect the 8254 timer to the apic, even though its in the same silicon! ARGH! This directly violates the ACPI spec.	2003-11-17 08:58:16 +00:00
Peter Wemm	fcfe57d640	Update the graffiti.	2003-11-08 04:39:22 +00:00
Bruce M Simpson	2bc7dd5661	Move pmap_resident_count() from the MD pmap.h to the MI pmap.h. Add a definition of pmap_wired_count(). Add a definition of vmspace_wired_count(). Reviewed by: truckman Discussed with: peter	2003-10-06 01:47:12 +00:00
Alan Cox	9060731130	Eliminate the pte object.	2003-09-27 20:53:01 +00:00
Peter Wemm	bf8ca114e2	Fix the VADDR() macros to use either KVADDR() or UVADDR(), depending on the implied sign extension. The single unified VADDR() macro was not able to avoid sign extending the VM_MAXUSER_ADDRESS/USRSTACK values. Be explicit about UVADDR() (positive address space) and KVADDR() (kernel negative address space) to make mistakes show up more spectacularly. Increase user VM space from 1/2TB (512GB) to 128TB.	2003-07-09 23:04:23 +00:00
Hidetoshi Shimokawa	e07324646e	Move KERNBASE to -2GB. Currently, we cannot increase KVA more than 2GB.	2003-06-22 13:02:45 +00:00
Peter Wemm	cbd667fa2f	Update comments. Note that the kernel is at -1GB, not -2GB as erroniously implied by the previous commit. KVM is still only 1GB until pmap_growkernel() learns about the extra page table level. Approved by: re (blanket)	2003-05-23 06:35:45 +00:00
Peter Wemm	3c9a3c9ca3	Major pmap rework to take advantage of the larger address space on amd64 systems. Of note: - Implement a direct mapped region using 2MB pages. This eliminates the need for temporary mappings when getting ptes. This supports up to 512GB of physical memory for now. This should be enough for a while. - Implement a 4-tier page table system. Most of the infrastructure is there for 128TB of userland virtual address space, but only 512GB is presently enabled due to a mystery bug somewhere. The design of this was heavily inspired by the alpha pmap.c. - The kernel is moved into the negative address space(!). - The kernel has 2GB of KVM available. - Provide a uma memory allocator to use the direct map region to take advantage of the 2MB TLBs. - Fixed some assumptions in the bus_space macros about the ability to fit virtual addresses in an 'int'. Notable missing things: - pmap_growkernel() should be able to grow to 512GB of KVM by expanding downwards below kernbase. The kernel must be at the top 2GB of the negative address space because of gcc code generation strategies. - need to fix the >512GB user vm code. Approved by: re (blanket)	2003-05-23 05:04:54 +00:00
Peter Wemm	be52ef1399	Use compile time constants for things like PTmap[] etc because they're about to move outside of the +/- 2GB range Suggested by: jake Approved by: re (amd64/* blanket)	2003-05-15 00:20:17 +00:00
Peter Wemm	afa8862328	Commit MD parts of a loosely functional AMD64 port. This is based on a heavily stripped down FreeBSD/i386 (brutally stripped down actually) to attempt to get a stable base to start from. There is a lot missing still. Worth noting: - The kernel runs at 1GB in order to cheat with the pmap code. pmap uses a variation of the PAE code in order to avoid having to worry about 4 levels of page tables yet. - It boots in 64 bit "long mode" with a tiny trampoline embedded in the i386 loader. This simplifies locore.s greatly. - There are still quite a few fragments of i386-specific code that have not been translated yet, and some that I cheated and wrote dumb C versions of (bcopy etc). - It has both int 0x80 for syscalls (but using registers for argument passing, as is native on the amd64 ABI), and the 'syscall' instruction for syscalls. int 0x80 preserves all registers, 'syscall' does not. - I have tried to minimize looking at the NetBSD code, except in a couple of places (eg: to find which register they use to replace the trashed %rcx register in the syscall instruction). As a result, there is not a lot of similarity. I did look at NetBSD a few times while debugging to get some ideas about what I might have done wrong in my first attempt.	2003-05-01 01:05:25 +00:00
Jake Burkholder	14ce5bd49b	Use inlines for loading and storing page table entries. Use cmpxchg8b for the PAE case to ensure idempotent 64 bit loads and stores. Sponsored by: DARPA, Network Associates Laboratories	2003-04-28 20:35:36 +00:00
Jake Burkholder	ac00210525	Remove invalid cast to vm_offset_t to avoid truncating a physical address when doing pmap_kextract on a 2MB page. Spotted by: peter Sponsored by: DARPA, Network Associates Laboratories	2003-04-08 18:22:41 +00:00
Jake Burkholder	46ea68dd10	Better fix for previous previous which still allows the 4megs of kva at the top of the address space to be reclaimed. The problem is that with the APTD gone the mapable kernel address space runs right to the end of the 32 bit address space. As a max this is 0x100000000, which can't be represented in 32 bits, so we have to use ptd entry n-1 and pte offset n-1, instead of ptd entry n and pte offset 0. There's still 1 page we can't use, but we gain just under 4 megs of kva (8 megs with PAE). Sponsored by: DARPA, Network Associates Laboratories	2003-04-07 14:27:19 +00:00
Jake Burkholder	d1d03c2b72	Bandaid fix for previous commit while I figure out why it broke. This caused crashes early in boot on i386 UP machines. Reported by: phk Pointy hat to: jake	2003-04-04 10:09:44 +00:00
Jake Burkholder	163529c2b3	- Removed APTD and associated macros, it is no longer used. BANG BANG BANG etc. Sponsored by: DARPA, Network Associates Laboratories	2003-04-03 23:44:35 +00:00
Peter Wemm	cc66ebe2a9	Commit a partial lazy thread switch mechanism for i386. it isn't as lazy as it could be and can do with some more cleanup. Currently its under options LAZY_SWITCH. What this does is avoid %cr3 reloads for short context switches that do not involve another user process. ie: we can take an interrupt, switch to a kthread and return to the user without explicitly flushing the tlb. However, this isn't as exciting as it could be, the interrupt overhead is still high and too much blocks on Giant still. There are some debug sysctls, for stats and for an on/off switch. The main problem with doing this has been "what if the process that you're running on exits while we're borrowing its address space?" - in this case we use an IPI to give it a kick when we're about to reclaim the pmap. Its not compiled in unless you add the LAZY_SWITCH option. I want to fix a few more things and get some more feedback before turning it on by default. This is NOT a replacement for Bosko's lazy interrupt stuff. This was more meant for the kthread case, while his was for interrupts. Mine helps a little for interrupts, but his helps a lot more. The stats are enabled with options SWTCH_OPTIM_STATS - this has been a pseudo-option for years, I just added a bunch of stuff to it. One non-trivial change was to select a new thread before calling cpu_switch() in the first place. This allows us to catch the silly case of doing a cpu_switch() to the current process. This happens uncomfortably often. This simplifies a bit of the asm code in cpu_switch (no longer have to call choosethread() in the middle). This has been implemented on i386 and (thanks to jake) sparc64. The others will come soon. This is actually seperate to the lazy switch stuff. Glanced at by: jake, jhb	2003-04-02 23:53:30 +00:00
Jake Burkholder	7ab9b220d9	- Add support for PAE and more than 4 gigs of ram on x86, dependent on the kernel opition 'options PAE'. This will only work with device drivers which either use busdma, or are able to handle 64 bit physical addresses. Thanks to Lanny Baron from FreeBSD Systems for the loan of a test machine with 6 gigs of ram. Sponsored by: DARPA, Network Associates Laboratories, FreeBSD Systems	2003-03-30 05:24:52 +00:00
Jake Burkholder	de54353fb8	- Remove invalid casts. Sponsored by: DARPA, Network Associates Laboratories	2003-03-30 01:44:16 +00:00
Jake Burkholder	aea57872f0	- Convert all uses of pmap_pte and get_ptbase to pmap_pte_quick. When accessing an alternate address space this causes 1 page table page at a time to be mapped in, rather than using the recursive mapping technique to map in an entire alternate address space. The recursive mapping technique changes large portions of the address space and requires global tlb flushes, which seem to cause problems when PAE is enabled. This will also allow IPIs to be avoided when mapping in new page table pages using the same technique as is used for pmap_copy_page and pmap_zero_page. Sponsored by: DARPA, Network Associates Laboratories	2003-03-30 01:16:19 +00:00
Jake Burkholder	227f9a1c58	- Add vm_paddr_t, a physical address type. This is required for systems where physical addresses larger than virtual addresses, such as i386s with PAE. - Use this to represent physical addresses in the MI vm system and in the i386 pmap code. This also changes the paddr parameter to d_mmap_t. - Fix printf formats to handle physical addresses >4G in the i386 memory detection code, and due to kvtop returning vm_paddr_t instead of u_long. Note that this is a name change only; vm_paddr_t is still the same as vm_offset_t on all currently supported platforms. Sponsored by: DARPA, Network Associates Laboratories Discussed with: re, phk (cdevsw change)	2003-03-25 00:07:06 +00:00
Jake Burkholder	5501d40bb9	Made the prototypes for pmap_kenter and pmap_kremove MD. These functions are machine dependent because they are not required to update the tlb when mappings are added or removed, and doing so is machine dependent. In addition, an implementation may require that pages mapped with pmap_kenter have a backing vm_page_t, which is not necessarily true of all physical pages, and so may choose to pass the vm_page_t to pmap_kenter instead of the physical address in order to make this requirement clear.	2003-03-16 04:16:03 +00:00
Alan Cox	8480cd45a5	Remove some long unused declarations. (For example, the PV flags have not been used since revision 1.8, roughly nine years ago.)	2003-02-27 20:13:20 +00:00
Jake Burkholder	0f1a7e05a2	- Added inlines pmap_is_current, pmap_is_alternate and pmap_set_alternate for testing and setting the current and alternate address spaces. - Changed PTDpde and APTDpde to arrays to support multiple page directory pages. ponsored by: DARPA, Network Associates Laboratories	2003-02-25 19:40:21 +00:00
Jake Burkholder	5cd612b27e	- Removed UMAXPTDI and UMAXPTEOFF. - Changed VM_MAXUSER_ADDRESS to be defined in terms of PTDPTDI. In order for assumptions about the recursive page table map to work it must be the base of the recursive map. Any pte offset that's not NPTEPG will break these assumptions. Sponsored by: DARPA, Network Associates Laboratories	2003-02-24 20:29:52 +00:00
Jake Burkholder	ef49a94104	Previous commit missed a 1 that should be NGPTD, and an NPDEPG that should be NPDEPTD. Grumble. Sponsored by: DARPA, Network Associates Laboratories	2003-02-23 22:12:08 +00:00
Jake Burkholder	910548dea7	- Added macros NPGPTD, NBPTD, and NPDEPTD, for dealing with the size of the page directory. - Use these instead of the magic constants 1 or PAGE_SIZE where appropriate. There are still numerous assumptions that the page directory is exactly 1 page. Sponsored by: DARPA, Network Associates Laboratories	2003-02-23 21:20:00 +00:00
Jake Burkholder	e29632c9e1	- Added macros PDESHIFT and PTESHIFT, use these instead of magic constants in locore. - Removed the macros PTESIZE and PDESIZE, use sizeof instead in C. Sponsored by: DARPA, Network Associates Laboratories	2003-02-23 09:45:50 +00:00
Alan Cox	01a06ce250	The root of the splay tree maintained within the pm_pteobj always refers to the last accessed pte page. Thus, the pm_ptphint is redundant and can be removed.	2003-02-22 23:43:08 +00:00
Alan Cox	8365d52166	o Introduce pmap_page_is_mapped(). Its purpose is to obsolete the PG_MAPPED flag.	2002-08-05 03:40:28 +00:00
Peter Wemm	f1b665c8fe	Revive backed out pmap related changes from Feb 2002. The highlights are: - It actually works this time, honest! - Fine grained TLB shootdowns for SMP on i386. IPI's are very expensive, so try and optimize things where possible. - Introduce ranged shootdowns that can be done as a single IPI. - PG_G support for i386 - Specific-cpu targeted shootdowns. For example, there is no sense in globally purging the TLB cache for where we are stealing a page from the local unshared process on the local cpu. Use pm_active to track this. - Add some instrumentation for the tlb shootdown code. - Rip out SMP code from <machine/cpufunc.h> - Try and fix some very bogus PG_G and PG_PS interactions that were bad enough to cause vm86 bios calls to break. vm86 depended on our existing bugs and this was the cause of the VESA panics last time. - Fix the silly one-line error that caused the 'panic: bad pte' last time. - Fix a couple of other silly one-line errors that should have caused more pain than they did. Some more work is needed: - pmap_{zero,copy}_page[_idle]. These can be done without IPI's if we have a hook in cpu_switch. - The IPI handlers need some cleanup. I have a bogus %ds load that can be avoided. - APTD handling is rather bogus and appears to be a large source of global TLB IPI shootdowns for no really good reason. I see speedups of between 1.5% and ~4% on buildworlds in a while 1 loop. I expect to see a bigger difference when there is significant pageout activity or the system otherwise has memory shortages. I have backed out a few optimizations that I had been using over the last few days in order to be a little more conservative. I'll revisit these again over the next few days as the dust settles. New option: DISABLE_PG_G - In case I missed something.	2002-07-12 07:56:11 +00:00
Peter Wemm	fcdc053233	Cosmetic. Remove #if 0 definition of vtophys() - it predates 4MB pages. Remove avtophys(), it isn't referenced anywhere.	2002-07-08 08:14:28 +00:00
Peter Wemm	db17c6fc07	Tidy up some loose ends. i386/ia64/alpha - catch up to sparc64/ppc: - replace pmap_kernel() with refs to kernel_pmap - change kernel_pmap pointer to (&kernel_pmap_store) (this is a speedup since ld can set these at compile/link time) all platforms (as suggested by jake): - gc unused pmap_reference - gc unused pmap_destroy - gc unused struct pmap.pm_count (we never used pm_count - we track address space sharing at the vmspace)	2002-04-29 07:43:16 +00:00
Alfred Perlstein	b63dc6ad47	Remove __P.	2002-03-20 05:48:58 +00:00
Peter Wemm	d1693e1701	Back out all the pmap related stuff I've touched over the last few days. There is some unresolved badness that has been eluding me, particularly affecting uniprocessor kernels. Turning off PG_G helped (which is a bad sign) but didn't solve it entirely. Userland programs still crashed.	2002-02-27 09:51:33 +00:00
Peter Wemm	6bd95d70db	Work-in-progress commit syncing up pmap cleanups that I have been working on for a while: - fine grained TLB shootdown for SMP on i386 - ranged TLB shootdowns.. eg: specify a range of pages to shoot down with a single IPI, since the IPI is very expensive. Adjust some callers that used to trigger this inside tight loops to do a ranged shootdown at the end instead. - PG_G support for SMP on i386 (options ENABLE_PG_G) - defer PG_G activation till after we decide what we are going to do with PSE and the 4MB pages at the start of the kernel. This should solve some rumored strangeness about stale PG_G entries getting stuck underneath the 4MB pages. - add some instrumentation for the fine TLB shootdown - convert some asm instruction wrappers from functions to inlines. gcc seems to do a fair bit better with this. - [temporarily!] pessimize the tlb shootdown IPI handlers. I will fix this again shortly. This has been working fairly well for me for a while, but I have tweaked it again prior to commit since my last major testing round. The only outstanding problem that I know of is PG_G related, which is why there is an option for it (not on by default for SMP). I have seen a world speedups by a few percent (as much as 4 or 5% in one case) but I have not accurately measured this - I am a bit sceptical of these numbers.	2002-02-25 23:49:51 +00:00
Peter Wemm	963131fe0a	Tidy up some warnings	2002-02-25 21:42:23 +00:00
Peter Wemm	6a3e90ef2f	Some more tidy-up of stray "unsigned" variables instead of p[dt]_entry_t etc.	2002-02-20 01:05:57 +00:00
Peter Wemm	6729cb8800	Start bringing i386/pmap.c into line with cleanups that were done to alpha pmap. In particular - - pd_entry_t and pt_entry_t are now u_int32_t instead of a pointer. This is to enable cleaner PAE and x86-64 support down the track sor that we can change the pd_entry_t/pt_entry_t types to 64 bit entities. - Terminate "unsigned *ptep, pte" with extreme prejudice and use the correct pt_entry_t/pd_entry_t types. - Various other cosmetic changes to match cleanups elsewhere. - This eliminates a boatload of casts. - use VM_MAXUSER_ADDRESS in place of UPT_MIN_ADDRESS in a couple of places where we're testing user address space limits. Assuming the page tables start directly after the end of user space is not a safe assumption. There is still more to go.	2001-11-17 01:38:32 +00:00
Peter Wemm	f83fbaf22d	Introduce a new option, KVA_SPACE, which can be used to reconfigure the size of the kernel virtual address space relatively painlessly. Userland will adapt via the exported kernbase symbol. Increasing this causes the user part of address space to reduce.	2001-09-21 06:23:03 +00:00
Peter Wemm	083e9ed543	Increase NKPT from 17 to 30. This fixes the 4GB ram boot panic on both -current and RELENG_4 with GENERIC. NKPT is the number of initial bootstrap page table pages we create for the kernel during startup. Once VM is up, we resize it as needed, but with 4G ram, the size of the vm_page_t structures was pushing it over the limit. The fact that trimmed down kernels boot on 4G ram machines suggests that we were pretty close to the edge. The "30" is arbitary, but smaller than the 'nkpt' variable on all machines that I checked.	2000-11-30 01:53:02 +00:00
Tor Egge	d9b05734cc	Prepare for a cleanup of pmap module API pollution introduced by the suggested fix in PR 12378. Keep track of all existing pmaps independent of existing processes. This allows for a process to temporarily connect to a different address space without the risk of missing an update of the original address space if the kernel grows. pmap_pinit2() is no longer needed on the i386 platform but is left as a stub until the alpha pmap code is updated. PR: 12378	2000-08-16 21:24:44 +00:00
Jake Burkholder	e39756439c	Back out the previous change to the queue(3) interface. It was not discussed and should probably not happen. Requested by: msmith and others	2000-05-26 02:09:24 +00:00
Jake Burkholder	740a1973a6	Change the way that the queue(3) structures are declared; don't assume that the type argument to _HEAD and _ENTRY is a struct. Suggested by: phk Reviewed by: phk Approved by: mdodd	2000-05-23 20:41:01 +00:00
Peter Wemm	0385347c1a	Implement an optimization of the VM<->pmap API. Pass vm_page_t's directly to various pmap_*() functions instead of looking up the physical address and passing that. In many cases, the first thing the pmap code was doing was going to a lot of trouble to get back the original vm_page_t, or it's shadow pv_table entry. Inspired by: John Dyson's 1998 patches. Also: Eliminate pv_table as a seperate thing and build it into a machine dependent part of vm_page_t. This eliminates having a seperate set of structions that shadow each other in a 1:1 fashion that we often went to a lot of trouble to translate from one to the other. (see above) This happens to save 4 bytes of physical memory for each page in the system. (8 bytes on the Alpha). Eliminate the use of the phys_avail[] array to determine if a page is managed (ie: it has pv_entries etc). Store this information in a flag. Things like device_pager set it because they create vm_page_t's on the fly that do not have pv_entries. This makes it easier to "unmanage" a page of physical memory (this will be taken advantage of in subsequent commits). Add a function to add a new page to the freelist. This could be used for reclaiming the previously wasted pages left over from preloaded loader(8) files. Reviewed by: dillon	2000-05-21 12:50:18 +00:00
Peter Wemm	664a31e496	Change #ifdef KERNEL to #ifdef _KERNEL in the public headers. "KERNEL" is an application space macro and the applications are supposed to be free to use it as they please (but cannot). This is consistant with the other BSD's who made this change quite some time ago. More commits to come.	1999-12-29 04:46:21 +00:00
Peter Wemm	7ba4a77549	Reclaim UPAGES_HOLE (8k) that was chopped out of process address space. The UPAGES have not been there since Jan '96, but the hole was preserved for BSD/OS binary compatability. This has been fixed other ways (%ebx now has a pointer to PS_STRINGS), and the stack is nowhere near where it used to be so this hack isn't required anymore.	1999-12-11 10:54:06 +00:00
Peter Wemm	1a16554b8f	Make pmap_mapdev() deal with non-page-aligned requests. Add a corresponding pmap_unmapdev() to release the KVM back to kernel_map.	1999-09-11 20:31:32 +00:00
Peter Wemm	c3aac50f28	$Id$ -> $FreeBSD$	1999-08-28 01:08:13 +00:00
Luoqi Chen	541e018708	Do not setup 4M pdir until all APs are up.	1999-06-23 21:47:24 +00:00
Alan Cox	087e80a934	Put in place the infrastructure for improved UP and SMP TLB management. In particular, replace the unused field pmap::pm_flag by pmap::pm_active, which is a bit mask representing which processors have the pmap activated. (Thus, it is a simple Boolean on UPs.) Also, eliminate an unnecessary memory reference from cpu_switch() in swtch.s. Assisted by: John S. Dyson <dyson@iquest.net> Tested by: Luoqi Chen <luoqi@watermarkgroup.com>, Poul-Henning Kamp <phk@critter.freebsd.dk>	1999-04-02 17:59:49 +00:00

1 2 3

109 Commits