freebsd

mirror of https://git.FreeBSD.org/src.git synced 2024-12-21 11:13:30 +00:00

Author	SHA1	Message	Date
Andre Oppermann	d99b0dd2c5	Rewrite kern_sendfile() to work in two loops, the inner which turns as many VM pages into mbufs as it can -- up to the free send socket buffer space. The outer loop then drops the whole mbuf chain into the send socket buffer, calls tcp_output() on it and then waits until 50% of the socket buffer is free again to repeat the cycle. This way tcp_output() gets the full amount of data to work with and can issue up to 64K sends for TSO to chop up in the network adapter without using any CPU cycles. Thus it gets very efficient especially with the readahead the VM and I/O system do. The previous sendfile(2) code simply looped over the file, turned each 4K page into an mbuf and sent it off. This had the effect that TSO could only generate 2 packets per send instead of up to 44 at its maximum of 64K. Add experimental SF_MNOWAIT flag to sendfile(2) to return ENOMEM instead of sleeping on mbuf allocation failures. Benchmarking shows significant improvements (95% confidence): 45% less cpu (or 1.81 times better) with new sendfile vs. old sendfile (non-TSO) 83% less cpu (or 5.7 times better) with new sendfile vs. old sendfile (TSO) (Sender AMD Opteron 852 (2.6GHz) with em(4) PCI-X-133 interface and receiver DELL Poweredge SC1425 P-IV Xeon 3.2GHz with em(4) LOM connected back to back at 1000Base-TX full duplex.) Sponsored by: TCP/IP Optimization Fundraise 2005 MFC after: 3 months	2006-11-02 16:53:26 +00:00
John Baldwin	1ac27db5b7	Increment nb_allocated while holding the pt_mtx lock to avoid races.	2006-11-01 16:50:13 +00:00
John Baldwin	9045eda252	Comment and style tweak.	2006-11-01 16:48:33 +00:00
John Birrell	3d068827c2	Add a cnputs() function to write a string to the console with a lock to prevent interspersed strings written from different CPUs at the same time. To avoid putting a buffer on the stack or having to malloc one, space is incorporated in the per-cpu structure. The buffer size is 128 bytes; chosen because it's the next power of 2 size up from 80 characters. String writes to the console are buffered up to the end of the line or until the buffer fills. Then the buffer is flushed to all console devices. Existing low level console output via cnputc() is unaffected by this change. ithread calls to log() are also unaffected to avoid blocking those threads. A minor change to the behaviour in a panic situation is that console output will still be buffered, but won't be written to a tty as before. This should prevent interspersed panic output as a number of CPUs panic before we end up single threaded running ddb. Reviewed by: scottl, jhb MFC after: 2 weeks	2006-11-01 04:54:51 +00:00
Pawel Jakub Dawidek	1a60c7fc8e	Add gjournal specific code to the UFS file system: - Add FS_GJOURNAL flag which enables gjournal support on a file system. - Add cg_unrefs field to the cylinder group structure which holds number of unreferenced (orphaned) inodes in the given cylinder group. - Add fs_unrefs field to the super block structure which holds total number of unreferenced (orphaned) inodes. - When file or a directory is orphaned (last reference is removed, but object is still open), increase fs_unrefs and cg_unrefs fields, which is a hint for fsck in which cylinder groups to look for such (orphaned) objects. - When file is last closed, decrease {fs,cg}_unrefs fields. - Add VV_DELETED vnode flag which points at orphaned objects. Sponsored by: home.pl	2006-10-31 21:48:54 +00:00
Pawel Jakub Dawidek	c3618c657a	Add a new I/O request - BIO_FLUSH, which basically tells providers below to flush their caches. For now will mostly be used by disks to flush their write cache. Sponsored by: home.pl	2006-10-31 21:11:21 +00:00
Alan Cox	0c2b04b419	Refactor vfs_setdirty(), creating vfs_setdirty_locked_object(). Call vfs_setdirty_locked_object() from vfs_busy_pages() instead of vfs_setdirty(), thereby eliminating a second acquisition and release of the same vm object lock.	2006-10-29 00:04:39 +00:00
Alan Cox	20ed1b5b1b	In bufdone_finish() restrict the acquisition and release of the page queues lock to BIO_READ operations. Recent changes to the implementation of the per-page flags have eliminated the need for the page queues lock in the other cases.	2006-10-28 19:16:57 +00:00
David Xu	d21ac9b686	Remove member p_procscopegrp which is no longer used by libthr.	2006-10-27 05:45:44 +00:00
John Birrell	8460a577a4	Make KSE a kernel option, turned on by default in all GENERIC kernel configs except sun4v (which doesn't process signals properly with KSE). Reviewed by: davidxu@	2006-10-26 21:42:22 +00:00
Konstantin Belousov	9a969e626c	The attempt to rename "." with MAC framework compiled in would cause attempt to twice unlock the vnode. Check that ni_vp and ni_dvp are different before doing second unlock. Reviewed by: rwatson Approved by: pjd (mentor) MFC after: 1 week	2006-10-26 13:20:28 +00:00
Robert Watson	24076d138e	Increase usefulness of "show malloc" by moving from displaying the basic counters of allocs/frees/use for each malloc type to calculating InUse, MemUse, and Requests as displayed by the userspace vmstat -m. This is more useful when debugging malloc(9)-related memory leaks, where the count of allocs/frees may not usefully reflect that current memory allocation (i.e., when highly variable size allocations occur with the same malloc type, such as with contigmalloc). MFC after: 3 days Limitations observed by: scottl	2006-10-26 10:17:13 +00:00
David Xu	4c9b02c253	Optimize umtx_lock_pi() a bit by moving some heavy code out of the loop, make a fast path when a umtx_pi can be allocated without being blocked.	2006-10-26 09:33:34 +00:00
David Xu	7c24ae418a	In order to eliminate a branch, convert opcode to unsigned integer.	2006-10-25 06:38:46 +00:00
David Xu	91d0b4d615	Eliminate an unnecessary `if' statement.	2006-10-25 06:28:23 +00:00
David Xu	ff7668079f	Move sigqueue_take() call into proc_reparent(), this fixed bugs where proc_reparent() is called but sigqueue_take() is forgotten.	2006-10-25 06:18:04 +00:00
David Xu	e94cc4ac30	Protect sigqueue_take() call by child process's lock, it fixed a potential race with ptrace 'attach' which changes parent of the child process.	2006-10-24 12:04:21 +00:00
Poul-Henning Kamp	7ea93e912b	Better naming of fattime conversion functions, they do convert to timespec after all. Add 'utc' argument to control if fattimestamps are on UTC or local timezone calendar.	2006-10-24 10:27:23 +00:00
Alan Cox	2a53696fb8	The page queues lock is no longer required by vm_page_busy() or vm_page_wakeup(). Reduce or eliminate its use accordingly.	2006-10-22 21:18:48 +00:00
Poul-Henning Kamp	b39be1b35c	Add two new functions to convert FAT filesystem format timestamps to and from struct timespec, to replace the crummy conversion functions which have been copy&pasted into three different filesystems already. Apart from general crummyness as indicated by code like: for (year = 1970;; year++) { inc = year & 0x03 ? 365 : 366; if (days < inc) break; days -= inc; } They also contain specialized crummyness which tries to compensate for the general crummyness by caching recent conversion results, with no regard for locking or consistency. These replacement functions are smaller, O(1) and handle the Y2.1K leap-year correctly. Ideally, these functions should live in a module of their own, which the three offending filesystems would depend on, but the size is 877 bytes of code (on i386), so that would be false economy.	2006-10-22 18:19:08 +00:00
Robert Watson	aed5570872	Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now contains the userspace and user<->kernel API and definitions, with all in-kernel interfaces moved to mac_framework.h, which is now included across most of the kernel instead. This change is the first step in a larger cleanup and sweep of MAC Framework interfaces in the kernel, and will not be MFC'd. Obtained from: TrustedBSD Project Sponsored by: SPARTA	2006-10-22 11:52:19 +00:00
Alan Cox	9af80719db	Replace PG_BUSY with VPO_BUSY. In other words, changes to the page's busy flag, i.e., VPO_BUSY, are now synchronized by the per-vm object lock instead of the global page queues lock.	2006-10-22 04:28:14 +00:00
David Xu	5c28a8d474	Use macro TAILQ_FOREACH_SAFE instead of expanding it.	2006-10-22 00:09:41 +00:00
David Xu	f71e748d89	Since revision 1.333 of kern_sig.c no longer uses P_WEXIT, the change opened a race window which can cause memory leak in signal queue. Here we free memory for signal queue when process state is set to PRS_ZOMBIE.	2006-10-21 23:59:15 +00:00
John Baldwin	0fc32899f1	Remove the check that prevented signals from being delivered to exiting processes. It was originally added back when support for Linux threads (and thus shared sigacts objects) was added, but no one knows why. My guess is that at some point during the Linux threads patches, the sigacts object was torn down during exit1(), so this check was added to prevent a panic for that race. However, the stuff that was actually committed to the tree doesn't teardown sigacts until wait() making the above race moot. Re-allowing signals here lets one interrupt a NFS request during process teardown (such as closing descriptors) on an interruptible mount. Requested by: kib (long time ago) MFC after: 1 week	2006-10-20 16:19:21 +00:00
Konstantin Belousov	1663075c64	Fix the race between devfs_fp_check and devfs_reclaim. Dereference the vnode's v_rdev and increment the dev threadcount, as well as clear it (in devfs_reclaim) under the dev_lock(). Reviewed by: tegge Approved by: pjd (mentor)	2006-10-20 07:59:50 +00:00
Bruce Evans	1ca2c0183f	kern_intr.c: - Count (scheduling of) software interrupts (SWIs) as SWIs, not as hardware interrupts. - Don't count (scheduling of) delayed SWIs as interrupts at all, since in the delayed case it is expected that there are many more scheduling calls than handling calls. Perhaps all interrupts should be counted only when they are handled, but it is only counts of delayed SWIs that should never be combined with the other counts. subr_trap.c: - Count (handling of) Asynchronous System Traps (ASTs) as traps, not as software interrupts. Before these changes, the counter for SWIs only counted ASTs, and SWIs weren't counted separately, but a subcounter for ASTs alone is less needed than for most other exception sources. 4.4BSD-Lite uses the counters for similar things (actually matching their names) on its main arches (hp300, ..., !i386) where more of the exceptions are in hardware.	2006-10-18 04:48:09 +00:00
David Xu	034b26fc65	Regenerate.	2006-10-17 02:28:58 +00:00
David Xu	5f641fc0fb	o Add keyword volatile for user mutex owner field. o Fix type consistent problem by using type long for old umtx and wait channel. o Rename casuptr to casuword.	2006-10-17 02:24:47 +00:00
Alexander Leidinger	6a1162d4cd	MFP4 (with some minor changes): Implement the linux_io_* syscalls (AIO). They are only enabled if the native AIO code is available (either compiled in to the kernel or as a module) at the time the functions are used. If the AIO stuff is not available there will be a ENOSYS. From the submitter: ---snip--- DESIGN NOTES: 1. Linux permits a process to own multiple AIO queues (distinguished by "context"), but FreeBSD creates only one single AIO queue per process. My code maintains a request queue (STAILQ of queue(3)) per "context", and throws all AIO requests of all contexts owned by a process into the single FreeBSD per-process AIO queue. When the process calls io_destroy(2), io_getevents(2), io_submit(2) and io_cancel(2), my code can pick out requests owned by the specified context from the single FreeBSD per-process AIO queue according to the per-context request queues maintained by my code. 2. The request queue maintained by my code stores contrast information between Linux IO control blocks (struct linux_iocb) and FreeBSD IO control blocks (struct aiocb). FreeBSD IO control block actually exists in userland memory space, required by FreeBSD native aio_XXXXXX(2). 3. It is quite troubling that the function io_getevents() of libaio-0.3.105 needs to use Linux-specific "struct aio_ring", which is a partial mirror of context in user space. I would rather take the address of context in kernel as the context ID, but the io_getevents() of libaio forces me to take the address of the "ring" in user space as the context ID. To my surprise, one comment line in the file "io_getevents.c" of libaio-0.3.105 reads: Ben will hate me for this REFERENCE: 1. Linux kernel source code: http://www.kernel.org/pub/linux/kernel/v2.6/ (include/linux/aio_abi.h, fs/aio.c) 2. Linux manual pages: http://www.kernel.org/pub/linux/docs/manpages/ (io_setup(2), io_destroy(2), io_getevents(2), io_submit(2), io_cancel(2)) 3. Linux Scalability Effort: http://lse.sourceforge.net/io/aio.html The design notes: http://lse.sourceforge.net/io/aionotes.txt 4. The package libaio, both source and binary: http://rpmfind.net/linux/rpm2html/search.php?query=libaio Simple transparent interface to Linux AIO system calls. 5. Libaio-oracle: http://oss.oracle.com/projects/libaio-oracle/ POSIX AIO implementation based on Linux AIO system calls (depending on libaio). ---snip--- Submitted by: Li, Xiao <intron@intron.ac>	2006-10-15 14:22:14 +00:00
Ruslan Ermilov	a1b0a18096	Prevent IOC_IN with zero size argument (this is only supported if backward compatibility options are present) from attempting to free memory that wasn't allocated. This is an old bug, and previously it would attempt to free a null pointer. I noticed this bug when working on the previous revision, but forgot to fix it. Security: local DoS Reported by: Peter Holm MFC after: 3 days	2006-10-14 19:01:55 +00:00
Tom Rhodes	f51bf07af8	Close a race condition where num can be larger than tmp, giving the user too large of a boundary. Reported by: Ilja Van Sprundel	2006-10-14 10:30:14 +00:00
Tor Egge	e0c33ad529	Wait for thread count to reach zero in destroy_devl() even when no purge method is defined, to avoid memory being modified after free. Temporarily increase refcount in destroy_devl() to avoid a double free if dev_rel() is called while waiting for thread count to reach zero.	2006-10-13 20:49:24 +00:00
Gleb Smirnoff	68a57ebfad	Improve ktr(4) logging for callout(9) subsystem. Log all inserts and removals, including failures, into the callwheel. XXX: Most of the CTR() macros are called with callout_lock spin mutex held, thus won't be logged into file, if KTR_ALQ is used. Moving the CTR() macros out from the spinlocked code would require copying of all arguments. I'm too lazy to do this.	2006-10-11 14:57:03 +00:00
David Xu	ae7d8a6766	Implement 32bit umtx_lock and umtx_unlock system calls, these two system calls are not used by libthr in RELENG_6 and HEAD, it is only used by the libthr in RELENG-5, the _umtx_op system call can do more incremental dirty works than these two system calls without having to introduce new system calls or throw away old system calls when things are going on.	2006-10-06 08:22:08 +00:00
David Xu	c6511aea86	Move some declaration of 32-bit signal structures into file freebsd32-signal.h, implement sigtimedwait and sigwaitinfo system calls.	2006-10-05 01:56:11 +00:00
Martin Blapp	89ff1e4cb8	Back out part of rev. 1.149. While adding a workaround in ptcopen() to avoid leaked ptys works fine, this opens a possible security hole. Submitted by: bde MFC after: 3 days	2006-10-04 05:43:39 +00:00
Robert Watson	531147aa3e	Regenerate.	2006-10-03 20:48:11 +00:00
Robert Watson	888db9e177	Audit creat() system call (compat code), and change type for getpagesize(), which isn't actually being audited anyway. MFC after: 3 days Obtained from: TrustedBSD Project	2006-10-03 20:46:52 +00:00
Konstantin Belousov	30af71199e	Fix the remaining race in the revs. 1.232, 1.233 that could occur during unmount when mp structure is reused while waiting for coveredvp lock. Introduce struct mount generation count, increment it on each reuse and compare the generations before and after obtaining the coveredvp lock. Reviewed by: tegge, pjd Approved by: pjd (mentor) MFC after: 2 weeks	2006-10-03 10:47:04 +00:00
Poul-Henning Kamp	e5037a18a9	Use utc_offset() where applicable, and hide the internals of it as static variables.	2006-10-02 18:23:37 +00:00
Poul-Henning Kamp	f97c1c4bf7	Introduce utc_offset() to capture a calculation currently done all over the place.	2006-10-02 16:17:23 +00:00
Poul-Henning Kamp	94d67e0fb8	Move tz_minuteswest and tz_dsttime to subr_clock.c	2006-10-02 16:06:26 +00:00
Poul-Henning Kamp	b69f71eb29	Second part of a little cleanup in the calendar/timezone/RTC handling. Split subr_clock.c in two parts (by repo-copy): subr_clock.c contains generic RTC and calendaric stuff. etc. subr_rtc.c contains the newbus'ified RTC interface. Centralize the machdep.{adjkerntz,disable_rtc_set,wall_cmos_clock} sysctls and associated variables into subr_clock.c. They are not machine dependent and we have generic code that relies on being present so they are not even optional.	2006-10-02 15:42:02 +00:00
Poul-Henning Kamp	f645b0b51c	First part of a little cleanup in the calendar/timezone/RTC handling. Move relevant variables to <sys/clock.h> and fix #includes as necessary. Use libkern's much more time- & space-efficient BCD routines.	2006-10-02 12:59:59 +00:00
Konstantin Belousov	45ea8737bf	Correct the comment: numvnodes is decreased on vdestroying the vnode. OKed by: tegge Approved by: pjd (mentor) MFC after: 1 week	2006-10-02 07:25:58 +00:00
Tor Egge	04aa807cb6	If the buffer lock has waiters after the buffer has changed identity then getnewbuf() needs to drop the buffer in order to wake waiters that might sleep on the buffer in the context of the old identity.	2006-10-02 02:06:27 +00:00
Martin Blapp	570d6457d1	Readd rev. 1.145 because of vfs bugs and races near revoke(). Until they are fixed we can't free any slaves. Add a workaround to not leak ptys by number.	2006-09-30 22:51:05 +00:00
Pawel Jakub Dawidek	2342d5216e	Remove duplicated $FreeBSD$.	2006-09-30 16:33:29 +00:00
Martin Blapp	35dcc318f4	Any call of tty_close() with a tty refcount of <= 1 is wrong and we will free the tty in this case. This is a workaround until the underlying devfs/tty problems are fixed. MFC after: 1 day	2006-09-30 08:11:51 +00:00
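
Note on commit b39be1b35c: the year loop quoted in that message treats every fourth year as a leap year, which is wrong for 2100. The sketch below is not the committed FreeBSD code; it only contrasts the quoted loop with a closed-form day count that follows the Gregorian rule. The names year_naive(), days_before_year() and year_exact() are hypothetical and exist only for this illustration.

```c
#include <assert.h>

/* The loop quoted in the commit message: every 4th year is a leap year. */
static int year_naive(long days)
{
    int year, inc;

    for (year = 1970;; year++) {
        inc = year & 0x03 ? 365 : 366;
        if (days < inc)
            break;
        days -= inc;
    }
    return year;
}

/*
 * Days from 1970-01-01 to January 1 of year y (y >= 1970), computed in
 * closed form with the Gregorian rule (divisible by 4, except centuries
 * not divisible by 400). 477 is the number of leap years in 1..1969.
 */
static long days_before_year(int y)
{
    long p = y - 1;

    return 365L * (y - 1970) + (p / 4 - p / 100 + p / 400) - 477;
}

/* Year containing the given day index (day 0 is 1970-01-01). */
static int year_exact(long days)
{
    int y;

    assert(days >= 0);
    /* Every year has at least 365 days, so this never underestimates. */
    y = 1970 + (int)(days / 365);
    /* Step back the few years by which the estimate may overshoot. */
    while (days_before_year(y) > days)
        y--;
    return y;
}
```

Day 47847 is 2101-01-01 under the Gregorian calendar; the quoted loop still reports 2100 for it because it counts a 366th day in 2100.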


9599 Commits
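
Note on commit d99b0dd2c5: the "2 packets per send" versus "up to 44" figures in that message can be reproduced with simple arithmetic. The sketch below assumes a TCP maximum segment size of 1460 bytes (a 1500-byte Ethernet MTU minus 40 bytes of IPv4 and TCP headers without options); the commit message does not state the MSS, and full_segments() and ASSUMED_MSS are hypothetical names for this illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 1500-byte MTU minus 20 bytes IPv4 and 20 bytes TCP header. */
#define ASSUMED_MSS 1460u

/* Number of full-sized segments TSO can cut from one send of send_bytes. */
static size_t full_segments(size_t send_bytes, size_t mss)
{
    assert(mss > 0);
    return send_bytes / mss;
}
```

Under that assumption, full_segments(4096, ASSUMED_MSS) gives 2 for a single 4K page and full_segments(65536, ASSUMED_MSS) gives 44 for a 64K send, matching the numbers in the commit message.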