freebsd

mirror of https://git.FreeBSD.org/src.git synced 2024-12-21 11:13:30 +00:00

Author	SHA1	Message	Date
Kip Macy	44a96b46bd	Unbreak witness	2006-11-12 23:23:38 +00:00
Andre Oppermann	3e932ca715	In kern_sendfile() fix the calculation of sbytes (the total number of bytes written to the socket). The rewrite in revision 1.240 got confused by the FreeBSD 4.x bug compatibility code. For some reason lighttpd, which was used for testing the new sendfile code, was not affected by the problem, but apache and others using headers/trailers in the sendfile call received incorrect sbytes values after return from non-blocking sockets. This then led to restarts with wrong offsets and thus mixed up file contents when the socket was writeable again. All programs not using headers/trailers, like ftpd, were not affected by the bug. Reported by: Pawel Worach <pawel.worach-at-gmail.com> Tested by: Pawel Worach <pawel.worach-at-gmail.com>	2006-11-12 20:57:00 +00:00
David Xu	60d4823594	Copy base user priority in NO_KSE case.	2006-11-12 11:48:37 +00:00
Tom Rhodes	bedc1c9c96	Fix mispatch of includes list; allows my kernel to build successfully.	2006-11-12 03:34:03 +00:00
Kip Macy	54e57f7613	show lock class in profiling output for default case where type is not specified when initializing the lock Approved by: scottl (standing in for mentor rwatson)	2006-11-12 03:30:01 +00:00
David Xu	812fb4a89f	Use mi_switch, this should fix loadavg calculation problem in NO_KSE case.	2006-11-12 03:18:22 +00:00
Tom Rhodes	c4f7f0fd4a	Update includes for sys/posix4 move. Approved by: silence on -arch and -standards	2006-11-11 16:46:31 +00:00
Tom Rhodes	6aeb05d7be	Merge posix4/* into normal kernel hierarchy. Reviewed by: glanced at by jhb Approved by: silence on -arch@ and -standards@	2006-11-11 16:26:58 +00:00
Tom Rhodes	bdd04ab184	Update #includes list.	2006-11-11 16:19:12 +00:00
David Xu	5a21514727	Unbreak userland priority inheriting in NO_KSE case.	2006-11-11 13:11:29 +00:00
Kip Macy	ed6a7c42f6	tinderbox fix	2006-11-11 07:38:48 +00:00
Kip Macy	cf2c39e7a2	remove lingering call to rd(tick)	2006-11-11 07:28:45 +00:00
Kip Macy	83b72e3e25	missed nits replacing mutex with lock	2006-11-11 06:28:47 +00:00
Kip Macy	7c0435b933	MUTEX_PROFILING has been generalized to LOCK_PROFILING. We now profile wait (time waited to acquire) and hold times for all kernel locks. If the architecture has a system synchronized TSC, the profiling code will use that - thereby minimizing profiling overhead. Large chunks of profiling code have been moved out of line, the overhead measured on the T1 for when it is compiled in but not enabled is < 1%. Approved by: scottl (standing in for mentor rwatson) Reviewed by: des and jhb	2006-11-11 03:18:07 +00:00
Maxim Konovalov	f645b5da88	o Fix a couple of obvious typos.	2006-11-08 09:09:07 +00:00
Andre Oppermann	62b36a7fc2	Style cleanups to the sctp_* syscall functions.	2006-11-07 21:28:12 +00:00
John Baldwin	6b8de13ab4	Simplify operations with sync_mtx in sched_sync(): - Don't drop the lock just to reacquire it again to check rushjob, this only wastes time. - Use msleep() to drop the mutex while sleeping instead of explicitly unlocking around tsleep. Reviewed by: pjd	2006-11-07 19:45:05 +00:00
John Baldwin	8064e5d71f	Fix comment typo and function declaration.	2006-11-07 19:07:33 +00:00
Tor Egge	40dee3da29	Don't drop reference to tty in tty_close() if TS_ISOPEN is already cleared. Reviewed by: bde	2006-11-06 22:12:43 +00:00
Andre Oppermann	bda8b1f3b8	Handle early errors in kern_sendfile() by introducing a new goto 'out' label after the sbunlock() part. This correctly handles calls to sendfile(2) without valid parameters that was broken in rev. 1.240. Coverity error: 272162	2006-11-06 21:53:19 +00:00
Robert Watson	acd3428b7d	Sweep kernel replacing suser(9) calls with priv(9) calls, assigning specific privilege names to a broad range of privileges. These may require some future tweaking. Sponsored by: nCircle Network Security, Inc. Obtained from: TrustedBSD Project Discussed on: arch@ Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri, Alex Lyashkov <umka at sevcity dot net>, Skip Ford <skip dot ford at verizon dot net>, Antoine Brodin <antoine dot brodin at laposte dot net>	2006-11-06 13:42:10 +00:00
Robert Watson	800c940832	Add a new priv(9) kernel interface for checking the availability of privilege for threads and credentials. Unlike the existing suser(9) interface, priv(9) exposes a named privilege identifier to the privilege checking code, allowing more complex policies regarding the granting of privilege to be expressed. Two interfaces are provided, replacing the existing suser(9) interface: suser(td) -> priv_check(td, priv) suser_cred(cred, flags) -> priv_check_cred(cred, priv, flags) A comprehensive list of currently available kernel privileges may be found in priv.h. New privileges are easily added as required, but the comments on adding privileges found in priv.h and priv(9) should be read before doing so. The new privilege interface exposed sufficient information to the privilege checking routine that it will now be possible for jail to determine whether a particular privilege is granted in the check routine, rather than relying on hints from the calling context via the SUSER_ALLOWJAIL flag. For now, the flag is maintained, but a new jail check function, prison_priv_check(), is exposed from kern_jail.c and used by the privilege check routine to determine if the privilege is permitted in jail. As a result, a centralized list of privileges permitted in jail is now present in kern_jail.c. The MAC Framework is now also able to instrument privilege checks, both to deny privileges otherwise granted (mac_priv_check()), and to grant privileges otherwise denied (mac_priv_grant()), permitting MAC Policy modules to implement privilege models, as well as control a much broader range of system behavior in order to constrain processes running with root privilege. The suser() and suser_cred() functions remain implemented, now in terms of priv_check() and the PRIV_ROOT privilege, for use during the transition and possibly continuing use by third party kernel modules that have not been updated. 
The PRIV_DRIVER privilege exists to allow device drivers to check privilege without adopting a more specific privilege identifier. This change does not modify the actual security policy, rather, it modifies the interface for privilege checks so changes to the security policy become more feasible. Sponsored by: nCircle Network Security, Inc. Obtained from: TrustedBSD Project Discussed on: arch@ Reviewed (at least in part) by: mlaier, jmg, pjd, bde, ceri, Alex Lyashkov <umka at sevcity dot net>, Skip Ford <skip dot ford at verizon dot net>, Antoine Brodin <antoine dot brodin at laposte dot net>	2006-11-06 13:37:19 +00:00
Pawel Jakub Dawidek	a2ca03b3ad	Typo, 'from' vnode is locked here, not 'to' vnode.	2006-11-04 23:57:02 +00:00
Randall Stewart	af99851047	This commits the remake in kern/ make sysent to get the correct syscalls.master's $FreeBSD$ tag record and a make sysent in sys/compat/freebsd32. Thanks Ruslan for pointing out the steps I missed :-0 Approved by: gnn	2006-11-03 18:57:49 +00:00
Randall Stewart	f8829a4a40	Ok, here it is, we finally add SCTP to current. Note that this work is not just mine, but it is also the works of Peter Lei and Michael Tuexen. They both are my two key other developers working on the project.. and they need attaboys too: ** peterlei@cisco.com tuexen@fh-muenster.de ** I did do a make sysent which updated the syscalls and sysproto.. I hope that is correct... without it you don't build since we have new syscalls for SCTP :-0 So go out and look at the NOTES, add option SCTP (make sure inet and inet6 are present too) and play with SCTP. I will see about committing some test tools I have after I figure out where I should place them. I also have a lib (libsctp.a) that adds some of the missing socketapi functions that I need to put into libs.. I will talk to George about this :-) There may still be some 64 bit issues in here, none of us have a 64 bit processor to test with yet.. Michael may have a Mac but that's another beast too.. If you have a Mac and want to use SCTP, contact Michael; he maintains a web site with a loadable module with this code :-) Reviewed by: gnn Approved by: gnn	2006-11-03 15:23:16 +00:00
John Birrell	35b927a8c4	Always init the console before trying to cnadd it to avoid the case where the console name isn't set and cnadd wants to use printf to complain about it.	2006-11-03 06:23:53 +00:00
Andre Oppermann	1ae4d97d51	Use the improved m_uiotombuf() function instead of the home-grown sosend_copyin() to do the userland to kernel copying in sosend_generic() and sosend_dgram(). sosend_copyin() is retained for ZERO_COPY_SOCKETS which are not yet supported by m_uiotombuf(). Benchmarking shows significant improvements (95% confidence): 66% less cpu (or 2.9 times better) with new sosend vs. old sosend (non-TSO) 65% less cpu (or 2.8 times better) with new sosend vs. old sosend (TSO) (Sender AMD Opteron 852 (2.6GHz) with em(4) PCI-X-133 interface and receiver DELL Poweredge SC1425 P-IV Xeon 3.2GHz with em(4) LOM connected back to back at 1000Base-TX full duplex.) Sponsored by: TCP/IP Optimization Fundraise 2005 MFC after: 3 month	2006-11-02 17:45:28 +00:00
Andre Oppermann	5e20f43d31	Rename m_getm() to m_getm2() and rewrite it to allocate up to page sized mbuf clusters. Add a flags parameter to accept M_PKTHDR and M_EOR mbuf chain flags. Provide compatibility macro for m_getm() calling m_getm2() with M_PKTHDR set. Rewrite m_uiotombuf() to use m_getm2() for mbuf allocation and do the uiomove() in a tight loop over the mbuf chain. Add a flags parameter to accept mbuf flags to be passed to m_getm2(). Adjust all callers for the extra parameter. Sponsored by: TCP/IP Optimization Fundraise 2005 MFC after: 3 month	2006-11-02 17:37:22 +00:00
Andre Oppermann	d99b0dd2c5	Rewrite kern_sendfile() to work in two loops: the inner one turns as many VM pages into mbufs as it can -- up to the free send socket buffer space. The outer loop then drops the whole mbuf chain into the send socket buffer, calls tcp_output() on it and then waits until 50% of the socket buffer is free again to repeat the cycle. This way tcp_output() gets the full amount of data to work with and can issue up to 64K sends for TSO to chop up in the network adapter without using any CPU cycles. Thus it gets very efficient especially with the readahead the VM and I/O system do. The previous sendfile(2) code simply looped over the file, turned each 4K page into an mbuf and sent it off. This had the effect that TSO could only generate 2 packets per send instead of up to 44 at its maximum of 64K. Add experimental SF_MNOWAIT flag to sendfile(2) to return ENOMEM instead of sleeping on mbuf allocation failures. Benchmarking shows significant improvements (95% confidence): 45% less cpu (or 1.81 times better) with new sendfile vs. old sendfile (non-TSO) 83% less cpu (or 5.7 times better) with new sendfile vs. old sendfile (TSO) (Sender AMD Opteron 852 (2.6GHz) with em(4) PCI-X-133 interface and receiver DELL Poweredge SC1425 P-IV Xeon 3.2GHz with em(4) LOM connected back to back at 1000Base-TX full duplex.) Sponsored by: TCP/IP Optimization Fundraise 2005 MFC after: 3 month	2006-11-02 16:53:26 +00:00
John Baldwin	1ac27db5b7	Increment nb_allocated while holding the pt_mtx lock to avoid races.	2006-11-01 16:50:13 +00:00
John Baldwin	9045eda252	Comment and style tweak.	2006-11-01 16:48:33 +00:00
John Birrell	3d068827c2	Add a cnputs() function to write a string to the console with a lock to prevent interspersed strings written from different CPUs at the same time. To avoid putting a buffer on the stack or having to malloc one, space is incorporated in the per-cpu structure. The buffer size is 128 bytes; chosen because it's the next power of 2 size up from 80 characters. String writes to the console are buffered up to the end of the line or until the buffer fills. Then the buffer is flushed to all console devices. Existing low level console output via cnputc() is unaffected by this change. ithread calls to log() are also unaffected to avoid blocking those threads. A minor change to the behaviour in a panic situation is that console output will still be buffered, but won't be written to a tty as before. This should prevent interspersed panic output as a number of CPUs panic before we end up single threaded running ddb. Reviewed by: scottl, jhb MFC after: 2 weeks	2006-11-01 04:54:51 +00:00
Pawel Jakub Dawidek	1a60c7fc8e	Add gjournal specific code to the UFS file system: - Add FS_GJOURNAL flag which enables gjournal support on a file system. - Add cg_unrefs field to the cylinder group structure which holds the number of unreferenced (orphaned) inodes in the given cylinder group. - Add fs_unrefs field to the super block structure which holds the total number of unreferenced (orphaned) inodes. - When a file or directory is orphaned (last reference is removed, but object is still open), increase the fs_unrefs and cg_unrefs fields, which tell fsck which cylinder groups to search for such (orphaned) objects. - When a file is last closed, decrease the {fs,cg}_unrefs fields. - Add VV_DELETED vnode flag which points at orphaned objects. Sponsored by: home.pl	2006-10-31 21:48:54 +00:00
Pawel Jakub Dawidek	c3618c657a	Add a new I/O request - BIO_FLUSH, which basically tells providers below to flush their caches. For now will mostly be used by disks to flush their write cache. Sponsored by: home.pl	2006-10-31 21:11:21 +00:00
Alan Cox	0c2b04b419	Refactor vfs_setdirty(), creating vfs_setdirty_locked_object(). Call vfs_setdirty_locked_object() from vfs_busy_pages() instead of vfs_setdirty(), thereby eliminating a second acquisition and release of the same vm object lock.	2006-10-29 00:04:39 +00:00
Alan Cox	20ed1b5b1b	In bufdone_finish() restrict the acquisition and release of the page queues lock to BIO_READ operations. Recent changes to the implementation of the per-page flags have eliminated the need for the page queues lock in the other cases.	2006-10-28 19:16:57 +00:00
David Xu	d21ac9b686	Remove member p_procscopegrp which is no longer used by libthr.	2006-10-27 05:45:44 +00:00
John Birrell	8460a577a4	Make KSE a kernel option, turned on by default in all GENERIC kernel configs except sun4v (which doesn't process signals properly with KSE). Reviewed by: davidxu@	2006-10-26 21:42:22 +00:00
Konstantin Belousov	9a969e626c	Attempting to rename "." with the MAC framework compiled in would cause an attempt to unlock the vnode twice. Check that ni_vp and ni_dvp are different before doing the second unlock. Reviewed by: rwatson Approved by: pjd (mentor) MFC after: 1 week	2006-10-26 13:20:28 +00:00
Robert Watson	24076d138e	Increase usefulness of "show malloc" by moving from displaying the basic counters of allocs/frees/use for each malloc type to calculating InUse, MemUse, and Requests as displayed by the userspace vmstat -m. This is more useful when debugging malloc(9)-related memory leaks, where the count of allocs/frees may not usefully reflect the current memory allocation (i.e., when highly variable size allocations occur with the same malloc type, such as with contigmalloc). MFC after: 3 days Limitations observed by: scottl	2006-10-26 10:17:13 +00:00
David Xu	4c9b02c253	Optimize umtx_lock_pi() a bit by moving some heavy code out of the loop, make a fast path when a umtx_pi can be allocated without being blocked.	2006-10-26 09:33:34 +00:00
David Xu	7c24ae418a	In order to eliminate a branch, convert opcode to unsigned integer.	2006-10-25 06:38:46 +00:00
David Xu	91d0b4d615	Eliminate an unnecessary `if' statement.	2006-10-25 06:28:23 +00:00
David Xu	ff7668079f	Move sigqueue_take() call into proc_reparent(), this fixed bugs where proc_reparent() is called but sigqueue_take() is forgotten.	2006-10-25 06:18:04 +00:00
David Xu	e94cc4ac30	Protect sigqueue_take() call by child process's lock, it fixed a potential race with ptrace 'attach' which changes parent of the child process.	2006-10-24 12:04:21 +00:00
Poul-Henning Kamp	7ea93e912b	Better naming of fattime conversion functions, they do convert to timespec after all. Add 'utc' argument to control if fattimestamps are on UTC or local timezone calendar.	2006-10-24 10:27:23 +00:00
Alan Cox	2a53696fb8	The page queues lock is no longer required by vm_page_busy() or vm_page_wakeup(). Reduce or eliminate its use accordingly.	2006-10-22 21:18:48 +00:00
Poul-Henning Kamp	b39be1b35c	Add two new functions to convert FAT filesystem format timestamps to and from struct timespec, to replace the crummy conversion functions which have been copy&pasted into three different filesystems already. Apart from general crumminess as indicated by code like: for (year = 1970;; year++) { inc = year & 0x03 ? 365 : 366; if (days < inc) break; days -= inc; }, they also contain specialized crumminess which tries to compensate for the general crumminess by caching recent conversion results, with no regard for locking or consistency. These replacement functions are smaller, O(1) and handle the Y2.1K leap-year correctly. Ideally, these functions should live in a module of their own, which the three offending filesystems would depend on, but the size is 877 bytes of code (on i386), so that would be false economy.	2006-10-22 18:19:08 +00:00
Robert Watson	aed5570872	Complete break-out of sys/sys/mac.h into sys/security/mac/mac_framework.h begun with a repo-copy of mac.h to mac_framework.h. sys/mac.h now contains the userspace and user<->kernel API and definitions, with all in-kernel interfaces moved to mac_framework.h, which is now included across most of the kernel instead. This change is the first step in a larger cleanup and sweep of MAC Framework interfaces in the kernel, and will not be MFC'd. Obtained from: TrustedBSD Project Sponsored by: SPARTA	2006-10-22 11:52:19 +00:00
Alan Cox	9af80719db	Replace PG_BUSY with VPO_BUSY. In other words, changes to the page's busy flag, i.e., VPO_BUSY, are now synchronized by the per-vm object lock instead of the global page queues lock.	2006-10-22 04:28:14 +00:00
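
Commit b39be1b35c replaces a year-by-year loop (which, as quoted, treats every fourth year as a leap year and therefore gets 2100 wrong) with O(1) conversions. The sketch below illustrates only the FAT-to-epoch direction and is not the FreeBSD code: it swaps in Howard Hinnant's days_from_civil algorithm for the constant-time day count, assumes the usual FAT packing (date word: 7-bit years since 1980, 4-bit month, 5-bit day; time word: 5-bit hour, 6-bit minute, 5-bit seconds/2), and treats the stamp as UTC. The names days_from_civil and fat_to_unix are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Days since 1970-01-01 for a Gregorian date, in O(1) with no loops.
 * Swapped-in technique (Hinnant's days_from_civil), not the kernel code. */
static int64_t days_from_civil(int64_t y, int m, int d)
{
    assert(m >= 1 && m <= 12 && d >= 1 && d <= 31); /* callers validate */
    y -= (m <= 2);                                /* Jan/Feb count as the end of the previous year */
    int64_t era = (y >= 0 ? y : y - 399) / 400;   /* 400-year Gregorian cycle */
    int64_t yoe = y - era * 400;                  /* year of era, 0..399 */
    int64_t mp  = (m + 9) % 12;                   /* March = 0 ... February = 11 */
    int64_t doy = (153 * mp + 2) / 5 + d - 1;     /* day of the March-based year */
    /* yoe/4 adds leap days, yoe/100 removes century ones (e.g. 2100);
     * the every-400-years rule is carried by era. */
    int64_t doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + doe - 719468;           /* 719468: 0000-03-01 .. 1970-01-01 */
}

/* Hypothetical helper: FAT date/time words -> seconds since the epoch (UTC).
 * Returns -1 when the month or day field is out of range (e.g. an unset stamp). */
static int64_t fat_to_unix(uint16_t fdate, uint16_t ftime)
{
    int year   = 1980 + (fdate >> 9);   /* bits 15-9: years since 1980 */
    int month  = (fdate >> 5) & 0x0f;   /* bits 8-5 */
    int day    = fdate & 0x1f;          /* bits 4-0 */
    int hour   = ftime >> 11;           /* bits 15-11 */
    int minute = (ftime >> 5) & 0x3f;   /* bits 10-5 */
    int sec    = (ftime & 0x1f) * 2;    /* bits 4-0, in 2-second units */

    if (month < 1 || month > 12 || day < 1)
        return -1;
    return days_from_civil(year, month, day) * 86400
        + hour * 3600 + minute * 60 + sec;
}
```

For example, fat_to_unix(10273, 0) decodes 2000-01-01 00:00:00 and yields 946684800, while days_from_civil steps from 2100-02-28 directly to 2100-03-01 because 2100 is not a leap year.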
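
Commit 3d068827c2 describes console strings being buffered in a 128-byte per-CPU buffer until end of line or until the buffer fills, then flushed to all console devices in one piece. A minimal single-threaded sketch of that flush policy follows, assuming a partial line simply stays buffered until a later newline or an explicit flush. The types cn_linebuf and cn_sink and the functions cn_flush and cn_puts_buffered are hypothetical; the console lock the commit adds is only marked in a comment, not implemented.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CN_BUFSZ 128  /* size from the commit: next power of 2 above 80 columns */

/* Hypothetical stand-in for "all console devices": collects flushed bytes. */
struct cn_sink {
    char   out[1024];
    size_t outlen;
    int    flushes;   /* number of buffer flushes performed */
};

/* Hypothetical per-CPU line buffer (the commit keeps one in the per-CPU structure). */
struct cn_linebuf {
    char   buf[CN_BUFSZ];
    size_t len;
};

/* Write the buffered bytes out in one piece and empty the buffer. In the
 * kernel a console lock would be held across this write so that lines from
 * different CPUs cannot interleave; this sketch omits the lock. */
static void cn_flush(struct cn_linebuf *lb, struct cn_sink *sk)
{
    if (lb->len == 0)
        return;
    size_t room = sizeof(sk->out) - sk->outlen;
    size_t n = lb->len < room ? lb->len : room;   /* truncate if the sink is full */
    memcpy(sk->out + sk->outlen, lb->buf, n);
    sk->outlen += n;
    sk->flushes++;
    lb->len = 0;
}

/* Append a string, flushing at each newline or whenever the buffer fills. */
static void cn_puts_buffered(struct cn_linebuf *lb, struct cn_sink *sk, const char *s)
{
    for (; *s != '\0'; s++) {
        assert(lb->len < CN_BUFSZ);   /* invariant: room for at least one byte */
        lb->buf[lb->len++] = *s;
        if (*s == '\n' || lb->len == CN_BUFSZ)
            cn_flush(lb, sk);
    }
}
```

Writing "hello " and then "world\n" produces a single flush of the whole line; a 200-character string without a newline flushes once at 128 bytes and leaves 72 bytes buffered.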
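
Commit 3e932ca715 notes that wrong sbytes values after a partial non-blocking sendfile(2) with headers/trailers led to restarts at wrong offsets. The sketch below shows the caller-side arithmetic that depends on sbytes being right. It assumes, following the commit's description of sbytes as "the total number of bytes written to the socket", that sbytes counts header bytes first, then file bytes, then trailer bytes, and that nbytes is an explicit nonzero length (the "send to end of file" case is not handled). struct sf_progress and sf_resume are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical record of what is left after a partial sendfile(2). */
struct sf_progress {
    int64_t hdr_left;   /* header bytes still to send */
    int64_t file_off;   /* file offset to pass on the next call */
    int64_t file_left;  /* file bytes still to send */
    int64_t trl_left;   /* trailer bytes still to send */
};

/* Split sbytes into header, file and trailer portions (assumed to be sent
 * in that order) and compute where the next call should resume. */
static struct sf_progress sf_resume(int64_t hdr_len, int64_t off, int64_t nbytes,
                                    int64_t trl_len, int64_t sbytes)
{
    assert(nbytes > 0 && sbytes >= 0 && sbytes <= hdr_len + nbytes + trl_len);

    int64_t h = sbytes < hdr_len ? sbytes : hdr_len;   /* header bytes consumed */
    sbytes -= h;
    int64_t f = sbytes < nbytes ? sbytes : nbytes;     /* file bytes consumed */
    sbytes -= f;                                       /* what remains is trailer */

    struct sf_progress p = {
        .hdr_left  = hdr_len - h,
        .file_off  = off + f,
        .file_left = nbytes - f,
        .trl_left  = trl_len - sbytes,
    };
    return p;
}
```

With a 100-byte header, sf_resume(100, 0, 1000, 10, 400) resumes the file at offset 300. If sbytes were miscounted by the header length, the resume offset would be off by the same amount, which is the kind of mixed-up file contents the commit describes.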


9627 Commits