freebsd

mirror of https://git.FreeBSD.org/src.git synced 2024-12-28 11:57:28 +00:00

Author	SHA1	Message	Date
John Baldwin	71361470b1	- Move syscall function argument structure types to be just above the relevenat system call function. - Whitespace fixes.	2009-06-24 13:35:38 +00:00
Robert Watson	36fecbf302	Add stack_print_short() and stack_print_short_ddb() interfaces to stack(9), which generate a more compact rendition of a stack trace via the kernel's printf. MFC after: 1 week	2009-06-24 12:06:15 +00:00
Jeff Roberson	50c202c592	Implement a facility for dynamic per-cpu variables. - Modules and kernel code alike may use DPCPU_DEFINE(), DPCPU_GET(), DPCPU_SET(), etc. akin to the statically defined PCPU_. Requires only one extra instruction more than PCPU_ and is virtually the same as __thread for builtin and much faster for shared objects. DPCPU variables can be initialized when defined. - Modules are supported by relocating the module's per-cpu linker set over space reserved in the kernel. Modules may fail to load if there is insufficient space available. - Track space available for modules with a one-off extent allocator. Free may block for memory to allocate space for an extent. Reviewed by: jhb, rwatson, kan, sam, grehan, marius, marcel, stas	2009-06-23 22:42:39 +00:00
Jeff Roberson	c76ee82799	- Use cpuset_t and the CPU_ macros in place of cpumask_t so that ULE supports arbitrary numbers of cpus rather than being limited by cpumask_t to the number of bits in a long.	2009-06-23 22:12:37 +00:00
Ed Schouten	9801591468	Improve my last commit: use a separate condvar to serialize. The advantage of using a separate condvar is that we can just use cv_signal(9) instead of cv_broadcast(9). It makes no sense to wake up multiple threads. It also makes the TTY code easier to understand. t_dcdwait sounds totally unrelated.	2009-06-23 21:43:02 +00:00
Ed Schouten	2d41cf3a24	Use dcdwait to block threads to serialize writes. I suspect the usage of bgwait causes a lot of spurious wakeups when threads are blocked in the background, because they will be woken up each time a write() call is performed. Also wakeup dcdwait when the TTY is abandoned.	2009-06-23 21:33:26 +00:00
Konstantin Belousov	3364c323e6	Implement global and per-uid accounting of the anonymous memory. Add rlimit RLIMIT_SWAP that limits the amount of swap that may be reserved for the uid. The accounting information (charge) is associated with either map entry, or vm object backing the entry, assuming the object is the first one in the shadow chain and entry does not require COW. Charge is moved from entry to object on allocation of the object, e.g. during the mmap, assuming the object is allocated, or on the first page fault on the entry. It moves back to the entry on forks due to COW setup. The per-entry granularity of accounting makes the charge process fair for processes that change uid during lifetime, and decrements charge for proper uid when region is unmapped. The interface of vm_pager_allocate(9) is extended by adding struct ucred *, that is used to charge appropriate uid when allocation if performed by kernel, e.g. md(4). Several syscalls, among them is fork(2), may now return ENOMEM when global or per-uid limits are enforced. In collaboration with: pho Reviewed by: alc Approved by: re (kensmith)	2009-06-23 20:45:22 +00:00
Jamie Gritton	b97457e2e6	Add a limit for child jails via the "children.cur" and "children.max" parameters. This replaces the simple "allow.jails" permission. Approved by: bz (mentor)	2009-06-23 20:35:51 +00:00
Bjoern A. Zeeb	5736e6fb9d	After cleaning up rt_tables from vnet.h and cleaning up opt_route.h a lot of files no longer need route.h either. Garbage collect them. While here remove now unneeded vnet.h #includes as well.	2009-06-23 17:03:45 +00:00
Jamie Gritton	399645e1b9	Remove unnecessary/redundant includes. Approved by: bz (mentor)	2009-06-23 14:39:21 +00:00
Peter Holm	401679debe	vn_open_cred() needs a non NULL ucred pointer Reviewed by: kib	2009-06-23 11:29:54 +00:00
Andre Oppermann	ef760e6ad2	Add soreceive_stream(), an optimized version of soreceive() for stream (TCP) sockets. It is functionally identical to generic soreceive() but has a number stream specific optimizations: o does only one sockbuf unlock/lock per receive independent of the length of data to be moved into the uio compared to soreceive() which unlocks/locks per mbuf. o uses m_mbuftouio() instead of its own copy(out) variant. o much more compact code flow as a large number of special cases is removed. o much improved reability. It offers significantly reduced CPU usage and lock contention when receiving fast TCP streams. Additional gains are obtained when the receiving application is using SO_RCVLOWAT to batch up some data before a read (and wakeup) is done. This function was written by "reverse engineering" and is not just a stripped down variant of soreceive(). It is not yet enabled by default on TCP sockets. Instead it is commented out in the protocol initialization in tcp_usrreq.c until more widespread testing has been done. Testers, especially with 10GigE gear, are welcome. MFP4: r164817 //depot/user/andre/soreceive_stream/	2009-06-22 23:08:05 +00:00
Andre Oppermann	bc05b2f6fa	Add m_mbuftouio() helper function to copy(out) an arbitrary long mbuf chain into an arbitrary large uio in a single step. It is a functional mirror image of m_uiotombuf(). This function is supposed to be used instead of hand rolled code with the same purpose and to concentrate it into one place for potential further optimization or hardware assistance.	2009-06-22 22:20:38 +00:00
Andre Oppermann	4ad1c464fe	In sbappendstream_locked() demote all incoming packet mbufs (and chains) to pure data mbufs using m_demote(). This removes the packet header and all m_tag information as they are not meaningful anymore on a stream socket where mbufs are linked through m->m_next. Strictly speaking a packet header can be only ever valid on the first mbuf in an m_next chain. sbcompress() was doing this already when the mbuf chain layout lent itself to it (e.g. header splitting or merge-append), just not consistently. This frees resources at socket buffer append time instead of at sbdrop_internal() time after data has been read from the socket. For MAC the per packet information has done its duty and during socket buffer appending the policy of the socket itself takes over. With the append the packet boundaries disappear naturally and with it any context that was based on it. None of the residual information from mbuf headers in the socket buffer on stream sockets was looked at.	2009-06-22 21:46:40 +00:00
John Baldwin	e47a833f38	Regen.	2009-06-22 20:24:03 +00:00
John Baldwin	773fa740c9	Include definitions for the audit identifiers for compat system calls in sysproto.h. This makes it possible to use SYSCALL_MODULE() for compat system calls that live in kernel modules.	2009-06-22 20:14:10 +00:00
John Baldwin	0b0fe06a5e	Fix a typo in a comment.	2009-06-22 20:12:40 +00:00
Andre Oppermann	0cb6db1d21	Update m_demote: - remove HT_HEADER test (MT_HEADER == MT_DATA for some time now) - be more pedantic about m_nextpkt in other than first mbuf - update m_flags to be retained	2009-06-22 19:35:39 +00:00
Konstantin Belousov	c808c9632d	Add explicit struct ucred * argument for VOP_VPTOCNP, to be used by vn_open_cred in default implementation. Valid struct ucred is needed for audit and MAC, and curthread credentials may be wrong. This further requires modifying the interface of vn_fullpath(9), but it is out of scope of this change. Reviewed by: rwatson	2009-06-21 19:21:01 +00:00
Konstantin Belousov	e0c161b89c	Add another flags argument to vn_open_cred. Use it to specify that some vn_open_cred invocations shall not audit namei path. In particular, specify VN_OPEN_NOAUDIT for dotdot lookup performed by default implementation of vop_vptocnp, and for the open done for core file. vn_fullpath is called from the audit code, and vn_open there need to disable audit to avoid infinite recursion. Core file is created on return to user mode, that, in particular, happens during syscall return. The creation of the core file is audited by direct calls, and we do not want to overwrite audit information for syscall. Reported, reviewed and tested by: rwatson	2009-06-21 13:41:32 +00:00
Roman Divacky	f7490cfe01	In non-debugging mode make this define (void)0 instead of nothing. This helps to catch bugs like the below with clang. if (cond); <--- note the trailing ; something(); Approved by: ed (mentor) Discussed on: current@	2009-06-21 07:54:47 +00:00
Brooks Davis	7f92e57820	Change crsetgroups_locked() (called by crsetgroups()) to sort the supplemental groups using insertion sort. Use this property in groupmember() to let us use a binary search instead of the previous linear search.	2009-06-20 20:29:21 +00:00
Ed Schouten	f8f6146082	Improve nested jail awareness of devfs by handling credentials. Now that we start to use credentials on character devices more often (because of MPSAFE TTY), move the prison-checks that are in place in the TTY code into devfs. Instead of strictly comparing the prisons, use the more common prison_check() function to compare credentials. This means that pseudo-terminals are only visible in devfs by processes within the same jail and parent jails. Even though regular users in parent jails can now interact with pseudo-terminals from child jails, this seems to be the right approach. These processes are also capable of interacting with the jailed processes anyway, through signals for example. Reviewed by: kib, rwatson (older version)	2009-06-20 14:50:32 +00:00
Kip Macy	5b204a113c	define helper routines for deferred mbuf initialization	2009-06-19 21:14:39 +00:00
Brooks Davis	838d985825	Rework the credential code to support larger values of NGROUPS and NGROUPS_MAX, eliminate ABI dependencies on them, and raise the to 1024 and 1023 respectively. (Previously they were equal, but under a close reading of POSIX, NGROUPS_MAX was defined to be too large by 1 since it is the number of supplemental groups, not total number of groups.) The bulk of the change consists of converting the struct ucred member cr_groups from a static array to a pointer. Do the equivalent in kinfo_proc. Introduce new interfaces crcopysafe() and crsetgroups() for duplicating a process credential before modifying it and for setting group lists respectively. Both interfaces take care for the details of allocating groups array. crsetgroups() takes care of truncating the group list to the current maximum (NGROUPS) if necessary. In the future, crsetgroups() may be responsible for insuring invariants such as sorting the supplemental groups to allow groupmember() to be implemented as a binary search. Because we can not change struct xucred without breaking application ABIs, we leave it alone and introduce a new XU_NGROUPS value which is always 16 and is to be used or NGRPS as appropriate for things such as NFS which need to use no more than 16 groups. When feasible, truncate the group list rather than generating an error. Minor changes: - Reduce the number of hand rolled versions of groupmember(). - Do not assign to both cr_gid and cr_groups[0]. - Modify ipfw to cache ucreds instead of part of their contents since they are immutable once referenced by more than one entity. Submitted by: Isilon Systems (initial implementation) X-MFC after: never PR: bin/113398 kern/133867	2009-06-19 17:10:35 +00:00
John Baldwin	afd9f91c63	Fix a deadlock in the getpeername() method for UNIX domain sockets. Instead of locking the local unp followed by the remote unp, use the same locking model as accept() and read lock the global link lock followed by the remote unp while fetching the remote sockaddr. Reported by: Mel Flynn mel.flynn of mailing.thruhere.net Reviewed by: rwatson MFC after: 1 week	2009-06-18 20:56:22 +00:00
Alan Cox	c45c00344f	Utilize the new function kmem_alloc_contig() to implement the UMA back-end allocator for the jumbo frames zones. This change has two benefits: (1) a custom back-end deallocator is no longer required. UMA's standard deallocator suffices. (2) It eliminates a potentially confusing artifact of using contigmalloc(): The malloc(9) statistics contain bogus information about the usage of jumbo frames. Specifically, the malloc(9) statistics report all jumbo frames in use whereas the UMA zone statistics report the "truth" about the number in use vs. the number free.	2009-06-18 17:59:04 +00:00
John Baldwin	4b7b144f63	Regen.	2009-06-17 19:53:47 +00:00
John Baldwin	21def99b51	- Add the ability to mix multiple flags seperated by pipe ('\|') characters in the type field of system call tables. Specifically, one can now use the 'NO' types as flags in addition to the 'COMPAT' types. For example, to tag 'COMPAT' system calls as living in a KLD via NOSTD. The COMPAT type is required to be listed first in this case. - Add new functions 'type()' and 'flag()' to the embedded awk script in makesyscalls.sh that return true if a requested flag is found in the type field ($3). The flag() function checks all of the flags in the field, but type() only checks the first flag. type() is meant to be used in the top-level "switch" statement and flag() should be used otherwise. - Retire the CPT_NOA type, it is now replaced with "COMPAT\|NOARGS" using the flags approach. - Tweak the comment descriptions of COMPAT[46] system calls so that they say "freebsd[46] foo" rather than "old foo". - Document the COMPAT6 type. - Sync comments in compat32 syscall table with the master table.	2009-06-17 19:50:38 +00:00
John Baldwin	0ec0b41cf6	Remove the now-unused NOIMPL flag. It serves no useful purpose given the existing UNIMPL and NOSTD types.	2009-06-17 18:46:14 +00:00
John Baldwin	f425839118	- NOSTD results in lkmressys being used instead of lkmssys. - Mark nfsclnt as UNIMPL. It should have been NOSTD instead of NOIMPL back when it lived in nfsclient.ko, but it was removed from that a long time ago.	2009-06-17 18:44:15 +00:00
Bjoern A. Zeeb	ebd8672cc3	Add explicit includes for jail.h to the files that need them and remove the "hidden" one from vimage.h.	2009-06-17 15:01:01 +00:00
Konstantin Belousov	f02c9d2858	Decrement state->ls_threads when vnode appeared to be doomed. Reported and tested by: pho	2009-06-17 12:43:04 +00:00
Attilio Rao	651175c9db	Introduce support for adaptive spinning in lockmgr. Actually, as it did receive few tuning, the support is disabled by default, but it can opt-in with the option ADAPTIVE_LOCKMGRS. Due to the nature of lockmgrs, adaptive spinning needs to be selectively enabled for any interested lockmgr. The support is bi-directional, or, in other ways, it will work in both cases if the lock is held in read or write way. In particular, the read path is passible of further tunning using the sysctls debug.lockmgr.retries and debug.lockmgr.loops . Ideally, such sysctls should be axed or compiled out before release. Addictionally note that adaptive spinning doesn't cope well with LK_SLEEPFAIL. The reason is that many (and probabilly all) consumers of LK_SLEEPFAIL are mainly interested in knowing if the interlock was dropped or not in order to reacquire it and re-test initial conditions. This directly interacts with adaptive spinning because lockmgr needs to drop the interlock while spinning in order to avoid a deadlock (further details in the comments inside the patch). Final note: finding someone willing to help on tuning this with relevant workloads would be either very important and appreciated. Tested by: jeff, pho Requested by: many	2009-06-17 01:55:42 +00:00
Konstantin Belousov	01ed174831	Do not use casts (int )0 and (struct thread )0 for the arguments of vn_rdwr, use NULL. Reviewed by: jhb MFC after: 1 week	2009-06-16 15:13:45 +00:00
Ed Schouten	eaaaf1906b	Perform some more cleanups to in-kernel session handling. The code that was in place in exit1() was mainly based on code from the old TTY layer. The main reason behind this, was because at one moment I ran a system that had two TTY layers in place at the same time. It is now sufficient to do the following: - Remove references from the session structure to the TTY vnode and the session leader. - If we have a controlling TTY and the session used by the TTY is equal to our session, send the SIGHUP. - If we have a vnode to the controlling TTY which has not been revoked, revoke it. While there, change sys/kern/tty.c to use s_ttyp in the comparison instead of s_ttyvp. It should not make any difference, because s_ttyvp can only become null when the session leader already left, but it's nicer to compare against the proper value.	2009-06-15 20:45:51 +00:00
John Baldwin	6653a58307	Regen.	2009-06-15 20:40:23 +00:00
John Baldwin	c4f16b69e1	Add a new 'void closefrom(int lowfd)' system call. When called, it closes any open file descriptors >= 'lowfd'. It is largely identical to the same function on other operating systems such as Solaris, DFly, NetBSD, and OpenBSD. One difference from other *BSD is that this closefrom() does not fail with any errors. In practice, while the manpages for NetBSD and OpenBSD claim that they return EINTR, they ignore internal errors from close() and never return EINTR. DFly does return EINTR, but for the common use case (closing fd's prior to execve()), the caller really wants all fd's closed and returning EINTR just forces callers to call closefrom() in a loop until it stops failing. Note that this implementation of closefrom(2) does not make any effort to resolve userland races with open(2) in other threads. As such, it is not multithread safe. Submitted by: rwatson (initial version) Reviewed by: rwatson MFC after: 2 weeks	2009-06-15 20:38:55 +00:00
Ed Schouten	9c373a81a3	Make tcsetsid(3) work on revoked TTYs. Right now the only way to make tcsetsid(3)/TIOCSCTTY work, is by ensuring the session leader is dead. This means that an application that catches SIGHUPs and performs a sleep prevents us from assigning a new session leader. Change the code to make it work on revoked TTYs as well. This allows us to change init(8) to make the shutdown script run in a more clean environment.	2009-06-15 19:17:52 +00:00
Jamie Gritton	9ed47d01eb	Get vnets from creds instead of threads where they're available, and from passed threads instead of curthread. Reviewed by: zec, julian Approved by: bz (mentor)	2009-06-15 19:01:53 +00:00
Jamie Gritton	679e13901c	Manage vnets via the jail system. If a jail is given the boolean parameter "vnet" when it is created, a new vnet instance will be created along with the jail. Networks interfaces can be moved between prisons with an ioctl similar to the one that moves them between vimages. For now vnets will co-exist under both jails and vimages, but soon struct vimage will be going away. Reviewed by: zec, julian Approved by: bz (mentor)	2009-06-15 18:59:29 +00:00
Jamie Gritton	c1f192193d	Rename the host-related prison fields to be the same as the host.* parameters they represent, and the variables they replaced, instead of abbreviated versions of them. Approved by: bz (mentor)	2009-06-13 15:39:12 +00:00
Bjoern A. Zeeb	aed2a06174	Remove the static from int hardlink_check_uid. There is an external use in the opensolaris code. I am not sure how this ever worked but I have seen two reports of: link_elf: symbol hardlink_check_uid undefined lately. Reported by: Scott Ullrich (sullrich gmail.com), pfsense Reported by: Mister Olli (mister.olli googlemail.com)	2009-06-13 13:09:20 +00:00
Jamie Gritton	7455b100af	Add counterparts to getcredhostname: getcreddomainname, getcredhostuuid, getcredhostid Suggested by: rmacklem Approved by: bz	2009-06-13 00:12:02 +00:00
Ed Schouten	13ace80b16	Revert my previous change, because it reintroduces an old regression. Because our rc scripts also open the /etc/ttyv* nodes, it revokes the console, preventing startup messages from being displayed. I really have to think about this. Maybe we should just give the console its own TTY and let it build on top of other TTYs. I'm still not sure what to do with input handling there.	2009-06-12 21:21:17 +00:00
Ed Schouten	4650ad4cc0	Prevent yet another staircase effect bug in the console device. Even though I thought I fixed the staircase issue (and I was no longer able to reproduce it), I got some reports of the issue still being there. It turns out the staircase effect still occurred when /dev/console was kept open while killing the getty on the same TTY (ttyv0). For some reason I can't figure out how the old TTY code dealt with that, so I assume the issue has always been there. I only exposed it more by merging consolectl with ttyv0, which means that the issue was present, even on systems without a serial console. I'm now marking the console device as being closed when closing the regular TTY device node. This means that when the getty shuts down, init(8) will open /dev/console, which means the termios attributes will always be reset in this case.	2009-06-12 20:29:55 +00:00
Paul Saab	a03fb243dc	Stop asserting on exclusive locks in fsync since it can now support shared vnode locking on ZFS. Reviewed by: jhb	2009-06-11 17:06:45 +00:00
Andriy Gapon	e76f11f441	strict kobj signatures: linker_if fixes in symtab_get method symtab parameter is made constant as this reflects actual intention and usage of the method Reviewed by: imp, current@ Approved by: jhb (mentor)	2009-06-11 17:05:45 +00:00
Konstantin Belousov	d8b0556c6d	Adapt vfs kqfilter to the shared vnode lock used by zfs write vop. Use vnode interlock to protect the knote fields [1]. The locking assumes that shared vnode lock is held, thus we get exclusive access to knote either by exclusive vnode lock protection, or by shared vnode lock + vnode interlock. Do not use kl_locked() method to assert either lock ownership or the fact that curthread does not own the lock. For shared locks, ownership is not recorded, e.g. VOP_ISLOCKED can return LK_SHARED for the shared lock not owned by curthread, causing false positives in kqueue subsystem assertions about knlist lock. Remove kl_locked method from knlist lock vector, and add two separate assertion methods kl_assert_locked and kl_assert_unlocked, that are supposed to use proper asserts. Change knlist_init accordingly. Add convenience function knlist_init_mtx to reduce number of arguments for typical knlist initialization. Submitted by: jhb [1] Noted by: jhb [2] Reviewed by: jhb Tested by: rnoland	2009-06-10 20:59:32 +00:00
Konstantin Belousov	5dd6aaba88	Do not leak the state->ls_lock after VI_DOOMED check introduced in the r192683. Reported by: pho Submitted by: jhb	2009-06-10 16:17:38 +00:00

1 2 3 4 5 ...

11229 Commits