freebsd

mirror of https://git.FreeBSD.org/src.git synced 2024-12-15 10:17:20 +00:00

Author	SHA1	Message	Date
Tor Egge	57051fdc4b	Close race between vmspace_exitfree() and exit1() and races between vmspace_exitfree() and vmspace_free() which could result in the same vmspace being freed twice. Factor out part of exit1() into new function vmspace_exit(). Attach to vmspace0 to allow old vmspace to be freed earlier. Add new function, vmspace_acquire_ref(), for obtaining a vmspace reference for a vmspace belonging to another process. Avoid changing vmspace refcount from 0 to 1 since that could also lead to the same vmspace being freed twice. Change vmtotal() and swapout_procs() to use vmspace_acquire_ref(). Reviewed by: alc	2006-05-29 21:28:56 +00:00
Christian S.J. Peron	d5e5634075	Kill the last Giant acquisition in the exit(2) code. This Giant acquisition doesn't appear to be protecting anything. Most of consumers funsetownlst(9) do not appear to be picking up Giant anywhere. This was originally a part of my Giant exit(2) clean up revision 1.272 but I thought it was a good idea to leave it out until we were able to analyze it better. Tested by: kris MFC after: 3 weeks	2006-04-10 14:07:28 +00:00
Peter Wemm	b9eee07e36	Remove the unused sva and eva arguments from pmap_remove_pages().	2006-04-03 21:16:10 +00:00
David Xu	e170bfda56	1. Count last time slice, this intends to fix "calcru: runtime went backwards" bug for threaded process. 2. Add comment about possible logical problem with scheduler. MFC after: 3 days	2006-03-14 04:00:21 +00:00
John Baldwin	06ad42b2f7	Close some races between procfs/ptrace and exit(2): - Reorder the events in exit(2) slightly so that we trigger the S_EXIT stop event earlier. After we have signalled that, we set P_WEXIT and then wait for any processes with a hold on the vmspace via PHOLD to release it. PHOLD now KASSERT()'s that P_WEXIT is clear when it is invoked, and PRELE now does a wakeup if P_WEXIT is set and p_lock drops to zero. - Change proc_rwmem() to require that the processing read from has its vmspace held via PHOLD by the caller and get rid of all the junk to screw around with the vmspace reference count as we no longer need it. - In ptrace() and pseudofs(), treat a process with P_WEXIT set as if it doesn't exist. - Only do one PHOLD in kern_ptrace() now, and do it earlier so it covers FIX_SSTEP() (since on alpha at least this can end up calling proc_rwmem() to clear an earlier single-step simualted via a breakpoint). We only do one to avoid races. Also, by making the EINVAL error for unknown requests be part of the default: case in the switch, the various switch cases can now just break out to return which removes a _lot_ of duplicated PRELE and proc unlocks, etc. Also, it fixes at least one bug where a LWP ptrace command could return EINVAL with the proc lock still held. - Changed the locking for ptrace_single_step(), ptrace_set_pc(), and ptrace_clear_single_step() to always be called with the proc lock held (it was a mixed bag previously). Alpha and arm have to drop the lock while the mess around with breakpoints, but other archs avoid extra lock release/acquires in ptrace(). I did have to fix a couple of other consumers in kern_kse and a few other places to hold the proc lock and PHOLD. Tested by: ps (1 mostly, but some bits of 2-4 as well) MFC after: 1 week	2006-02-22 18:57:50 +00:00
John Baldwin	bd106be404	Move the ruadd() in kern_exit() to save our final stats in our child stats even further down in exit1() so that it includes the runtime and tick counts from the final time slice for the dying thread. Reviewed by: phk	2006-02-21 21:48:42 +00:00
Poul-Henning Kamp	e8444a7e6f	CPU time accounting speedup (step 2) Keep accounting time (in per-cpu) cputicks and the statistics counts in the thread and summarize into struct proc when at context switch. Don't reach across CPUs in calcru(). Add code to calibrate the top speed of cpu_tickrate() for variable cpu_tick hardware (like TSC on power managed machines). Don't enforce monotonicity (at least for now) in calcru. While the calibrated cpu_tickrate ramps up it may not be true. Use 27MHz counter on i386/Geode. Use TSC on amd64 & i386 if present. Use tick counter on sparc64	2006-02-11 09:33:07 +00:00
Poul-Henning Kamp	5b1a8eb397	Modify the way we account for CPU time spent (step 1) Keep track of time spent by the cpu in various contexts in units of "cputicks" and scale to real-world microsec^H^H^H^H^H^H^H^Hclock_t only when somebody wants to inspect the numbers. For now "cputicks" are still derived from the current timecounter and therefore things should by definition remain sensible also on SMP machines. (The main reason for this first milestone commit is to verify that hypothesis.) On slower machines, the avoided multiplications to normalize timestams at every context switch, comes out as a 5-7% better score on the unixbench/context1 microbenchmark. On more modern hardware no change in performance is seen.	2006-02-07 21:22:02 +00:00
John Baldwin	934ba9b2cf	- Move the wakeup() for exiting kthreads out of exit1() and into kthread_exit() as that is cleaner and less obscured. It also does the wakeup sooner. - Add some comments to kthread_exit().	2006-02-06 21:56:13 +00:00
Wayne Salamon	de3007e8f3	Audit the pid being requested in wait4(). Obtained from: TrustedBSD Project Approved by: rwatson (mentor)	2006-02-06 00:19:09 +00:00
Robert Watson	00c28d9678	On process exit, audit the return value of the process, and commit the record immediately, as this system call never returns. Obtained from: TrustedBSD Project	2006-02-05 21:08:25 +00:00
John Baldwin	37f84a6018	Add a comment.	2006-02-03 21:09:40 +00:00
Robert Watson	fcf7f27a36	Hook up audit to fork() and exit() events. These changes manage the audit state on processes, not auditing of these events. Much work by: wsalamon Obtained from: TrustedBSD Project	2006-02-02 01:32:58 +00:00
Stephan Uphoff	03001f59c8	Hopefully fix the "calcru: runtime went backwards from ..." problem by keeping the resource values locked (where needed) while we use them for calculations. MFC after: 3 days	2006-01-23 19:15:13 +00:00
Poul-Henning Kamp	25f6e35a05	Regenerate sysent with new abort2 system call. Implement abort2(const char reason, int narg, void *args); Submitted by: "Wojciech A. Koszek" <dunstan@freebsd.czest.pl>	2005-12-23 11:58:42 +00:00
David Xu	d26b1a1fb9	Register itimers_event_hook as a kernel event handler, so I don't have to duplicate code to call it in exec() and exit1().	2005-12-09 05:43:26 +00:00
Robert Watson	2c255e9df6	Moderate rewrite of kernel ktrace code to attempt to generally improve reliability when tracing fast-moving processes or writing traces to slow file systems by avoiding unbounded queueuing and dropped records. Record loss was previously possible when the global pool of records become depleted as a result of record generation outstripping record commit, which occurred quickly in many common situations. These changes partially restore the 4.x model of committing ktrace records at the point of trace generation (synchronous), but maintain the 5.x deferred record commit behavior (asynchronous) for situations where entering VFS and sleeping is not possible (i.e., in the scheduler). Records are now queued per-process as opposed to globally, with processes responsible for committing records from their own context as required. - Eliminate the ktrace worker thread and global record queue, as they are no longer used. Keep the global free record list, as records are still used. - Add a per-process record queue, which will hold any asynchronously generated records, such as from context switches. This replaces the global queue as the place to submit asynchronous records to. - When a record is committed asynchronously, simply queue it to the process. - When a record is committed synchronously, first drain any pending per-process records in order to maintain ordering as best we can. Currently ordering between competing threads is provided via a global ktrace_sx, but a per-process flag or lock may be desirable in the future. - When a process returns to user space following a system call, trap, signal delivery, etc, flush any pending records. - When a process exits, flush any pending records. - Assert on process tear-down that there are no pending records. - Slightly abstract the notion of being "in ktrace", which is used to prevent the recursive generation of records, as well as generating traces for ktrace events. Future work here might look at changing the set of events marked for synchronous and asynchronous record generation, re-balancing queue depth, timeliness of commit to disk, and so on. I.e., performing a drain every (n) records. MFC after: 1 month Discussed with: jhb Requested by: Marc Olzheim <marcolz at stack dot nl>	2005-11-13 13:27:44 +00:00
Christian S.J. Peron	576068804d	Giant clean up for exit(2) -Change unconditional aquisition of Giant to only pickup Giant if the vnode for the controlling tty resides on a non-mpsafe file system. -Pickup Giant around executable vnode reference counting operations only if the executable resides on a non-mpsafe file system. -If this process is being traced, pickup Giant for trace file reference count operations only if it resides on a non-mpsafe file system. Discussed with: jhb Tested by: kris	2005-11-08 17:11:03 +00:00
David Xu	ebceaf6dc7	Add support for queueing SIGCHLD same as other UNIX systems did. For each child process whose status has been changed, a SIGCHLD instance is queued, if the signal is stilling pending, and process changed status several times, signal information is updated to reflect latest process status. If wait() returns because the status of a child process is available, pending SIGCHLD signal associated with the child process is discarded. Any other pending SIGCHLD signals remain pending. The signal information is allocated at the same time when proc structure is allocated, if process signal queue is fully filled or there is a memory shortage, it can still send the signal to process. There is a booting time tunable kern.sigqueue.queue_sigchild which can control the behavior, setting it to zero disables the SIGCHLD queueing feature, the tunable will be removed if the function is proved that it is stable enough. Tested on: i386 (SMP and UP)	2005-11-08 09:09:26 +00:00
John Baldwin	68a17869c1	Push down Giant into fdfree() and remove it from two of the callers. Other callers such as some rfork() cases weren't locking Giant anyway. Reviewed by: csjp MFC after: 1 week	2005-11-01 17:13:05 +00:00
Gleb Smirnoff	c0bc2867c1	- Fix leak of struct nlminfo on process exit. - Fix malloc type collision, that made the above problem difficult to understand. Reported by: Vladimir Sharun <sharun ukr.net>	2005-10-26 07:18:37 +00:00
David Xu	60354683d9	Make p_itimers as a pointer, so file sys/proc.h does not need to include sys/timers.h.	2005-10-23 12:19:08 +00:00
David Xu	86857b368d	Implement POSIX timers. Current only CLOCK_REALTIME and CLOCK_MONOTONIC clock are supported. I have plan to merge XSI timer ITIMER_REAL and other two CPU timers into the new code, current three slots are available for the XSI timers. The SIGEV_THREAD notification type is not supported yet because our sigevent struct lacks of two member fields: sigev_notify_function sigev_notify_attributes I have found the sigevent is used in AIO, so I won't add the two members unless the AIO code is adjusted.	2005-10-23 04:22:56 +00:00
David Xu	9104847f21	1. Change prototype of trapsignal and sendsig to use ksiginfo_t *, most changes in MD code are trivial, before this change, trapsignal and sendsig use discrete parameters, now they uses member fields of ksiginfo_t structure. For sendsig, this change allows us to pass POSIX realtime signal value to user code. 2. Remove cpu_thread_siginfo, it is no longer needed because we now always generate ksiginfo_t data and feed it to libpthread. 3. Add p_sigqueue to proc structure to hold shared signals which were blocked by all threads in the proc. 4. Add td_sigqueue to thread structure to hold all signals delivered to thread. 5. i386 and amd64 now return POSIX standard si_code, other arches will be fixed. 6. In this sigqueue implementation, pending signal set is kept as before, an extra siginfo list holds additional siginfo_t data for signals. kernel code uses psignal() still behavior as before, it won't be failed even under memory pressure, only exception is when deleting a signal, we should call sigqueue_delete to remove signal from sigqueue but not SIGDELSET. Current there is no kernel code will deliver a signal with additional data, so kernel should be as stable as before, a ksiginfo can carry more information, for example, allow signal to be delivered but throw away siginfo data if memory is not enough. SIGKILL and SIGSTOP have fast path in sigqueue_add, because they can not be caught or masked. The sigqueue() syscall allows user code to queue a signal to target process, if resource is unavailable, EAGAIN will be returned as specification said. Just before thread exits, signal queue memory will be freed by sigqueue_flush. Current, all signals are allowed to be queued, not only realtime signals. Earlier patch reviewed by: jhb, deischen Tested on: i386, amd64	2005-10-14 12:43:47 +00:00
John Baldwin	83de502d59	Add witness warnings to panic if a thread tries to exit while holding any locks. Requested by: jeff MFC after: 3 days	2005-09-02 20:20:01 +00:00
John Baldwin	ddf9c4f771	- Slightly reorder the events around the setting of PRS_ZOMBIE to be less hokie and much more readable and expand the comment to explain why it is the way that it is. - Close a race where one CPU could free the process belonging to a thread on another CPU that hasn't quite finished exiting yet but is beyond the point of setting the process state as PRS_ZOMBIE. Reported and tested by: ps (2) MFC after: 3 days	2005-07-18 20:08:14 +00:00
Stephan Uphoff	d13ec71369	Use low level constructs borrowed from interrupt threads to wait for work in proc0. Remove the TDP_WAKEPROC0 workaround.	2005-05-23 23:01:53 +00:00
David Xu	5a2f73e624	Only check signal event, single threading event shouldn't be reported.	2005-05-05 06:42:02 +00:00
David Xu	bc247e78c0	Wake up swapper process if needed. PR: kern/78474 Submitted by: Sam Lawrance <boris at brooknet dot com dot au>	2005-04-23 05:06:44 +00:00
David Xu	902c0d8297	Clear P_STATCHILD earlier to avoid unnecessary retrying.	2005-04-19 12:31:15 +00:00
David Xu	95992d56f5	Fix a race condition between kern_wait() and thread_stopped(). Problem is in kern_wait(), parent process steps through children list, once a child process is skipped, and later even if the child is stopped, parent process still sleeps in msleep(), the race happens if parent masked SIGCHLD. Submitted by : Peter Edwards peadar.edwards at gmail dot com MFC after : 4 days	2005-04-19 08:07:28 +00:00
Robert Watson	babe9a2bb3	Introduce p_canwait() and MAC Framework and MAC Policy entry points mac_check_proc_wait(), which control the ability to wait4() specific processes. This permits MAC policies to limit information flow from children that have changed label, although has to be handled carefully due to common programming expectations regarding the behavior of wait4(). The cr_seeotheruids() check in p_canwait() is #if 0'd for this reason. The mac_stub and mac_test policies are updated to reflect these new entry points. Sponsored by: SPAWAR, SPARTA Obtained from: TrustedBSD Project	2005-04-18 13:36:57 +00:00
Jeff Roberson	2b3183a8b7	- A lock is required before calling VOP_REVOKE. Our reference protects us from accessing another vnode so a naked VOP_LOCK is sufficient. Sponsored by: Isilon Systems, Inc.	2005-03-13 11:47:04 +00:00
Poul-Henning Kamp	a258707313	In 1.276 of kern/subr_trap.c I introduced a mechanism for delaying a process return to userspace if it had pending GEOM events. We need to have the same check in the exit pass to catch the case where a GEOM related filedescriptor is not explicitly closed by the process. Bumped into by: people using dd(1) to build releases, nanobsd etc.	2005-01-29 14:03:41 +00:00
Robert Watson	fd544ee8f7	In kern_wait(), let the compiler copy the rusage structure rather than an explicit bcopy() -- it probably does a better job.	2005-01-08 04:17:48 +00:00
Warner Losh	9454b2d864	/* -> /*- for copyright notices, minor format tweaks as necessary	2005-01-06 23:35:40 +00:00
John Baldwin	c88379381b	- Move the function prototypes for kern_setrlimit() and kern_wait() to sys/syscallsubr.h where all the other kern_foo() prototypes live. - Resort kern_execve() while I'm there.	2005-01-05 22:19:44 +00:00
David Schultz	6004362e66	Don't include sys/user.h merely for its side-effect of recursively including other headers.	2004-11-27 06:51:39 +00:00
David Xu	c283653201	Remove P_STOPPED_TRACE bit if debugger dies without a chance to detach debugged process.	2004-10-23 11:20:26 +00:00
John Baldwin	78c85e8dfc	Rework how we store process times in the kernel such that we always store the raw values including for child process statistics and only compute the system and user timevals on demand. - Fix the various kern_wait() syscall wrappers to only pass in a rusage pointer if they are going to use the result. - Add a kern_getrusage() function for the ABI syscalls to use so that they don't have to play stackgap games to call getrusage(). - Fix the svr4_sys_times() syscall to just call calcru() to calculate the times it needs rather than calling getrusage() twice with associated stackgap, etc. - Add a new rusage_ext structure to store raw time stats such as tick counts for user, system, and interrupt time as well as a bintime of the total runtime. A new p_rux field in struct proc replaces the same inline fields from struct proc (i.e. p_[isu]ticks, p_[isu]u, and p_runtime). A new p_crux field in struct proc contains the "raw" child time usage statistics. ruadd() has been changed to handle adding the associated rusage_ext structures as well as the values in rusage. Effectively, the values in rusage_ext replace the ru_utime and ru_stime values in struct rusage. These two fields in struct rusage are no longer used in the kernel. - calcru() has been split into a static worker function calcru1() that calculates appropriate timevals for user and system time as well as updating the rux_[isu]u fields of a passed in rusage_ext structure. calcru() uses a copy of the process' p_rux structure to compute the timevals after updating the runtime appropriately if any of the threads in that process are currently executing. It also now only locks sched_lock internally while doing the rux_runtime fixup. calcru() now only requires the caller to hold the proc lock and calcru1() only requires the proc lock internally. calcru() also no longer allows callers to ask for an interrupt timeval since none of them actually did. - calcru() now correctly handles threads executing on other CPUs. - A new calccru() function computes the child system and user timevals by calling calcru1() on p_crux. Note that this means that any code that wants child times must now call this function rather than reading from p_cru directly. This function also requires the proc lock. - This finishes the locking for rusage and friends so some of the Giant locks in exit1() and kern_wait() are now gone. - The locking in ttyinfo() has been tweaked so that a shared lock of the proctree lock is used to protect the process group rather than the process group lock. By holding this lock until the end of the function we now ensure that the process/thread that we pick to dump info about will no longer vanish while we are trying to output its info to the console. Submitted by: bde (mostly) MFC after: 1 month	2004-10-05 18:51:11 +00:00
John Baldwin	a9a64385e7	Some more whitespace, style, and comment fixes. Submitted by: bde (mostly)	2004-09-24 20:27:04 +00:00
John Baldwin	6111dcd2ef	A modest collection of various and sundry style, spelling, and whitespace fixes. Submitted by: bde (mostly)	2004-09-24 00:38:15 +00:00
John Baldwin	7eaec467d8	Various small style fixes.	2004-09-22 15:24:33 +00:00
Julian Elischer	ed062c8d66	Refactor a bunch of scheduler code to give basically the same behaviour but with slightly cleaned up interfaces. The KSE structure has become the same as the "per thread scheduler private data" structure. In order to not make the diffs too great one is #defined as the other at this time. The KSE (or td_sched) structure is now allocated per thread and has no allocation code of its own. Concurrency for a KSEGRP is now kept track of via a simple pair of counters rather than using KSE structures as tokens. Since the KSE structure is different in each scheduler, kern_switch.c is now included at the end of each scheduler. Nothing outside the scheduler knows the contents of the KSE (aka td_sched) structure. The fields in the ksegrp structure that are to do with the scheduler's queueing mechanisms are now moved to the kg_sched structure. (per ksegrp scheduler private data structure). In other words how the scheduler queues and keeps track of threads is no-one's business except the scheduler's. This should allow people to write experimental schedulers with completely different internal structuring. A scheduler call sched_set_concurrency(kg, N) has been added that notifies teh scheduler that no more than N threads from that ksegrp should be allowed to be on concurrently scheduled. This is also used to enforce 'fainess' at this time so that a ksegrp with 10000 threads can not swamp a the run queue and force out a process with 1 thread, since the current code will not set the concurrency above NCPU, and both schedulers will not allow more than that many onto the system run queue at a time. Each scheduler should eventualy develop their own methods to do this now that they are effectively separated. Rejig libthr's kernel interface to follow the same code paths as linkse for scope system threads. This has slightly hurt libthr's performance but I will work to recover as much of it as I can. Thread exit code has been cleaned up greatly. exit and exec code now transitions a process back to 'standard non-threaded mode' before taking the next step. Reviewed by: scottl, peter MFC after: 1 week	2004-09-05 02:09:54 +00:00
John-Mark Gurney	ad3b9257c2	Add locking to the kqueue subsystem. This also makes the kqueue subsystem a more complete subsystem, and removes the knowlege of how things are implemented from the drivers. Include locking around filter ops, so a module like aio will know when not to be unloaded if there are outstanding knotes using it's filter ops. Currently, it uses the MTX_DUPOK even though it is not always safe to aquire duplicate locks. Witness currently doesn't support the ability to discover if a dup lock is ok (in some cases). Reviewed by: green, rwatson (both earlier versions)	2004-08-15 06:24:42 +00:00
Alan Cox	9be60284a6	Giant is no longer required by vm_waitproc() and vmspace_exitfree(). Eliminate it acquisition and release around vm_waitproc() in kern_wait().	2004-07-30 20:31:02 +00:00
Alan Cox	1a276a3f91	- Use atomic ops for updating the vmspace's refcnt and exitingcnt. - Push down Giant into shmexit(). (Giant is acquired only if the vmspace contains shm segments.) - Eliminate the acquisition of Giant from proc_rwmem(). - Reduce the scope of Giant in exit1(), uncovering the destruction of the address space.	2004-07-27 03:53:41 +00:00
Julian Elischer	55d44f79ea	When calling scheduler entrypoints for creating new threads and processes, specify "us" as the thread not the process/ksegrp/kse. You can always find the others from the thread but the converse is not true. Theorotically this would lead to runtime being allocated to the wrong entity in some cases though it is not clear how often this actually happenned. (would only affect threaded processes and would probably be pretty benign, but it WAS a bug..) Reviewed by: peter	2004-07-18 23:36:13 +00:00
David Xu	cbf4e354ec	Add code to support debugging threaded process. 1. Add tm_lwpid into kse_thr_mailbox to indicate which kernel thread current user thread is running on. Add tm_dflags into kse_thr_mailbox, the flags is written by debugger, it tells UTS and kernel what should be done when the process is being debugged, current, there two flags TMDF_SSTEP and TMDF_DONOTRUNUSER. TMDF_SSTEP is used to tell kernel to turn on single stepping, or turn off if it is not set. TMDF_DONOTRUNUSER is used to tell kernel to schedule upcall whenever possible, to UTS, it means do not run the user thread until debugger clears it, this behaviour is necessary because gdb wants to resume only one thread when the thread's pc is at a breakpoint, and thread needs to go forward, in order to avoid other threads sneak pass the breakpoints, it needs to remove breakpoint, only wants one thread to go. Also, add km_lwp to kse_mailbox, the lwp id is copied to kse_thr_mailbox at context switch time when process is not being debugged, so when process is attached, debugger can map kernel thread to user thread. 2. Add p_xthread to proc strcuture and td_xsig to thread structure. p_xthread is used by a thread when it wants to report event to debugger, every thread can set the pointer, especially, when it is used in ptracestop, it is the last thread reporting event will win the race. Every thread has a td_xsig to exchange signal with debugger, thread uses TDF_XSIG flag to indicate it is reporting signal to debugger, if the flag is not cleared, thread will keep retrying until it is cleared by debugger, p_xthread may be used by debugger to indicate CURRENT thread. The p_xstat is still in proc structure to keep wait() to work, in future, we may just use td_xsig. 3. Add TDF_DBSUSPEND flag, the flag is used by debugger to suspend a thread. When process stops, debugger can set the flag for thread, thread will check the flag in thread_suspend_check, enters a loop, unless it is cleared by debugger, process is detached or process is existing. The flag is also checked in ptracestop, so debugger can temporarily suspend a thread even if the thread wants to exchange signal. 4. Current, in ptrace, we always resume all threads, but if a thread has already a TDF_DBSUSPEND flag set by debugger, it won't run. Encouraged by: marcel, julian, deischen	2004-07-13 07:20:10 +00:00
Alan Cox	ce8da3091f	Push down the acquisition and release of the page queues lock into pmap_remove_pages(). (The implementation of pmap_remove_pages() is optional. If pmap_remove_pages() is unimplemented, the acquisition and release of the page queues lock is unnecessary.) Remove spl calls from the alpha, arm, and ia64 pmap_remove_pages().	2004-07-13 02:49:22 +00:00

1 2 3 4 5 ...

286 Commits