Remove some uses of PHOLD which were there only to prevent the process'
threads from being swapped out.
Tested by: pho
Reviewed by: imp, kib
Differential Revision: https://reviews.freebsd.org/D46118
- Modify PHOLD() to no longer fault in the process.
- Remove _PHOLD_LITE(), which is now the same as _PHOLD(), fix up
consumers.
- Remove faultin() and its callees.
Tested by: pho
Reviewed by: imp, kib
Differential Revision: https://reviews.freebsd.org/D46114
Add sys/errno.h, sys/malloc.h, sys/queue.h, and vm/uma.h as needed.
sys/sysproto.h currently includes sys/acl.h which currently includes
sys/param.h, sys/queue.h, and vm/uma.h which in turn bring in
sys/errno.h sys/malloc.h.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D44465
An initial reading of the preamble of sys_procctl() gives the impression
that no test prevents a malicious user from passing a negative commands
index (in 'uap->com'), which is soon used as an index into the static
array procctl_cmds_info[].
However, a closer examination leads to the conclusion that the existing
code is technically correct. Indeed, the comparison of 'uap->com' to
the nitems() expression, which expands to a ratio of sizeof(), leads to
a conversion of 'uap->com' to an 'unsigned int' as per Usual Arithmetic
Conversions/Integer Promotions applied by '<=', because sizeof() returns
'size_t' values, and we define 'size_t' as an equivalent of 'unsigned
int' (which is not mandated by the standard, the latter allowing, e.g.,
integers of lower ranks).
With this conversion, negative values of 'uap->com' are automatically
ruled-out since they are converted to very big unsigned integers which
are caught by the test. An analysis of assembly code produced by LLVM
16 on amd64 and practical tests confirm that no exploitation is possible.
However, the guard code as written is misleading to readers and might
trip up static analysis tools. Make sure that negative values are
explicitly excluded so that it is immediately clear that EINVAL will be
returned in this case.
Build tested with clang 16 and GCC 12.
Approved by: markj (mentor)
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Report the delivery of signals to processes other than self while
Capsicum violation tracing with CAPFAIL_SIGNAL.
Reviewed by: markj
Approved by: markj (mentor)
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D40679
Apply the following automated changes to try to eliminate
no-longer-needed sys/cdefs.h includes as well as now-empty
blank lines in a row.
Remove /^#if.*\n#endif.*\n#include\s+<sys/cdefs.h>.*\n/
Remove /\n+#include\s+<sys/cdefs.h>.*\n+#if.*\n#endif.*\n+/
Remove /\n+#if.*\n#endif.*\n+/
Remove /^#if.*\n#endif.*\n/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/types.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/param.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/capsicum.h>/
Sponsored by: Netflix
Nothing prevents the signalled process from exiting, and then other
process among eligible for signalling to reuse the exited process pid.
In this case, presence of the pid in the 'pids' unr set prevents it from
getting the deserved signal.
Handle it by marking each process with the new flag P2_REAPKILLED when
we are about to send the signal. If the process pid is present in the
pids unr, but the struct proc is not marked with P2_REAPKILLED, we must
send signal to the pid again.
The use of the flag relies on the global sapblk preventing parallel
reapkills.
The pids unr must be used to clear the flags to all signalled processes.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D40089
For a process supervisor using the reaper API to track process subtrees,
it is very useful to know the state of the processes on the list.
Sponsored by: https://www.patreon.com/valpackett
Reviewed by: kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D39585
It does not serve any purpose after we stopped doing
thread_single(SINGLE_ALLPROC) from stoppable user processes.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D36207
There is a problem still left after the fixes to REAP_KILL_PROC. The
handling of the stopping signals by sig_suspend_threads() can occur
outside the stopping process context by tdsendsignal(), and it uses
mostly the same mechanism of aborting sleeps as suspension. In other
words, it badly interacts with thread_single(SINGLE_ALLPROC).
But unlike single threading from the process context, we cannot wait by
sleep for other single threading requests to pass, because we own
spinlock(s).
Fix this by moving both the thread_single(p2, SINGLE_ALLPROC), and the
signalling, to the threaded taskqueue which cannot be single-threaded
itself.
Reported and tested by: pho
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D36207
This is racy because curproc process lock is not used, but allows the
process to exit faster. It is userspace issue to create such race
anyway, and not fullfilling the guarantee that all reaper descendants
are signalled should be fine.
In collaboration with: pho
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D35310
We drop proctree_lock, which allows the process to exit while memoized
in the list to proceed.
Reported and tested by: pho
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D35310
Recorded reaper might loose its reaper status, so we should not assert
it, but check and avoid signalling if this happens.
Reported and tested by: pho
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 2 week
Differential revision: https://reviews.freebsd.org/D35310
The failure means that the process does single-threading itself, which
makes our action not needed.
Reported and tested by: pho
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D35310
If we try to single-thread a process which thread entered
procctl(REAP_KILL_SUBTREE), and sleeping waiting for us unlocking
stop_all_proc_blocker, we must be able to finish single-threading. This
requires the sleep to be interruptible.
Reported and tested by: pho
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D35310
Since both self single-threading and remote single-threading rely on
suspending the thread doing thread_single(), it cannot be mixed: thread
doing thread_suspend_switch() might be subject to thread_suspend_one()
and vice versa.
In collaboration with: pho
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D35310
before the process itself does thread_single(SINGLE_EXIT). We cannot
single-thread such process in ALLPROC (external) mode, and properly
detect and report the failure to do so due to the process becoming
zombie is easier to prevent than handle.
In collaboration with: pho
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D35310
where we might not yet see a new child when signalling a process.
Ensure that this cannot happen by stopping all reapping subtree,
which ensures that the child is not inside a syscall, in particular
fork(2).
Reported and tested by: pho
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D35014
by repeating iteration over the subtree until there are no new processes
to signal.
Reported and tested by: pho
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D35014
stop_allo_proc_block() must be taken before proctree_lock.
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D35014
When iterating over the process group members, skip zombies same as it
is done by pfind() for single-process operation.
Reviewed by: emaste, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D32513
for state control over TRACE, TRAPCAP, ASLR, PROTMAX, STACKGAP,
NO_NEWPRIVS, and WXMAP.
Reported by: emaste
Reviewed by: emaste, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D32513
and remove zeroing of it from specific functions. This way it is
guaranteed that we do not leak kernel data.
Suggested by: markj
Reviewed by: emaste, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D32513
It allows to override kern.elf{32,64}.allow_wx on per-process basis.
In particular, it makes it possible to run binaries without PT_GNU_STACK
and without elfctl note while allow_wx = 0.
Reviewed by: brooks, emaste, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D31779
This introduces a new, per-process flag, "NO_NEW_PRIVS", which
is inherited, preserved on exec, and cannot be cleared. The flag,
when set, makes subsequent execs ignore any SUID and SGID bits,
instead executing those binaries as if they not set.
The main purpose of the flag is implementation of Linux
PROC_SET_NO_NEW_PRIVS prctl(2), and possibly also unpriviledged
chroot.
Reviewed By: kib
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D30939
Previously procctl(PROC_PROTMAX_STATUS, ... used the PROC_ASLR_NOFORCE
macro for the "system-wide configured policy" status, instead of
PROC_PROTMAX_NOFORCE.
They both have a value of 3, so no functional change.
Sponsored by: The FreeBSD Foundation
It allows a process to request that stack gap was not applied to its
stacks, retroactively. Also it is possible to control the gaps in the
process after exec.
PR: 239894
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D21352
feature bit.
In particular, allocate the bit to opt-out the image from implicit
PROTMAX enablement. Provide procctl(2) verbs to set and query
implicit PROTMAX handling. The knobs mimic the same per-image flag
and per-process controls for ASLR.
Reviewed by: emaste, markj (previous version)
Discussed with: brooks
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D20795
Add the infrastructure to allow MD procctl(2) commands, and use it to
introduce amd64 PTI control and reporting. PTI mode cannot be
modified for existing pmap, the knob controls PTI of the new vmspace
created on exec.
Requested by: jhb
Reviewed by: jhb, markj (previous version)
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D19514
With this change, randomization can be enabled for all non-fixed
mappings. It means that the base address for the mapping is selected
with a guaranteed amount of entropy (bits). If the mapping was
requested to be superpage aligned, the randomization honours the
superpage attributes.
Although the value of ASLR is diminshing over time as exploit authors
work out simple ASLR bypass techniques, it elimintates the trivial
exploitation of certain vulnerabilities, at least in theory. This
implementation is relatively small and happens at the correct
architectural level. Also, it is not expected to introduce
regressions in existing cases when turned off (default for now), or
cause any significant maintaince burden.
The randomization is done on a best-effort basis - that is, the
allocator falls back to a first fit strategy if fragmentation prevents
entropy injection. It is trivial to implement a strong mode where
failure to guarantee the requested amount of entropy results in
mapping request failure, but I do not consider that to be usable.
I have not fine-tuned the amount of entropy injected right now. It is
only a quantitive change that will not change the implementation. The
current amount is controlled by aslr_pages_rnd.
To not spoil coalescing optimizations, to reduce the page table
fragmentation inherent to ASLR, and to keep the transient superpage
promotion for the malloced memory, locality clustering is implemented
for anonymous private mappings, which are automatically grouped until
fragmentation kicks in. The initial location for the anon group range
is, of course, randomized. This is controlled by vm.cluster_anon,
enabled by default.
The default mode keeps the sbrk area unpopulated by other mappings,
but this can be turned off, which gives much more breathing bits on
architectures with small address space, such as i386. This is tied
with the question of following an application's hint about the mmap(2)
base address. Testing shows that ignoring the hint does not affect the
function of common applications, but I would expect more demanding
code could break. By default sbrk is preserved and mmap hints are
satisfied, which can be changed by using the
kern.elf{32,64}.aslr.honor_sbrk sysctl.
ASLR is enabled on per-ABI basis, and currently it is only allowed on
FreeBSD native i386 and amd64 (including compat 32bit) ABIs. Support
for additional architectures will be added after further testing.
Both per-process and per-image controls are implemented:
- procctl(2) adds PROC_ASLR_CTL/PROC_ASLR_STATUS;
- NT_FREEBSD_FCTL_ASLR_DISABLE feature control note bit makes it possible
to force ASLR off for the given binary. (A tool to edit the feature
control note is in development.)
Global controls are:
- kern.elf{32,64}.aslr.enable - for non-fixed mappings done by mmap(2);
- kern.elf{32,64}.aslr.pie_enable - for PIE image activation mappings;
- kern.elf{32,64}.aslr.honor_sbrk - allow to use sbrk area for mmap(2);
- vm.cluster_anon - enables anon mapping clustering.
PR: 208580 (exp runs)
Exp-runs done by: antoine
Reviewed by: markj (previous version)
Discussed with: emaste
Tested by: pho
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D5603
-> PROC_PDEATHSIG_STATUS for consistency with other procctl(2)
operations names.
Requested by: emaste
Sponsored by: The FreeBSD Foundation
MFC after: 13 days
Allow processes to request the delivery of a signal upon death of
their parent process. Supposed consumer of the feature is PostgreSQL.
Submitted by: Thomas Munro
Reviewed by: jilles, mjg
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D15106