Use __libc_interposing_slot() in favor of __libsys_interposing_slot() so
that the interposing interface is entirely between libc and libthr with
libsys only involved as an implementation detail.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D44880
The (optional) third argument of fcntl is sometimes a pointer so change
the type to intptr_t. Update the libc-internal definition (actually used
by libthr) to take a fixed intptr_t argument rather than pretending it's
a variadic function. (That worked because all supported architectures
pass variadic arguments as though the function was declared with those
types. In CheriBSD that changes because variadic arguments are passed
via a bounded array.)
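A minimal sketch of the fixed-argument pattern described above. All
names here (demo_fcntl, demo_fcntl_impl, demo_last_arg) are hypothetical
and are not the real libc or libthr symbols. The public variadic entry
point extracts the optional argument as an intptr_t and forwards it to
an internal function with a fixed prototype:

```c
#include <stdarg.h>
#include <stdint.h>

/* Hypothetical: records the last forwarded argument for inspection. */
static intptr_t demo_last_arg;

/* Internal entry point: fixed prototype, not declared as variadic. */
static int
demo_fcntl_impl(int fd, int cmd, intptr_t arg)
{
	(void)fd;
	(void)cmd;
	demo_last_arg = arg;
	return (0);
}

/* Public entry point: variadic, like the fcntl() prototype. */
int
demo_fcntl(int fd, int cmd, ...)
{
	va_list ap;
	intptr_t arg;

	va_start(ap, cmd);
	/* intptr_t can carry either an integer or a pointer argument. */
	arg = va_arg(ap, intptr_t);
	va_end(ap);
	return (demo_fcntl_impl(fd, cmd, arg));
}
```

In this sketch callers are assumed to pass the optional argument already
converted to intptr_t, e.g. demo_fcntl(fd, cmd, (intptr_t)&lock).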
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D44381
The function was renamed to _thr_cond_timedwait in commit 0ab1bfc7b2
and for some reason did not get the same __weak_reference treatment as
other _pthread_cond symbols.
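As an illustration only, the effect of a weak reference can be sketched
with compiler attributes. DEMO_WEAK_REFERENCE and both function names
are hypothetical stand-ins, and the sketch assumes GCC or Clang on an
ELF target; it is not the definition of __weak_reference itself:

```c
/* Hypothetical helper with the same argument order as __weak_reference. */
#define DEMO_WEAK_REFERENCE(sym, alias) \
	extern __typeof(sym) alias __attribute__((__weak__, __alias__(#sym)))

/* Hypothetical renamed implementation. */
int
demo_thr_cond_timedwait(int ticks)
{
	return (ticks + 1);
}

/* Keep the older name resolvable as a weak alias of the new one. */
DEMO_WEAK_REFERENCE(demo_thr_cond_timedwait, demo_pthread_cond_timedwait);
```

Calls through either name reach the same code, and a strong definition
of the alias elsewhere would take precedence at link time.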
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D44244
To allow gcc -m32 to work, link libc and libthr with --rpath-/usr/lib32.
When called with -m32, gcc is currently unable to communicate to
the bfd linker that it should look in /usr/lib32 to resolve needed (as
opposed to explicitly linked) libraries so we need to provide a hint.
See also: https://sourceware.org/bugzilla/show_bug.cgi?id=31395
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D43910
Continue to filter the public interface (elf_aux_info()), but entirely
relocate the private interfaces (_elf_aux_info(),
__init_elf_aux_vector(), and __elf_aux_vector) to libsys.
This ensures that rtld updates the correct (only) copy of
__elf_aux_vector. After 968a18975a
updates were confused and __getosreldate was failing, causing
the system to fall back to compat12 syscalls in some cases.
Return to explicitly linking libc to libsys and link libthr with libc
and libsys (in that order).
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D43910
This allows gcc + GNU ld to link programs with -m32 -pthread without
erroring out due to _umtx_op_err being undefined (unless -lsys is added
to the link command).
We now always link _umtx_op_err into libthr (not just when it's static)
and filter it with libsys so we call that implementation. The dynamic
implementations (at least the assembly ones) should likely become stubs
as a further refinement.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D43783
Declare in sys/umtx.h and implement in libsys. Explicitly link libthr
with libsys.
When building libthr static include _umtx_op_err so we don't break static
linkage with -lpthread.
Reviewed by: kib, emaste, imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/908
System calls or their wrappers are now interposed by
__libsys_interposing with purely libc entries remaining in
__libc_interposing.
Use __libsys_interposing_slot in libthr to update __libsys_interposing,
but also make __libc_interposing_slot fall back to
__libsys_interposing_slot so an out of date libc has a chance of working
during updates.
Reviewed by: kib, emaste, imp
Pull Request: https://github.com/freebsd/freebsd-src/pull/908
Otherwise the lock upgrade performed by rtld's load_filtees() can result
in infinite recursion, wherein:
1. _rtld_bind() acquires the bind read lock,
2. the source DSO's filtees haven't been loaded yet, so the lock upgrade
in load_filtees() causes rtld to jump to _rtld_bind() and release the
bind lock,
3. _thr_rtld_lock_release() calls _thr_ast(), which calls thr_wake(),
which hasn't been resolved yet,
4. _rtld_bind() acquires the bind read lock in order to resolve
thr_wake(),
5. ...
See the linked pull request for an instance of this problem arising with
libsys. That particular instance is also worked around by commit
e7951d0b04.
Reported by: brooks
Reviewed by: kib
Pull Request: https://github.com/freebsd/freebsd-src/pull/908
MFC after: 1 week
Sponsored by: Innovate UK
As in the previous commit, using calloc() instead of malloc() is
useless here in the regular case since the subsequent call to
cpuset_getaffinity() is going to completely fill the allocated memory.
However, there is an additional complication. This function tries to
allocate memory to hold the cpuset if it previously wasn't, and does so
before the thread lock is acquired, which can fail on a bad thread ID.
In this case, it is necessary to deallocate the memory allocated in this
function so that the attributes object appears unmodified to the caller
when an error is returned. Without this, a subsequent call to
pthread_attr_getaffinity_np() would expose uninitialized memory (not
a security problem per se, since it comes from the same process) instead
of returning a full mask as it would before the failing call to
pthread_attr_get_np(). So the caller would be able to notice a change
in the state of the attributes object even if pthread_attr_get_np()
reported failure, which would be quite surprising. A similar problem
that could occur on failure of cpuset_setaffinity() has been fixed.
Finally, we shall always report memory allocation failure. This already
goes for pthread_attr_init(), so, if for nothing else, just be
consistent.
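The allocate-then-roll-back pattern described above can be sketched as
follows. The structure, demo_lookup_thread() and demo_attr_get() are
hypothetical stand-ins and do not reproduce the libthr code; the point
is only that the attributes object is published to the caller after the
step that can fail, and that allocation failure is always reported:

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

struct demo_attr {
	unsigned char	*cpuset;	/* NULL until first needed */
	size_t		 cpusetsize;
};

/* Hypothetical stand-in for the step that fails on a bad thread ID. */
static int
demo_lookup_thread(int tid)
{
	return (tid > 0 ? 0 : ESRCH);
}

int
demo_attr_get(struct demo_attr *attr, int tid, size_t setsize)
{
	unsigned char *fresh = NULL;
	int error;

	if (attr->cpuset == NULL) {
		/* malloc() suffices: the buffer is fully written below. */
		fresh = malloc(setsize);
		if (fresh == NULL)
			return (ENOMEM);	/* always report failure */
	}
	error = demo_lookup_thread(tid);
	if (error != 0) {
		/* Leave 'attr' exactly as the caller passed it in. */
		free(fresh);
		return (error);
	}
	if (fresh != NULL) {
		attr->cpuset = fresh;
		attr->cpusetsize = setsize;
	}
	/* Stand-in for the call that fills the whole set. */
	memset(attr->cpuset, 0xff, attr->cpusetsize);
	return (0);
}
```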
Reviewed by: emaste, kib
Approved by: emaste (mentor)
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D43329
Using calloc() instead of malloc() is useless here since the allocated
memory is to be wholly crushed by the memcpy() call that follows.
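A small sketch of the reasoning, with a hypothetical helper name
(demo_dup): when every byte of the allocation is written by memcpy()
right away, zeroing it first does no useful work.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: duplicate a fixed-size blob. */
void *
demo_dup(const void *src, size_t len)
{
	void *dst;

	/* No zeroing needed: memcpy() below writes all 'len' bytes. */
	dst = malloc(len);
	if (dst == NULL)
		return (NULL);
	memcpy(dst, src, len);
	return (dst);
}
```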
Suggested by: kib
Reviewed by: emaste, kib
Approved by: emaste (mentor)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D43328
The change of argument for sizeof() (from a type to an object) is to be
consistent with the change done for the malloc() code just above in the
preceding commit touching this file.
Consider bit flags as integers and test whether they are set with an
explicit comparison with 0.
Use an explicit flag value (PTHREAD_SCOPE_SYSTEM) in place of a variable
that has this value at point of substitution.
All other changes are straightforward.
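For illustration, the two conventions mentioned above (sizeof() applied
to an object and explicit comparison of flag tests with 0) look roughly
like this. The structure, flag and function names are hypothetical:

```c
#include <stdbool.h>
#include <stdlib.h>

#define DEMO_FLAG_DETACHED	0x1	/* hypothetical flag bit */

struct demo_obj {
	int flags;
};

struct demo_obj *
demo_obj_new(void)
{
	struct demo_obj *obj;

	/* sizeof() applied to the object rather than to the type. */
	obj = malloc(sizeof(*obj));
	if (obj != NULL)
		obj->flags = 0;
	return (obj);
}

bool
demo_is_detached(const struct demo_obj *obj)
{
	/* The flag test is an integer, so compare it with 0 explicitly. */
	return ((obj->flags & DEMO_FLAG_DETACHED) != 0);
}
```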
Suggested by: kib
Reviewed by: kib
Approved by: emaste (mentor)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D43327
On first read, POSIX may seem ambiguous about the return code for some
scheduling-related pthread functions on invalid arguments. But a more
thorough reading and a bit of standards archeology strongly suggest
that this case should be handled by EINVAL and that ENOTSUP is reserved
for implementations providing only part of the functionality required by
the POSIX option POSIX_PRIORITY_SCHEDULING (e.g., if an implementation
doesn't support SCHED_FIFO, it should return ENOTSUP on a call to, e.g.,
sched_setscheduler() with 'policy' SCHED_FIFO).
This reading is supported by the second sentence of the very definition
of ENOTSUP, as worded in CAE/XSI Issue 5 and POSIX Issue 6: "The
implementation does not support this feature of the Realtime Feature
Group.", and the fact that an additional ENOTSUP case was added to
pthread_setschedparam() in Issue 6, which introduces SCHED_SPORADIC,
saying that pthread_setschedparam() may return it when attempting to
dynamically switch to SCHED_SPORADIC on systems that don't support
that.
glibc, illumos and NetBSD also support that reading by always returning
EINVAL, and OpenBSD as well, since it always returns EINVAL but the
corresponding code has a comment suggesting returning ENOTSUP for
SCHED_FIFO and SCHED_RR, which it effectively doesn't support.
Additionally, always returning EINVAL fixes inconsistencies where EINVAL
would be returned on some out-of-range values and ENOTSUP on others.
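A sketch of the resulting behaviour, using a hypothetical validator
(demo_validate_policy) rather than the actual libthr code: every value
outside the known policies yields EINVAL, with no ENOTSUP path for
invalid arguments.

```c
#include <errno.h>
#include <sched.h>

/* Hypothetical validator: 0 for a known policy, EINVAL otherwise. */
int
demo_validate_policy(int policy)
{
	switch (policy) {
	case SCHED_FIFO:
	case SCHED_RR:
	case SCHED_OTHER:
		return (0);
	default:
		/* Invalid argument, not an unsupported feature. */
		return (EINVAL);
	}
}
```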
Reviewed by: markj
Approved by: markj (mentor)
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D43006
Apply the following automated changes to try to eliminate
no-longer-needed sys/cdefs.h includes as well as now-empty
blank lines in a row.
Remove /^#if.*\n#endif.*\n#include\s+<sys/cdefs.h>.*\n/
Remove /\n+#include\s+<sys/cdefs.h>.*\n+#if.*\n#endif.*\n+/
Remove /\n+#if.*\n#endif.*\n+/
Remove /^#if.*\n#endif.*\n/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/types.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/param.h>/
Remove /\n+#include\s+<sys/cdefs.h>\n#include\s+<sys/capsicum.h>/
Sponsored by: Netflix
pthread_getname_np needs to be provided by libc in order to import
jemalloc 5.3.0.
A stub implementation of pthread_getname_np() for libc is added to
_pthread_stubs.c; it always reports an empty name for the main thread.
The internal _pthread_getname_np() is not exported, but is provided for
libc's own use.
Reviewed by: kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D41461
The acquisition and release of an uncontended default/normal pthread
mutex on FreeBSD is surprisingly slow, e.g., pthread wrlocks and binary
semaphores both exhibit roughly 33% lower latency, while default/normal
mutexes on Linux exhibit roughly 67% lower latency than FreeBSD. This is
likely explained by the fact that AFAICT in the best case to acquire an
uncontended mutex on Linux one need touch only 1 page and read+modify
only 1 cacheline, whereas on FreeBSD we need to touch at least 4 pages,
read 6 cachelines, and modify at least 4 cachelines.
This patch does not address the pthread mutex architecture. Instead,
it improves performance by adding the __always_inline attribute to
mutex_lock_common() and mutex_unlock_common() to encourage constant
folding and propagation, thereby lowering the latency to acquire and
release a mutex due to a shorter code path with fewer compares, jumps,
and mispredicts.
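The idea can be sketched as below. The mutex structure and function
names are hypothetical and far simpler than libthr's, and the attribute
is spelled out directly (assuming GCC or Clang) instead of using the
__always_inline macro. Because each wrapper passes a constant, the
compiler can fold the test on 'recursive' away in each inlined copy:

```c
#include <errno.h>
#include <stdbool.h>

struct demo_mutex {
	int owner;	/* 0 means unowned */
	int count;
};

static inline __attribute__((__always_inline__)) int
demo_lock_common(struct demo_mutex *m, int self, bool recursive)
{
	if (m->owner == 0) {
		m->owner = self;
		m->count = 1;
		return (0);
	}
	/* With a constant 'recursive', this branch can be folded away. */
	if (recursive && m->owner == self) {
		m->count++;
		return (0);
	}
	return (EBUSY);
}

int
demo_lock_normal(struct demo_mutex *m, int self)
{
	return (demo_lock_common(m, self, false));
}

int
demo_lock_recursive(struct demo_mutex *m, int self)
{
	return (demo_lock_common(m, self, true));
}
```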
With this patch on a stock build I see a reduction in latency of roughly
7% for default/normal mutexes, and 17% for robust mutexes. When built
without PTHREADS_ASSERTIONS enabled I see a reduction in latency of
roughly 15% and 26%, respectively. Surprisingly, I see similar reductions
in latency for heavily contended mutexes.
By default, this patch increases the size of libthr.so.3 by 2448 bytes,
but when built without PTHREADS_ASSERTIONS enabled it only increases by
448 bytes.
Reviewed by: jhb (previous version), kib
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D40912
This patch fixes a bug which prevents building libthr without
_PTHREADS_INVARIANTS defined. The default remains to build libthr
with -D_PTHREADS_INVARIANTS. However, with this patch, if one builds
libthr with WITHOUT_PTHREADS_ASSERTIONS=true then the latency to
acquire+release a default pthread mutex is reduced by roughly 5%, and a
robust mutex by roughly 18% (as measured by a simple synthetic test on a
Xeon E5-2697a based machine).
Reviewed by: jhb, kib, mjg
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D40900
Reduces severe performance degradation due to false-sharing. Note that this
does not account for hardware which can perform adjacent cacheline prefetch.
[mjg: massaged the commit message and the patch to use aligned_alloc
instead of malloc]
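A minimal sketch of cacheline-aligned allocation with aligned_alloc().
The 64-byte line size, the structure and the function name are
assumptions made for illustration; as noted above, aligning to a single
line does not cover adjacent cacheline prefetch:

```c
#include <stdint.h>
#include <stdlib.h>

#define DEMO_CACHE_LINE	64	/* assumed line size; varies by CPU */

struct demo_lock {
	int state;
};

struct demo_lock *
demo_lock_alloc(void)
{
	size_t size;

	/* Round up so the size is a multiple of the alignment. */
	size = (sizeof(struct demo_lock) + DEMO_CACHE_LINE - 1) &
	    ~(size_t)(DEMO_CACHE_LINE - 1);
	return (aligned_alloc(DEMO_CACHE_LINE, size));
}
```

Each object then starts on its own line and occupies a whole number of
lines, so two such objects never share one.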
PR: 272238
MFC after: 1 week
Since there is only the current thread in the child, no pending readers
exist. Clear the bit, since it confuses future attempts to acquire
write ownership of the rtld locks, due to URWLOCK_PREFER_READERS flag.
To be future-proof, clear all state about pending writers and readers.
PR: 271490
Reported and tested by: KJ Tsanaktsidis <kj@kjtsanaktsidis.id.au>
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D40178
The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.
Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix
When __thr_pshared_offpage() is called for allocation, it must not use
the cached offpage for the key. Instead, the cached offpage must be
unmapped and removed from the cache, if any.
It is legitimate for the user code to unmap the shared lock object without
destroying it, and then map something over the freed VA to carry
another shared lock. In this case the cached offpage must be un-cached.
PR: 269277
Reported by: rau8344@gmail.com
Reviewed by: markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D38345
Since f35093f8, the semantics of the thread affinity functions have changed to
be compatible with Linux:
In the case of getaffinity(), the minimum cpuset_t size that the kernel permits
is the maximum CPU id present in the system / NBBY bytes; the maximum size is
not limited.
In the case of setaffinity(), the kernel does not limit the size of the
user-provided cpuset_t and internally uses only the meaningful part of the set,
where the upper bound is the maximum CPU id present in the system, no larger
than the size of the kernel cpuset_t.
To match the pthread_attr_[g|s]etaffinity_np checks of user-provided cpusets to
the kernel behavior, export the minimum cpuset_t size allowed by the running
kernel via the new sysctl kern.sched.cpusetsizemin and use it in the checks.
Reviewed by:
Differential Revision: https://reviews.freebsd.org/D38112
MFC after: 1 week
In libthr we use PAGE_SIZE when allocating memory with mmap and to check
various structs will fit into a single page so we can use this allocator
for them.
Ask the kernel for the page size on init for use by the page allocator
and add a new machine dependent macro to hold the smallest page size
the architecture supports to check the structure is small enough.
This allows us to use the same libthr on arm64 with either 4k or 16k
pages.
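The split between a runtime page size and a compile-time minimum can be
sketched as follows. DEMO_PAGE_SIZE_MIN, the structure and the function
names are hypothetical, and sysconf(_SC_PAGESIZE) is used here as a
portable stand-in for asking the kernel:

```c
#include <stddef.h>
#include <unistd.h>

#define DEMO_PAGE_SIZE_MIN	4096	/* hypothetical per-arch minimum */

struct demo_small {
	char pad[128];
};

/* Compile-time check against the smallest page the arch supports. */
_Static_assert(sizeof(struct demo_small) <= DEMO_PAGE_SIZE_MIN,
    "demo_small must fit in the smallest supported page");

static size_t demo_page_size;

/* Query the page size once at init and save it for later use. */
void
demo_init(void)
{
	long sz;

	sz = sysconf(_SC_PAGESIZE);
	demo_page_size = sz > 0 ? (size_t)sz : DEMO_PAGE_SIZE_MIN;
}

/* Size an allocation in whole pages using the saved value. */
size_t
demo_round_to_pages(size_t len)
{
	return ((len + demo_page_size - 1) / demo_page_size *
	    demo_page_size);
}
```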
Reviewed by: kib, markj, imp
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34984
Rather than calling getpagesize() twice use the value saved after the
first call to size a mmap allocation.
Reviewed by: kib, markj
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34983
When a thread exits, _Unwind_ForcedUnwind() is used to walk up stack
frames executing pending cleanups pushed by pthread_cleanup_push().
The cleanups are popped by thread_unwind_stop() which is passed as a
callback function to _Unwind_ForcedUnwind().
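From the application's side, the behaviour described above can be
observed with a small program fragment. The demo_* names are
hypothetical; the sketch only uses the standard pthread_cleanup_push(),
pthread_cleanup_pop() and pthread_exit() interfaces and says nothing
about how the unwinding is implemented:

```c
#include <pthread.h>
#include <stddef.h>

static int demo_cleanup_ran;

/* Cleanup handler: records that it was invoked. */
static void
demo_cleanup(void *arg)
{
	*(int *)arg = 1;
}

static void *
demo_thread(void *arg)
{
	pthread_cleanup_push(demo_cleanup, arg);
	/* Exiting here runs the pending cleanup handler. */
	pthread_exit(NULL);
	/* Not reached, but push and pop must be lexically paired. */
	pthread_cleanup_pop(0);
	return (NULL);
}

/* Returns 1 if the handler ran, 0 if not, -1 on thread errors. */
int
demo_run(void)
{
	pthread_t td;

	demo_cleanup_ran = 0;
	if (pthread_create(&td, NULL, demo_thread, &demo_cleanup_ran) != 0)
		return (-1);
	if (pthread_join(td, NULL) != 0)
		return (-1);
	return (demo_cleanup_ran);
}
```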
LLVM's libunwind uses a different function type for the callback on
32-bit ARM relative to all other platforms. The previous unwind.h
header (as well as the unwind.h from libcxxrt) uses the non-ARM type on
all platforms, so this has likely been broken on 32-bit arm since it
switched to using LLVM's libunwind.
For now, just disable stack unwinding on 32-bit arm to unbreak the
build until a proper fix is tested.
Install headers from LLVM's libunwind in place of the headers from
libcxxrt and allow C applications to use the library.
As part of this, remove include/unwind.h and switch libthr over to
using the installed unwind.h.
Reviewed by: dim, emaste
MFC after: 10 days
Differential Revision: https://reviews.freebsd.org/D34065
This matches the type in other unwind headers (LLVM libunwind,
libcxxrt, glibc).
NB: include/unwind.h is not installed but is only used by libthr
Reviewed by: imp, dim, emaste
Differential Revision: https://reviews.freebsd.org/D34049
This matches libc and rtld in using the alignment (TLS_TCB_ALIGN) from
machine/tls.h instead of hardcoding 16.
Reviewed by: kib
Sponsored by: The University of Cambridge, Google Inc.
Differential Revision: https://reviews.freebsd.org/D34023
The current ASLR stack gap feature will be removed, and with that the
need for this change, and the kern.stacktop sysctl, is gone. Moreover,
the approach taken in this revision does not provide compatibility for
old copies of libthr.so, and the revision should have also updated
__libc_map_stacks_exec().
This reverts commit 78df56ccfc.
Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D33704