After the previous changes to fix requests on blocking sockets to complete
across multiple operations, an edge case exists where a request can be
cancelled after it has partially completed. POSIX doesn't appear to
dictate exactly how to handle this case, but in general I feel that
aio_cancel() should arrange to cancel any request it can, but that any
partially completed requests should return a partial completion rather
than ECANCELED. To that end, fix the socket AIO cancellation routine to
return a short read/write if a partially completed request is cancelled
rather than ECANCELED.
Sponsored by: Chelsio Communications
Always requeue an AIO job at the head of the socket buffer's queue if
sosend() or soreceive() returns EWOULDBLOCK on a blocking socket.
Previously, requests were only requeued if they returned EWOULDBLOCK
and completed no data. Now after a partial completion on a blocking
socket the request is queued and the remaining request is retried when
the socket is ready. This allows writes larger than the currently
available space on a blocking socket to fully complete. Reads on a
blocking socket that satifsy the low watermark can still return a short
read (same as read()).
In order to track previously completed data, the internal 'status'
field of the AIO job is used to store the amount of previously
computed data.
Non-blocking sockets continue to return short completions for both
reads and writes.
Add a test for a "large" AIO write on a blocking socket that writes
twice the socket buffer size to a UNIX domain socket.
Sponsored by: Chelsio Communications
Add a bit_count function, which efficiently counts the number of bits set in
a bitstring.
sys/sys/bitstring.h
tests/sys/sys/bitstring_test.c
share/man/man3/bitstring.3
Add bit_alloc
sys/kern/subr_unit.c
Use bit_count instead of a naive counting loop in check_unrhdr, used
when INVARIANTS are enabled. The userland test runs about 6x faster
in a generic build, or 8.5x faster when built for Nehalem, which has
the POPCNT instruction.
sys/sys/param.h
Bump __FreeBSD_version due to the addition of bit_alloc
UPDATING
Add a note about the ABI incompatibility of the bitstring(3)
changes, as suggested by lidl.
Suggested by: gibbs
Reviewed by: gibbs, ngie
MFC after: 9 days
X-MFC-With: 299090, 300538
Relnotes: yes
Sponsored by: Spectra Logic Corp
Differential Revision: https://reviews.freebsd.org/D6255
tests/sys/kern/Makefile
Reenable a disabled compiler warning, the need for which was
eliminated by r299090.
Reviewed by: ngie
MFC after: 4 weeks
X-MFC-With: 299090
Sponsored by: Spectra Logic Corp
after r298107
Summary of changes:
- Replace all instances of FILES/TESTS with ${PACKAGE}FILES. This ensures that
namespacing is kept with FILES appropriately, and that this shouldn't need
to be repeated if the namespace changes -- only the definition of PACKAGE
needs to be changed
- Allow PACKAGE to be overridden by callers instead of forcing it to always be
`tests`. In the event we get to the point where things can be split up
enough in the base system, it would make more sense to group the tests
with the blocks they're a part of, e.g. byacc with byacc-tests, etc
- Remove PACKAGE definitions where possible, i.e. where FILES wasn't used
previously.
- Remove unnecessary TESTSPACKAGE definitions; this has been elided into
bsd.tests.mk
- Remove unnecessary BINDIRs used previously with ${PACKAGE}FILES;
${PACKAGE}FILESDIR is now automatically defined in bsd.test.mk.
- Fix installation of files under data/ subdirectories in lib/libc/tests/hash
and lib/libc/tests/net/getaddrinfo
- Remove unnecessary .include <bsd.own.mk>s (some opportunistic cleanup)
Document the proposed changes in share/examples/tests/tests/... via examples
so it's clear that ${PACKAGES}FILES is the suggested way forward in terms of
replacing FILES. share/mk/bsd.README didn't seem like the appropriate method
of communicating that info.
MFC after: never probably
X-MFC with: r298107
PR: 209114
Relnotes: yes
Tested with: buildworld, installworld, checkworld; buildworld, packageworld
Sponsored by: EMC / Isilon Storage Division
Two new functions are provided, bit_ffs_at() and bit_ffc_at(), which allow
for efficient searching of set or cleared bits starting from any bit offset
within the bit string.
Performance is improved by operating on longs instead of bytes and using
ffsl() for searches within a long. ffsl() is a compiler builtin in both
clang and gcc for most architectures, converting what was a brute force
while loop search into a couple of instructions.
All of the bitstring(3) API continues to be contained in the header file.
Some of the functions are large enough that perhaps they should be uninlined
and moved to a library, but that is beyond the scope of this commit.
sys/sys/bitstring.h:
Convert the majority of the existing bit string implementation from
macros to inline functions.
Properly protect the implementation from inadvertant macro expansion
when included in a user's program by prefixing all private
macros/functions and local variables with '_'.
Add bit_ffs_at() and bit_ffc_at(). Implement bit_ffs() and
bit_ffc() in terms of their "at" counterparts.
Provide a kernel implementation of bit_alloc(), making the full API
usable in the kernel.
Improve code documenation.
share/man/man3/bitstring.3:
Add pre-exisiting API bit_ffc() to the synopsis.
Document new APIs.
Document the initialization state of the bit strings
allocated/declared by bit_alloc() and bit_decl().
Correct documentation for bitstr_size(). The original code comments
indicate the size is in bytes, not "elements of bitstr_t". The new
implementation follows this lead. Only hastd assumed "elements"
rather than bytes and it has been corrected.
etc/mtree/BSD.tests.dist:
tests/sys/Makefile:
tests/sys/sys/Makefile:
tests/sys/sys/bitstring.c:
Add tests for all existing and new functionality.
include/bitstring.h
Include all headers needed by sys/bitstring.h
lib/libbluetooth/bluetooth.h:
usr.sbin/bluetooth/hccontrol/le.c:
Include bitstring.h instead of sys/bitstring.h.
sbin/hastd/activemap.c:
Correct usage of bitstr_size().
sys/dev/xen/blkback/blkback.c
Use new bit_alloc.
sys/kern/subr_unit.c:
Remove hard-coded assumption that sizeof(bitstr_t) is 1. Get rid of
unrb.busy, which caches the number of bits set in unrb.map. When
INVARIANTS are disabled, nothing needs to know that information.
callapse_unr can be adapted to use bit_ffs and bit_ffc instead.
Eliminating unrb.busy saves memory, simplifies the code, and
provides a slight speedup when INVARIANTS are disabled.
sys/net/flowtable.c:
Use the new kernel implementation of bit-alloc, instead of hacking
the old libc-dependent macro.
sys/sys/param.h
Update __FreeBSD_version to indicate availability of new API
Submitted by: gibbs, asomers
Reviewed by: gibbs, ngie
MFC after: 4 weeks
Sponsored by: Spectra Logic Corp
Differential Revision: https://reviews.freebsd.org/D6004
Build and install the subr_unit test program originally written by phk, and
run it with the other ATF tests.
tests/sys/kern/Makefile
* Build and install the subr_unit test as a plain test
sys/kern/subr_unit.c
* Reduce the default number of repetitions from 100 to 1, and add a
command-line parser to override it.
* Don't be so noisy by default
* Fix an include problem for the test build
Reviewed by: ngie
MFC after: 4 weeks
Sponsored by: Spectra Logic Corp
Differential Revision: https://reviews.freebsd.org/D6038
- Always munmap memory regions after mmap'ing them.
- Make sure getpagesize() returns a value greater than 0 and use a
cached value instead of always calling getpagesize(3).
- Remove intermediate variable for assigning from $TMPDIR if set in the
environment to eliminate warnings about pointer conversions with "/tmp",
and to mute an invalid buffer overflow concern from Coverity
(snprintf and tacking on a NUL terminator was alleviating that concern
before).
- Remove useless self-test of psize before it's initialized.
- Check the return values of getrlimit/setrlimit.
Cosmetic changes:
- Replace a `(void*)0` with NULL.
- Do some minor whitespace clean up.
- Remove an unnecessary cast to mmap.
- Make all munmap calls use ATF_REQUIRE_MSG instead of using the:
> if (munmap(..) == -1)
> atf_tc_fail(..)
idiom. Employ the new idiom consistently when calling munmap.
CID: 1331351, 1331382-1331386, 1331513, 1331514, 1331565, 1331583, 1331694
Differential Revision: https://reviews.freebsd.org/D6012
MFC after: 2 weeks
Reported by: Coverity
Reviewed by: markj
Sponsored by: EMC / Isilon Storage Division
- close file descriptors after use.
- Always munmap memory regions after mmap'ing them.
- Make sure getpagesize() returns a value greater than 0 and use a
cached value instead of always calling getpagesize(3).
CID: 1331374-1331377, 1331653-1331662
Differential Revision: https://reviews.freebsd.org/D6011
MFC after: 2 weeks
Reported by: Coverity
Reviewed by: cem
Sponsored by: EMC / Isilon Storage Division
The older AIO code awakened all pending AIO requests on a socket
when any data arrived. This could result in AIO daemons blocking on
an empty socket buffer. These requests could not be cancelled
which led to a deadlock during process exit. This test reproduces
this case. The newer AIO code is able to cancel the pending AIO
request correctly.
Reviewed by: ngie (-ish)
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D4363
The large read test uses an empty file created via mkstemp() rather than
/dev/null as character devices are subject to two different clamping
sysctls. However, I forgot to update some of the error messages after
changing to mkstemp() that were still referring to /dev/null.
root, or the geom class can't be loaded cleanly [*]
This makes sure that scenarios that are easy to hit aren't counted
as false positives with kyua test
MFC after: 1 week
PR: 208101
Sponsored by: EMC / Isilon Storage Division
First, update the return types of aio_return() and aio_waitcomplete() to
ssize_t.
POSIX requires aio_return() to return a ssize_t so that it can represent
all return values from read() and write(). aio_waitcomplete() should use
ssize_t for the same reason.
aio_return() has used ssize_t in <aio.h> since r31620 but the manpage and
system call entry were not updated. aio_waitcomplete() has always
returned int.
Note that this does not require new system call stubs as this is
effectively only an API change in how the compiler interprets the return
value.
Second, allow aio_nbytes values up to IOSIZE_MAX instead of just INT_MAX.
aio_read/write should now honor the same length limits as normal read/write.
Third, use longs instead of ints in the aio_return() and aio_waitcomplete()
system call functions so that the 64-bit size_t in the in-kernel aiocb
isn't truncated to 32-bits before being copied out to userland or
being returned.
Finally, a simple test has been added to verify the bounds checking on the
maximum read size from a file.
improve cancellation robustness.
Introduce a new file operation, fo_aio_queue, which is responsible for
queueing and completing an asynchronous I/O request for a given file.
The AIO subystem now exports library of routines to manipulate AIO
requests as well as the ability to run a handler function in the
"default" pool of AIO daemons to service a request.
A default implementation for file types which do not include an
fo_aio_queue method queues requests to the "default" pool invoking the
fo_read or fo_write methods as before.
The AIO subsystem permits file types to install a private "cancel"
routine when a request is queued to permit safe dequeueing and cleanup
of cancelled requests.
Sockets now use their own pool of AIO daemons and service per-socket
requests in FIFO order. Socket requests will not block indefinitely
permitting timely cancellation of all requests.
Due to the now-tight coupling of the AIO subsystem with file types,
the AIO subsystem is now a standard part of all kernels. The VFS_AIO
kernel option and aio.ko module are gone.
Many file types may block indefinitely in their fo_read or fo_write
callbacks resulting in a hung AIO daemon. This can result in hung
user processes (when processes attempt to cancel all outstanding
requests during exit) or a hung system. To protect against this, AIO
requests are only permitted for known "safe" files by default. AIO
requests for all file types can be enabled by setting the new
vfs.aio.enable_usafe sysctl to a non-zero value. The AIO tests have
been updated to skip operations on unsafe file types if the sysctl is
zero.
Currently, AIO requests on sockets and raw disks are considered safe
and are enabled by default. aio_mlock() is also enabled by default.
Reviewed by: cem, jilles
Discussed with: kib (earlier version)
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D5289
r274560 modified kqueue_register() to only test the event condition if the
corresponding knote is not disabled. However, this check takes place before
the EV_ENABLE flag is used to clear the KN_DISABLED flag on the knote, so
enabling a previously-disabled kevent would not result in a notification for
a triggered event. This change fixes the problem by testing for EV_ENABLED
before possibly checking the event condition.
This change also updates a kqueue regression test to exercise this case.
PR: 206368
Reviewed by: kib
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D5307
before calling dd to defeat a race when writing out to the geom_gate(4)
device
MFC after: 1 month
Reported by: Jenkins
Sponsored by: EMC / Isilon Storage Division
fact that Jenkins hardcodes image sizes to 2GB with the FreeBSD_HEAD
job
This is to stop the unnecessary failure emails because we've gone
over the 2GB limit
MFC after: 1 week
X-MFC with: r295341
Sponsored by: EMC / Isilon Storage Division
For cases where these utilities aren't installed, the tests would fail today
in a non-intuitive manner on sub-testcase #3 in each of the test scripts
MFC after: 1 week
Reviewed by: markj
Sponsored by: EMC / Isilon Storage Division
ggated(8) daemon used by the tests is the instance specifically invoked by
the tests instead of one or more daemon instances running on the system
MFC after: 1 month
Sponsored by: EMC / Isilon Storage Division
dd to defeat a race when writing out to the geom_gate(4) device
This will quell the Jenkins failure emails until I come up with a better
solution
MFC after: 1 month
Reported by: Jenkins
Sponsored by: EMC / Isilon Storage Division
NOTE_CHILD and NOTE_EXIT return something in kevent.data: the parent
pid (ppid) for NOTE_CHILD and the exit status for NOTE_EXIT.
Do not let the two events be combined, since one would overwrite
the other's data.
PR: 180385
Submitted by: David A. Bright <david_a_bright@dell.com>
Reviewed by: jhb
MFC after: 1 month
Sponsored by: Dell Inc.
Differential Revision: https://reviews.freebsd.org/D4900
tools/regression/geom_{concat,eli,gate,mirror,nop,raid3,shsec,stripe,uzip}
in to the FreeBSD test suite as
tests/sys/geom/class/{concat,eli,gate,mirror,nop,raid3,shsec,stripe,uzip}
The tools/regression/geom and tools/regression/geom_part testcases are being
left alone because both test sets are both currently broken.
The majority of this work was done on ^/user/ngie/more-tests2 . The differences
are as follows:
- tests/sys/geom/class/Makefile.inc is not present; it was
inlined into the class's Makefiles for explicitness.
- The testcases officially require root via kyua
- The geom_gate(4) tests don't use the pidfile changes proposed in
https://reviews.freebsd.org/D4836 .
MFC after: 1 month
Sponsored by: EMC / Isilon Storage Division
suite as tests/sys/kern/unix_passfd_test
- Convert testcases to ATF
- Fix an alignment issues
- Mark rights_creds_payload(..) as an expected failure (see PR # 181741)
Based [in part] on the following Differential Revision:
https://reviews.freebsd.org/D689
MFC after: 1 week
Submitted by: markj
Sponsored by: EMC / Isilon Storage Division
Add two new LWPINFO flags: PL_FLAG_BORN and PL_FLAG_EXITED for reporting
thread creation and destruction. Newly created threads will stop to report
PL_FLAG_BORN before returning to userland and exiting threads will stop to
report PL_FLAG_EXIT before exiting completely. Both of these events are
only enabled and reported if PT_LWP_EVENTS is enabled on a process.
- Add #ifdef TEST_SEQ_PACKET_SOURCE_ADDRESS` for untestable code
because FreeBSD doesn't have a means to map source addresses for
SEQ_PACKET AF_UNIX sockets (paraphrased). Put pathname variable
under the #ifdef to mute another unused but set variable warning
MFC after: 1 week
Sponsored by: EMC / Isilon Storage Division
sane
- Push the kqueue(2) initialization down so the errno will correspond with
the failure instead of potentially being stomped on by functions called
by `PLAIN_REQUIRE_KERNEL_MODULE`
- Delete trailing whitespace
- Add spaces between braces for conditional and control blocks (for/if)
- Use err/errx instead of perror+printf+exit/printf+exit.
- Remove braces for single-line conditionals
Tested with and without -DDEBUG
MFC after: 1 week
Sponsored by: EMC / Isilon Storage Division
-Wunused-but-set-variable warnings reported by gcc 4.9
Remove some trailing whitespace as well
Tested with and without -DDEBUG
MFC after: 1 week
Sponsored by: EMC / Isilon Storage Division