Fix for hexdump -s not being able to skip files residing in
pseudo-filesystems that advertise a zero size value.
Historically, many pseudofs-based filesystems (e.g., procfs) report
a va_size of 0 for numerous files classified as regular files.
Typically, the contents of these files are generated on demand
from kernel data as sbuf(9) strings at the time they are read.
Accurately reporting the size of these files is challenging, as it
often involves generating their contents. These pseudofs implementations
frequently report the size as 0. This is a historical behavior and also
aligns with Linux behavior. To maintain compatibility, we have chosen
to preserve the existing behavior and address it in the userland
application, rather than modifying it in the kernel (by updating the
correct value for va_size).
PR: bin/276106
MFC after: 1 week
explicit_loader_fd should have been initialized to -1, not 0, but my
last round of testing was only with -l...
Fixes: bf7c4fcbbb ("bhyveload: hold /boot and do relative [...]")
Pointy hat: kevans
The next change will push bhyveload into capability mode right after we
allocate vcpu state, before we've setup or entered the loader, to limit
the surface area that a rogue loader script can touch.
With an explicit -l loader, we don't need to preopen /boot because
changing interpreters isn't allowed. We'll just dlopen() entirely in
advance in that case to eliminate some complexity.
Reviewed by: allanjude (earlier version), markj
Differential Revision: https://reviews.freebsd.org/D43285
Don't allow lookups from the loader scripts, which in rare cases may be
in guest control depending on the setup, to leave the specified host
root. Open the root dir and strictly do RESOLVE_BENEATH lookups from
there.
cb_open() has been restructured a bit to work nicely with this, using
fdopendir() in the directory case and just using the fd we already
opened in the regular file case.
hostbase_open() was split out to provide an obvious place to apply
rights(4) if that's something we care to do.
Reviewed by: allanjude (earlier version), markj
Differential Revision: https://reviews.freebsd.org/D43284
Replace the DOOMED flag with a transient DETACHING flag that is cleared
when VI is detached. This fixes VI reattach when only the VI and not
the parent nexus is detached. The old flag was never cleared and
prevented subsequent synch op's related to the VI.
PR: 275260
Reviewed by: jhb
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D43287
Sponsored by: Chelsio Communications
We don't need make_check to work in a fmake world anymore (nor have we
in the past decade). Just remove it here.
Note in passing it's been 10 years since we've added a new test here and
maybe we're past the need for this part of the build (or need to revamp
it to include all the features added to bmake since 2016 that the build
system silently depends on).
Sponsored by: Netflix
Reviewed by: brooks
Pull Request: https://github.com/freebsd/freebsd-src/pull/980
We used to exclude a lot of extra hooks to allow for local
customizations of the build which couldn't be done outside of sys.mk,
but excluded that support for fmake. Remove those hacks.
Sponsored by: Netflix
Reviewed by: brooks
Pull Request: https://github.com/freebsd/freebsd-src/pull/980
There's no need to support fmake anymore. Always assume we can use
bmake's :tA modifier. The ports tree hasn't supported fmake in about a
decade anyway. Simplify here.
Sponsored by: Netflix
Reviewed by: brooks, emaste
Pull Request: https://github.com/freebsd/freebsd-src/pull/980
Commit 83cb5bae96 added a check for MAKE_VERSION being new enough to
handle CTFCONVERT_CMD being an empty string since fmake of the time
didn't support it until just a few commits before 83cb5bae96. Later,
it was augmented with a check for .PARSEDIR to see if bmake was
running. fmake and boostrapping from fmake haven't worked in maybe 6 or
8 years, so we can remove the check here. If you want to update from
your FreeBSD 7 or FreeBSD 8 systems, you're even more out of luck than
you were before and must jump to an older version before jumping to
current for the source upgrade path.
Sponsored by: Netflix
Reviewed by: brooks, emaste
Pull Request: https://github.com/freebsd/freebsd-src/pull/980
fmake has been out of the tree for 10 years / 5 major releases now. The
need to bootstrap from it has been gone for at least 6 if not 8
years. While we may still need to bootstrap bmake, we don't need to do
it from fmake, so only retail the infrastructure to update from bmake to
bmake. Retain, for now, the WANT_MAKE_VERSION stuff, though we're always
up to date when building from supported and quasi-supported platforms.
Also remove all the checks to see if .PARSEDIR is defined. It is always
defined and was an early, fail-safe way to tell fmake from bmake during
the transition.
Adjust comments that refer to old fmake and remove those no longer
relevant.
Sponsored by: Netflix
Reviewed by: brooks
Pull Request: https://github.com/freebsd/freebsd-src/pull/980
Explicit Congestion Notification (ECN) is a mechanism that allows
end-to-end notification of network congestion without dropping packets
by explicitly setting the ECN code point (2 bits).
Per RFC 8087, section 3.5, network devices should not be configured to
change the ECN code point in the packets that they forward, except to
set the CE (Congestion Experienced) code point ('11') to signal
incipient congestion.
The current commit adds an -E flag to traceroute that crafts a packet
with an ECT(1) code point ('01').
If the packet is received back with a zero ECN code point ('00'), it
outputs that the hop in question erases or "bleaches" the ECN code point
values. Bleaching may occur for various reasons (including normalizing
packets to hide which equipment supports ECN). This policy prevents the
use of ECN by applications.
If the packet is received back with an all-ones ECN code point ('11'),
it outputs that the hop in question is experiencing "congestion".
If the packet is received back with a different ECN code point ('10'),
it outputs that the hop in question changes or "mangles" the ECN code
point values.
If the packet is received with the same ECN code point that was sent
('01'), it outputs that the hop has "passed" the ECN bits appropriately.
Inspired by: Darwin
Reviewed by: imp, markj
MFC after: 1 month
Pull Request: https://github.com/freebsd/freebsd-src/pull/879
Explicit Congestion Notification (ECN) is a mechanism that allows
end-to-end notification of network congestion without dropping packets
by explicitly setting the ECN code point (2 bits).
Per RFC 8087, section 3.5, network devices should not be configured to
change the ECN code point in the packets that they forward, except to
set the CE (Congestion Experienced) code point ('11') to signal
incipient congestion.
The current commit adds an -E flag to traceroute6 that crafts a packet
with an ECT(1) code point ('01').
If the packet is received back with a zero ECN code point ('00'), it
outputs that the hop in question erases or "bleaches" the ECN code point
values. Bleaching may occur for various reasons (including normalizing
packets to hide which equipment supports ECN). This policy prevents the
use of ECN by applications.
If the packet is received back with an all-ones ECN code point ('11'),
it outputs that the hop in question is experiencing "congestion".
If the packet is received back with a different ECN code point ('10'),
it outputs that the hop in question changes or "mangles" the ECN code
point values.
If the packet is received with the same ECN code point that was sent
('01'), it outputs that the hop has "passed" the ECN bits appropriately.
Inspired by: Darwin
Reviewed by: imp, markj
MFC after: 1 month
Pull Request: https://github.com/freebsd/freebsd-src/pull/879
Define a mask for the code point used for ECN in the Traffic Class field
(2 bits) of an IPv6 header.
BE: 0 0 3 0 0 0 0 0
Bit: 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version| Traffic Class | Flow Label |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| ... |
For BE (Big Endian), or network-byte order, this corresponds to 0x00300000.
For Little Endian, it corresponds to 0x00003000.
Reviewed by: imp, markj
MFC after: 1 week
Pull Request: https://github.com/freebsd/freebsd-src/pull/879
You can't ever safely map a single page and then map a superpage sized
mapping over it with MAP_FIXED. Even in a single-threaded program, ASLR
might mean you land too close to another mapping and on CheriBSD we
don't allow the initial reservation to grow because doing so requires
program changes that are hard to automate.
To avoid this, map the entire region we want to use upfront.
Reviewed by: markj
Obtained from: CheriBSD
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D43282
You can't return 0 and not write the mode if mode_p is non-NULL. That
violates the API contract and in common usage leaves stack trash in
*mode_p.
The acl_equiv_mode_test test passed by accident.
Reviewed by: kevans, markj
Sponsored by: DARPA
Differential Revision: https://reviews.freebsd.org/D43278
This avoids a mutex reinitialization when the VI is detached and
reattached.
Fixes: 516fe911a6 cxgbe(4): Always use the per-VI callout to read interface stats.
MFC after: 1 week
Sponsored by: Chelsio Communications
Commit 2619c5ccfe ("Avoid waiting on physical allocations that can't
possibly be satisfied") changed the return value from bool to errno.
Adjust the function description to match reality.
i386 kernels without 'options PAE' will still use PAE page tables if
the CPU supports PAE both to support larger amounts of RAM and for
PG_NX permissions. However, to avoid changing the API, bus_addr_t and
related constants (e.g. BUS_SPACE_MAXADDR) are still limited to
32 bits.
To cope with this, the x86 bus_dma code included an extra check to
bounce requests for addresses above BUS_SPACE_MAXADDR. This check was
elided (probably because it looks always-true on its face and had no
comment explaining its purpose) in recent refactoring. To fix,
restore a custom address-validation function for i386 kernels without
options PAE that includes this check.
Reported by: ci.freebsd.org
Reviewed by: markj
Fixes: 3933ff56f9 busdma: tidy bus_dma_run_filter() functions
Differential Revision: https://reviews.freebsd.org/D43277
Netlink should return a very simple control data on every recvmsg(2)
syscall. This data is associated with a syscall, not with an nlmsg,
neither with internal our internal representation (nl_bufs). There is
no need to pre-allocate it in non-sleepable context and attach to
nl_buf. Allocate right in the syscall with M_WAITOK. This also
shaves lots of code and simplifies things.
Reviewed by: melifaro
Differential Revision: https://reviews.freebsd.org/D42989
The previous commit conservatively mimiced operation of soreceive_generic().
The new code does two things:
- parses Netlink message headers and always returns at least one full nlmsg
- hides nl_buf boundaries from the userland, copying out several at once
More details can be found in the large comment block added.
Reviewed by: melifaro
Differential Revision: https://reviews.freebsd.org/D42785
Implement Netlink socket receive buffer as a simple TAILQ of nl_buf's,
same part of struct sockbuf that is used for send buffer already.
This shaves a lot of code and a lot of extra processing. The pcb rids
of the I/O queues as the socket buffer is exactly the queue. The
message writer is simplified a lot, as we now always deal with linear
buf. Notion of different buffer types goes away as way as different
kinds of writers. The only things remaining are: a socket writer and
a group writer.
The impact on the network stack is that we no longer use mbufs, so
a workaround from d187154750 disappears.
Note on message throttling. Now the taskqueue throttling mechanism
needs to look at both socket buffers protected by their respective
locks and on flags in the pcb that are protected by the pcb lock.
There is definitely some room for optimization, but this changes tries
to preserve as much as possible.
Note on new nl_soreceive(). It emulates soreceive_generic(). It
must undergo further optimization, see large comment put in there.
Note on tests/sys/netlink/test_netlink_message_writer.py. This test
boiled down almost to nothing with mbufs removed. However, I left
it with minimal functionality (it basically checks that allocating N
bytes we get N bytes) as it is one of not so many examples of ktest
framework that allows to test KPIs with python.
Note on Linux support. It got much simplier: Netlink message writer
loses notion of Linux support lifetime, it is same regardless of
process ABI. On socket write from Linux process we perform
conversion immediately in nl_receive_message() and on an output
conversion to Linux happens in in nl_send_one(). XXX: both
conversions use M_NOWAIT allocation, which used to be the case
before this change, too.
Reviewed by: melifaro
Differential Revision: https://reviews.freebsd.org/D42524
With upcoming protocol specific socket buffer for Netlink we need some
additional tests that cover basic socket operations, w/o much of actual
Netlink knowledge. Following tests are performed:
1) Overflow. If an application keeps sending messages to the kernel,
but doesn't read out the replies, then first the receive buffer shall
fill and after that further messages from applications will be queued
on the send buffer until it is filled. After that socket operations
should block. However, reading from the receive buffer some data should
wake up the taskqueue and the send buffer should start draining again.
2) Peek & trunc. Check that socket correctly reports amount of readable
data with MSG_PEEK & MSG_TRUNC. This is typical pattern of Netlink apps.
3) Sizes. Check that zero size read doesn't affect the socket, undersize
read will return one truncated message and the message is removed from
the buffer. Check that large buffer will be filled in one read, without
any boundaries imposed by internal representation of the buffer. Check
that any meaningful read is amended with control data if requested so.
Reviewed by: melifaro
Differential Revision: https://reviews.freebsd.org/D42525
These functions work with a buffer embedded into nl_writer, which
is going to go opaque with upcoming changes. Make them private to
the netlink module. No functional change intended.
Reviewed by: melifaro
Differential Revision: https://reviews.freebsd.org/D42523
Instead of using generic socket code, create Netlink specific socket
buffer. It is a simple TAILQ of writes that came from userland. This
saves us one memory allocation that could fail and one memory copy.
Reviewed by: melifaro
Differential Revision: https://reviews.freebsd.org/D42522
The glibc fts_open() callback type does not have the second const
qualifier and it appears that Clang 16 errors by default for mismatched
function pointer types. Add an ifdef to handle this case.
This is more intuitive than having to run `pciconf -l` and figure out
the bus/slot/func entry manually.
Reviewed by: markj
Sponsored by; The FreeBSD Foundation
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D43270
Do not use the cached route if the destination isn't the same.
This fix a problem where an UDP packet will be sent via the wrong route
and interface if a previous one was sent via them.
PR: 275774
Reviewed by: glebius, tuexen
Sponsored by: Beckhoff Automation GmbH & Co. KG
The header gives an offset in 32-bit words, and the translator is
supposed to convert that to a byte count. But, the conversion was
incorrect.
Reviewed by: tuexen, rscheff
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D43264
sbuf_cpy() resets the sbuf state, which is wrong for sbufs allocated by
sbuf_new_for_sysctl(). In particular, this code triggers an assertion
failure in sbuf_clear().
Simplify by just using sysctl_handle_string() for both reading and
setting the tunable.
Apply to FreeBSD directly since this bug causes "sysctl -a" to crash the
kernel.
PR: 276039
Reported by: pho
Reviewed by: mav
Pull Request: https://github.com/openzfs/zfs/pull/15719
We support C89 files, but compile everything in the tree with C99 or
newer. By compiling these -ansi, that will force C89 which doesn't
understand inline. All our header files must use __inline instead of
inline when they define inline functions.
Sponsored by: Netflix
Reviewed by: emaste
Differential Revision: https://reviews.freebsd.org/D43235
The last bump to 8G was caused by all the new symbols and debug info we
install by default. Turning those off gets us back to a reasonable size:
Filesystem Size Used Avail Capacity Mounted on
/dev/md1s1a 927M 771M 83M 90% /mnt
Sponsored by: Netflix
Reviewed by: jlduran@gmail.com
Differential Revision: https://reviews.freebsd.org/D43232
When copy_file_range(2) was first being developed,
*inoffp + len had to be <= infile_size or an error was
returned. This semantic (as defined by Linux) changed
to allow *inoffp + len to be greater than infile_size and
the copy would end at *inoffp + infile_size.
Unfortunately, the code that decided if the outfd should
be truncated in length did not get updated for this
semantics change.
As such, if a copy_file_range(2) is done, where infile_size - *inoffp
is less that outfile_size but len is large, the outfd file is truncated
when it should not be. (The semantics for this for Linux is to not
truncate outfd in this case.)
This patch fixes the problem. I believe the calculation is safe
for all non-negative values of outsize, *outoffp, *inoffp and insize,
which should be ok, since they are all guaranteed to be non-negative.
Note that this bug is not observed over NFSv4.2, since it truncates
len to infile_size - *inoffp.
PR: 276045
Reviewed by: asomers, kib
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D43258