iflib now supports mapping each (TX,RX) queue pair to the same CPU
(default), to separate CPUs, or to a pair of physical and logical CPUs
that share the same L2 cache. The mapping mechanism supports unequal
numbers of TX and RX queues, with the excess queues always being
mapped to consecutive physical CPUs. When the platform cannot
distinguish between physical and logical CPUs, all are treated as
physical CPUs. See the comment on get_cpuid_for_queue() for the
entire matrix.
The following device-specific tunables influence the mapping process:
dev.<device>.<unit>.iflib.core_offset (existing)
dev.<device>.<unit>.iflib.separate_txrx (existing)
dev.<device>.<unit>.iflib.use_logical_cores (new)
The following new, read-only sysctls provide visibility of the mapping
results:
dev.<device>.<unit>.iflib.{t,r}xq<n>.cpu
When an iflib driver allocates TX softirqs without providing reference
RX IRQs, iflib now binds those TX softirqs to CPUs using the above
mapping mechanism (that is, treats them as if they were TX IRQs).
Previously, such bindings were left up to the grouptaskqueue code and
thus fell outside of the iflib CPU mapping strategy.
Reviewed by: kbowling
Tested by: olivier, pkelsey
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D24094
After a vnode is recycled it can no longer be
acquired via vfs_hash_get() and, as such,
a delegation for the vnode cannot be recalled.
In the unlikely event that a delegation still
exists when the vnode is being recycled, return
the delegation since it will no longer be
recallable.
Until you have this patch in your NFSv4 client,
you should consider avoiding the use of delegations.
MFC after: 2 weeks
Without this patch, if a NFSv4 server recalled a
delegation when the file is not open, the renew
thread would block in the NFS VOP_INACTIVE()
trying to acquire the client state lock that it
already holds.
This patch fixes the problem by delaying the
vrele() call until after the client state
lock is released.
This bug has been in the NFSv4 client for
a long time, but since it only affects
delegation when recalled due to another
client opening the file, it got missed
during previous testing.
Until you have this patch in your client,
you should avoid the use of delegations.
MFC after: 2 weeks
Previously it depended on sysctl, which itself has no dependencies,
so rcorder(8) had a bit too much flexibility when choosing when to run
it. Make sure it runs just between 'fsck' and 'root'.
Reviewed By: jmg, imp
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D29748
Option `FIB_ALGO` gates new modular fib lookup functionality,
enabling more performant routing table lookups and improving
control plane convergence under the load.
Detailed feature description is available in D27401.
Reviewed By: olivier, gnn
Differential Revision: https://reviews.freebsd.org/D28434
Traditionally we had 2 sources of information whether the
added/delete route request targets network or a host route:
netmask (RTA_NETMASK) and RTF_HOST flag.
The former one is tricky: netmask can be empty or can explicitly
specify the host netmask. Parsing netmask sockaddr requires per-family
parsing and that's what rtsock code traditionally avoided. As a result,
consistency was not enforced and it was possible to specify network with
the RTF_HOST flag and vice versa.
Continue normalization efforts from D29826 and D29826 and ensure that
RTF_HOST flag always reflects host/network data from netmask field.
Differential Revision: https://reviews.freebsd.org/D29958
MFC after: 2 days
Most of the routing tests create per-test VNET, making
it harder to repeat the failure with CLI tools.
Provide an additional route/nexthop data on failure.
Differential Revision: https://reviews.freebsd.org/D29957
Reviewed by: kp
MFC after: 2 weeks
PIPE_MINDIRECT determines at what (blocking) write size one-copy
optimizations are applied in pipe(2) I/O. That threshold hasn't
been tuned since the 1990s when this code was originally
committed, and allowing run-time reconfiguration will make it
easier to assess whether contemporary microarchitectures would
prefer a different threshold.
(On our local RPi4 baords, the 8k default would ideally be at least
32k, but it's not clear how generalizable that observation is.)
MFC after: 3 weeks
Reviewers: jrtc27, arichardson
Differential Revision: https://reviews.freebsd.org/D29819
Previously we've returned the error from native ptrace(2), ENOMEM.
This confused Linux strace(2).
Reviewed By: emaste
Sponsored By: EPSRC
Differential Revision: https://reviews.freebsd.org/D29925
structure is zeroed, by setting the VNET after checking the mbuf count
for zero. It appears there are some cases with early interrupts on some
network devices which still trigger page-faults on accessing a NULL "ifp"
pointer before the TCP LRO control structure has been initialized.
This basically preserves the old behaviour, prior to
9ca874cf74 .
No functional change.
Reported by: rscheff@
Differential Revision: https://reviews.freebsd.org/D29564
MFC after: 2 weeks
Sponsored by: Mellanox Technologies // NVIDIA Networking
Allow new enclosure to replace previously existing one if there is
no completely unused table entry, same as it is done for devices.
If we can not process DPM due to corruption -- wipe it and restart
from scratch. Otherwise I don't see a way to recover persistence if
something go wrong and there is no BIOS to recover it for us.
Together this solves a problem that appeared when 9300-8i firmware
update to 16.00.10.00 somehow switched its mapping mode from Device
Persistence to Enclosure/Slot without wiping the DPM table. It made
HBA completely unusable, since overflowed and conflicting mapping
table was unable to map any of enclosures and so devices.
Also while there make some enclosure mapping errors more informative.
MFC after: 1 month
Sponsored by: iXsystems, Inc.
3e7bae0821 turns the BUS_READ_IVAR() failure from a warning into a
KASSERT. For certain PCI audio devices such like snd_csa(4) and
snd_emu10kx(4), the ac97_create() keeps the device handler generated
by device_add_child(pci_dev, "pcm"), which is not really a PCI device
handler. This in turn causes the subsequent pci_get_subdevice()
inside ac97_initmixer() triggering a panic.
This patch tries to put a bandaid for the aforementioned pcm device
children such that they can use the correct PCI handler(from parent)
to avoid a KASSERT panic in the INVARIANTS kernel.
Tested with: snd_csa(4), snd_ich(4), snd_emu10kx(4)
Reviewed by: imp
MFC after: 1 month
When the NFSv4.1/4.2 server does a callback to a client
on the back channel, it will use a session slot in the
back channel session. If the back channel has failed,
the callback will fail and, without this patch, the
session slot will not be released.
As more callbacks are attempted, all session slots
can become busy and then the nfsd thread gets stuck
waiting for a back channel session slot.
This patch frees the session slot upon callback
failure to avoid this problem.
Without this patch, the problem can be avoided by leaving
delegations disabled in the NFS server.
MFC after: 2 weeks
This fixes a bug in an earlier change to move tree rotation to
the end of the update where the step to make room for the new
preworld tree was deleting the old "current" tree instead of
the old "preworld" tree.
Reported by: olivier, dhw
Fixes: 0611aec3cf
MFC after: 2 weeks
This reverts a portion of 274579831b ("capsicum: Limit socket
operations in capability mode") as at least rtsol and dhcpcd rely on
being able to configure network interfaces while in capability mode.
Reported by: bapt, Greg V
Sponsored by: The FreeBSD Foundation
Changes to the LRO code have exposed a bug in iflib where devices
which are not capable of doing LRO are still calling
tcp_lro_flush_all(), even when they have not initialized the LRO
context. This used to be mostly harmless, but the LRO code now sets
the VNET based on the ifp in the lro context and will try to access it
through a NULL ifp resulting in a panic at boot.
To fix this, we unconditionally initializes LRO so that we have a
valid LRO context when calling tcp_lro_flush_all(). One alternative is
to check the device capabilities before calling tcp_lro_flush_all() or
adding a new state flag in the ctx. However, it seems unwise to add an
extra, mostly useless test for higher performance devices when we can
just initialize LRO for all devices.
Reviewed by: erj, hselasky, markj, olivier
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D29928
There are two kinds of routines in the driver that read statistics from
the hardware: the cxgbe_* variants read the per-port MPS/MAC registers
and the vi_* variants read the per-VI registers. They can be called
from the 1Hz callout or if_get_counter. All stats collection now takes
place under the callout lock and there is a new flag to indicate that
these routines should not access any hardware register.
MFC after: 2 weeks
Sponsored by: Chelsio Communications
Found in "Understanding and Detecting Disordered Error Handling with
Precise Function Pairing" by Qiushi Wu et al.
Reviewed by: imp, jhb
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29896
While the PV SCSI SG list can handle 512k of SG entries, it can only do
so for I/O that's aligned to 4k or better. newfs_msdos does unaligned
I/O, so triggers too long for host errors in cam when a 512k I/O is
attempted. Prefer power of 2 256k to the absolute maximum 508k, though
that can be revisited should the latter show to give significant
performance improvement.
MFC After: 3 days
Tested by: darius on discord (508k version of patch)
Sponsored by: Netflix
Fixes segfault with the command `bhyve -s 0,virtio-scsi`, which is used
by some third party software to probe bhyve for virtio-scsi support.
Reviewed by: jhb
MFC after: 1 day
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D29926
Since then, the FreeBSD USB stack has got proper USB RNDIS support.
PR: 254345
MFC after: 1 week
Sponsored by: Mellanox Technologies // NVIDIA Networking
Update the reference of which file to update in the doc tree when
bumping __FreeBSD_version.
This change is to catch up with commit f8fed61b80 in the doc repository.
MFC after: 3 days
Approved by: lwhsu (mentor)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D29920