Add tso_tcp_flags_mask_first_segment, tso_tcp_flags_mask_middle_segment,
and tso_tcp_flags_mask_last_segment sysctl-variables to control the
handling of TCP flags during TSO.
This allows to change the masks appropriate for classical ECN and to
configure appropriate masks for accurate ECN.
MFC after: 3 days
Sponsored by: Netflix
Add tso_tcp_flags_mask_first_segment, tso_tcp_flags_mask_middle_segment,
and tso_tcp_flags_mask_last_segment sysctl-variables to control the
handling of TCP flags during TSO.
This allows to change the masks appropriate for classical ECN and to
configure appropriate masks for accurate ECN.
Reviewed by: rrs
MFC after: 3 days
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D44259
I found I was getting constant device timeouts when doing anything
more complicated than a single SSH on laptop with RTL8811AU.
After digging into it, i found a variety of fun situations, including
traffic stalls that would recover w/ a shorter (1 second) USB transfer
timeout. However, the big one is a straight up hang of any TX endpoint
until the NIC was reset. The RX side kept going just fine; only the
TX endpoints would hang.
Reproducing it was easy - just start up a couple of traffic streams
on different WME AC's - eg a best effort + bulk transfer, like
browsing the web and doing an ssh clone - throw in a ping -i 0.1
to your gateway, and it would very quickly hit device timeouts every
couple of seconds.
I put everything into a single TX EP and the hangs went away.
Well, mostly.
So after some MORE digging, I found that this driver isn't checking
if the transfers are going into the correct EPs for the packet
WME access category / 802.11 TID; and would frequently be able
to schedule multiple transfers into the same endpoint.
Then there's a second problem - there's an array of endpoints
used for setting up the USB device, with .endpoint = UE_ADDR_ANY,
however they're also being setup with the same endpoint configured
in multiple transfer configs. Eg, a NIC with 3 or 4 bulk TX endpoints
will configure the BK and BE endpoints with the same physical endpoint
ID. This also leads to timed out transfers.
My /guess/ was that the firmware isn't happy with one or both of the
above, and so I solved both.
* drop the USB transfer timeout to 1 second, not 5 seconds -
that way we'll either get a 1 second traffic pause and USB transfer
failure, or a 5 second device timeout. Having both the TX timeout
and the USB transfer timeout made recovery from a USB transfer
timeout (without a NIC reset) almost impossible.
* enforce one transfer per endpoint;
* separate pending/active buffer tracking per endpoint;
* each endpoint now has its own TX callback to make sure the queue /
end point ID is known;
* and only frames from a given endpoint pending queue is going
into the active queue and into that endpoint.
* Finally, create a local wme2qid array and populate it with the
endpoint mapping that ensures unique physical endpoint use.
Locally tested:
* rtl8812AU, 11n STA mode
* rtl8192EU, 11n STA mode (with diffs to fix the channel config / power
timeouts.)
Differential Revision: https://reviews.freebsd.org/D47522
Add tso_tcp_flags_mask_first_segment, tso_tcp_flags_mask_middle_segment,
and tso_tcp_flags_mask_last_segment sysctl-variables to control the
handling of TCP flags during TSO.
This allows to fix the masks appropriate for classical ECN and to
configure appropriate masks for accurate ECN.
Michael notes emperically 82599 has an unexpected middle mask:
Chip First Middle Last
82599 0xFF6 0xFF6 0xF7F
which should be fixed up to 0xF76 (RFC 3168) in a future commit.
Reviewed by: rrs, rscheff
MFC after: 3 days
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D44258
Use hc_ prefix instead of rmx_. The latter stands for "route metrix" and
is an artifact from the 90-ies, when TCP caching was embedded into the
routing table. The rename should have happened back in 97d8d152c2.
No functional change. Done with sed(1) command:
s/rmx_(mtu|ssthresh|rtt|rttvar|cwnd|sendpipe|recvpipe|granularity|expire|q|hits|updates)/hc_\1/g
Once we set that we're doing the inversion workaround, there's no sense
continuing to search for the inversion workaround.
Sponsored by: Netflix
Reviewed by: adrian
Differential Revision: https://reviews.freebsd.org/D47686
This block has a lot of nesting, not helped by two adjacent nested
blocks involving _POSIX_C_SOURCE, with only the inner one commented,
looking like it's the end of the outer one. Comment the outer one as
well so it's not quite so hard to figure out.
MFC after: 1 week
The previous change committed a preliminary version of the change to
use iterators to free page sequences. This updates to what was
intended to be the final version.
Reviewed by: markj (previous version)
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D46724
Use pctrie iterators for removing some page sequences from radix
trees, to avoid repeated searches from the tree root.
Rename vm_page_object_remove to vm_page_remove_radixdone, and remove
from it the responsibility for removing a page from its radix tree,
and pass that responsibility on to its callers.
For one of those callers, vm_page_rename, pass a pages pctrie_iter,
rather than a page, and use the iterator to remove the page from its
radix tree.
Define functions vm_page_iter_remove() and vm_page_iter_free() that
are like vm_page_remove() and vm_page_free(), respectively, except
that they take an iterator as parameter rather than a page, and use
the iterator to remove the page from the radix tree instead of
searching the radix tree. Function vm_page_iter_free() assumes that
the page is associated with an object, and calls
vm_page_free_object_prep to do the part of vm_page_free_prep that is
object-related.
In functions vm_object_split and vm_object_collapse_scan, use a
pctrie_iter to walk over the pages of the object, and use
vm_page_rename and vm_radix_iter_remove modify the radix tree without
searching for pages. In vm_object_page_remove and _kmem_unback, use a
pctrie_iter and vm_page_iter_free to remove the page from the radix
tree.
Reviewed by: markj (prevoius version)
Tested by: pho
Differential Revision: https://reviews.freebsd.org/D46724
If during ip_forward() we find a blackhole (or reject) route we should stop
processing and count this in the 'cantforward' counter, just like we already do
for IPv6.
Blackhole routes are set to use the loopback interface, so we don't actually
incorrectly forward traffic, but we do fail to count it as unroutable.
Test this, both for IPv4 and IPv6.
Reviewed by: melifaro
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D47529
Disable the IP/IP6/ICMP/... counter probe points by default.
They are kept enabled in debug builds, and can be enabled with
'options KDTRACE_MIB_SDT'.
Requested by: glebius
Reviewed by: glebius
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D47657
struct cdev has members of type struct timespec. Include sys/_timespec.h
to so we don't need to rely on namespace pollution to define it.
Sponsored by: Netflix
The previous maximum value for the upper watermark was based on the old
value of MAXPHYS. Raise it to allow more parallel I/O on large systems.
This is still a rather flawed mechanism since it's applied without
regard to the number of filesystems or block devices between which this
mechanism sits, but we might as well bump the limits at this point, as
they haven't been revised in quite a long time.
Reviewed by: imp, kib
MFC after: 2 weeks
Fixes: cd85379104 ("Make MAXPHYS tunable. Bump MAXPHYS to 1M.")
Differential Revision: https://reviews.freebsd.org/D47398
atomic(9) primitives are documented as operating on unsigned types.
Here, we need a cast to avoid a tautological comparison.
Add a regression test for access(2), which was affected by the bug.
Reported by: NetApp
Reviewed by: kib
Fixes: e511bd1406 ("vfs: fully lockless v_writecount adjustment")
MFC after: 1 week
Sponsored by: Klara, Inc.
Sponsored by: NetApp, Inc.
Differential Revision: https://reviews.freebsd.org/D47672
The logic in update_special_reg_field was reversed. Fix by swapping the
order of the arguments.
PR: 282505
Fixes: f1fb1d5c90 ("arm64: Support more ID register field types")
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D47437
In vfp_save_state_savectx we check if the pcb has a NULL vfp state.
When it's called multiple times with the same pcb then we can panic
because the vfp state has been set.
Weaken the requirement for the state pointer to be NULL by also
allowing it to point to the pcb vfp state area we are about to use.
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D47237
When multiple IRQs are specified in a single resource then we only
check the first. Change this to check all interrupts for the value
we expect to find.
Without this we may still enable the interrupt, but it can have the
wrong polatiry or trigger. This can cause an interrupt storm if the
interrupt was configured with a level trigger when it should have
been an edge.
PR: 282241
Reported by: trasz
Sponsored by: Arm Ltd
Differential Revision: https://reviews.freebsd.org/D47487
For current architectures, these are just aliases for the existing
operation on the relevant scalar integer.
Reviewed by: imp, kib
Obtained from: CheriBSD
Sponsored by: AFRL, DARPA
Differential Revision: https://reviews.freebsd.org/D47631
Use a memory barrier after calling the existing atomic_testandset_long
rather than using the fcmpset-based fallback version from
<sys/_atomic_subword.h>.
Reviewed by: kib
Sponsored by: AFRL, DARPA
Differential Revision: https://reviews.freebsd.org/D47628
These use amoor and amoand rather than a loop.
Also define atomic_testandset_acq_(64|long) using amoor.aq.
Reviewed by: mhorne, kib
Sponsored by: AFRL, DARPA
Differential Revision: https://reviews.freebsd.org/D47627
The driver wasn't stable - it would start fine, but during scan
it would eventually hang and no further command endpoint transfers
would complete.
After adding some debugging and looking at the logs I noticed that
things went sideways once a /data/ frame was sent. The channel
change config happened between the data frame being sent and
being completed.
My guess is that the firmware doesn't like a channel change
and reset whilst there's pending data frames. Checking the Linux
driver I found that it was doing a flush before a channel change,
and we're doing it afterwards. This acts like a fence around
ensuring scheduled TX work has completed. In net80211 the
transmit path and the control path aren't serialised, so it's
very often the case that ioctls, state changes, etc occur
whilst in parallel there are frame transmits being scheduled.
This seems to happen more frequently on a more recent, high core
(8) machine with XHCI. I remember testing this driver years ago
on single and dual core CPU laptops with no problems.
So, add some flushes - before a channel change, and during
a transition to AUTH when the BSS config is being programmed into
the firmware. These two fences seem enough to reliably
associate as a 2GHz and 5GHz STA.
Note that this isn't entirely blocking all newly queued
transmit work from occuring until after the NIC has finished
configuration. That will need some further investigation.
Locally tested:
* Wistron NuWeb AR5523 dual-band NIC, STA mode, 2/5GHz
Differential Revision: https://reviews.freebsd.org/D47655
Use the correct ID, as I have one of these NICs.
Add the previous one back in case it's out there in the wild.
@emaste did a bit of a dig into the product numbers.
@sam did change the ID from 0x0828 -> 082a in a commit
a long while back. It's worth reading the code review for
further details.
However, I do have one of these NICs and I verified that
it indeed has the given ID, and with some follow-up work
to fix some race conditions, it works fine in 2GHz 11bg
and 5GHz 11a operation.
Differential Revision: https://reviews.freebsd.org/D47654
Obtained from: Linux, drivers/net/wireless/ath/ar5523/ar5523.c
This will be useful when fixing up the sequence number generation
and checks, as the rules around how sequence numbers are generated
have been clarified in 802.11-2016 and later. QoS-NULL frames are
explicitly marked as "any sequence number".
But for now, just create a macro and use it in the one place
it's currently being used as a check - ath(4).
* Add IEEE80211_IS_QOS_NULL().
* Change the "will this frame go into the TX block-ack window" check
in the ath(4) transmit path. Note this changes the check to be
more specific, but both paths already had previous checks to ensure
they're QoS data frames.
Locally tested:
* ath(4), AR9380, STA mode w/ AMPDU TX/RX enabled and negotiated
Differential Revision: https://reviews.freebsd.org/D47645
The rtsock_msg_buffer() can be called without walkarg, just to calculate
required length. It can also be called with a degenerate walkarg, that
doesn't have a w_req. The latter happens when the function is called from
update_rtm_from_info() for the second time.
Zero init walkarg in update_rtm_from_info() and don't pass random stack
garbage as w_req.
In rtsock_msg_buffer() initialize compat32 boolean only once and take of
possible empty w_req. Simplify the rest of code once compat32 is already
set.
Reviewed by: melifaro
Differential Revision: https://reviews.freebsd.org/D47662
Reported-by: syzbot+d4a2682059e23179e76e@syzkaller.appspotmail.com
Reported-by: syzbot+66d7c9b3062e27a56f3f@syzkaller.appspotmail.com
pflow opens sockets in the kernel to transmit netflow information.
If this is done in a (vnet) jail these sockets end up preventing the removal of
the jail. The VNET_SYSUNINIT() vnet_pflowdetach() function doesn't get called,
but that's the function that would remove the sockets.
Install a callback on the PR_METHOD_REMOVE jail callback and close the sockets
there. This ensures that the jail can get cleaned up.
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D47545
HIDRAW_GET_REPORT ioctl is documented to update hgd_actlen on return
with the number of bytes copied. It does not do this.
Reviewed by: wulf
PR: 282790
MFC after: 1 week
USB_GET_REPORT ioctl is documented to update ugd_actlen on return with
the number of bytes copied. It does not do this.
Reviewed by: wulf
PR: 282790
MFC after: 1 week
This function is only intended for the internal use of the VM system.
Reviewed by: dougm, kib, markj
Differential Revision: https://reviews.freebsd.org/D47644
This is unused by anything in the tree; anything using it should be
instead using one of the frame type macros.
Differential Revision: https://reviews.freebsd.org/D47503
* Add macros for the management and control frame type checks that
I've come across in the drivers.
* Delete some now old code (eg ath's ieee80211_is_action()) as there's now
a macro for it.
Local testing:
* not yet, I have a lot of wifi devices to find and test against
Differential Revision: https://reviews.freebsd.org/D47500