mirror of https://git.FreeBSD.org/src.git synced 2024-11-21 07:15:49 +00:00
Commit Graph

153167 Commits

Author SHA1 Message Date
Kevin Bowling
ab540d44ba igc: sysctl for TCP flag handling during TSO
Add tso_tcp_flags_mask_first_segment, tso_tcp_flags_mask_middle_segment,
and tso_tcp_flags_mask_last_segment sysctl-variables to control the
handling of TCP flags during TSO.

This allows changing the masks to suit classical ECN and configuring
appropriate masks for accurate ECN.

MFC after:	3 days
Sponsored by:	Netflix
2024-11-20 19:38:01 -07:00
Michael Tuexen
90853dfac8 e1000: sysctl for TCP flag handling during TSO
Add tso_tcp_flags_mask_first_segment, tso_tcp_flags_mask_middle_segment,
and tso_tcp_flags_mask_last_segment sysctl-variables to control the
handling of TCP flags during TSO.

This allows changing the masks to suit classical ECN and configuring
appropriate masks for accurate ECN.

Reviewed by:	rrs
MFC after:	3 days
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D44259
2024-11-20 19:23:55 -07:00
Adrian Chadd
d99eb8230e rtwn: change the USB TX transfers to only do one pending transfer per endpoint
I found I was getting constant device timeouts when doing anything
more complicated than a single SSH session on a laptop with an RTL8811AU.

After digging into it, I found a variety of fun situations, including
traffic stalls that would recover w/ a shorter (1 second) USB transfer
timeout.  However, the big one is a straight up hang of any TX endpoint
until the NIC was reset.  The RX side kept going just fine; only the
TX endpoints would hang.

Reproducing it was easy - just start up a couple of traffic streams
on different WME AC's - eg a best effort + bulk transfer, like
browsing the web and doing an ssh clone - throw in a ping -i 0.1
to your gateway, and it would very quickly hit device timeouts every
couple of seconds.

I put everything into a single TX EP and the hangs went away.
Well, mostly.

So after some MORE digging, I found that this driver isn't checking
if the transfers are going into the correct EPs for the packet
WME access category / 802.11 TID; and would frequently be able
to schedule multiple transfers into the same endpoint.

Then there's a second problem - there's an array of endpoints
used for setting up the USB device, with .endpoint = UE_ADDR_ANY,
however they're also being set up with the same endpoint configured
in multiple transfer configs.  E.g., a NIC with 3 or 4 bulk TX endpoints
will configure the BK and BE endpoints with the same physical endpoint
ID.  This also leads to timed out transfers.

My /guess/ was that the firmware isn't happy with one or both of the
above, and so I solved both.

* drop the USB transfer timeout to 1 second, not 5 seconds -
  that way we'll either get a 1 second traffic pause and USB transfer
  failure, or a 5 second device timeout.  Having both the TX timeout
  and the USB transfer timeout made recovery from a USB transfer
  timeout (without a NIC reset) almost impossible.

* enforce one transfer per endpoint;
* separate pending/active buffer tracking per endpoint;
* each endpoint now has its own TX callback to make sure the queue /
  end point ID is known;
* and only frames from a given endpoint's pending queue go
  into the active queue and into that endpoint.
* Finally, create a local wme2qid array and populate it with the
  endpoint mapping that ensures unique physical endpoint use.

Locally tested:

* rtl8812AU, 11n STA mode
* rtl8192EU, 11n STA mode (with diffs to fix the channel config / power
  timeouts.)

Differential Revision: https://reviews.freebsd.org/D47522
2024-11-20 17:56:56 -08:00
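The "one transfer per endpoint" rule from the rtwn commit above can be pictured with a small model. This is only a sketch under stated assumptions, not the driver's code: `sk_ep`, `sk_ep_enqueue`, `sk_ep_start` and `sk_ep_complete` are hypothetical names, frames are plain integers, and the pending queue is a fixed-size ring.

```c
#include <stdbool.h>
#include <stddef.h>

#define SK_QLEN 8	/* hypothetical pending-queue depth */

/* Hypothetical per-endpoint state: its own pending ring plus one active slot. */
struct sk_ep {
	int	pending[SK_QLEN];
	size_t	head;		/* index of the oldest pending frame */
	size_t	count;		/* number of pending frames */
	bool	busy;		/* true while a transfer is in flight */
	int	active;		/* frame currently in flight, valid if busy */
};

/* Queue a frame on this endpoint; fails when the ring is full. */
bool
sk_ep_enqueue(struct sk_ep *ep, int frame)
{
	if (ep->count == SK_QLEN)
		return (false);
	ep->pending[(ep->head + ep->count) % SK_QLEN] = frame;
	ep->count++;
	return (true);
}

/* Start a transfer only if none is in flight on this endpoint. */
bool
sk_ep_start(struct sk_ep *ep)
{
	if (ep->busy || ep->count == 0)
		return (false);
	ep->active = ep->pending[ep->head];
	ep->head = (ep->head + 1) % SK_QLEN;
	ep->count--;
	ep->busy = true;
	return (true);
}

/* Completion callback for this endpoint: free the slot, start the next frame. */
void
sk_ep_complete(struct sk_ep *ep)
{
	ep->busy = false;
	(void)sk_ep_start(ep);
}
```

Because each endpoint owns its queue and its completion path, a frame queued for one endpoint can never be submitted on another, and a second submission is refused until the first completes.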
Michael Tuexen
eea2e089f8 ixgbe: sysctl for TCP flag handling during TSO
Add tso_tcp_flags_mask_first_segment, tso_tcp_flags_mask_middle_segment,
and tso_tcp_flags_mask_last_segment sysctl-variables to control the
handling of TCP flags during TSO.

This allows fixing the masks for classical ECN and configuring
appropriate masks for accurate ECN.

Michael notes that, empirically, the 82599 has an unexpected middle mask:
Chip  First Middle Last
82599 0xFF6 0xFF6  0xF7F

which should be fixed up to 0xF76 (RFC 3168) in a future commit.

Reviewed by:	rrs, rscheff
MFC after:	3 days
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D44258
2024-11-20 18:25:06 -07:00
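To make the mask values in the ixgbe commit above concrete, the sketch below ANDs a TSO burst's TCP flags with a per-position mask. It is an illustration only: the `sk_`/`SK_` names are hypothetical, the flag constants follow the standard TCP header bit layout, and none of this is the driver's or the sysctl handler's actual code.

```c
#include <stdint.h>

/* Standard TCP flag bit positions within the 12-bit flags field. */
#define SK_TH_FIN	0x001
#define SK_TH_PSH	0x008
#define SK_TH_ACK	0x010
#define SK_TH_ECE	0x040
#define SK_TH_CWR	0x080

enum sk_seg_pos { SK_SEG_FIRST, SK_SEG_MIDDLE, SK_SEG_LAST };

/* Hypothetical holder for the three mask values named in the commit. */
struct sk_tso_masks {
	uint16_t first;
	uint16_t middle;
	uint16_t last;
};

/* Flags that survive on one segment of the burst, given its position. */
uint16_t
sk_tso_segment_flags(const struct sk_tso_masks *m, enum sk_seg_pos pos,
    uint16_t flags)
{
	switch (pos) {
	case SK_SEG_FIRST:
		return (flags & m->first);
	case SK_SEG_MIDDLE:
		return (flags & m->middle);
	default:
		return (flags & m->last);
	}
}
```

With the quoted 82599 values, a burst carrying ACK|PSH|FIN|CWR (0x099) keeps CWR on middle segments (0x099 & 0xFF6 = 0x090), whereas the proposed 0xF76 middle mask leaves only ACK (0x010).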
Gleb Smirnoff
3789810845 tcp: avoid bcopy() in tcp_mss_update() 2024-11-20 16:37:24 -08:00
Gleb Smirnoff
2944a888ea tcp: remove so != NULL check
In the modern FreeBSD network stack a socket outlives its tcpcb.
2024-11-20 16:37:18 -08:00
Gleb Smirnoff
b80c06cc0a tcp: use const argument in the TCP hostcache KPI
The hostcache can't modify tcpcb, inpcb or connection info.
2024-11-20 16:30:42 -08:00
Gleb Smirnoff
09000cc133 tcp: mechanically rename hostcache metrics structure fields
Use hc_ prefix instead of rmx_.  The latter stands for "route metrics" and
is an artifact from the '90s, when TCP caching was embedded into the
routing table.  The rename should have happened back in 97d8d152c2.

No functional change. Done with sed(1) command:

s/rmx_(mtu|ssthresh|rtt|rttvar|cwnd|sendpipe|recvpipe|granularity|expire|q|hits|updates)/hc_\1/g
2024-11-20 16:29:00 -08:00
Jose Luis Duran
accf71534c geom: Allow BSD type '!0' partitions
Allow the creation of '!0' partition types.

Fix it by not considering "0" an invalid partition type.

Reviewed by:	emaste
Approved by:	emaste (mentor)
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D47652
2024-11-20 22:28:57 +00:00
Warner Losh
459404cbc4 rtsw: Break out as soon as we find we're doing the inversion workaround
Once we set that we're doing the inversion workaround, there's no sense
continuing to search for the inversion workaround.

Sponsored by:		Netflix
Reviewed by:	adrian
Differential Revision:	https://reviews.freebsd.org/D47686
2024-11-20 14:37:20 -07:00
Jessica Clarke
7a3af393d8 sys/cdefs.h: Add comments to make #if/#else/#endif triple more obvious
This block has a lot of nesting, not helped by two adjacent nested
blocks involving _POSIX_C_SOURCE, with only the inner one commented,
looking like it's the end of the outer one. Comment the outer one as
well so it's not quite so hard to figure out.

MFC after:	1 week
2024-11-20 20:09:28 +00:00
Doug Moore
18a8f4e586 vm_page: correct page iterator patch
The previous change committed a preliminary version of the change to
use iterators to free page sequences.  This updates to what was
intended to be the final version.

Reviewed by:	markj (previous version)
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D46724
2024-11-20 12:00:57 -06:00
Doug Moore
5b78ff8307 vm_page: remove pages with iterators
Use pctrie iterators for removing some page sequences from radix
trees, to avoid repeated searches from the tree root.

Rename vm_page_object_remove to vm_page_remove_radixdone, and remove
from it the responsibility for removing a page from its radix tree,
and pass that responsibility on to its callers.

For one of those callers, vm_page_rename, pass a pages pctrie_iter,
rather than a page, and use the iterator to remove the page from its
radix tree.

Define functions vm_page_iter_remove() and vm_page_iter_free() that
are like vm_page_remove() and vm_page_free(), respectively, except
that they take an iterator as parameter rather than a page, and use
the iterator to remove the page from the radix tree instead of
searching the radix tree. Function vm_page_iter_free() assumes that
the page is associated with an object, and calls
vm_page_free_object_prep to do the part of vm_page_free_prep that is
object-related.

In functions vm_object_split and vm_object_collapse_scan, use a
pctrie_iter to walk over the pages of the object, and use
vm_page_rename and vm_radix_iter_remove to modify the radix tree without
searching for pages.  In vm_object_page_remove and _kmem_unback, use a
pctrie_iter and vm_page_iter_free to remove the page from the radix
tree.

Reviewed by:	markj (previous version)
Tested by:	pho
Differential Revision:	https://reviews.freebsd.org/D46724
2024-11-20 11:54:20 -06:00
Kristof Provost
e27970ae8f netinet: handle blackhole routes
If during ip_forward() we find a blackhole (or reject) route we should stop
processing and count this in the 'cantforward' counter, just like we already do
for IPv6.
Blackhole routes are set to use the loopback interface, so we don't actually
incorrectly forward traffic, but we do fail to count it as unroutable.

Test this, both for IPv4 and IPv6.

Reviewed by:	melifaro
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D47529
2024-11-20 16:52:41 +01:00
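The forwarding decision described in the netinet commit above can be sketched as follows. This is a simplified model, not ip_forward(): the `SK_` flag values are illustrative placeholders, and `sk_fwd_stats` and `sk_forward` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative placeholder route flags, not taken from the kernel headers. */
#define SK_RTF_REJECT		0x0008
#define SK_RTF_BLACKHOLE	0x1000

/* Hypothetical counters standing in for the IP statistics. */
struct sk_fwd_stats {
	uint64_t cantforward;
	uint64_t forwarded;
};

/*
 * Stop processing and count the packet as unroutable when the chosen
 * route is a blackhole or reject route; otherwise count it as forwarded.
 */
bool
sk_forward(uint32_t route_flags, struct sk_fwd_stats *st)
{
	if ((route_flags & (SK_RTF_REJECT | SK_RTF_BLACKHOLE)) != 0) {
		st->cantforward++;
		return (false);
	}
	st->forwarded++;
	return (true);
}
```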
Kristof Provost
4b65481ac6 pf: fix build without DTrace
Reported by:	kib
Fixes:		438ca68cef
Sponsored by:	Rubicon Communications, LLC ("Netgate")
2024-11-20 14:23:07 +01:00
Kristof Provost
81f7ad324d pf: add missing unlock
If we fail to unshare the mbuf we forgot to unlock the rules.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
2024-11-20 13:35:53 +01:00
Kristof Provost
438ca68cef netinet: default mib counter probe points off
Disable the IP/IP6/ICMP/... counter probe points by default.
They are kept enabled in debug builds, and can be enabled with
'options KDTRACE_MIB_SDT'.

Requested by:	glebius
Reviewed by:	glebius
Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D47657
2024-11-20 09:52:48 +01:00
Kevin Bowling
e38f9257c3 ixgbe: Add support for 1Gbit Active DAC links
Empirically, 1Gbit also works on Active DACs.

MFC after:	3 days
Sponsored by:	BBOX.io
2024-11-19 21:38:08 -07:00
Warner Losh
7255a2969f stand: Don't need sys/select.h
The boot loader doesn't need the types and prototypes defined in
sys/select.h, so don't indirectly include it.

Sponsored by:		Netflix
2024-11-19 20:35:04 -07:00
Warner Losh
6a88766ecd stand: We don't want signal definitions in the boot loader
We don't support signals in the boot loader, so we don't need to include
sys/signal.h there. It pollutes the namespace.

Sponsored by:		Netflix
2024-11-19 19:57:46 -07:00
Warner Losh
c82ed437c6 sys/conf.h: Make more self-contained
struct cdev has members of type struct timespec. Include sys/_timespec.h
so we don't need to rely on namespace pollution to define it.

Sponsored by: Netflix
2024-11-19 19:52:40 -07:00
Mark Johnston
3c29734502 hwpmc: Fix whitespace in logging macros
MFC after:	1 week
Sponsored by:	Klara, Inc.
2024-11-19 23:48:53 +00:00
Mark Johnston
e03a056de0 vfs: Fix runningspace tuning after maxphys was bumped
The previous maximum value for the upper watermark was based on the old
value of MAXPHYS.  Raise it to allow more parallel I/O on large systems.

This is still a rather flawed mechanism since it's applied without
regard to the number of filesystems or block devices between which this
mechanism sits, but we might as well bump the limits at this point, as
they haven't been revised in quite a long time.

Reviewed by:	imp, kib
MFC after:	2 weeks
Fixes:		cd85379104 ("Make MAXPHYS tunable. Bump MAXPHYS to 1M.")
Differential Revision:	https://reviews.freebsd.org/D47398
2024-11-19 23:46:50 +00:00
Mark Johnston
e3f6ef5ade sdt: Stop defining probe and provider structures as arrays
There was no reason I can find for defining them this way.

No functional change intended.

Sponsored by:	Innovate UK
2024-11-19 21:18:34 +00:00
Mark Johnston
d6b692835e mii_fdt: Search for the "ethernet-ports" subnode
This is a more common name for the parent of the port nodes.

PR:		280770
MFC after:	2 weeks
Reported by:	Mike Belanger <mibelanger@qnx.com>
2024-11-19 21:05:19 +00:00
Mark Johnston
4ff291ebe8 vfs: Fix vop_stdis_text()
atomic(9) primitives are documented as operating on unsigned types.
Here, we need a cast to avoid a tautological comparison.

Add a regression test for access(2), which was affected by the bug.

Reported by:	NetApp
Reviewed by:	kib
Fixes:		e511bd1406 ("vfs: fully lockless v_writecount adjustment")
MFC after:	1 week
Sponsored by:	Klara, Inc.
Sponsored by:	NetApp, Inc.
Differential Revision:	https://reviews.freebsd.org/D47672
2024-11-19 21:03:14 +00:00
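The class of bug described in the vop_stdis_text() commit above can be reproduced in a few lines. This is a sketch, not the kernel code: `sk_load_uint` is a hypothetical stand-in for an atomic load that returns an unsigned value, and the idea that a negative count marks a text mapping is an assumption made for the illustration.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an atomic load that yields an unsigned value. */
unsigned int
sk_load_uint(const int *p)
{
	return ((unsigned int)*p);
}

/* Buggy form: an unsigned value is never below zero, so this is always false. */
bool
sk_is_text_buggy(const int *writecount)
{
	return (sk_load_uint(writecount) < 0);
}

/*
 * Fixed form: cast back to a signed type before comparing.  The conversion
 * of large unsigned values is implementation-defined in ISO C, but wraps to
 * a negative number on two's-complement targets.
 */
bool
sk_is_text_fixed(const int *writecount)
{
	return ((int)sk_load_uint(writecount) < 0);
}
```

A compiler may warn that the comparison in the buggy form is tautological, which is exactly the hint the cast removes.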
Andrew Turner
ce284dded5 arm64: Fix comparing ID register fields
The logic in update_special_reg_field was reversed. Fix by swapping the
order of the arguments.

PR:		282505
Fixes:		f1fb1d5c90 ("arm64: Support more ID register field types")
Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D47437
2024-11-19 17:31:00 +00:00
Andrew Turner
9ff643a8da arm64: Adjust the MPASS in vfp_save_state_savectx
In vfp_save_state_savectx we check if the pcb has a NULL vfp state.
When it's called multiple times with the same pcb then we can panic
because the vfp state has been set.

Weaken the requirement for the state pointer to be NULL by also
allowing it to point to the pcb vfp state area we are about to use.

Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D47237
2024-11-19 17:31:00 +00:00
Andrew Turner
a1330a71d2 acpi: Handle multiple interrupts
When multiple IRQs are specified in a single resource then we only
check the first. Change this to check all interrupts for the value
we expect to find.

Without this we may still enable the interrupt, but it can have the
wrong polarity or trigger. This can cause an interrupt storm if the
interrupt was configured with a level trigger when it should have
been an edge.

PR:		282241
Reported by:	trasz
Sponsored by:	Arm Ltd
Differential Revision:	https://reviews.freebsd.org/D47487
2024-11-19 17:14:42 +00:00
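The difference between checking only the first IRQ and checking all of them, as described in the acpi commit above, can be sketched like this. The resource layout and the function names are hypothetical and much simpler than the real ACPI resource structures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified interrupt resource holding several IRQ numbers. */
struct sk_irq_resource {
	size_t	 count;
	uint32_t irqs[4];
};

/* Old behaviour in this model: only the first entry is compared. */
bool
sk_match_first_only(const struct sk_irq_resource *r, uint32_t wanted)
{
	return (r->count > 0 && r->irqs[0] == wanted);
}

/* New behaviour in this model: every listed interrupt is compared. */
bool
sk_match_any(const struct sk_irq_resource *r, uint32_t wanted)
{
	for (size_t i = 0; i < r->count; i++) {
		if (r->irqs[i] == wanted)
			return (true);
	}
	return (false);
}
```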
Dmitry Salychev
c2dd2be344 dpaa2: Fix kernel built with ACPI_DEBUG
PR:			282800
Reported by:		phk
Tested by:		bz
MFC after:		3 days
Differential Revision:	https://reviews.freebsd.org/D47666
2024-11-19 16:49:04 +01:00
John Baldwin
c7d29adcb3 vga_pci: Use bus_generic_* directly instead of wrappers
Differential Revision:	https://reviews.freebsd.org/D47375
2024-11-19 10:26:32 -05:00
John Baldwin
48a88a4ee9 socket: Move SO_SPLICE next to other socket option constants
Reviewed by:	markj
Differential Revision:	https://reviews.freebsd.org/D47626
2024-11-19 10:25:49 -05:00
John Baldwin
590e7a0eb5 rangelock: Use atomic_testandset_ptr
Reviewed by:	imp, kib
Obtained from:	CheriBSD
Sponsored by:	AFRL, DARPA
Differential Revision:	https://reviews.freebsd.org/D47632
2024-11-19 10:25:08 -05:00
John Baldwin
a80b9ee15a atomic(9): Implement atomic_testand(clear|set)_ptr
For current architectures, these are just aliases for the existing
operation on the relevant scalar integer.

Reviewed by:	imp, kib
Obtained from:	CheriBSD
Sponsored by:	AFRL, DARPA
Differential Revision:	https://reviews.freebsd.org/D47631
2024-11-19 10:24:50 -05:00
John Baldwin
fa2091d757 atomic(9): Remove fcmpset-based fallback for atomic_testand(clear|set)
All architectures implement an MD version.

Reviewed by:	kib
Sponsored by:	AFRL, DARPA
Differential Revision:	https://reviews.freebsd.org/D47629
2024-11-19 10:23:15 -05:00
John Baldwin
987c5a1944 arm: Implement atomic_testandset_acq_long as a simple wrapper
Use a memory barrier after calling the existing atomic_testandset_long
rather than using the fcmpset-based fallback version from
<sys/_atomic_subword.h>.

Reviewed by:	kib
Sponsored by:	AFRL, DARPA
Differential Revision:	https://reviews.freebsd.org/D47628
2024-11-19 10:22:50 -05:00
John Baldwin
a474e53d03 riscv: Add implementations of atomic_testand(set|clear)_(32|64|long)
These use amoor and amoand rather than a loop.

Also define atomic_testandset_acq_(64|long) using amoor.aq.

Reviewed by:	mhorne, kib
Sponsored by:	AFRL, DARPA
Differential Revision:	https://reviews.freebsd.org/D47627
2024-11-19 10:20:32 -05:00
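For readers unfamiliar with the test-and-set family touched by the atomic commits above, here is a portable sketch using C11 atomics. It is not the kernel's implementation: the `sk_` names are hypothetical, and wrapping the bit index to the word width is an assumption about the intended semantics. A single fetch-or or fetch-and replaces a compare-and-swap loop, which is the same idea as using amoor and amoand on riscv.

```c
#include <limits.h>
#include <stdatomic.h>

/* Set the given bit and report whether it was already set (1) or not (0). */
int
sk_testandset_long(_Atomic unsigned long *p, unsigned int bit)
{
	unsigned long mask;

	/* Assumption: the bit index wraps to the width of the word. */
	mask = 1UL << (bit % (sizeof(unsigned long) * CHAR_BIT));
	return ((atomic_fetch_or(p, mask) & mask) != 0);
}

/* Clear the given bit and report whether it was previously set. */
int
sk_testandclear_long(_Atomic unsigned long *p, unsigned int bit)
{
	unsigned long mask;

	mask = 1UL << (bit % (sizeof(unsigned long) * CHAR_BIT));
	return ((atomic_fetch_and(p, ~mask) & mask) != 0);
}
```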
Adrian Chadd
842a2c1ad3 uath: flush data/commands to the firmware before changing channel / state
The driver wasn't stable - it would start fine, but during scan
it would eventually hang and no further command endpoint transfers
would complete.

After adding some debugging and looking at the logs I noticed that
things went sideways once a /data/ frame was sent.  The channel
change config happened between the data frame being sent and
being completed.

My guess is that the firmware doesn't like a channel change
and reset whilst there's pending data frames.  Checking the Linux
driver I found that it was doing a flush before a channel change,
and we're doing it afterwards.  This acts like a fence around
ensuring scheduled TX work has completed.  In net80211 the
transmit path and the control path aren't serialised, so it's
very often the case that ioctls, state changes, etc occur
whilst in parallel there are frame transmits being scheduled.

This seems to happen more frequently on a more recent, high core count
(8 cores) machine with XHCI.  I remember testing this driver years ago
on single and dual core CPU laptops with no problems.

So, add some flushes - before a channel change, and during
a transition to AUTH when the BSS config is being programmed into
the firmware.  These two fences seem enough to reliably
associate as a 2GHz and 5GHz STA.

Note that this isn't entirely blocking all newly queued
transmit work from occurring until after the NIC has finished
configuration.  That will need some further investigation.

Locally tested:

  * Wistron NuWeb AR5523 dual-band NIC, STA mode, 2/5GHz

Differential Revision:	https://reviews.freebsd.org/D47655
2024-11-18 20:50:41 -08:00
Adrian Chadd
7098b90152 usb: fix the ID for the dual-band Wistron AR5523 USB NIC
Use the correct ID, as I have one of these NICs.
Add the previous one back in case it's out there in the wild.

@emaste did a bit of a dig into the product numbers.
@sam did change the ID from 0x0828 -> 0x082a in a commit
a long while back. It's worth reading the code review for
further details.

However, I do have one of these NICs and I verified that
it indeed has the given ID, and with some follow-up work
to fix some race conditions, it works fine in 2GHz 11bg
and 5GHz 11a operation.

Differential Revision:	https://reviews.freebsd.org/D47654

Obtained from:	Linux, drivers/net/wireless/ath/ar5523/ar5523.c
2024-11-18 20:50:24 -08:00
Adrian Chadd
1375790a15 net80211: add IEEE80211_IS_QOS_NULL()
This will be useful when fixing up the sequence number generation
and checks, as the rules around how sequence numbers are generated
have been clarified in 802.11-2016 and later.  QoS-NULL frames are
explicitly marked as "any sequence number".

But for now, just create a macro and use it in the one place
it's currently being used as a check - ath(4).

* Add IEEE80211_IS_QOS_NULL().
* Change the "will this frame go into the TX block-ack window" check
  in the ath(4) transmit path.  Note this changes the check to be
  more specific, but both paths already had previous checks to ensure
  they're QoS data frames.

Locally tested:

* ath(4), AR9380, STA mode w/ AMPDU TX/RX enabled and negotiated

Differential Revision: https://reviews.freebsd.org/D47645
2024-11-18 20:50:17 -08:00
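A check of the kind the IEEE80211_IS_QOS_NULL() commit above describes can be sketched from the 802.11 frame control layout: type in bits 2-3 and subtype in bits 4-7 of the first frame control byte. The `SK_` names and the `sk_frame` struct are hypothetical; this is not necessarily how the net80211 macro is written.

```c
#include <stdbool.h>
#include <stdint.h>

/* First frame control byte: bits 2-3 hold the type, bits 4-7 the subtype. */
#define SK_FC0_TYPE_MASK	0x0c
#define SK_FC0_SUBTYPE_MASK	0xf0
#define SK_FC0_TYPE_DATA	0x08	/* type 2: data */
#define SK_FC0_SUBTYPE_QOS_NULL	0xc0	/* subtype 12: QoS Null */

/* Hypothetical minimal header: only the two frame control bytes. */
struct sk_frame {
	uint8_t i_fc[2];
};

/* True only when both the type and the subtype match QoS Null data. */
#define SK_IS_QOS_NULL(wh)						\
	(((wh)->i_fc[0] & (SK_FC0_TYPE_MASK | SK_FC0_SUBTYPE_MASK)) ==	\
	 (SK_FC0_TYPE_DATA | SK_FC0_SUBTYPE_QOS_NULL))
```

Comparing type and subtype together is what makes the check more specific than a subtype-only test: a control frame that happens to share subtype bits does not match.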
Gleb Smirnoff
dae64402b3 rtsock: fix panic in rtsock_msg_buffer()
The rtsock_msg_buffer() can be called without walkarg, just to calculate
required length.  It can also be called with a degenerate walkarg, that
doesn't have a w_req.  The latter happens when the function is called from
update_rtm_from_info() for the second time.

Zero init walkarg in update_rtm_from_info() and don't pass random stack
garbage as w_req.

In rtsock_msg_buffer() initialize compat32 boolean only once and take care of
a possibly empty w_req.  Simplify the rest of the code once compat32 is already
set.

Reviewed by:		melifaro
Differential Revision:	https://reviews.freebsd.org/D47662
Reported-by: syzbot+d4a2682059e23179e76e@syzkaller.appspotmail.com
Reported-by: syzbot+66d7c9b3062e27a56f3f@syzkaller.appspotmail.com
2024-11-18 14:12:42 -08:00
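The defensive pattern described in the rtsock commit above, zero-initialising the walk argument and tolerating a missing request, might look roughly like this. The structures, the flag and the function name are hypothetical simplifications, not the rtsock definitions.

```c
#include <stdbool.h>
#include <stddef.h>

#define SK_REQ_COMPAT32	0x1	/* hypothetical "32-bit caller" flag */

/* Hypothetical, simplified request and walk-argument structures. */
struct sk_req {
	int flags;
};

struct sk_walkarg {
	int		 w_tmemsize;
	struct sk_req	*w_req;
};

/*
 * Compute the compat32 decision once, handling all three cases:
 * no walkarg at all, a walkarg without a request, and a full walkarg.
 */
bool
sk_is_compat32(const struct sk_walkarg *w)
{
	return (w != NULL && w->w_req != NULL &&
	    (w->w_req->flags & SK_REQ_COMPAT32) != 0);
}
```

A caller that declares `struct sk_walkarg w = {0};` gets a NULL `w_req` rather than stack garbage, so the check above stays well defined.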
Kristof Provost
83641335f9 pf: clean up pflow sockets on jail removal
pflow opens sockets in the kernel to transmit netflow information.
If this is done in a (vnet) jail these sockets end up preventing the removal of
the jail. The VNET_SYSUNINIT() vnet_pflowdetach() function doesn't get called,
but that's the function that would remove the sockets.

Install a callback on the PR_METHOD_REMOVE jail callback and close the sockets
there. This ensures that the jail can get cleaned up.

Sponsored by:	Rubicon Communications, LLC ("Netgate")
Differential Revision:	https://reviews.freebsd.org/D47545
2024-11-18 11:06:46 +01:00
Matthew Nygard Dodd
f4f46a2eef hidraw(4): update hgd_actlen in HIDRAW_GET_REPORT ioctl
HIDRAW_GET_REPORT ioctl is documented to update hgd_actlen on return
with the number of bytes copied.  It does not do this.

Reviewed by:	wulf
PR:		282790
MFC after:	1 week
2024-11-18 07:31:39 +03:00
Matthew Nygard Dodd
0b5d86b38a uhid(4): update ugd_actlen in USB_GET_REPORT ioctl
USB_GET_REPORT ioctl is documented to update ugd_actlen on return with
the number of bytes copied.  It does not do this.

Reviewed by:	wulf
PR:		282790
MFC after:	1 week
2024-11-18 07:31:24 +03:00
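The contract that both report ioctl commits above restore, reporting back the number of bytes actually copied, can be sketched as follows. The request structure and the function are hypothetical and operate on plain memory rather than on a real ioctl path.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical request: caller buffer, its capacity, and the bytes copied. */
struct sk_get_report {
	void	*data;
	size_t	 maxlen;
	size_t	 actlen;
};

/* Copy at most maxlen bytes of the report and record the actual length. */
void
sk_do_get_report(struct sk_get_report *req, const void *report,
    size_t report_len)
{
	size_t n;

	n = report_len < req->maxlen ? report_len : req->maxlen;
	memcpy(req->data, report, n);
	req->actlen = n;	/* the step the commits say was missing */
}
```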
Michael Tuexen
8caa2f5351 tcp: define tcp_lro_log() only when TCP_BLACKBOX is defined
Reviewed by:		rrs, Peter Lei
MFC after:		1 week
Sponsored by:		Netflix, Inc.
Differential Revision:	https://reviews.freebsd.org/D47401
2024-11-17 19:21:01 +01:00
Alan Cox
8c8d36b9d1 vm: static-ize vm_page_alloc_after()
This function is only intended for the internal use of the VM system.

Reviewed by:	dougm, kib, markj
Differential Revision:	https://reviews.freebsd.org/D47644
2024-11-17 12:19:00 -06:00
Adrian Chadd
3d0d43d25a net80211: remove IEEE80211_FC0_QOSDATA
This is unused by anything in the tree; anything using it should be
instead using one of the frame type macros.

Differential Revision: https://reviews.freebsd.org/D47503
2024-11-17 09:53:16 -08:00
Adrian Chadd
c249cc3822 net80211: migrate FC0_TYPE_MASK / FC0_SUBTYPE_MASK frame type checks to macros
* Add macros for the management and control frame type checks that
  I've come across in the drivers.
* Delete some now old code (eg ath's ieee80211_is_action()) as there's now
  a macro for it.

Local testing:

* not yet, I have a lot of wifi devices to find and test against

Differential Revision: https://reviews.freebsd.org/D47500
2024-11-17 09:53:04 -08:00
Mark Johnston
6817f3375b conf: Fix KCSAN enablement checking
Fixes:	6e3875ebcf ("sys: move SAN and COVERAGE options handling to kern.mk")
2024-11-17 16:40:33 +00:00
Michal Meloun
b882d21558 arm: link all .rodata variants into one output section
MFC after:	1 week
2024-11-17 12:35:55 +01:00