freebsd

mirror of https://git.FreeBSD.org/src.git synced 2024-11-23 07:31:31 +00:00

Author	SHA1	Message	Date
Michael Tuexen	af84665261	tcp: whitespace cleanup of enum tcp_log_events No functional change intended. MFC after: 1 week Sponsored by: Netflix, Inc.	2024-07-11 11:38:11 +02:00
Michael Tuexen	811d831050	tcp: minor cleanup Fix two KASSERTs to catch the condition they are intended to, add two asserts to ensure that the appropriate locking is in place and fix some things related to style. No functional change intended. MFC after: 1 week Sponsored by: Netflix, Inc.	2024-06-29 11:06:35 +02:00
Ryan Libby	0d8da0df41	tcp_rack: avoid gcc -Werror=pointer-to-int-cast on 32-bit arch Reviewed by: tuexen Differential Revision: https://reviews.freebsd.org/D45752	2024-06-27 20:40:12 -07:00
Ryan Libby	c02a8caf50	tcp_bbr: avoid gcc -Werror=pointer-to-int-cast on 32-bit arch Reviewed by: tuexen Differential Revision: https://reviews.freebsd.org/D45751	2024-06-27 20:37:26 -07:00
Michael Tuexen	14fee5324a	tcp: improve failure handling in tcp_newtcpcb() In case of a failure of tcp_newtcpcb, where NULL is returned, * call CC_ALGO(tp)->cb_destroy, after CC_ALGO(tp)->cb_init was called. * call khelp_destroy_osd(), after khelp_init_osd() was called. Reviewed by: glebius, rscheff MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D45753	2024-06-27 08:26:34 +02:00
Michael Tuexen	3703e1a73e	tcp: improve error handling in syncache_socket() If syncache_socket() fails after calling tcp_newtcpcb(), the resources allocated in tcp_newtcpcb() need to be freed. Just call tcp_discardcb() to do this. Thanks to jtl for making me aware of the issue and proposing a fix. Reviewed by: glebius, jtl, rscheff MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D45749	2024-06-27 07:25:15 +02:00
Zhenlei Huang	08a98731dd	ip_mroute: Use NET_EPOCH_WAIT() macro This makes it easier to grep the usage. Reviewed by: kp MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D45715	2024-06-24 17:57:14 +08:00
Reid Linnemann	de4bfd6b99	udp_var: correct intoudpcb macro unintended identifier dependency Change `483fe9651` embedded struct inpcb into struct udpcb and updated the intoudpcb macro to use __containerof to locate it. This change accidentally introduced a dependency on the identifier inp being defined in the block the macro is expanded in. This should have been the macro argument ip. This change makes this simple correction. No functional change intended. Reviewed by: kp Sponsored by: Rubicon Communications, LLC ("Netgate")	2024-06-14 17:28:28 +02:00
Kristof Provost	8f04209d37	pf: simplify pf_addrcpy() and pf_match_addr() Use the v4/v6 union members rather than the uint32_t ones. Export IN_ARE_MASKED_ADDR_EQUAL() in in_var.h and use it (and its IPv6 equivalent) for masked comparisons rather than hand-rolled code. Event: Kitchener-Waterloo Hackathon 202406	2024-06-06 15:45:31 +02:00
Michael Tuexen	86c9325d34	tcp: simplify stack switching protocol Before this patch, a stack (tfb) accepts a tcpcb (tp), if the tp->t_state is TCPS_CLOSED or tfb->tfb_tcp_handoff_ok is not NULL and tfb->tfb_tcp_handoff_ok(tp) returns 0. After this patch, the only check is tfb->tfb_tcp_handoff_ok(tp) returns 0. tfb->tfb_tcp_handoff_ok must always be provided. For existing TCP stacks (FreeBSD, RACK and BBR) there is no functional change. However, the logic is simpler. Reviewed by: lstewart, peter_lei_ieee_.org, rrs MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D45253	2024-06-06 08:29:05 +02:00
Michael Tuexen	e7381521aa	tcp: remove unused code in tcp_usr_attach pr_attach is only called on a socket (so) with so->so_listen != NULL via sonewconn. However, sonewconn is not called from the TCP code. The listening sockets are handled in tcp_syncache.c without using sonewconn. Therefore, the code removed is never executed. No functional change intended. Reviewed by: rrs, peter.lei_ieee.org MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D45412	2024-05-30 21:23:45 +02:00
Michael Tuexen	df9de82f54	tcp: fix sending RST after second inp lookup When we first find an inp, we also set the tp. If a second lookup is then necessary, the inp is recomputed. If this fails, the tp is not cleared, which resulted in a failing KASSERT. Therefore, clear the tp when starting the inp lookup procedure. Reported by: Jenkins Fixes: `02d15215ce` ("tcp: improve blackhole support") MFC after: 1 week Sponsored by: Netflix, Inc.	2024-05-25 19:58:48 +02:00
Michael Tuexen	02d15215ce	tcp: improve blackhole support There are two improvements to the TCP blackhole support: (1) If net.inet.tcp.blackhole is set to 2, also send no RST whenever a segment is received on an existing closed socket or if there is a port mismatch when using UDP encapsulation. (2) If net.inet.tcp.blackhole is set to 3, no RST segment is sent in response to incoming segments on closed sockets or in response to unexpected segments on listening sockets. Thanks to gallatin@ for suggesting such an improvement. Reviewed by: gallatin MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D45304	2024-05-24 06:59:13 +02:00
Henrich Hartzer	674956e199	sys/netinet/cc: Switch from deprecated random() to prng32() Related: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=277655 Signed-off-by: henrichhartzer@tuta.io Reviewed by: imp, mav Pull Request: https://github.com/freebsd/freebsd-src/pull/1162	2024-05-23 15:10:09 -06:00
Cy Schubert	380ee9b3c0	sys/netinet/icmp6.h: Fix build Fix stdint.h file not found. Fixes: `4b75afe885`	2024-05-23 14:03:55 -07:00
Lexi Winter	4b75afe885	sys/netinet/icmp6.h: use C99 uintX_t constants for new PREF64 struct Reviewed by: imp, glebius (prior suggestions done) Pull Request: https://github.com/freebsd/freebsd-src/pull/1206	2024-05-23 14:40:48 -06:00
Lexi Winter	1e8eb413f6	netinet/icmp6: add PREF64 definitions (RFC 8781) Reviewed by: imp, glebius (prior suggestions done) Pull Request: https://github.com/freebsd/freebsd-src/pull/1206	2024-05-23 14:40:11 -06:00
Michael Tuexen	fe136aecc2	tcp: improve inp locking in setsockopt Ensure that the inp is not dropped when starting a stack switch. While there, clean-up the code by using INP_WLOCK_RECHECK, which also re-assigns tp. Reviewed by: glebius MFC after: 3 days Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D45241	2024-05-23 22:19:12 +02:00
Randall Stewart	ea916b6412	Remove TCP_SAD optional code now that the sack filter performs this function. With the commit of D44903 we no longer need the SAD option. Instead all stacks that use the sack filter inherit its protection against sack-attack. Reviewed by: tuexen@ Differential Revision: https://reviews.freebsd.org/D45216	2024-05-18 10:57:04 -04:00
Marko Zec	42b3c16e30	fib_dxr: code hygiene, prune old code, no functional changes The !DXR2 code corresponds to the original DXR encoding proposal from 2012 with a single direct-lookup stage, which is inferior to the more recent (DXR2) variant with two-stage trie both in terms of memory footprint of the lookup structures, and in terms of overall lookup throughput. I'm axing the old code chunks to (hopefully) somewhat improve readability, as well as to simplify future maintenance and updates. MFC after: 1 week	2024-05-17 18:57:25 +02:00
Marko Zec	19bd24caa4	fib_dxr: do not leak memory if FIB constellation hits structural limit DXR lookup table encoding has an inherent structural limit on the amount of binary search ranges it can accommodate. With the current IPv4 BGP views (circa 1 M prefixes) and default DXR encoding we are only at around 5% of that limit, so far, far away from hitting it. Just in case it ever gets hit, make sure we free the allocated structures, instead of leaking them. MFC after: 1 week	2024-05-17 18:46:41 +02:00
Marko Zec	4ab122e8ef	fib_dxr: check if cached fib_data matches the new request in dxr_init() When calling dxr_init(), the FIB_ALGO infrastructure may provide a pointer to a previous dxr instance, which permits reuse of auxiliary dxr structures, i.e. incremental lookup structure updates. For dxr this is a crucial feature provided by FIB_ALGO, since dxr incremental updates are typically several orders of magnitude faster than full lookup table rebuilds. However, the auxiliary dxr structure caches a pointer to struct fib_data and relies upon it for performing incremental updates. Apparently, incremental rebuild requests from FIB_ALGO, i.e. calls to dxr_init() with a pointer old_data set, may (under not yet fully understood circumstances) be invoked within a different fib_data context than the one cached in the previous version of dxr auxiliary structures. In such (rare) events, we ignore the offered old dxr context, and proceed with a full lookup structure rebuild instead of attempting an incremental one using a fib_data context which may no longer be valid, and thus lead to a system crash. PR: 278422 MFC after: 1 week	2024-05-17 18:21:54 +02:00
Gordon Bergling	78e4dbc345	ipfw: Fix a typo in a source code comment - s/defaul/default/ MFC after: 3 days	2024-05-12 10:53:40 +02:00
Michael Tuexen	2f923a0ced	tcp rack: improve handling of front states When the RACK stack wants to send a FIN, but still has outstanding or unsent data, it sends a challenge ack. Don't do this when the TCP endpoint is still in the front states, since it does not make sense. Reviewed by: rrs MFC after: 3 days Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D45122	2024-05-11 16:28:45 +02:00
Michael Tuexen	5120ea0d88	sctp: improve heartbeat timer computation PR: 278666 Reviewed by: Albin Hellqvist MFC after: 3 days Pull Request: https://reviews.freebsd.org/D45107	2024-05-10 21:02:56 +02:00
Michael Tuexen	b67716dd58	sctp: store heartbeat creation time as time_t Reported by: Coverity Scan CID: 1493087 MFC after: 3 days	2024-05-10 20:40:15 +02:00
Michael Tuexen	42aeb8d490	sctp: store vtag expire time as time_t Reported by: Coverity Scan CID: 1492525 CID: 1493239 MFC after: 3 days	2024-05-10 20:28:38 +02:00
Michael Tuexen	9d8a3718e2	sctp: store cookie secret change time as time_t Reported by: Coverity Scan CID: 1492349 CID: 1493281 MFC after: 3 days	2024-05-10 20:14:16 +02:00
Michael Tuexen	0d15140d6d	sctp: minor cleanup No functional change intended. MFC after: 3 days	2024-05-09 00:51:09 +02:00
Michael Tuexen	8c37094036	sctp: allow stcb == NULL in sctp_shutdown() Consistently handle this case. Reported by: Coverity Scan CID: 1533813 MFC after: 3 days	2024-05-09 00:43:28 +02:00
Michael Tuexen	83dcc7790b	sctp: don't provide uninitialized memory to process_chunk_drop() Right now, the code in process_chunk_drop() does not look at the corresponding fields. Therefore, no functional change intended. Reported by: Coverity Scan CID: 1472476 MFC after: 3 days	2024-05-09 00:17:13 +02:00
Michael Tuexen	e187fa5690	sctp: fix sctp_sendall() when an mbuf chain is provided In this case uio is NULL, which needs to be checked and m must be copied into the sctp_copy_all structure. Reported by: Coverity Scan CID: 1400449 MFC after: 3 days	2024-05-08 23:45:55 +02:00
Michael Tuexen	3d40cc7ab8	sctp: add missing check If memory allocation fails, m is NULL. Since this is possible, check for it. Reported by: Coverity Scan CID: 1086866 MFC after: 3 days	2024-05-08 23:03:34 +02:00
Richard Scheffenegger	2a9aae9e5f	tcp: add counter to track when SACK loss recovery uses TSO Add a counter to track how frequently SACK has transmitted more than one MSS using TSO. Instances when this will be beneficial is the use of PRR, or when ACK thinning due to GRO/LRO or ACK discards by the network are present. Reviewed By: tuexen, #transport Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D45070	2024-05-08 14:37:33 +02:00
Richard Scheffenegger	dcdfe44901	tcp: add sysctl to allow/disallow TSO during SACK loss recovery Introduce net.inet.tcp.sack.tso for future use when TSO is ready to be used during loss recovery. Reviewed By: tuexen, #transport Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D45068	2024-05-08 14:33:20 +02:00
Richard Scheffenegger	cbf3575aa3	tcp: filter small SACK blocks While the SACK Scoreboard in the base stack limits the number of holes by default to only 128 per connection in order to prevent CPU load attacks by splitting SACKs, filtering out SACK blocks of unusually small size can further improve the actual processing of SACK loss recovery. Reviewed By: tuexen, #transport Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D45075	2024-05-08 14:00:10 +02:00
Gleb Smirnoff	a254d6870e	carp: isolate VRRP from CARP There is only one functional change here - we don't allow SIOCSVH (or netlink request) to change sc->sc_version. I'm convinced that allowing such a change doesn't bring any practical value, but creates endless minefields in front of both developers and end users (sysadmins). If you want to switch from VRRP to CARP or vice versa, you'd need to recreate the VHID. Oh, one tiny functional change: carp_ioctl_set() won't modify any fields if it returns EINVAL. Previously you could provide valid advbase with invalid advskew - that used to modify advbase and return EINVAL. All other changes are a sweep around not ever using CARP fields when we are in VRRP mode and vice versa. Also adding assertions on sc_version where necessary. Do not send VRRP vars in CARP mode via NetLink and vice versa. However in compat ioctl SIOCGVH for VRRP mode the CARP fields would be zeroes. This allows to declare softc as union and thus prevent any future logic deterioration wrt mixing VRRP and CARP. Reviewed by: kp Differential Revision: https://reviews.freebsd.org/D45039	2024-05-08 13:19:04 +02:00
Gleb Smirnoff	601438fbfa	carp: refactor packet tagging for ether_output() - Separate HMAC preparation (CARP specific) from tagging. - In unicast mode (CARP specific) don't put tag at all. - Don't put pointer to software context into the tag. Putting just vhid, an integer value, is a safer design. Reviewed by: kp Differential Revision: https://reviews.freebsd.org/D45038	2024-05-08 13:19:04 +02:00
Gleb Smirnoff	cda57d955b	carp: assert that we are calling correct input function. We are. Reviewed by: kp Differential Revision: https://reviews.freebsd.org/D45037	2024-05-08 13:19:04 +02:00
Gleb Smirnoff	5ee92cbd82	carp: don't chain call vrrp_send_ad via carp_send_ad Provide inline send_ad_locked() that switches between protocol specific sending function. Rename carp_send_ad() to carp_callout() to avoid getting lost in all these multiple foo_send_ad. No functional change intended. Reviewed by: kp Differential Revision: https://reviews.freebsd.org/D45036	2024-05-08 13:19:04 +02:00
Kristof Provost	3711515467	carp: support VRRPv3 Allow carp(4) to use the VRRPv3 protocol (RFC 5798). We can distinguish carp and VRRP based on the protocol version number (carp is 2, VRRPv3 is 3), and support both from the carp(4) code. Reviewed by: glebius Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D44774	2024-05-08 13:19:03 +02:00
Gleb Smirnoff	b6b4ac2faa	tcp_hostcache: remove unnecessary socketvar.h	2024-05-07 14:15:49 -07:00
Richard Scheffenegger	59884aea8b	tcp: clean up macro usage in tcp_fixed_maxseg() Replace local PAD macro with PADTCPOLEN macro No functional change. Reviewed By: tuexen, #transport Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D45076	2024-05-04 13:04:25 +02:00
Marko Zec	b24e353f9e	fib_dxr: set fib_data field in struct dxr_aux early enough Previously it was possible for dxr_build() to return with da->fd unset in case of range_tbl or x_tbl malloc() failures. This may have led to NULL ptr dereferencing in dxr_change_rib_batch(). MFC after: 1 week PR: 278422	2024-05-07 17:44:09 +02:00
Marko Zec	4aa275f12d	fib_dxr: s/KASSERT/MPASS/ MFC after: 1 week	2024-05-07 17:33:23 +02:00
Marko Zec	7a5de1d4cc	fib_dxr: KASSERTs for chasing NULL ptr and runaway refcount suspects MFC after: 1 week	2024-05-07 17:22:00 +02:00
Marko Zec	ed541e201a	fib_dxr: move the bulk of malloc() failure logging into dxr_build()	2024-05-07 17:11:30 +02:00
Marko Zec	5295e891d0	fib_dxr: update comment. MFC after: 1 week	2024-05-06 20:42:31 +02:00
Marko Zec	858010643c	fib_dxr: free() does nothing if arg is NULL, so remove a redundant check. MFC after: 1 week	2024-05-06 20:37:44 +02:00
Marko Zec	308caa38cd	fib_dxr: log malloc() failures. MFC after: 1 week	2024-05-06 20:21:55 +02:00
Randall Stewart	fce03f85c5	TCP can be subject to Sack Attacks, let's fix this issue. There is a type of attack that a TCP peer can launch on a connection. This is for sure in Rack or BBR and probably even the default stack if it uses lists in sack processing. The idea of the attack is that the attacker is driving you to look at 100's of sack blocks that only update 1 byte. So for example if you have 1 - 10,000 bytes outstanding the attacker sends in something like: ACK 0 SACK(1-512) SACK(1024 - 1536), SACK(2048-2536), SACK(4096 - 4608), SACK(8192-8704) This first sack looks fine but then the attacker sends ACK 0 SACK(1-512) SACK(1025 - 1537), SACK(2049-2537), SACK(4097 - 4609), SACK(8193-8705) ACK 0 SACK(1-512) SACK(1027 - 1539), SACK(2051-2539), SACK(4099 - 4611), SACK(8195-8707) ... These blocks are making you hunt across your linked list and split things up so that you have an entry for every other byte. As your list grows you spend more and more CPU running through the lists. The idea here is the attacker chooses entries as far apart as possible that make you run through the list. This example is small but in theory if the window is open to say 1Meg you could end up with 100's of thousands of linked list entries. To combat this we introduce three things. (1) When the peer requests a very small MSS we stop processing SACKs from them. This prevents a malicious peer from just using a small MSS to do the same thing. (2) Any time we get a sack block, we use the sack-filter to remove sacks that are smaller than the smallest v4 mss (minus 40 for max TCP options) unless it ties up to snd_max (since that is legal). All other sacks in theory should be at least an MSS. If we get such an attacker that means we basically start skipping all but MSS sized Sacked blocks. (3) The sack filter used to throw away data when its bounds were exceeded; instead we now increase its size to 15 and then throw away sacks if the filter gets over-run, to prevent the malicious attacker from over-running the sack filter and thus making us process things anyway. The default stack will need to start using the sack-filter, which we have talked about in past conference calls, to take full advantage of the protections offered by it (and reduce cpu consumption when processing sacks). After this set of changes is in, rack can drop its SAD detection completely. Reviewed by: tuexen@, rscheff@ Differential Revision: https://reviews.freebsd.org/D44903	2024-05-05 09:08:47 -04:00
Richard Scheffenegger	30cf0fbf26	in_pcb: don't leak credential refcounts on error In the error path during allocating an in_pcb, the credentials associated with the new struct get their reference count increased early on, but not decremented when the allocation fails. Reported by: cmiller_netapp.com MFC after: 3 days Reviewed by: jhb, tuexen Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D45033	2024-05-01 08:41:26 +02:00
Gleb Smirnoff	c68eed82a3	accf_tls: accept filter that waits for TLS handshake header	2024-04-24 17:53:10 -07:00
Denny Page	fcdf9a1989	Support ARP for 802 networks This is used by 802.3 Ethernet. (It is also used by 802.4 Token Bus and 802.5 Token Ring, but we don't support those.) This was accidentally removed along with FDDI support in commit `0437c8e3b1`, presumably because comments implied it was used only by FDDI or Token Ring. Fixes: `0437c8e3b1` ("Remove support for FDDI networks.") Reviewed-by: emaste Signed-off-by: Denny Page <dennypage@me.com> Pull-request: https://github.com/freebsd/freebsd-src/pull/1166	2024-04-23 12:30:53 -04:00
Michael Tuexen	1941914d3b	tcp rack: improve BBR_LOG_CWND event Fix a typo, which resulted in missing r_ctl.gate_to_fs in the BBLog event. Reported by: Coverity Scan CID: 1540024 Reviewed by: rrs, rscheff Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D44648	2024-04-18 21:57:44 +02:00
Michael Tuexen	c9cd686bd4	tcp: drop data received after a FIN has been processed RFC 9293 describes the handling of data in the CLOSE-WAIT, CLOSING, LAST-ACK, and TIME-WAIT states: This should not occur since a FIN has been received from the remote side. Ignore the segment text. Therefore, implement this handling. Reviewed by: rrs, rscheff MFC after: 3 days Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D44746	2024-04-18 21:54:42 +02:00
Michael Tuexen	605a00660e	tcp bbr: improve code consistency Improve code consistency with the RACK stack. Reviewed by: gallatin, rscheff MFC after: 3 days Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D44800	2024-04-15 23:52:08 +02:00
Mark Johnston	1d14e88e53	tcp: Make tcp_var.h more self-contained struct tcpcb embeds a struct osd and a struct callout. Rather than forcing all consumers to pull in the same headers, include the headers directly. No functional change intended. Reviewed by: glebius MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D44685	2024-04-10 08:53:49 -04:00
Gleb Smirnoff	f7c4d12bcd	icmp: correct the assertion that checks limit + jitter Fixes: `4399e055ea`	2024-04-08 16:54:19 -07:00
Kristof Provost	60d8dbbef0	netinet: add a probe point for IP, IP6, ICMP, ICMP6, UDP and TCP stats counters When debugging network issues one common clue is an unexpectedly incrementing error counter. This is helpful, in that it gives us an idea of what might be going wrong, but often these counters may be incremented in different functions. Add a static probe point for them so that we can use dtrace to get further information (e.g. a stack trace). For example: dtrace -n 'mib:ip:count: { printf("%d", arg0); stack(); }' This can be disabled by setting the following kernel option: options KDTRACE_NO_MIB_SDT Reviewed by: gallatin, tuexen (previous version), gnn (previous version) Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D43504	2024-04-08 17:29:59 +02:00
Michael Tuexen	e8c149ab85	tcp: add some debug output Also log, when dropping text or FIN after having received a FIN. This is the intended behavior described in RFC 9293. A follow-up patch will enforce this behavior for the base stack and the RACK stack. Reviewed by: rscheff MFC after: 3 days Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D44669	2024-04-07 22:41:24 +02:00
Michael Tuexen	3e1c8a35f7	tcp: improve consistency No functional change intended. Reported by: Coverity Scan CID: 1523781 Reviewed by: rscheff MFC after: 3 days Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D44645	2024-04-06 10:02:06 +02:00
Michael Tuexen	d902c8f55b	tcp rack: fix memory corruption When in rack_output() jumping to the label out, don't write errno into the log buffer, since the pointer is not initialized. Reported by: Coverity Scan CID: 1523773 Reviewed by: rscheff MFC after: 3 days Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D44647	2024-04-06 09:55:46 +02:00
Michael Tuexen	7df0ef5f48	tcp rack: fix sending In rack_output(), idle is used as a boolean variable. So don't use it as an int and don't clear it afterwards. This avoids setting idle to false, when it is not intended. Reported by: olivier Reviewed by: rrs, rscheff MFC after: 3 days Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D44610	2024-04-05 17:47:03 +02:00
Michael Tuexen	60bc195745	tcp bblog: cleanup Remove redundant checks and improve error checking. Reported by: Coverity Scan CID: 1523780 Reviewed by: rscheff MFC after: 3 days Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D44616	2024-04-05 17:36:40 +02:00
Michael Tuexen	aaaa01c0c8	tcp hpts: initialize variable Ensure that tv.tv_sec is zero in all code paths. Reported by: Coverity Scan CID: 1527724 Reviewed by: rscheff MFC after: 3 days Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D44584	2024-04-05 17:30:31 +02:00
Michael Tuexen	6b454da6bb	tcp: address a warning t_state is an unsigned variable, so no need for testing that it is non-negative. Reported by: Coverity Scan CID: 1390885 Reviewed by: glebius MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D44619	2024-04-04 00:14:59 +02:00
Michael Tuexen	e0bd180130	tcp: fix conversion of rttvar A wrong variable and wrong scaling factors were used. Reported by: Coverity Scan CID: 1508689 Reviewed by: rscheff MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D44612	2024-04-03 18:39:31 +02:00
Michael Tuexen	5a268d8688	tcp: fix comment Make the comment consistent with the code. Reviewed by: rscheff MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D44611	2024-04-03 18:26:04 +02:00
Michael Tuexen	b600644fdd	tcp hpts: improve consistency The target_slot argument of max_slots_available() can be NULL. Therefore, check for this in all places. Right now, all callers provide non-NULL pointer. Reported by: Coverity Scan CID: 1527732 Reviewed by: rrs MFC after: 3 days Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D44527	2024-04-01 21:51:59 +02:00
Gleb Smirnoff	1a8d176432	inpcb: fully retire inp_ppcb pointer Before a protocol specific control block started to embed inpcb in self (see `0aa120d52f`, `e68b379244`, `483fe96511`) this pointer used to point at it. Retain kf_sock_inpcb field in the struct kinfo_file in <sys/user.h>. The exp-run detected a minimal use of the field in ports: * sysutils/lsof - patched upstream * net-mgmt/netdata - patch accepted upstream * emulators/qemu-user-static - upstream master branch seems not using the field anymore We can keep the field around for some time, but eventually it may be reused for something else. PR: 277659 (exp-run) Reviewed by: tuexen Differential Revision: https://reviews.freebsd.org/D44491	2024-03-29 12:18:32 -07:00
Randall Stewart	b7b78c1c16	Optimize HPTS so that little work is done until we have a hpts thread that is over the connection threshold HPTS inserts a softclock for system call return that optimizes performance. However when no HPTS threads need the help (i.e. when they have less than 100 or so connections) then there should be little work done, i.e. check the counter and return instead of running through all the threads getting locks etc. Reported by: eduardo Reviewed by: gallatin, glebius, tuexen Tested by: gallatin Differential Revision: https://reviews.freebsd.org/D44420	2024-03-28 08:12:37 -04:00
Michael Tuexen	ed505f893a	tcp bblog: use correct length The length of tldl_reason is TCP_LOG_REASON_LEN, not TCP_LOG_ID_LEN. No functional change intended. Reported by: Coverity Scan CID: 1418074 CID: 1418276 Reviewed by: glebius, rscheff MFC after: 3 days Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D44510	2024-03-27 14:31:48 +01:00
Gleb Smirnoff	4399e055ea	icmp: allow zero value for ICMP limits Zero means limit is disabled, so the value doesn't need to be checked against jitter value. Fixes: `ac44739fd8` Fixes: `a03aff88a1`	2024-03-24 19:52:03 -07:00
Gleb Smirnoff	4f96be33fe	icmp6: move ICMPv6 related tunables to the files where they are used Most of them can be declared as static after the move out of in6_proto.c. Keeping sysctl(9) declarations with their text descriptions next to the variable declaration create self-documenting code. There should be no functional changes. Differential Revision: https://reviews.freebsd.org/D44481	2024-03-24 09:13:23 -07:00
Gleb Smirnoff	ac44739fd8	icmp: improve ICMP limit jitter Instead of fixing up invalid values set by a user in badport_bandlim() which is a fast path function, provide a sysctl handler sysctl_icmplim_and_jitter(), that will check that jitter is less than the limit. Provide jitter initialization function icmplim_new_jitter() used at boot, in the sysctl handler and when we actually hit the limit. This also fixes no jitter on a fresh booted system until first limit hit. Instead of CVE number provide link to the actual paper that explains what and why we are doing here. The CVE number isn't very informative, it will just tell you what RedHat version you need to upgrade to. Reviewed by: kp, tuexen, zlei Differential Revision: https://reviews.freebsd.org/D44478	2024-03-24 09:13:23 -07:00
Gleb Smirnoff	b508545ce0	icmp: when logging ICMP ratelimiting message use correct jitter value The limiting of the very last second has been done using certain jitter value. We update the jitter for the next second. But the logging should report the jitter before the change. Reviewed by: kp, tuexen, zlei Differential Revision: https://reviews.freebsd.org/D44477	2024-03-24 09:13:23 -07:00
Gleb Smirnoff	9d7f17d746	icmp: hide icmp_bandlimit_uninit() under VIMAGE The uninitialization may be executed only on a kernel with VIMAGE. Reviewed by: kp, tuexen, zlei Differential Revision: https://reviews.freebsd.org/D44476	2024-03-24 09:13:23 -07:00
Gleb Smirnoff	7142ab4790	icmp: do not store per-VNET identical array of strings We need per-VNET struct counter_rate, but we don't need per-VNET set of const char *. Also, identical word "response" can go into the format string instead of being stored 7 times. Reviewed by: kp, zlei, tuexen Differential Revision: https://reviews.freebsd.org/D44475	2024-03-24 09:13:23 -07:00
Michael Tuexen	af700f430f	tcp: no data on SYN segments unless doing TFO Ensure that there is no data on SYN segments unless doing TFO. This check is already in RACK and BBR. Reported by: glebius Reviewed by: rscheff MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D44384	2024-03-22 11:12:56 +01:00
John Baldwin	3d0a736796	tcp: Add a new kernel-only TCP_USE_DDP socket option This socket option can be used by in-kernel consumers (like NFS) to request a NIC to use optimized receive of large buffers for a connection. The current use case is to support DDP by the TOE on Chelsio NICs. Reviewed by: rscheff, tuexen, glebius Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D44000	2024-03-20 15:29:02 -07:00
Gleb Smirnoff	56f7860087	carp: check CARP status in in_localip_fib(), in6_localip_fib() Don't report a BACKUP CARP address as local. These two functions are used only by source address validation for input packets, controlled by sysctls net.inet.ip.source_address_validation and net.inet6.ip6.source_address_validation. For this purpose we definitely want to treat BACKUP addresses as non local. This change is conservative and doesn't modify compat in_localip() and in6_localip(). They are used more widely than the FIB-aware versions. The change would modify the notion of ipfw(4) 'me' keyword. There might be other consequences as in_localip() is used by various tunneling protocols. PR: 277349	2024-03-19 11:48:59 -07:00
Gleb Smirnoff	e34ea0196f	tcp: clear all TCP timers in tcp_timer_stop() when in callout When a TCP callout decides to disable self, e.g. tcp_timer_2msl() calling tcp_close(), we must also clear all other possible timers. Otherwise, upon return, the callout would be scheduled again in tcp_timer_enter(). Revert `57e27ff07a`, which was a temporary partial revert of otherwise correct `62d47d73b7`, that exposed the problem being fixed now. Add an extra assertion in tcp_timer_enter() to check we aren't arming callout for a closed connection. Reviewed by: rscheff	2024-03-18 13:57:00 -07:00
Gleb Smirnoff	dd7b86e2a0	tcp: remove IS_FASTOPEN() macro The macro is more obfuscating than helping as it just checks a single flag of t_flags. All other t_flags bits are checked without a macro. A bigger problem was that declaration of the macro in tcp_var.h depended on a kernel option. It is a bad practice to create such definitions in installable headers. Reviewed by: rscheff, tuexen, kib Differential Revision: https://reviews.freebsd.org/D44362	2024-03-18 08:56:17 -07:00
Gleb Smirnoff	d62c4607e8	sockets: remove unused KPIs to manipulate sockets These KPIs were added in `dd0e6c383a` and through 15 years had zero use. They slightly remind what IfAPI does for struct ifnet. But IfAPI does that for the sake of large collection of NIC drivers not being aware of struct ifnet. For the sockets it is unclear what could be a large collection of externally written kernel modules that need extensively use sockets and not be aware of their internals at the same time. This isolation of a structure knowledge requires a lot of work, and just throwing in a few KPIs isn't helpful. Reviewed by: kib, olce, markj Differential Revision: https://reviews.freebsd.org/D44311	2024-03-18 08:50:30 -07:00
Gleb Smirnoff	027fda80fe	inpcb: remove unused KPIs to manipulate inpcbs These KPIs were added in `9d29c635da` and through 15 years had zero use. They slightly remind what IfAPI does for struct ifnet. But IfAPI does that for the sake of large collection of NIC drivers not being aware of struct ifnet. For the inpcb it is unclear what could be a large collection of externally written kernel modules that need extensively use inpcb and not be aware of its internals at the same time. This isolation of a structure knowledge requires a lot of work, and just throwing in a few KPIs isn't helpful. Reviewed by: kib, bz, markj Differential Revision: https://reviews.freebsd.org/D44310	2024-03-18 08:49:39 -07:00
Gleb Smirnoff	ab8f59ceaf	rack: don't define TCPOUTFLAGS as the code doesn't use tcp_outflags. This should fix gcc builds.	2024-03-13 21:07:59 -07:00
Konstantin Belousov	220ee18f19	netinet/tcp_var.h: always define IS_FASTOPEN() for kernel compilation env and drop the definition for userspace (which matched TCP_RFC7413) since it depends on presence of the kernel option. Reviewed by: glebius, rscheff Sponsored by: NVIDIA networking MFC after: 1 week Differential revision: https://reviews.freebsd.org/D44349	2024-03-14 01:20:58 +02:00
Richard Scheffenegger	85df11a1de	ktls: deep copy tls_enable struct for in-kernel tcp consumers Doing a deep copy of the keys early allows users of the tls_enable structure to assume kernel memory. This enables the socket options to be set by kernel threads. Reviewed By: #transport, tuexen, jhb, rrs Sponsored by: NetApp, Inc. X-NetApp-PR: #79 Differential Revision: https://reviews.freebsd.org/D44250	2024-03-13 13:23:13 +01:00
Gleb Smirnoff	e4315bbc85	tcp: move struct tcp_ifcap declaration under _KERNEL Reviewed by: rscheff, tuexen, kib Differential Revision: https://reviews.freebsd.org/D44340	2024-03-13 12:14:18 -07:00
Randall Stewart	e18b97bd63	Update to bring the rack stack with all its fixes in. This brings the rack stack up to the current level used at NF. Many fixes and improvements have been added. I also add in a fix to BBR to deal with the changes that have been in hpts for a while i.e. only one call no matter if mbuf queue or tcp_output. It basically does little except BBlogs and is a placemark for future work on doing path capacity measurements. With a bit of a struggle with git I finally got rack_pcm.c into place (apologies for not noticing this error). The LINT kernel is running on my box now .. sigh. Reviewed by: tuexen, glebius Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D43986	2024-03-12 07:55:02 -04:00
Brooks Davis	c112243f6b	Revert "Update to bring the rack stack with all its fixes in." This commit was incomplete and breaks LINT kernels. The tree has been broken for 8+ hours. This reverts commit `f6d489f402`.	2024-03-11 20:28:24 +00:00
Randall Stewart	f6d489f402	Update to bring the rack stack with all its fixes in. This brings the rack stack up to the current level used at NF. Many fixes and improvements have been added. I also add in a fix to BBR to deal with the changes that have been in hpts for a while i.e. only one call no matter if mbuf queue or tcp_output. Note there is a new file that I can't figure out how to get in rack_pcm.c It basically does little except BBlogs and is a placemark for future work on doing path capacity measurements. Reviewed by: tuexen, glebius Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D43986	2024-03-11 07:36:54 -04:00
Michael Tuexen	96ad640178	TCP LRO: add dtrace probe points Add the IP, UDP, and TCP receive static probes to the code path, which avoids if_input. Reviewed by: rrs, markj MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D43727	2024-03-08 10:21:09 +01:00
Michael Tuexen	d1ce01214a	TCP LRO: disable mbuf queuing when packet filter hooks are in place When doing mbuf queueing, the packet filter hooks in ether_demux(), ip_input(), and ip6_input() are by-passed. This means that the packet filters don't process incoming packets, which might result in connection failures. For example bypassing the TCP sequence number validation will result in dropping valid packets. Please note that this patch is only disabling mbuf queueing, not LRO. Reported by: Herbert J. Skuhra Reviewed by: glebius, rrs, rscheff MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D43769	2024-03-08 10:03:43 +01:00
Kyle Evans	47ad4f2d45	ktrace: log genio events on failed write Visibility into the contents of the buffer when a write(2) has failed can be immensely useful in debugging IPC issues -- pushing this to discuss the idea, or maybe an alternative where we can set a flag like KTRFAC_ERRIO to enable it. When a genio event is potentially raised after an error, currently we'll just free the uio and return. However, such data can be useful when debugging communication between processes to, e.g., understand what the remote side should have grabbed before closing a pipe. Tap out the entire buffer on failure rather than simply discarding it. Reviewed by: kib, markj Differential Revision: https://reviews.freebsd.org/D43799	2024-03-04 23:44:09 -06:00
Randall Stewart	638b5ae1c7	HPTS actually has three states, not two, so the macro needs to account for that. OK, let's fix up tcp_in_hpts() so that it also says yes if you are in the race state moving and you are scheduled to be put in. This also requires changing the MPASS to be the old version non inline function of tcp_in_hpts(). This change also adds a new inline macro so that a uint64_t timestamp can be obtained by a transport (aka Rack will use this). Reviewed by: glebius, tuexen Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D44157	2024-03-01 15:21:15 -05:00
Gordon Bergling	6bce41a38e	carp(4): Fix a typo in a source code comment - s/successfull/successful/ MFC after: 3 days	2024-02-27 17:39:57 +01:00
Richard Scheffenegger	8917131e00	tcp: need default in switch statement for enum. fix clang error after `c9b6241e25` Reviewed By: imp Differential Revision: https://reviews.freebsd.org/D44081	2024-02-25 08:24:13 +01:00
Richard Scheffenegger	c9b6241e25	tcp: address enum-int-mismatch fix gcc13 error after `f74352fbcf`	2024-02-25 04:46:39 +01:00
Richard Scheffenegger	5e248c23d9	tcp: retain some CC signals outside of kernel scope Summary: fix build error after `f74352fbcf` Reviewers: #transport! Subscribers: imp, melifaro, glebius Differential Revision: https://reviews.freebsd.org/D44066	2024-02-24 21:01:54 +01:00
Michael Tuexen	644cffe67f	sctp: improve sending of packets containing an INIT ACK chunk If the peer announced support of zero checksums, do so when sending packets containing an INIT ACK chunk. MFC after: 1 week	2024-02-24 19:16:36 +01:00
Richard Scheffenegger	038699a8f1	tcp: cubic - restart epoch after RTO This is a mitigation to avoid sudden extreme jumps in cwnd, as t_epoch can be very out of date after an RTO. Per RFC9438, sec 4.8, t_epoch is to be reset whenever cwnd grows beyond ssthresh (CC phase transitions from slow start to congestion avoidance), to be fixed with the upcoming cc_cubic changes. MFC after: 3 days Reviewed By: cc, #transport Sponsored by: NetApp, Inc Differential Revision: https://reviews.freebsd.org/D44023	2024-02-24 17:07:46 +01:00
Richard Scheffenegger	40fdc6d25f	tcp: provide correct snd_fack on post_recovery Ensure that snd_fack holds a valid value when doing the post_recovery CC processing, for preparation of the cc_cubic update, so that local pipe calculations can correctly refer to snd_fack during and after CC events. Reviewed By: tuexen, #transport Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D43957	2024-02-24 16:55:31 +01:00
Richard Scheffenegger	f74352fbcf	tcp: use enum for all congestion control signals Facilitate easier troubleshooting by enumerating all congestion control signals. Typecast the enum to int, when a congestion control module uses private signals. No external change. Reviewed By: glebius, tuexen, #transport Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D43838	2024-02-24 16:41:48 +01:00
Richard Scheffenegger	38983d40c1	tcp: prevent div by zero in cc_htcp Make sure the divisor is at least one. While cwnd should never be smaller than t_maxseg, this can happen during Path MTU Discovery, or when TCP options are considered in other parts of the stack. PR: 276674 MFC after: 3 days Reviewed By: tuexen, #transport Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D43797	2024-02-24 16:35:59 +01:00
Michael Tuexen	533faf21c1	sctp: improve consistency MFC after: 1 week	2024-02-23 21:40:46 +01:00
Gordon Bergling	2fb174d18a	sctp(4): Fix a typo in a source code comment - s/anthing/anything/ MFC after: 3 days	2024-02-18 13:01:04 +01:00
Michael Tuexen	2f4e46dfdd	RACK, BBR: handle EACCES like EPERM for IP output handling The FreeBSD TCP base stack also handles them the same way. In case of packet filters dropping packets in the output path, this avoids retransmitting the dropped packet every 10ms or so. Reviewed by: rscheff MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D43773	2024-02-16 12:19:24 +01:00
Gleb Smirnoff	abe8379b4f	sockets: repair wakeup of accept(2) by shutdown(2) That was lost in transition from one-for-all soshutdown() to protocol specific methods. Only protocols that listen(2) were affected. This is not a documented or specified feature, but some software relies on it. At least the FreeSWITCH telephony software uses this behavior on PF_INET/SOCK_STREAM. Fixes: `5bba272807`	2024-02-15 10:48:44 -08:00
Richard Scheffenegger	fcea1cc971	tcp: fix RTO ssthresh for non-6675 pipe calculation Follow up on D43768 to properly deal with the non-default pipe calculation. When CC_RTO is processed, the timeout will have already pulled back snd_nxt. Further, snd_fack is not pulled along with snd_una. Reviewed By: tuexen, #transport Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D43876	2024-02-14 14:51:53 +01:00
Richard Scheffenegger	57e27ff07a	tcp: partially undo D43792 At the destruction of the tcpcb, no timers are supposed to be running. However, it turns out that stopping them in the close() / shutdown() call does not have the desired effect under all circumstances. This partially reverts `62d47d73b7` to reduce the nuisance caused. PR: 277009 Reported-by: syzbot+9a9aa434a14a2b35c3ba@syzkaller.appspotmail.com Reported-by: syzbot+e82856782410e895bae7@syzkaller.appspotmail.com Reviewed By: glebius, tuexen, #transport Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D43855	2024-02-12 22:38:11 +01:00
Richard Scheffenegger	62d47d73b7	tcp: stop timers and clean scoreboard in tcp_close() Stop timers when in tcp_close() instead of doing that in tcp_discardcb(). A connection in CLOSED state shall not need any timers. Assert that no timer is rescheduled after that in tcp_timer_activate() and verify that this is also the expected state in tcp_discardcb(). PR: 276761 Reviewed By: glebius, tuexen, #transport Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D43792	2024-02-10 10:30:00 +01:00
Richard Scheffenegger	a8e817cf5c	tcp: stop doing superfluous work after sending RST When sending a RST control segment in tcp_output() it means we are in TCPS_CLOSED state, called from tcp_drop(). Once the RST is sent, don't call tcp_timer_activate() or update anything in tcpcb, since that will go away shortly. PR: 276761 Provided by: glebius Reviewed By: glebius, tuexen, #transport Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D43808	2024-02-10 10:25:02 +01:00
Richard Scheffenegger	3eeb22cb81	tcp: clean scoreboard when releasing the socket buffer The SACK scoreboard is conceptually an extension of the socket buffer. Remove it when the socket buffer goes away with soisdisconnected(). Verify that this is also the expected state in tcp_discardcb(). PR: 276761 Reviewed by: glebius, tuexen, #transport Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D43805	2024-02-10 10:20:00 +01:00
Richard Scheffenegger	23c4f23247	tcp: ensure tcp_sack_partialack does not inflate cwnd after RTO The implicit assumption of snd_nxt always being larger than snd_recover is not true after RTO. In that case, cwnd would get inflated to ssthresh, which may be much larger than the current pipe (data in flight). Reviewed By: tuexen, #transport Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D43653	2024-02-08 20:40:25 +01:00
Richard Scheffenegger	32a6df57df	tcp: calculate ssthresh on RTO according to RFC5681 Per RFC5681, only adjust ssthresh on the initial retransmission timeout. Since RTO often happens during loss recovery, while cwnd no longer tracks all data in flight, calculate pipe properly. Reviewed By: tuexen, #transport Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D43768	2024-02-08 19:18:26 +01:00
Richard Scheffenegger	1adab814e8	tcp: use tcp_fixed_maxseg instead of tcp_maxseg in cc modules tcp_fixed_maxseg() is the streamlined calculation of typical tcp options and more suitable for heavy use in the congestion control modules on every received packet. No external functional change. Reviewed By: tuexen, #transport Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D43779	2024-02-08 18:36:59 +01:00
Gleb Smirnoff	ce69e37369	Revert "sockets: retire sorflush()" Provide a comment in sorflush() why the socket I/O sx(9) lock is actually important. This reverts commit `507f87a799`.	2024-02-03 13:08:41 -08:00
Gleb Smirnoff	f79a8585bb	sockets: garbage collect SS_ISCONFIRMING Fixes: `8df32b19de`	2024-01-30 10:38:33 -08:00
Michael Tuexen	f30c7d5654	TCP LRO: convert TCP header fields to host byte order earlier This is a preparation for adding dtrace hooks in a follow-up commit, which are missing in the code path, where packets are directly queued to the tcpcb. The dtrace hooks expect the fields to be in host byte order. This only applies when TCP HPTS is used. No functional change intended. Reviewed by: rscheff MFC after: 1 week Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D43594	2024-01-29 18:52:17 +01:00
Kristof Provost	ffeab76b68	pfil: PFIL_PASS never frees the mbuf pfil hooks (i.e. firewalls) may pass, modify or free the mbuf passed to them. (E.g. when rejecting a packet, or when gathering up packets for reassembly). If the hook returns PFIL_PASS the mbuf must still be present. Assert this in pfil_mem_common() and ensure that ipfilter follows this convention. pf and ipfw already did. Similarly, if the hook returns PFIL_DROPPED or PFIL_CONSUMED the mbuf must have been freed (or now be owned by the firewall for further processing, like packet scheduling or reassembly). This allows us to remove a few extraneous NULL checks. Suggested by: tuexen Reviewed by: tuexen, zlei Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D43617	2024-01-29 14:10:19 +01:00
Richard Scheffenegger	0b3f9e435f	tcp: move cc_post_recovery past snd_una update The RFC6675 pipe calculation (sack.revised, enabled by default since D28702), uses outdated information, while the previous default calculated it correctly with up-to-date information from the incoming ACK. This difference can become as large as the receive window (not the congestion window previously), potentially triggering a massive burst of new packets. MFC after: 1 week Reviewed By: tuexen, #transport Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D43520	2024-01-28 00:18:51 +01:00
Mark Johnston	bbf86c65d0	netinet: Remove stale references to Giant from comments MFC after: 1 week	2024-01-27 13:51:13 -05:00
Richard Scheffenegger	2d05a1c81b	tcp: commonize check for more data to send, style changes Use SEQ_SUB instead of a plain subtraction, for an implicit type conversion and prevention of a possible overflow. Use curly brackets in stacked if statements throughout. Use of the ? operator to enhance readability when clearing the FIN flag in tcp_output(). None of the above change the function. Reviewed By: tuexen, cc, #transport Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D43539	2024-01-26 01:20:35 +01:00
Richard Scheffenegger	fc262fd3dc	tcp: AccECN access ACE field by shifting bits Shifting bits is quicker than checking header flag bits one by one. Also improve readability by the use of switch statements. No change in behaviour. Reviewed By: glebius, tuexen, #transport Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D43560	2024-01-26 00:16:22 +01:00
Richard Scheffenegger	0932fb565a	tcp: fix TCPSTAT accounting for SACK Account for SACK retransmitted bytes once the actual length is known. This prevents a call to tcp_maxseg() and prepares for TSO support when transmitting from the SACK scoreboard. Reviewed By: tuexen, #transport Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D43447	2024-01-25 22:58:33 +01:00
Richard Scheffenegger	c7c325d01d	tcp: pass maxseg around instead of calculating locally Improve slowpath processing (reordering, retransmissions) slightly by calculating maxseg only once. This typically saves one of two calls to tcp_maxseg(). Reviewed By: glebius, tuexen, cc, #transport Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D43536	2024-01-24 16:43:29 +01:00
Gleb Smirnoff	90ad2dc287	tcp: remove 20+ year old disabled code from `d912c694ee`	2024-01-23 13:16:34 -08:00
Gleb Smirnoff	c809435b18	tcp: clear outdated comment mentioning T/TCP	2024-01-23 12:59:21 -08:00
Gleb Smirnoff	e21c668719	tcp: pass positive errno to tcp_drop() Fixes: `446ccdd08e`	2024-01-23 12:59:21 -08:00
Gordon Bergling	9b035689f1	tcp_fastopen: Fix a typo in a source code comment - s/posession/possession/ MFC after: 3 days	2024-01-22 21:49:47 +01:00
Gleb Smirnoff	7f3184ba79	tcp: remove outdated comment This paragraph should have been removed in `446ccdd08e`.	2024-01-22 12:42:21 -08:00
Gordon Bergling	ef0ac0a1ad	tcp_hpts: Fix a typo of a function name in a comment - s/tcp_ouput/tcp_output/ MFC after: 3 days	2024-01-20 17:29:28 +01:00
Richard Scheffenegger	dfe30e4196	tcp: remove unused tcp_sack_output_debug() function This debugging code has been lingering for years with no known use. No functional change. Reviewed by: tuexen, #transport Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D43511	2024-01-19 14:48:32 +01:00
Gleb Smirnoff	a079c891c0	sctp: restore missing inpcb lock Fixes: `5bba272807` Reported-by: syzbot+b8636c973dc20fea4a9b@syzkaller.appspotmail.com Reported-by: syzbot+d76a18ee8bbe6f7d3056@syzkaller.appspotmail.com	2024-01-16 23:11:27 -08:00
Xavier Beaudouin	80044c785c	Add UDP encapsulation of ESP in IPv6 This patch provides UDP encapsulation of ESP packets over IPv6. Ports the IPv4 code to IPv6 and adds support for IPv6 in udpencap.c As required by the RFC and unlike in IPv4 encapsulation, UDP checksums are calculated. Co-authored-by: Aurelien Cazuc <aurelien.cazuc.external@stormshield.eu> Sponsored-by: Stormshield Sponsored-by: Wiktel Sponsored-by: Klara, Inc.	2024-01-16 20:44:34 +00:00
Gleb Smirnoff	507f87a799	sockets: retire sorflush() With removal of dom_dispose method the function boils down to two meaningful function calls: socantrcvmore() and sbrelease(). The latter is only relevant for protocols that use generic socket buffers. The socket I/O sx(9) lock acquisition in sorflush() is not relevant for shutdown(2) operation as it doesn't do any I/O that may interleave with read(2) or write(2). The socket buffer mutex acquisition inside sbrelease() is what guarantees thread safety. This sx(9) acquisition in soshutdown() can be tracked down to 4.4BSD times, where it used to be sblock(), and it was carried over through the years evolving together with sockets with no reconsideration of why do we carry it over. I can't tell if that sblock() made sense back then, but it doesn't make any today. Reviewed by: tuexen Differential Revision: https://reviews.freebsd.org/D43415	2024-01-16 10:30:49 -08:00
Gleb Smirnoff	5bba272807	sockets: make pr_shutdown fully protocol specific method Disassemble a one-for-all soshutdown() into protocol specific methods. This creates a small amount of copy & paste, but makes code a lot more self documented, as protocol specific method would execute only the code that is relevant to that protocol and nothing else. This also fixes a couple recent regressions and reduces risk of future regressions. The extended KPI for the new pr_shutdown removes need for the extra pr_flush which was added for the sake of SCTP which could not perform its shutdown properly with the old one. Particularly for SCTP this change streamlines a lot of code. Some notes on why certain parts of code were copied or were not to certain protocols: * The (SS_ISCONNECTED \| SS_ISCONNECTING \| SS_ISDISCONNECTING) check is needed only for those protocols that may be connected or disconnected. * The above reduces into only SS_ISCONNECTED for those protocols that always connect instantly. * The ENOTCONN and continue processing hack is left only for datagram protocols. * The SOLISTENING(so) block is copied to those protocols that listen(2). * sorflush() on SHUT_RD is copied almost to every protocol, but that will be refactored later. * wakeup(&so->so_timeo) is copied to protocols that can make a non-instant connect(2), can SO_LINGER or can accept(2). There are three protocols (netgraph(4), Bluetooth, SDP) that did not have pr_shutdown, but old soshutdown() would still perform sorflush() on SHUT_RD for them and also wakeup(9). Those protocols partially supported shutdown(2) returning EOPNOTSUPP for SHUT_WR/SHUT_RDWR, now they fully lost shutdown(2) support. I'm pretty sure netgraph(4) and Bluetooth are okay about that and SDP is almost abandoned anyway. Reviewed by: tuexen Differential Revision: https://reviews.freebsd.org/D43413	2024-01-16 10:30:37 -08:00
Gleb Smirnoff	d4033ebd05	divert: just return EOPNOTSUPP on shutdown(2) Before this change we would always return ENOTCONN. There is no legitimate use of shutdown(2) on divert(4).	2024-01-12 02:04:04 -08:00
Michael Tuexen	13720136fb	tcpsso: fix when used without -i option Since `fdb987bebd` it is not possible anymore to use inp_next iterator for bound, but unconnected sockets. This applies to TCP listening sockets. Therefore the mentioned commit broke tcpsso on listening sockets if the -i option was not used. Fix this by iterating through all endpoints instead of only through the bound, but unconnected ones. Reviewed by: markj Fixes: `fdb987bebd` ("inpcb: Split PCB hash tables") Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D43353	2024-01-10 08:33:09 +01:00
John Baldwin	8cb9b68f58	sys: Use mbufq_empty instead of comparing mbufq_len against 0 Reviewed by: bz, emaste Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D43338	2024-01-09 11:00:46 -08:00
Richard Scheffenegger	429f14f83a	tcp: clean PRR state after ECN congestion recovery. PRR state was not properly reset on subsequent ECN CE events. Clean up after local transmission failures too. Reviewed by: tuexen, cc, #transport MFC after: 3 days Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D43170	2024-01-08 10:53:04 +01:00
Richard Scheffenegger	f4574e2dc5	tcp: prevent spurious empty segments and fix uncommon panic Only try sending more data on pure ACKs when there is more data available in the send buffer. In the case of a retransmitted SYN not being sent due to an internal error, the snd_una/snd_nxt accounting could be off, leading to a panic. Pulling snd_nxt up to snd_una prevents this from happening. Reported by: fengdreamer@126.com Reviewed by: cc, tuexen, #transport MFC after: 1 week Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D43343	2024-01-08 10:52:49 +01:00
Richard Scheffenegger	30409ecdb6	tcp: do not purge SACK scoreboard on first RTO Keeping the SACK scoreboard intact after the first RTO and retransmitting all data anew only on subsequent RTOs allows a more timely and efficient loss recovery under many adverse circumstances. Reviewed By: tuexen, #transport MFC after: 10 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D42906	2024-01-06 20:25:38 +01:00
Richard Scheffenegger	893ed42eca	tcp: Make use of enum for sack_changed No functional change. Reviewed By: tuexen, #transport MFC after: 3 days Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D43346	2024-01-06 20:23:52 +01:00
Michael Tuexen	aa1223ac3a	tcp: limit visibility of symbols Put most symbols under __BSD_VISIBLE and limit the namespace of tcp_[gs]et_flags. Reviewed by: kib, karels, rscheff Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D43245	2024-01-06 12:00:38 +01:00
Jose Luis Duran	b0e13f785b	netinet: Define IPv6 ECN mask Define a mask for the code point used for ECN in the Traffic Class field (2 bits) of an IPv6 header. BE: 0 0 3 0 0 0 0 0 Bit: 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ \|Version\| Traffic Class \| Flow Label \| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ \| ... \| For BE (Big Endian), or network-byte order, this corresponds to 0x00300000. For Little Endian, it corresponds to 0x00003000. Reviewed by: imp, markj MFC after: 1 week Pull Request: https://github.com/freebsd/freebsd-src/pull/879	2024-01-03 12:56:28 -05:00
Richard Kümmel	7df9da47e8	Fix udp IPv4-mapped address Do not use the cached route if the destination isn't the same. This fixes a problem where a UDP packet will be sent via the wrong route and interface if a previous one was sent via them. PR: 275774 Reviewed by: glebius, tuexen Sponsored by: Beckhoff Automation GmbH & Co. KG	2024-01-02 07:49:12 +01:00
Michael Tuexen	642ac6015b	tcp: fix ports inline is only supported in C99 and newer. To also support C89, use __inline instead, as suggested by dim. Reported by: eduardo Reviewed by: rscheff, markj, dim, imp Tested by: eduardo Fixes: `a8b70cf260` ("netpfil: Use accessor functions and named constants for all tcphdr flags") Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D43231	2023-12-30 03:28:13 +01:00
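
Note on commit b0e13f785b (netinet: Define IPv6 ECN mask): the two mask values quoted in that message can be checked with a short sketch. It is written in Python rather than kernel C, and every name in it (ECN_BITS, TRAFFIC_CLASS_SHIFT, the three functions) is a hypothetical label chosen for illustration, not an identifier from the FreeBSD tree. It assumes only what the commit message states: ECN uses 2 bits of the Traffic Class field, which sits between the 4-bit Version and the 20-bit Flow Label in the first 32-bit word of the IPv6 header.

```python
import struct

# Hypothetical names for illustration; not the kernel's identifiers.
ECN_BITS = 0x3            # ECN is the low two bits of the Traffic Class field
TRAFFIC_CLASS_SHIFT = 20  # the Flow Label occupies the low 20 bits of the word


def ecn_mask_big_endian() -> int:
    """Mask for the ECN bits when the word is read in network byte order."""
    return ECN_BITS << TRAFFIC_CLASS_SHIFT


def ecn_mask_little_endian() -> int:
    """Same mask as seen by a little-endian host that loads the raw bytes."""
    raw = struct.pack(">I", ecn_mask_big_endian())
    return struct.unpack("<I", raw)[0]


def ecn_codepoint(first_word: bytes) -> int:
    """Extract the ECN code point from the first four header bytes."""
    (word,) = struct.unpack(">I", first_word[:4])
    return (word & ecn_mask_big_endian()) >> TRAFFIC_CLASS_SHIFT


if __name__ == "__main__":
    print(hex(ecn_mask_big_endian()))
    print(hex(ecn_mask_little_endian()))
    # Version 6, Traffic Class 0x03 (both ECN bits set), Flow Label 0
    print(ecn_codepoint(bytes([0x60, 0x30, 0x00, 0x00])))
```

Deriving the little-endian value by repacking the bytes, rather than hard-coding it, makes the relationship between the two constants in the commit message explicit.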

8100 Commits