freebsd

mirror of https://git.FreeBSD.org/src.git synced 2025-01-18 15:30:21 +00:00

Author	SHA1	Message	Date
Alexander V. Chernikov	0e87bab6b4	routing: fix debug headers added in `6fa8ed43ee`. - move debug headers out of COMPAT_FREEBSD32 in rtsock.c - remove accidentally-added LOG_ defines from syslog.h MFC after: 2 weeks	2022-06-25 23:05:25 +00:00
Alexander V. Chernikov	76179e400a	routing: fix syslog include for rtsock.c MFC after: 2 weeks	2022-06-25 22:08:10 +00:00
Alexander V. Chernikov	6fa8ed43ee	routing: improve debugging. Use unified guidelines for the severity across the routing subsystem. Update severity for some of the already-used messages to adhere to the guidelines. Convert rtsock logging to the new FIB_ reporting format. MFC after: 2 weeks	2022-06-25 19:53:31 +00:00
Alexander V. Chernikov	c260d5cd8e	routing: fix crash when RTM_CHANGE results in no-op for the multipath route. Reporting logic assumed there is always some nhop change for every successful modification operation. Explicitly check that the changed nexthop indeed exists when reporting back to userland. MFC after: 2 weeks Reported by: Claudio Jeker <claudio.jeker@klarasystems.com> Tested by: Claudio Jeker <claudio.jeker@klarasystems.com>	2022-06-25 19:35:09 +00:00
Alexander V. Chernikov	c38da70c28	routing: fix RTM_CHANGE nhgroup updates. RTM_CHANGE operates on a single component of the multipath route (e.g. on a single nexthop). Search of this nexthop is performed by iterating over each component from multipath (nexthop) group, using check_info_match_nhop. The problem with the current code is that it incorrectly assumes that `check_info_match_nhop()` returns a true value on match, while in reality it returns an error code on failure. Fix this by properly comparing the result with 0. Additionally, the followup code modified the original nexthop group instead of a new one. Fix this by targeting the new nexthop group instead. Reported by: thj Tested by: Claudio Jeker <claudio.jeker@klarasystems.com> Differential Revision: https://reviews.freebsd.org/D35526 MFC after: 2 weeks	2022-06-25 18:54:57 +00:00
Alexander V. Chernikov	5d6894bd66	routing: improve debug logging Use standard logging (FIB_XX_LOG) across nhg code instead of using old-style DPRINTFs. Add debug object printer for nhgs (`nhgrp_print_buf`). Example: ``` Jun 19 20:17:09 devel2 kernel: [nhgrp] inet.0 nhgrp_ctl_alloc_default: multipath init done Jun 19 20:17:09 devel2 kernel: [nhg_ctl] inet.0 alloc_nhgrp: num_nhops: 2, compiled_nhop: 2 Jun 19 20:17:26 devel2 kernel: [nhg_ctl] inet.0 alloc_nhgrp: num_nhops: 3, compiled_nhop: 3 Jun 19 20:17:26 devel2 kernel: [nhg_ctl] inet.0 destroy_nhgrp: destroying nhg#0/sz=2:[#6:1,#5:1] ``` Differential Revision: https://reviews.freebsd.org/D35525 MFC after: 2 weeks	2022-06-22 15:59:21 +00:00
Mark Johnston	60b4ad4b6b	bpf: Zero pad bytes preceding BPF headers BPF headers are word-aligned when copied into the store buffer. Ensure that pad bytes following the preceding packet are cleared. Reported by: KMSAN MFC after: 1 week Sponsored by: The FreeBSD Foundation	2022-06-20 12:48:13 -04:00
Mark Johnston	c88f6908b4	bpf: Correct a comment MFC after: 1 week Sponsored by: The FreeBSD Foundation	2022-06-20 12:48:13 -04:00
Kristof Provost	1f61367f8d	pf: support matching on tags for Ethernet rules Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D35362	2022-06-20 10:16:20 +02:00
Mark Johnston	c262d5e877	debugnet: Fix an error handling bug in the DDB command tokenizer MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2022-06-16 10:05:10 -04:00
Mark Johnston	8414331481	debugnet: Handle batches of packets from if_input Some drivers will collect multiple mbuf chains, linked by m_nextpkt, before passing them to upper layers. debugnet_pkt_in() didn't handle this and would process only the first packet, typically leading to retransmits. MFC after: 2 weeks Sponsored by: The FreeBSD Foundation	2022-06-16 10:02:00 -04:00
Andrew Gallatin	43c72c45a1	lacp: Remove racy kassert In lacp_select_tx_port_by_hash(), we assert that the selected port is DISTRIBUTING. However, the port state is protected by the LACP_LOCK(), which is not held around lacp_select_tx_port_by_hash(). So this assertion is racy, and can result in a spurious panic when links are flapping. It is certainly possible to fix it by acquiring LACP_LOCK(), but this seems like an early development assert, and it seems best to just remove it, rather than add complexity inside an ifdef INVARIANTS. Sponsored by: Netflix Reviewed by: hselasky Differential Revision: https://reviews.freebsd.org/D35396	2022-06-13 11:32:10 -04:00
Hans Petter Selasky	892eded5b8	vlan(4): Add support for allocating TLS receive tags. The TLS receive tags are allocated directly from the receiving interface, because mbufs are flowing in the opposite direction and then route change checks are not useful, because they only work for outgoing traffic. Differential revision: https://reviews.freebsd.org/D32356 Sponsored by: NVIDIA Networking	2022-06-07 12:54:42 +02:00
Hans Petter Selasky	1967e31379	lagg(4): Add support for allocating TLS receive tags. The TLS receive tags are allocated directly from the receiving interface, because mbufs are flowing in the opposite direction and then route change checks are not useful, because they only work for outgoing traffic. Differential revision: https://reviews.freebsd.org/D32356 Sponsored by: NVIDIA Networking	2022-06-07 12:54:42 +02:00
Gordon Bergling	4f493559b0	if_llatbl: Fix a typo in a debug statement - s/droped/dropped/ Obtained from: NetBSD MFC after: 3 days	2022-06-04 15:22:09 +02:00
Gordon Bergling	f7faa4ad48	if_bridge(4): Fix a typo in a source code comment - s/accross/across/ MFC after: 3 days	2022-06-04 11:26:01 +02:00
Arseny Smalyuk	d18b4bec98	netinet6: Fix mbuf leak in NDP Mbufs leak when manually removing incomplete NDP records with pending packets via ndp -d. It happens because lltable_drop_entry_queue() relies on the `la_numheld` counter when dropping NDP entries (lles). It turned out NDP code never increased `la_numheld`, so the actual free never happened. Fix the issue by introducing unified lltable_append_entry_queue(), common for both ARP and NDP code, properly addressing packet queue maintenance. Reviewed By: melifaro Differential Revision: https://reviews.freebsd.org/D35365 MFC after: 2 weeks	2022-05-31 21:06:14 +00:00
KUROSAWA Takahiro	d6cd20cc5c	netinet6: fix ndp proxying We could insert proxy NDP entries by the ndp command, but the host with proxy ndp entries had not responded to Neighbor Solicitations. Change the following points for proxy NDP to work as expected: * join solicited-node multicast addresses for proxy NDP entries in order to receive Neighbor Solicitations. * look up proxy NDP entries not on the routing table but on the link-level address table when receiving Neighbor Solicitations. Reviewed By: melifaro Differential Revision: https://reviews.freebsd.org/D35307 MFC after: 2 weeks	2022-05-30 10:53:33 +00:00
KUROSAWA Takahiro	77001f9b6d	lltable: introduce the llt_post_resolved callback In order to decrease ifdef INET/INET6s in the lltable implementation, introduce the llt_post_resolved callback and implement protocol-dependent code in the protocol-dependent part. Reviewed By: melifaro Differential Revision: https://reviews.freebsd.org/D35322 MFC after: 2 weeks	2022-05-30 10:53:33 +00:00
KUROSAWA Takahiro	3719dedb91	lltable: use sa_family_t instead of int for lltable.llt_af Reviewed By: melifaro, #network Differential Revision: https://reviews.freebsd.org/D35323 MFC after: 2 weeks	2022-05-30 10:53:33 +00:00
Konrad Sewiłło-Jopek	c9a5c48ae8	arp: Implement sticky ARP mode for interfaces. Provide sticky ARP flag for network interface which marks it as the "sticky" one similarly to what we have for bridges. Once interface is marked sticky, any address resolved using the ARP will be saved as a static one in the ARP table. Such functionality may be used to prevent ARP spoofing or to decrease latencies in Ethernet networks. The drawbacks include potential limitations in usage of ARP-based load-balancers and high-availability solutions such as carp(4). The implemented option is disabled by default, therefore should not impact the default behaviour of the networking stack. Sponsored by: Conclusive Engineering sp. z o.o. Reviewed By: melifaro, pauamma_gundo.com Differential Revision: https://reviews.freebsd.org/D35314 MFC after: 2 weeks	2022-05-27 12:41:30 +00:00
Konstantin Belousov	6a311e6fa5	Add ifcap2 names for RXTLS4 and RXTLS6 interface capabilities and corresponding nvlist capabilities name strings. Reviewed by: hselasky, jhb, kp (previous version) Sponsored by: NVIDIA Networking MFC after: 3 weeks Differential revision: https://reviews.freebsd.org/D32551	2022-05-24 23:59:32 +03:00
Konstantin Belousov	051e7d78b0	Kernel-side infrastructure to implement nvlist-based set/get ifcaps Reviewed by: hselasky, jhb, kp (previous version) Sponsored by: NVIDIA Networking MFC after: 3 weeks Differential revision: https://reviews.freebsd.org/D32551	2022-05-24 23:59:32 +03:00
Konstantin Belousov	b96549f057	struct ifnet: add if_capabilities2 and if_capenable2 bitmasks We are running out of bits in if_capabilities. Suggested by: jhb Reviewed by: hselasky, jhb, kp (previous version) Sponsored by: NVIDIA Networking MFC after: 3 weeks Differential revision: https://reviews.freebsd.org/D32551	2022-05-24 23:59:32 +03:00
Andrey V. Elsukov	f2ab916084	[vlan + lagg] add IFNET_EVENT_UPDATE_BAUDRATE event. Use it to update if_baudrate for vlan interfaces created on the LACP lagg. Differential revision: https://reviews.freebsd.org/D33405	2022-05-20 06:38:43 +02:00
Mitchell Horne	a84bf5eaa1	debugnet: fix an errant assertion We may call debugnet_free() before g_debugnet_pcb_inuse is true, specifically in the cases where the interface is down or does not support debugnet. pcb->dp_drv_input is used to hold the real driver if_input callback while debugnet is in use, so we can check the status of this field in the assertion. This can be triggered trivially by trying to configure netdump on an unsupported interface at the ddb prompt. Initializing the dp_drv_input field to NULL explicitly is not necessary but helps display the intent. PR: 263929 Reported by: Martin Filla <freebsd@sysctl.cz> Reviewed by: cem, markj MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D35179	2022-05-14 10:27:53 -03:00
Kurosawa Takahiro	9573cc3555	rtsock: fix a stack overflow struct sockaddr is not sufficient for buffer that can hold any sockaddr_* structure. struct sockaddr_storage should be used. Test: ifconfig epair create ifconfig epair0a inet6 add 2001:db8::1 up ndp -s 2001:db8::2 02:86:98:2e:96:0b proxy # this triggers kernel stack overflow Reviewed by: markj, kp Differential Revision: https://reviews.freebsd.org/D35188	2022-05-13 20:05:36 +02:00
Kristof Provost	cbbce42345	epair: unbind prior to returning to userspace If 'options RSS' is set we bind the epair tasks to different CPUs. We must take care to not keep the current thread bound to the last CPU when we return to userspace. MFC after: 1 week Sponsored by: Orange Business Services	2022-05-07 18:17:33 +02:00
Kristof Provost	a6b0c8d04d	epair: fix set but not used warning If 'options RSS' is set. MFC after: 1 week Sponsored by: Orange Business Services	2022-05-07 18:17:32 +02:00
Kristof Provost	868bf82153	if: avoid interface destroy race When we destroy an interface while the jail containing it is being destroyed we risk seeing a race between if_vmove() and the destruction code, which results in us trying to move a destroyed interface. Protect against this by using the ifnet_detach_sxlock to also cover if_vmove() (and not just detach). PR: 262829 MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D34704	2022-05-06 13:55:08 +02:00
Gleb Smirnoff	51f798e761	netisr: serialize/restore m_pkthdr.rcvif when queueing mbufs Reviewed by: kp Differential revision: https://reviews.freebsd.org/D33268 (cherry picked from commit `6871de9363`)	2022-05-05 14:38:07 -04:00
Gleb Smirnoff	4d7a1361ef	ifnet/mbuf: provide KPI to serialize/restore m->m_pkthdr.rcvif Supplement ifindex table with generation count and use it to serialize & restore an ifnet pointer. Reviewed by: kp Differential revision: https://reviews.freebsd.org/D33266 Fun note: git show `e6abef0918` (cherry picked from commit `e1882428dc`)	2022-05-05 14:38:07 -04:00
Gleb Smirnoff	80e60e236d	ifnet: make if_index global Now that ifindex is static to if.c we can unvirtualize it. For lifetime of an ifnet its index never changes. To avoid leaking foreign interfaces the net.link.generic.system.ifcount sysctl and the ifnet_byindex() KPI filter their returned value on curvnet. Since if_vmove() no longer changes the if_index, inline ifindex_alloc() and ifindex_free() into if_alloc() and if_free() respectively. API wise the only change is that now minimum interface index can be greater than 1. The holes in interface indexes were always allowed. Reviewed by: kp Differential revision: https://reviews.freebsd.org/D33672 (cherry picked from commit `91f44749c6`)	2022-05-05 14:38:07 -04:00
Marko Zec	d461deeaa4	VNET: Revert "ifnet: make if_index global" This reverts commit `91f44749c6`. Devirtualization of V_if_index and V_ifindex_table was rushed into the tree lacking proper context, discussion, and declaration of intent, so I'm backing it out as harmful to VNET on the following grounds: 1) The change repurposed the decades-old and stable if_index KBI for new, unclear goals which were omitted from the commit note. 2) The change opened up a new resource exhaustion vector where any vnet could starve the system of ifnet indices, including vnet0. 3) To circumvent the newly introduced problem of separating ifnets belonging to different vnets from the globalized ifindex_table, the author introduced sysctl_ifcount() which does a linear traversal over the (potentially huge) global ifnet list just to return a simple upper bound on existing ifnet indices. 4) The change effectively led to nonuniform ifnet index allocation among vnets. 5) The commit note clearly stated that the patch changed the implicit if_index ABI contract where ifnet indices were assumed to be starting from one. The commit note also included a correct observation that holes in interface indices were always allowed, but failed to declare that the userland-observable ifindex tables could now include huge empty spans even under modest operating conditions. 6) The author had an earlier proposal in the works which did not affect per-vnet ifnet lists (D33265) but which he abandoned without providing the rationale behind his decision to do so, at the expense of sacrificing the vnet isolation contract and if_index ABI / KBI. Furthermore, the author agreed to back out his changes himself and to follow up with a proposal for a less intrusive alternative, but later silently declined to act. Therefore, I decided to resolve the status-quo by backing this out myself. 
This in no way precludes a future proposal aiming to mitigate ifnet-removal related system crashes or panics to be accepted, provided it would not unnecessarily compromise the goal of as strict as possible isolation between vnets. Obtained from: github.com/glebius/FreeBSD/commits/backout-ifindex	2022-05-03 19:27:57 +02:00
Marko Zec	6c741ffbfa	Revert "mbuf: do not restore dying interfaces" This reverts commit `703e533da5`. Revert "ifnet/mbuf: provide KPI to serialize/restore m->m_pkthdr.rcvif" This reverts commit `e1882428dc`. Obtained from: github.com/glebius/FreeBSD/commits/backout-ifindex	2022-05-03 19:11:40 +02:00
Marko Zec	0fa5636966	Revert "netisr: serialize/restore m_pkthdr.rcvif when queueing mbufs" This reverts commit `6871de9363`. Obtained from: github.com/glebius/FreeBSD/commits/backout-ifindex	2022-05-03 19:11:39 +02:00
Greg Foster	00a80538b4	lacp: short timeout erroneously declares link-flapping Panasas was seeing a higher-than-expected number of link-flap events. After joint debugging with the switch vendor, we determined there were problems on both sides; either of which might cause the occasional event, but together caused lots of them. On the switch side, an internal queuing issue was causing LACP PDUs -- which should be sent every second, in short-timeout mode -- to sometimes be sent slightly later than they should have been. In some cases, two successive PDUs were late, but we never saw three late PDUs in a row. On the FreeBSD side, we saw a link-flap event every time there were two late PDUs, while the spec says that it takes three seconds of downtime to trigger that event. It turns out that if a PDU was received shortly before the timer code was run, it would decrement less than a full second after the PDU arrived. Then two delayed PDUs would cause two additional decrements, causing it to reach zero less than three seconds after the most-recent on-time PDU. The solution is to note the time a PDU arrives, and only decrement if at least a full second has elapsed since then. Reported by: Greg Foster <gfoster@panasas.com> Reviewed by: gallatin Tested by: Greg Foster <gfoster@panasas.com> MFC after: 3 days Sponsored by: Panasas Differential Revision: https://reviews.freebsd.org/D35070	2022-04-27 12:41:30 -07:00
Reid Linnemann	0abcc1d2d3	pf: Add per-rule timestamps for rule and eth_rule Similar to ipfw rule timestamps, these timestamps internally are uint32_t snaps of the system time in seconds. The timestamp is CPU local and updated each time a rule or a state associated with a rule or state is matched. Reviewed by: kp Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D34970	2022-04-22 19:53:20 +02:00
Kristof Provost	812839e5aa	pf: allow the use of tables in ethernet rules Allow tables to be used for the l3 source/destination matching. This requires taking the PF_RULES read lock. Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D34917	2022-04-20 13:01:12 +02:00
John Baldwin	ac3e46fa3e	infiniband_resolve_addr: ih is only used for INET or INET6.	2022-04-13 16:08:21 -07:00
John Baldwin	d98981585c	ether_resolve_addr: eh is only used for INET or INET6.	2022-04-13 16:08:21 -07:00
John Baldwin	2884a93651	vlan: ifa is only used under #ifdef INET.	2022-04-13 16:08:21 -07:00
John Baldwin	2174f0f2f2	net/route: Use __diagused for variables only used in KASSERT().	2022-04-13 16:08:19 -07:00
Kristof Provost	742e7210d0	udp: allow udp_tun_func_t() to indicate it did not eat the packet Allow udp tunnel functions to indicate they have not taken ownership of the packet, and that normal UDP processing should continue. This is especially useful for scenarios where the kernel has taken ownership of a socket that was originally created by userspace. It allows the tunnel function to pass through certain packets for userspace processing. The primary user of this is if_ovpn, when it receives messages from unknown peers (which might be a new client). Reviewed by: tuexen Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D34883	2022-04-12 10:04:59 +02:00
Gordon Bergling	1a15a383a6	net: Fix a typo in a source code comment - s/peform/perform/ MFC after: 3 days	2022-04-09 11:37:57 +02:00
John Baldwin	d08cb45362	iflib: Use empty inline functions for prefetch() on non-x86. This avoids warnings about unused variables in expressions passed to prefetch().	2022-04-08 17:25:14 -07:00
Mark Johnston	990a6d18b0	net: Fix memory leaks in lltable_calc_llheader() error paths Also convert raw epoch_call() calls to lltable_free_entry() calls, no functional change intended. There's no need to asynchronously free the LLEs in that case to begin with, but we might as well use the lltable interfaces consistently. Noticed by code inspection; I believe lltable_calc_llheader() failures do not generally happen in practice. Reviewed by: bz MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D34832	2022-04-08 11:47:25 -04:00
John Baldwin	f7236dd068	change_mpath_route: Remove write-only nh variable. While here, cleanup the style of the function prologue by moving an assignment out of the middle of two variable declaration blocks.	2022-04-06 16:45:28 -07:00
John Baldwin	371c917b0b	unlink_nhgrp: Remove write-only variable. Possibly one could assert that ret should always be 0 here (that is, that there was always an index found in the bitmask). That should be true since a bitmask index is allocated before the nhgrp is inserted in the ctl->gr_head list in link_nhgrp.	2022-04-06 16:45:27 -07:00
Warner Losh	e606e5d157	sysctl_dumpentry: move error to inner scope Sponsored by: Netflix	2022-04-04 22:30:50 -06:00
Warner Losh	5de5b5a34d	route_ctl: eliminate write only variables ifa and nh Sponsored by: Netflix	2022-04-04 22:30:48 -06:00
Warner Losh	7f9c3339a4	get_nhop: eliminate write only variable gateway Sponsored by: Netflix	2022-04-04 22:30:47 -06:00
Gordon Bergling	d792dc7ebb	net(4): Fix a typo in a source code comment - s/accomodate/accommodate/ MFC after: 3 days	2022-04-02 14:57:06 +02:00
Gordon Bergling	cba46da538	net(3): Fix a typo in a source code comment - s/verion/version/ MFC after: 3 days	2022-04-02 10:53:40 +02:00
Gordon Bergling	f8d292b665	net(3): Fix a typo in a source code comment - s/Multilik/Multilink/ Obtained from: NetBSD MFC after: 3 days	2022-04-02 09:41:10 +02:00
Gordon Bergling	23677398ca	net(3): Fix a typo in a source code comment - s/paramenters/parameters/ MFC after: 3 days	2022-04-02 09:24:48 +02:00
Kristof Provost	9bb06778f8	pf: support listing ethernet anchors Sponsored by: Rubicon Communications, LLC ("Netgate")	2022-03-30 10:28:19 +02:00
Gordon Bergling	bef80a7285	vxlan(4): Fix two typos in sysctl descriptions - s/fowarding/forwarding/ MFC after: 3 days	2022-03-28 19:35:34 +02:00
Mateusz Guzik	bd7762c869	pf: add a rule rb tree with md5 sum used as key. This gets rid of the quadratic rule traversal when "keep_counters" is set. Reviewed by: kp Sponsored by: Rubicon Communications, LLC ("Netgate")	2022-03-28 11:45:03 +00:00
Mateusz Guzik	1a3e98a5b8	pf: pre-compute rule hash Makes it cheaper to compare rules when "keep_counters" is set. This also sets up keeping them in a RB tree. Reviewed by: kp Sponsored by: Rubicon Communications, LLC ("Netgate")	2022-03-28 11:44:52 +00:00
Mateusz Guzik	93f8c38c03	pf: add pf_config_lock For now only protects rule creation/destruction, but will allow gradually reducing the scope of rules lock when changing the rules. Reviewed by: kp Sponsored by: Rubicon Communications, LLC ("Netgate")	2022-03-28 11:44:46 +00:00
Alexander V. Chernikov	1b8b69508b	routing: copy nexthop fib when changing existing nexthop MFC after: 1 day	2022-03-28 11:32:30 +00:00
Gordon Bergling	ef88adc527	pf(4): Fix a typo in a source code comment - s/seaching/searching/ MFC after: 3 days	2022-03-27 19:57:49 +02:00
Kristof Provost	0bf7acd6b7	if_epair: build fix `66acf7685b` failed to build on riscv (and mips). This is because the atomic_testandset_int() (and friends) functions do not exist there. Happily those platforms do have the long variant, so switch to that. PR: 262571 MFC after: 3 days	2022-03-17 06:43:47 +01:00
Michael Gmelin	66acf7685b	if_epair: fix race condition on multi-core systems As an unwanted side effect of the performance improvements in `24f0bfbad5`, epair interfaces stop forwarding traffic on higher load levels when running on multi-core systems. This happens due to a race condition in the logic that decides when to place work in the task queue(s) responsible for processing the content of ring buffers. In order to fix this, a field named state is added to the epair_queue structure. This field is used by the affected functions to signal each other that something happened in the underlying ring buffers that might require work to be scheduled in task queue(s), replacing the existing logic, which relied on checking if ring buffers are empty or not. epair_menq() does: - set BIT_MBUF_QUEUED - queue mbuf - if testandset BIT_QUEUE_TASK: enqueue task epair_tx_start_deferred() does: - swap ring buffers - process mbufs - clear BIT_QUEUE_TASK - if testandclear BIT_MBUF_QUEUED enqueue task PR: 262571 Reported by: Johan Hendriks <joh.hendriks@gmail.com> MFC after: 3 days Differential Revision: https://reviews.freebsd.org/D34569	2022-03-16 23:08:55 +01:00
Kristof Provost	8a42005d1e	pf: support basic L3 filtering in the Ethernet rules Allow filtering based on the source or destination IP/IPv6 address in the Ethernet layer rules. Reviewed by: pauamma_gundo.com (man), debdrup (man) Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D34482	2022-03-14 22:42:37 +01:00
Mateusz Guzik	f11b6505f1	pf: add PF_UNLNKDRULES_ASSERT Reviewed by: kp Sponsored by: Rubicon Communications, LLC ("Netgate")	2022-03-10 17:20:41 +00:00
Vincenzo Maffione	09a1893398	netmap: fix refcount bug in netmap allocator Symptom: when a single extmem memory region is provided to netmap multiple times, for multiple interfaces, the memory region is never released by netmap once all the existing file descriptors are closed. Fix the relevant condition in netmap_mem_drop(): release the memory when the last user of netmap_adapter is gone, rather than when the last user of netmap_mem_d is gone. MFC after: 2 weeks	2022-03-06 16:39:16 +00:00
Santiago Martinez	52bcdc5b80	if_epair: fix build with RSS and INET or INET6 disabled Reviewed by: kp MFC after: 1 week	2022-03-03 18:31:26 +01:00
Kristof Provost	b590f17a11	pf: support masking mac addresses When filtering Ethernet packets allow rules to specify a mac address with a mask. This indicates which bits of the specified address are significant. This allows users to do things like filter based on device manufacturer. Sponsored by: Rubicon Communications, LLC ("Netgate")	2022-03-02 17:00:08 +01:00
Kristof Provost	c5131afee3	pf: add anchor support for ether rules Support anchors in ether rules. Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D32482	2022-03-02 17:00:07 +01:00
Kristof Provost	fb330f3931	pf: support dummynet on L2 rules Allow packets to be tagged with dummynet information. Note that we do not apply dummynet shaping on the L2 traffic, but instead mark it for dummynet processing in the L3 code. This is the same approach as we take for ALTQ. Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D32222	2022-03-02 17:00:06 +01:00
Kristof Provost	20c4899a8e	pf: Do not hold PF_RULES_RLOCK while processing Ethernet rules Avoid the overhead of acquiring a (read) RULES lock when processing the Ethernet rules. We can get away with that because when rules are modified they're staged in V_pf_keth_inactive. We take care to ensure the swap to V_pf_keth is atomic, so that pf_test_eth_rule() always sees either the old rules, or the new ruleset. We need to take care not to delete the old ruleset until we're sure no pf_test_eth_rule() is still running with those. We accomplish that by using NET_EPOCH_CALL() to actually free the old rules. Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D31739	2022-03-02 17:00:03 +01:00
Kristof Provost	e732e742b3	pf: Initial Ethernet level filtering code This is the kernel side of stateless Ethernet level filtering for pf. The primary use case for this is to enable captive portal functionality to allow/deny access by MAC address, rather than per IP address. Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D31737	2022-03-02 17:00:03 +01:00
Kristof Provost	36637dd19d	bridge: Don't share broadcast packets if_bridge duplicates broadcast packets with m_copypacket(), which creates shared packets. In certain circumstances these packets can be processed by udp_usrreq.c:udp_input() first, which modifies the mbuf as part of the checksum verification. That may lead to incorrect packets being transmitted. Use m_dup() to create independent mbufs instead. Reported by: Richard Russo <toast@ruka.org> Reviewed by: donner, afedorov MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D34319	2022-02-21 19:03:44 +01:00
Mateusz Guzik	430e0e409c	vnet: add CURVNET_ASSERT_SET for !VIMAGE Reported by: ler Sponsored by: Rubicon Communications, LLC ("Netgate")	2022-02-19 21:00:00 +00:00
Mateusz Guzik	75cde1f872	vnet: add CURVNET_ASSERT_SET Reviewed by: kp Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D34312	2022-02-19 13:10:01 +00:00
Li-Wen Hsu	7442b63231	if_epair: Use ANSI C definition This fixes -Werror=strict-prototypes from gcc9 Sponsored by: The FreeBSD Foundation	2022-02-15 21:45:22 +08:00
Kristof Provost	24f0bfbad5	if_epair: implement fanout Allow multiple cores to be used to process if_epair traffic. We do this (if RSS is enabled) based on the RSS hash of the incoming packet. This allows us to distribute the load over multiple cores, rather than sending everything to the same one. We also switch from swi_sched() to taskqueues, which also contributes to better throughput. Benchmark results: With net.isr.maxthreads=-1 Setup A: (cc0 - bridge0 - epair0a) (epair0b - bridge1 - cc1) Before 627 Kpps After (no RSS) 1.198 Mpps After (RSS) 3.148 Mpps Setup B: (cc0 - bridge0 - epaira0) (epair0b - vnet jail - epair1a) (epair1b - bridge1 - cc1) Before 7.705 Kpps After (no RSS) 1.017 Mpps After (RSS) 2.083 Mpps MFC after: 3 weeks Sponsored by: Orange Business Services Differential Revision: https://reviews.freebsd.org/D33731	2022-02-15 09:03:24 +01:00
Kristof Provost	78bc3d5e17	vlan: allow net.link.vlan.mtag_pcp to be set per vnet The primary reason for this change is to facilitate testing. MFC after: 1 week	2022-02-14 22:51:10 +01:00
Aleksandr Fedorov	ceaf442ff2	if_vxlan(4): Allow netmap_generic to intercept RX packets. Netmap (generic) intercepts the if_input method to handle RX packets. Call ifp->if_input() instead of netisr_dispatch(). Add stricter check for incoming packet length. This change is very useful with bhyve + vale + if_vxlan. Reviewed by: vmaffione (mentor), kib, np, donner Approved by: vmaffione (mentor), kib, np, donner MFC after: 2 weeks Sponsored by: vstack.com Differential Revision: https://reviews.freebsd.org/D30638	2022-02-06 15:27:46 +03:00
Kristof Provost	4daa31c108	pflog: align header to 4 bytes, not 8 `6d4baa0d01` incorrectly rounded the length of the pflog header up to 8 bytes, rather than 4. PR: 261566 Reported by: Guy Harris <gharris@sonic.net> MFC after: 1 week Sponsored by: Rubicon Communications, LLC ("Netgate")	2022-02-01 18:17:44 +01:00
Mark Johnston	773e3a71b2	pf: Initialize pf_kpool mutexes earlier There are some error paths in ioctl handlers that will call pf_krule_free() before the rule's rpool.mtx field is initialized, causing a panic with INVARIANTS enabled. Fix the problem by introducing pf_krule_alloc() and initializing the mutex there. This does mean that the rule->krule and pool->kpool conversion functions need to stop zeroing the input structure, but I don't see a nicer way to handle this except perhaps by guarding the mtx_destroy() with a mtx_initialized() check. Constify some related functions while here and add a regression test based on a syzkaller reproducer. Reported by: syzbot+77cd12872691d219c158@syzkaller.appspotmail.com Reviewed by: kp MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D34115	2022-01-31 16:14:00 -05:00
Gleb Smirnoff	964b8f8b99	ifnet: garbage collect unused function ifaddr_byindex(). Last use was removed in `5adea417d4`.	2022-01-28 09:51:52 -08:00
Gleb Smirnoff	6abb5043a6	rtsock: always set m_pkthdr.rcvif when queueing on netisr netisr uses global workstreams and after dequeueing an mbuf it uses rcvif to get the VNET of the mbuf. Of course, this is not needed when kernel is compiled without VIMAGE. It came out that routing socket does not set rcvif if compiled without VIMAGE. Make this assignment not depending on VIMAGE option. Fixes: `6871de9363`	2022-01-27 09:41:31 -08:00
Gleb Smirnoff	6871de9363	netisr: serialize/restore m_pkthdr.rcvif when queueing mbufs Reviewed by: kp Differential revision: https://reviews.freebsd.org/D33268	2022-01-26 21:58:50 -08:00
Gleb Smirnoff	e1882428dc	ifnet/mbuf: provide KPI to serialize/restore m->m_pkthdr.rcvif Supplement ifindex table with generation count and use it to serialize & restore an ifnet pointer. Reviewed by: kp Differential revision: https://reviews.freebsd.org/D33266 Fun note: git show `e6abef0918`	2022-01-26 21:58:50 -08:00
Gleb Smirnoff	91f44749c6	ifnet: make if_index global Now that ifindex is static to if.c we can unvirtualize it. For lifetime of an ifnet its index never changes. To avoid leaking foreign interfaces the net.link.generic.system.ifcount sysctl and the ifnet_byindex() KPI filter their returned value on curvnet. Since if_vmove() no longer changes the if_index, inline ifindex_alloc() and ifindex_free() into if_alloc() and if_free() respectively. API wise the only change is that now minimum interface index can be greater than 1. The holes in interface indexes were always allowed. Reviewed by: kp Differential revision: https://reviews.freebsd.org/D33672	2022-01-26 21:58:44 -08:00
Hans Petter Selasky	c8f2c290e4	Add definitions for TLS receive tags using the existing send tag infrastructure. Although send tags are strictly used for transmit, the name might be changed in the future to be more generic. The TLS receive tags support regular IPv4 and IPv6 traffic, and also over any VLAN. If prio-tagging is enabled, VLAN ID zero, this must be checked in the network driver itself when creating the TLS RX decryption offload filter. TLS receive tags have a modify callback to tell the network driver about the progress of decryption. Currently decryption is done IP packet by IP packet, even if the IP packet contains a partial TLS record. The modify callback allows the network driver to keep track of TCP sequence numbers pointing to the beginning of TLS records after TCP packet reassembly. These callbacks only happen when encrypted or partially decrypted data is received and are used to verify the decryptions starting point for the hardware. Typically the hardware will guess where TLS headers start and needs help from the software to know if the guess was correct. This is the purpose of the modify callback. Differential Revision: https://reviews.freebsd.org/D32356 Discussed with: jhb@ MFC after: 1 week Sponsored by: NVIDIA Networking	2022-01-26 12:55:00 +01:00
Gleb Smirnoff	6d1808f051	if_clone: correctly destroy a clone from a different vnet Try to live with the cruel reality - if_vmove doesn't move an interface from previous vnet cloning infrastructure to the new one. Let's admit this as a design feature and make it work better. * Delete two blocks of code that would fallback to vnet0, if a cloner isn't found. They didn't do any good job and also whole idea of treating vnet0 as special one is wrong. * When deleting a cloned interface, lookup its cloner using its home vnet. With this change simple sequence works correctly: ifconfig foo0 create jail -c name=jj persist vnet vnet.interface=foo0 jexec jj ifconfig foo0 destroy Differential revision: https://reviews.freebsd.org/D33942	2022-01-24 21:07:16 -08:00
Gleb Smirnoff	54712fc423	if_vmove: improve restoration in cloner's ifgroup membership * Do a single call into if_clone.c instead of two. The cloner can't disappear since the interface sits on its list. * Make restoration smarter - check that cloner with same name exists in the new vnet. Differential revision: https://reviews.freebsd.org/D33941	2022-01-24 21:06:59 -08:00
Eric Joyner	213e91399b	iflib: Allow drivers to determine which queue to TX on Adds a new function pointer to struct if_txrx in order to allow drivers to set their own function that will determine which queue a packet should be sent on. Since this includes a kernel ABI change, bump the __FreeBSD_version as well. (The motivation behind this is to allow the driver to examine the UP in the VLAN tag and determine which queue to TX on based on that, in support of HW TX traffic shaping.) Signed-off-by: Eric Joyner <erj@FreeBSD.org> Reviewed by: kbowling@, stallamr@netapp.com Tested by: jeffrey.e.pieper@intel.com Sponsored by: Intel Corporation Differential Revision: https://reviews.freebsd.org/D31485	2022-01-24 18:22:02 -08:00
Vincenzo Maffione	e0e1240528	netmap: fix LOR in iflib_netmap_register In iflib_device_register(), the CTX_LOCK is acquired first and then IFNET_WLOCK is acquired by ether_ifattach(). However, in netmap_hw_reg() we do the opposite: IFNET_RLOCK is acquired first, and then CTX_LOCK is acquired by iflib_netmap_register(). Fix this LOR issue by wrapping the CTX_LOCK/UNLOCK calls in iflib_device_register with an additional IFNET_WLOCK. This is safe since the IFNET_WLOCK is recursive. MFC after: 1 month	2022-01-14 21:09:04 +00:00
Kristof Provost	5f5e32f1b3	pf: protect the rpool from races The roundrobin pool stores its state in the rule, which could potentially lead to invalid addresses being returned. For example, thread A just executed PF_AINC(&rpool->counter) and immediately afterwards thread B executes PF_ACPY(naddr, &rpool->counter) (i.e. after the pf_match_addr() check of rpool->counter). Lock the rpool with its own mutex to prevent these races. The performance impact of this is expected to be low, as each rule has its own lock, and the lock is also only relevant when state is being created (so only for the initial packets of a connection, not for all traffic). See also: https://redmine.pfsense.org/issues/12660 Reviewed by: glebius MFC after: 3 weeks Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D33874	2022-01-14 10:30:33 +01:00
Alexander Motin	618d49f5ca	Revert "iflib: Relax timer period from 0.5 to 0.5-0.75s." I've noticed relations between iflib_timer() vs ixl_admin_timer(). Both are scheduled at the same 2Hz rate, but the second is rescheduling the first each time, so if the first gets any slower, it won't be executed at all. Revert this until deeper investigation. This reverts commit `90bc1cf657`.	2022-01-10 09:40:38 -05:00
Alexander Motin	90bc1cf657	iflib: Relax timer period from 0.5 to 0.5-0.75s. While there switch it from hardclock ticks to milliseconds. MFC after: 2 weeks	2022-01-09 20:32:50 -05:00
Ryan Stone	5adea417d4	Fix ifa refcount leak in ifa_ifwithnet() In `4f6c66cc9c`, ifa_ifwithnet() was changed to no longer ifa_ref() the returned ifaddr, and instead the caller was required to stay in the net_epoch for as long as they wanted the ifaddr to remain valid. However, this missed the case where an AF_LINK lookup would call ifaddr_byindex(), which still does ifa_ref() the ifaddr. This would cause a refcount leak. Fix this by inlining the relevant parts of ifaddr_byindex() here, with the ifa_ref() call removed. This also avoids an unnecessary entry and exit from the net_epoch for this case. I've audited all in-tree consumers of ifa_ifwithnet() that could possibly perform an AF_LINK lookup and confirmed that none of them will expect the ifaddr to have a reference that they need to release. MFC after: 2 months Sponsored by: Dell Inc Differential Revision: https://reviews.freebsd.org/D28705 Reviewed by: melifaro	2022-01-06 15:04:24 -05:00
Ed Maste	a6668e31aa	Fix kernel build without INET and INET6 Reviewed by: brooks, melifaro Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D33718	2022-01-05 09:41:38 -05:00
Gleb Smirnoff	644ca0846d	domains: make domain_init() initialize only global state Now that each module handles its global and VNET initialization itself, there is no VNET related stuff left to do in domain_init(). Differential revision: https://reviews.freebsd.org/D33541	2022-01-03 10:15:22 -08:00
Gleb Smirnoff	89128ff3e4	protocols: init with standard SYSINIT(9) or VNET_SYSINIT The historical BSD network stack loop that rolls over domains and over protocols has no advantages over more modern SYSINIT(9). While doing the sweep, split global and per-VNET initializers. Getting rid of pr_init allows to achieve several things: o Get rid of ifdef's that protect against double foo_init() when both INET and INET6 are compiled in. o Isolate initializers statically to the module they init. o Makes code easier to understand and maintain. Reviewed by: melifaro Differential revision: https://reviews.freebsd.org/D33537	2022-01-03 10:15:21 -08:00
Ed Maste	818952c638	Fix kernel build without INET6 Reported by: Gary Jennejohn Fixes: `ff3a85d324` ("[lltable] Add per-family lltable ...") Sponsored by: The FreeBSD Foundation	2021-12-30 18:40:46 -05:00
Stefan Eßer	e2650af157	Make CPU_SET macros compliant with other implementations The introduction of <sched.h> improved compatibility with some 3rd party software, but caused the configure scripts of some ports to assume that they were run in a GLIBC compatible environment. Parts of sched.h were made conditional on -D_WITH_CPU_SET_T being added to ports, but there still were compatibility issues due to invalid assumptions made in autoconfigure scripts. The difference between the FreeBSD version of macros like CPU_AND, CPU_OR, etc. and the GLIBC versions was in the number of arguments: FreeBSD used a 2-address scheme (one source argument is also used as the destination of the operation), while GLIBC uses a 3-address scheme (2 source operands and a separately passed destination). The GLIBC scheme provides a super-set of the functionality of the FreeBSD macros, since it does not prevent passing the same variable as source and destination arguments. In code that wanted to preserve both source arguments, the FreeBSD macros required a temporary copy of one of the source arguments. This patch set makes it possible to unconditionally provide functions and macros expected by 3rd party software written for GLIBC based systems, but breaks builds of externally maintained sources that use any of the following macros: CPU_AND, CPU_ANDNOT, CPU_OR, CPU_XOR. One contributed driver (contrib/ofed/libmlx5) has been patched to support both the old and the new CPU_OR signatures. If this commit is merged to -STABLE, the version test will have to be extended to cover more ranges. Ports that have added -D_WITH_CPU_SET_T to build on -CURRENT no longer require that option. The FreeBSD version has been bumped to 1400046 to reflect this incompatible change. Reviewed by: kib MFC after: 2 weeks Relnotes: yes Differential Revision: https://reviews.freebsd.org/D33451	2021-12-30 12:20:32 +01:00
Alexander V. Chernikov	63f7f3921b	routing: Add unified level-based logging support for the routing subsystem. Summary: MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D33664	2021-12-29 21:30:18 +00:00
Alexander V. Chernikov	823a08d740	nhops: split nh_family into nh_upper_family and nh_neigh_family. With IPv4 over IPv6 nexthops and IP->MPLS support, there is a need to distinguish "upper" e.g. traffic family and "neighbor" e.g. LLE/gateway address family. Store them explicitly in the private part of the nexthop data. While here, store nhop fibnum in nhop_prip datastructure to make it self-contained. MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D33663	2021-12-29 21:03:19 +00:00
Alexander V. Chernikov	ff3a85d324	[lltable] Add per-family lltable getters. Introduce a new function, lltable_get(), to retrieve lltable pointer for the specified interface and family. Use it to avoid all-iftable list traversal when adding or deleting ARP/ND records. Differential Revision: https://reviews.freebsd.org/D33660 MFC after: 2 weeks	2021-12-29 20:57:15 +00:00
Vincenzo Maffione	4561c4f0ca	net: iflib: sync isc_capenable to if_capenable On SIOCSIFCAP, some bits in ifp->if_capenable may be toggled. When this happens, apply the same change to isc_capenable, which is the iflib private copy of if_capenable (for a subset of the IFCAP_* bits). In this way the iflib drivers can check the bits using isc_capenable rather than if_capenable. This is convenient because the latter access requires an additional indirection through the ifp, and it is also less likely to be in cache. PR: 260068 Reviewed by: kbowling, gallatin MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D33156	2021-12-28 10:55:21 +00:00
Kristof Provost	e7809dceb5	pf: make if_pfsync.h self-contained Reviewed by: imp Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D33504	2021-12-17 12:38:35 +01:00
Kristof Provost	dc04fa802d	pf: make if_pflog.h self-contained Reviewed by: imp Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D33503	2021-12-17 12:38:35 +01:00
Kristof Provost	e9167358e4	net: make if_bridgevar.h self-contained Reviewed by: imp Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D33502	2021-12-17 12:38:35 +01:00
Kristof Provost	f4096a7c8a	net: make ethernet.h self-contained Reviewed by: imp Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D33501	2021-12-17 12:38:35 +01:00
Kristof Provost	c658610b92	pf: make pfvar.h self-contained Ensure that the pfvar.h header can be included without including any other headers. Reviewed by: imp Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D33499	2021-12-17 12:38:34 +01:00
Kristof Provost	b29c145cc1	if_stf: make if_stf.h self-contained Ensure that the if_stf.h header can be included without including any other headers. Reviewed by: imp Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D33498	2021-12-17 12:38:34 +01:00
Warner Losh	c6df6f5322	Create wrapper for Giant taken for newbus Create a wrapper for newbus to take giant and for busses to take it too. bus_topo_lock() should be called before interacting with newbus routines and unlocked with bus_topo_unlock(). If you need the topology lock for some reason, bus_topo_mtx() will provide that. Sponsored by: Netflix Reviewed by: mav Differential Revision: https://reviews.freebsd.org/D31831	2021-12-09 17:04:45 -07:00
Mateusz Guzik	e735fa3212	net/if.c: plug set-but-not-used vars Sponsored by: Rubicon Communications, LLC ("Netgate")	2021-12-09 20:39:40 +00:00
Gleb Smirnoff	7e0bba4d80	ifnet: make V_if_index static to if.c This requires moving net.link.generic sysctl declaration from if_mib.c to if.c. Ideally if_mib.c needs just to be merged to if.c, but they have different license texts. Differential revision: https://reviews.freebsd.org/D33263	2021-12-06 09:32:31 -08:00
Gleb Smirnoff	d74b7baeb0	ifnet_byindex() actually requires network epoch Sweep over potentially unsafe calls to ifnet_byindex() and wrap them in epoch. Most of the code touched remains unsafe, as the returned pointer is being used after epoch exit. Mark that with a comment. Validate the index argument inside the function, reducing argument validation requirement from the callers and making V_if_index private to if.c. Reviewed by: melifaro Differential revision: https://reviews.freebsd.org/D33263	2021-12-06 09:32:31 -08:00
Gleb Smirnoff	7b40b00fad	ifnet: merge ifindex_alloc(), ifnet_setbyindex(), if_grow() and call magic Now it is possible to just merge all this complexity into single linear function. Note that IFNET_WLOCK() is a sleepable lock, so we can M_WAITOK and epoch_wait_preempt(). Reviewed by: melifaro, bz, kp Differential revision: https://reviews.freebsd.org/D33262	2021-12-06 09:32:31 -08:00
Gleb Smirnoff	6ff4cac2ee	ifnet: initial if_grow() shall always succeed So let's just call malloc() directly. This also avoids hidden doubling of default V_if_indexlim. Reviewed by: melifaro, bz, kp Differential revision: https://reviews.freebsd.org/D33261	2021-12-06 09:32:31 -08:00
Gleb Smirnoff	450394af27	ifnet: use ck_pr(3) store & load setting ifnet pointer in ifindex The lockless access to the array is protected by the network epoch. Reviewed by: bz, kp Differential revision: https://reviews.freebsd.org/D33260	2021-12-06 09:32:30 -08:00
Gleb Smirnoff	8062e5759c	ifnet: allocate index at the end of if_alloc_domain() Now that if_alloc_domain() never fails and actually doesn't expose ifnet to outside we can eliminate IFNET_HOLD and two step index allocation. Reviewed by: kp Differential revision: https://reviews.freebsd.org/D33259	2021-12-06 09:32:30 -08:00
Gleb Smirnoff	ad2a0aec29	nhop: hash ifnet pointer instead of if_index Yet another problem created by VIMAGE/if_vmove/epair design that relocates ifnet between vnets and changes if_index. Since if_index changes, the nhop hash value also changes, unlink_nhop() isn't able to find the entry in the hash and leaks the nhop. Since nhop references ifnet, the latter is also leaked. As a result, running network tests leaks memory on every single test that creates a vnet jail. While here, rewrite whole hash_priv() to use static initializer, per Alexander's suggestion. Reviewed by: melifaro	2021-12-04 10:05:46 -08:00
Kristof Provost	6d4baa0d01	if_pflog: fix packet length There were two issues with the new pflog packet length. The first is that the length is expected to be a multiple of sizeof(long), but we'd assumed it had to be a multiple of sizeof(uint32_t). The second is that there's some broken software out there (such as Wireshark) that makes incorrect assumptions about the amount of padding. That is, Wireshark assumes there's always three bytes of padding, rather than however much is needed to get to a multiple of sizeof(long). Fix this by adding extra padding, and a fake field to maintain Wireshark's assumption. Reported by: Ozkan KIRIK <ozkan.kirik@gmail.com> Tested by: Ozkan KIRIK <ozkan.kirik@gmail.com> MFC after: 1 week Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D33236	2021-12-04 08:42:55 +01:00
Cy Schubert	db0ac6ded6	Revert "wpa: Import wpa_supplicant/hostapd commit 14ab4a816" This reverts commit `266f97b5e9`, reversing changes made to `a10253cffe`. A mismerge of a merge to catch up to main resulted in files being committed which should not have been.	2021-12-02 14:45:04 -08:00
Cy Schubert	266f97b5e9	wpa: Import wpa_supplicant/hostapd commit 14ab4a816 This is the November update to vendor/wpa committed upstream 2021-11-26. MFC after: 1 month	2021-12-02 13:35:14 -08:00
Gleb Smirnoff	9e93d2b335	ifnet: enable & fix if_debug build Fixes: `ce40632a31`	2021-12-02 10:59:43 -08:00
Gleb Smirnoff	93c67567e0	Remove "options PCBGROUP" With upcoming changes to the inpcb synchronisation it is going to be broken. Even its current status after the move of PCB synchronization to the network epoch is very questionable. This experimental feature was sponsored by Juniper but ended never to be used in Juniper and doesn't exist in their source tree [sjg@, stevek@, jtl@]. In the past (AFAIK, pre-epoch times) it was tried out at Netflix [gallatin@, rrs@] with no positive result and at Yandex [ae@, melifaro@]. I'm up to resurrecting it back if there is any interest from anybody. Reviewed by: rrs Differential revision: https://reviews.freebsd.org/D33020	2021-12-02 10:48:48 -08:00
Gleb Smirnoff	1cec1c5831	Allow to compile RSS without PCBGROUP. Reviewed by: rrs Differential revision: https://reviews.freebsd.org/D33019	2021-12-02 10:48:48 -08:00
Zhenlei Huang	73d41cc730	if_epair: Also mark the flag of pair b with IFF_KNOWSEPOCH Reviewed by: kp MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D33210	2021-12-01 15:54:23 +01:00
Kristof Provost	439da7f06d	if_stf: KASAN fix In in_stf_input() we grabbed a pointer to the IPv4 header and later did an m_pullup() before we look at the IPv6 header. However, m_pullup() could rearrange the mbuf chain and potentially invalidate the pointer to the IPv4 header. Avoid this issue by copying the IP header rather than getting a pointer to it. Reported by: markj, Jenkins (KASAN job) Reviewed by: markj MFC after: 1 week Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D33192	2021-11-30 17:35:15 +01:00
Mateusz Guzik	2cedfc3f7e	if_epair: ifdef vars only used with ALTQ Sponsored by: Rubicon Communications, LLC ("Netgate")	2021-11-24 21:28:54 +00:00
Gleb Smirnoff	3bc40f39fd	if_free: add a comment explaining why ifindex_free() is performed here	2021-11-22 19:59:27 -08:00
Gleb Smirnoff	fe499a8452	ifnet: merge if_destroy() and if_free_internal() into one New function has more meaningful name if_free_deferred() and has its header comment fixed to reflect reality. NFC	2021-11-22 19:53:12 -08:00
Gleb Smirnoff	4787572d05	ifnet: make if_alloc_domain() never fail The last consumer of if_com_alloc() is firewire. It never fails to allocate. Most likely the if_com_alloc() KPI will go away together with if_fwip(), less likely new consumers of if_com_alloc() will be added, but they would need to follow the no fail KPI.	2021-11-22 19:49:57 -08:00
Gleb Smirnoff	1e3ca25d92	ifnet: make if_alloc_domain() static	2021-11-22 19:49:57 -08:00
Gleb Smirnoff	ce40632a31	ifnet: append if_debug.c to if.c With this change if_index can become static. There is nothing that if_debug.c would want to isolate from if.c. Potentially if.c wants to share everything with if_debug.c. Move Bjoern's copyright to if.c. Reviewed by: bz	2021-11-22 19:49:57 -08:00
Gleb Smirnoff	8a6f38c8ac	ifnet: garbage collect drbr_*_drv(). They were left in `62d76917b8` but after years proved not to be useful.	2021-11-22 19:49:57 -08:00
Kristof Provost	b46512f704	if_stf: add dtrace probe points Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D33038	2021-11-20 19:29:01 +01:00
Kristof Provost	19dc644511	if_stf: add 6rd support Implement IPv6 Rapid Deployment (RFC5969) on top of the existing 6to4 (RFC3056) if_stf code. PR: 253328 Reviewed by: hrs Obtained from: pfSense Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D33037	2021-11-20 19:29:01 +01:00
Kristof Provost	3142d4f622	lagg: fix unused-but-set-variable MFC after: 1 week Sponsored by: Rubicon Communications, LLC ("Netgate")	2021-11-19 22:01:27 +01:00
Andriy Gapon	1bfdb812c7	iflib_stop: drain rx tasks to prevent any data races iflib_stop modifies iflib data structures that are used by _task_fn_rx, most prominently the free lists. So, iflib_stop has to ensure that the rx task threads are not active. This should help to fix a crash seen when iflib_if_ioctl (e.g., SIOCSIFCAP) is called while there is already traffic flowing. The crash has been seen on VMWare guests with vmxnet3 driver. My guess is that on physical hardware the couple of 1ms delays that iflib_stop has after disabling interrupts are enough for the queued work to be completed before any iflib state is touched. But on busy hypervisors the guests might not get enough CPU time to complete the work, thus there can be a race between the taskqueue threads and the work done to handle an ioctl, specifically in iflib_stop and iflib_init_locked. PR: 259458 Reviewed by: markj MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D32926	2021-11-19 10:00:38 +02:00
Kristof Provost	8e492101ec	pf: add COMPAT_FREEBSD13 for DIOCKEEPCOUNTERS DIOCKEEPCOUNTERS used to overlap with DIOCGIFSPEEDV0, which has been fixed in 14, but remains in stable/12 and stable/13. Support the old, overlapping, call under COMPAT_FREEBSD13. Reviewed by: jhb Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D33001	2021-11-17 03:09:20 +01:00
Mateusz Guzik	79554f2b6c	net: whack "set but not used" warnings in net/rtsock.c ... except for one where the error is ignored. Sponsored by: Rubicon Communications, LLC ("Netgate")	2021-11-14 17:20:46 +00:00
Mateusz Guzik	c681cce925	net: whack "set but not used" warnings in net/pfil.c Sponsored by: Rubicon Communications, LLC ("Netgate")	2021-11-14 17:19:58 +00:00
Mateusz Guzik	5a4e46f6ec	net: whack "set but not used" warnings in net/if.c Sponsored by: Rubicon Communications, LLC ("Netgate")	2021-11-14 17:15:08 +00:00
Kristof Provost	047c4e365d	pf: renumber DIOCKEEPCOUNTERS We accidentally had two ioctls use the same base number (DIOCKEEPCOUNTERS and DIOCGIFSPEEDV{0,1}). We get away with that on most platforms because the size of the argument structures is different. This does break CHERI, and is generally a bad idea anyway. Renumber to avoid this collision. Reported by: jhb	2021-11-14 15:36:59 +01:00
Kristof Provost	8e45fed3ae	if_stf: enable use in vnet jails The cloner must be per-vnet so that cloned interfaces get destroyed when the vnet goes away. Otherwise we fail assertions in vnet_if_uninit(): panic: vnet_if_uninit:475 tailq &V_ifnet=0xfffffe01665fe070 not empty cpuid = 19 time = 1636107064 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe015d0cac60 vpanic() at vpanic+0x187/frame 0xfffffe015d0cacc0 panic() at panic+0x43/frame 0xfffffe015d0cad20 vnet_if_uninit() at vnet_if_uninit+0x7b/frame 0xfffffe015d0cad30 vnet_destroy() at vnet_destroy+0x170/frame 0xfffffe015d0cad60 prison_deref() at prison_deref+0x9b0/frame 0xfffffe015d0cadd0 sys_jail_remove() at sys_jail_remove+0x119/frame 0xfffffe015d0cae00 amd64_syscall() at amd64_syscall+0x12e/frame 0xfffffe015d0caf30 fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe015d0caf30 --- syscall (508, FreeBSD ELF64, sys_jail_remove), rip = 0x8011e920a, rsp = 0x7fffffffe788, rbp = 0x7fffffffe810 --- KDB: enter: panic MFC after: 3 weeks Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D32849	2021-11-09 09:39:53 +01:00
Kristof Provost	3576121c8b	if_stf: style(9) pass As stated in style(9): "Values in return statements should be enclosed in parentheses." MFC after: 3 weeks Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D32848	2021-11-09 09:39:53 +01:00
Kristof Provost	8ca6c11a7c	if_gif: fix vnet shutdown panic If an if_gif exists and has an address assigned inside a vnet when the vnet is shut down we failed to clean up the address, leading to a panic when we ip_destroy() and the V_in_ifaddrhashtbl is not empty. This happens because of the VNET_SYS(UN)INIT order, which means we destroy the if_gif interface before the addresses can be purged (and if_detach() does not remove addresses, it assumes this will be done by the stack teardown code). Set subsystem SI_SUB_PSEUDO just like if_bridge so the cleanup operations happen in the correct order. MFC after: 3 weeks Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D32835	2021-11-08 12:00:00 +01:00
Wojciech Macek	acdfc09639	lagg: update capabilities on SIOCSIFMTU Some NICs might have limited capabilities when Jumbo frames are used. For example some neta interfaces only support TX csum offload when the packet size is lower than a value specified in DT. Fix it by re-reading capabilities of children interfaces after MTU has been successfully changed. Found by: Jerome Tomczyk <jerome.tomczyk@stormshield.eu> Reviewed by: jhb Obtained from: Semihalf Sponsored by: Stormshield Differential revision: https://reviews.freebsd.org/D32724	2021-11-06 10:43:08 +01:00
Kristof Provost	76c5eecc34	pf: Introduce ridentifier Allow users to set a number on rules which will be exposed as part of the pflog header. The intent behind this is to allow users to correlate rules across updates (remember that pf rules continue to exist and match existing states, even if they're removed from the active ruleset) and pflog. Obtained from: pfSense MFC after: 3 weeks Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D32750	2021-11-05 09:39:56 +01:00
Bjoern A. Zeeb	1a8f198fa6	epair: remove "All rights reserved" Remove "All rights reserved" from The FreeBSD Foundation owned copyrights on epair code and documentation. Approved by: emaste (FreeBSD Foundation)	2021-11-02 16:50:26 +00:00
Bjoern A. Zeeb	3dd5760aa5	if_epair: rework Rework if_epair(4) to no longer use netisr and dpcpu. Instead use mbufq and swi_net. This simplifies the code and seems to make it work better and no longer hang. Work largely by bz@, with minor tweaks by kp@. Reviewed by: bz, kp MFC after: 3 weeks Differential Revision: https://reviews.freebsd.org/D31077	2021-11-02 09:23:46 +01:00
Mateusz Guzik	8f3d786cb3	pf: remove the flags argument from pf_unlink_state All consumers call it with PF_ENTER_LOCKED. Reviewed by: kp Sponsored by: Rubicon Communications, LLC ("Netgate")	2021-11-01 20:59:14 +01:00
Kristof Provost	62d2dcafb7	if_epair: delete mbuf tags Remove all (non-persistent) tags when we transmit a packet. Real network interfaces do not carry any tags either, and leaving tags attached can produce unexpected results. Reviewed by: bz, glebius MFC after: 3 weeks Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D32663	2021-10-28 10:41:16 +02:00
Mark Johnston	426682b05a	bpf: Fix the write filter for detached descriptors A BPF descriptor only has an associated interface descriptor once it is attached to an interface, e.g., with BIOCSETIF. Avoid dereferencing a NULL pointer in filt_bpfwrite() if the BPF descriptor is not attached. Reviewed by: ae Reported by: syzbot+ae45d5166afe15a5a21d@syzkaller.appspotmail.com Fixes: `ded77e0237` ("Allow the BPF to be select for write.") Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D32561	2021-10-26 10:00:39 -04:00
Gleb Smirnoff	c8ee75f231	Use network epoch to protect local IPv4 addresses hash. The modification to the hash are already naturally locked by in_control_sx. Convert the hash lists to CK lists. Remove the in_ifaddr_rmlock. Assert the network epoch where necessary. Most cases when the hash lookup is done the epoch is already entered. Cover a few cases, that need entering the epoch, which mostly is initial configuration of tunnel interfaces and multicast addresses. Reviewed by: melifaro Differential revision: https://reviews.freebsd.org/D32584	2021-10-22 14:40:53 -07:00
Gleb Smirnoff	6aae3517ed	Retire synchronous PPP kernel driver sppp(4). The last two drivers that required sppp are cp(4) and ce(4). These devices are still produced and can be purchased at Cronyx <http://cronyx.ru/hardware/wan.html>. Since Roman Kurakin <rik@FreeBSD.org> has quit them, they no longer support FreeBSD officially. Later they have dropped support for Linux drivers too. As of mid-2020 they don't even have a developer to maintain their Windows driver. However, their support verbally told me that they could provide aid to a FreeBSD developer with documentation in case a new customer for their devices appears. These drivers have a feature to not use sppp(4) and create an interface, but instead expose the device as netgraph(4) node. Then, you can attach ng_ppp(4) with help of ports/net/mpd5 on top of the node and get your synchronous PPP. Alternatively you can attach ng_frame_relay(4) or ng_cisco(4) for HDLC. Actually, last time I used cp(4) back in 2004, using netgraph(4) instead of sppp(4) was already the right way to do it. Thus, remove the sppp(4) related part of the drivers and enable by default the netgraph(4) part. Further maintenance of these drivers in the tree shouldn't be a big deal. While doing that, remove some cruft and enable cp(4) compilation on amd64. The ce(4) for some unknown reason marks its internal DDK functions with __attribute__ fastcall, which most likely is safe to remove, but without hardware I'm not going to do that, so ce(4) remains i386-only. Reviewed by: emaste, imp, donner Differential Revision: https://reviews.freebsd.org/D32590 See also: https://reviews.freebsd.org/D23928	2021-10-22 11:41:36 -07:00
Gleb Smirnoff	2144431c11	Remove in_ifaddr_lock acquisition to access in_ifaddrhead. An IPv4 address is embedded into an ifaddr which is freed via epoch. And the in_ifaddrhead is already a CK list. Use the network epoch to protect against use after free. Next step would be to CK-ify the in_addr hash and get rid of the... Reviewed by: melifaro Differential Revision: https://reviews.freebsd.org/D32434	2021-10-13 10:04:46 -07:00
Hartmut Brandt	ded77e0237	Allow the BPF to be selected for write. This is needed for boost::asio, which otherwise fails to handle BPFs. Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D31967	2021-10-10 17:03:51 +02:00
Alexander V. Chernikov	7e64580b5f	routing: Use the same index space for both nexthop and nexthop groups. This simplifies userland object handling along with kernel-level nexthop handling in fib algo framework. MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D32342	2021-10-08 07:58:55 +00:00
Kristof Provost	76c2e71c4c	pf: remove unused field from pf_kanchor The 'match' field is only used in the userspace version of the struct (pf_anchor). MFC after: 2 weeks Sponsored by: Rubicon Communications, LLC ("Netgate")	2021-10-07 19:50:22 +02:00
Kristof Provost	5062afff9d	pfctl: userspace adaptive syncookies configuration Hook up the userspace bits to configure syncookies in adaptive mode. MFC after: 1 week Sponsored by: Modirum MDPay Differential Revision: https://reviews.freebsd.org/D32136	2021-09-29 15:11:54 +02:00
Kristof Provost	bf8637181a	pf: implement adaptive mode Use atomic counters to ensure that we correctly track the number of half open states and syncookie responses in-flight. This determines if we activate or deactivate syncookies in adaptive mode. MFC after: 1 week Sponsored by: Modirum MDPay Differential Revision: https://reviews.freebsd.org/D32134	2021-09-29 15:11:54 +02:00
Kristof Provost	63b3c1c770	pf: support dummynet Allow pf to use dummynet pipes and queues. We re-use the currently unused IPFW_IS_DUMMYNET flag to allow dummynet to tell us that a packet is being re-injected after being delayed. This is needed to avoid endlessly looping the packet between pf and dummynet. MFC after: 2 weeks Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D31904	2021-09-24 11:41:25 +02:00
Arnaud Ysmal	0b92a7fe47	LACP: Do not wait for a response to marker messages not sent The error returned when a marker message cannot be emitted on a port is not handled. This causes LACP to block all emissions until the timeout of 3 seconds is reached. To fix this issue, I just clear the LACP_PORT_MARK flag when the packet could not be emitted. Differential revision: https://reviews.freebsd.org/D30467 Obtained from: Stormshield	2021-09-23 10:57:11 +02:00
John Baldwin	c782ea8bb5	Add a switch structure for send tags. Move the type and function pointers for operations on existing send tags (modify, query, next, free) out of 'struct ifnet' and into a new 'struct if_snd_tag_sw'. A pointer to this structure is added to the generic part of send tags and is initialized by m_snd_tag_init() (which now accepts a switch structure as a new argument in place of the type). Previously, device driver ifnet methods switched on the type to call type-specific functions. Now, those type-specific functions are saved in the switch structure and invoked directly. In addition, this more gracefully permits multiple implementations of the same tag within a driver. In particular, NIC TLS for future Chelsio adapters will use a different implementation than the existing NIC TLS support for T6 adapters. Reviewed by: gallatin, hselasky, kib (older version) Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D31572	2021-09-14 11:43:41 -07:00
Mark Johnston	b1746faad6	debugnet: Include some required headers Don't depend on pollution from net/vnet.h. PR: 258496 MFC after: 1 week Sponsored by: The FreeBSD Foundation	2021-09-14 11:02:45 -04:00
Kristof Provost	b64f7ce98f	pf: qid and pqid can be uint16_t tag2name() returns a uint16_t, so we don't need to use uint32_t for the qid (or pqid). This reduces the size of struct pf_kstate slightly. That in turn buys us space to add extra fields for dummynet later. Happily these fields are not exposed to user space (there are user space versions of them, but they can just stay uint32_t), so there's no ABI breakage in modifying this. MFC after: 1 week Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D31873	2021-09-10 17:07:57 +02:00
Mark Johnston	b1e6a792d6	net: Enter a net epoch around protocol if_up/down notifications When traversing a list of interface addresses, we need to be in a net epoch section, and protocol ctlinput routines need a stable reference to the address. Reported by: syzbot+3219af764ead146a3a4e@syzkaller.appspotmail.com Reviewed by: kp, melifaro MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D31889	2021-09-10 09:07:40 -04:00
Alexander V. Chernikov	4b631fc832	routing: fix source address selection rules for IPv4 over IPv6. Current logic always selects an IFA of the same family from the outgoing interfaces. In IPv4 over IPv6 setup there can be just single non-127.0.0.1 ifa, attached to the loopback interface. Create a separate rt_getifa_family() to handle entire ifa selection for the IPv4 over IPv6. Differential Revision: https://reviews.freebsd.org/D31868 MFC after: 1 week	2021-09-07 21:41:05 +00:00
Kristof Provost	bb25e36e13	pf: remove unused function prototype MFC after: 1 week Sponsored by: Rubicon Communications, LLC ("Netgate")	2021-09-07 16:38:49 +02:00
Kristof Provost	312f5f8a4f	altq: mark callouts as mpsafe There's no reason to acquire the Giant lock while executing the ALTQ callouts. While here also remove a few backwards compatibility defines for long obsolete FreeBSD versions. Reviewed by: mav Suggested by: mav Differential Revision: https://reviews.freebsd.org/D31835	2021-09-04 17:26:10 +02:00
Kristof Provost	4cab80a8df	pf: Add counters for syncookies Count when we send a syncookie, receive a valid syncookie or detect a synflood. Reviewed by: kbowling MFC after: 1 week Sponsored by: Modirum MDPay Differential Revision: https://reviews.freebsd.org/D31713	2021-09-01 12:02:19 +02:00
Alexander V. Chernikov	0a3a377aee	routing: Disallow zero nexthop weights in nexthop groups. Adding such nexthops breaks calc_min_mpath_slots() assumptions, thus resulting in the incorrect nexthop group creation and eventually leading to panic. Reported by: avg MFC after: 1 week	2021-09-01 07:16:24 +00:00
Alexander V. Chernikov	639d7abec6	routing: simplify malloc flags in alloc_nhgrp(). MFC after: 1 week	2021-08-31 08:14:16 +00:00
Alexander V. Chernikov	f84c30106e	routing: Fix newly-added rt_get_inet[6]_parent() api. Correctly handle the case when no default route is present. Reported by: Konrad <konrad.kreciwilk at korbank.pl>	2021-08-30 21:10:37 +00:00
Alexander V. Chernikov	d98954e229	routing: Bring back the ability to specify transmit interface via its name. Some software references outgoing interfaces by specifying name instead of index. Use rti_ifp from rt_addrinfo if provided instead of always using address interface when constructing nexthop. PR: 255678 Reported by: martin.larsson2 at gmail.com MFC after: 1 week	2021-08-29 20:05:14 +00:00
Kristof Provost	2b10cf85f8	pf: Introduce nvlist variant of DIOCGETSTATUS Make it possible to extend the GETSTATUS call (e.g. when we want to add new counters, such as for syncookie support) by introducing an nvlist-based alternative. MFC after: 1 week Sponsored by: Modirum MDPay Differential Revision: https://reviews.freebsd.org/D31694	2021-08-29 14:59:04 +02:00
Luiz Otavio O Souza	eb680a63de	if_bridge: add ALTQ support Similar to the recent addition of ALTQ support to if_vlan. Reviewed by: donner Obtained from: pfsense MFC after: 1 week Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D31675	2021-08-26 11:23:44 +02:00
Luiz Otavio O Souza	2e5ff01d0a	if_vlan: add the ALTQ support to if_vlan. Inspired by the iflib implementation, allow ALTQ to be used with if_vlan interfaces. Reviewed by: donner Obtained from: pfsense MFC after: 1 week Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D31647	2021-08-25 08:56:45 +02:00
Kristof Provost	159258afb5	altq: Fix panics on rmc_restart() rmc_restart() is called from a timer, but can trigger traffic. This means the curvnet context will not be set. Use the vnet associated with the interface we're currently processing to set it. We also have to enter net_epoch here, for the same reason. Reviewed by: mjg MFC after: 1 week Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D31642	2021-08-23 21:35:41 +02:00
Zhenlei Huang	62e1a437f3	routing: Allow using IPv6 next-hops for IPv4 routes (RFC 5549). Implement kernel support for RFC 5549/8950. * Relax control plane restrictions and allow specifying IPv6 gateways for IPv4 routes. This behavior is controlled by the net.route.rib_route_ipv6_nexthop sysctl (on by default). * Always pass final destination in ro->ro_dst in ip_forward(). * Use ro->ro_dst to extract packet family inside if_output() routines. Consistently use RO_GET_FAMILY() macro to handle ro=NULL case. * Pass extracted family to nd6_resolve() to get the LLE with proper encap. It leverages recent lltable changes committed in `c541bd368f`. Presence of the functionality can be checked using ipv4_rfc5549_support feature(3). Example usage: route add -net 192.0.0.0/24 -inet6 fe80::5054:ff:fe14:e319%vtnet0 Differential Revision: https://reviews.freebsd.org/D30398 MFC after: 2 weeks	2021-08-22 22:56:08 +00:00
Vincenzo Maffione	98399ab06f	netmap: import changes from upstream - make sure rings are disabled during resets - introduce netmap_update_hostrings_mode(), with support for multiple host rings - always initialize ni_bufs_head in netmap_if ni_bufs_head was not properly initialized when no external buffers were requested and contained the ni_bufs_head from the last request. This was causing spurious buffer frees when alternating between apps that used external buffers and apps that did not use them. - check na validity under lock on detach - netmap_mem: fix leak on error path - nm_dispatch: fix compilation on Raspberry Pi MFC after: 2 weeks	2021-08-22 09:31:05 +00:00
Alexander V. Chernikov	c541bd368f	lltable: Add support for "child" LLEs holding encap for IPv4oIPv6 entries. Currently we use pre-calculated headers inside LLE entries as prepend data for `if_output` functions. Using these headers allows saving some CPU cycles/memory accesses on the fast path. However, this approach makes adding L2 header for IPv4 traffic with IPv6 nexthops more complex, as it is not possible to store multiple pre-calculated headers inside lle. Additionally, the solution space is limited by the fact that PCB caching saves LLEs in addition to the nexthop. Thus, add support for creating special "child" LLEs for the purpose of holding custom family encaps and store mbufs pending resolution. To simplify handling of those LLEs, store them in a linked-list inside a "parent" (e.g. normal) LLE. Such LLEs are not visible when iterating LLE table. Their lifecycle is bound to the "parent" LLE - it is not possible to delete "child" when parent is alive. Furthermore, "child" LLEs are static (RTF_STATIC), avoiding the complex state machine used by the standard LLEs. nd6_lookup() and nd6_resolve() now accept an additional argument, family, allowing them to return such child LLEs. This change uses `LLE_SF()` macro which packs family and flags in a single int field. This is done to simplify merging back to stable/. Once this code lands, most of the cases will be converted to use a dedicated `family` parameter. Differential Revision: https://reviews.freebsd.org/D31379 MFC after: 2 weeks	2021-08-21 17:34:35 +00:00
Luiz Otavio O Souza	c138424148	lagg: don't update link layer addresses on destroy When the lagg is being destroyed it is not necessary to update the lladdr of all the lagg members every time we update the primary interface. Reviewed by: scottl Obtained from: pfSense MFC after: 1 week Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D31586	2021-08-19 10:49:32 +02:00
Franco Fichtner	bb250fae9e	gre: simplify RSS ifdefs Use the early break to avoid else definitions. When RSS gains a runtime option previous constructs would duplicate and convolute the existing code. While here init flowid and skip magic numbers and late default assignment. Reviewed by: melifaro, kbowling Obtained from: OPNsense MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D31584	2021-08-18 10:05:29 -07:00
Kristof Provost	a051ca72e2	Introduce m_get3() Introduce m_get3() which is similar to m_get2(), but can allocate up to MJUM16BYTES bytes (m_get2() can only allocate up to MJUMPAGESIZE). This simplifies the bpf improvement in `f13da24715`. Suggested by: glebius Differential Revision: https://reviews.freebsd.org/D31455	2021-08-18 08:48:27 +02:00
Stephan de Wit	66fa12d8fb	iflib: emulate counters in netmap mode When iflib devices are in netmap mode the driver counters are no longer updated making it look from userspace tools that traffic has stopped. Reported by: Franco Fichtner <franco@opnsense.org> Reviewed by: vmaffione, iflib (erj, gallatin) Obtained from: OPNsense MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D31550	2021-08-18 00:17:43 -07:00
Alexander V. Chernikov	36e15b717e	routing: Fix crashes with dpdk_lpm[46] algo. When a prefix gets deleted from the RIB, dpdk_lpm algo needs to know the nexthop of the "parent" prefix to update its internal state. The glue code, which utilises RIB as a backing route store, uses fib[46]_lookup_rt() for the prefix destination after its deletion to fetch the desired nexthop. This approach does not work when deleting less-specific prefixes while more-specific ones are still present. For example, if 10.0.0.0/24, 10.0.0.0/23 and 10.0.0.0/22 exist in RIB, deleting 10.0.0.0/23 would result in 10.0.0.0/24 being returned as a search result instead of 10.0.0.0/22. This, in turn, results in the failed datastructure update: part of the deleted /23 prefix will still contain the reference to an old nexthop. This leads to the use-after-free behaviour, ending with the eventual crashes. Fix the logic flaw by properly fetching the prefix "parent" via newly-created rt_get_inet[6]_parent() helpers. Differential Revision: https://reviews.freebsd.org/D31546 PR: 256882,256833 MFC after: 1 week	2021-08-17 20:46:22 +00:00
Mark Johnston	24fe461284	ether: Add a KMSAN check for transmitted frames This helps ensure that outbound packet data is initialized per KMSAN. Sponsored by: The FreeBSD Foundation	2021-08-11 16:33:41 -04:00
Ed Maste	9feff969a0	Remove "All Rights Reserved" from FreeBSD Foundation sys/ copyrights These ones were unambiguous cases where the Foundation was the only listed copyright holder (in the associated license block). Sponsored by: The FreeBSD Foundation	2021-08-08 10:42:24 -04:00
Alexander V. Chernikov	9748eb7427	Simplify nhop operations in ip_output(). Consistently use `nh` instead of always dereferencing ro->ro_nh inside the if block. Always use nexthop mtu, as it provides guarantee that mtu is accurate. Pass `nh` pointer to rt_update_ro_flags() to allow upcoming uses of updating ro flags based on different nexthop. Differential Revision: https://reviews.freebsd.org/D31451 Reviewed by: kp MFC after: 2 weeks	2021-08-08 09:19:27 +00:00
Alexander V. Chernikov	0b79b007eb	[lltable] Restructure nd6 code. Factor out lltable locking logic from lltable_try_set_entry_addr() into a separate lltable_acquire_wlock(), so the latter can be used in other parts of the code w/o duplication. Create nd6_try_set_entry_addr() to avoid code duplication in nd6.c and nd6_nbr.c. Move lle creation logic from nd6_resolve_slow() into a separate nd6_get_llentry() to simplify the former. These changes serve as a pre-requisite for implementing RFC8950 (IPv4 prefixes with IPv6 nexthops). Differential Revision: https://reviews.freebsd.org/D31432 MFC after: 2 weeks	2021-08-07 09:59:11 +00:00
Alexander V. Chernikov	f3a3b06121	[lltable] Unify datapath feedback mechanism. Use newly-created llentry_request_feedback(), llentry_mark_used() and llentry_get_hittime() to request datapath usage check and fetch the results in the same fashion both in IPv4 and IPv6. While here, simplify llentry_provide_feedback() wrapper by eliminating 1 condition check. MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D31390	2021-08-04 22:52:43 +00:00
Alexander V. Chernikov	5b42b494d5	Fix typo in rib_unsibscribe<_locked>(). Submitted by: Zhenlei Huang<zlei.huang at gmail.com> Differential Revision: https://reviews.freebsd.org/D31356	2021-08-01 13:29:52 +00:00
Alexander V. Chernikov	054948bd81	[multipath][nhops] Fix random crashes with high route churn rate. When a certain multipath route begins flapping really fast, it may result in creating multiple identical nexthop groups. The code responsible for unlinking unused nexthop groups had an implicit assumption that there could be only one nexthop group for the same combination of nexthops with weights. This assumption resulted in always unlinking the first "identical" group, instead of the desired one. Such action, in turn, produced a used-but-unlinked nhg along with freed-and-linked nhg, ending up in random crashes. Similarly, it is possible that multiple identical nexthops get created in the case of high route churn, resulting in the same problem when deleting one of such nexthops. Fix by matching the nexthop/nexthop group pointer when deleting the item. Reported by: avg MFC after: 1 week	2021-08-01 10:07:37 +00:00
Kristof Provost	b69019c14c	pf: remove DIOCGETSTATESNV While nvlists are very useful in maximising flexibility for future extensions their performance is simply unacceptably bad for the getstates feature, where we can easily want to export a million states or more. The DIOCGETSTATESNV call has been MFCd, but has not hit a release on any branch, so we can still remove it everywhere. Reviewed by: mjg MFC after: 1 week Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D31099	2021-07-30 11:45:28 +02:00
Bryan Drewery	7cbf1de38e	debugnet: Fix false-positive assertions for dp_state debugnet_handle_arp: An assertion is present to ensure the pcb is only modified when the state is DN_STATE_INIT. Because debugnet_arp_gw() is asynchronous it is possible for ARP replies to come in after the gateway address is known and the state already changed. debugnet_handle_ip: Similarly it is possible for packets to come in, from the expected server, during the gateway mac discovery phase. This can happen from testing disconnects / reconnects in quick succession. This later causes some acks to be sent back but hit an assertion because the state is wrong. Reviewed by: cem, debugnet_handle_arp: markj, vangyzen Sponsored by: Dell EMC Differential Revision: https://reviews.freebsd.org/D31327	2021-07-28 16:34:14 -07:00
Kristof Provost	01ad0c0079	net: disallow MTU changes on bridge member interfaces if_bridge member interfaces should always have the same MTU as the bridge itself, so disallow MTU changes on interfaces that are part of an if_bridge. Reviewed by: donner Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D31304	2021-07-28 22:03:30 +02:00
Kristof Provost	3330649382	if_bridge: allow MTU changes if_bridge used to only allow MTU changes if the new MTU matched that of all member interfaces. This doesn't really make much sense, in that we really shouldn't be allowed to change the MTU of a bridge member in the first place. Instead we now change the MTU of all member interfaces. If one fails we revert all interfaces back to the original MTU. We do not address the issue where bridge member interface MTUs can be changed here. Reviewed by: donner Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D31288	2021-07-28 22:01:12 +02:00

5061 Commits