Use unified guidelines for the severity across the routing subsystem.
Update severity for some of the already-used messages to adhere the
guidelines.
Convert rtsock logging to the new FIB_ reporting format.
MFC after: 2 weeks
route.
Reporting logic assumed there is always some nhop change for every
successful modification operation. Explicitly check that the changed
nexthop indeed exists when reporting back to userland.
MFC after: 2 weeks
Reported by: Claudio Jeker <claudio.jeker@klarasystems.com>
Tested by: Claudio Jeker <claudio.jeker@klarasystems.com>
RTM_CHANGE operates on a single component of the multipath route (e.g. on a single nexthop).
Search of this nexthop is peformed by iterating over each component from multipath (nexthop)
group, using check_info_match_nhop. The problem with the current code that it incorrectly
assumes that `check_info_match_nhop()` returns true value on match, while in reality it
returns an error code on failure). Fix this by properly comparing the result with 0.
Additionally, the followup code modified original necthop group instead of a new one.
Fix this by targetting new nexthop group instead.
Reported by: thj
Tested by: Claudio Jeker <claudio.jeker@klarasystems.com>
Differential Revision: https://reviews.freebsd.org/D35526
MFC after: 2 weeks
BPF headers are word-aligned when copied into the store buffer. Ensure
that pad bytes following the preceding packet are cleared.
Reported by: KMSAN
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Some drivers will collect multiple mbuf chains, linked by m_nextpkt,
before passing them to upper layers. debugnet_pkt_in() didn't handle
this and would process only the first packet, typically leading to
retransmits.
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
In lacp_select_tx_port_by_hash(), we assert that the selected port is
DISTRIBUTING. However, the port state is protected by the LACP_LOCK(),
which is not held around lacp_select_tx_port_by_hash(). So this
assertion is racy, and can result in a spurious panic when links
are flapping.
It is certainly possible to fix it by acquiring LACP_LOCK(),
but this seems like an early development assert, and it seems best
to just remove it, rather than add complexity inside an ifdef
INVARIANTS.
Sponsored by: Netflix
Reviewed by: hselasky
Differential Revision: https://reviews.freebsd.org/D35396
The TLS receive tags are allocated directly from the receiving interface,
because mbufs are flowing in the opposite direction and then route change
checks are not useful, because they only work for outgoing traffic.
Differential revision: https://reviews.freebsd.org/D32356
Sponsored by: NVIDIA Networking
The TLS receive tags are allocated directly from the receiving interface,
because mbufs are flowing in the opposite direction and then route change
checks are not useful, because they only work for outgoing traffic.
Differential revision: https://reviews.freebsd.org/D32356
Sponsored by: NVIDIA Networking
Mbufs leak when manually removing incomplete NDP records with pending packet via ndp -d.
It happens because lltable_drop_entry_queue() rely on `la_numheld`
counter when dropping NDP entries (lles). It turned out NDP code never
increased `la_numheld`, so the actual free never happened.
Fix the issue by introducing unified lltable_append_entry_queue(),
common for both ARP and NDP code, properly addressing packet queue
maintenance.
Reviewed By: melifaro
Differential Revision: https://reviews.freebsd.org/D35365
MFC after: 2 weeks
We could insert proxy NDP entries by the ndp command, but the host
with proxy ndp entries had not responded to Neighbor Solicitations.
Change the following points for proxy NDP to work as expected:
* join solicited-node multicast addresses for proxy NDP entries
in order to receive Neighbor Solicitations.
* look up proxy NDP entries not on the routing table but on the
link-level address table when receiving Neighbor Solicitations.
Reviewed By: melifaro
Differential Revision: https://reviews.freebsd.org/D35307
MFC after: 2 weeks
In order to decrease ifdef INET/INET6s in the lltable implementation,
introduce the llt_post_resolved callback and implement protocol-dependent
code in the protocol-dependent part.
Reviewed By: melifaro
Differential Revision: https://reviews.freebsd.org/D35322
MFC after: 2 weeks
Provide sticky ARP flag for network interface which marks it as the
"sticky" one similarly to what we have for bridges. Once interface is
marked sticky, any address resolved using the ARP will be saved as a
static one in the ARP table. Such functionality may be used to prevent
ARP spoofing or to decrease latencies in Ethernet networks.
The drawbacks include potential limitations in usage of ARP-based
load-balancers and high-availability solutions such as carp(4).
The implemented option is disabled by default, therefore should not
impact the default behaviour of the networking stack.
Sponsored by: Conclusive Engineering sp. z o.o.
Reviewed By: melifaro, pauamma_gundo.com
Differential Revision: https://reviews.freebsd.org/D35314
MFC after: 2 weeks
We may call debugnet_free() before g_debugnet_pcb_inuse is true,
specifically in the cases where the interface is down or does not
support debugnet. pcb->dp_drv_input is used to hold the real driver
if_input callback while debugnet is in use, so we can check the status
of this field in the assertion.
This can be triggered trivially by trying to configure netdump on an
unsupported interface at the ddb prompt.
Initializing the dp_drv_input field to NULL explicitly is not necessary
but helps display the intent.
PR: 263929
Reported by: Martin Filla <freebsd@sysctl.cz>
Reviewed by: cem, markj
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D35179
struct sockaddr is not sufficient for buffer that can hold any
sockaddr_* structure. struct sockaddr_storage should be used.
Test:
ifconfig epair create
ifconfig epair0a inet6 add 2001:db8::1 up
ndp -s 2001:db8::2 02:86:98:2e:96:0b proxy # this triggers kernel stack overflow
Reviewed by: markj, kp
Differential Revision: https://reviews.freebsd.org/D35188
If 'options RSS' is set we bind the epair tasks to different CPUs. We
must take care to not keep the current thread bound to the last CPU when
we return to userspace.
MFC after: 1 week
Sponsored by: Orange Business Services
When we destroy an interface while the jail containing it is being
destroyed we risk seeing a race between if_vmove() and the destruction
code, which results in us trying to move a destroyed interface.
Protect against this by using the ifnet_detach_sxlock to also covert
if_vmove() (and not just detach).
PR: 262829
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D34704
Supplement ifindex table with generation count and use it to
serialize & restore an ifnet pointer.
Reviewed by: kp
Differential revision: https://reviews.freebsd.org/D33266
Fun note: git show e6abef0918
(cherry picked from commit e1882428dc)
Now that ifindex is static to if.c we can unvirtualize it. For lifetime
of an ifnet its index never changes. To avoid leaking foreign interfaces
the net.link.generic.system.ifcount sysctl and the ifnet_byindex() KPI
filter their returned value on curvnet. Since if_vmove() no longer
changes the if_index, inline ifindex_alloc() and ifindex_free() into
if_alloc() and if_free() respectively.
API wise the only change is that now minimum interface index can be
greater than 1. The holes in interface indexes were always allowed.
Reviewed by: kp
Differential revision: https://reviews.freebsd.org/D33672
(cherry picked from commit 91f44749c6)
This reverts commit 91f44749c6.
Devirtualization of V_if_index and V_ifindex_table was rushed into
the tree lacking proper context, discussion, and declaration of intent,
so I'm backing it out as harmful to VNET on the following grounds:
1) The change repurposed the decades-old and stable if_index KBI for
new, unclear goals which were omitted from the commit note.
2) The change opened up a new resource exhaustion vector where any vnet
could starve the system of ifnet indices, including vnet0.
3) To circumvent the newly introduced problem of separating ifnets
belonging to different vnets from the globalized ifindex_table, the
author introduced sysctl_ifcount() which does a linear traversal over
the (potentially huge) global ifnet list just to return a simple upper
bound on existing ifnet indices.
4) The change effectively led to nonuniform ifnet index allocation
among vnets.
5) The commit note clearly stated that the patch changed the implicit
if_index ABI contract where ifnet indices were assumed to be starting
from one. The commit note also included a correct observation that
holes in interface indices were always allowed, but failed to declare
that the userland-observable ifindex tables could now include huge
empty spans even under modest operating conditions.
6) The author had an earlier proposal in the works which did not
affect per-vnet ifnet lists (D33265) but which he abandoned without
providing the rationale behind his decision to do so, at the expense
of sacrificing the vnet isolation contract and if_index ABI / KBI.
Furthermore, the author agreed to back out his changes himself and
to follow up with a proposal for a less intrusive alternative, but
later silently declined to act. Therefore, I decided to resolve the
status-quo by backing this out myself. This in no way precludes a
future proposal aiming to mitigate ifnet-removal related system
crashes or panics to be accepted, provided it would not unnecessarily
compromise the goal of as strict as possible isolation between vnets.
Obtained from: github.com/glebius/FreeBSD/commits/backout-ifindex
This reverts commit 703e533da5.
Revert "ifnet/mbuf: provide KPI to serialize/restore m->m_pkthdr.rcvif"
This reverts commit e1882428dc.
Obtained from: github.com/glebius/FreeBSD/commits/backout-ifindex
Panasas was seeing a higher-than-expected number of link-flap events.
After joint debugging with the switch vendor, we determined there were
problems on both sides; either of which might cause the occasional
event, but together caused lots of them.
On the switch side, an internal queuing issue was causing LACP PDUs --
which should be sent every second, in short-timeout mode -- to sometimes
be sent slightly later than they should have been. In some cases, two
successive PDUs were late, but we never saw three late PDUs in a row.
On the FreeBSD side, we saw a link-flap event every time there were two
late PDUs, while the spec says that it takes *three* seconds of downtime
to trigger that event. It turns out that if a PDU was received shortly
before the timer code was run, it would decrement less than a full
second after the PDU arrived. Then two delayed PDUs would cause two
additional decrements, causing it to reach zero less than three seconds
after the most-recent on-time PDU.
The solution is to note the time a PDU arrives, and only decrement if at
least a full second has elapsed since then.
Reported by: Greg Foster <gfoster@panasas.com>
Reviewed by: gallatin
Tested by: Greg Foster <gfoster@panasas.com>
MFC after: 3 days
Sponsored by: Panasas
Differential Revision: https://reviews.freebsd.org/D35070
Similar to ipfw rule timestamps, these timestamps internally are
uint32_t snaps of the system time in seconds. The timestamp is CPU local
and updated each time a rule or a state associated with a rule or state
is matched.
Reviewed by: kp
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D34970
Allow tables to be used for the l3 source/destination matching.
This requires taking the PF_RULES read lock.
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D34917
Allow udp tunnel functions to indicate they have not taken ownership of
the packet, and that normal UDP processing should continue.
This is especially useful for scenarios where the kernel has taken
ownership of a socket that was originally created by userspace. It
allows the tunnel function to pass through certain packets for userspace
processing.
The primary user of this is if_ovpn, when it receives messages from
unknown peers (which might be a new client).
Reviewed by: tuexen
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D34883
Also convert raw epoch_call() calls to lltable_free_entry() calls, no
functional change intended. There's no need to asynchronously free the
LLEs in that case to begin with, but we might as well use the lltable
interfaces consistently.
Noticed by code inspection; I believe lltable_calc_llheader() failures
do not generally happen in practice.
Reviewed by: bz
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34832
Possibly one could assert that ret should always be 0 here (that is,
that there was always an index found in the bitmask). That should be
true since a bitmask index is allocated before the nhgrp is inserted
in the ctl->gr_head list in link_nhgrp.
with md5 sum used as key.
This gets rid of the quadratic rule traversal when "keep_counters" is
set.
Reviewed by: kp
Sponsored by: Rubicon Communications, LLC ("Netgate")
Makes it cheaper to compare rules when "keep_counters" is set.
This also sets up keeping them in a RB tree.
Reviewed by: kp
Sponsored by: Rubicon Communications, LLC ("Netgate")
For now only protects rule creation/destruction, but will allow
gradually reducing the scope of rules lock when changing the
rules.
Reviewed by: kp
Sponsored by: Rubicon Communications, LLC ("Netgate")
66acf7685b failed to build on riscv (and mips). This is because the
atomic_testandset_int() (and friends) functions do not exist there.
Happily those platforms do have the long variant, so switch to that.
PR: 262571
MFC after: 3 days
As an unwanted side effect of the performance improvements in
24f0bfbad5, epair interfaces stop forwarding traffic on higher
load levels when running on multi-core systems.
This happens due to a race condition in the logic that decides when to
place work in the task queue(s) responsible for processing the content
of ring buffers.
In order to fix this, a field named state is added to the epair_queue
structure. This field is used by the affected functions to signal each
other that something happened in the underlying ring buffers that might
require work to be scheduled in task queue(s), replacing the existing
logic, which relied on checking if ring buffers are empty or not.
epair_menq() does:
- set BIT_MBUF_QUEUED
- queue mbuf
- if testandset BIT_QUEUE_TASK:
enqueue task
epair_tx_start_deferred() does:
- swap ring buffers
- process mbufs
- clear BIT_QUEUE_TASK
- if testandclear BIT_MBUF_QUEUED
enqueue task
PR: 262571
Reported by: Johan Hendriks <joh.hendriks@gmail.com>
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D34569
Allow filtering based on the source or destination IP/IPv6 address in
the Ethernet layer rules.
Reviewed by: pauamma_gundo.com (man), debdrup (man)
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D34482
Symptom: when a single extmem memory region is provided to netmap
multiple times, for multiple interfaces, the memory region is
never released by netmap once all the existing file descriptors
are closed.
Fix the relevant condition in netmap_mem_drop(): release the memory
when the last user of netmap_adapter is gone, rather then when
the last user of netmap_mem_d is gone.
MFC after: 2 weeks
When filtering Ethernet packets allow rules to specify a mac address
with a mask. This indicates which bits of the specified address are
significant. This allows users to do things like filter based on device
manufacturer.
Sponsored by: Rubicon Communications, LLC ("Netgate")
Allow packets to be tagged with dummynet information. Note that we do
not apply dummynet shaping on the L2 traffic, but instead mark it for
dummynet processing in the L3 code. This is the same approach as we take
for ALTQ.
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D32222
Avoid the overhead of acquiring a (read) RULES lock when processing the
Ethernet rules.
We can get away with that because when rules are modified they're staged
in V_pf_keth_inactive. We take care to ensure the swap to V_pf_keth is
atomic, so that pf_test_eth_rule() always sees either the old rules, or
the new ruleset.
We need to take care not to delete the old ruleset until we're sure no
pf_test_eth_rule() is still running with those. We accomplish that by
using NET_EPOCH_CALL() to actually free the old rules.
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D31739
This is the kernel side of stateless Ethernel level filtering for pf.
The primary use case for this is to enable captive portal functionality
to allow/deny access by MAC address, rather than per IP address.
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D31737
if_bridge duplicates broadcast packets with m_copypacket(), which
creates shared packets. In certain circumstances these packets can be
processed by udp_usrreq.c:udp_input() first, which modifies the mbuf as
part of the checksum verification. That may lead to incorrect packets
being transmitted.
Use m_dup() to create independent mbufs instead.
Reported by: Richard Russo <toast@ruka.org>
Reviewed by: donner, afedorov
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D34319
Allow multiple cores to be used to process if_epair traffic. We do this
(if RSS is enabled) based on the RSS hash of the incoming packet. This
allows us to distribute the load over multiple cores, rather than
sending everything to the same one.
We also switch from swi_sched() to taskqueues, which also contributes to
better throughput.
Benchmark results:
With net.isr.maxthreads=-1
Setup A: (cc0 - bridge0 - epair0a) (epair0b - bridge1 - cc1)
Before 627 Kpps
After (no RSS) 1.198 Mpps
After (RSS) 3.148 Mpps
Setup B: (cc0 - bridge0 - epaira0) (epair0b - vnet jail - epair1a) (epair1b - bridge1 - cc1)
Before 7.705 Kpps
After (no RSS) 1.017 Mpps
After (RSS) 2.083 Mpps
MFC after: 3 weeks
Sponsored by: Orange Business Services
Differential Revision: https://reviews.freebsd.org/D33731
6d4baa0d01 incorrectly rounded the lenght of the pflog header up to 8
bytes, rather than 4.
PR: 261566
Reported by: Guy Harris <gharris@sonic.net>
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
There are some error paths in ioctl handlers that will call
pf_krule_free() before the rule's rpool.mtx field is initialized,
causing a panic with INVARIANTS enabled.
Fix the problem by introducing pf_krule_alloc() and initializing the
mutex there. This does mean that the rule->krule and pool->kpool
conversion functions need to stop zeroing the input structure, but I
don't see a nicer way to handle this except perhaps by guarding the
mtx_destroy() with a mtx_initialized() check.
Constify some related functions while here and add a regression test
based on a syzkaller reproducer.
Reported by: syzbot+77cd12872691d219c158@syzkaller.appspotmail.com
Reviewed by: kp
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34115
netisr uses global workstreams and after dequeueing an mbuf it
uses rcvif to get the VNET of the mbuf. Of course, this is not
needed when kernel is compiled without VIMAGE. It came out that
routing socket does not set rcvif if compiled without VIMAGE.
Make this assignment not depending on VIMAGE option.
Fixes: 6871de9363
Supplement ifindex table with generation count and use it to
serialize & restore an ifnet pointer.
Reviewed by: kp
Differential revision: https://reviews.freebsd.org/D33266
Fun note: git show e6abef0918
Now that ifindex is static to if.c we can unvirtualize it. For lifetime
of an ifnet its index never changes. To avoid leaking foreign interfaces
the net.link.generic.system.ifcount sysctl and the ifnet_byindex() KPI
filter their returned value on curvnet. Since if_vmove() no longer
changes the if_index, inline ifindex_alloc() and ifindex_free() into
if_alloc() and if_free() respectively.
API wise the only change is that now minimum interface index can be
greater than 1. The holes in interface indexes were always allowed.
Reviewed by: kp
Differential revision: https://reviews.freebsd.org/D33672
Although send tags are strictly used for transmit, the name might be changed
in the future to be more generic.
The TLS receive tags support regular IPv4 and IPv6 traffic, and also over any
VLAN. If prio-tagging is enabled, VLAN ID zero, this must be checked in the
network driver itself when creating the TLS RX decryption offload filter.
TLS receive tags have a modify callback to tell the network driver about
the progress of decryption. Currently decryption is done IP packet by IP
packet, even if the IP packet contains a partial TLS record. The modify
callback allows the network driver to keep track of TCP sequence numbers
pointing to the beginning of TLS records after TCP packet reassembly.
These callbacks only happen when encrypted or partially decrypted data is
received and are used to verify the decryptions starting point for the
hardware. Typically the hardware will guess where TLS headers start and
needs help from the software to know if the guess was correct. This is
the purpose of the modify callback.
Differential Revision: https://reviews.freebsd.org/D32356
Discussed with: jhb@
MFC after: 1 week
Sponsored by: NVIDIA Networking
Try to live with cruel reality fact - if_vmove doesn't move an
interface from previous vnet cloning infrastructure to the new
one. Let's admit this as design feature and make it work better.
* Delete two blocks of code that would fallback to vnet0, if a
cloner isn't found. They didn't do any good job and also whole
idea of treating vnet0 as special one is wrong.
* When deleting a cloned interface, lookup its cloner using it's
home vnet.
With this change simple sequence works correctly:
ifconfig foo0 create
jail -c name=jj persist vnet vnet.interface=foo0
jexec jj ifconfig foo0 destroy
Differential revision: https://reviews.freebsd.org/D33942
* Do a single call into if_clone.c instead of two. The cloner
can't disappear since the interface sits on its list.
* Make restoration smarter - check that cloner with same name
exists in the new vnet.
Differential revision: https://reviews.freebsd.org/D33941
Adds a new function pointer to struct if_txrx in order to allow
drivers to set their own function that will determine which queue
a packet should be sent on.
Since this includes a kernel ABI change, bump the __FreeBSD_version
as well.
(This motivation behind this is to allow the driver to examine the
UP in the VLAN tag and determine which queue to TX on based on
that, in support of HW TX traffic shaping.)
Signed-off-by: Eric Joyner <erj@FreeBSD.org>
Reviewed by: kbowling@, stallamr@netapp.com
Tested by: jeffrey.e.pieper@intel.com
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D31485
In iflib_device_register(), the CTX_LOCK is acquired first and then
IFNET_WLOCK is acquired by ether_ifattach(). However, in netmap_hw_reg()
we do the opposite: IFNET_RLOCK is acquired first, and then CTX_LOCK
is acquired by iflib_netmap_register(). Fix this LOR issue by wrapping
the CTX_LOCK/UNLOCK calls in iflib_device_register with an additional
IFNET_WLOCK. This is safe since the IFNET_WLOCK is recursive.
MFC after: 1 month
The roundrobin pool stores its state in the rule, which could
potentially lead to invalid addresses being returned.
For example, thread A just executed PF_AINC(&rpool->counter) and
immediately afterwards thread B executes PF_ACPY(naddr, &rpool->counter)
(i.e. after the pf_match_addr() check of rpool->counter).
Lock the rpool with its own mutex to prevent these races. The
performance impact of this is expected to be low, as each rule has its
own lock, and the lock is also only relevant when state is being created
(so only for the initial packets of a connection, not for all traffic).
See also: https://redmine.pfsense.org/issues/12660
Reviewed by: glebius
MFC after: 3 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D33874
I've noticed relations between iflib_timer() vs ixl_admin_timer().
Both scheduled at the same 2Hz rate, but the second is rescheduling
the first each time, so if the first get any slower, it won't be
executed at all. Revert this until deeper investigation.
This reverts commit 90bc1cf657.
In 4f6c66cc9c, ifa_ifwithnet() was changed to no longer
ifa_ref() the returned ifaddr, and instead the caller was required
to stay in the net_epoch for as long as they wanted the ifaddr
to remain valid. However, this missed the case where an AF_LINK
lookup would call ifaddr_byindex(), which still does ifa_ref()
the ifaddr. This would cause a refcount leak.
Fix this by inlining the relevant parts of ifaddr_byindex() here,
with the ifa_ref() call removed. This also avoids an unnecessary
entry and exit from the net_epoch for this case.
I've audited all in-tree consumers of ifa_ifwithnet() that could
possibly perform an AF_LINK lookup and confirmed that none of them
will expect the ifaddr to have a reference that they need to
release.
MFC after: 2 months
Sponsored by: Dell Inc
Differential Revision: https://reviews.freebsd.org/D28705
Reviewed by: melifaro
Now that each module handles its global and VNET initialization
itself, there is no VNET related stuff left to do in domain_init().
Differential revision: https://reviews.freebsd.org/D33541
The historical BSD network stack loop that rolls over domains and
over protocols has no advantages over more modern SYSINIT(9).
While doing the sweep, split global and per-VNET initializers.
Getting rid of pr_init allows to achieve several things:
o Get rid of ifdef's that protect against double foo_init() when
both INET and INET6 are compiled in.
o Isolate initializers statically to the module they init.
o Makes code easier to understand and maintain.
Reviewed by: melifaro
Differential revision: https://reviews.freebsd.org/D33537
The introduction of <sched.h> improved compatibility with some 3rd
party software, but caused the configure scripts of some ports to
assume that they were run in a GLIBC compatible environment.
Parts of sched.h were made conditional on -D_WITH_CPU_SET_T being
added to ports, but there still were compatibility issues due to
invalid assumptions made in autoconfigure scripts.
The differences between the FreeBSD version of macros like CPU_AND,
CPU_OR, etc. and the GLIBC versions was in the number of arguments:
FreeBSD used a 2-address scheme (one source argument is also used as
the destination of the operation), while GLIBC uses a 3-adderess
scheme (2 source operands and a separately passed destination).
The GLIBC scheme provides a super-set of the functionality of the
FreeBSD macros, since it does not prevent passing the same variable
as source and destination arguments. In code that wanted to preserve
both source arguments, the FreeBSD macros required a temporary copy of
one of the source arguments.
This patch set allows to unconditionally provide functions and macros
expected by 3rd party software written for GLIBC based systems, but
breaks builds of externally maintained sources that use any of the
following macros: CPU_AND, CPU_ANDNOT, CPU_OR, CPU_XOR.
One contributed driver (contrib/ofed/libmlx5) has been patched to
support both the old and the new CPU_OR signatures. If this commit
is merged to -STABLE, the version test will have to be extended to
cover more ranges.
Ports that have added -D_WITH_CPU_SET_T to build on -CURRENT do
no longer require that option.
The FreeBSD version has been bumped to 1400046 to reflect this
incompatible change.
Reviewed by: kib
MFC after: 2 weeks
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D33451
With IPv4 over IPv6 nexthops and IP->MPLS support, there is a need
to distingush "upper" e.g. traffic family and "neighbor" e.g. LLE/gateway
address family. Store them explicitly in the private part of the nexthop data.
While here, store nhop fibnum in nhop_prip datastructure to make it self-contained.
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D33663
Introduce a new function, lltable_get(), to retrieve lltable pointer
for the specified interface and family.
Use it to avoid all-iftable list traversal when adding or deleting
ARP/ND records.
Differential Revision: https://reviews.freebsd.org/D33660
MFC after: 2 weeks
On SIOCSIFCAP, some bits in ifp->if_capenable may be toggled.
When this happens, apply the same change to isc_capenable, which
is the iflib private copy of if_capenable (for a subset of the
IFCAP_* bits). In this way the iflib drivers can check the bits
using isc_capenable rather than if_capenable. This is convenient
because the latter access requires an additional indirection
through the ifp, and it is also less likely to be in cache.
PR: 260068
Reviewed by: kbowling, gallatin
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D33156
Ensure that the pfvar.h header can be included without including any
other headers.
Reviewed by: imp
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D33499
Ensure that the if_stf.h header can be included without including any
other headers.
Reviewed by: imp
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D33498
Create a wrapper for newbus to take giant and for busses to take it too.
bus_topo_lock() should be called before interacting with newbus routines
and unlocked with bus_topo_unlock(). If you need the topology lock for
some reason, bus_topo_mtx() will provide that.
Sponsored by: Netflix
Reviewed by: mav
Differential Revision: https://reviews.freebsd.org/D31831
This requires moving net.link.generic sysctl declaration from if_mib.c
to if.c. Ideally if_mib.c needs just to be merged to if.c, but they
have different license texts.
Differential revision: https://reviews.freebsd.org/D33263
Sweep over potentially unsafe calls to ifnet_byindex() and wrap them
in epoch. Most of the code touched remains unsafe, as the returned
pointer is being used after epoch exit. Mark that with a comment.
Validate the index argument inside the function, reducing argument
validation requirement from the callers and making V_if_index
private to if.c.
Reviewed by: melifaro
Differential revision: https://reviews.freebsd.org/D33263
Now it is possible to just merge all this complexity into single
linear function. Note that IFNET_WLOCK() is a sleepable lock, so
we can M_WAITOK and epoch_wait_preempt().
Reviewed by: melifaro, bz, kp
Differential revision: https://reviews.freebsd.org/D33262
So let's just call malloc() directly. This also avoids hidden
doubling of default V_if_indexlim.
Reviewed by: melifaro, bz, kp
Differential revision: https://reviews.freebsd.org/D33261
Now that if_alloc_domain() never fails and actually doesn't
expose ifnet to outside we can eliminate IFNET_HOLD and two
step index allocation.
Reviewed by: kp
Differential revision: https://reviews.freebsd.org/D33259
Yet another problem created by VIMAGE/if_vmove/epair design that
relocates ifnet between vnets and changes if_index. Since if_index
changes, nhop hash values also changes, unlink_nhop() isn't able to
find entry in hash and leaks the nhop. Since nhop references ifnet,
the latter is also leaked. As result running network tests leaks
memory on every single test that creates vnet jail.
While here, rewrite whole hash_priv() to use static initializer,
per Alexander's suggestion.
Reviewed by: melifaro
There were two issues with the new pflog packet length.
The first is that the length is expected to be a multiple of
sizeof(long), but we'd assumed it had to be a multiple of
sizeof(uint32_t).
The second is that there's some broken software out there (such as
Wireshark) that makes incorrect assumptions about the amount of padding.
That is, Wireshark assumes there's always three bytes of padding, rather
than however much is needed to get to a multiple of sizeof(long).
Fix this by adding extra padding, and a fake field to maintain
Wireshark's assumption.
Reported by: Ozkan KIRIK <ozkan.kirik@gmail.com>
Tested by: Ozkan KIRIK <ozkan.kirik@gmail.com>
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D33236
This reverts commit 266f97b5e9, reversing
changes made to a10253cffe.
A mismerge of a merge to catch up to main resulted in files being
committed which should not have been.
With upcoming changes to the inpcb synchronisation it is going to be
broken. Even its current status after the move of PCB synchronization
to the network epoch is very questionable.
This experimental feature was sponsored by Juniper but ended never to
be used in Juniper and doesn't exist in their source tree [sjg@, stevek@,
jtl@]. In the past (AFAIK, pre-epoch times) it was tried out at Netflix
[gallatin@, rrs@] with no positive result and at Yandex [ae@, melifaro@].
I'm up to resurrecting it back if there is any interest from anybody.
Reviewed by: rrs
Differential revision: https://reviews.freebsd.org/D33020
In in_stf_input() we grabbed a pointer to the IPv4 header and later did
an m_pullup() before we look at the IPv6 header. However, m_pullup()
could rearrange the mbuf chain and potentially invalidate the pointer to
the IPv4 header.
Avoid this issue by copying the IP header rather than getting a pointer
to it.
Reported by: markj, Jenkins (KASAN job)
Reviewed by: markj
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D33192
The last consumer of if_com_alloc() is firewire. It never fails
to allocate. Most likely the if_com_alloc() KPI will go away
together with if_fwip(), less likely new consumers of if_com_alloc()
will be added, but they would need to follow the no fail KPI.
With this change if_index can become static. There is nothing
that if_debug.c would want to isolate from if.c. Potentially
if.c wants to share everything with if_debug.c.
Move Bjoern's copyright to if.c.
Reviewed by: bz
iflib_stop modifies iflib data structures that are used by _task_fn_rx,
most prominently the free lists. So, iflib_stop has to ensure that the
rx task threads are not active.
This should help to fix a crash seen when iflib_if_ioctl (e.g.,
SIOCSIFCAP) is called while there is already traffic flowing.
The crash has been seen on VMWare guests with vmxnet3 driver.
My guess is that on physical hardware the couple of 1ms delays that
iflib_stop has after disabling interrupts are enough for the queued work
to be completed before any iflib state is touched.
But on busy hypervisors the guests might not get enough CPU time to
complete the work, thus there can be a race between the taskqueue
threads and the work done to handle an ioctl, specifically in iflib_stop
and iflib_init_locked.
PR: 259458
Reviewed by: markj
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D32926
DIOCKEEPCOUNTERS used to overlap with DIOCGIFSPEEDV0, which has been
fixed in 14, but remains in stable/12 and stable/13.
Support the old, overlapping, call under COMPAT_FREEBSD13.
Reviewed by: jhb
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D33001
We accidentally had two ioctls use the same base number
(DIOCKEEPCOUNTERS and DIOCGIFSPEEDV{0,1}). We get away with that on most
platforms because the size of the argument structures is different.
This does break CHERI, and is generally a bad idea anyway.
Renumber to avoid this collision.
Reported by: jhb
The cloner must be per-vnet so that cloned interfaces get destroyed when
the vnet goes away. Otherwise we fail assertions in vnet_if_uninit():
panic: vnet_if_uninit:475 tailq &V_ifnet=0xfffffe01665fe070 not empty
cpuid = 19
time = 1636107064
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe015d0cac60
vpanic() at vpanic+0x187/frame 0xfffffe015d0cacc0
panic() at panic+0x43/frame 0xfffffe015d0cad20
vnet_if_uninit() at vnet_if_uninit+0x7b/frame 0xfffffe015d0cad30
vnet_destroy() at vnet_destroy+0x170/frame 0xfffffe015d0cad60
prison_deref() at prison_deref+0x9b0/frame 0xfffffe015d0cadd0
sys_jail_remove() at sys_jail_remove+0x119/frame 0xfffffe015d0cae00
amd64_syscall() at amd64_syscall+0x12e/frame 0xfffffe015d0caf30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe015d0caf30
--- syscall (508, FreeBSD ELF64, sys_jail_remove), rip = 0x8011e920a, rsp = 0x7fffffffe788, rbp = 0x7fffffffe810 ---
KDB: enter: panic
MFC after: 3 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D32849
As stated in style(9): "Values in return statements should be enclosed
in parentheses."
MFC after: 3 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D32848
If an if_gif exists and has an address assigned inside a vnet when the
vnet is shut down we failed to clean up the address, leading to a panic
when we ip_destroy() and the V_in_ifaddrhashtbl is not empty.
This happens because of the VNET_SYS(UN)INIT order, which means we
destroy the if_gif interface before the addresses can be purged (and
if_detach() does not remove addresses, it assumes this will be done by
the stack teardown code).
Set subsystem SI_SUB_PSEUDO just like if_bridge so the cleanup
operations happen in the correct order.
MFC after: 3 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D32835
Some NICs might have limited capabilities when Jumbo frames are used.
For exampe some neta interfaces only support TX csum offload when the
packet size is lower than a value specified in DT.
Fix it by re-reading capabilities of children interfaces after MTU
has been successfully changed.
Found by: Jerome Tomczyk <jerome.tomczyk@stormshield.eu>
Reviewed by: jhb
Obtained from: Semihalf
Sponsored by: Stormshield
Differential revision: https://reviews.freebsd.org/D32724
Allow users to set a number on rules which will be exposed as part of
the pflog header.
The intent behind this is to allow users to correlate rules across
updates (remember that pf rules continue to exist and match existing
states, even if they're removed from the active ruleset) and pflog.
Obtained from: pfSense
MFC after: 3 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D32750
Rework if_epair(4) to no longer use netisr and dpcpu.
Instead use mbufq and swi_net.
This simplifies the code and seems to make it work better and
no longer hang.
Work largely by bz@, with minor tweaks by kp@.
Reviewed by: bz, kp
MFC after: 3 weeks
Differential Revision: https://reviews.freebsd.org/D31077
Remove all (non-persistent) tags when we transmit a packet. Real network
interfaces do not carry any tags either, and leaving tags attached can
produce unexpected results.
Reviewed by: bz, glebius
MFC after: 3 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D32663
A BPF descriptor only has an associated interface descriptor once it is
attached to an interface, e.g., with BIOCSETIF. Avoid dereferencing a
NULL pointer in filt_bpfwrite() if the BPF descriptor is not attached.
Reviewed by: ae
Reported by: syzbot+ae45d5166afe15a5a21d@syzkaller.appspotmail.com
Fixes: ded77e0237 ("Allow the BPF to be select for write.")
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D32561
The modification to the hash are already naturally locked by
in_control_sx. Convert the hash lists to CK lists. Remove the
in_ifaddr_rmlock. Assert the network epoch where necessary.
Most cases when the hash lookup is done the epoch is already entered.
Cover a few cases, that need entering the epoch, which mostly is
initial configuration of tunnel interfaces and multicast addresses.
Reviewed by: melifaro
Differential revision: https://reviews.freebsd.org/D32584
The last two drivers that required sppp are cp(4) and ce(4).
These devices are still produced and can be purchased
at Cronyx <http://cronyx.ru/hardware/wan.html>.
Since Roman Kurakin <rik@FreeBSD.org> has quit them, they no
longer support FreeBSD officially. Later they have dropped
support for Linux drivers to. As of mid-2020 they don't even
have a developer to maintain their Windows driver. However,
their support verbally told me that they could provide aid to
a FreeBSD developer with documentaion in case if there appears
a new customer for their devices.
These drivers have a feature to not use sppp(4) and create an
interface, but instead expose the device as netgraph(4) node.
Then, you can attach ng_ppp(4) with help of ports/net/mpd5 on
top of the node and get your synchronous PPP. Alternatively
you can attach ng_frame_relay(4) or ng_cisco(4) for HDLC.
Actually, last time I used cp(4) back in 2004, using netgraph(4)
instead of sppp(4) was already the right way to do.
Thus, remove the sppp(4) related part of the drivers and enable
by default the negraph(4) part. Further maintenance of these
drivers in the tree shouldn't be a big deal.
While doing that, remove some cruft and enable cp(4) compilation
on amd64. The ce(4) for some unknown reason marks its internal
DDK functions with __attribute__ fastcall, which most likely is
safe to remove, but without hardware I'm not going to do that, so
ce(4) remains i386-only.
Reviewed by: emaste, imp, donner
Differential Revision: https://reviews.freebsd.org/D32590
See also: https://reviews.freebsd.org/D23928
An IPv4 address is embedded into an ifaddr which is freed
via epoch. And the in_ifaddrhead is already a CK list. Use
the network epoch to protect against use after free.
Next step would be to CK-ify the in_addr hash and get rid of the...
Reviewed by: melifaro
Differential Revision: https://reviews.freebsd.org/D32434
The 'match' field is only used in the userspace version of the struct
(pf_anchor).
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Use atomic counters to ensure that we correctly track the number of half
open states and syncookie responses in-flight.
This determines if we activate or deactivate syncookies in adaptive
mode.
MFC after: 1 week
Sponsored by: Modirum MDPay
Differential Revision: https://reviews.freebsd.org/D32134
Allow pf to use dummynet pipes and queues.
We re-use the currently unused IPFW_IS_DUMMYNET flag to allow dummynet
to tell us that a packet is being re-injected after being delayed. This
is needed to avoid endlessly looping the packet between pf and dummynet.
MFC after: 2 weeks
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D31904
The error returned when a marker message can not be emitted on a port is not handled.
This cause the lacp to block all emissions until the timeout of 3 seconds is reached.
To fix this issue, I just clear the LACP_PORT_MARK flag when the packet could not be emitted.
Differential revision: https://reviews.freebsd.org/D30467
Obtained from: Stormshield
Move the type and function pointers for operations on existing send
tags (modify, query, next, free) out of 'struct ifnet' and into a new
'struct if_snd_tag_sw'. A pointer to this structure is added to the
generic part of send tags and is initialized by m_snd_tag_init()
(which now accepts a switch structure as a new argument in place of
the type).
Previously, device driver ifnet methods switched on the type to call
type-specific functions. Now, those type-specific functions are saved
in the switch structure and invoked directly. In addition, this more
gracefully permits multiple implementations of the same tag within a
driver. In particular, NIC TLS for future Chelsio adapters will use a
different implementation than the existing NIC TLS support for T6
adapters.
Reviewed by: gallatin, hselasky, kib (older version)
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D31572
tag2name() returns a uint16_t, so we don't need to use uint32_t for the
qid (or pqid). This reduces the size of struct pf_kstate slightly. That
in turn buys us space to add extra fields for dummynet later.
Happily these fields are not exposed to user space (there are user space
versions of them, but they can just stay uint32_t), so there's no ABI
breakage in modifying this.
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D31873
When traversing a list of interface addresses, we need to be in a net
epoch section, and protocol ctlinput routines need a stable reference to
the address.
Reported by: syzbot+3219af764ead146a3a4e@syzkaller.appspotmail.com
Reviewed by: kp, melifaro
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D31889
Current logic always selects an IFA of the same family from the
outgoing interfaces. In IPv4 over IPv6 setup there can be just
single non-127.0.0.1 ifa, attached to the loopback interface.
Create a separate rt_getifa_family() to handle entire ifa selection
for the IPv4 over IPv6.
Differential Revision: https://reviews.freebsd.org/D31868
MFC after: 1 week
There's no reason to acquire the Giant lock while executing the ALTQ
callouts.
While here also remove a few backwards compatibility defines for long
obsolete FreeBSD versions.
Reviewed by: mav
Suggested by: mav
Differential Revision: https://reviews.freebsd.org/D31835
Count when we send a syncookie, receive a valid syncookie or detect a
synflood.
Reviewed by: kbowling
MFC after: 1 week
Sponsored by: Modirum MDPay
Differential Revision: https://reviews.freebsd.org/D31713
Adding such nexthops breaks calc_min_mpath_slots() assumptions,
thus resulting in the incorrect nexthop group creation and
eventually leading to panic.
Reported by: avg
MFC after: 1 week
Some software references outgoing interfaces by specifying name instead of
index.
Use rti_ifp from rt_addrinfo if provided instead of always using
address interface when constructing nexthop.
PR: 255678
Reported by: martin.larsson2 at gmail.com
MFC after: 1 week
Make it possible to extend the GETSTATUS call (e.g. when we want to add
new counters, such as for syncookie support) by introducing an
nvlist-based alternative.
MFC after: 1 week
Sponsored by: Modirum MDPay
Differential Revision: https://reviews.freebsd.org/D31694
Similar to the recent addition of ALTQ support to if_vlan.
Reviewed by: donner
Obtained from: pfsense
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D31675
rmc_restart() is called from a timer, but can trigger traffic. This
means the curvnet context will not be set.
Use the vnet associated with the interface we're currently processing to
set it. We also have to enter net_epoch here, for the same reason.
Reviewed by: mjg
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D31642
Implement kernel support for RFC 5549/8950.
* Relax control plane restrictions and allow specifying IPv6 gateways
for IPv4 routes. This behavior is controlled by the
net.route.rib_route_ipv6_nexthop sysctl (on by default).
* Always pass final destination in ro->ro_dst in ip_forward().
* Use ro->ro_dst to exract packet family inside if_output() routines.
Consistently use RO_GET_FAMILY() macro to handle ro=NULL case.
* Pass extracted family to nd6_resolve() to get the LLE with proper encap.
It leverages recent lltable changes committed in c541bd368f.
Presence of the functionality can be checked using ipv4_rfc5549_support feature(3).
Example usage:
route add -net 192.0.0.0/24 -inet6 fe80::5054:ff:fe14:e319%vtnet0
Differential Revision: https://reviews.freebsd.org/D30398
MFC after: 2 weeks
- make sure rings are disabled during resets
- introduce netmap_update_hostrings_mode(), with support
for multiple host rings
- always initialize ni_bufs_head in netmap_if
ni_bufs_head was not properly initialized when no external buffers were
requestedx and contained the ni_bufs_head from the last request. This
was causing spurious buffer frees when alternating between apps that
used external buffers and apps that did not use them.
- check na validitity under lock on detach
- netmap_mem: fix leak on error path
- nm_dispatch: fix compilation on Raspberry Pi
MFC after: 2 weeks
Currently we use pre-calculated headers inside LLE entries as prepend data
for `if_output` functions. Using these headers allows saving some
CPU cycles/memory accesses on the fast path.
However, this approach makes adding L2 header for IPv4 traffic with IPv6
nexthops more complex, as it is not possible to store multiple
pre-calculated headers inside lle. Additionally, the solution space is
limited by the fact that PCB caching saves LLEs in addition to the nexthop.
Thus, add support for creating special "child" LLEs for the purpose of holding
custom family encaps and store mbufs pending resolution. To simplify handling
of those LLEs, store them in a linked-list inside a "parent" (e.g. normal) LLE.
Such LLEs are not visible when iterating LLE table. Their lifecycle is bound
to the "parent" LLE - it is not possible to delete "child" when parent is alive.
Furthermore, "child" LLEs are static (RTF_STATIC), avoding complex state
machine used by the standard LLEs.
nd6_lookup() and nd6_resolve() now accepts an additional argument, family,
allowing to return such child LLEs. This change uses `LLE_SF()` macro which
packs family and flags in a single int field. This is done to simplify merging
back to stable/. Once this code lands, most of the cases will be converted to
use a dedicated `family` parameter.
Differential Revision: https://reviews.freebsd.org/D31379
MFC after: 2 weeks
When the lagg is being destroyed it is not necessary update the
lladdr of all the lagg members every time we update the primary
interface.
Reviewed by: scottl
Obtained from: pfSense
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D31586
Use the early break to avoid else definitions. When RSS gains a
runtime option previous constructs would duplicate and convolute
the existing code.
While here init flowid and skip magic numbers and late default
assignment.
Reviewed by: melifaro, kbowling
Obtained from: OPNsense
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31584
Introduce m_get3() which is similar to m_get2(), but can allocate up to
MJUM16BYTES bytes (m_get2() can only allocate up to MJUMPAGESIZE).
This simplifies the bpf improvement in f13da24715.
Suggested by: glebius
Differential Revision: https://reviews.freebsd.org/D31455
When iflib devices are in netmap mode the driver
counters are no longer updated making it look from
userspace tools that traffic has stopped.
Reported by: Franco Fichtner <franco@opnsense.org>
Reviewed by: vmaffione, iflib (erj, gallatin)
Obtained from: OPNsense
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D31550
When a prefix gets deleted from the RIB, dpdk_lpm algo needs to know
the nexthop of the "parent" prefix to update its internal state.
The glue code, which utilises RIB as a backing route store, uses
fib[46]_lookup_rt() for the prefix destination after its deletion
to fetch the desired nexthop.
This approach does not work when deleting less-specific prefixes
with most-specific ones are still present. For example, if
10.0.0.0/24, 10.0.0.0/23 and 10.0.0.0/22 exist in RIB, deleting
10.0.0.0/23 would result in 10.0.0.0/24 being returned as a search
result instead of 10.0.0.0/22. This, in turn, results in the failed
datastructure update: part of the deleted /23 prefix will still
contain the reference to an old nexthop. This leads to the
use-after-free behaviour, ending with the eventual crashes.
Fix the logic flaw by properly fetching the prefix "parent" via
newly-created rt_get_inet[6]_parent() helpers.
Differential Revision: https://reviews.freebsd.org/D31546
PR: 256882,256833
MFC after: 1 week
These ones were unambiguous cases where the Foundation was the only
listed copyright holder (in the associated license block).
Sponsored by: The FreeBSD Foundation
Consistently use `nh` instead of always dereferencing
ro->ro_nh inside the if block.
Always use nexthop mtu, as it provides guarantee that mtu is accurate.
Pass `nh` pointer to rt_update_ro_flags() to allow upcoming uses
of updating ro flags based on different nexthop.
Differential Revision: https://reviews.freebsd.org/D31451
Reviewed by: kp
MFC after: 2 weeks
Factor out lltable locking logic from lltable_try_set_entry_addr()
into a separate lltable_acquire_wlock(), so the latter can be used
in other parts of the code w/o duplication.
Create nd6_try_set_entry_addr() to avoid code duplication in nd6.c
and nd6_nbr.c.
Move lle creation logic from nd6_resolve_slow() into a separate
nd6_get_llentry() to simplify the former.
These changes serve as a pre-requisite for implementing
RFC8950 (IPv4 prefixes with IPv6 nexthops).
Differential Revision: https://reviews.freebsd.org/D31432
MFC after: 2 weeks
Use newly-create llentry_request_feedback(),
llentry_mark_used() and llentry_get_hittime() to
request datapatch usage check and fetch the results
in the same fashion both in IPv4 and IPv6.
While here, simplify llentry_provide_feedback() wrapper
by eliminating 1 condition check.
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D31390
When certain multipath route begins flapping really fast, it may
result in creating multiple identical nexthop groups. The code
responsible for unlinking unused nexthop groups had an implicit
assumption that there could be only one nexthop group for the
same combination of nexthops with weights. This assumption resulted
in always unlinking the first "identical" group, instead of the
desired one. Such action, in turn, produced a used-but-unlinked
nhg along with freed-and-linked nhg, ending up in random crashes.
Similarly, it is possible that multiple identical nexthops gets
created in the case of high route churn, resulting in the same
problem when deleting one of such nexthops.
Fix by matching the nexthop/nexhop group pointer when deleting the item.
Reported by: avg
MFC after: 1 week
While nvlists are very useful in maximising flexibility for future
extensions their performance is simply unacceptably bad for the
getstates feature, where we can easily want to export a million states
or more.
The DIOCGETSTATESNV call has been MFCd, but has not hit a release on any
branch, so we can still remove it everywhere.
Reviewed by: mjg
MFC after: 1 week
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D31099
debugnet_handle_arp:
An assertion is present to ensure the pcb is only modified when the state is
DN_STATE_INIT. Because debugnet_arp_gw() is asynchronous it is possible for
ARP replies to come in after the gateway address is known and the state
already changed.
debugnet_handle_ip:
Similarly it is possible for packets to come in, from the expected
server, during the gateway mac discovery phase. This can happen from
testing disconnects / reconnects in quick succession. This later
causes some acks to be sent back but hit an assertion because the
state is wrong.
Reviewed by: cem, debugnet_handle_arp: markj, vangyzen
Sponsored by: Dell EMC
Differential Revision: https://reviews.freebsd.org/D31327
if_bridge member interfaces should always have the same MTU as the
bridge itself, so disallow MTU changes on interfaces that are part of an
if_bridge.
Reviewed by: donner
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D31304
if_bridge used to only allow MTU changes if the new MTU matched that of
all member interfaces. This doesn't really make much sense, in that we
really shouldn't be allowed to change the MTU of bridge member in the
first place.
Instead we now change the MTU of all member interfaces. If one fails we
revert all interfaces back to the original MTU.
We do not address the issue where bridge member interface MTUs can be
changed here.
Reviewed by: donner
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D31288