Most of them can be declared as static after the move out of in6_proto.c.
Keeping sysctl(9) declarations with their text descriptions next to the
variable declaration create self-documenting code. There should be no
functional changes.
Differential Revision: https://reviews.freebsd.org/D44481
Don't report a BACKUP CARP address as local. These two functions are used
only by source address validation for input packets, controlled by sysctls
net.inet.ip.source_address_validation and
net.inet6.ip6.source_address_validation. For this purpose we definitely
want to treat BACKUP addresses as non local.
This change is conservative and doesn't modify compat in_localip() and
in6_localip(). They are used more widely than the FIB-aware versions.
The change would modify the notion of ipfw(4) 'me' keyword. There might
be other consequences as in_localip() is used by various tunneling
protocols.
PR: 277349
Just like it was done for accept(2) in cfb1e92912, use same approach
for two simplier syscalls that return socket addresses. Although,
these two syscalls aren't performance critical, this change generalizes
some code between 3 syscalls trimming code size.
Following example of accept(2), provide VNET-aware and INVARIANT-checking
wrappers sopeeraddr() and sosockaddr() around protosw methods.
Reviewed by: tuexen
Differential Revision: https://reviews.freebsd.org/D42694
Remove ancient SCCS tags from the tree, automated scripting, with two
minor fixup to keep things compiling. All the common forms in the tree
were removed with a perl script.
Sponsored by: Netflix
The mac_ipacl policy module enables fine-grained control over IP address
configuration within VNET jails from the base system.
It allows the root user to define rules governing IP addresses for
jails and their interfaces using the sysctl interface.
Requested by: multiple
Sponsored by: Google, Inc. (GSoC 2019)
MFC after: 2 months
Reviewed by: bz, dch (both earlier versions)
Differential Revision: https://reviews.freebsd.org/D20967
Some context on the current IPv6 interface setup & address management:
There are two data path for IPv6 initialisation in context of assigning
LL addresses:
1) Userland explicitly requests IFF_UP for the interface w/o any addresses.
if_up() then calls in6_if_up(), which calls in6_ifattach().
The latter sets up some initial ND/IN6 state and disables IPv6 for the
interface if it’s not loopback. If the interface is loopback, then it
adds ::1/128 and LL addresses via in6_ifattach_loopback().
Then, devd notification is generated (if the VNET is the default one),
which triggers rc.network ifconfig_up(), causing ifdisabled to be removed
via SIOCSIFINFO_IN6 from ifconfig. The kernel SIOCSIFINFO_IN6 handler
calls in6_if_up() once again and it assigns the interface link-local address.
2) Userland adds IPv4 or IPv6 address to the interface. SIOCAIFADDR[_IN6]
kernel handler calls IPv4/IPv6 protocol handler to add the address.
Both then call if_ioctl() with SIOCSIFADDR. Ethernet/loopback ioctl handlers
silently sets IFF_UP for the interface. Finally, if.c:ifioctl() wrapper code
compares old and new interface flags and, if IFF_UP is added, it explicitly
calls in6_if_up(), which adds link-local address if either the original
address is IPv6 or the interface is loopback.
In the latter case, “formal” interface-up notifications are missing.
The kernel does not trigger event handler event, does not call carp hook
and does not provide any userland notification.
This diff unifies the event handling in both scenarios, providing the
necessary notifications to the kernel and userland.
Reviewed By: kp
Differential Revision: https://reviews.freebsd.org/D40332
MFC after: 2 weeks
This is a total hack/bare minimum which follows inet4.
Otherwise 2 threads removing the same address can easily crash.
Reviewed by: kp
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D39317
Summary:
In preparation of making if_t completely opaque outside of the netstack,
explicitly include the header. <net/if_var.h> will stop including the
header in the future.
Sponsored by: Juniper Networks, Inc.
Reviewed by: glebius, melifaro
Differential Revision: https://reviews.freebsd.org/D38200
Summary:
Move SIOCAIFADDR_IN6 (current "primary" ioctl to add an IPv6
interface address) handling code to the dedicated in6_addifaddr()
function and make it a part of KPI. This allows in-kernel users to
add/delete interfaces addresses without relying on ioctl interface.
Subscribers: imp, ae, glebius
Differential Revision: https://reviews.freebsd.org/D36713
Construct the desired hexthops directly instead of using the
"translation" layer in form of filling rt_addrinfo data.
Simplify V_rt_add_addr_allfibs handling by using recently-added
rib_copy_route() to propagate the routes to the non-primary address
fibs.
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D36166
While here remove recursive network epoch entry in mld_fasttimo_vnet(),
as this function is already in epoch.
Reviewed by: melifaro
Differential revision: https://reviews.freebsd.org/D36161
For some reason protosw.h is used during world complation and userland
is not aware of caddr_t, a relic from the first version of C. Broken
buildworld is good reason to get rid of yet another caddr_t in kernel.
Fixes: 886fc1e804
Commit d6cd20cc5 ("netinet6: fix ndp proxying") caused us to panic when
unloading pfsync:
Fatal trap 12: page fault while in kernel mode
cpuid = 19; apic id = 38
fault virtual address = 0x20
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff80dfe7f4
stack pointer = 0x28:0xfffffe015d4f8ac0
frame pointer = 0x28:0xfffffe015d4f8ae0
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 5477 (kldunload)
trap number = 12
panic: page fault
cpuid = 19
time = 1654023100
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe015d4f8880
vpanic() at vpanic+0x17f/frame 0xfffffe015d4f88d0
panic() at panic+0x43/frame 0xfffffe015d4f8930
trap_fatal() at trap_fatal+0x387/frame 0xfffffe015d4f8990
trap_pfault() at trap_pfault+0xab/frame 0xfffffe015d4f89f0
calltrap() at calltrap+0x8/frame 0xfffffe015d4f89f0
--- trap 0xc, rip = 0xffffffff80dfe7f4, rsp = 0xfffffe015d4f8ac0, rbp = 0xfffffe015d4f8ae0 ---
in6_purge_proxy_ndp() at in6_purge_proxy_ndp+0x14/frame 0xfffffe015d4f8ae0
if_purgeaddrs() at if_purgeaddrs+0x24/frame 0xfffffe015d4f8b90
if_detach_internal() at if_detach_internal+0x1c2/frame 0xfffffe015d4f8bf0
if_detach() at if_detach+0x71/frame 0xfffffe015d4f8c20
pfsync_clone_destroy() at pfsync_clone_destroy+0x1dd/frame 0xfffffe015d4f8c70
if_clone_destroyif() at if_clone_destroyif+0x239/frame 0xfffffe015d4f8cc0
if_clone_detach() at if_clone_detach+0xc8/frame 0xfffffe015d4f8cf0
vnet_pfsync_uninit() at vnet_pfsync_uninit+0xda/frame 0xfffffe015d4f8d10
vnet_deregister_sysuninit() at vnet_deregister_sysuninit+0x85/frame 0xfffffe015d4f8d40
linker_file_sysuninit() at linker_file_sysuninit+0x147/frame 0xfffffe015d4f8d70
linker_file_unload() at linker_file_unload+0x269/frame 0xfffffe015d4f8db0
kern_kldunload() at kern_kldunload+0x18d/frame 0xfffffe015d4f8e00
amd64_syscall() at amd64_syscall+0x12e/frame 0xfffffe015d4f8f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe015d4f8f30
--- syscall (444, FreeBSD ELF64, sys_kldunloadf), rip = 0x1601eab28cba, rsp = 0x1601e9c363f8, rbp = 0x1601e9c36c50 ---
This happens because ifp->if_afdata[AF_INET6] is NULL. Check for this,
just as we already do in a few other places.
See also c139b3c19b ("arp/nd: Cope with late calls to
iflladdr_event").
Reviewed by: melifaro
Sponsored by: Rubicon Communications, LLC ("Netgate")
Differential Revision: https://reviews.freebsd.org/D35374
We could insert proxy NDP entries by the ndp command, but the host
with proxy ndp entries had not responded to Neighbor Solicitations.
Change the following points for proxy NDP to work as expected:
* join solicited-node multicast addresses for proxy NDP entries
in order to receive Neighbor Solicitations.
* look up proxy NDP entries not on the routing table but on the
link-level address table when receiving Neighbor Solicitations.
Reviewed By: melifaro
Differential Revision: https://reviews.freebsd.org/D35307
MFC after: 2 weeks
In order to decrease ifdef INET/INET6s in the lltable implementation,
introduce the llt_post_resolved callback and implement protocol-dependent
code in the protocol-dependent part.
Reviewed By: melifaro
Differential Revision: https://reviews.freebsd.org/D35322
MFC after: 2 weeks
Also convert raw epoch_call() calls to lltable_free_entry() calls, no
functional change intended. There's no need to asynchronously free the
LLEs in that case to begin with, but we might as well use the lltable
interfaces consistently.
Noticed by code inspection; I believe lltable_calc_llheader() failures
do not generally happen in practice.
Reviewed by: bz
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D34832
Introduce a new function, lltable_get(), to retrieve lltable pointer
for the specified interface and family.
Use it to avoid all-iftable list traversal when adding or deleting
ARP/ND records.
Differential Revision: https://reviews.freebsd.org/D33660
MFC after: 2 weeks
Currently we use pre-calculated headers inside LLE entries as prepend data
for `if_output` functions. Using these headers allows saving some
CPU cycles/memory accesses on the fast path.
However, this approach makes adding L2 header for IPv4 traffic with IPv6
nexthops more complex, as it is not possible to store multiple
pre-calculated headers inside lle. Additionally, the solution space is
limited by the fact that PCB caching saves LLEs in addition to the nexthop.
Thus, add support for creating special "child" LLEs for the purpose of holding
custom family encaps and store mbufs pending resolution. To simplify handling
of those LLEs, store them in a linked-list inside a "parent" (e.g. normal) LLE.
Such LLEs are not visible when iterating LLE table. Their lifecycle is bound
to the "parent" LLE - it is not possible to delete "child" when parent is alive.
Furthermore, "child" LLEs are static (RTF_STATIC), avoding complex state
machine used by the standard LLEs.
nd6_lookup() and nd6_resolve() now accepts an additional argument, family,
allowing to return such child LLEs. This change uses `LLE_SF()` macro which
packs family and flags in a single int field. This is done to simplify merging
back to stable/. Once this code lands, most of the cases will be converted to
use a dedicated `family` parameter.
Differential Revision: https://reviews.freebsd.org/D31379
MFC after: 2 weeks
Use newly-create llentry_request_feedback(),
llentry_mark_used() and llentry_get_hittime() to
request datapatch usage check and fetch the results
in the same fashion both in IPv4 and IPv6.
While here, simplify llentry_provide_feedback() wrapper
by eliminating 1 condition check.
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D31390
This reverts a portion of 274579831b ("capsicum: Limit socket
operations in capability mode") as at least rtsol and dhcpcd rely on
being able to configure network interfaces while in capability mode.
Reported by: bapt, Greg V
Sponsored by: The FreeBSD Foundation
Capsicum did not prevent certain privileged networking operations,
specifically creation of raw sockets and network configuration ioctls.
However, these facilities can be used to circumvent some of the
restrictions that capability mode is supposed to enforce.
Add capability mode checks to disallow network configuration ioctls and
creation of sockets other than PF_LOCAL and SOCK_DGRAM/STREAM/SEQPACKET
internet sockets.
Reviewed by: oshogbo
Discussed with: emaste
Reported by: manu
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D29423
P2P ifa may require 2 routes: one is the loopback route, another is
the "prefix" route towards its destination.
Current code marks loopback routes existence with IFA_RTSELF and
"prefix" p2p routes with IFA_ROUTE.
For historic reasons, we fill in ifa_dstaddr for loopback interfaces.
To avoid installing the same route twice, we preemptively set
IFA_RTSELF when adding "prefix" route for loopback.
However, the teardown part doesn't have this hack, so we try to
remove the same route twice.
Fix this by checking if ifa_dstaddr is different from the ifa_addr
and moving this logic into a separate function.
Reviewed By: kp
Differential Revision: https://reviews.freebsd.org/D29121
MFC after: 3 days
Currently ip6_input() calls in6ifa_ifwithaddr() for
every local packet, in order to check if the target ip
belongs to the local ifa in proper state and increase
its counters.
in6ifa_ifwithaddr() references found ifa.
With epoch changes, both `ip6_input()` and all other current callers
of `in6ifa_ifwithaddr()` do not need this reference
anymore, as epoch provides stability guarantee.
Given that, update `in6ifa_ifwithaddr()` to allow
it to return ifa without referencing it, while preserving
option for getting referenced ifa if so desired.
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D28648
The only place where in6_ifawithifp() is used is ip6_output(),
which uses the returned ifa to bump traffic counters.
Given ifa stability guarantees is provided by epoch, do not refcount ifa.
This eliminates 2 atomic ops from IPv6 fast path.
Reviewed By: rstone
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D28649
it gets unused.
Currently if_purgeifaddrs() uses in6_purgeaddr() to remove IPv6
ifaddrs. in6_purgeaddr() does not trrigger prefix removal if
number of linked ifas goes to 0, as this is a low-level function.
As a result, if_purgeifaddrs() purges all IPv4/IPv6 addresses but
keeps corresponding IPv6 prefixes.
Fix this by creating higher-level wrapper which handles unused
prefix usecase and use it in if_purgeifaddrs().
Differential revision: https://reviews.freebsd.org/D28128
rtinit[1]() is a function used to add or remove interface address prefix routes,
similar to ifa_maintain_loopback_route().
It was intended to be family-agnostic. There is a problem with this approach
in reality.
1) IPv6 code does not use it for the ifa routes. There is a separate layer,
nd6_prelist_(), providing interface for maintaining interface routes. Its part,
responsible for the actual route table interaction, mimics rtenty() code.
2) rtinit tries to combine multiple actions in the same function: constructing
proper route attributes and handling iterations over multiple fibs, for the
non-zero net.add_addr_allfibs use case. It notably increases the code complexity.
3) dstaddr handling. flags parameter re-uses RTF_ flags. As there is no special flag
for p2p connections, host routes and p2p routes are handled in the same way.
Additionally, mapping IFA flags to RTF flags makes the interface pretty messy.
It make rtinit() to clash with ifa_mainain_loopback_route() for IPV4 interface
aliases.
4) rtinit() is the last customer passing non-masked prefixes to rib_action(),
complicating rib_action() implementation.
5) rtinit() coupled ifa announce/withdrawal notifications, producing "false positive"
ifa messages in certain corner cases.
To address all these points, the following has been done:
* rtinit() has been split into multiple functions:
- Route attribute construction were moved to the per-address-family functions,
dealing with (2), (3) and (4).
- funnction providing net.add_addr_allfibs handling and route rtsock notificaions
is the new routing table inteface.
- rtsock ifa notificaion has been moved out as well. resulting set of funcion are only
responsible for the actual route notifications.
Side effects:
* /32 alias does not result in interface routes (/32 route and "host" route)
* RTF_PINNED is now set for IPv6 prefixes corresponding to the interface addresses
Differential revision: https://reviews.freebsd.org/D28186
fib[46]_lookup_nh_ represents pre-epoch generation of fib api, providing less guarantees
over pointer validness and requiring on-stack data copying.
With no callers remaining, remove fib[46]_lookup_nh_ functions.
Submitted by: Neel Chauhan <neel AT neelc DOT org>
Differential Revision: https://reviews.freebsd.org/D25445
Move the include for sysctl.h out of the middle of the file to the
includes at the beginning. This is will make it easier to add new
sysctls.
No functional changes.
MFC after: 3 weeks
Sponsored by: Netflix
When epoch(9) was introduced to network stack, it was basically
dropped in place of existing locking, which was mutexes and
rwlocks. For the sake of performance mutex covered areas were
as small as possible, so became epoch covered areas.
However, epoch doesn't introduce any contention, it just delays
memory reclaim. So, there is no point to minimise epoch covered
areas in sense of performance. Meanwhile entering/exiting epoch
also has non-zero CPU usage, so doing this less often is a win.
Not the least is also code maintainability. In the new paradigm
we can assume that at any stage of processing a packet, we are
inside network epoch. This makes coding both input and output
path way easier.
On output path we already enter epoch quite early - in the
ip_output(), in the ip6_output().
This patch does the same for the input path. All ISR processing,
network related callouts, other ways of packet injection to the
network stack shall be performed in net_epoch. Any leaf function
that walks network configuration now asserts epoch.
Tricky part is configuration code paths - ioctls, sysctls. They
also call into leaf functions, so some need to be changed.
This patch would introduce more epoch recursions (see EPOCH_TRACE)
than we had before. They will be cleaned up separately, as several
of them aren't trivial. Note, that unlike a lock recursion the
epoch recursion is safe and just wastes a bit of resources.
Reviewed by: gallatin, hselasky, cy, adrian, kristof
Differential Revision: https://reviews.freebsd.org/D19111
stf(4) interfaces are not multicast-capable so they can't perform DAD.
They also did not set IFF_DRV_RUNNING when an address was assigned, so
the logic in nd6_timer() would periodically flag such an address as
tentative, resulting in interface flapping.
Fix the problem by setting IFF_DRV_RUNNING when an address is assigned,
and do some related cleanup:
- In in6if_do_dad(), remove a redundant check for !UP || !RUNNING.
There is only one caller in the tree, and it only looks at whether
the return value is non-zero.
- Have in6if_do_dad() return false if the interface is not
multicast-capable.
- Set ND6_IFF_NO_DAD when an address is assigned to an stf(4) interface
and the interface goes UP as a result. Note that this is not
sufficient to fix the problem because the new address is marked as
tentative and DAD is started before in6_ifattach() is called.
However, setting no_dad is formally correct.
- Change nd6_timer() to not flag addresses as tentative if no_dad is
set.
This is based on a patch from Viktor Dukhovni.
Reported by: Viktor Dukhovni <ietf-dane@dukhovni.org>
Reviewed by: ae
MFC after: 3 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D19751
After the afdata read lock was converted to epoch(9), readers could
observe a linked LLE and block on the LLE while a thread was
unlinking the LLE. The writer would then release the lock and schedule
the LLE for deferred free, allowing readers to continue and potentially
schedule the LLE timer. By the point the timer fires, the structure is
freed, typically resulting in a crash in the callout subsystem.
Fix the problem by modifying the lookup path to check for the LLE_LINKED
flag upon acquiring the LLE lock. If it's not set, the lookup fails.
PR: 234296
Reviewed by: bz
Tested by: sbruno, Victor <chernov_victor@list.ru>,
Mike Andrews <mandrews@bit0.com>
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D18906
- Remove macros that covertly create epoch_tracker on thread stack. Such
macros a quite unsafe, e.g. will produce a buggy code if same macro is
used in embedded scopes. Explicitly declare epoch_tracker always.
- Unmask interface list IFNET_RLOCK_NOSLEEP(), interface address list
IF_ADDR_RLOCK() and interface AF specific data IF_AFDATA_RLOCK() read
locking macros to what they actually are - the net_epoch.
Keeping them as is is very misleading. They all are named FOO_RLOCK(),
while they no longer have lock semantics. Now they allow recursion and
what's more important they now no longer guarantee protection against
their companion WLOCK macros.
Note: INP_HASH_RLOCK() has same problems, but not touched by this commit.
This is non functional mechanical change. The only functionally changed
functions are ni6_addrs() and ni6_store_addrs(), where we no longer enter
epoch recursively.
Discussed with: jtl, gallatin
handler receives the type of event IFADDR_EVENT_ADD/IFADDR_EVENT_DEL,
and the pointer to ifaddr. Also ifaddr_event now is implemented using
ifaddr_event_ext handler.
MFC after: 3 weeks
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D17100
This deduplicates the code a bit, and also implicitly adds missing
callout_stop() to in[6]_lltable_delete_entry() functions.
PR: 209682, 225927
Submitted by: hselasky (previous version)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D4605