some more. Similar to what we do for TCP, check for v4-mapped
addresses and then handle either them or the normal v6 address case.
In either case, set inp_vflags before calling into the pcb connect
function so that we have an unambiguous view in case we need to
set the local address or port.
Looked at: tuexen (as part of more)
MFC after: 3 days
callers. This also fixes a problem where the prison call could set
the inp->in6p_laddr (laddr) and a following priv_check_cred() call
would return an error. It will also allow us to merge the IPv4 and
IPv6 implementations.
MFC after: 2 weeks
even after dropping the reference and unlocking. Previously we
dereferenced a NULL pointer (after r121765).
Simply unlocking after the block does not work either because of
lock ordering (see r121765) and in addition we would still hold
a pointer to something that might be gone by the time we access it.
Thus take a copy of the value rather than just caching the pointer.
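The "copy the value, do not cache the pointer" approach can be sketched in userland as follows. This is a minimal illustration only: struct entry_sketch, its mtu field and entry_get_mtu_sketch() are hypothetical names, and a pthread mutex stands in for the kernel lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical entry guarded by its own lock; not a kernel structure. */
struct entry_sketch {
    pthread_mutex_t lock;
    unsigned int    mtu;    /* value the caller needs after unlocking */
};

/*
 * Copy the field while the lock is held.  The caller receives a private
 * snapshot and never touches *e again after the unlock, so it does not
 * matter if the entry goes away later.
 */
static int
entry_get_mtu_sketch(struct entry_sketch *e, unsigned int *mtup)
{
    if (e == NULL)
        return (-1);
    pthread_mutex_lock(&e->lock);
    *mtup = e->mtu;         /* snapshot, not a pointer into *e */
    pthread_mutex_unlock(&e->lock);
    return (0);
}
```

A caller keeps using the copied value after the unlock; later changes to (or the freeing of) the entry cannot affect it.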
PR: kern/151908
Submitted by: chenyl (netstar2008 126.com) (initial version)
MFC after: 2 weeks
* Store the flowid when receiving an SCTP/IPv6 packet.
* Store the flowid when receiving an SCTP packet with wrong CRC.
* Initialize flowid correctly.
* Put test code under INVARIANTS.
MFC after: 3 months.
Call the handler function with the lock held, return unlocked as we
might free the entry. Rework functions later in the call graph to be
either called with the lock held or, only if needed, unlocked.
Place asserts to document and tighten assumptions on various lle locking,
which were not always true before.
We call nd6_ns_output() unlocked and the assignment of ip6->ip6_src was
decentralized to minimize possible complexity introduced with the formerly
missing locking there. This also resulted in a push down of local
variable scopes into smaller blocks.
Reported by: many
PR: kern/148857
Submitted by: Dmitrij Tejblum (tejblum yandex-team.ru) (original version)
MFC After: 4 days
DPCPU_DEFINE and VNET_DEFINE macros, as these cause problems for various
people working on the affected files. A better long-term solution is
still being considered. This reversal may give some modules empty
set_pcpu or set_vnet sections, but these are harmless.
Changes reverted:
------------------------------------------------------------------------
r215318 | dim | 2010-11-14 21:40:55 +0100 (Sun, 14 Nov 2010) | 4 lines
Instead of unconditionally emitting .globl's for the __start_set_xxx and
__stop_set_xxx symbols, only emit them when the set_vnet or set_pcpu
sections are actually defined.
------------------------------------------------------------------------
r215317 | dim | 2010-11-14 21:38:11 +0100 (Sun, 14 Nov 2010) | 3 lines
Apply the STATIC_VNET_DEFINE and STATIC_DPCPU_DEFINE macros throughout
the tree.
------------------------------------------------------------------------
r215316 | dim | 2010-11-14 21:23:02 +0100 (Sun, 14 Nov 2010) | 2 lines
Add macros to define static instances of VNET_DEFINE and DPCPU_DEFINE.
Consistently use the LLE_ prefix for lla_lookup() and the ND6_ prefix
for nd6_lookup() even though both are defined the same. Use the right
flag variable when checking each.
No real functional change.
MFC after: 4 days
legacy and IPv6 route destination address.
Previously in case of IPv6, there was a memory overwrite due to not enough
space for the IPv6 address.
PR: kern/122565
MFC After: 2 weeks
un-expiring.
The previous version of the code had no locking when testing rt_refcnt.
The lack of locking could result in a condition where a routing entry
has a reference count but at the same time has the RTPRF_OURS bit set
and an expiration timer. This would eventually lead to a panic:
panic: rtqkill route really not free
This happens, for instance, when the system accepts ICMP redirects
from a local gateway at a moderate frequency.
Commit this workaround for now until we have a better solution.
PR: kern/149804
Reviewed by: bz
Tested by: Zhao Xin, Pete French
MFC after: 2 weeks
In protosw we define pr_protocol as short, while on the wire
it is a uint8_t. That way we can have "internal" protocols
like DIVERT, SEND or gaps for modules (PROTO_SPACER).
Switch ipproto_{un,}register to accept a short protocol number(*)
and do an upfront check for valid boundaries. With this we
also consistently report EPROTONOSUPPORT for out of bounds
protocols, as we did for proto == 0. This allows a caller
to not error for this case, which is especially important
if we want to automatically call these from domain handling.
(*) the functions have been without any in-tree consumer
since the initial introduction, so this is considered safe.
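The upfront boundary check can be modelled in userland as below. This is a sketch under assumptions: proto_register_sketch(), the toy slot table and the SK_PROTO_MAX limit of 256 are illustrative and are not the kernel's ipproto_register() itself.

```c
#include <assert.h>
#include <errno.h>

#define SK_PROTO_MAX    256     /* assumed limit: one slot per 8-bit wire value */

static int sk_slot_taken[SK_PROTO_MAX];    /* toy registration table */

/* Hypothetical model of a register call that takes a short. */
static int
proto_register_sketch(short proto)
{
    /* Reject 0 and out-of-bounds numbers before touching the table. */
    if (proto <= 0 || proto >= SK_PROTO_MAX)
        return (EPROTONOSUPPORT);
    if (sk_slot_taken[proto])
        return (EEXIST);
    sk_slot_taken[proto] = 1;
    return (0);
}
```

Because every out-of-range value yields the same EPROTONOSUPPORT as proto == 0, a caller that registers protocols in bulk can treat that single error code as "skip this one".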
Implement ip6proto_{un,}register() similarly to their legacy IP
counterparts to allow modules to hook up dynamically.
Reviewed by: philip, will
MFC after: 1 week
Fix the switching on/off of PF and NR-SACKs using sysctl.
Add minor improvement in handling malloc failures.
Improve the address checks when sending.
MFC after: 4 weeks
Add kernel side support for Secure Neighbor Discovery (SeND), RFC 3971.
The implementation consists of a kernel module that gets packets from
the nd6 code, sends them to user space on a dedicated socket and reinjects
them back for further processing.
Hooks are used from nd6 code paths to divert relevant packets to the
send implementation for processing in user space. The hooks are only
triggered if the send module is loaded. In case no user space
application is connected to the send socket, processing continues
normally as if the module would not be loaded. Unloading the module
is not possible at this time due to missing nd6 locking.
The native SeND socket is similar to a raw IPv6 socket but with its own,
internal pseudo-protocol.
Approved by: bz (mentor)
and go to the next iteration early if multicast filtering would decide that
this socket shall not receive the data.
Unlock the pcb in that case; otherwise we leak the read lock, and the
next attempt to get a write lock would hang forever.
PR: kern/149608
Submitted by: Chris Luke (chrisy flirble.org)
MFC after: 3 days
bridge(4), lagg(4) etc. and make use of function pointers and
pf_proto_register() to hook carp into the network stack.
Currently, because of the uncertainty about whether the unload path is free
of race condition panics, unloads are disallowed by default. Compiling with
CARPMOD_CAN_UNLOAD in CFLAGS removes this anti foot shooting measure.
This commit requires IP6PROTOSPACER, introduced in r211115.
Reviewed by: bz, simon
Approved by: ken (mentor)
MFC after: 2 weeks
Add proto spacers to inet6sw like we have for legacy IP. This allows us
to dynamically pf_proto_register() for INET6 from modules, needed by
upcoming CARP changes and SeND.
MC and SCTP could make use of it as well in theory in the future after
upcoming VIMAGE vnet teardown work.
Discussed with: will, anchie
MFC after: 10 days
nd6_llinfo_timer() functions with a KASSERT().
Note: there is no need to return after panic.
In the legacy IP case, only assign the arg after the check,
in the IPv6 case, remove the extra checks for the table and
interface as they have to be there unless we freed and forgot
to cancel the timer. It doesn't matter anyway as we would
panic on the NULL pointer deref immediately and the bug is
elsewhere.
This unifies the code of both address families to some extent.
Reviewed by: rwatson
MFC after: 6 days
working anymore. In addition more checks and operations were missing.
In case lla_lookup results in a match, get the ifaddr to update the
statistics counters, and check that the address is not tentative,
duplicate, or otherwise invalid before accepting the packet. If ok,
record the address information in the mbuf. [ as is done in case
lla_lookup does not return a result and we go through the FIB ].
Reported by: remko
Tested by: remko
MFC after: 2 weeks
We do not respect rules 3 and 4 in the required list:
1. omit leading zeros
2. "::" used to their maximum extent whenever possible
3. "::" used where it shortens the address the most
4. "::" used in the former part in case of a tie
5. do not shorten one 16 bit 0 field
6. use lower case
http://tools.ietf.org/html/draft-ietf-6man-text-addr-representation-04.html
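The six rules listed above can be sketched as a small formatter. This is an illustration, not the kernel's ip6_sprintf(): ip6_format_sketch() is a hypothetical helper that takes the address as eight host-order 16-bit groups and ignores special forms such as embedded IPv4 addresses.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper; buf must hold at least 40 bytes. */
static void
ip6_format_sketch(const uint16_t g[8], char *buf, size_t len)
{
    int best = -1, bestlen = 0;
    size_t off = 0;
    int i, j;

    assert(len >= 40);  /* 8 * 4 hex digits + 7 colons + NUL */

    /* Find the longest zero run; ">" keeps the first run on a tie. */
    for (i = 0; i < 8; i = (j > i) ? j : i + 1) {
        for (j = i; j < 8 && g[j] == 0; j++)
            ;
        if (j - i > bestlen) {
            best = i;
            bestlen = j - i;
        }
    }
    if (bestlen < 2) {  /* a single zero field is not shortened */
        best = -1;
        bestlen = 0;
    }

    buf[0] = '\0';
    for (i = 0; i < 8; ) {
        if (i == best) {
            off += (size_t)snprintf(buf + off, len - off, "::");
            i += bestlen;
            continue;
        }
        /* No separator at the start or directly after "::". */
        if (i != 0 && i != best + bestlen)
            off += (size_t)snprintf(buf + off, len - off, ":");
        /* %x prints lower case and omits leading zeros. */
        off += (size_t)snprintf(buf + off, len - off, "%x",
            (unsigned int)g[i]);
        i++;
    }
}
```

Rules 3 and 4 are the part handled by the first loop: the longest run wins, and the strict comparison keeps the earlier run when two runs have the same length.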
Submitted by: Kalluru Abhiram @ Juniper Networks
Obtained from: Juniper Networks
Reviewed by: hrs, dougb
"Whitespace" churn after the VIMAGE/VNET whirls.
Remove the need for some "init" functions within the network
stack, like pim6_init(), icmp_init() or significantly shorten
others like ip6_init() and nd6_init(), using static initialization
again where possible and formerly missed.
Move (most) variables back to the place they used to be before the
container structs and VIMAGE_GLOBALS (before r185088) and try to
reduce the diff to stable/7 and earlier as much as possible,
to help out-of-tree consumers update from 6.x or 7.x to 8 or 9.
This also removes some header file pollution for putatively
static global variables.
Revert VIMAGE specific changes in ipfilter::ip_auth.c, that are
no longer needed.
Reviewed by: jhb
Discussed with: rwatson
Sponsored by: The FreeBSD Foundation
Sponsored by: CK Software GmbH
MFC after: 6 days
that we allow all possible jail IPs as source address rather than
forcing the "primary". While IPv6 naturally has source address
selection, for legacy IP we do not go through the pain in case
IP_HDRINCL was not set. People should bind(2) for that.
This will, for example, allow ping(|6) -S to work correctly for
non-primary addresses.
Reported by: (ten 211.ru)
Tested by: (ten 211.ru)
MFC after: 4 days
addresses while walking the IPv6 address list if in the jail case
something is connecting to ::1.
Reported by: Pieter de Boer (pieter thedarkside.nl)
Tested by: Pieter de Boer (pieter thedarkside.nl)
MFC after: 4 days
prevented the link-layer entry from being freed.
In both in.c and in6.c (though that code path seems to be basically dead)
plug a reference leak in case of a pending callout being drained.
In if_ether.c consistently add a reference before resetting the callout
and in case we canceled a pending one remove the reference for that.
In the final case in arptimer, before freeing the expired entry, remove
the reference again and explicitly call callout_stop() to clear the active
flag.
In nd6.c:nd6_free() we are only ever called from the callout function and
thus need to remove the reference there as well before calling into
llentry_free().
In if_llatbl.c when freeing entire tables make sure that, in case we
cancel a pending callout, we remove the reference as well.
Reviewed by: qingli (earlier version)
MFC after: 10 days
Problem observed, patch tested by: simon on ipv6gw.f.o,
Christian Kratzer (ck cksoft.de),
Evgenii Davidov (dado korolev-net.ru)
PR: kern/144564
Configurations still affected: with options FLOWTABLE
being embedded is in fact link-local, before attempting to embed it.
Note that this operation is a side-effect of trying to avoid recursion on
the IN6 scope lock.
PR: 144560
Submitted by: Petr Lampa
MFC after: 3 days
* Fix handling of mapping arrays when draining mbufs or processing
FORWARD-TSN chunks.
* Cleanup code (no duplicate code anymore for SACKs and NR-SACKs).
Part of this code was developed together with rrs.
MFC after: 2 weeks.
have the delayed function take an argument as to the offset
to the SCTP header. This allows it to work for V4 and V6.
This of course means changing all callers of the function
to either pass the header len, if they have it, or create
it (ip_hl << 2 or sizeof(ip6_hdr)).
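How a caller might derive that offset can be sketched from the raw packet bytes. This is an assumption-laden illustration: transport_offset_sketch() is a hypothetical helper, it reads the version and header-length nibbles directly instead of using struct ip or struct ip6_hdr, and it ignores IPv6 extension headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_IP6_HDRLEN   40  /* size of the fixed IPv6 header */

/* Return the offset of the transport header, or -1 if it cannot be found. */
static int
transport_offset_sketch(const uint8_t *pkt, size_t len)
{
    int hl;

    if (len < 1)
        return (-1);
    switch (pkt[0] >> 4) {
    case 4:
        /* Low nibble is ip_hl in 32-bit words, hence "<< 2". */
        hl = (pkt[0] & 0x0f) << 2;
        if (hl < 20 || (size_t)hl > len)
            return (-1);
        return (hl);
    case 6:
        if (len < SK_IP6_HDRLEN)
            return (-1);
        return (SK_IP6_HDRLEN);
    default:
        return (-1);
    }
}
```

Passing the offset in, rather than recomputing it inside the checksum routine, is what lets one routine serve both address families.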
PR: 144529
MFC after: 2 weeks
no delayed checksum was added to the ip6 output code. As a result,
SCTP/IPv6 packets sent through cards that do not support SCTP checksum
offload did NOT have the SCTP checksum computed. Thus you could not
communicate with a peer. This adds the missing bits to make the
checksum happen for these cards.
PR: 144529
MFC after: 2 weeks
whether to use source address selection (default) or the primary
jail address for unbound outgoing connections.
This is intended to be used by people upgrading from single-IP
jails to multi-IP jails who do not want to change firewall rules,
application ACLs, ... but want to force their connections (unless
otherwise changed) to the primary jail IP they have been using for
years, as well as by people preferring to implement similar policies.
Note that for IPv6, if configured incorrectly, this might lead to
scope violations, which single-IPv6 jails could cause as well, by
the design of jails. [1]
Reviewed by: jamie, hrs (ipv6 part)
Pointed out by: hrs [1]
MFC After: 2 weeks
Asked for by: Jase Thew (bazerka beardz.net)
for the interface address. This marker is necessary to properly support
PPP types of links where multiple links can have the same local end
IP address. The IFA_RTSELF flag bit maps to the RTF_HOST value, which
was combined into the route flag bits during prefix installation in
IPv6. This inclusion caused the prefix route to be unusable. This
patch fixes this bug by excluding the IFA_RTSELF flag during route
installation.
MFC after: 5 days
same interface. The first address will install the prefix route into
the kernel routing table and that prefix will be marked as on-link.
Without RADIX_MPATH enabled, the other address aliases of the same
prefix will update the prefix reference count but no other routes
will be installed. Consequently the prefixes associated with these
addresses would not be marked as on-link. As such, incoming packets
destined to these address aliases will fail the ND6 on-link check
on input. This patch fixes the above problem by searching the kernel
routing table for an on-link prefix on the given interface.
MFC after: 5 days
IFF_POINTOPOINT link types. The reason was that the routing
entry returned from the kernel covering the remote end is of an
interface type that does not support ARP. This patch fixes this
problem by providing a hint to the kernel routing code, which
indicates that the prefix route instead of the PPP host route should
be returned to the caller. Since a host route to the local end
point is also added into the routing table, and there could be
multiple such instantiations because multiple PPP links can be
created with the same local end IP address, this patch also fixes
the loopback route installation failure problem observed prior to
this patch. The reference count of the loopback route to the local end
is incremented or decremented accordingly. The first instantiation would
create the entry and the last removal would delete the route entry.
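The create-on-first, delete-on-last behaviour can be sketched with a plain counter. The names here (struct lo_route_sketch, lo_route_ref(), lo_route_unref()) are hypothetical and the "installed" flag merely stands in for the presence of the route in the routing table.

```c
#include <assert.h>

/* Hypothetical model of a shared loopback route for one local address. */
struct lo_route_sketch {
    int refcnt;     /* number of links using this local address */
    int installed;  /* 1 while the route is in the (imaginary) table */
};

static void
lo_route_ref(struct lo_route_sketch *r)
{
    if (r->refcnt++ == 0)
        r->installed = 1;   /* first instantiation creates the entry */
}

static void
lo_route_unref(struct lo_route_sketch *r)
{
    assert(r->refcnt > 0);
    if (--r->refcnt == 0)
        r->installed = 0;   /* last removal deletes the entry */
}
```

With two PPP links sharing a local address, tearing down one link leaves the route in place; only the second teardown removes it.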
MFC after: 5 days
with SSM MLDv2 by default.
This is current practice and complies with RFC 4604, as well as being
required by production IPv6 networks in Japan.
The behaviour may be disabled by setting the net.inet6.mld.use_allow
sysctl/tunable to 0.
Requested by: Hideki Yamamoto
MFC after: 1 week
if (jailed(cred))
left. If you are running with a vnet (virtual network stack) those will
return true and defer you to classic IP-jails handling and thus things
will be "denied" or returned with an error.
Work around this problem by introducing another "jailed()" function,
jailed_without_vnet(), that also takes vnets into account, and permits
the calls, should the jail from the given cred have its own virtual
network stack.
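The intended decision logic can be sketched in userland as below. This is a simplified model, not the kernel code: struct cred_sketch and its two flags are hypothetical stand-ins for the credential and its prison, and only the function name jailed_without_vnet() comes from the text above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical credential: just the two facts the check needs. */
struct cred_sketch {
    int in_jail;    /* what the classic jailed() would report */
    int owns_vnet;  /* the jail has its own virtual network stack */
};

static int
jailed_sketch(const struct cred_sketch *c)
{
    return (c->in_jail);
}

/* True only for jails that share the host's network stack. */
static int
jailed_without_vnet_sketch(const struct cred_sketch *c)
{
    assert(c != NULL);
    if (!jailed_sketch(c))
        return (0);
    if (c->owns_vnet)
        return (0);     /* own stack: classic IP-jail checks do not apply */
    return (1);
}
```

Network stack callers would use the second predicate, while callers outside the network stack keep the unchanged classic check.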
We cannot change the classic jailed() call to do that, as it is used
outside the network stack as well.
Discussed with: julian, zec, jamie, rwatson (back in Sept)
MFC after: 5 days
Don't allow joins w/o source on an existing group.
This is almost always pilot error.
We don't need to check for group filter UNDEFINED state at t1,
because we only ever allocate filters with their groups, so we
unconditionally reject such calls with EINVAL.
Trying to change the active filter mode w/o going through IPV6_MSFILTER
is also disallowed.
MFC after: 1 day
Tighten input checking in in6p_join_group():
* Don't try to use the source address, when its family is unspecified.
* If we get a join without a source, on an existing inclusive
mode group, this is an error, as it would change the filter mode.
Fix a problem with the handling of in6_mfilter for new memberships:
* Do not rely on im6f being NULL; it is explicitly initialized to a
non-NULL pointer when constructing a membership.
* Explicitly initialize *im6f to EX mode when the source address
is unspecified.
This fixes a problem with in_mfilter slot recycling in the join path.
MFC after: 1 day
Fix an obvious logic error in the IPv4 multicast leave processing,
where the filter mode vector was not updated correctly after the leave.
MFC after: 1 day
we did not add. Call LLE_REMREF() only when callout_stop()
actually canceled a pending callout.
- callout_reset() may cancel a pending callout. When
callout_reset() canceled a pending callout, call LLE_REMREF()
to drop a reference for the canceled callout.
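The reference accounting described here can be modelled with a toy timer. Everything in this sketch is hypothetical (struct lle_sketch, timer_stop_sketch(), timer_reset_sketch()); the only behaviour taken from the text is that stop and reset report whether they canceled a pending callout, and that a reference is dropped only in that case.

```c
#include <assert.h>

/* Hypothetical link-layer entry with one timer. */
struct lle_sketch {
    int refcnt;
    int pending;    /* 1 while a callout is scheduled */
};

/* Both helpers return nonzero if they canceled a pending callout. */
static int
timer_stop_sketch(struct lle_sketch *e)
{
    int was = e->pending;

    e->pending = 0;
    return (was);
}

static int
timer_reset_sketch(struct lle_sketch *e)
{
    int was = e->pending;

    e->pending = 1;
    return (was);
}

static void
lle_settimer_sketch(struct lle_sketch *e)
{
    e->refcnt++;                /* reference held by the new callout */
    if (timer_reset_sketch(e))
        e->refcnt--;            /* the canceled callout gives its one back */
}

static void
lle_canceltimer_sketch(struct lle_sketch *e)
{
    if (timer_stop_sketch(e))
        e->refcnt--;            /* only drop a reference we actually added */
    assert(e->refcnt >= 0);
}
```

Re-arming an already pending timer leaves the count unchanged, and stopping an idle timer does not drop a reference that was never taken.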
MFC after: 1 week
Adding a tentative address is useless.
- Comment out a confused warning message when
in6_ifattach_linklocal() fails. This can occur when the
interface does not support ioctl(SIOCAIFADDR) (interfaces
associated with 802.11 wireless network device drivers, for
example).
packet filters. Also allows ipfw to be enabled on one jail and disabled
on another. In 8.0 it's a global setting.
Sitting around in tree waiting to commit for: 2 months
MFC after: 2 months
Note that when the interface has ND6_IFF_IFDISABLED, a newly-added
address is always marked as IN6_IFF_TENTATIVE so that the interface
can perform DAD after the ND6_IFF_IFDISABLED is cleared.
and address aliases. After an interface is brought down and brought
back up again, those self pointing routes disappeared. This patch
ensures after an interface is brought back up, the loopback routes
are reinstalled properly.
Reviewed by: bz
MFC after: immediately
automatic link-local address configuration:
- Convert a sysctl net.inet6.ip6.accept_rtadv to one for the
default value of a per-IF flag ND6_IFF_ACCEPT_RTADV, not a
global knob. The default value of the sysctl is 0.
- Add a new per-IF flag ND6_IFF_AUTO_LINKLOCAL and convert a
sysctl net.inet6.ip6.auto_linklocal to one for its default
value. The default value of the sysctl is 1.
- Make ND6_IFF_IFDISABLED more robust. It can be used to disable
IPv6 functionality of an interface now.
- Receiving RA is allowed if ip6_forwarding==0 *and*
ND6_IFF_ACCEPT_RTADV is set on that interface. The former
condition will be revisited later to support a "host + router" box
like IPv6 CPE router. The current behavior is compatible with
the older releases of FreeBSD.
- The ifconfig(8) now supports these ND6 flags as well as "nud",
"prefer_source", and "disabled" in ndp(8). The ndp(8) now
supports "auto_linklocal".
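The Router Advertisement acceptance rule from the list above reduces to a two-part predicate, sketched here. The flag name SK_IFF_ACCEPT_RTADV and its value are illustrative assumptions, and ra_accepted_sketch() is a hypothetical helper rather than a function from the tree.

```c
#include <assert.h>
#include <stdint.h>

#define SK_IFF_ACCEPT_RTADV 0x2u    /* illustrative per-interface flag bit */

/*
 * An RA is processed only on a non-forwarding node and only on an
 * interface that has the accept flag set.
 */
static int
ra_accepted_sketch(int ip6_forwarding, uint32_t ifflags)
{
    return (ip6_forwarding == 0 &&
        (ifflags & SK_IFF_ACCEPT_RTADV) != 0);
}
```

Since the sysctl now only seeds the per-interface default, two interfaces on the same host can give different answers for the same forwarding setting.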
Discussed with: bz and jinmei
Reviewed by: bz
MFC after: 3 days
scenario where an anycast address is assigned on one interface,
and a global address with the same scope is assigned on another
interface. In other words, the interface that owns the anycast
address has only the link-local address as its one other address.
Without this patch, a station that runs "ping6" against the anycast
address will observe that the source address of the returned ICMP6
echo reply is the link-local address, not the global address
that exists on the other interface in the same node.
Reviewed by: bz
MFC after: immediately
- Interface link-local address is not reachable within the
node that owns the interface, this is due to the mismatch
in address scope as the result of the installed interface
address loopback route. Therefore for each interface
address loopback route, the rt_gateway field (of AF_LINK
type) will be used to track which interface a given
address belongs to. This will aid the address source to
use the proper interface for address scope/zone validation.
- The loopback address is not reachable. The root cause is
the same as the above.
- Empty nd6 entries are created for the IPv6 loopback addresses
only for validation reason. Doing so will eliminate as much
of the special case (loopback addresses) handling code
as possible, however, these empty nd6 entries should not
be returned to the userland applications such as the
"ndp" command.
Since both of the above issues contain common files, these
files are committed together.
Reviewed by: bz
MFC after: immediately
configured prefixes. Since these statically configured prefixes
do not have any associated advertising routers, these prefixes
are treated as unreachable and those prefix routes are deleted
from the routing table. Therefore bypass prefixes that are not
learned from router advertisements during prefix on-link check.
Reviewed by: hrs
an IPv6 address assigned to it, and if an incoming packet received on
one interface has a packet destination address that belongs to another
interface, the routing table is consulted to determine how to reach this
packet destination. Since the packet destination is an interface address,
the route table will return a host route with the loopback interface as
rt_ifp. The input code must recognize this fact: instead of using the
loopback interface, it performs a search to find the right
interface that owns the given IPv6 address.
Reviewed by: bz, gnn, kmacy
MFC after: immediately
list/index locks, to protect link layer address tables. This avoids
lock order issues during interface teardown, but maintains the bug that
sysctl copy routines may be called while a non-sleepable lock is held.
Reviewed by: bz, kmacy
MFC after: 3 days
several critical bugs, including race conditions and lock order issues:
Replace the single rwlock, ifnet_lock, with two locks, an rwlock and an
sxlock. Either can be held to stabilize the lists and indexes, but both
are required to write. This allows the list to be held stable in both
network interrupt contexts and sleepable user threads across sleeping
memory allocations or device driver interactions. As before, writes to
the interface list must occur from sleepable contexts.
Reviewed by: bz, julian
MFC after: 3 days
address is configured with a /128 prefix. This is no longer necessary due
to r192011. In fact that code conflicts with r192011. This patch removes
the host route installation when detecting the /128 prefix, and instead
lets the code added by r192011 install the loopback route for that IPv6
interface address.
Reviewed by: bz
Approved by: re
all pertinent statistics for the subsystem. These structures are
sometimes "borrowed" by kernel modules that require a place to store
statistics for similar events.
Add KPI accessor functions for statistics structures referenced by kernel
modules so that they no longer encode certain specifics of how the data
structures are named and stored. This change is intended to make it
easier to move to per-CPU network stats following 8.0-RELEASE.
The following modules are affected by this change:
if_bridge
if_cxgb
if_gif
ip_mroute
ipdivert
pf
In practice, most of these statistics consumers should, in fact, maintain
their own statistics data structures rather than borrowing structures
from the base network stack. However, that change is too aggressive for
this point in the release cycle.
Reviewed by: bz
Approved by: re (kib)
vnet.h, we now use jails (rather than vimages) as the abstraction
for virtualization management, and what remained was specific to
virtual network stacks. Minor cleanups are done in the process,
and comments updated to reflect these changes.
Reviewed by: bz
Approved by: re (vimage blanket)
- Allow loopback route to be installed for address assigned to
interface of IFF_POINTOPOINT type.
- Install loopback route for an IPv4 interface address when the
"useloopback" sysctl variable is enabled. Similarly, install
loopback route for an IPv6 interface address when the sysctl variable
"nd6_useloopback" is enabled. Deleting loopback routes for interface
addresses is unconditional in case these sysctl variables were
disabled after an interface address has been assigned.
Reviewed by: bz
Approved by: re
network stacks, VNET_SYSINIT:
- Add VNET_SYSINIT and VNET_SYSUNINIT macros to declare events that will
occur each time a network stack is instantiated and destroyed. In the
!VIMAGE case, these are simply mapped into regular SYSINIT/SYSUNINIT.
For the VIMAGE case, we instead use SYSINIT's to track their order and
properties on registration, using them for each vnet when created/
destroyed, or immediately on module load for already-started vnets.
- Remove vnet_modinfo mechanism that existed to serve this purpose
previously, as well as its dependency scheme: we now just use the
SYSINIT ordering scheme.
- Implement VNET_DOMAIN_SET() to allow protocol domains to declare that
they want init functions to be called for each virtual network stack
rather than just once at boot, compiling down to DOMAIN_SET() in the
non-VIMAGE case.
- Walk all virtualized kernel subsystems and make use of these instead
of modinfo or DOMAIN_SET() for init/uninit events. In some cases,
convert modular components from using modevent to using sysinit (where
appropriate). In some cases, do minor rejuggling of SYSINIT ordering
to make room for or better manage events.
Portions submitted by: jhb (VNET_SYSINIT), bz (cleanup)
Discussed with: jhb, bz, julian, zec
Reviewed by: bz
Approved by: re (VIMAGE blanket)
non-virtualized sysctls so we cannot use one common function.
Add a macro to vnet.h to convert the arg1 in the virtualized case,
so that the maths is not exposed all over the code.
Add a wrapper for the single virtualized call, properly handling
arg1 and calling the default implementation from there.
Convert the two other places to use the new macro.
Reviewed by: rwatson
Approved by: re (kib)
nor destructors, as there's no actual work to do.
In most cases, the constructors weren't needed because of the existing
protocol initialization functions run by net_init_domain() as part of
VNET_MOD_NET, or they were eliminated when support for static
initialization of virtualized globals was added.
Garbage collect dependency references to modules without constructors or
destructors, notably VNET_MOD_INET and VNET_MOD_INET6.
Reviewed by: bz
Approved by: re (vimage blanket)
unused custom mutex/condvar-based sleep locks with two locks: an
rwlock (for non-sleeping use) and sxlock (for sleeping use). Either
acquired for read is sufficient to stabilize the vnet list, but both
must be acquired for write to modify the list.
Replace previous no-op read locking macros, used in various places
in the stack, with actual locking to prevent race conditions. Callers
must declare when they may perform unbounded sleeps or not when
selecting how to lock.
Refactor vnet sysinits so that the vnet list and locks are initialized
before kernel modules are linked, as the kernel linker will use them
for modules loaded by the boot loader.
Update various consumers of these KPIs based on whether they may sleep
or not.
Reviewed by: bz
Approved by: re (kib)
a valid zone ID or interface identifier in a v6 multicast leave, would
trigger a fairly paranoid KASSERT().
Observed with Boost++ regression tests on ref8.freebsd.org.
Approved by: re (kib)