1
0
mirror of https://git.FreeBSD.org/src.git synced 2025-01-17 15:27:36 +00:00
Commit Graph

1524 Commits

Author SHA1 Message Date
Eric van Gyzen
17a036563d Fix the handling of IPv6 On-Link Redirects.
On receipt of a redirect message, install an interface route for the
redirected destination.  On removal of the corresponding Neighbor Cache
entry, remove the interface route.

This requires changes in rtredirect_fib() to cope with an AF_LINK
address for the gateway and with the absence of RTF_GATEWAY.

This fixes the "Redirected On-Link" test cases in the Tahi IPv6 Ready Logo
Phase 2 test suite.

Unrelated to the above, fix a recursion on the radix node head lock
triggered by the Tahi Redirected to Alternate Router test cases.

When I first wrote this patch in October 2012, all Section 2
(Neighbor Discovery) test cases passed on 10-CURRENT, 9-STABLE,
and 8-STABLE.  cem@ recently rebased the 10.x patch onto head and reported
that it passes Tahi.  (Thanks!)

These other test cases also passed in 2012:

* the RTF_MODIFIED case, with IPv4 and IPv6 (using a
  RTF_HOST|RTF_GATEWAY route for the destination)

* the redirected-to-self case, with IPv4 and IPv6

* a valid IPv4 redirect

All testing in 2012 was done with WITNESS and INVARIANTS.

Tested by:    EMC / Isilon Storage Division via Conrad Meyer (cem) in 2015,
              Mark Kelley <mark_kelley@dell.com> in 2012,
              TC Telkamp <terence_telkamp@dell.com> in 2012
PR:           152791
Reviewed by:  melifaro (current rev), bz (earlier rev)
Approved by:  kib (mentor)
MFC after:    1 month
Relnotes:     yes
Sponsored by: Dell Inc.
Differential Revision: https://reviews.freebsd.org/D3602
2015-09-14 19:17:25 +00:00
Alexander V. Chernikov
3e7a2321e3 * Do more fine-grained locking: call eventhandlers/free_entry
without holding afdata wlock
* convert per-af delete_address callback to global lltable_delete_entry() and
  more low-level "delete this lle" per-af callback
* fix some bugs/inconsistencies in IPv4/IPv6 ifscrub procedures

Sponsored by:		Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D3573
2015-09-14 16:48:19 +00:00
Hiroki Sato
120ff2d73d Remove SIOCGDRLST_IN6 and SIOCGPRLST_IN6 forgotten in the previous commit.
MFC after:	3 days
2015-09-10 08:37:03 +00:00
Hiroki Sato
e3884653f6 - Remove SIOCGDRLST_IN6 and SIOCGPRLST_IN6. These are quite old APIs and
there is no consumer now.

MFC after:	3 days
2015-09-10 06:31:24 +00:00
Hiroki Sato
d0bec2c522 - Remove SIOCGDRLST_IN6 and SIOCGPRLST_IN6. These are quite old APIs and
there is no consumer now.

- Simplify first and duplicate LLA check.

MFC after:	3 days
2015-09-10 06:29:18 +00:00
Hiroki Sato
1fce58fc62 Do not add IN6_IFF_TENTATIVE when ND6_IFF_NO_DAD.
MFC after:	3 days
2015-09-10 06:10:30 +00:00
Hiroki Sato
3ba7e4ce9c Remove IN6_IFF_NOPFX. This flag was no longer used.
MFC after:	3 days
2015-09-10 06:08:42 +00:00
Adrian Chadd
68bb8d6249 Add support for receiving flowtype, flowid and RSS bucket information as part of recvmsg().
Submitted by:	Tiwei Bie <btw@mail.ustc.edu.cn>
Differential Revision:	https://reviews.freebsd.org/D3562
2015-09-06 20:57:57 +00:00
Alexander V. Chernikov
26deb8826c Do not pass lle to nd6_ns_output(). Use newly-added
nd6_llinfo_get_holdsrc() to extract desired IPv6 source
  from holdchain and pass it to the nd6_ns_output().
2015-09-05 14:14:03 +00:00
Alexander V. Chernikov
deeedaa549 Do not skip entries without LLE_VALID flag.
This one fixes showing incomplete entries in ndp -an.

MFC after:	2 weeks
2015-09-05 06:24:00 +00:00
Alexander V. Chernikov
91bfd68e38 Make in6ifa_ifpwithaddr() take const param.
Remove unneded DECONST from in6_lltable_rtcheck().
2015-09-05 05:54:09 +00:00
Alexander V. Chernikov
3b0fd911fa Simplify lla_rt_output()/nd6_add_ifa_lle() by setting lle state in
alloc handler, based on flags.
2015-08-31 05:03:36 +00:00
Adrian Chadd
0be189151f Implement RSS hashing/re-hashing for IPv6 ingress packets.
This mirrors the basic IPv4 implementation - IPv6 packets under RSS
now are checked for a correct RSS hash and if one isn't provided,
it's done in software.

This only handles the initial receive - it doesn't yet handle
reinjecting / rehashing packets after being decapsulated from
various tunneling setups.  That'll come in some follow-up work.

For non-RSS users, this is almost a giant no-op.

It does change a couple of ipv6 methods to use const mbuf * instead of
mbuf * but it doesn't have any functional changes.

So, the following now occurs:

* If the NIC doesn't do any RSS hashing, it's all done in software.
  Single-queue, non-RSS NICs will now have the RX path distributed
  into multiple receive netisr queues.

* If the NIC provides the wrong hash (eg only IPv6 hash when we needed
  an IPv6 TCP hash, or IPv6 UDP hash when we expected IPv6 hash)
  then the hash is recalculated.

* .. if the hash is recalculated, it'll end up being injected into
  the correct netisr queue for v6 processing.

Submitted by:	Tiwei Bie <btw@mail.ustc.edu.cn>
Differential Revision:	https://reviews.freebsd.org/D3504
2015-08-29 07:14:29 +00:00
Bjoern A. Zeeb
196074f3b2 remove a left-over after r220463 empty #ifdef INET check.
MFC after:	1 week
2015-08-28 09:38:18 +00:00
Adrian Chadd
e5562eb934 Replace the printf()s with optional rate limited debugging for RSS.
Submitted by:	Tiwei Bie <btw@mail.ustc.edu.cn>
Differential Revision:	https://reviews.freebsd.org/D3471
2015-08-28 05:58:16 +00:00
Bjoern A. Zeeb
a86e5c96af get_inpcbinfo() and get_pcblist() are UDP local functions and
do not do what one would expect by name. Prefix them with "udp_"
to at least obviously limit the scope.

This is a non-functional change.

Reviewed by:		gnn, rwatson
MFC after:		2 weeks
Differential Revision:	https://reviews.freebsd.org/D3505
2015-08-27 15:27:41 +00:00
Adrian Chadd
2bf1d4880d Call the new RSS hash calculation function to correctly calculate a hash
based on the configured requirements for the protocol.

Tested:

* UDP IPv6 TX/RX testing, w/ RSS enabled, 82599 ixgbe(4) hardware
2015-08-25 06:12:59 +00:00
Adrian Chadd
20dbdf88a5 Implement the IPv6 RSS software hash function.
This isn't yet linked into the receive/transmit paths anywhere just yet.

This is part of a GSoC 2015 project.

Submitted by:	Tiwei Bie <btw@mail.ustc.edu.cn>
Reviewed by:	hiren, gnn
Differential Revision:	https://reviews.freebsd.org/D3423
2015-08-24 05:36:08 +00:00
Hiroki Sato
fb583bd228 - Deprecate IN6_IFF_NODAD. It was used to prevent DAD on a loopback
interface but in6if_do_dad() already had a check for IFF_LOOPBACK.

- Remove in6if_do_dad() check in in6_broadcast_ifa().  An address
  which needs DAD always has IN6_IFF_TENTATIVE there.

- in6if_do_dad() now returns EAGAIN when the interface is not ready
  since DAD callout handler ignores such an interface.

- In DAD callout handler, mark an address as IN6_IFF_TENTATIVE
  when the interface has ND6_IFF_IFDISABLED.  And Do IFF_UP and
  IFF_DRV_RUNNING check consistently when DAD is required.

- draft-ietf-6man-enhanced-dad is now published as RFC 7527.

- Fix some typos.
2015-08-24 05:21:49 +00:00
Alexander V. Chernikov
5a2555160f * Split allocation and table linking for lle's.
Before that, the logic besides lle_create() was the following:
  return existing if found, create if not. This behaviour was error-prone
  since we had to deal with 'sudden' static<>dynamic lle changes.
  This commit fixes bunch of different issues like:
  - refcount leak when lle is converted to static.
    Simple check case:
    console 1:
    while true;
      do for i in `arp -an|awk '$4~/incomp/{print$2}'|tr -d '()'`;
        do arp -s $i 00:22:44:66:88:00 ; arp -d $i;
      done;
    done
   console 2:
    ping -f any-dead-host-in-L2
   console 3:
    # watch for memory consumption:
    vmstat -m | awk '$1~/lltable/{print$2}'
  - possible problems in arptimer() / nd6_timer() when dropping/reacquiring
   lock.
  New logic explicitly handles use-or-create cases in every lla_create
  user. Basically, most of the changes are purely mechanical. However,
  we explicitly avoid using existing lle's for interface/static LLE records.
* While here, call lle_event handlers on all real table lle change.
* Create lltable_free_entry() calling existing per-lltable
  lle_free_t callback for entry deletion
2015-08-20 12:05:17 +00:00
Alexander V. Chernikov
0447c1367a Use single 'lle_timer' callout in lltable instead of
two different names of the same timer.
2015-08-11 12:38:54 +00:00
Alexander V. Chernikov
314294de5c Store addresses instead of sockaddrs inside llentry.
This permits us having all (not fully true yet) all the info
needed in lookup process in first 64 bytes of 'struct llentry'.

struct llentry layout:
BEFORE:
[rwlock .. state .. state .. MAC ] (lle+1) [sockaddr_in[6]]
AFTER
[ in[6]_addr MAC .. state .. rwlock ]

Currently, address part of struct llentry has only 16 bytes for the key.
However, lltable does not restrict any custom lltable consumers with long
keys use the previous approach (store key at (lle+1)).

Sponsored by:	Yandex LLC
2015-08-11 09:26:11 +00:00
Alexander V. Chernikov
41cb42a633 MFP r276712.
* Split lltable_init() into lltable_allocate_htbl() (alloc
  hash table with default callbacks) and lltable_link() (
  links any lltable to the list).
* Switch from LLTBL_HASHTBL_SIZE to per-lltable hash size field.
* Move lltable setup to separate functions in in[6]_domifattach.
2015-08-11 05:51:00 +00:00
Alexander V. Chernikov
2caee4be35 Rename rt_foreach_fib() to rt_foreach_fib_walk().
Suggested by:	julian
2015-08-10 20:50:31 +00:00
Alexander V. Chernikov
11cdad9873 Partially merge r274887,r275334,r275577,r275578,r275586 to minimize
differences between projects/routing and HEAD.

This commit tries to keep code logic the same while changing underlying
code to use unified callbacks.

* Add llt_foreach_entry method to traverse all entries in given llt
* Add llt_dump_entry method to export particular lle entry in sysctl/rtsock
  format (code is not indented properly to minimize diff). Will be fixed
  in the next commits.
* Add llt_link_entry/llt_unlink_entry methods to link/unlink particular lle.
* Add llt_fill_sa_entry method to export address in the lle to sockaddr
  format.
* Add llt_hash method to use in generic hash table support code.
* Add llt_free_entry method which is used in llt_prefix_free code.

* Prepare for fine-grained locking by separating lle unlink and deletion in
  lltable_free() and lltable_prefix_free().

* Provide lltable_get<ifp|af>() functions to reduce direct 'struct lltable'
 access by external callers.

* Remove @llt agrument from lle_free() lle callback since it was unused.
* Temporarily add L3_CADDR() macro for 'const' sockaddr typecasting.
* Switch to per-af hashing code.
* Rename LLE_FREE_LOCKED() callback from in[6]_lltable_free() to
  in_[6]lltable_destroy() to avoid clashing with llt_free_entry() method.
  Update description from these functions.
* Use unified lltable_free_entry() function instead of per-af one.

Reviewed by:	ae
2015-08-10 12:03:59 +00:00
Marius Strobl
6e4cd74673 Fix compilation after r286457 w/o INVARIANTS or INVARIANT_SUPPORT. 2015-08-08 21:41:59 +00:00
Alexander V. Chernikov
4bdf0b6a9a MFP r274295:
* Move interface route cleanup to route.c:rt_flushifroutes()
* Convert most of "for (fibnum = 0; fibnum < rt_numfibs; fibnum++)" users
  to use new rt_foreach_fib() instead of hand-rolling cycles.
2015-08-08 18:14:59 +00:00
Alexander V. Chernikov
e362cf0e9f MFP r274553:
* Move lle creation/deletion from lla_lookup to separate functions:
  lla_lookup(LLE_CREATE) -> lla_create
  lla_lookup(LLE_DELETE) -> lla_delete
lla_create now returns with LLE_EXCLUSIVE lock for lle.
* Provide typedefs for new/existing lltable callbacks.

Reviewed by:	ae
2015-08-08 17:48:54 +00:00
Alexander V. Chernikov
331dff0737 Simplify ip[6] simploop:
Do not pass 'dst' sockaddr to ip[6]_mloopback:
  - We have explicit check for AF_INET in ip_output()
  - We assume ip header inside passed mbuf in ip_mloopback
  - We assume ip6 header inside passed mbuf in ip6_mloopback
2015-08-08 15:58:35 +00:00
Julien Charbon
ff9b006d61 Decompose TCP INP_INFO lock to increase short-lived TCP connections scalability:
- The existing TCP INP_INFO lock continues to protect the global inpcb list
  stability during full list traversal (e.g. tcp_pcblist()).

- A new INP_LIST lock protects inpcb list actual modifications (inp allocation
  and free) and inpcb global counters.

It allows to use TCP INP_INFO_RLOCK lock in critical paths (e.g. tcp_input())
and INP_INFO_WLOCK only in occasional operations that walk all connections.

PR:			183659
Differential Revision:	https://reviews.freebsd.org/D2599
Reviewed by:		jhb, adrian
Tested by:		adrian, nitroboost-gmail.com
Sponsored by:		Verisign, Inc.
2015-08-03 12:13:54 +00:00
Andrey V. Elsukov
51a01baf23 Properly handle IPV6_NEXTHOP socket option in selectroute().
o remove disabled code;
 o if nexthop address is link-local, use embedded scope zone id to
   determine outgoing interface;
 o properly fill ro_dst before doing route lookup;
 o remove LLE lookup, instead check rt_flags for RTF_GATEWAY bit.

Sponsored by:	Yandex LLC
2015-08-02 12:40:56 +00:00
Andrey V. Elsukov
a6f7dea1fe Remove redundant check. 2015-08-02 11:58:24 +00:00
Andrey V. Elsukov
10a0e0bf0a Eliminate the use of m_copydata() in gif_encapcheck().
ip_encap already has inspected mbuf's data, at least an IP header.
And it is safe to use mtod() and do direct access to needed fields.
Add M_ASSERTPKTHDR() to gif_encapcheck(), since the code expects that
mbuf has a packet header.
Move the code from gif_validate[46] into in[6]_gif_encapcheck(), also
remove "martian filters" checks. According to RFC 4213 it is enough to
verify that the source address is the address of the encapsulator, as
configured on the decapsulator.

Reviewed by:	melifaro
Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2015-07-29 14:07:43 +00:00
Andrey V. Elsukov
cc0a3c8ca4 Convert in_ifaddr_lock and in6_ifaddr_lock to rmlock.
Both are used to protect access to IP addresses lists and they can be
acquired for reading several times per packet. To reduce lock contention
it is better to use rmlock here.

Reviewed by:	gnn (previous version)
Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D3149
2015-07-29 08:12:05 +00:00
Michael Tuexen
4ff815b71c Move including netinet/icmp6.h around to avoid a problem when including
netinet/icmp6.h and net/netmap.h. Both use ni_flags...
This allows to build multistack with SCTP support.

MFC after: 1 week
2015-07-25 18:26:09 +00:00
Randall Stewart
f260c1b939 Fix inverted logic bug that David Wolfskill found (thanks David!)
MFC after:	3 Weeks
2015-07-22 09:29:50 +00:00
Randall Stewart
c0d1be08f6 When a tunneling protocol is being used with UDP we must release the
lock on the INP before calling the tunnel protocol, else a LOR
may occur (it does with SCTP for sure). Instead we must acquire a
ref count and release the lock, taking care to allow for the case
where the UDP socket has gone away and *not* unlocking since the
refcnt decrement on the inp will do the unlock in that case.

Reviewed by:	tuexen
MFC after:	3 weeks
2015-07-21 09:54:31 +00:00
Andrey V. Elsukov
30aee13117 Add LLE event handler to report ND6 events to userland via rtsock.
Obtained from:	Yandex LLC
MFC after:	2 weeks
Sponsored by:	Yandex LLC
2015-07-20 06:58:32 +00:00
Andrey V. Elsukov
585753c432 Invoke LLE event handler when entry is deleted.
MFC after:	2 weeks
Sponsored by:	Yandex LLC
2015-07-20 06:54:50 +00:00
Andrey V. Elsukov
cb207f93ca Keep IPv6 address specified by IPV6_PKTINFO socket option in kernel
internal form to be able handle link-local IPv6 addresses.

Reported by:	kp
Tested by:	kp
2015-07-03 19:01:38 +00:00
Bjoern A. Zeeb
bfbc08b848 Move comment to the right position.
PR:		152791
Submitted by:	vangyzen (as part of the functional change)
MFC after:	3 days
2015-07-03 09:53:56 +00:00
Michael Tuexen
d089f9b915 Add FIB support for SCTP.
This fixes https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=200379

MFC after: 3 days
2015-06-17 15:20:14 +00:00
Andrey V. Elsukov
4e870f943f Move RTM announces into generic code to be independent from Layer2 code.
This fixes bug introduced in 274988, when announces about new addresses
don't sent for tunneling interfaces.

Reported by:	tuexen@
MFC after:	1 week
2015-05-29 10:24:16 +00:00
Michael Tuexen
b7d130befc Fix and cleanup the debug information. This has no user-visible changes.
Thanks to Irene Ruengeler for proving a patch.

MFC after: 3 days
2015-05-28 16:00:23 +00:00
Jung-uk Kim
fd90e2ed54 CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten
years for head.  However, it is continuously misused as the mpsafe argument
for callout_init(9).  Deprecate the flag and clean up callout_init() calls
to make them more consistent.

Differential Revision:	https://reviews.freebsd.org/D2613
Reviewed by:	jhb
MFC after:	2 weeks
2015-05-22 17:05:21 +00:00
Andrey V. Elsukov
c1b4f79dfa Add an ability accept encapsulated packets from different sources by one
gif(4) interface. Add new option "ignore_source" for gif(4) interface.
When it is enabled, gif's encapcheck function requires match only for
packet's destination address.

Differential Revision:	https://reviews.freebsd.org/D2004
Obtained from:	Yandex LLC
MFC after:	2 weeks
Sponsored by:	Yandex LLC
2015-05-15 12:19:45 +00:00
Hiroki Sato
59333867ff - Remove ND6_IFF_IGNORELOOP. This functionality was useless in practice
because a link where looped back NS messages are permanently observed
  does not work with either NDP or ARP for IPv4.

- draft-ietf-6man-enhanced-dad is now RFC 7527.

Discussed with:	hiren
MFC after:	3 days
2015-05-12 03:31:57 +00:00
Andrey V. Elsukov
654bdb5abb Mark data checksum as valid for multicast packets, that we send back
to myself via simloop.
Also remove duplicate check under #ifdef DIAGNOSTIC.

PR:		180065
MFC after:	1 week
2015-05-07 14:17:43 +00:00
Andrey V. Elsukov
db037aa4ed Remove unneded #ifdef INET6 and IPSEC. This file compiled only when
both options are defined.
Include opt_sctp.h and sctp_crc32.h to enable #ifdef SCTP code block
and delayed checksum calculation for SCTP.
2015-05-07 12:15:45 +00:00
Gleb Smirnoff
0fa5aacd8b Remove #ifdef IFT_FOO.
Submitted by:	Guy Yur <guyyur gmail.com>
2015-05-02 20:31:27 +00:00