mirror of https://git.FreeBSD.org/src.git synced 2024-12-29 12:03:03 +00:00
Commit Graph

171 Commits

Author SHA1 Message Date
Eric Joyner
afb7737237 iflib: use default ntxd and nrxd when user value is not power of 2
From Jake:
A user may set a sysctl to override the default number of Tx or Rx
descriptors. However, certain calculations in the iflib core expect the
number of descriptors to be a power of 2.

Update _iflib_assert to verify that all of the shared context parameters
for the number of descriptors are powers of 2.

Modify iflib_reset_qvalues to check that the provided isc_nrxd value is
a power of 2. If it's not, print a warning message and then use the
default value.

An alternative might be to try rounding the number down instead.
However, this creates problems in case the rounded down value is below
the minimum value that the driver would support.

Submitted by:	Jacob Keller <jacob.e.keller@intel.com>
Reviewed by:	marius@
MFC after:	1 week
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D19880
2019-05-10 00:41:42 +00:00
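The descriptor-count check described in afb7737237 can be sketched in plain userland C. This is only an illustration, not the iflib code: is_pow2() and pick_ndesc() are hypothetical names, and treating 0 as "no override" is an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* True when n is a nonzero power of 2, i.e. exactly one bit is set. */
static bool
is_pow2(unsigned int n)
{
	return (n != 0 && (n & (n - 1)) == 0);
}

/*
 * Hypothetical helper: keep the user-supplied descriptor count if it is a
 * power of 2, otherwise print a warning and fall back to the default.
 */
static unsigned int
pick_ndesc(const char *name, unsigned int user, unsigned int def)
{
	assert(is_pow2(def));		/* the default itself must be valid */
	if (user == 0)
		return (def);		/* assumption: 0 means "not overridden" */
	if (!is_pow2(user)) {
		fprintf(stderr, "%s %u is not a power of 2, using default %u\n",
		    name, user, def);
		return (def);
	}
	return (user);
}
```

Falling back to the default rather than rounding down sidesteps the case the message mentions, where a rounded value could drop below the driver's minimum.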
Marius Strobl
007b804fc7 Allow to build without INET and INET6 again after r347221.
Submitted by:	cam
2019-05-08 09:03:43 +00:00
Marius Strobl
3d10e9ed62 o Use iflib_fast_intr_rxtx() also for "legacy" interrupts, i. e. INTx and
  MSI. Unlike iflib_fast_intr_ctx(), the former will also enqueue
  _task_fn_tx() in addition to _task_fn_rx() if appropriate, bringing TCP
  TX throughput of EM-class devices on par with the MSI-X case and, thus,
  close to wirespeed/pre-iflib(4) times again. [1]
  Note that independently of the interrupt type, the UDP performance with
  these MACs still is abysmal and nowhere near to where it was before the
  conversion of em(4) to iflib(4).
o In iflib_init_locked(), announce which free list failed to set up.
o In _task_fn_tx() when running netmap(4), issue ifdi_intr_enable instead
  of the ifdi_tx_queue_intr_enable method in case of a "legacy" interrupt
  as the latter is valid with MSI-X only.
o Instead of adding the missing - and apparently convoluted enough that a
  DBG_COUNTER_INC was put into a wrong spot in _task_fn_rx() - checks for
  ifdi_{r,t}x_queue_intr_enable being available in the MSI-X case also to
  iflib_fast_intr_rxtx(), factor these out to iflib_device_register() and
  make the checks fail gracefully rather than panic. This avoids invoking
  the checks at runtime over and over again in iflib_fast_intr_rxtx() and
  _task_fn_{r,t}x() - even if it's just in case of INVARIANTS - and makes
  these functions more readable.
o In iflib_rx_structures_setup(), only initialize LRO resources if device
  and driver have LRO capability in order to not waste memory. Also, free
  the LRO resources again if setting them up fails for one of the queues.
  However, don't bother invoking iflib_rx_sds_free() in that case because
  iflib_rx_structures_setup() doesn't call iflib_rxsd_alloc() either (and
  iflib_{device,pseudo}_register() will issue iflib_rx_sds_free() in case
  of failure via iflib_rx_structures_free(), but there definitely is some
  asymmetry left to be fixed, though).
o Similarly, free LRO resources again in iflib_rx_structures_free().
o In iflib_irq_set_affinity(), handle get_core_offset() errors gracefully
  instead of panicking (but only in case of INVARIANTS). This is a follow-
  up to r344132, as such driver bugs shouldn't be fatal.
o Likewise, handle unknown iflib_intr_type_t in iflib_irq_alloc_generic()
  gracefully, too.
o Bring yet more sanity to iflib_msix_init():
  - If the device doesn't provide enough MSI-X vectors or not all vectors
    can be allocated so the expected number of queues in addition to admin
    interrupts can't be supported, try MSI next (and then INTx) as proper
    MSI-X vector distribution can't be assured in such cases. In essence,
    this change brings r254008 forward to iflib(4). Also, this is the fix
    alluded to in the commit message of r343934.
  - If the MSI-X allocation has failed, don't prematurely announce MSI is
    going to be used as the latter in fact may not be available either.
  - When falling back to MSI, only release the MSI-X table resource again
    if it was allocated in iflib_msix_init(), i. e. isn't supplied by the
    driver, in the first place.
o In mp_ndesc_handler(), handle unknown type arguments gracefully, too.

PR:		235031 (likely) [1]
Reviewed by:	shurd
Differential Revision:	https://reviews.freebsd.org/D20175
2019-05-07 08:28:35 +00:00
Marius Strobl
1722eeac95 - Remove the unused ifc_link_irq and ifc_mtx_name members of struct iflib_ctx.
- Remove the only ever written to ift_db_mtx_name member of struct iflib_txq.
- Remove the unused or only ever written to ifr_size, ifr_cq_pidx, ifr_cq_gen
  and ifr_lro_enabled members of struct iflib_rxq.
- Consistently spell DMA, RX and TX uppercase in comments, messages etc.
  instead of mixing with some lowercase variants.
- Consistently use if_t instead of a mix of if_t and struct ifnet pointers.
- Bring the function comments of _iflib_fl_refill(), iflib_rx_sds_free() and
  iflib_fl_setup() in line with reality.
- Judging by problem reports, people are wondering what on earth messages like:
  "TX(0) desc avail = 1024, pidx = 0"
  are trying to indicate. Thus, extend this string to be more like that of
  non-iflib(4) Ethernet MAC drivers, notifying about a watchdog timeout due
  to which the interface will be reset.
- Take advantage of the M_HAS_VLANTAG macro.
- Use false/true rather than FALSE/TRUE for variables of type bool.
- Use FALLTHROUGH as advocated by style(9).
2019-05-06 20:56:41 +00:00
Matt Macy
e2621d9657 Allow iflib drivers to pass a pointer to their own ifmedia structure.
Tested by: emaste@

Differential Revision:	https://reviews.freebsd.org/D19946
2019-05-03 20:05:31 +00:00
Ed Maste
ce3da455e9 iflib: remove assertion that isc_capabilities is nonzero
It's atypical, but not invalid, for a driver to pass no capabilities.

Submitted by:	Gerald Aryeetey <aryeeteygerald_rogers.com>
Reviewed by:	shurd
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D20142
2019-05-02 19:13:31 +00:00
Stephen Hurd
f154ece02e iflib: Better control over queue core assignment
By default, cores are now assigned to queues in a sequential
manner rather than all NICs starting at the first core. On a four-core
system with two NICs each using two queue pairs, the nic:queue -> core
mapping has changed from this:

0:0 -> 0, 0:1 -> 1
1:0 -> 0, 1:1 -> 1

To this:

0:0 -> 0, 0:1 -> 1
1:0 -> 2, 1:1 -> 3

Additionally, a device can now be configured to use separate cores for TX
and RX queues.

Two new tunables have been added, dev.X.Y.iflib.separate_txrx and
dev.X.Y.iflib.core_offset. If core_offset is set, the NIC is not part
of the auto-assigned sequence.

Reviewed by:	marius
MFC after:	2 weeks
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D20029
2019-04-25 21:24:56 +00:00
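The nic:queue -> core tables in f154ece02e can be reproduced with a small sketch. The function names are hypothetical, and two simplifications are assumptions of the example rather than statements about iflib: every NIC has the same number of queues, and assignment wraps around with a modulo once the cores run out.

```c
/* Old behaviour: every NIC starts assigning at core 0. */
static int
core_old(int nic, int queue, int ncores)
{
	(void)nic;			/* the NIC index did not matter */
	return (queue % ncores);
}

/* New default: each NIC continues where the previous one stopped. */
static int
core_new(int nic, int queue, int queues_per_nic, int ncores)
{
	return ((nic * queues_per_nic + queue) % ncores);
}
```

With four cores and two NICs of two queues each, core_new() places 1:0 on core 2 and 1:1 on core 3, matching the second table in the message, while core_old() puts both NICs on cores 0 and 1.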
Andrew Gallatin
6d49b41ee8 iflib: Add pfil hooks
As with mlx5en, the idea is to drop unwanted traffic as early
in receive as possible, before mbufs are allocated and anything
is passed up the stack.  This can save considerable CPU time
when a machine is under a flooding style DOS attack.

The major change here is to remove the unneeded abstraction where
callers of rxd_frag_to_sd() get back a pointer to the mbuf ring, and
are responsible for NULL'ing that mbuf themselves. Now this happens
directly in rxd_frag_to_sd(), and it returns an mbuf. This allows us
to use the decision (and potentially mbuf) returned by the pfil
hooks. The driver can now recycle mbufs to avoid re-allocation when
packets are dropped.

Reviewed by:	marius  (shurd and erj also provided feedback)
Sponsored by:	Netflix
Differential Revision:	https://reviews.freebsd.org/D19645
2019-04-24 13:32:04 +00:00
Kyle Evans
1fd8c72c0a iflib: Use new ether_gen_addr, restricting addresses to that subset
Differential Revision:	https://reviews.freebsd.org/D19587
2019-04-17 17:19:54 +00:00
Eric Joyner
225eae1bb7 iflib: return ENETDOWN when the network device is down
From Jake:
iflib_if_transmit returns ENOBUFS when the device is down, or when the
link isn't active.

This was changed in r308792 from return (0), so that the function
correctly reports an error that it was unable to transmit.

However, using ENOBUFS can cause some network applications to produce
the following or similar errors:

"ping: sendto: No buffer space available"

This is a bit confusing as the real cause of the issue is that the
network device is down.

Replace the ENOBUFS return with ENETDOWN to indicate more clearly that
the send failed because the network device is offline.

This will cause the error message to be reported as

"ping: sendto: Network is down"

Submitted by:	Jacob Keller <jacob.e.keller@intel.com>
Reviewed by:	shurd@, sbruno@, bz@
MFC after:	1 week
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D19652
2019-03-28 20:46:45 +00:00
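The effect of the errno change in 225eae1bb7 can be shown with a userland stand-in. struct fake_if and fake_transmit() are hypothetical and only model the "device down or link inactive" check; they are not the iflib_if_transmit code.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, minimal model of an interface's state. */
struct fake_if {
	bool running;		/* device is up */
	bool link_active;	/* link is established */
};

/* Report ENETDOWN (previously ENOBUFS) when nothing can be sent. */
static int
fake_transmit(const struct fake_if *ifp)
{
	if (!ifp->running || !ifp->link_active)
		return (ENETDOWN);
	return (0);
}

/* Text an application would typically print for the returned errno. */
static const char *
transmit_error_text(int error)
{
	return (strerror(error));
}
```

On typical systems strerror(ENETDOWN) yields the "Network is down" text quoted in the message, although the exact wording depends on the C library and locale.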
Eric Joyner
aac9c817af iflib: hold the CTX lock in iflib_pseudo_register
From Jake:
The iflib_device_register function takes the CTX lock before calling
IFDI_ATTACH_PRE, and releases it upon finishing the registration.

Mirror this process in iflib_pseudo_register, so that we always hold the
CTX lock during the attach process when registering a pseudo interface
or a regular interface.

This was caught by code inspection while attempting to analyze where the
CTX lock was held.

Submitted by:	Jacob Keller <jacob.e.keller@intel.com>
Reviewed by:	shurd@, erj@
MFC after:	1 week
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D19604
2019-03-28 20:43:47 +00:00
Eric Joyner
10a1e981d4 iflib: mark isc_driver_version as constant
From Jake:
The iflib core never modifies the isc_driver_version string. Allow
drivers to safely assign pointers to constant buffers by marking this
parameter const.

Submitted by:	Jacob Keller <jacob.e.keller@intel.com>
Reviewed by:	erj@, gallatin@, jhb@
MFC after:	1 week
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D19577
2019-03-19 23:44:26 +00:00
Eric Joyner
1b9d93948a iflib: expose the Rx mbuf buffer size to drivers
From Jake:
iflib_fl_setup calculates a suitable buffer size for the Rx mbufs based
on the isc_max_frame_size value that drivers set up. This calculation is
repeated by drivers when programming their hardware with the size of
each Rx buffer.

This can lead to a mismatch where the iflib mbuf size is different from
the expected size of the buffer as programmed by the hardware. This can
lead to unexpected results.

If iflib ever wants to support mbuf sizes larger than one page, every
driver must be updated to account for the new possible buffer sizes.

Fix this by calculating the mbuf size prior to calling IFDI_INIT, and
adding the iflib_get_rx_mbuf_sz function which will expose this value to
drivers, so that they do not repeat the same calculation.

Submitted by:	Jacob Keller <jacob.e.keller@intel.com>
Reviewed by:	shurd@, erj@
MFC after:	1 week
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D19489
2019-03-19 17:59:56 +00:00
Eric Joyner
3e8d1bae5f iflib: prevent possible infinite loop in iflib_encap
From Jake:
iflib_encap calls bus_dmamap_load_mbuf_sg. Upon it returning EFBIG, an
m_collapse and an m_defrag are attempted to shrink the mbuf cluster to
fit within the DMA segment limitations.

However, if we call m_defrag, and then bus_dmamap_load_mbuf_sg returns
EFBIG on the now defragmented mbuf, we will continuously re-call
bus_dmamap_load_mbuf_sg over and over.

This happens because m_head isn't NULL, and remap is >1, so we don't try
to m_collapse or m_defrag again. The only way we exit the loop is if
m_head is NULL. However, m_head can't be modified by the call to
bus_dmamap_load_mbuf_sg, because we don't pass it as a double pointer.

I believe this will be an incredibly rare occurrence, because it is
unlikely that bus_dmamap_load_mbuf_sg will actually fail on the second
defragment with an EFBIG error. However, it still seems like
a possibility that we should account for.

Fix the exit check to ensure that if remap is >1, we will also exit,
even if m_head is not NULL.

Submitted by:	Jacob Keller <jacob.e.keller@intel.com>
Reviewed by:	shurd@, gallatin@
MFC after:	1 week
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D19468
2019-03-19 17:49:03 +00:00
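The loop-exit problem fixed in 3e8d1bae5f can be modelled without any kernel APIs. In this sketch fake_load() is a hypothetical stand-in for bus_dmamap_load_mbuf_sg() that keeps returning EFBIG a configurable number of times, and the collapse and defragment steps are reduced to a counter; it shows only the bounded-retry idea, not the real iflib_encap().

```c
#include <errno.h>

/* Hypothetical loader state: how often to fail, and how often it was called. */
struct fake_loader {
	int failures_left;
	int calls;
};

/* Stand-in for the DMA load: returns EFBIG while failures remain. */
static int
fake_load(struct fake_loader *l)
{
	l->calls++;
	if (l->failures_left > 0) {
		l->failures_left--;
		return (EFBIG);
	}
	return (0);
}

/* Retry at most twice (collapse, then defragment) before giving up. */
static int
encap_with_bounded_retry(struct fake_loader *l)
{
	int err, remap = 0;

	for (;;) {
		err = fake_load(l);
		if (err != EFBIG)
			return (err);
		if (remap > 1)
			return (EFBIG);	/* both remap attempts are used up */
		remap++;		/* 1: collapse tried, 2: defrag tried */
	}
}
```

Even if the loader never succeeds, the function returns after three load attempts instead of spinning, which is the property the commit's exit check restores.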
Eric Joyner
bc408c7d61 Remove references to CONTIGMALLOC_WORKS in iflib and em
From Jake:
"The iflib_fl_setup() function tries to pick various buffer sizes based
on the max_frame_size value defined by the parent driver. However, this
code was wrapped under CONTIGMALLOC_WORKS, which was never actually
defined anywhere.

This same code pattern was used in if_em.c, likely trying to match
what iflib uses.

Since CONTIGMALLOC_WORKS is not defined, remove this dead code from
iflib_fl_setup and if_em.c

Given that various iflib drivers appear to be using a similar
calculation, it might be worth making this buffer size a value that the
driver can peek at in the future."

Submitted by:	Jacob Keller <jacob.e.keller@intel.com>
Reviewed by:	shurd@
MFC after:	1 week
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D19199
2019-03-05 19:12:51 +00:00
Stephen Hurd
ca62461bc6 iflib: Improve return values of interrupt handlers.
iflib was returning FILTER_HANDLED, in cases where FILTER_STRAY was more
correct. This potentially caused issues with shared legacy interrupts.

Driver filters returning FILTER_STRAY are now properly handled.

Submitted by:	Augustin Cavalier <waddlesplash@gmail.com>
Reviewed by:	marius, gallatin
Obtained from:	Haiku (a84bb9, 4947d1)
MFC after:	1 week
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D19201
2019-02-15 18:51:43 +00:00
Marius Strobl
a6611c938b Fix the build with ALTQ after r344060. 2019-02-12 22:33:17 +00:00
Marius Strobl
f855ec814d Make taskqgroup_attach{,_cpu}(9) work across architectures
So far, intr_{g,s}etaffinity(9) take a single int for identifying
a device interrupt. This approach doesn't work on all architectures
supported, as a single int isn't sufficient to globally specify a
device interrupt. In particular, with multiple interrupt controllers
in one system as found on e. g. arm and arm64 machines, an interrupt
number as returned by rman_get_start(9) may be only unique relative
to the bus and, thus, interrupt controller, a certain device hangs
off from.
In turn, this makes taskqgroup_attach{,_cpu}(9) and - internal to
the gtaskqueue implementation - taskqgroup_attach_deferred{,_cpu}()
not work across architectures. Yet in turn, iflib(4) as gtaskqueue
consumer so far doesn't fit architectures where interrupt numbers
aren't globally unique.
However, at least for intr_setaffinity(..., CPU_WHICH_IRQ, ...) as
employed by the gtaskqueue implementation to bind an interrupt to a
particular CPU, using bus_bind_intr(9) instead is equivalent from
a functional point of view, with bus_bind_intr(9) taking the device
and interrupt resource arguments required for uniquely specifying a
device interrupt.
Thus, change the gtaskqueue implementation to employ bus_bind_intr(9)
instead and intr_{g,s}etaffinity(9) to take the device and interrupt
resource arguments required respectively. This change also moves
struct grouptask from <sys/_task.h> to <sys/gtaskqueue.h> and wraps
struct gtask along with the gtask_fn_t typedef into #ifdef _KERNEL
as userland likes to include <sys/_task.h> or indirectly drags it
in - for better or worse also with _KERNEL defined -, which with
device_t and struct resource dependencies otherwise is no longer
as easily possible now.
The userland inclusion problem probably can be improved a bit by
introducing a _WANT_TASK (as well as a _WANT_MOUNT) akin to the
existing _WANT_PRISON etc., which is orthogonal to this change,
though, and likely needs an exp-run.

While at it:
- Change the gt_cpu member in the grouptask structure to be of type
  int as used elsewhere for specifying CPUs (an int16_t may be too
  narrow sooner or later),
- move the gtaskqueue_enqueue_fn typedef from <sys/gtaskqueue.h> to
  the gtaskqueue implementation as it's only used and needed there,
- change the GTASK_INIT macro to use "gtask" rather than "task" as
  argument given that it actually operates on a struct gtask rather
  than a struct task, and
- let subr_gtaskqueue.c consistently use __func__ to print function
  names.

Reported by:	mmel
Reviewed by:	mmel
Differential Revision:	https://reviews.freebsd.org/D19139
2019-02-12 21:23:59 +00:00
Marius Strobl
95dcf343b7 Further correct and optimize the bus_dma(9) usage of iflib(4):
o Correct the obvious bugs in the netmap(4) parts:
  - No longer check for the existence of DMA maps as bus_dma(9)
    is used unconditionally in iflib(4) since r341095.
  - Supply the correct DMA tag and map pairs to bus_dma(9)
    functions (see also the commit message of r343753).
  - In iflib_netmap_timer_adjust(), add synchronization of the
    TX descriptors before calling the ift_txd_credits_update
    method as the latter evaluates the TX descriptors possibly
    updated by the MAC.
  - In _task_fn_tx(), wrap the netmap(4)-specific bits in
    #ifdef DEV_NETMAP just as done in _task_fn_admin() and
    _task_fn_rx() respectively.
o In iflib_fast_intr_rxtx(), synchronize the TX rather than
  the RX descriptors before calling the ift_txd_credits_update
  method (see also above).
o There's no need to synchronize an RX buffer that is going to
  be recycled in iflib_rxd_pkt_get(), yet; it's sufficient to
  do that as late as passing RX buffers to the MAC via the
  ift_rxd_refill method. Hence, combine that synchronization
  with the synchronization of new buffers into a common spot
  in _iflib_fl_refill().
o There's no need to synchronize the RX descriptors of a free
  list in preparation of the MAC updating their statuses with
  every invocation of rxd_frag_to_sd(); it's enough to do this
  once before handing control over to the MAC, i. e. before
  calling ift_rxd_flush method in _iflib_fl_refill(), which
  already performs the necessary synchronization.
o Given that the ift_rxd_available method evaluates the RX
  descriptors which possibly have been altered by the MAC,
  synchronize as appropriate beforehand. Most notably this
  is now done in iflib_rxd_avail(), which in turn means that
  we don't need to issue the same synchronization yet again
  before calling the ift_rxd_pkt_get method in iflib_rxeof().
o In iflib_txd_db_check(), synchronize the TX descriptors
  before handing them over to the MAC for transmission via
  the ift_txd_flush method.
o In iflib_encap(), move the TX buffer synchronization after
  the invocation of the ift_txd_encap() method. If the MAC
  driver fails to encapsulate the packet and we retry with
  a defragmented mbuf chain or finally fail, the cycles for
  TX buffer synchronization have been wasted. Synchronizing
  afterwards matches what non-iflib(4) drivers typically do
  and is sufficient as the MAC will not actually start with
  the transmission before - in this case - the ift_txd_flush
  method is called.
  Moreover, for the latter reason the synchronization of the
  TX descriptors in iflib_encap() can go as it's enough to
  synchronize them before passing control over to the MAC by
  issuing the ift_txd_flush() method (see above).
o In iflib_txq_can_drain(), only synchronize TX descriptors
  if the ift_txd_credits_update method accessing these is
  actually called.

Differential Revision:	https://reviews.freebsd.org/D19081
2019-02-12 21:08:44 +00:00
Marius Strobl
bfce461ee9 o As illustrated by e. g. figure 7-14 of the Intel 82599 10 GbE
controller datasheet revision 3.3, in the context of Ethernet
  MACs the control data describing the packet buffers typically
  are named "descriptors". Each of these descriptors references
  one buffer, multiple of which a packet can be composed of.
  By contrast, in comments, messages and the names of structure
  members, iflib(4) refers to DMA resources employed for RX and
  TX buffers (rather than control data) as "desc(riptors)".
  This odd naming convention of iflib(4) made reviewing r343085
  and identifying wrong and missing bus_dmamap_sync(9) calls in
  particular way harder than it already is. This convention may
  also explain why the netmap(4) part of iflib(4) pairs the DMA
  tags for control data with DMA maps of buffers and vice versa
  in calls to bus_dma(9) functions.
  Therefore, change iflib(4) to refer to buf(fers) when buffers
  and not the usual understanding of descriptors is meant. This
  change does not include corrections to the DMA resources used
  in the netmap(4) parts. However, it revises error messages to
  state which kind of allocation/creation failed. Specifically,
  the "Unable to allocate tx_buffer (map) memory" copy & pasted
  inappropriately on several occasions was replaced with proper
  messages.
o Enhance some other error messages to indicate which half - RX
  or TX - they apply to instead of using identical text in both
  cases and generally canonicalize them.
o Correct the descriptions of iflib_{r,t}xsd_alloc() to reflect
  reality; current code doesn't use {r,t}x_buffer structures.
o In iflib_queues_alloc():
  - Remove redundant BUS_DMA_NOWAIT of iflib_dma_alloc() calls,
  - change the M_WAITOK from malloc(9) calls into M_NOWAIT. The
    return values are already checked, deferred DMA allocations
    not being an option at this point, BUS_DMA_NOWAIT has to be
    used anyway and prior malloc(9) calls in this function also
    specify M_NOWAIT.

Reviewed by:	shurd
Differential Revision:	https://reviews.freebsd.org/D19067
2019-02-04 20:46:57 +00:00
Marius Strobl
b97de13ae0 - Stop iflib(4) from leaking MSI messages on detachment by calling
bus_teardown_intr(9) before pci_release_msi(9).
- Ensure that iflib(4) and associated drivers pass correct RIDs to
  bus_release_resource(9) by obtaining the RIDs via rman_get_rid(9)
  on the corresponding resources instead of using the RIDs initially
  passed to bus_alloc_resource_any(9) as the latter function may
  change those RIDs. Solely em(4) for the ioport resource (but not
  others) and bnxt(4) were using the correct RIDs by caching the ones
  returned by bus_alloc_resource_any(9).
- Change the logic of iflib_msix_init() around to only map the MSI-X
  BAR if MSI-X is actually supported, i. e. pci_msix_count(9) returns
  > 0. Otherwise the "Unable to map MSIX table " message triggers for
  devices that simply don't support MSI-X and the user may think that
  something is wrong while in fact everything works as expected.
- Put some (mostly redundant) debug messages emitted by iflib(4)
  and em(4) during attachment under bootverbose. The non-verbose
  output of em(4) seen during attachment now is close to the one
  prior to the conversion to iflib(4).
- Replace various variants of spelling "MSI-X" (several in messages)
  with "MSI-X" as used in the PCI specifications.
- Remove some trailing whitespace from messages emitted by iflib(4)
  and change them to consistently start with uppercase.
- Remove some obsolete comments about releasing interrupts from
  drivers and correct a few others.

Reviewed by:	erj, Jacob Keller, shurd
Differential Revision:	https://reviews.freebsd.org/D18980
2019-01-30 13:21:26 +00:00
Marius Strobl
3db348b54a - In _iflib_fl_refill(), don't mark an RX buffer as available in the
corresponding bitmap before adding an mbuf has actually succeeded.
  Previously, m_gethdr(M_NOWAIT, ...) failing caused a "hole" in the
  RX ring but not in its bitmap. One implication of such a hole was
  that in a subsequent call to _iflib_fl_refill() with the RX buffer
  accounting still indicating another reclaimable buffer, bit_ffc(3)
  nevertheless returned -1 in frag_idx which in turn caused havoc
  when used as an index. Thus, additionally assert that frag_idx is
  0 or greater.
  Another possible consequence of a hole in the RX ring was a NULL-
  dereference when trying to use the unallocated mbuf, for example
  in iflib_rxd_pkt_get().

  While at it, make the variable declarations in _iflib_fl_refill()
  conform to style(9) and remove redundant checks already performed
  by bit_ffc{,_at}(3).

- In iflib_queues_alloc(), don't pass redundant M_ZERO to bit_alloc(3).

Reported and tested by: pho
2019-01-26 21:35:51 +00:00
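The ordering issue described in 3db348b54a, marking a slot in the bitmap before the allocation has succeeded, can be illustrated with a toy free list. Everything here is hypothetical: struct fake_fl, the bool array standing in for the bitmap, and the two allocators are simplifications for the example, not the _iflib_fl_refill() code.

```c
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 8

/* Toy free list: one buffer pointer and one "in use" flag per slot. */
struct fake_fl {
	void *bufs[RING_SIZE];
	bool  used[RING_SIZE];	/* stand-in for the RX bitmap */
};

/* Index of the first unused slot, or -1 if the ring is full. */
static int
first_clear(const struct fake_fl *fl)
{
	for (int i = 0; i < RING_SIZE; i++)
		if (!fl->used[i])
			return (i);
	return (-1);
}

/* Hypothetical allocators: one always fails, one always succeeds. */
static void *
alloc_fail(void)
{
	return (NULL);
}

static void *
alloc_ok(void)
{
	static char pool[RING_SIZE];
	static int next;

	return (&pool[next++ % RING_SIZE]);
}

/* Refill up to count slots; a slot is marked only after allocation worked. */
static int
refill(struct fake_fl *fl, int count, void *(*alloc)(void))
{
	int filled = 0;

	while (filled < count) {
		int idx = first_clear(fl);
		if (idx < 0)
			break;		/* ring is full */
		void *m = alloc();
		if (m == NULL)
			break;		/* slot stays clear, so no hole appears */
		fl->bufs[idx] = m;
		fl->used[idx] = true;
		filled++;
	}
	return (filled);
}
```

Because the flag is set last, a failed allocation leaves the ring and its bookkeeping in agreement, and a later refill starts again at the same slot.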
Andrew Gallatin
77102fd6a2 Fix an iflib driver unload panic introduced in r343085
The new loop to sync and unload descriptors was indexed
by "i", rather than "j".   The panic was caused by "i"
being advanced rather than "j", and eventually becoming
out of bounds.

Reviewed by:	kib
MFC after:	3 days
Sponsored by:	Netflix
2019-01-25 15:02:18 +00:00
Patrick Kelsey
8f82136aec Convert vmx(4) to being an iflib driver.
Also, expose IFLIB_MAX_RX_SEGS to iflib drivers and add
iflib_dma_alloc_align() to the iflib API.

Performance is generally better with the tunable/sysctl
dev.vmx.<index>.iflib.tx_abdicate=1.

Reviewed by:	shurd
MFC after:	1 week
Relnotes:	yes
Sponsored by:	RG Nets
Differential Revision:	https://reviews.freebsd.org/D18761
2019-01-22 01:11:17 +00:00
Patrick Kelsey
7f3eb9dab3 Fix various resource leaks that can occur in the error paths of
iflib_device_register() and iflib_pseudo_register().

Reviewed by:	shurd
MFC after:	1 week
Sponsored by:	RG Nets
Differential Revision:	https://reviews.freebsd.org/D18760
2019-01-22 00:56:44 +00:00
Konstantin Belousov
8a04b53dce Improve iflib busdma(9) KPI use.
- Specify BUS_DMA_NOWAIT for bus_dmamap_load() on rx refill, since
  callbacks are not supposed to be used.
- Match tso/non-tso tags to corresponding tx map operations.  Create
  separate tso maps for tx descriptors.  In particular, do not use
  non-tso tag to load, unload, or destroy a map created with tso tag.
- Add missed bus_dmamap_sync() calls.
  Submitted by: marius.

Reported and tested by:	pho
Reviewed by:	marius
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2019-01-16 05:44:14 +00:00
Stephen Hurd
cd28ea929a Use iflib_if_init_locked() during resume instead of iflib_init_locked().
iflib_init_locked() assumes that iflib_stop() has been called, however,
it is not called for suspend.  iflib_if_init_locked() calls stop then init,
so fixes the problem.

This was causing errors after a resume from suspend.

PR:		224059
Reported by:	zeising
MFC after:	1 week
Sponsored by:	Limelight Networks
2019-01-07 23:46:54 +00:00
Konstantin Belousov
85f3b801e9 Fix typo, use boolean operator instead of bit-wise.
Reviewed by:	marius, shurd
MFC after:	3 days
Sponsored by:	The FreeBSD Foundation
2019-01-03 01:01:03 +00:00
Stephen Hurd
7124b5ba04 Fix !tx_abdicate error from r336560
r336560 was supposed to restore pre-r323954 behaviour when tx_abdicate is
not set (the default case). However, it appears that rather than the drainage
check being made conditional on tx_abdicate being set, it was duplicated
so it occurred twice if tx_abdicate was set and once if it was not.

Now when !tx_abdicate, drainage is only checked if the doorbell isn't
pending.

Reported by:    lev
MFC after:      1 week
Sponsored by:   Limelight Networks
2018-12-11 17:46:01 +00:00
Andrew Gallatin
fbec776de0 Use busdma unconditionally in iflib
- Remove the complex mechanism to choose between using busdma
and raw pmap_kextract at runtime.   The reduced complexity makes
the code easier to read and maintain.

- Fix a bug in the small packet receive path where clusters were
repeatedly mapped but never unmapped. We now store the cluster's
bus address and avoid re-mapping the cluster each time a small
packet is received.

This patch fixes bugs I've seen where ixl(4) will not even
respond to ping without seeing DMAR faults.

I see a small improvement (14%) on packet forwarding tests using
a Haswell based Xeon E5-2697 v3.  Olivier sees a small
regression (-3% to -6%) with lower end hardware.

Reviewed by:	mmacy
Not objected to by:	sbruno
MFC after:	8 weeks
Sponsored by:	Netflix, Inc
Differential Revision:		https://reviews.freebsd.org/D17901
2018-11-27 20:01:05 +00:00
Stephen Hurd
0efb1a464f Clear RX completion queue state variables in iflib_stop()
iflib_stop() was not resetting the rxq completion queue state variables.
This meant that for any driver that has receive completion queues, after a
reinit, iflib would start asking what's available on the rx side starting at
whatever the completion queue index was prior to the stop, instead of at 0.

Submitted by:	pkelsey
Reported by:	pkelsey
MFC after:	3 days
Sponsored by:	Limelight Networks
2018-11-14 20:36:18 +00:00
Stephen Hurd
8d4ceb9cc4 Prevent POLA violation with TSO/CSUM offload
Ensure that any time CSUM_IP_TSO or CSUM_IP6_TSO is set that the corresponding
CSUM_IP6?_TCP / CSUM_IP flags are also set.

Rather than requiring drivers to bake-in an understanding that TSO implies
checksum offloads, make it explicit.

This change requires us to move the IFLIB_NEED_ZERO_CSUM implementation to
ensure it's zeroed for TSO.

Reported by:	Jacob Keller <jacob.e.keller@intel.com>
MFC after:	1 week
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D17801
2018-11-14 15:23:39 +00:00
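One reading of the rule in 8d4ceb9cc4 is a small normalization step over the checksum flags. The sketch below uses made-up X_-prefixed constants with illustrative values (they are not the kernel's CSUM_* definitions), and the exact pairing of TSO flags with checksum flags is an assumption drawn from the commit message.

```c
#include <stdint.h>

/* Illustrative flag values only; not the real CSUM_* bits. */
#define X_CSUM_IP	0x01u
#define X_CSUM_IP_TCP	0x02u
#define X_CSUM_IP_TSO	0x04u
#define X_CSUM_IP6_TCP	0x08u
#define X_CSUM_IP6_TSO	0x10u

/* Make "TSO implies checksum offload" explicit in the flag word. */
static uint32_t
normalize_csum_flags(uint32_t flags)
{
	if (flags & X_CSUM_IP_TSO)
		flags |= X_CSUM_IP_TCP | X_CSUM_IP;
	if (flags & X_CSUM_IP6_TSO)
		flags |= X_CSUM_IP6_TCP;
	return (flags);
}
```

Doing this once in common code means a driver can test the checksum flags alone and still behave correctly for TSO packets.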
Stephen Hurd
4d261ce278 Fix leaks caused by ifc_nhwtxqs never being initialized
r333502 removed initialization of ifc_nhwtxqs, and it's not clear
there's a need to copy it into the struct iflib_ctx at all. Use
ctx->ifc_sctx->isc_ntxqs instead.

Further, iflib_stop() did not clear the last ring in the case where
isc_nfl != isc_nrxqs (such as when IFLIB_HAS_RXCQ is set). Use
ctx->ifc_sctx->isc_nrxqs here instead of isc_nfl.

Reported by:	pkelsey
Reviewed by:	pkelsey
MFC after:	3 days
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D17979
2018-11-14 15:16:45 +00:00
Stephen Hurd
a42546df88 Fix rxcsum issue introduced in r338838
r338838 attempted to fix issues with rxcsum and rxcsum6.
However, the rxcsum bits were set as though if_setcapenablebit() was
being called, not if_togglecapenable() which is in use. As a result,
it was not possible to disable rxcsum when rxcsum6 was supported.

PR:		233004
Reported by:	lev
Reviewed by:	lev
MFC after:	3 days
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D17881
2018-11-07 19:31:48 +00:00
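The set-versus-toggle mix-up described in a42546df88 comes down to the difference between OR and XOR. In this sketch the toggle operation is modelled as an XOR on a plain integer; the CAP_* values and helper names are hypothetical and are not the IFCAP_* constants or the ifnet accessors.

```c
#include <stdint.h>

/* Illustrative capability bits only; not the real IFCAP_* values. */
#define CAP_RXCSUM	0x1u
#define CAP_RXCSUM_IPV6	0x2u

/* Model of a toggle-style update: flip exactly the given bits. */
static uint32_t
cap_toggle(uint32_t enabled, uint32_t bits)
{
	return (enabled ^ bits);
}

/* Bits that must be flipped to move from enabled to requested. */
static uint32_t
cap_flip_mask(uint32_t enabled, uint32_t requested, uint32_t supported)
{
	return ((enabled ^ requested) & supported);
}
```

Passing the desired final state straight to a toggle flips the wrong bits: with both capabilities enabled, toggling by "IPv6 only" clears IPv6 instead of IPv4, whereas toggling by cap_flip_mask() produces the requested state.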
Eric Joyner
46fa0c2552 Revert r339634.
That commit is causing kernel panics in em(4), so this will be reverted
until those are fixed.

Reported by:	ae@, pho@, et al
Sponsored by:	Intel Corporation
2018-10-23 17:06:36 +00:00
Eric Joyner
940f62d616 iflib: drain enqueued tasks before detaching from taskqgroup
The taskqgroup_detach function does not check if a task is already enqueued when
detaching it. This may lead to a kernel panic if an enqueued task starts after
the context state lock is destroyed. Ensure that the already enqueued admin tasks
are executed before detaching them.

The issue was discovered during validation of D16429. Unloading of if_ixlv
followed by immediate removal of VFs with iovctl -D may lead to panic on
NODEBUG kernel.

As well, check if iflib is in detach before enqueueing new admin or iov
tasks, to prevent new tasks from executing while the taskqgroup tasks
are being drained.

Submitted by:	Krzysztof Galazka <krzysztof.galazka@intel.com>
Reviewed by:	shurd@, erj@
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D17404
2018-10-23 04:37:29 +00:00
Eric Joyner
77c1fcec91 ixl/iavf(4): Change ixlv to iavf and update it to use iflib(9)
Finishes the conversion of the 40Gb Intel Ethernet drivers to iflib(9) for
FreeBSD 12.0, and fixes numerous bugs in both ixl(4) and iavf(4).

This commit also re-adds the VF driver to GENERIC since it now compiles and
functions.

The VF driver name was changed from ixlv(4) to iavf(4) because the VF driver is
now intended to be used with future products, not just with Fortville/Fort Park
VFs.

A man page update that documents these drivers is forthcoming in a separate
commit.

Reviewed by:	sbruno@, kbowling@
Tested by:	jeffrey.e.pieper@intel.com
Approved by:	re (gjb@)
Relnotes:	yes
Sponsored by:	Intel Corporation
Differential Revision:	https://reviews.freebsd.org/D16429
2018-10-12 22:40:54 +00:00
Stephen Hurd
0c919c2370 Fix capabilities handling for iflib drivers
Various capabilities were not being handled correctly in the
SIOCSIFCAP handler. Specifically:

- IFCAP_RXCSUM and IFCAP_RXCSUM_IPV6 could be set even if not supported.
- It was impossible to disable IFCAP_RXCSUM and/or IFCAP_RXCSUM_IPV6 via
  ifconfig since it issues one ioctl() per command-line flag rather than
  combining them into a single call.
- IFCAP_VLAN_HWCSUM could not be modified via the ioctl().
- Setting any combination of the three IFCAP_WOL flags would set only
  IFCAP_WOL_MCAST | IFCAP_WOL_MAGIC. For example, setting only
  IFCAP_WOL_UCAST would result in both IFCAP_WOL_MCAST and IFCAP_WOL_MAGIC
  being enabled, but IFCAP_WOL_UCAST would not be enabled.
- Because if_vlancap() was called before if_togglecapenable(), vlan flags
  were sometimes not applied correctly.
- Interfaces were being unnecessarily stopped and restarted for WoL.
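
As a rough illustration of the WoL item above, the sketch below derives a
toggle mask from the requested and currently enabled bits and limits it to
what the device supports. The CAP_WOL_* values and wol_apply() are
hypothetical and do not reproduce the actual SIOCSIFCAP handler.

```c
#include <stdint.h>

/* Hypothetical flag values for illustration only. */
#define CAP_WOL_UCAST	0x1u
#define CAP_WOL_MCAST	0x2u
#define CAP_WOL_MAGIC	0x4u
#define CAP_WOL		(CAP_WOL_UCAST | CAP_WOL_MCAST | CAP_WOL_MAGIC)

/*
 * Return the new enabled set: only WoL bits that differ from the request
 * and are supported by the device are flipped; all other bits are kept.
 */
static uint32_t
wol_apply(uint32_t enabled, uint32_t requested, uint32_t supported)
{
	uint32_t diff = (requested ^ enabled) & supported & CAP_WOL;

	return (enabled ^ diff);
}
```

With this shape, requesting only the unicast flag yields only the unicast
flag, and unsupported bits are never turned on.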

PR:		231151
Submitted by:	Kaho Toshikazu <kaho@elam.kais.kyoto-u.ac.jp>
Reported by:	Shirkdog <mshirk@daemon-security.com>
Reviewed by:	gallatin
Approved by:	re (gjb)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D17158
2018-09-20 19:35:35 +00:00
Stephen Hurd
64e6fc1379 Clean up iflib sysctls
Remove sysctls:
txq_drain_encapfail - now a duplicate of encap_txd_encap_fail
intr_link - was never incremented
intr_msix - was never incremented
rx_zero_len - was never incremented

The following were not incremented in all code-paths that apply:
m_pullups, mbuf_defrag, rxd_flush, tx_encap, rx_intr_enables, tx_frees,
encap_txd_encap_fail.

Fixes:
Replace the broken collapse_pkthdr() implementation with an MPASS().
fl_refills and fl_refills_large were not incremented when using netmap.

Reviewed by:	gallatin
Approved by:	re (marius)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D16733
2018-09-06 18:51:52 +00:00
Stephen Hurd
bc0e855bd9 Fix compile error due to missing parenthesis in r338372
Approved by:	re (gjb)
2018-08-29 16:21:34 +00:00
Stephen Hurd
a520f8b6fe Fix potential data corruption in iflib
The MP ring may have txq pointers enqueued.  Previously, these were
passed to m_free() when IFC_QFLUSH was set.  This patch checks for
such entries and does not pass them to m_free().
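
A minimal sketch of that kind of check, assuming the marker entry is simply
the address of the queue itself. struct txq, struct pkt, pkt_free() and
ring_flush() are hypothetical stand-ins and not iflib's own types or
functions.

```c
#include <stddef.h>

struct txq { int id; };		/* hypothetical stand-in for a transmit queue */
struct pkt { int freed; };	/* hypothetical stand-in for an mbuf */

/* Stand-in for freeing a packet; it only records that it was called. */
static void
pkt_free(struct pkt *p)
{
	p->freed = 1;
}

/*
 * Flush n ring entries and return how many packets were freed. An entry
 * equal to the queue pointer is a marker, not a packet, so it is skipped.
 */
static int
ring_flush(struct txq *q, void **items, size_t n)
{
	int freed = 0;

	for (size_t i = 0; i < n; i++) {
		if (items[i] == (void *)q)
			continue;
		pkt_free((struct pkt *)items[i]);
		freed++;
	}
	return (freed);
}
```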

Reviewed by:	gallatin
Approved by:	re (gjb)
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D16882
2018-08-29 15:55:25 +00:00
Patrick Kelsey
8f410865b8 Mark the send queue ready so ALTQ is available.
2018-08-04 01:45:17 +00:00
Patrick Kelsey
b8ca475604 ALTQ support for iflib.
Reviewed by:	jmallett, mmacy
Differential Revision:	https://reviews.freebsd.org/D16433
2018-07-25 22:46:36 +00:00
Marius Strobl
c9a49a4fd8 Since r336611, n is only used for INET in iflib_parse_header().
Reported by:	rpokala
2018-07-24 23:40:27 +00:00
Marius Strobl
7474544bac Use the maximum of isc_tx_{nsegments,tso_segments_max} for MAX_TX_DESC.
Since r336313, TSO support for LEM-class devices is removed again as it
was before the conversion of {l,}em(4) to iflib(4) in r311849 and as a
result, isc_tx_tso_segments_max is 0 for LEM-class devices now. Thus,
inappropriate watermarks were used for this class.

This is really only a band-aid, though, because so far iflib(9) doesn't
fully take into account that DMA engines can support different maxima
of segments for transfers of TSO and non-TSO packets. For example, the
DESC_RECLAIMABLE macro is based on isc_tx_nsegments while MAX_TX_DESC
used isc_tx_tso_segments_max only. For most in-tree consumers that
doesn't make a difference as the maxima are the same for both kinds of
transfers (apart from the fact that TSO may require up to 2 sentinel
descriptors, though not with every supported MAC). However,
isc_tx_nsegments is 8 but isc_tx_tso_segments_max is 85 by default
with ixl(4).
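
The effect of taking the maximum can be sketched as follows. The struct and
function are hypothetical; only the two field names and the ixl(4) values
(8 and 85) come from the message above, and the value 40 in the usage note
is an arbitrary example.

```c
/* Hypothetical subset of the per-device context, for illustration only. */
struct softc_ctx_sketch {
	int isc_tx_nsegments;		/* max segments for non-TSO packets */
	int isc_tx_tso_segments_max;	/* max segments for TSO packets */
};

/* Use the larger of the two limits, so a TSO limit of 0 is harmless. */
static int
max_tx_desc(const struct softc_ctx_sketch *sc)
{
	return (sc->isc_tx_nsegments > sc->isc_tx_tso_segments_max ?
	    sc->isc_tx_nsegments : sc->isc_tx_tso_segments_max);
}
```

For a device without TSO (isc_tx_tso_segments_max of 0) and, say, 40
non-TSO segments, this yields 40 instead of 0; for the ixl(4) defaults it
yields 85.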
2018-07-22 17:51:11 +00:00
Marius Strobl
8b8d90931d - Given that the controlling expression of the receive loop in iflib_rxeof()
  tests for avail > 0, avail can never be 0 within that loop. Thus, move
  decrementing avail and budget_left into the loop and before the code which
  checks for additional descriptors having become available in case all the
  previous ones have been processed but there still is budget left so the
  latter code works as expected. [1]
- In iflib_{busdma_load_mbuf_sg,parse_header}(), remove dead stores to m
  and n respectively. [2, 3]
- In collapse_pkthdr(), ensure that m_next isn't NULL before dereferencing
  it. [4]
- Remove a duplicate assignment of segs in iflib_encap().

Reported by:	Coverity
CID:		1356027 [1], 1356047 [2], 1368205 [3], 1356028 [4]
2018-07-22 17:45:44 +00:00
Stephen Hurd
fe51d4cdfe Add knob to control tx ring abdication.
r323954 changed the mp ring behaviour when 64-bit atomics were
available to abdicate the TX ring rather than having one become a
consumer thereby running to completion on TX. The consumer of the mp
ring was then triggered in the tx task rather than blocking the TX call.
While this significantly lowered the number of RX drops in small-packet
forwarding, it also negatively impacted TX performance.

With this change, the default behaviour is reverted, causing one TX ring
to become a consumer during the enqueue call. A new sysctl,
dev.X.Y.iflib.tx_abdicate is added to control this behaviour.

Reviewed by:	gallatin
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D16302
2018-07-20 17:45:26 +00:00
Stephen Hurd
dd7fbcf193 Improve netmap TX handling when TX IRQs are not used/supported
Use the timer to poll for TX completions when there are
outstanding TX slots. Track when the last driver timer was called
to prevent overcalling it. Also clean up some kring vs NIC ring
usage.

Reviewed by:	marius, Johannes Lundberg <johalun0@gmail.com>
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D16300
2018-07-20 17:24:45 +00:00
Marius Strobl
7f87c0406d Assorted TSO fixes for em(4)/iflib(9) and dead code removal:
- Ever since the workaround for the silicon bug of TSO4 causing MAC hangs
  was committed in r295133, CSUM_TSO always got disabled unconditionally
  by em(4) on the first invocation of em_init_locked(). However, even with
  that problem fixed, it turned out that for at least e. g. 82579 not all
  necessary TSO workarounds are in place, still causing MAC hangs even at
  Gigabit speed. Thus, for stable/11, TSO usage was deliberately disabled
  in r323292 (r323293 for stable/10) for the EM-class by default, allowing
  users to turn it on if it happens to work with their particular EM MAC
  in a Gigabit-only environment.
  In head, the TSO workaround for speeds other than Gigabit was lost with
  the conversion to iflib(9) in r311849 (possibly along with another one
  or two TSO workarounds). Yet at the same time, for EM-class MACs TSO4
  got enabled by default again, causing device hangs. Therefore, change the
  default for this hardware class back to have TSO4 off, allowing users
  to turn it on manually if it happens to work in their environment as
  we do in stable/{10,11}. An alternative would be to add a whitelist of
  EM-class devices where TSO4 actually is reliable with the workarounds in
  place, but given that the advantage of TSO at Gigabit speed is rather
  limited - especially with the overhead of these workarounds -, that's
  really not worth it. [1]
  This change includes the addition of an isc_capabilities to struct
  if_softc_ctx so iflib(9) can also handle interface capabilities that
  shouldn't be enabled by default which is used to handle the default-off
  capabilities of e1000 as suggested by shurd@ and moving their handling
  from em_setup_interface() to em_if_attach_pre() accordingly.
- Although 82543 supports TSO4 in theory, the former lem(4) didn't have
  support for TSO4, presumably because TSO4 is even more broken in the
  LEM-class of MACs than the later EM ones. Still, TSO4 for LEM-class
  devices was enabled as part of the conversion to iflib(9) in r311849,
  causing device hangs. So revert back to the pre-r311849 behavior of
  not supporting TSO4 for LEM-class at all, which includes not creating
  a TSO DMA tag in iflib(9) for devices not having IFCAP_TSO4 set. [2]
- In fact, the FreeBSD TCP stack can handle a TSO size of IP_MAXPACKET
  (65535) rather than FREEBSD_TSO_SIZE_MAX (65518). However, the TSO
  DMA must have a maxsize of the maximum TSO size plus the size of a
  VLAN header for software VLAN tagging. The iflib(9) converted em(4),
  thus, first correctly sets scctx->isc_tx_tso_size_max to EM_TSO_SIZE
  in em_if_attach_pre(), but later on overrides it with IP_MAXPACKET
  in em_setup_interface() (apparently, left-over from pre-iflib(9)
  times). So remove the latter and fix iflib(9) to correctly cap
  the maximum TSO size reported to the stack at IP_MAXPACKET. While at
  it, let iflib(9) use if_sethwtsomax*().
  This change includes the addition of isc_tso_max{seg,}size DMA engine
  constraints for the TSO DMA tag to struct if_shared_ctx and letting
  iflib_txsd_alloc() automatically adjust the maxsize of that tag in case
  IFCAP_VLAN_MTU is supported as requested by shurd@.
- Move the if_setifheaderlen(9) call for adjusting the maximum Ethernet
  header length from {ixgbe,ixl,ixlv,ixv,em}_setup_interface() to iflib(9)
  so adjustment is automatically done in case IFCAP_VLAN_MTU is supported.
  As a consequence, this adjustment now is also done in case of bnxt(4)
  which missed it previously.
- Move the reduction of the maximum TSO segment count reported to the
  stack by the number of m_pullup(9) calls (which in the worst case,
  can add another mbuf and, thus, the requirement for another DMA
  segment each) in the transmit path for performance reasons from
  em_setup_interface() to iflib_txsd_alloc() as these pull-ups are now
  done in iflib_parse_header() rather than in the no longer existing
  em_xmit(). Moreover, this optimization applies to all drivers using
  iflib(9) and not just em(4); all in-tree iflib(9) consumers still
  have enough room to handle full size TSO packets. Also, reduce the
  adjustment to the maximum number of m_pullup(9)'s now performed in
  iflib_parse_header().
- Prior to the conversion of em(4)/igb(4)/lem(4) and ixl(4) to iflib(9)
  in r311849 and r335338 respectively, these drivers didn't enable
  IFCAP_VLAN_HWFILTER by default due to VLAN events not being passed
  through by lagg(4). With iflib(9), IFCAP_VLAN_HWFILTER was turned on
  by default but also lagg(4) was fixed in that regard in r203548. So
  just remove the now redundant and defunct IFCAP_VLAN_HWFILTER handling
  in {em,ixl,ixlv}_setup_interface().
- Nuke other redundant IFCAP_* setting in {em,ixl,ixlv}_setup_interface()
  which is (more completely) already done in {em,ixl,ixlv}_if_attach_pre()
  now.
- Remove some redundant/dead setting of scctx->isc_tx_csum_flags in
  em_if_attach_pre().
- Remove some IFCAP_* duplicated either directly or indirectly (e. g.
  via IFCAP_HWCSUM) in {EM,IGB,IXL}_CAPS.
- Don't bother to fiddle with IFCAP_HWSTATS in ixgbe(4)/ixgbev(4) as
  iflib(9) adds that capability unconditionally.
- Remove some unused macros from em(4).
- Bump __FreeBSD_version as some of the above changes require the modules
  of drivers using iflib(9) to be recompiled.
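
The TSO size handling described in the list above can be sketched as two
small calculations. The function names are hypothetical, and the 18-byte
VLAN header length (14-byte Ethernet header plus 4-byte 802.1Q tag) is an
assumption made for this illustration.

```c
#include <stdint.h>

#define TSO_STACK_MAX	65535u	/* IP_MAXPACKET, as stated in the message */
#define VLAN_HDR_LEN	18u	/* assumed Ethernet header plus 802.1Q tag */

/* Cap the maximum TSO size reported to the stack at IP_MAXPACKET. */
static uint32_t
tso_size_for_stack(uint32_t drv_tso_size_max)
{
	return (drv_tso_size_max < TSO_STACK_MAX ?
	    drv_tso_size_max : TSO_STACK_MAX);
}

/*
 * Size of the TSO DMA tag: leave room for a software-inserted VLAN header
 * when the interface supports VLAN-sized frames.
 */
static uint32_t
tso_tag_maxsize(uint32_t tso_maxsize, int vlan_mtu_supported)
{
	return (tso_maxsize + (vlan_mtu_supported ? VLAN_HDR_LEN : 0u));
}
```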

Okayed by:	sbruno@ at 201806 DevSummit Transport Working Group [1]
Reviewed by:	sbruno (earlier version), erj
PR:	219428 (part of; comment #10) [1], 220997 (part of; comment #3) [2]
Differential Revision:	https://reviews.freebsd.org/D15720
2018-07-15 19:04:23 +00:00
Eric Joyner
dfae03b5b5 iflib: Style fixes
MFC after:	1 week
2018-06-18 17:27:43 +00:00