
Break the RX processing up into smaller chunks of 128 frames each.

Right now, processing a full 512-frame queue takes quite a while (measured
on the order of milliseconds).  Because of this, the TX processing sometimes
ends up preempting the taskqueue:

* userland sends a frame
* it goes in through net80211 and out to ath_start()
* ath_start() will end up either direct dispatching or software queuing a
  frame.

If TX had to wait for RX to finish, it would add quite a few ms of
additional latency to the packet transmission.  This in the past has
caused issues with TCP throughput.
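
To make the contention concrete, here is a minimal taskqueue(9)-style
sketch (illustration only, not the actual ath(4) code; ath_tq, rx_task_fn,
tx_task_fn and example_setup are made-up names) of why a single long RX
pass on the shared taskqueue delays any TX task queued behind it:

#include <sys/param.h>
#include <sys/kernel.h>
#include <sys/malloc.h>
#include <sys/priority.h>
#include <sys/taskqueue.h>

static struct taskqueue *ath_tq;
static struct task rxtask, txtask;

/* A long 512-frame RX pass would run here. */
static void rx_task_fn(void *arg, int npending) { }
/* TX dispatch/completion work would run here. */
static void tx_task_fn(void *arg, int npending) { }

static void
example_setup(void)
{
        ath_tq = taskqueue_create("ath_taskq", M_NOWAIT,
            taskqueue_thread_enqueue, &ath_tq);
        taskqueue_start_threads(&ath_tq, 1, PI_NET, "ath taskq");
        TASK_INIT(&rxtask, 0, rx_task_fn, NULL);
        TASK_INIT(&txtask, 0, tx_task_fn, NULL);

        /* txtask only runs after rxtask's milliseconds-long pass ends. */
        taskqueue_enqueue(ath_tq, &rxtask);
        taskqueue_enqueue(ath_tq, &txtask);
}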

Now, as part of my attempt to bring sanity to the TX/RX paths, the first
step is to make the RX processing happen in smaller 'parts'.  That way,
when TX work is pushed into the ath taskqueue, it won't be stuck behind
so much RX latency.

The bigger scale change (which will come much later) is to process the
RX descriptors in the ath_intr taskqueue but process the completed
_frames_ in the ath driver taskqueue.  That would reduce the latency
between processing and requeuing new descriptors.  But that'll come later.
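
As a rough sketch of that eventual split (illustration only; struct
rx_frame, struct rx_split and the two task functions are hypothetical
names, not code from the driver), the interrupt-side task would only
harvest descriptors and refill the ring, deferring per-frame work to the
driver taskqueue:

#include <sys/param.h>
#include <sys/queue.h>
#include <sys/taskqueue.h>

struct rx_frame {
        TAILQ_ENTRY(rx_frame)   rf_list;
        /* mbuf pointer, RX status, etc. would live here */
};

struct rx_split {
        struct taskqueue        *tq;            /* driver taskqueue */
        struct task              frametask;     /* deferred frame work */
        TAILQ_HEAD(, rx_frame)   pending;       /* completed frames */
};

/* Runs from the ath_intr taskqueue: cheap, keeps the RX ring full. */
static void
rx_desc_task(void *arg, int npending)
{
        struct rx_split *rs = arg;

        /*
         * Walk the completed descriptors, link fresh buffers back onto
         * the hardware ring right away, and append the finished frames
         * to rs->pending (locking elided in this sketch).
         */

        /* Defer the slow per-frame work to the driver taskqueue. */
        taskqueue_enqueue(rs->tq, &rs->frametask);
}

/* Runs from the driver taskqueue: decrypt/decap, hand frames up. */
static void
rx_frame_task(void *arg, int npending)
{
        struct rx_split *rs = arg;
        struct rx_frame *rf;

        while ((rf = TAILQ_FIRST(&rs->pending)) != NULL) {
                TAILQ_REMOVE(&rs->pending, rf, rf_list);
                /* ... process one frame and pass it up to net80211 ... */
        }
}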

The actual work:

* Add ATH_RX_MAX, set to 128 (static for now);
* break out of the processing loop once npkts reaches ATH_RX_MAX;
* if we processed ATH_RX_MAX or more frames during the loop, immediately
  reschedule another RX taskqueue run to handle whatever frames are still
  waiting in the RX queue.
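
In condensed form the loop change looks roughly like this (a sketch of
the pattern only; more_rx_work() is a placeholder, and the real hunks
are in the diff below):

#define ATH_RX_MAX      128

static void
rx_proc_sketch(struct ath_softc *sc)
{
        int npkts = 0;

        do {
                /* Cap the batch so TX work gets a chance to run. */
                if (npkts >= ATH_RX_MAX)
                        break;
                /* ... pull one completed descriptor, process the frame ... */
                npkts++;
        } while (more_rx_work(sc));     /* placeholder loop condition */

        /* Hit the cap with frames left over: schedule another RX pass. */
        if (npkts >= ATH_RX_MAX)
                taskqueue_enqueue(sc->sc_tq, &sc->sc_rxtask);
}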

This should have very minimal impact on the general throughput case,
unless the scheduler is being very, very strange or the ath taskqueue
ends up spending a lot of time on non-RX operations (such as TX
completion).
Author: Adrian Chadd  2012-10-14 20:31:38 +00:00
Commit: 516f67965a  (parent: 9b233e2307)
Notes: svn2git 2020-12-20 02:59:44 +00:00
svn path=/head/; revision=241558


@@ -797,6 +797,8 @@ ath_rx_pkt(struct ath_softc *sc, struct ath_rx_status *rs, HAL_STATUS status,
         return (is_good);
 }
 
+#define ATH_RX_MAX      128
+
 static void
 ath_rx_proc(struct ath_softc *sc, int resched)
 {
@@ -832,6 +834,15 @@ ath_rx_proc(struct ath_softc *sc, int resched)
         sc->sc_stats.ast_rx_noise = nf;
         tsf = ath_hal_gettsf64(ah);
         do {
+                /*
+                 * Don't process too many packets at a time; give the
+                 * TX thread time to also run - otherwise the TX
+                 * latency can jump by quite a bit, causing throughput
+                 * degradation.
+                 */
+                if (npkts >= ATH_RX_MAX)
+                        break;
+
                 bf = TAILQ_FIRST(&sc->sc_rxbuf);
                 if (sc->sc_rxslink && bf == NULL) {     /* NB: shouldn't happen */
                         if_printf(ifp, "%s: no buffer!\n", __func__);
@@ -942,11 +953,22 @@ ath_rx_proc(struct ath_softc *sc, int resched)
         }
 #undef PA2DESC
 
+        /*
+         * If we hit the maximum number of frames in this round,
+         * reschedule for another immediate pass. This gives
+         * the TX and TX completion routines time to run, which
+         * will reduce latency.
+         */
+        if (npkts >= ATH_RX_MAX)
+                taskqueue_enqueue(sc->sc_tq, &sc->sc_rxtask);
+
         ATH_PCU_LOCK(sc);
         sc->sc_rxproc_cnt--;
         ATH_PCU_UNLOCK(sc);
 }
+#undef ATH_RX_MAX
 
 /*
  * Only run the RX proc if it's not already running.
  * Since this may get run as part of the reset/flush path,